An atlas of seabed biodiversity for Aotearoa New Zealand

The waters of Aotearoa New Zealand span over 4.2 million km 2 of the South Pacific Ocean and harbour a rich diversity of seafloor associated taxa. Due to the immensity and remoteness of the area, there are significant gaps in the availability of data to quantify and map the distribution of seafloor and demersal biodiversity, limiting effective management. In this study, we describe the development and accessibility of an online atlas of seabed biodiversity that aims to fill these gaps. Species 30 distribution models were developed for 579 taxa across four taxonomic groups: demersal fish, reef fish, subtidal invertebrates and macroalgae. Spatial layers for taxa distribution based on habitat suitability were statistically validated and then, as a further check, evaluated by taxonomic experts to provide measures of confidence to guide the future use of these layers. Spatially explicit uncertainty (SD) layers were also developed for each taxon distribution. We generated layer-specific metadata, including statistical and expert evaluation scores, which were uploaded alongside the accompanying spatial layers to the open 35 access database Zenodo. This database provides the most comprehensive source of information on the distribution of seafloor taxa for Aotearoa New Zealand and is thus an invaluable resource for managers, researchers and the public that will guide the management and conservation of seafloor communities.


Introduction
The identification of priority areas for resource management and conservation biology is often hampered by a lack of 40 comprehensive knowledge of biodiversity patterns (Ferrier et al., 2007;Arponen et al., 2008;Hortal et al., 2015).
Unfortunately, producing quantitative descriptions of large-scale biodiversity patterns is challenging, particularly in offshore marine environments, where sampling is both difficult and expensive, resulting in patchy and/or irregular coverage Thomson et al., 2014). Species distribution models (SDMs), which are correlative models that predict the occurrence of species in relation to environmental variables, can provide estimates of biodiversity patterns across broad spatial 45 scales where data are often sparse. Spatial predictions of species' distributions from SDMs have become an important part of resource management and conservation biology (Moilanen et al., 2011).
The development of SDMs has been further advanced by increasing accessibility to biological records from open-access data repositories such as OBIS (https://obis.org/) and GBIF (https://www.gbif.org/), or journals that specialise in providing descriptions of, and access to, datasets (e.g., Earth System Science Data). In addition, the spatially explicit environmental 50 variables that are used to predict species' distributions from the SDMs are also more accessible and at increasingly finer spatial resolutions (i.e., smaller grid cells size), e.g., Bio-ORACLE (Assis et al., 2018). Finally, the development of flexible machine learning methods (e.g., Boosted Regression Tree , Random Forest (Breiman, 2001), Maxent , among others) and advances in computing power means it is possible to automate the predictions of a large number of species distributions with less input required from ecologists or statisticians for model parameterisation. Species distribution 55 layers are therefore becoming more available to stakeholders and decision makers, thus facilitating their use in spatial management and systematic conservation planning.
Geographic information systems (GIS) provide a tool to store, visualise and explore relationships between spatial information describing taxa distributions (Hamylton, 2017). Increasingly GIS are available online (as online atlases, e.g., (Asaad et al., 2019), providing a platform to explore and download data without the requirement to have access to traditional desktop-based 60 GIS. Online atlases exist for several coastal regions, including Ireland's Marine Atlas (https://atlas.marine.ie/), the Oregon Coastal Atlas (https://www.coastalatlas.net/), the European Atlas of the Seas ( Barale et al., 2015)  access atlas of seabed biodiversity for the seas around Aotearoa New Zealand was developed. Aotearoa New Zealand's Territorial Sea (TS) and Exclusive Economic Zone (EEZ) extends over 4.2 million km 2 of the South Pacific Ocean (≈25 -57°S; 162°E -172°W) and contains high biological diversity, reflecting its wide latitudinal gradient (subtropical to subantarctic), range of water depths (intertidal to deep ocean trenches), and zones of high productivity (Costello et al., 2009, Gordon et al., 2010. In addition, the relatively large distance between Aotearoa New Zealand and other major land masses 70 has resulted in a high degree of species endemism (i.e., a high number of species are not found elsewhere on the planet) (Gordon et al., 2010). Such high biodiversity values mean that it is critical that spatial information of biodiversity is available for effective spatial management and conservation planning (Costello et al., 2017). Here, we provide an overview of the development and availability of spatial predictions for 579 taxa across four biotic groups (demersal fish, reef fish, subtidal invertebrates and macroalgae), their respective estimates of uncertainty, and the online atlas created to download these data. 75 In addition, we provide extensive metadata which includes information on the robustness of the predictions based on quantitative assessments (i.e., how closely the model predictions match taxa records) and expert assessment. Together, these assessments provide valuable information for guiding the appropriate use of biodiversity information accessed from the atlas of seabed biodiversity of Aotearoa New Zealand.

Methods 80
This atlas was developed to provide geo-referenced seafloor biodiversity information, and their associated uncertainty, for the waters surrounding Aotearoa New Zealand. The process began with the pooling of datasets on taxa occurrence and highresolution environmental variables that can be used to develop SDMs ( Figure 1). The methods used to produce the taxon distribution layers followed well documented procedures and are provided in the supplementary materials (supplementary materials 1). Briefly, spatial distributions were generated for each taxon from four biotic groups: demersal fish (n taxa = 235, 85 bottom dwelling fish occurring on, or near to, the seafloor throughout the study area), reef fish (n taxa = 51, reef associated fish, occurring on coastal rocky reefs), subtidal invertebrates (n taxa = 207, bottom dwelling or benthic invertebrates occurring throughout the study area) and macroalgae (n taxa = 86, occurring in the coastal margin within the photic zone) using ensemble SDM methods. These ensembles used the combined outputs from flexible machine learning Boosted Regression Tree (BRT) and Random Forest (RF) models as in Stephenson et al. (2021a). 90 Spatial predictions and associated uncertainty estimates were generated with differing geographical extents for the biotic groups to avoid predicting into completely unsampled parts of the study area where such extrapolation would be highly uncertain ( Figure 2). Demersal fish and subtidal invertebrate taxa were predicted from the coastline to depths of 2000 m at a 1 km grid resolution (Figure 2), whereas macroalgae and reef fish taxa were predicted to subtidal reef habitats only (defined by New Zealand Department of Conservation (DOC) national rocky reef layer) at a 250 m grid resolution. Macroalgae were 95 predicted to depths shallower than 40 m while reef fish were predicted to the full depth range of reef habitat (see supplementary materials 1 for further details). Spatial predictions were represented by a habitat suitability index (HSI) -the relative suitability of the environmental envelope for each taxon ranging from 0 to 1, with higher values indicating a more suitable environment.
To aid interpretation of HSI values we followed the subjectively defined categorization by Georgian et al. (2019): < 0.4 = low suitability, 0.4 -0.8 = moderate suitability, and > 0.8 = high suitability. Spatially explicit measures of uncertainty for each 100 taxon were represented by the standard deviation of the mean HSI (SD).

Figure 1
Infographic illustrating the development of the atlas described in this study. Presence and absence (p/a) records for four taxonomic groups were combined with over 20 spatially explicit environmental variables and used to run ensembled SDMs of Boosted Regression 105 Trees and Random Forest models. The number of unique spatial occurrences available for modelling ranged between 50-13926 unique spatial occurrences for demersal fish (collected from 1979 -2016), 36-339 unique spatial occurrences for reef fish (collected from 1986 -2014), 70-10804 unique spatial occurrences for subtidal invertebrate (collected between 1896 -2019) and 50-422 unique spatial occurrences for macroalgae taxa (collected between 1896 -2018). The models were statistically validated using best practice procedures and evaluated by taxonomic experts to further appraise model accuracy. These assessments were incorporated within the metadata of each layer and 110 uploaded, along with the layers themselves, to the Zenodo data portal. World imagery basemap utilised on inset (ESRI 2022). An important component of being able to appropriately use the predicted distributions for management or future scientific investigations is to estimate how well the models perform. That is, if a model is assessed as performing very poorly, it may be best to use the associated prediction wirandth caution or not at all. Historically, this model assessment was derived from 115 statistical model fit metrics (i.e., assessing how well the model predicts to the available biological data). However, these measures can sometimes result in over-confidence in the model results (Bowden et al., 2021). Therefore, for each taxon, two measures of model performance were provided in the metadata ( Figure 1): firstly, model fit metrics using standard statistical validation; and secondly an expert evaluation (see supplementary materials 1 for methods and further information). Combined, these assessments provide complementary information that highlights limitations and inaccuracies in the predicted taxa layers. 120 Statistical model fits provided in the atlas include model evaluation scores (True Skill Statistic score) and the number of unique spatial occurrences to develop the modelled distribution for each taxon. The True Skill Statistic provides an index ranging from -1 to +1, where +1 equals perfect agreement and -1 = no better than random. Taxa records were aggregated spatially to a 1 km grid resolution representing unique spatial occurrences. The number of unique spatial occurrences varied from 50-13926 for demersal fish, 36-339 for reef fish, 70-10804 for subtidal invertebrates, and 50-422 for macroalgae, with a minimum of 30 125 unique spatial occurrences required for modelling. Predicted distributions were also evaluated by taxonomists and ecologists with substantial expertise in the distribution and abundance of the various taxonomic groupings. The expert assessment focussed on the congruence between predicted taxa distribution and expert view of taxa distribution, with scores ranging from 1 (Very accurate -the predicted distribution reflects expert view of taxa distribution (> 80% overlap)) to 5 (Inaccuratethe predicted distribution does not match the expert's view of the taxa distribution (i.e., < 20% agreement)). Taxon layers with 130 expert evaluation scores of 4 or 5 (n=17) are likely to contain substantial inaccuracies (irrespective of the statistical model fits) and should therefore only be used with caution (this is highlighted in the metadata field "use limitations") (as an illustration, see examples of where model performance statistics and expert assessment agree and disagree in the supplementary materials 1).
Metadata were developed for each layer following standard metadata protocols (ISO 19139 Geographic Information -Metadata 135 -Implementation Specification, 2007)see the supplementary materials 1 for an overview and description of the metadata categories and an example for the demersal fish Chrysophrys auratus (Australasian snapper) (Figure 2). Metadata for all taxa available in the atlas of seabed biodiversity for Aotearoa New Zealand is available to download with this publication ( Figure   3) and includes information on both the statistical model validation and expert evaluation for each layer in the atlas.

Data availability 150
The atlas of seabed biodiversity for Aotearoa New Zealand is freely accessible via the open-access database Zenodo (Stephenson et al., 2022; https://doi.org/10.5281/zenodo.7083642). The atlas is represented by eight .zip folders (Table 1).
Four folders of spatial predictions from the SDMs of each of four taxonomic groups (demersal fish, reef fish, subtidal invertebrates and macroalgae). A further four .zip folders contain the associated uncertainty estimates for each SDM for each 155 of the four taxonomic groups. Within each folder, a GeoTiff raster file contains the associated information for each taxa, and the contents of each folder can be viewed using the 'preview' function or downloaded using the 'download' function ( Figure   3).
A further component of the atlas is a dataset containing the associated metadata for each spatial layer. Metadata is provided as an excel file with a unique tab for each of the four taxonomic groups (Figure 3). The metadata contains a wealth of information 160 on the development and use of each layer within the atlas including statistical validation and expert appraisal scores (see supplementary Materials 1).
An additional and interactive source for the atlas is via an online GIS database which can be assessed through a web frontpage: the DOC Marine Data Portal (https://doc-marine-data-deptconservation.hub.arcgis.com). This source, administered by 165 the NZ Department of Conservation, allows users to plot, interrogate and download individual layers of the atlas (see supplementary materials 2).  The front end of the Zenodo database for the atlas of seabed biodiversity for Aotearoa New Zealand. Also illustrated are the atlas's eight data folders and metadata file as viewed on Zenodo. Image created under standard terms and conditions © Zenodo. Searching and downloading specific taxa (rather than the whole dataset) can be undertaken from the NZ Department of Conservation's online geoportal available at: https://doc-marine-data-deptconservation.hub.arcgis.com/

Discussion
The atlas of seabed biodiversity for Aotearoa New Zealand represents the most comprehensive, freely available biodiversity 180 information for the area to date. The atlas is composed of predicted taxon distributions, associated uncertainty estimates and comprehensive metadata for 579 taxa. This resource provides managers and decision makers with tools to address environmental issues faced by subtidal ecosystems. The tools provide a detailed understanding of hotspots for key taxa (e.g., those with important functional roles, threatened taxa), as well as biodiversity (e.g., species richness) more generally. It should be noted, however, that the models used to develop taxa predictions are based on long-term correlations between taxa 185 occurrence and environmental variables and do not consider temporal variability (e.g., seasonal or decadal variation) in environmental conditions or anthropogenic stressors that may influence biodiversity patterns. Thus, the models provide a temporally smoothed indication of the distribution of seafloor taxa at a broad scale; a foundation that may be built upon by future modelling approaches that incorporate temporally dynamic variables and data on the distribution of stressors. to support ecosystem-based management. These data are particularly relevant to marine spatial planning approaches which are useful tools for marine ecosystem-based management (Peart, 2019;Domínguez-Tejo et al., 2016). Such analyses allow the optimisation of spatial scenarios for the conservation of biodiversity (e.g., MPA planning (Asaad et al., 2018;Zhao et al., 2020)) and for resource-use/biodiversity trade-offs (e.g., siting fishing corridors/aquaculture areas; Henriques et al., 2017)). Spatial planning using decision support tools requires spatial layers (typically raster datasets) on the 200 distribution of biodiversity values within a specified study area. Tools including Zonation (Moilanen et al., 2009) andMarxan (Ball et al., 2009) are then used to perform spatial prioritisations of a seascape to determine the best locations for management interventions that align with management objectives while considering the distribution of biodiversity values associated with each input layer. Thus, the layers provided by the atlas in this study represent key inputs for marine spatial planning. Users may construct spatial prioritisations using a broad suite of input layers to determine high priority locations for biodiversity 205 generally or select a subset of layers to represent, for example, key ecosystem functions (e.g., seafloor biogenic habitats), threatened taxa, or those of cultural/recreational importance.

Metadata
An important component of the atlas is the provision of detailed metadata to guide the use of each layer. Users of the atlas can consult layer-specific metadata to make informed decisions on the applicability of a given layer for a certain analysis. The 210 most critical pieces of information in the metadata are the expert-derived evaluation score and the statistical model evaluation score, with the former being used to inform the 'use limitations' field of the metadata. These two model evaluation scores provide complementary information for a particular layer that allows users to establish decision rules around which layers are deemed suitable for further analyses. For example, if users are interested in constructing analyses to determine the best locations for the management of taxa/taxa groups, they may choose to select layers that have high statistical (Table S3) and 215 expert derived (Table S6) evaluation scores (i.e., scores < 2), with only those layers being used in further analysis.
Alternatively, users may opt to use each layer in the atlas but weight their contribution to a spatial planning scenario by one or both evaluation scores. The subsequent weighting value can be applied to each layer to scale the HSI values according to the expert derived and/or statistical evaluation values (e.g., Stephenson et al., 2021b). Within the Zonation and Marxan decision support tools, differential weighting of data layers is achieved by providing the software with the weighting values associated 220 with each layer while coding a prioritisation. These tools then scale the HSI values automatically as part of the spatial prioritisation. If users wish to perform basic prioritisation 'outside' of decision support tools, they may scale the HSI values manually using GIS or statistical software (e.g., R).
Metadata fields on the spatial resolution and the prediction domain for each taxon provide additional useful information to 225 guide database users. The spatial resolution (in km 2 ) describes the size of the grid cells for each raster layer and will be either 1 km 2 (for demersal fish and subtidal invertebrates) or 0.06km 2 (250x250m) for reef fish and macroalgae. The prediction domain describes the extent (i.e., number of rows and columns) and distribution of cells that contain some information for a given taxa (e.g., all cells with HSI values). As mentioned above, the prediction domain is clipped to all areas within the EEZ and TS shallower than 2000m for demersal fish and subtidal invertebrates. For reef fish and macroalgae, the domain is further 230 clipped to a 'rocky reef' layer. Users should be aware that the rocky reef layer has known inaccuracies, and that many reef fish and macroalgae are not reef-obligates (i.e., also found over soft sediment habitats).
Each SDM in this database has an associated layer for spatially explicit uncertainty which holds important information for the utility of the SDMs. Due to limited sampling and/or uncertain taxa-environmental relationships, some regions within a 235 predicted distribution will have higher uncertainty than others (Beale and Lennon, 2012;Elith and Leathwick, 2009). Spatial planning approaches typically account for the spatial variability in uncertainty by including uncertainty layers within prioritisations that effectively discount the biodiversity values of cells by their associated uncertainty value, thus reducing contribution of areas with higher uncertainty. A similar approach may be achieved outside of decision support tools by subtracting the uncertainty layers from their associated SDM layer. The uncertainty layers, however, provide useful 240 information on where additional sampling may be needed to improve our understanding of the distribution of individual taxa and hence their inclusion in future management planning.

Updating of database
The development of the SDMs that underpin this database incorporated the best available information on seafloor taxa at the 245 time of this publication; however, substantial gaps in our knowledge of some taxa and in some areas of Aotearoa New Zealand's large EEZ still exist. As such, it is anticipated that the layers in this database will be iteratively updated when new information on the distribution of seafloor associated taxa become available (e.g., new taxa records, Figure 1). Surveys of deeper (> 2000 m) and remote areas of the EEZ are ongoing and will yield valuable new information on the presence and absence of seafloor taxa that can be used to update the SDMs. Substantial gaps in occurrence data are also apparent in shallow 250 (<200m) coastal habitat that should be addressed in future surveys. It is recommended that a review of the availability of new presence records for seafloor taxa be undertaken every five years, where an additional 50 presence records triggers the updating of a particular SDM. Further sampling can also address areas of model uncertainty and will also allow an expansion of this database as it is anticipated that new SDMs will be developed for taxa not currently included in this database (due to having less than 50 presence locations). 255 Even with new data, presence-absence analyses have major limitations where managers might want to prioritise areas of high biomass or high overall biodiversity. Standardised data on taxa abundance should be recorded, which would allow the generation of SDMs that predict the distribution the relative abundance/density that often provide a more accurate representation of biodiversity patterns (Howard et al., 2014;Young et al., 2015). It is anticipated that the database described in this publication will provide the framework and repository for a diverse suite of information on seafloor biodiversity for 260 Aotearoa New Zealand that can be used to inform management of the marine environment. Furthermore, future iterations of databases may also seek to include programmatic API access to commonly used statistical software (e.g, R statistical software) to facilitate use from biodiversity scientists.

Acknowledgements
We acknowledge the contributions of data and advice from many sources including: the National Institute of Water and Atmospheric Research (NIWA) for collections data and spatial environmental variable layers; Auckland Museum Tāmaki 290 Paenga Hira and National Museum of New Zealand Te Papa Tongarewa for providing collections data; the Department of Conservation for spatial data layers and Fisheries New Zealand for the research trawl data. We acknowledge the contribution of taxonomic expertise from a very large number of national and international colleagues, too many to individually name, that have provided decades of knowledge and the species names and identifications that we rely on for this work.