The CoralHydro2k database: a global, actively curated compilation of coral δ18O and Sr/Ca proxy records of tropical ocean hydrology and temperature for the Common Era

The response of the hydrological cycle to anthropogenic climate change, especially across the tropical oceans, remains poorly understood due to the scarcity of long instrumental temperature and hydrological records. Massive shallow-water corals are ideally suited to reconstructing past oceanic variability as they are widely distributed across the tropics, rapidly deposit calcium carbonate skeletons that continuously record ambient environmental conditions, and can be sampled at monthly to annual resolution. Climate reconstructions based on corals primarily use the stable oxygen isotope composition (δ18O), which acts as a proxy for both sea surface temperature (SST) and the oxygen isotope composition of seawater (δ18Osw), a measure of hydrological variability. Increasingly, coral δ18O time series are paired with time series of strontium-to-calcium ratios (Sr/Ca), a proxy for SST, from the same coral to quantify temperature and δ18Osw variability through time. To increase the utility of such reconstructions, we present the CoralHydro2k database, a compilation of published, peer-reviewed coral Sr/Ca and δ18O records from the Common Era (CE). The database contains 54 paired Sr/Ca–δ18O records and 125 unpaired Sr/Ca or δ18O records, with 88 % of these records providing data coverage from 1800 CE to the present. A quality-controlled set of metadata with standardized vocabulary and units accompanies each record, informing the use of the database. The CoralHydro2k database tracks large-scale temperature and hydrological variability. As such, it is well-suited for investigations of past climate variability, comparisons with climate model simulations including isotope-enabled models, and application in paleodata-assimilation projects.
The CoralHydro2k database is available in Linked Paleo Data (LiPD) format with serializations in MATLAB, R, and Python and can be downloaded from the NOAA National Centers for Environmental Information’s Paleoclimate Data Archive at https://doi.org


Introduction
The global hydrological cycle is changing in response to ongoing anthropogenic climate change (Held and Soden, 2006; Cheng et al., 2020), yet regional trends in hydrology remain uncertain in many areas of the world (Song et al., 2021; Madakumbura et al., 2021; Ummenhofer et al., 2021). Observed and projected trends in large-scale hydrology are consistent with the "wet get wetter, dry get drier" paradigm (Held and Soden, 2006) as surface ocean fluxes increase as the planet warms. Rising global temperatures mean that the atmosphere can hold more moisture, which contributes to more extreme rainfall across a variety of spatiotemporal scales. In the tropics, many aspects of large-scale hydrology are tied to changes in large-scale coupled ocean-atmosphere dynamics associated with the El Niño-Southern Oscillation (ENSO; Power et al., 2013; Cai et al., 2014), tropical Pacific decadal variability (Gu and Adler, 2013; Dong and Dai, 2015), the Indian Ocean Dipole (Webster et al., 1999; Saji et al., 1999; Cai et al., 2019), and Atlantic Multidecadal Variability (Zhang et al., 2019), to name a few of the most prominent modes.
The detection of potential anthropogenic trends in regional hydrology against a rich background of natural regional hydrological variability is complicated by a dearth of instrumental climate data from across the tropics. In particular, instrumental sea surface temperature (SST) observations are sparse prior to the advent of satellites in 1979 (Reynolds et al., 2002; Rayner et al., 2003; Freeman et al., 2017; Huang et al., 2017; Kennedy et al., 2019), and the vast majority of sea surface salinity (SSS) observations only became available in the 1990s with the advent of the Global Tropical Moored Buoy array (McPhaden et al., 1998, 2010) and the World Ocean Circulation Experiment (WOCE) (Good et al., 2013; Friedman et al., 2017; Cheng et al., 2020; Gould and Cunningham, 2021). Both natural and anthropogenic shifts in regional hydroclimate on interannual to multidecadal timescales have profound impacts on societies, economies, and ecosystems, such that resolving regional trends in past hydrological variability prior to available observational records is a scientific and societal priority.
Shallow-water corals have been extensively used to reconstruct past regional- to oceanic-scale climate variability at data-scarce locations in the tropical and subtropical oceans (as reviewed by Gagan et al., 2000; Corrège, 2006; Lough, 2010; Felis, 2020). Seasonally banded coral skeletons (e.g., Lough and Barnes, 1997) can yield monthly to annually resolved proxy records that can be calibrated to instrumental climate observations and thus used to extend the relatively short instrumental SST and SSS records back to the preinstrumental era. Most coral-based reconstructions are based on the oxygen isotopic composition (δ18O) and/or strontium-to-calcium ratios (Sr/Ca) of coral skeletal aragonite. Coral δ18O tracks changes in SST and the oxygen isotopic composition of seawater (δ18Osw) (Epstein et al., 1953; Weber and Woodhead, 1972). Like salinity, variability in δ18Osw reflects the balance of precipitation and evaporation, terrestrial runoff, continental ice melt and formation, and ocean circulation and mixing (e.g., LeGrande and Schmidt, 2006, 2011; Hasson et al., 2013; Conroy et al., 2014). Coral Sr/Ca primarily tracks SST variability (Weber, 1973; Smith et al., 1979; Beck et al., 1992) and can be used to decouple the temperature and δ18Osw signals in coral δ18O records (e.g., Gagan et al., 1998; Ren et al., 2003; Corrège, 2006; Cahyarini et al., 2008). As such, paired coral δ18O and Sr/Ca records can be used to independently investigate trends in SST and hydrology (Hendy et al., 2002; Linsley et al., 2006; Quinn et al., 2006; Zinke et al., 2008; Felis et al., 2009, 2018; Hetzinger et al., 2010; Nurhati et al., 2011; Cahyarini et al., 2014; Wu et al., 2014; Murty et al., 2017, 2018b; Hennekam et al., 2018; von Reumont et al., 2018; Pfeiffer et al., 2019; Ramos et al., 2019, 2020; Sayani et al., 2019). Coral-based reconstructions have provided much-needed insights into local SST and SSS at many tropical sites; however, the utility of this archive in reconstructing regional- and global-scale signals has been limited by the scarcity of long-term paired coral δ18O and Sr/Ca records and the methodological challenges of deriving seawater δ18O changes from these records.
Recent data-synthesis efforts within the international paleoclimate community, under the auspices of the Past Global Changes (PAGES) 2k network, have produced several databases to contextualize modern climate change against the background of natural climate variability over the last ∼2000 years, a time interval known as the Common Era (CE) (e.g., PAGES 2k Consortium, 2013; Tierney et al., 2015; PAGES2k Consortium, 2017; Atsawawaranunt et al., 2018; Konecky et al., 2020; Comas-Bru et al., 2020). These data sets, combined with climate simulations, have been instrumental in improving our understanding of CE climate variability and its dynamics (e.g., Abram et al., 2016; Neukom et al., 2019; PAGES 2k Consortium, 2019). Notably, the PAGES Ocean2k project compiled a network of published coral δ18O, Sr/Ca, and extension-rate records to reconstruct tropical SST evolution over the past few centuries (Tierney et al., 2015). More recently, the PAGES Iso2k project compiled water isotope records from a variety of terrestrial and marine archives (Konecky et al., 2020), including corals, to investigate temperature-driven changes in the global hydrological cycle (Konecky et al., 2023). Building on these previous efforts, the CoralHydro2k project brought the global coral paleoclimate community together to address existing data-archiving needs and access issues as well as the lack of standardized, best-practice methodology for calibrating coral proxies to climate variables and deriving δ18Osw changes from paired δ18O and Sr/Ca records.
Here we present the PAGES CoralHydro2k database, a new, actively curated compilation of coral δ18O and Sr/Ca records from the last 2000 years that serve as proxies for near-surface conditions across the tropical and subtropical oceans. This new database employs metadata standards established by the Marine Annually Resolved Proxy Archives (MARPA; Dassié et al., 2017) and the Paleoclimate Community reporTing Standard (PaCTS 1.0; Khider et al., 2019) and is built using the Linked Paleo Data (LiPD) framework (McKay and Emile-Geay, 2016). This first paper from the CoralHydro2k project outlines this new database, its functionality, and plans for active curation of records and future updates. As this database represents the most comprehensive collection of coral records to date, we highlight the existing spatiotemporal coverage and identify opportunities for future data collection.

Collaborative model
CoralHydro2k was one of nine projects that made up Phase 3 of the PAGES 2k network, a long-standing effort to study climate variability over the last 2000 years (PAGES 2k Network Coordinators, 2017), and continues into Phase 4 of the working group. The CoralHydro2k project was established at the 2017 PAGES Open Science Meeting in Zaragoza, Spain, inspired by the PAGES Hydro2k Workshop in 2016 (PAGES Hydro2k Consortium, 2017). Recurring calls for participation were distributed within the international paleoclimate community to recruit a team with diverse expertise ranging from coral paleothermometry to paleodata assimilation. The resulting CoralHydro2k community is composed of more than 40 volunteer scientists from all academic levels, including undergraduate and graduate students, postdoctoral researchers, and early to senior-level scientists from a variety of international academic and research institutions. Data compilation, initial analysis, and interpretation were done collaboratively and subdivided among thematic working groups as the project progressed. The majority of the work was completed remotely and asynchronously across several virtual platforms (Google Suite, Slack, and Zoom). One in-person meeting with limited remote participation took place in 2019 as a side meeting at the 13th International Conference on Paleoceanography (ICP13) in Sydney, Australia (Hargreaves et al., 2020).

Record selection and aggregation
Record-selection criteria for the CoralHydro2k database were designed to be as inclusive and comprehensive as possible to develop a versatile database that supports the project's goal of reconstructing tropical hydroclimatic variability on seasonal and longer timescales. The database also supports the broader climate community's need for a uniform global database of coral records for comparison with climate model output over the past 2000 years, especially isotope-enabled models. The CoralHydro2k team selected Common Era coral records that were at least 10 years in length, measured either δ18O, Sr/Ca, or both, were published in a peer-reviewed scientific journal, and were archived with an absolute chronology (i.e., time in years CE). For studies where "composite records" or average time series of multiple cores from a single site were publicly available, we included either the composite record or its constituent time series but not both. Composite records are flagged as such in the database. Coral records were sourced from past PAGES 2k data compilations with more restrictive selection criteria, such as Ocean2k (Tierney et al., 2015) and Iso2k (Konecky et al., 2020), and from public repositories such as the World Data Center PANGAEA (https://www.pangaea.de/, last access: 19 May 2022) and the NOAA National Centers for Environmental Information (NCEI) World Data Service for Paleoclimatology (https://www.ncei.noaa.gov/products/paleoclimatology, last access: 19 May 2022). For a few studies where data were not archived in public repositories, we retrieved the records from publications and supplemental information or contacted the corresponding authors. In addition to being compiled in the CoralHydro2k database, 27 previously unarchived records were submitted to NOAA's NCEI database for archival by CoralHydro2k project members.

Database organization
Coral records in the database are organized into seven groups based on the availability of paired proxy time series, temporal coverage, and record resolution (Table 1). Groups 1-3 contain records with paired Sr/Ca-δ18O time series. Group 1 records have monthly to bimonthly temporal resolution and cover at least 80 % of the 20th century. Records in Group 2 are similar in resolution to records in Group 1 but cover less than 80 % of the 20th century. Group 3 records contain any paired Sr/Ca-δ18O time series that have lower than bimonthly resolution. Group 4 records are δ18O-only time series with monthly to bimonthly resolution, while Group 5 records are δ18O-only time series with lower than bimonthly resolution. Groups 6 and 7 mirror Groups 4 and 5, respectively, but for Sr/Ca-only records.
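The grouping rules above can be sketched as a small decision function. This is only an illustration of the stated criteria, not the project's own code: the `Record` class, its field names, and the choice to treat the "_uneven" label variants as monthly to bimonthly are assumptions made for the example.

```python
from dataclasses import dataclass

# Nominal labels treated here as "monthly to bimonthly" resolution.
# Including the "_uneven" variants is an assumption of this sketch.
HIGH_RES = {"monthly", "monthly_uneven", "bimonthly", "bimonthly_uneven"}


@dataclass
class Record:
    """Hypothetical minimal description of one coral record."""
    has_d18o: bool
    has_srca: bool
    resolution: str            # nominal resolution label
    frac_20th_century: float   # fraction of the 20th century covered (0 to 1)


def assign_group(rec: Record) -> int:
    """Return the group number (1-7) implied by the rules in the text."""
    high_res = rec.resolution in HIGH_RES
    if rec.has_d18o and rec.has_srca:
        if not high_res:
            return 3                      # paired, lower than bimonthly
        return 1 if rec.frac_20th_century >= 0.80 else 2
    if rec.has_d18o:
        return 4 if high_res else 5       # d18O-only records
    if rec.has_srca:
        return 6 if high_res else 7       # Sr/Ca-only records
    raise ValueError("record has neither d18O nor Sr/Ca data")
```

For instance, a paired monthly record covering 95 % of the 20th century would fall in Group 1 under these rules, while the same record at annual resolution would fall in Group 3 regardless of coverage.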
Following the Iso2k database protocols (Konecky et al., 2020), each record in the CoralHydro2k database is assigned a unique nine-character alphanumeric identifier. These unique identifiers are generated using the first two letters of the lead author surname (AN), the last two digits of the publication year (01), a three-letter code indicating the location of the record (ABC), and a two-digit core-ID number (02). The two-digit core-ID number begins at "01" by default and increases with each successive record from the same site and publication. Identifiers have the final format "AN01ABC02". For example, record AB08MEN01 was published by Abram et al. (2008), is a record from the Mentawai Islands, and is the first core from that study.
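As a rough illustration of the identifier scheme, the sketch below builds and parses identifiers of the form "AN01ABC02". The function names and the regular expression are hypothetical and simply encode the format as described in the text; actual identifiers in the database may deviate from this strict pattern.

```python
import re

# Two letters (author), two digits (year), three letters (site), two digits (core).
# The strict pattern is an assumption based on the format described in the text.
ID_PATTERN = re.compile(r"^([A-Z]{2})(\d{2})([A-Z]{3})(\d{2})$")


def make_record_id(surname: str, year: int, site_code: str, core_number: int = 1) -> str:
    """Assemble an identifier from lead-author surname, year, site code, and core ID."""
    prefix = surname[:2].upper()
    return f"{prefix}{year % 100:02d}{site_code.upper()}{core_number:02d}"


def parse_record_id(record_id: str) -> dict:
    """Split an identifier back into its four components."""
    match = ID_PATTERN.match(record_id)
    if match is None:
        raise ValueError(f"not a valid identifier: {record_id!r}")
    author, year_suffix, site, core = match.groups()
    return {"author": author, "year_suffix": year_suffix, "site": site, "core": int(core)}


print(make_record_id("Abram", 2008, "MEN", 1))  # AB08MEN01
```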

Metadata
The CoralHydro2k database contains 55 metadata fields that inform the use of each coral record: 32 metadata fields are standardized and quality-controlled, while 23 fields are unstructured. Standardized metadata fields use controlled vocabulary or numeric information with uniform units, making them easily searchable by database users. Unstructured metadata are free-form text entries that are less rigorously quality-controlled but are included to aid the interpretation of the coral records. The names of standardized metadata fields are italicized in Tables 2 to 6, while the names of unstructured metadata fields are not.
Metadata included in the CoralHydro2k database are organized into four categories (entity, publication, analysis, and calibration) based on standards recommended by MARPA (Dassié et al., 2017) and PaCTS 1.0 (Khider et al., 2019). Entity metadata provide identifying information for each coral record (Table 2), including geographic coordinates, location names, water depth of the coral colony, coral species, and any core names included in the original publications. Also included in entity metadata are resolution information and the start and end years of each record. Record resolution is provided as the minimum, maximum, mean, and median data points per year for each record. A nominal label for resolution (monthly, bimonthly, quarterly, biannual, annual, or > annual, described in Table 3), based on the modal resolution of a record, is also included to allow users to easily search for records. The term "_uneven" is appended to the nominal label for records that have a variable resolution. Care should be used when interpolating these records to even sampling resolution for analysis because, although most are relatively evenly sampled, some records have sections of substantially higher or lower resolution.
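To make the resolution metadata concrete, the following sketch derives the per-year statistics and a nominal label from a list of sample times. It is one plausible reading of the description above and of the bins in Table 3, not the procedure actually used to build the database: the function names, the calendar-year binning, and the rule that any variation in points per year triggers "_uneven" are all assumptions.

```python
import math
from collections import Counter
from statistics import mean, median


def resolution_summary(times):
    """Summarize data points per calendar year for decimal-year sample times."""
    years = [math.floor(t) for t in times]
    per_year = Counter(years)
    # Include years with no samples so that sparse records are handled.
    counts = [per_year.get(y, 0) for y in range(min(years), max(years) + 1)]
    modal = Counter(counts).most_common(1)[0][0]
    return {"min": min(counts), "max": max(counts), "mean": mean(counts),
            "median": median(counts), "modal": modal}


def nominal_label(summary):
    """Map the modal points per year onto a nominal label (bins assumed from Table 3)."""
    n = summary["modal"]
    if n >= 12:
        base = "monthly"
    elif n >= 6:
        base = "bimonthly"
    elif n >= 4:
        base = "quarterly"
    elif n >= 2:
        base = "biannual"
    elif n >= 1:
        base = "annual"
    else:
        return "> annual"
    # Assumed rule: any year-to-year variation marks the record as uneven.
    return base + "_uneven" if summary["min"] != summary["max"] else base
```

Partial first and last years will register as lower-resolution years in this simple version, so a real implementation would likely need to treat record edges separately.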
Publication metadata (Table 4) contain bibliographical information for each coral record, including digital object identifiers (DOIs) for publications and links to the public repository from which the data were retrieved. For records featured in multiple publications, bibliographical information for publications is stored in the order established by the source data repository. The first citations are found in the pub1 metadata fields, and subsequent citations are found in pub2 and pub3.
Analysis metadata (Table 5) provide information about the laboratory analysis of the samples, including (when available) information related to subsampling the cores, coral extension rate, and tissue thickness, the units of reported variables, and the analytical precision for geochemical time series. When available, information on the measurement of the international coral reference material JCp-1 (Okai et al., 2002; Hathorne et al., 2013) is included for Sr/Ca records. Calibration metadata (Table 6) include any proxy-SST slopes, intercepts, correlations, and information about regression methods used, as reported in the original publications. These calibration metadata may differ from the standardized calibration results that we calculate across the whole database and report in Sect. 3.2 below.

Quality control and validation
As records included in the CoralHydro2k database are published in peer-reviewed scientific journals, our quality-control efforts were focused on the consistency of metadata and the accurate integration of records into the database. More specifically, the quality-control team worked to ensure that (i) metadata and proxy time series were entered correctly into the database, (ii) metadata followed a standardized vocabulary or format, and (iii) records were sorted into the correct group based on the types of proxies available, length, and resolution. For sites where coral records were either extended or revised in subsequent studies, we include the most recent version of the record in the database and include citation information and other metadata from previous studies. A quality-control checklist was used to ensure each field was in a standard format and contained information consistent with that in original publications and other online repositories. When information was unavailable, the corresponding fields were left blank. Users of the database should not view the inclusion of a record as an endorsement of its fidelity by CoralHydro2k for reconstructing a climate parameter, as non-climatic factors (e.g., coral skeletal structure or growth rate) can complicate the extraction of climate signals from geochemical records (see Reed et al., 2021; DeLong et al., 2013, 2016). We strongly suggest users further assess records and original publications or consult the original author or a coral paleoclimate expert if they have questions or concerns.

Relation to other PAGES 2k products
CoralHydro2k was inspired by PAGES (2k) compilations of marine and hydrological proxy records such as Ocean2k (Tierney et al., 2015; McGregor et al., 2015), SISAL (Atsawawaranunt et al., 2018; Comas-Bru et al., 2020), and Iso2k (Konecky et al., 2020) but was created to address a different set of research questions. As the database is designed specifically for coral-based proxy records, we employ more inclusive record-selection criteria that allow us to include records that do not meet the length requirements of previous PAGES 2k data compilations but are important to contextualize ongoing climate change during the Common Era. The CoralHydro2k database also contains new, updated, or extended records that were published after previous PAGES efforts and will continue to be actively curated and updated annually. With a more comprehensive coral database, the CoralHydro2k project will investigate methodological differences in proxy-SST calibrations, explore methodologies for deriving coral-based δ18Osw reconstructions, refine proxy-system models for coral Sr/Ca and δ18O time series that enable proxy data and climate model intercomparison, and provide a denser proxy network for paleodata-assimilation efforts.

Spatial and temporal coverage
The CoralHydro2k database includes 233 proxy time series from 124 unique locations sorted into seven groups (Fig. 1a). The proxy time series are stored as "records", with 54 records containing paired Sr/Ca and δ18O time series, 79 records containing only δ18O time series, and 46 records containing only Sr/Ca time series. For 19 of the paired δ18O and Sr/Ca records, we also include in the database the coral-derived δ18Osw time series calculated by the authors of the original publication. Records in the CoralHydro2k database extend from 33° N to 28° S and across all tropical oceans. The majority of these records are concentrated in the Indo-Pacific Warm Pool and the western tropical Atlantic, as conditions there are favorable for coral growth and reefs are more accessible to researchers. Record density is low in the eastern tropical Pacific and eastern tropical Atlantic, where cooler and/or more variable ocean conditions are generally unfavorable for coral growth.
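As a quick consistency check on these counts, the totals can be reproduced under two assumptions that are inferred from the numbers rather than stated explicitly: each paired record contributes two proxy time series, and the 19 author-supplied δ18Osw series are not counted among the 233.

```latex
\begin{align*}
N_{\text{records}} &= 54 + 79 + 46 = 179, \\
N_{\text{series}}  &= 2 \times 54 + 79 + 46 = 233.
\end{align*}
```

The record total of 179 also matches the 54 paired plus 125 unpaired records quoted in the abstract.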
The majority of records in the database fall between 1800 and 2010 CE (Fig. 1b). Approximately 28 % of records in the database cover time intervals earlier than the 1800s, with most of these records coming from corals that were dead when collected (often referred to as "fossil" corals), which provide short, discrete time series often spanning several decades that have been used to reconstruct preindustrial tropical climate variability (e.g., Cobb et al., 2003a; Wu et al., 2017; Abram et al., 2020). The oldest such record in the database is a coral from Hainan Island in the South China Sea that covers 167-309 CE (Xiao et al., 2017).

paleoData_isAnomaly | Anomaly data flag | Logic | This indicates whether the proxy record is anomaly data (residuals after the subtraction of a mean value).

Table 3. Nominal resolution descriptors. Defines the number of "data points per year" that were used to determine the nominal resolution label for each proxy record. Fields containing standardized data, e.g., uniform units or controlled vocabulary, are italicized.

Nominal resolution | Data points per year
monthly; monthly_uneven | Twelve data points per year; "_uneven" is added to records with variable resolutions that typically have over 12 data points per year.
bimonthly; bimonthly_uneven | Six data points per year; "_uneven" is added to records with variable resolutions that typically have 6-11 data points per year.
quarterly; quarterly_uneven | Four data points per year; "_uneven" is added to records with variable resolutions that typically have four to five data points per year.
biannual; biannual_uneven | Two data points per year; "_uneven" is added to records with variable resolutions that typically have two to three data points per year.
annual; annual_uneven | One data point per year; "_uneven" is added to records with variable resolutions that typically have one data point per year.
> annual | Less than one data point per year
A surge in coral-based proxy-record generation began in the early 1990s and is reflected in the most common core-top ages occurring in the period from 1990 to 2015 CE (Fig. 2a, b). Peak record density occurs between the late 1980s and early 1990s (Fig. 1b), reflecting increased coral-coring efforts in all tropical oceans from 1985 to 2015 CE (Fig. 2) that increase data coverage across this interval. Record density precipitously drops after 1998 CE (Fig. 1b), which may simply reflect the 5- to 15-year delay between core collection and record publication. However, we observe that fewer new records are available from more remote regions of the tropics (Fig. 2c). The availability of Sr/Ca and paired records began in the late 1990s (Fig. 2b) with the development of a rapid, high-precision, and cost-effective method for measuring Sr/Ca using inductively coupled plasma optical emission spectrometry (ICP-OES) (Schrag, 1999). Sr/Ca and paired records in the database that have core-top dates prior to the late 1990s typically represent updates or extensions to previously published δ18O records (e.g., Felis et al., 2000, 2018) or fossil records.
A majority of the records included in the CoralHydro2k database offer seasonal or subseasonal resolution: 76 % of the records in the CoralHydro2k database have monthly or bimonthly resolution, 6 % have quarterly to biannual resolution, and 18 % have annual or lower resolution (Fig. 3).

Relationship with sea surface temperature
Proxy records in the CoralHydro2k database capture SST variability on seasonal and longer timescales. For the records in CoralHydro2k, we calculate Pearson correlation coefficients between the records and local SST (2° grid area) from the NOAA ERSSTv5 data set (Huang et al., 2017). Significance is assessed here at the 90 % confidence level: for bimonthly average data, 92 % of Sr/Ca and 96 % of δ18O records have a significant correlation with SST. Significant absolute correlations range from 0.23 to 0.94 (Sr/Ca-SST) and from 0.13 to 0.89 (δ18O-SST) for the interval 1950-2020 CE, with a median correlation of 0.74 for Sr/Ca-SST and 0.59 for δ18O-SST (Fig. 4a, c). Bimonthly average correlations are generally stronger at higher latitudes, where the seasonal range in temperature is larger.
Significant absolute correlations between annual-average proxy time series and local SST range from 0.26 to 0.89 for Sr/Ca data and from 0.26 to 0.92 for δ18O data, with medians of 0.50 for Sr/Ca-SST and 0.57 for δ18O-SST correlations (Fig. 4b, d). For annual-average data, 43 % of Sr/Ca-SST and 56 % of δ18O-SST correlations are significant. Here, we use the April-March tropical year for annual averages to avoid splitting large-scale tropical variability between years (Ropelewski and Halpert, 1987). The higher annual proxy-SST correlations occur near the Equator, particularly in the central and western tropical Pacific, where ENSO drives large SST changes on interannual timescales. We note that significant discrepancies exist among gridded SST data products (e.g., HadISST, ERSST, OISST) due to the scarcity of observations across space and time and the different statistical techniques used to infill missing data in each SST data product (e.g., Deser et al., 2010; Freeman et al., 2017; Kennedy et al., 2019). Thus, the proxy-SST correlations presented here may deviate from those stated in each record's original publication.
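The April-March averaging and the correlation step can be sketched as follows. This is a minimal illustration rather than the analysis code behind Fig. 4: the function names are hypothetical, the convention of labeling each tropical year by the calendar year in which it starts is an assumption, and the significance test at the 90 % level is not reproduced here.

```python
from collections import defaultdict
from math import sqrt


def tropical_year_means(samples, min_months=12):
    """Average monthly values over April-March tropical years.

    samples: iterable of (year, month, value) with month in 1..12.
    Tropical year Y is taken to run from April of Y to March of Y + 1
    (labeling convention assumed for this sketch). Years with fewer than
    min_months values are dropped.
    """
    buckets = defaultdict(list)
    for year, month, value in samples:
        label = year if month >= 4 else year - 1
        buckets[label].append(value)
    return {y: sum(v) / len(v) for y, v in sorted(buckets.items())
            if len(v) >= min_months}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

In use, the proxy and SST series would each be passed through `tropical_year_means`, matched on their common years, and the absolute value of `pearson_r` taken, since Sr/Ca and δ18O are typically inversely related to SST.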
Patterns observed in proxy-SST correlations are also mirrored in the dominant frequency of variability displayed by each record. To examine the relative contributions of seasonal, interannual, and decadal variability in coral records, we apply a 13-month high-pass (seasonal), 2-7-year bandpass (interannual), and 10-year low-pass filter to all monthly and bimonthly records that are at least 30 years long. Filtering was performed using a sixth-order Butterworth filter in MATLAB, with the filter order used to optimize filtering in the decadal band. Variance for each filtered series is normalized by the proxy record variance determined for the entire record length to enable comparison between δ18O and Sr/Ca (Fig. 5). The seasonal variance in both proxies increases with latitude (Fig. 5a-b), with records in the subtropics exhibiting greater seasonal variance than records close to the Equator. Conversely, records in the Indo-Pacific Warm Pool contain higher proportions of interannual variance (Fig. 5c-d). This pattern is more apparent among longer coral δ18O records, as several Sr/Ca records in the database do not meet the 30-year length requirement for bandpass filtering.
For global or regional climate reconstructions, it is useful to consider the relationship of gridded SST data products with the proxy network as opposed to the relationship of individual records with the nearest grid point in those data products. To assess the reconstruction potential of the CoralHydro2k proxy network, we calculate the median absolute correlation between each ERSSTv5 grid box and all available records in the database within a 3000 km radius (Fig. 6). We find significant (at the 90 % confidence level) annual Sr/Ca-SST correlations across 56 % of the tropical and subtropical oceans (Fig. 6b) and significant annual δ18O-SST correlations across 60 % (Fig. 6d). Consistent with previous results, bimonthly correlations between SST and both proxies are higher (Fig. 6a, c) due to the seasonal cycle. While non-climatic factors, such as age-model errors (Comboul et al., 2014; Lawman et al., 2020b; Loope et al., 2020), may lower the correlation between coral proxies and SST in some regions, significant correlations observed here highlight the fact that the CoralHydro2k database captures regional to global patterns of climate variability and thus is suitable for reconstructing SST variability across much of the tropical and subtropical oceans. Reconstruction potential is limited in the eastern Pacific and eastern Atlantic due to the scarcity of corals from those regions.

Relationship with hydrology
Coral δ18O records capture combined changes in local SST and δ18Osw, with the latter reflecting the balance among hydrological processes (e.g., precipitation, evaporation, horizontal and vertical ocean advection). Since observed δ18Osw data coverage is limited through space and time (e.g., LeGrande and Schmidt, 2006; Boyer et al., 2018; Breitkreuz et al., 2018), we compare each coral δ18O record to SSS from the nearest Hadley EN4.2.1 grid box (Good et al., 2013; Gouretski and Reseghetti, 2010) as both SSS and δ18Osw variability are driven by similar hydrological processes. However, we do note that the relationship between these two variables may not be spatiotemporally constant (Conroy et al., 2014, 2017). Significant absolute correlations between coral δ18O and SSS between 1970 and 2010 range from 0.16 to 0.69 at bimonthly resolution and from 0.28 to 0.79 at annual resolution (Fig. 4e-f). The highest correlations occur in the Western Pacific Warm Pool region, where there is stronger SSS variability due to factors that do not strongly covary with temperature such as terrestrial runoff and ocean mixing (Qu et al., 2014; Murty et al., 2017, 2018b). For western Pacific sites further away from the Maritime Continent, higher SSS-δ18O correlations may reflect the strong covariance between SSS and SST, especially on interannual timescales. In contrast, coral δ18O records from sites close to the Equator in the Indian and central equatorial Pacific oceans exhibit lower δ18O-SSS correlations, which suggests that SSS variability at these sites is smaller relative to SST or may point to potential biases in gridded SSS data products. Many δ18O-SSS correlations at annual resolution are not significant; however, this may be more reflective of the SSS data set used here rather than the integrity of records in the database. Historical SSS observational records are much shorter and sparser than SST before the satellite era (Good et al., 2013; Boyer et al., 2018; Friedman et al., 2017), especially in the tropical and subtropical oceans. Consequently, much larger discrepancies exist among gridded SSS data sets than those found between gridded SST products (Carton et al., 2018, 2019; Zweng et al., 2019). New and emerging salinity products such as NASA's Soil Moisture Active Passive (SMAP) Sea Surface Salinity (Vazquez-Cuervo and Gomez-Valdes, 2018), the ESA's Soil Moisture and Ocean Salinity Mission (SMOS; Boutin et al., 2016), Aquarius (Drucker and Riser, 2014), and Argo (Schmid et al., 2007) will be important calibration data sets for future coral studies or reconstructions that cover the years since 2011 CE. Nonetheless, the lack of long, historical SSS records highlights the need for independent coral-based constraints on long-term hydrological trends across the tropical and subtropical oceans.

Local reproducibility of Sr/Ca and δ 18 O records
We assess the "local" reproducibility of coral records in the database by comparing each proxy record to records of the same type within a 50 km radius with at least 20 years in common (Fig. 7). As the CoralHydro2k database represents the most comprehensive coral-based proxy compilation effort to date, approximately 36 % of the records are within a 50 km radius of one to five contemporaneous records. Bimonthly absolute correlations for Sr/Ca records within 50 km of each other range from 0.11 to 0.95 (Fig. 7a), and bimonthly δ 18 O correlations range from 0.24 to 0.79 (Fig. 7c). Similarly, annual correlations for Sr/Ca records within 50 km of each other range from 0.04 to 0.59 (Fig. 7b), and annual correlations for δ 18 O records range from 0.01 to 0.87 (Fig. 7d). While we observe good reproducibility at most sites, the highest degree of local reproducibility among both Sr/Ca and δ 18 O records occurs in more open ocean settings (e.g., the central Pacific), where there is less spatial variability in growth environments, ocean advection patterns, local SST, and SSS across short distances.
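The pairing step described above (same proxy type, within 50 km, at least 20 years in common) can be illustrated with a minimal Python sketch. It is not the code used to produce Fig. 7: the record layout (keys name, proxy, lat, lon, annual), the use of annual values only, the great-circle distance on a spherical Earth, and the synthetic records are all assumptions made for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def local_pairs(records, max_km=50.0, min_overlap_years=20):
    """Yield (name_a, name_b, |r|) for same-proxy pairs that are close together
    and share at least min_overlap_years annual values."""
    for a, b in combinations(records, 2):
        if a["proxy"] != b["proxy"]:
            continue
        if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) > max_km:
            continue
        common = sorted(set(a["annual"]) & set(b["annual"]))
        if len(common) < min_overlap_years:
            continue
        r = pearson_r([a["annual"][y] for y in common],
                      [b["annual"][y] for y in common])
        yield a["name"], b["name"], abs(r)


# Synthetic records (hypothetical names, locations, and values).
records = [
    {"name": "A", "proxy": "SrCa", "lat": 1.90, "lon": -157.40,
     "annual": {y: 9.00 - 0.010 * (y % 5) for y in range(1960, 1990)}},
    {"name": "B", "proxy": "SrCa", "lat": 1.95, "lon": -157.35,
     "annual": {y: 9.05 - 0.012 * (y % 5) for y in range(1965, 1995)}},
    {"name": "C", "proxy": "SrCa", "lat": -17.0, "lon": 146.0,
     "annual": {y: 8.90 - 0.010 * (y % 3) for y in range(1960, 1990)}},
]
pairs = list(local_pairs(records))
```

Only records A and B form a pair here, since C lies far outside the 50 km radius. A per-site median of the resulting absolute correlations, as plotted in Fig. 7, would be a further step on top of this pairing.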
While intercolony variability has mostly been studied in massive Porites spp. corals, which are widely distributed throughout the Indian and Pacific oceans and most commonly used in paleoclimate reconstructions, some species (e.g., Siderastrea siderea, found in the Atlantic Ocean) exhibit more reproducibility among coral colonies in Sr/Ca, δ 18 O, and calibration equations (Maupin et al., 2008; DeLong et al., 2014, 2016; Kuffner et al., 2017; Weerabaddana et al., 2021). More work is needed to both quantify intercolony variability in different coral species and understand the impact of the calibration method on coral-based temperature reconstructions.

General applications
The CoralHydro2k database is the most comprehensive compilation of coral δ 18 O and Sr/Ca records to date. The database offers extensive coverage of monthly to annually resolved marine proxy records that can be used to investigate near-surface hydrology and temperature variability across the global tropics and subtropics. Comparable information at similarly high resolution is rarely available from other marine paleo-archives. Paired coral Sr/Ca-δ 18 O records allow for independent reconstruction and investigation of preindustrial temperature and hydrologic changes at seasonal, interannual, and decadal timescales. The inclusion of both unpaired and short proxy records, many of which did not meet the selection criteria of previous PAGES 2k data compilations, allows the CoralHydro2k database to be used for applications beyond large-scale temperature and hydrology reconstructions. These include, but are not limited to, proxy calibration studies, proxy-system model development, and paleodata-assimilation efforts. Records in the CoralHydro2k database can also be compared to model outputs, either by converting coral Sr/Ca into temperature for direct comparison or by using proxy-system modeling to estimate proxy composition from climate model output. Coral δ 18 O records and coral-derived δ 18 O sw records can also be directly compared with new simulations from isotope-enabled models. A brief overview of how to access, query, and cite the database is provided in the sections below.
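As an illustration of converting coral Sr/Ca into temperature, the calibration metadata describe calibration equations of the form proxy = slope × SST + intercept, which can be inverted for SST. The Python sketch below shows only this algebraic step. The function name and the slope, intercept, and Sr/Ca values are hypothetical and are not taken from any record in the database; the slope is chosen to be negative only because the linear relationship between SST and coral Sr/Ca is negative.

```python
def srca_to_sst(srca_values, slope, intercept):
    """Invert proxy = slope * SST + intercept to estimate SST (degrees Celsius)."""
    if slope == 0:
        raise ValueError("slope must be non-zero to invert the calibration")
    return [(value - intercept) / slope for value in srca_values]


# Illustrative numbers only (hypothetical calibration and data).
slope = -0.06      # proxy units per degree Celsius
intercept = 10.5   # proxy units
srca = [8.88, 8.94, 9.00]

sst = srca_to_sst(srca, slope, intercept)  # approximately 27, 26, and 25 degrees Celsius
```

Because published calibrations differ among records, any such conversion should use the slope and intercept reported for the specific record where they are available.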

Searching the CoralHydro2k database
All three serializations of the CoralHydro2k database store data in two main containers. The first container, labeled "D", stores records and metadata under each record's unique identifier as described in Sect. 2.3. The second container, labeled "TS", is a "flattened" form of the database, where information for all records is stored in a format that resembles a spreadsheet. The unique identifier for the coral record each time series belongs to can be found in the dataSetName and paleoData_ch2kCoreCode fields within this flat format. MATLAB and R serializations also have an "sTS" container to be consistent with other PAGES2k data sets, which contains the same information as container "TS". Proxy records stored in the CoralHydro2k database can be searched for using a variety of keywords or parameters. Users can search, filter, or create subsets of the database in a number of different ways by creating their own scripts in MATLAB, R, or Python. One potential starting point is to filter or create a subset of records based on the paleoData_coralHydro2kGroup field, which categorizes records based on the predefined proxy, resolution, and temporal coverage groupings (Table 1) described in Sect. 2.3. Additional ways to query the database include but are not limited to the following.
- Proxy type, using the field paleoData_variableName to select either Sr/Ca or δ 18 O records.
- Temporal coverage, using minYear and maxYear to search for proxy records that fall within a specific time period.
- Location, using geographic coordinates (geo_latitude and geo_longitude), site name (geo_siteName), or ocean basin (geo_ocean). geo_ocean is the level-one ocean basin listed in the World Ocean Atlas (Boyer et al., 2018) for the geographic coordinates of the record.
- Coral species, using paleoData_archiveSpecies to search for records from a particular coral species, e.g., Porites spp.

A Python demo and example MATLAB and R scripts are archived with the database to help guide users on how to access and search the database in their preferred programming language.
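The queries listed above can be sketched in Python as follows. This is not the demo archived with the database. It assumes that the flattened "TS" container has already been loaded as a list of dictionaries keyed by the field names given above. The helper filter_ts, the record identifiers, the variable-name strings ("SrCa", "d18O"), and the ocean-basin strings are hypothetical stand-ins, and temporal coverage is interpreted here as overlap with the requested window.

```python
def filter_ts(ts, variable=None, ocean=None, species=None,
              start=None, end=None, lat_range=None):
    """Return the entries of a flattened "TS"-style list that match every given criterion."""
    selected = []
    for entry in ts:
        if variable is not None and entry.get("paleoData_variableName") != variable:
            continue
        if ocean is not None and entry.get("geo_ocean") != ocean:
            continue
        if species is not None and entry.get("paleoData_archiveSpecies") != species:
            continue
        # Keep records whose coverage overlaps the requested [start, end] window.
        if start is not None and entry.get("maxYear", float("-inf")) < start:
            continue
        if end is not None and entry.get("minYear", float("inf")) > end:
            continue
        if lat_range is not None:
            lat = entry.get("geo_latitude")
            if lat is None or not (lat_range[0] <= lat <= lat_range[1]):
                continue
        selected.append(entry)
    return selected


# Mock entries standing in for the real container (all values are made up).
ts = [
    {"dataSetName": "EX01", "paleoData_variableName": "SrCa",
     "geo_ocean": "Pacific Ocean", "geo_latitude": 1.9, "geo_longitude": -157.4,
     "minYear": 1886, "maxYear": 1998, "paleoData_archiveSpecies": "Porites sp."},
    {"dataSetName": "EX01", "paleoData_variableName": "d18O",
     "geo_ocean": "Pacific Ocean", "geo_latitude": 1.9, "geo_longitude": -157.4,
     "minYear": 1886, "maxYear": 1998, "paleoData_archiveSpecies": "Porites sp."},
    {"dataSetName": "EX02", "paleoData_variableName": "d18O",
     "geo_ocean": "Indian Ocean", "geo_latitude": -4.6, "geo_longitude": 55.8,
     "minYear": 1846, "maxYear": 1995, "paleoData_archiveSpecies": "Porites sp."},
    {"dataSetName": "EX03", "paleoData_variableName": "SrCa",
     "geo_ocean": "Atlantic Ocean", "geo_latitude": 24.6, "geo_longitude": -82.9,
     "minYear": 1734, "maxYear": 1900,
     "paleoData_archiveSpecies": "Siderastrea siderea"},
]

hits = filter_ts(ts, variable="SrCa", start=1950, end=2000)
names = [entry["dataSetName"] for entry in hits]
```

In this mock example only EX01 is returned, because EX03 ends before 1950. Criteria can be combined freely, for example filter_ts(ts, ocean="Indian Ocean", lat_range=(-10, 10)).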
5 Data availability

The CoralHydro2k database
The development of the CoralHydro2k database was guided by FAIR data principles (Wilkinson et al., 2016). One of CoralHydro2k's core goals is to create an actively curated coral database. We encourage the community to submit newly published coral δ 18 O and Sr/Ca records using the data submission form located on the repository website linked above. Newly published records submitted by record generators and sourced by the CoralHydro2k team from public archives will be compiled and added to the database on an annual basis. Updates to the database will follow the versioning scheme used by the PAGES2k database (PAGES2k Consortium, 2017). The first release of the CoralHydro2k database is version 1.0.0. The version number has three counters in the following form: C1.C2.C3. The first counter, C1, is updated with each publication of a formal update of the data set. The second counter, C2, is updated when a record is added or removed. The third counter, C3, is updated when a modification is made to the data or metadata. It is anticipated that future versions and a change log describing updates with each new version will be made available at the same location as the original data release.
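The three-counter scheme can be expressed as a short Python sketch. The function name and the change labels ("publication", "record", "modification") are hypothetical. The description above does not state whether lower counters are reset when a higher counter is updated, so this sketch leaves them unchanged; that choice is an assumption.

```python
# Map each kind of change to the counter it updates (C1, C2, or C3).
COUNTER_INDEX = {"publication": 0, "record": 1, "modification": 2}


def bump_version(version, change):
    """Increment the counter of a "C1.C2.C3" version string that matches the change type."""
    counters = [int(part) for part in version.split(".")]
    if len(counters) != 3:
        raise ValueError("version must have the form C1.C2.C3")
    if change not in COUNTER_INDEX:
        raise ValueError("unknown change type: " + change)
    counters[COUNTER_INDEX[change]] += 1  # other counters are left as they are (assumption)
    return ".".join(str(counter) for counter in counters)


after_new_record = bump_version("1.0.0", "record")            # "1.1.0"
after_metadata_fix = bump_version("1.0.0", "modification")    # "1.0.1"
```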
Researchers utilizing the whole CoralHydro2k database or a significant portion of the database should cite this paper and the paper describing the most recent version of the database.When using any subset of the CoralHydro2k database, researchers are strongly encouraged to cite the original and all associated publications for each record used, provided that this does not cause the publication to exceed the reference limit for the target journal.Citation information associated with each record in the database, including a full bibliography and DOIs and a link to the original public archive of each data set, is included in the metadata to facilitate users in crediting the original data generators in their use of the coral data.

Underlying data and sources
The figures and comparisons presented in this paper rely on data obtained from the following sources. Sea surface temperature data were obtained from ERSSTv5 (Huang et al., 2017), which is archived by the NOAA/OAR/ESRL Physical Sciences Laboratory in Boulder, Colorado, USA, and is available on their website at https://psl.noaa.gov/data/gridded/data.noaa.ersst.v5.html (last access: 19 May 2022). Salinity data were obtained from the Met Office Hadley Centre's EN4.2.1 dataset (Good et al., 2013) and are available on their website at http://www.metoffice.gov.uk/hadobs (last access: 19 May 2022). All maps displayed in Figs. 1-7 were generated using the M_Map MATLAB package (Pawlowicz, 2020) available at https://www.eoas.ubc.ca/~rich/map.html (last access: 19 May 2022).

Conclusion
Shallow-water corals provide monthly to annually resolved climate records from data-scarce locations across the tropical and subtropical oceans and are highly useful for extending modern-day observations back into the preindustrial era, contextualizing anthropogenic climate trends, and improving the skill of future climate projections. The PAGES CoralHydro2k project was formed to facilitate the use of coral paleoclimate records by the broader scientific community. Our first effort on this front is the CoralHydro2k database: a mostly unfunded endeavor representing the collective efforts of more than 40 researchers across different career stages, institutes, and time zones, meeting monthly to biweekly and working asynchronously over the past 5 years. Subsequent publications from the CoralHydro2k project will use this database to evaluate proxy-SST calibrations and methodological differences used in coral-based climate reconstructions and investigate past tropical ocean hydroclimate trends using data assimilation and comparison to isotope-enabled models. Furthermore, the CoralHydro2k team has been collecting instrumental seawater δ 18 O data as part of our database-compilation efforts. That database will be released in the near future, will also follow the FAIR standards, and will be maintained with active curation (see DeLong et al., 2022). While the fruits of the CoralHydro2k database are likely to come over the next 5-10 years, continuing to invest as a community in compiling standardized data sets will inevitably elevate the utility of each record.
The CoralHydro2k database is a comprehensive, machine-readable, standardized, and actively curated database of coral δ 18 O and Sr/Ca records. Records in the CoralHydro2k database track large-scale regional SST and hydrology signals across seasonal, interannual, and decadal timescales with a high degree of reproducibility. As such, the records in the database can be used for investigating tropical and subtropical SST and hydrology variability on societally relevant timescales and can be combined with large networks of terrestrial paleo-archives of climate variability such as tree rings, ice cores, or speleothems to investigate past and present ocean-atmosphere-land interactions. Moreover, the database enables global-scale comparisons of coral-based paleoclimate reconstructions with state-of-the-art climate models, either through the use of forward models (Thompson et al., 2011; Dee et al., 2015, 2017; Tardif et al., 2019) or directly in the case of isotope-enabled models (Konecky et al., 2020). The comprehensive and high-resolution nature of the CoralHydro2k database also makes it ideally suited as an input database for paleoclimate data-assimilation efforts such as the Last Millennium Reanalysis (Hakim et al., 2016; Steiger et al., 2018; Tardif et al., 2019; Sanchez et al., 2021).

Appendix A

Figure 1. CoralHydro2k database records are divided across Groups 1-7 based on their available proxy information. See Table 1 for a summary of group descriptions. (a) Spatial distribution of all records in the CoralHydro2k database. (b) Temporal coverage of all records in the CoralHydro2k database. Inset shows earlier records (0-1750 CE).

Figure 2. Total number of records with core-top dates between 1900 and 2020 CE, sorted into 5-year bins and organized by (a) ocean basin and (b) available proxy data. Records with core-top dates prior to 1900 CE are not shown. The eastern and western Pacific Ocean are split at 180° longitude. (c) Global core-top date spatial distribution of coral records in the CoralHydro2k database. Records with core-top dates prior to 1975 CE are not shown (31 records).

Figure 3. Resolution of coral records in the CoralHydro2k database. Spatial distributions of temporal resolution for (a) paired Sr/Ca-δ 18 O, (b) δ 18 O-only, and (c) Sr/Ca-only records.

Figure 4. Absolute correlations between coral Sr/Ca and δ 18 O and local sea surface temperature (SST) from 1950 to 2020 CE (a-d) and between coral δ 18 O and local sea surface salinity (SSS) (e-f) from 1970 to 2010 CE at bimonthly (a, c, e) and annual (April-March; b, d, f) resolutions. SST and SSS were taken from the grid box nearest to each coral record in the NOAA ERSSTv5 (Huang et al., 2017) and Hadley EN4 (Good et al., 2013) data sets, respectively. Significant correlations are denoted by circles (greater than 90 % confidence interval), and nonsignificant correlations are denoted by diamonds. We note that significance can vary based on the choice of gridded data set and grid box, annual-averaging period, and correlation interval, and as such, the values shown here may differ from those reported in the original publications for each record. Correlations are shown as absolute values for ease of visualization, but we note that the linear relationship between SST and coral Sr/Ca or δ 18 O is negative.

Figure 5. Percent variance of coral Sr/Ca (a, c, e) and δ 18 O (b, d, f) records calculated as the fraction of variance that each timescale of variability contributes to total time series variance. Variance is calculated across the full length of each coral record. (a-b) High-pass variability calculated using a 13-month filter. (c-d) The 2-7-year bandpass percent variability that includes interannual variance driven by the El Niño-Southern Oscillation (ENSO). (e-f) The 10-year low-pass variability. All percent variability was calculated only for records at least 30 years in length.

Figure 6. Median absolute correlation between SST (2° × 2° grid boxes, ERSSTv5) and coral proxy records within a 3000 km radius. Bimonthly (a, c) and annual (April-March; b, d) correlations shown are significant at the 90 % confidence level. Grid boxes with records within 3000 km but no significant correlations are shaded gray. Correlations are calculated using available data from 1950 to 2020 CE. Record locations are indicated by black asterisks.

Figure 7. Median absolute correlation between coral proxy records located within a 50 km radius. Correlation was calculated over the common time period between overlapping records, provided that there was a minimum of 20 years of overlap. Records that are not within 50 km of another record are not shown. Marker size indicates the number of records used in the median correlation calculation at each site (largest: N = 6).

Table 1. Summary table of group descriptions for the CoralHydro2k database.

Table 2. Entity metadata. Information relating directly to the coral proxy record, including location, core name, species, and time span. Fields containing standardized data, e.g., uniform units, format, or controlled vocabulary, are italicized.
ity/province 1], [island/city/province 2 (optional)], [country]. Exceptions to this are reefs (reef, country) and other named, water-based locations (e.g., named areas within the Red Sea).
geo_secondarySiteName | Site 2 | Text | Secondary location names. These may include regional names (e.g., Line Islands, Great Barrier Reef) or names of specific sites (e.g., Silabu).
geo_ocean | Ocean basin | Text | Ocean basin of the coral core as determined by its latitude and longitude according to the World Ocean Atlas (Boyer et al., 2018).
geo_ocean2 | Ocean basin 2 | Text | Secondary ocean basin names listed in publications that are not included in the official World Ocean Atlas designations.
geo_elevation | Elevation | Numeric | Elevation of corals. Values are negative to indicate corals were found below sea level. All elevation is expressed in meters (m).
paleoData_coreName | Core name | Text | Core name as specified in publications and data sets. It allows for the tracing of the coral record through past and future publications.
paleoData_archiveSpecies | Coral species | Text | Genus and species (if known) of the coral archive. Records where the species name is unknown or not given are notated as "[Genus] sp."
geo_description | Site type | Text | Any general description of the type of site in which the coral was found (e.g., fringing reef, open ocean).
hasResolution_hasMaxValue | Maximum resolution | Numeric | Maximum temporal resolution of the proxy record. Units: years.
hasResolution_hasMeanValue | Mean resolution | Numeric | Mean temporal resolution of the proxy record. Units: years.
hasResolution_hasMedianValue | Median resolution | Numeric | Median temporal resolution of the proxy record. Units: years.
hasResolution_hasMinValue | Minimum resolution | Numeric | Minimum temporal resolution of the proxy record. Units: years.
Sr/Ca (SrCa), or seawater δ 18 O (d18O_sw). Annual averages have "_annual" appended to the proxy type, and uncertainty data have "Uncertainty" appended to the proxy type.
paleoData_values | Data | Numeric | A proxy or uncertainty time series vector. Data type is specified by paleoData_variableName.
paleoData_units | Data units | Text | Units for paleoData_values
Earth Syst. Sci. Data, 15, 2081-2116, 2023 https://doi.org/10.5194/essd-15-2081-2023

Table 2. Continued. If no uncertainty time series is available, the field is blank. If a time series is available, the field contains the paleoData_TSid of the uncertainty time series.

Table 4. Publication metadata. Details on publication information for up to three publications associated with each coral record. Fields containing standardized data, e.g., uniform units or controlled vocabulary, are italicized.

Table 5. Analysis metadata. Coral sampling information, units used, and any additional notes on the coral record. Fields containing standardized data, e.g., uniform units or controlled vocabulary, are italicized.

Table 6. Calibration metadata. Any published information on the calibration of the coral record to sea surface temperature. There are no standardized metadata fields in this table.
Text | Published proxy-SST slope uncertainty for the coral record. Calibration equations take the form proxy = slope × SST + intercept (units: paleoData_units per degree Celsius).

R. M. Walter et al.: The CoralHydro2k database

Table A1. Reference table of publications cited in the CoralHydro2k database. Citations in the Cited publications column are listed in the order presented in the database (pub1, pub2, pub3).

Table A1. Continued.

The "CoralHydro2k Project Members" author group includes all named contributors as well as Sarah S. Eggleston (Past Global Changes, 3012 Bern, Switzerland), Nicholas T. Hitt (School of Geography, Environment, and Earth Sciences, Victoria University of Wellington, Wellington, 6012, New Zealand), Belen Martrat (Department of Environmental Chemistry, Spanish Council for Scientific Research (CSIC), Institute of Environmental Assessment and Water Research (IDAEA), 08034 Barcelona, Spain), Helen V. McGregor (School of Earth and Environmental Sciences, University of Wollongong, Wollongong, 2522, Australia), and Feng Zhu (School of Atmospheric Sciences, Nanjing University of Information Science & Technology, Nanjing, 211544, China).