The Iso2k database: a global compilation of paleo-δ¹⁸O and δ²H records to aid understanding of Common Era climate

Reconstructions of global hydroclimate during the Common Era (CE; the past ∼2000 years) are important for providing context for current and future global environmental change. Stable isotope ratios in water are quantitative indicators of hydroclimate on regional to global scales, and these signals are encoded in a wide range of natural geologic archives. Here we present the Iso2k database, a global compilation of previously published datasets from a variety of natural archives that record the stable oxygen (δ¹⁸O) or hydrogen (δ²H) isotopic compositions of environmental waters, which reflect hydroclimate changes over the CE. The Iso2k database contains 759 isotope records from the terrestrial and marine realms, including glacier and ground ice (210); speleothems (68); corals, sclerosponges, and mollusks (143); wood (81); lake sediments and other terrestrial sediments (e.g., loess) (158); and marine sediments (99). Individual datasets have temporal resolutions ranging from sub-annual to centennial and include chronological data where available. A fundamental feature of the database is its comprehensive metadata, which will assist both experts and nonexperts in the interpretation of each record and in data synthesis. Key metadata fields have standardized vocabularies to facilitate comparisons across diverse archives and with climate-model-simulated fields. This is the first global-scale collection of water isotope proxy records from multiple types of geological and biological archives. It is suitable for evaluating hydroclimate processes through time and space using large-scale synthesis, model–data intercomparison and (paleo)data assimilation. The Iso2k database is available for download at https://doi.org/10.25921/57j8-vs18 (Konecky and McKay, 2020) and is also accessible via the NOAA/WDS Paleo Data landing page: https://www.ncdc.noaa.gov/paleo/study/29593 (last access: 30 July 2020).


Progress and challenges in the synthesis of Common Era hydroclimate
The past ∼2000 years, otherwise known as the Common Era (CE), are an important research target for contextualizing modern climate change. Decades of paleoclimate research have yielded numerous records spanning all or part of this time period, making it sufficiently data-rich to assess the range of natural (internal and forced) climate variability prior to the Industrial Revolution. These records are also used in conjunction with climate model simulations to detect and attribute anthropogenic climate change. Over the past several years, large-scale data synthesis efforts within the international paleoclimate community have produced important constraints on regional to global surface air and ocean temperature patterns during the CE (McKay and Kaufman, 2014; PAGES 2k Consortium, 2013, 2019; Tierney et al., 2015). However, progress on the synthesis of hydroclimate patterns has been limited (PAGES Hydro2k Consortium, 2017), despite the societal relevance of the changing water cycle (e.g., Kelley et al., 2015). The water cycle is a far more complex target than surface air and ocean temperature, and different proxy systems track different aspects of the water cycle in different ways (PAGES Hydro2k Consortium, 2017). For example, annual precipitation amount at any given location on the Earth's surface is governed not just by atmospheric processes that deliver moisture to the region but also by topography, varying characteristics of storms and associated clouds, dynamics of the seasonal cycle, and variations in the contribution of extreme precipitation events to the water budget (Bowen et al., 2019). Individual paleoclimatic proxy types are often sensitive to multiple aspects of the water cycle that can be difficult to disentangle, making it challenging to directly compare among proxy types. For example, precipitation amount in the Arctic could be inferred from two common precipitation proxies: grain size from lake sediments and accumulation rates from ice cores.
Grain size fluctuations in lake sediments can track extreme precipitation and runoff events, but inter-lake comparison requires knowledge of lake morphometry and competing moisture source regions (Conroy et al., 2008; Kiefer and Karamperidou, 2019; Rodysill et al., 2019). Comparison of sedimentary grain size to snow accumulation rates would be uninformative without understanding how annual precipitation and dry season ablation, which both affect accumulation rates, are related to moisture delivery from extreme precipitation events (Hurley et al., 2016; Thompson et al., 1986). Snow accumulation rates can be strongly affected by air temperature, whereas grain size is generally not. Thus, although comparison of such heterogeneous hydroclimatic proxies is certainly possible, the lack of a common environmental signal to serve as a reconstruction target has been a major hindrance to the global reconstruction of hydroclimatic variables. These challenges have been further exacerbated by archive- and record-specific standards for data formatting, sampling resolution, metadata availability, and public archiving. These limitations may be addressed by creating a metadata-rich, multi-proxy, and multi-archive database of hydrological proxies united through standardized formatting and a common environmental signal: water isotopes.
The potential for a network of paleo-water isotope records to track past hydroclimate variations
In order to address these challenges, we focus here on the stable oxygen (δ¹⁸O) and hydrogen (δ²H) isotopic compositions of environmental waters such as precipitation, seawater, lake water, and soil and groundwater (Fig. 1). The stable isotopic compositions of such waters (here collectively referred to as "water isotopes") have long been used as integrative tracers of the modern water cycle (e.g., Bowen et al., 2019; Galewsky et al., 2016; Gat, 2010; Rozanski et al., 1993). The rare heavy isotopologues of water (e.g., ¹H₂¹⁸O, ¹H²H¹⁶O) fractionate from their lighter, more common counterpart (¹H₂¹⁶O) during evaporation, condensation, and other phase changes, capturing an integrative history of parcels of water as they move through and among oceans, atmosphere, and land (Fig. 1). Global databases of isotopic measurements of modern precipitation (IAEA/WMO, 2019), rivers (Halder et al., 2015), seawater (LeGrande and Schmidt, 2006), and water vapor (Galewsky et al., 2016) have contributed considerably to our understanding of the contemporary water cycle on scales from microscales (e.g., cloud microphysics) (Kurita et al., 2011), to mesoscales (e.g., hurricane dynamics) (Good et al., 2014; Kurita et al., 2011), and to global scales (e.g., residence time of atmospheric moisture) (Aggarwal et al., 2012). More recently, spaceborne measurements of ¹H²HO/¹H₂O in multiple levels in the atmosphere have identified the critical role of poorly observed processes such as tropical rain reevaporation (Aggarwal et al., 2012; Worden et al., 2007) and forest-atmosphere feedbacks (Wright et al., 2017). Together with climate and Earth system model simulations, which increasingly incorporate sophisticated water isotope tracers into their hydrologic schemes (Brady et al., 2019; Haese et al., 2013), water isotopes offer observational constraints on processes that are otherwise difficult to identify or constrain (Brady et al., 2019; Nusbaumer et al., 2017).
Figure 1. Schematic illustration of the global water cycle and key metadata fields in the Iso2k database. In the Iso2k database, the histories, including phase changes and transport ("Isotope Interpretation"; red text and arrows), of different pools of environmental waters ("Inferred Material"; bold text) can be inferred by interpretation of proxy records from different archives ("Archive"; italic text). Base illustration by Helen Xiu, Washington University.
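The δ values used throughout this paper are not defined explicitly in the text. As a reminder, the conventional definition expresses the heavy-to-light isotope ratio of a sample relative to that of a reference standard, in per mil. The following is a minimal sketch of that conversion; the function name is hypothetical, and the VSMOW reference ratios are commonly cited values included here only for illustration and are not taken from this paper.

```python
# Conventional delta notation: delta = (R_sample / R_standard - 1) * 1000 (per mil).
# The reference ratios below are commonly cited VSMOW values and are an
# assumption of this sketch, not values given in the Iso2k paper.
R_VSMOW_18O = 2005.2e-6   # 18O/16O
R_VSMOW_2H = 155.76e-6    # 2H/1H


def delta_permil(r_sample: float, r_standard: float) -> float:
    """Return the delta value (per mil) of a sample isotope ratio
    relative to a reference standard ratio."""
    if r_standard <= 0:
        raise ValueError("r_standard must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


if __name__ == "__main__":
    # A sample whose 18O/16O ratio is 1 % lower than the standard
    # has a delta-18O of about -10 per mil.
    print(delta_permil(0.99 * R_VSMOW_18O, R_VSMOW_18O))
```

Negative values indicate a sample depleted in the heavy isotope relative to the standard, and positive values indicate enrichment.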
In the paleoclimate realm, hydroclimate proxy records using water isotopes are commonly obtained from a variety of natural archives, including glaciers, ground ice, cave formations, corals, sclerosponges, mollusk shells, tree wood, lake sediments, and marine sediments. Of all of the proxy types that are used to reconstruct past hydroclimate changes, water isotopes are arguably the most common and certainly the most widely distributed geographically. A global, spatially distributed network of water isotope proxy records therefore has the potential to capture features of large-scale circulation patterns while minimizing site-specific influences from individual locations (Evans et al., 2013). Paired with an understanding of water cycle processes from modern observations and isotope-enabled model simulations, reconstructions of paleo-δ¹⁸O and δ²H from these archives can provide critical information about water vapor source and air mass transport history, precipitation amount and other characteristics, glacial ice volume changes, and temperature, prior to the beginning of instrumental climate observations (Bowen et al., 2019; Dayem et al., 2010; Galewsky et al., 2016; Konecky et al., 2019b). Further, proxy system models (Evans et al., 2013) are available for most water isotope proxies, facilitating direct comparison with paleoclimate model output and thus an improved understanding of the climate dynamics responsible for observed (spatial and temporal) water isotope variability (Dee et al., 2015; Dolman and Laepple, 2018; Jones and Dee, 2018; Konecky et al., 2019a; Thompson et al., 2011).
One of the obstacles to synthesizing hydroclimate-sensitive paleoclimate records has been a lack of standardized metadata at the proxy system level that systematically encodes the important variables that are necessary for integrating records into a multi-proxy synthesis and interpreting the results. Although the paleoclimate community is in the process of defining and adopting metadata conventions (Khider et al., 2019), the "bare minimum" current standards (e.g., ISO 19115 for geographic metadata) used by World Data System (WDS) repositories (e.g., NOAA Paleoclimatology, PANGAEA) are insufficient for characterizing water isotope proxy systems in a way that can be reliably applied to large-scale paleo-hydroclimate syntheses. One key example of this challenge is the temperature dependence of O- and H-isotopic fractionation, which has frequently been exploited to reconstruct past temperature changes in locations where air or water temperature exerts first-order influence on isotope ratios in precipitation and/or seawater (Kilbourne et al., 2008; Meyer et al., 2015; Porter et al., 2014). Yet in most places, the influence of temperature on isotopic fractionation is only one of many factors that influence the δ¹⁸O and δ²H of precipitation (Liu et al., 2012; Thomas et al., 2018) and seawater (Conroy et al., 2017; Partin et al., 2012; Russon et al., 2013). A network of water isotope records will inevitably contain information about air and water temperature but also other key hydroclimatic variables such as atmospheric moisture source changes and surface water evaporation. In order to tap the full potential of water isotope proxy records in a large-scale synthesis, the metadata associated with such records must be sufficient to capture at least a bare minimum of the complexity of the environmental signals that the records contain.
Additional metadata challenges have hindered progress in paleo-water isotope synthesis thus far. Most published datasets shared outside WDS repositories follow nonuniform metadata standards or contain minimal metadata. Datasets are often catalogued using different conventions (often at the authors' discretion), stored in varying formats (e.g., text, CSV, PDF), and uploaded to different public or private (i.e., behind journal paywalls) repositories. Furthermore, datasets are frequently archived without the raw chronological information that would be required to propagate age uncertainties if desired. These challenges are common to any paleoclimate synthesis effort and are not unique to water isotopes (Atsawawaranunt et al., 2018; Emile-Geay and Eshleman, 2013; PAGES 2k Consortium, 2017), but they exacerbate the challenge of hydroclimate-specific metadata needs.

The PAGES Iso2k database
Here we introduce the Past Global Changes (PAGES) Iso2k database, a collection of 759 water isotope proxy records (i.e., individual time series) from 506 sites (geographic locations) covering all or part of the CE. The database has been assembled by the PAGES Iso2k Project (hereafter "Iso2k"). The Iso2k database contains δ¹⁸O- and δ²H-based paleoclimate records from 10 different archives: glacier and ground ice (210 records); speleothems (68 records); corals, sclerosponges, and mollusks (143 records); wood (81 records); terrestrial and lake sediments (158 records); and marine sediments (99 records). Of these, 606 records are considered to be the primary time series for each site (Fig. 2) (see Sect. 2.4 and Table S1 in the Supplement). To address the complexity of environmental signals preserved in these proxy records, the database contains detailed metadata about each record's isotope systematics and proxy system context, as well as details about the original authors' climatic interpretation, chronological and analytical uncertainties, and other information required for robust data synthesis and interpretation. Iso2k has developed a uniform framework suitable for all proxy archives in the database. The architecture of the Iso2k database therefore provides a scalable foundation on which future multi-proxy hydroclimatic databases can be built, for example, by incorporating non-isotopic proxy records, such as the grain size and ice accumulation example in Sect. 1.1.
The Iso2k database is the latest in a series of community-led paleoclimate data synthesis efforts endorsed by PAGES (Atsawawaranunt et al., 2018; Kaufman et al., 2020; McGregor et al., 2015; McKay and Kaufman, 2014; PAGES 2k Consortium, 2013; Tierney et al., 2015). The main distinguishing feature of the Iso2k database is that it is not organized around one archive type, climate variable, or region; rather, it contains a systematic representation of the suite of environmental signals preserved in the water isotopic composition of diverse paleoclimatic archives, with no a priori assumptions about the underlying climatic interpretation of those signals. This novel approach yields a database that is flexible enough to evaluate many different environmental parameters and processes during the CE, depending on investigator interest. The Iso2k database also contains even more comprehensive metadata descriptions than previous PAGES compilations (e.g., PAGES 2k Consortium, 2017). Database users can therefore filter for and process only the records required for their research question of interest.
This data descriptor presents version 1.0.0 of the PAGES Iso2k database. We describe the collaborative process of assembling the database (including quality control and validation) and outline the structure and contents of the database (including data selection criteria, metadata, and chronological information). All data are provided in the Linked Paleo Data (LiPD) format (McKay and Emile-Geay, 2016) and are machine readable across different platforms and operating systems. We provide files with sample code to quickly explore the database using various programming languages and platforms (R, MATLAB, Python). The Iso2k database is available for download at https://doi.org/10.25921/57j8-vs18 (Konecky and McKay, 2020). The database can also be accessed via the NOAA NCEI World Data Service for Paleoclimatology (WDS-NOAA) landing page: https://www.ncdc.noaa.gov/paleo/study/29593 (last access: 30 July 2020). The WDS-NOAA landing page contains links to download the serializations for R, MATLAB, and Python, as well as information on submission of new or revised datasets and other instructions. More information on versioning, submission of new datasets, and other database updates can be found in Sect. 6.3.

Collaborative model
Iso2k is a contribution to Phase 3 of the PAGES 2k Network (PAGES 2k Network Coordinators, 2017). Calls for participation in Iso2k were widely distributed, ensuring a representative cross section of scientists from various disciplines (Konecky et al., 2017, 2018; Partin et al., 2015). Iso2k built on the successes and challenges of previous PAGES 2k projects (Anchukaitis and McKay, 2014; Kaufman, 2014; PAGES 2k Consortium, 2017; PAGES Hydro2k Consortium, 2017) when deciding on the selection criteria (i.e., requirements for inclusion of records) and metadata fields necessary to make the database suitable for a wide range of applications. Most work was done remotely via teleconferences, with one in-person meeting at the 2017 PAGES Open Science Meeting in Zaragoza, Spain.
The workload for assembling the data and metadata was subdivided among working groups, each representing one of the following archive types: marine sediment, marine carbonates (corals, mollusks, sclerosponges), glacier ice, ground ice, lake sediments, speleothems, and wood. This archive-based approach ensured that data were collated by researchers with an in-depth, process-based understanding of each proxy system.

Data aggregation and formatting
The database comprises publicly available water isotope proxy records that span all or part of the CE and meet the criteria outlined in Sect. 2.3. The database was compiled in two main stages. During the first stage, the archive teams obtained records, entered data, and compiled the extensive metadata outlined in Sect. 4. During the second stage, the data and metadata were extensively quality-controlled following the procedure outlined in Sect. 2.4.
We used a variety of sources to identify records for inclusion in the database. We first extracted records that met our selection criteria (described in Sect. 2.3.1) from existing data compilations, including the PAGES 2k temperature database (PAGES 2k Consortium, 2017), the Arctic Holocene Transitions database (Sundqvist et al., 2014), and the SISAL database (Atsawawaranunt et al., 2018). Archive teams then searched the literature and online data repositories (WDS-NOAA and PANGAEA) for additional suitable datasets. For records that had been published but that had not previously been made available in an online public repository (referred to as "dark data"), datasets were digitized from publication tables, appendices, and supplementary materials. Datasets that were not available in their original publications were requested from the authors by email. If two or more email requests went unanswered, the dataset was deemed not publicly available and therefore did not meet that criterion for inclusion in this database. Of the 606 primary time series in the database, more than 20 % (128 records) are dark datasets that were added by Iso2k members and are now available in a public, online, machine-readable format for the first time. The vast majority of those datasets were from wood or from lake and terrestrial sediments (58 and 52, respectively), with an additional 14 from glacier and ground ice, 2 from marine sediments, and 2 from corals.
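The dark-data counts quoted above can be checked with a few lines of arithmetic. The sketch below simply re-adds the per-archive numbers stated in the text and computes their share of the 606 primary time series; the variable names are illustrative only.

```python
# Per-archive counts of "dark" datasets, as stated in the text.
dark_by_archive = {
    "wood": 58,
    "lake and terrestrial sediments": 52,
    "glacier and ground ice": 14,
    "marine sediments": 2,
    "corals": 2,
}

primary_total = 606  # primary time series in the database (from the text)

dark_total = sum(dark_by_archive.values())
dark_share = dark_total / primary_total

if __name__ == "__main__":
    print(f"{dark_total} dark datasets = {100 * dark_share:.1f} % of primary time series")
```

The per-archive counts add up to the stated 128 records, which is about 21 % of the primary time series, consistent with the "more than 20 %" figure.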
In addition to isotopic datasets, raw age control data (e.g., ¹⁴C ages) were obtained for records where age-depth modeling is required (i.e., non-annually resolved records). Many isotopic datasets that were available through data repositories did not contain raw age control data, in which case we followed the dark data procedure described previously to obtain appropriate chronological data from the authors. For dark age control data, authors were emailed with a request for the data and a spreadsheet template in which chronological information could be added. Age control data from authors who did not respond to these requests could not be added to the database. Again, the majority of "dark" age control data added to the Iso2k database was from the lake sediments archive (over 40 age control datasets are now publicly available for the first time).
Metadata (Sect. 4) were obtained from the data source, extracted from the original publication, or requested from the original data generators (again following the dark data procedure above). We note that even for datasets that were previously publicly available, the Iso2k database has expanded on these data by adding chronological data and compiling an extended suite of metadata not previously available in a consolidated format.

Record selection criteria
Records were screened by their respective archive teams to ensure that criteria for inclusion in the database were met. Criteria for inclusion in the database were formulated to optimize spatiotemporal coverage of the data, with the goal of building a comprehensive database of water isotope records that can be subsampled as needed to address diverse scientific questions. The selection criteria for data records to be included in the Iso2k database are as follows.

Record resolution and duration
The duration and temporal resolution of records included in the Iso2k database vary by archive type. For ∼annually or ∼sub-annually banded archives (corals, shells, sclerosponges, tree wood, varved lake and marine sediments, and glacier ice), the minimum record duration for inclusion in the database is 30 years. For all other archives (speleothems, non-varved lake and marine sediments), records must have a minimum duration of 200 years and contain at least five data points during the CE.
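The resolution and duration rule above is simple enough to express as a predicate. The following is a minimal sketch under the thresholds stated in the text (30 years for banded archives; 200 years and five CE data points otherwise); the function and argument names are hypothetical and are not part of the Iso2k tooling.

```python
def meets_duration_criteria(is_banded: bool,
                            duration_years: float,
                            n_points_ce: int) -> bool:
    """Check the record duration/resolution criterion described in the text.

    is_banded      -- True for (sub-)annually banded archives such as corals,
                      tree wood, varved sediments, and glacier ice
    duration_years -- length of the record in years
    n_points_ce    -- number of data points that fall within the Common Era
    """
    if is_banded:
        # Banded archives: at least 30 years of data.
        return duration_years >= 30
    # All other archives: at least 200 years and five CE data points.
    return duration_years >= 200 and n_points_ce >= 5


if __name__ == "__main__":
    print(meets_duration_criteria(True, 45, 540))    # banded record of 45 years
    print(meets_duration_criteria(False, 350, 4))    # too few CE data points
```

Treating the banded/non-banded distinction as a single boolean is a simplification; in practice it would be derived from the archive type recorded in the metadata.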

Chronological constraints
The PAGES 2k temperature database (PAGES 2k Consortium, 2017) was used as a guide for minimum chronological control criteria. Records from annually banded archives must be either cross-dated or layer-counted; records from non-annually banded archives must have at least one age control point near both the oldest and youngest portions of the record, with one additional age control point somewhere near the middle required for records longer than 1000 years.
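For non-annually banded archives, the age control rule can be sketched as follows. The text does not quantify "near", so this example assumes, purely for illustration, that a control point counts as near an end of the record if it lies within 20 % of the record's span from that end; the function name and the `edge_fraction` parameter are hypothetical.

```python
from typing import Sequence


def has_minimum_age_control(youngest: float,
                            oldest: float,
                            control_points: Sequence[float],
                            edge_fraction: float = 0.2) -> bool:
    """Check the minimum age control rule for non-annually banded records.

    Ages are given in years before a common datum, so youngest < oldest.
    edge_fraction defines what "near" an end means and is an assumption
    of this sketch, not a threshold stated in the paper.
    """
    span = oldest - youngest
    if span <= 0:
        raise ValueError("oldest must be greater than youngest")
    edge = edge_fraction * span

    near_youngest = any(abs(a - youngest) <= edge for a in control_points)
    near_oldest = any(abs(a - oldest) <= edge for a in control_points)
    if not (near_youngest and near_oldest):
        return False

    # Records longer than 1000 years also need a control point in the middle.
    if span > 1000:
        return any(youngest + edge < a < oldest - edge for a in control_points)
    return True


if __name__ == "__main__":
    print(has_minimum_age_control(0, 1500, [50, 700, 1450]))
```

Annually banded archives are handled differently (they must be cross-dated or layer-counted), so they are outside the scope of this check.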

Peer review and public availability
To qualify for inclusion in the database, isotope records must be published in a peer-reviewed journal (i.e., not university-published theses and dissertations). Records included in version 1.0 of the database had to be published and publicly available (see definition in Sect. 2.2) before 4 May 2018.

Ancillary data
In some cases, paired geochemical measurements are also included in the Iso2k database to complement interpretation of the isotopic data, such as paired trace elemental measurements (e.g., Sr/Ca or Mg/Ca) that accompany some carbonate δ¹⁸O records from corals, sclerosponges, and planktonic foraminifera, or δ¹³C data that accompany some carbonate records. Derived isotopic data for deuterium excess (d-excess) are also included for glacier and ground ice, where paired measurements of δ¹⁸O and δ²H allowed the original authors to calculate this additional hydroclimatic indicator. Similarly, derived values for the δ¹⁸O of seawater are available for coral and marine sediment records in cases where an independent temperature reconstruction was available for the same archive (e.g., Sr/Ca for corals and Mg/Ca for planktonic foraminifera). Where the paired carbonate δ¹⁸O and Sr/Ca or Mg/Ca records can be used to infer the δ¹⁸O of seawater (Cahyarini et al., 2008; Elderfield and Ganssen, 2000; Gagan et al., 1998), both time series (δ¹⁸O measured directly on carbonate and seawater δ¹⁸O calculated from paired records) and ancillary, non-isotopic geochemical records are included in the database (Sect. 4).
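Deuterium excess is conventionally defined as d = δ²H − 8 × δ¹⁸O, which is why it can only be derived where both isotope ratios were measured on the same samples. The sketch below applies that standard definition to paired values; it illustrates the calculation only and is not the original authors' processing code, and the function names are hypothetical.

```python
from typing import Iterable, List


def d_excess(delta_2h: float, delta_18o: float) -> float:
    """Deuterium excess (per mil) from paired delta-2H and delta-18O values,
    using the conventional definition d = delta-2H - 8 * delta-18O."""
    return delta_2h - 8.0 * delta_18o


def d_excess_series(delta_2h: Iterable[float],
                    delta_18o: Iterable[float]) -> List[float]:
    """Apply d_excess sample by sample to two equally long series."""
    d2h, d18o = list(delta_2h), list(delta_18o)
    if len(d2h) != len(d18o):
        raise ValueError("delta-2H and delta-18O series must be paired")
    return [d_excess(h, o) for h, o in zip(d2h, d18o)]


if __name__ == "__main__":
    # Values lying on a line with slope 8 and intercept 10 give d = 10 per mil.
    print(d_excess_series([-70.0, -150.0], [-10.0, -20.0]))
```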

Quality control procedure
Records considered to be a primary time series for their respective sites (Sect. 4; Table 6) were quality-controlled to the highest degree possible, as described below. Primary time series were judged to be the one or two time series upon which the original authors based their main climatic interpretations. For archives such as corals and speleothems, the primary time series are typically a composite of multiple records from a site or the latest of a series of modified records from a site, whereas for other archives the primary time series is one deemed to have the most robust climatic signal (e.g., for lake sediments, a biomarker of terrestrial vs. mixed terrestrial and aquatic origin). Non-primary time series were quality-controlled as much as possible and are included because they may contain valuable information for database users. Both data and required metadata fields were screened for accuracy and completeness by one or more project members, with the initials of the project member performing the final quality control (QC) check included in the Iso2k_QC_certification metadata field. Metadata fields that required standardized or controlled vocabularies were double-checked to ensure those terms were adhered to (Sect. 4). During the quality control certification process, project members used a web-based data viewer (lipdverse.org) and other visualization tools to display the raw data and metadata.
Each metadata field in the database (Tables 1-7) has a quality control certification "Level" from 1 to 3, defined as follows.
- Level 1 fields are required metadata for inclusion in the Iso2k database. These fields are generalizable enough to be suitable for all archive types, and they are recommended as primary fields for filtering, sorting, and querying records in the database. Level 1 required fields were subject to the highest QC standard. They follow standardized Iso2k vocabularies, where appropriate (Table 7); geographical data were checked against maps, and interpretation fields were checked against the original publication. Examples of Level 1 metadata include geographical (ISO 19115) and publication information (DOI), and the minimum required subset of isotope and proxy system interpretation metadata fields (see Sect. 4).
- Level 2 fields are highly useful (but not required) metadata fields in the Iso2k database. They may be used as secondary fields for further filtering, sorting, and querying records in the database; these fields may be particularly useful for certain archives or to refine interpretations after an analysis has been performed. Examples of Level 2 fields include species name (marine and lake sediments and corals) and compound chain length for compound-specific δ²H measurements (lake sediments). Terminology was standardized only where necessary and appropriate. In other cases, these fields contain freeform text with direct quotes from the original publications. During the QC certification process these fields were checked against the original publication for clarity and consistency.
- Level 3 fields may be useful to some users of the Iso2k database but are not generally recommended as fields for filtering and sorting records in the database. Level 3 fields are not entered as standardized vocabularies and the information is sometimes not available in the original publications (thus, these fields are blank for many records). Examples of Level 3 fields include information pertaining to the integration time of a proxy sensor.
- Automatic fields are the automatically generated fields that were computed directly from the data records following QC certification. Fields use standardized vocabularies and units. Examples include binary fields for whether the dataset contains raw chronological control data.
Ancillary data are not quality-controlled but are included in LiPD format for reference.
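To illustrate how the QC levels described above might guide database use, the sketch below tags a handful of metadata fields with their level and uses those tags to (a) list the fields recommended for filtering and (b) flag records with missing required (Level 1) metadata. All field names here are hypothetical placeholders chosen to mirror the examples in the text; they are not the actual Iso2k field names, which are given in Tables 1-7.

```python
# Hypothetical field names mapped to QC levels, mirroring the examples in the
# text (geography and DOI: Level 1; species and chain length: Level 2;
# sensor integration time: Level 3). Not the actual Iso2k vocabulary.
FIELD_LEVELS = {
    "latitude": 1,
    "longitude": 1,
    "publication_doi": 1,
    "species_name": 2,
    "compound_chain_length": 2,
    "integration_time": 3,
}


def recommended_filter_fields(max_level: int = 2) -> list:
    """Fields recommended for filtering: Level 1 (primary) and,
    optionally, Level 2 (secondary)."""
    return sorted(name for name, level in FIELD_LEVELS.items()
                  if level <= max_level)


def missing_required_fields(record: dict) -> list:
    """Return the Level 1 fields that are absent or empty in a record."""
    return sorted(name for name, level in FIELD_LEVELS.items()
                  if level == 1 and record.get(name) in (None, ""))


if __name__ == "__main__":
    record = {"latitude": 10.0, "longitude": -60.0, "publication_doi": ""}
    print(missing_required_fields(record))
```

Level 3 fields are deliberately excluded from the default filter list because, as noted above, they are often blank and do not follow standardized vocabularies.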

Archive types within the Iso2k database
The Iso2k database contains data from a variety of geological and biological archives. Following proxy system terminology (Evans et al., 2013), each archive has one or more sensors that directly sense and incorporate environmental signals, i.e., the δ 18 O and δ 2 H of environmental waters, into their structures. Over time these sensors then form, are deposited into, or are otherwise imprinted upon an archive that is then subsampled and subjected to isotopic measurements or observations. In this section, we describe the key characteristics of the archives and sensors that are important for the interpretation of the paleohydrological signals that they preserve.

Corals, sclerosponges, and mollusks
Corals, sclerosponges, and mollusks (predominantly bivalves and gastropods) form hard body parts of calcium carbonate (aragonite or calcite) that record the conditions of the aquatic environment in which they live (see Black et al., 2019; Corrège, 2006; Druffel, 1997; Evans et al., 2013; Lough, 2010; Sadler et al., 2014; Surge and Schöne, 2005). Further, except for sclerosponges (which are dated using U/Th geochronology), these aquatic carbonates contain annual banding structures, enabling precise chronology development. Reef-building corals represent the bulk of annually resolved marine archives included in the Iso2k database. These corals are distributed in warm shallow waters throughout the tropical oceans, whereas sclerosponges (i.e., coralline sponges or Demospongiae) and mollusks are found worldwide, the latter in both estuarine and freshwater environments. Micro-sampling and laser ablation technologies allow for sub-annual to annual sampling resolution in corals, mollusks, and sclerosponges for elemental (e.g., Sr/Ca, Mg/Ca) and isotopic analysis (δ¹⁸O and δ¹³C). When living samples are collected in modern waters, they contain environmental archives of the recent past (decades to several centuries), whereas dead, fossil, and archeological material can be radiometrically dated to provide windows of past isotopic variability, some of which have been cross-dated with modern records (Black et al., 2019, and references therein). The δ¹⁸O signal in these archives represents a combination of linear, temperature-dependent isotopic fractionation, as well as changes in the isotopic composition of the surrounding water (δ¹⁸O_w) (Grottoli and Eakin, 2007; Rosenheim et al., 2005).
In some regions, the temperature component dominates the δ¹⁸O signal, whereas in other regions δ¹⁸O_w variability is the primary driver of the δ¹⁸O variability and reflects hydrological and/or oceanographic processes such as vertical and horizontal advection or the freshwater endmember (Conroy et al., 2017; Russon et al., 2013; Stevenson et al., 2018). In some ocean settings, the close coupling between ocean and atmosphere variability leads to co-occurring cool and dry (or warm and wet) anomalies that produce complementary isotopic anomalies (Carilli et al., 2014; Russon et al., 2013; Stevenson et al., 2015, 2018) (e.g., ENSO variability; Cobb et al., 2003). In estuarine or freshwater settings, mollusk δ¹⁸O values are closely linked to the local precipitation-evaporation budget (Azzoug et al., 2012; Carré et al., 2019). Coral δ¹⁸O and δ¹³C contain a vital effect and coral δ¹⁸O is offset from δ¹⁸O_w, whereas mollusk and sclerosponge δ¹⁸O is generally precipitated in equilibrium with environmental water. Some coral δ¹⁸O records in the Iso2k database have had their mean δ¹⁸O removed by the original authors for comparison and cross-dating with other coral records, and this is noted in the metadata.
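The two-component character of carbonate δ¹⁸O described above can be made concrete with a simple linear anomaly model: the carbonate anomaly is the sum of a temperature term and the δ¹⁸O_w anomaly. The sketch below assumes a slope of −0.2 ‰ per °C, which is a commonly used approximate value for biogenic carbonates and is an assumption of this example rather than a value from this paper; site- and species-specific calibrations differ. Function names are hypothetical.

```python
# Assumed temperature sensitivity of carbonate delta-18O (per mil per deg C).
# Illustrative only; real calibrations vary by archive, species, and site.
ASSUMED_SLOPE = -0.2


def carbonate_d18o_anomaly(temp_anomaly_c: float,
                           d18o_w_anomaly: float,
                           slope: float = ASSUMED_SLOPE) -> float:
    """Forward model: carbonate delta-18O anomaly from a temperature anomaly
    and a water delta-18O anomaly (all relative to the same baseline)."""
    return slope * temp_anomaly_c + d18o_w_anomaly


def seawater_d18o_anomaly(d18o_carb_anomaly: float,
                          temp_anomaly_c: float,
                          slope: float = ASSUMED_SLOPE) -> float:
    """Inverse: water delta-18O anomaly once an independent temperature
    anomaly (e.g., from Sr/Ca) has been removed from the carbonate signal."""
    return d18o_carb_anomaly - slope * temp_anomaly_c


if __name__ == "__main__":
    # Warm and wet conditions reinforce each other: +1 deg C and fresher
    # water (-0.3 per mil) both lower carbonate delta-18O.
    print(carbonate_d18o_anomaly(1.0, -0.3))
```

Under this sign convention, warm and wet (or cool and dry) anomalies push carbonate δ¹⁸O in the same direction, which is the complementary behavior noted in the text for strongly coupled ocean-atmosphere settings.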

Glacier ice
Climate records from glacier ice are found primarily at high latitudes (Antarctica, Arctic) and high elevations (e.g., Andes, Himalayas) (Eichler et al., 2009; Meese et al., 1994). Glacier ice is formed from the accumulation of snow, which over time compacts into a section of chronologically continuous layered ice. Cores drilled through layers of glacier ice preserve sub-annually to centennially resolved climate information, with resolution varying among records due to snow accumulation rates and laboratory sampling and analysis methods (Rasmussen et al., 2014). Ice cores are dated through a variety of methods; annual layer counting and alignment to volcanic horizons are the most common approaches for records spanning the CE (Sigl et al., 2014). This database contains records of δ¹⁸O, δ²H, and/or d-excess of glacier ice. These proxies reflect the isotopic composition of precipitation (snowfall and ice), which is highly correlated to local temperature but additionally reflects changes in moisture source and condensation processes (Goursaud et al., 2019). Physical processes such as isotopic diffusion in the firn, melt and infiltration, and compaction of ice layers generally smooth the seasonal to interannual signal of climate variability in glacier ice, and the potential influence of these processes is site specific.

Ground ice (wedge ice and syngenetic pore ice)
Ground ice includes all types of ice found in permafrost; wedge ice and syngenetic pore ice hold the largest potential for paleoclimate reconstructions (Opel et al., 2018;Porter et al., 2016;Porter and Opel, 2020). Ice wedges in permafrost landscapes form via repetitive thermal contraction cracking in winter and infilling of frost cracks mostly by snowmelt in spring (with potential minor contribution of snow and/or depth hoar). The integrated isotopic composition of the previous winter's snow pack is transferred into a single ice vein without additional isotopic fractionation due to rapid freezing in the permafrost. Thus, ice wedges preserve precipitation of the meteorological winter and spring, with δ 18 O and δ 2 H commonly interpreted as proxies for local air temperature (Meyer et al., 2015). Ice-wedge records are temporally constrained by radiocarbon dating of macrofossils or dissolved organic carbon in the ice. Conversely, pore ice in syngenetic permafrost integrates precipitation that reaches the maximum thaw depth in late summer. The pore ice seasonality is a function of the local precipitation climatology and residence time of active-layer pore waters, and pore ice is enriched in heavy isotopes relative to the initial pore waters due to equilibrium fractionation during freezing (O'Neil, 1968). Because syngenetic pore ice formed within accumulating surface sediments, its age can be modeled based on a radiometrically constrained sediment age-depth profile. Syngenetic pore ice can be cored and subsampled in the same way as glacier ice (Porter et al., 2019).

Lake sediments
Lake sediments may provide long and continuous records of past environmental change (Mills et al., 2017) and preserve a number of sensors for oxygen and hydrogen isotopes (e.g., Leng and Marshall, 2004). Carbonate minerals, precipitated inorganically from lake waters or in the shells of aquatic invertebrates, have been used as sensors for the isotopic composition of lake water (e.g., Hodell et al., 2001;Morrill, 2004;Von Grafenstein et al., 1998). Additional proxies analyzed with increasing frequency include biogenic silica, mostly from diatoms (e.g., Chapligin et al., 2016;Swann et al., 2018); cellulose (Heyng et al., 2014); chitinous invertebrate remains (Van Hardenbroek et al., 2018); and lipids (Konecky et al., 2019a;Sachse et al., 2012). Of these proxies, the oxygen isotope composition of carbonates and silicates is subject to temperature-dependent isotope fractionation during mineralization, whereas the isotopic composition of organic materials is generally not influenced by temperature (Rozanski et al., 2010). The compound-specific hydrogen isotopic composition of a lipid reflects the environment in which the organism producing the lipid grew. Lipids produced by aquatic macrophytes or algae reflect the isotopic composition of the lake water, whereas lipids produced by terrestrial plants reflect the isotopic composition of soil or leaf water (which is, in many cases, highly influenced by the isotopic composition of precipitation). Both types of lipids are preserved in lake sediments (Castañeda and Schouten, 2011;Rach et al., 2017;Thomas et al., 2016). For sensors that record the δ 18 O or δ 2 H of lake water, the climatic or hydrological change recorded in δ 18 O or δ 2 H depends primarily on the degree to which evaporation influences the lake's hydrological balance relative to other factors (Gibson et al., 2016;Morrill, 2004).
In turn, the effect of evaporation on lake water isotopes largely depends on the residence time of water within the lake system and the degree of hydrological "closure" of the lake. In open lake systems, which often have surface water inflows and outflows and a resulting short water residence time, lake waters often reflect the isotopic values of the inflowing waters, which generally approximate a (sometimes lagged) signal of the weighted mean of the isotopic composition of local precipitation (Jones et al., 2016;Tyler et al., 2007). In hydrologically closed lakes, often without surface outflows and where more water leaves the system through evaporation, the initial isotopic composition of inflowing waters is altered by this evaporation, with the δ 18 O or δ 2 H of water increasing with increasing evaporation (Dean et al., 2015;Leng and Marshall, 2004).

Wood
The wood in tree rings (tree-ring cellulose) is one of the few terrestrial proxy archives that can be directly constrained to calendar years (McCarroll and Loader, 2004;Schweingruber, 2012). Information about climatic and environmental changes on seasonal-to-annual timescales is recorded in tree-ring cellulose δ 18 O. The δ 18 O of tree-ring cellulose is influenced by (i) the δ 18 O of source waters, (ii) factors influencing δ 18 O of the leaf water, and (iii) a fractionation factor related to the isotopic exchange of carbonyl oxygen of cellulose intermediates with cellular waters. This fractionation is derived from enriched leaf water and unaltered xylem or source waters and results in an overall ∼ 27 ‰ enrichment of cellulose δ 18 O relative to cellular waters (Barbour et al., 2004;Gessler et al., 2014;Roden et al., 2000). This fractionation is regarded as a constant in mechanistic models (e.g., Cernusak et al., 2005;Roden et al., 2000), such that cellulose δ 18 O variability mainly reflects the δ 18 O of source water and leaf waters. The δ 18 O of the source water is closely related to the δ 18 O of precipitation-derived soil water (Bowen et al., 2019). As such, the primary climatic signal that controls δ 18 O of tree-ring cellulose varies by location, depending on the climatic signals controlling precipitation δ 18 O (Sect. 1.2). For example, tree cellulose δ 18 O records have been interpreted to reflect temperature at midlatitude to high-latitude sites (e.g., Churakova (Sidorova) et al., 2019;Porter et al., 2014;Saurer et al., 2002;Sidorova et al., 2012) and precipitation amount in tropical or monsoonal sites (Brienen et al., 2013;Managave et al., 2011). As the δ 18 O of the soil water is also affected by evaporation of the soil water, precipitation minus evaporation (P − E) influences δ 18 O tree cellulose (Sano et al., 2012;Xu et al., 2018).
The extent of evaporative enrichment of the source water in 18 O in the leaf (and hence δ 18 O of the leaf water and tree cellulose) is controlled by the water vapor pressure deficit between the leaf intercellular space and the ambient atmosphere, as well as leaf physiological traits (Kahmen et al., 2011;Szejner et al., 2016).
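The relationship described above, a constant ∼ 27 ‰ enrichment applied to a blend of source water and evaporatively enriched leaf water, can be sketched as a simple mixing calculation. This is only an illustration: the function name, the exchange fraction of 0.42, and the input values are assumptions, and published mechanistic models include additional terms.

```python
def cellulose_d18o(source_water: float,
                   leaf_water: float,
                   exchange_fraction: float = 0.42,
                   enrichment: float = 27.0) -> float:
    """Estimate cellulose delta-18O (per mil) from source and leaf water.

    exchange_fraction is the assumed share of oxygen that exchanges with
    unaltered source (xylem) water during cellulose synthesis; the rest
    carries the leaf-water signal. enrichment is the constant offset of
    about 27 per mil between cellulose and cellular waters.
    """
    if not 0.0 <= exchange_fraction <= 1.0:
        raise ValueError("exchange_fraction must lie between 0 and 1")
    mixed_water = (exchange_fraction * source_water
                   + (1.0 - exchange_fraction) * leaf_water)
    return mixed_water + enrichment


# Assumed values: source water at -10 per mil, leaf water enriched to +5 per mil.
print(round(cellulose_d18o(-10.0, 5.0), 2))
```

Holding the enrichment constant means that any change in the result comes from the source-water or leaf-water terms, which is the reasoning the text uses to link cellulose δ 18 O to precipitation and to the vapor pressure deficit.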

Speleothems
Speleothems are secondary cave deposits that form when water percolates through carbonate bedrock. Both atmospheric CO 2 and CO 2 generated by plant root respiration and organic matter decomposition are dissolved into rainwater as it percolates through the soil, producing carbonic acid that rapidly dissociates to produce weakly acidic water. As this acidic water percolates through the bedrock, it dissolves carbonate until the water becomes supersaturated with respect to calcium and bicarbonate (Fairchild and Baker, 2012). When the percolating waters emerge in a cave, CO 2 degassing from the drip water to the cave atmosphere induces CaCO 3 precipitation, resulting in the formation of stalagmites and stalactites (Atkinson et al., 1978) that preserve the δ 18 O signal of the waters that have percolated through from the surface (Lachniet, 2009). The δ 18 O of the deposited carbonate therefore reflects the δ 18 O of soil and groundwater that it infiltrates, which is strongly influenced by the δ 18 O of precipitation but with additional influences of aquifer mixing times, seasonality of infiltration, and in some cases extreme events (Moerman et al., 2014;Taylor et al., 2013). Processes within the karst and cave, such as calcite precipitation prior to speleothem deposition and/or kinetic isotope effects, can alter the δ 18 O of the deposited carbonate.
Although there are hydroclimatic limits on speleothem growth, speleothem distribution is largely constrained by the presence of carbonate bedrock (Fairchild and Baker, 2012). Speleothems form in a wide range of hydroclimate conditions, from extremely cold climates in Siberia to arid regions in the Middle East and Australia. The temporal resolution of speleothem paleoclimate series ranges from sub-annual to centennial, and primarily depends on the karst and cave environment. Due to the high precision of uranium series dating, speleothems provide opportunities to determine the timing of regional hydrological response to global events and links to external forcing mechanisms (e.g., insolation changes) (Fischer, 2016). The different types of measurements made on speleothems (including δ 18 O, δ 13 C, and various trace elements) and their fluid inclusions can be used to reconstruct past changes in the hydrological cycle.

Marine sediments
Marine sediments contain two types of sensors that have widely been used for measuring water isotope variability: planktonic foraminifera and biomarkers. Planktonic foraminifera are unicellular zooplankton living in the upper hundreds of meters of the ocean. They build a calcite skeleton, which is preserved in the sediment. The δ 18 O of planktonic foraminifera calcite reflects a spatially (and temporally) variable combination of temperature and δ 18 O sw (Urey, 1948) and to a lesser degree the seawater carbonate ion concentration as well (Spero et al., 1997), although changes in the latter parameter are likely negligible during the CE. The temperature effect on the δ 18 O of foraminifera calcite is systematic, i.e., the δ 18 O sw can be reconstructed using (species-specific) paleotemperature equations in conjunction with an independent estimate of calcification temperature based on Mg/Ca (Elderfield and Ganssen, 2000). Planktonic foraminifera have a short life cycle (about a month) and species-specific seasonal and depth habitat preferences (Jonkers and Kučera, 2015;Meilland et al., 2019), such that any planktonic foraminifera record bears an imprint of the ecology of the sensor (Jonkers and Kučera, 2017).
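The reconstruction of δ 18 O sw described above can be sketched by inverting a linear paleotemperature equation of the general form T = a − b (δ 18 O calcite − δ 18 O sw). The coefficients below are placeholders for illustration, not a species-specific calibration, and the conversion between carbonate and water reference scales is omitted; the function name and inputs are likewise assumptions.

```python
def seawater_d18o(calcite_d18o: float,
                  temperature_c: float,
                  intercept: float = 16.5,
                  slope: float = 4.8) -> float:
    """Invert T = intercept - slope * (d18O_calcite - d18O_sw) for d18O_sw.

    temperature_c is an independent calcification temperature (for example
    from Mg/Ca). intercept and slope are placeholder coefficients; a real
    application would use the equation calibrated for the species studied.
    """
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return calcite_d18o - (intercept - temperature_c) / slope


# Assumed inputs: calcite at -1.0 per mil and a calcification temperature of 16.5 degC.
print(seawater_d18o(-1.0, 16.5))
```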
Biomarkers in marine sediments are lipids synthesized either by marine photoautotrophs, which track past changes in surface seawater isotopic values, or from vascular plants, which track soil water isotopic values on an adjacent land mass (Sachse et al., 2012). Biomarkers are strongly affected by isotopic fractionation during lipid biosynthesis, and that fractionation is often assumed to be constant (Sachse et al., 2012). However, as for planktonic foraminifera, biomarker δ 2 H values are also affected by a combination of environmental parameters. The δ 2 H values of C 37 alkenones (synthesized by coccolithophorids) are impacted by fractionation that changes with salinity and growth rates (Schouten et al., 2006), which can mask changes in the δ 2 H of seawater. The sources of leaf waxes are terrestrial plants, and the processes affecting leaf waxes in marine sediments are the same as in lake sediments but generally have longer associated time lags between the sensor recording the δ 2 H of soil water and ultimate deposition in the marine sediment archive.

Description of Iso2k metadata fields
The Iso2k database contains over 180 metadata fields. The 55 main fields are described in Tables 1-6; 23 of these were strictly quality-controlled following the Level 1 definition in Sect. 2.4. Entries for some required metadata fields were standardized with controlled vocabulary to allow users to easily query the database for records based on archive type, isotope ratio (O or H), waters from which the isotope ratios are derived, materials on which the isotope ratios were measured, or the environmental parameter that controls isotopic variability (Fig. 1). Metadata fields describe the primary isotopic variable being inferred, i.e., the "isotope interpretation" (e.g., the δ 2 H of precipitation); the water from which it was inferred, i.e., "inferred material" (e.g., soil water); the material that was actually measured, i.e., "measured material" (e.g., long-chain n-alkane components of leaf waxes); and information about the original climate interpretation. Distinctions between the archive type (Fig. 2), inferred material (Fig. 3), and the isotope interpretation (Fig. 4) allow for advanced analyses and straightforward data-model comparisons using the database. These metadata interpretation fields were derived from interpretations reported in the original publications. Below and in Tables 1-6, we describe key metadata fields in the database, including all Level 1 and Level 2 fields (see Sect. 2.4 for a description of levels). Table 7 provides standardized vocabularies and common terminologies. Table 8 provides selected chronological control metadata. Table S1 gives key metadata for each primary time series (Sect. 2.4), including all Level 1 fields and selected additional Level 2 fields, and references to original publications (citations also listed in Tables S2 and S3).

Entity metadata
The entity metadata fields provide basic information for each record, including the isotope measured, the archive type, location (longitude, latitude, and elevation), start and end dates of each record, and both the DOI and citation for the original publication. Entries for the archiveType, paleoData_variableName, and paleoData_units metadata fields are standardized (Table 7) across all archive types to facilitate easy querying and analyses. Each record is assigned a unique LiPD identifier, and all isotope records are assigned a unique Iso2k identifier. The alphanumeric Iso2k identifiers contain 11 characters and digits as follows: archive type (2 characters), year published (2 digits), first author's last name (2 characters), site name (2 characters), sample number (e.g., 00, 01, 02, 03 ...) for different cores or core composites from the same site, and letter (A, B, C ...) for multiple time series derived from the same core. The paleoData_variableName indicates the variable measured for each archiveType, usually δ 18 O or δ 2 H. In some cases other paired geochemical measurements are included in the database to complement interpretation of the isotopic data (Sect. 2.3.4). A list and detailed description of key entity metadata fields is provided in Table 1.
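The 11-character identifier layout described above can be split into its components with a short parser. This is a sketch under stated assumptions: the identifier "XX99ABCD01A" is made up for illustration, and the character classes allowed in each slot (letters for the archive code, letters or digits for the author and site codes, uppercase for the final letter) are guesses rather than documented rules.

```python
import re
from typing import NamedTuple


class Iso2kId(NamedTuple):
    archive: str   # 2 characters, archive type
    year: str      # 2 digits, year published
    author: str    # 2 characters, first author's last name
    site: str      # 2 characters, site name
    sample: str    # 2 digits, core or composite number
    series: str    # 1 letter, time series from the same core


# Assumed character classes for each slot; total length is 11.
_PATTERN = re.compile(
    r"^([A-Za-z]{2})(\d{2})([A-Za-z0-9]{2})([A-Za-z0-9]{2})(\d{2})([A-Z])$"
)


def parse_iso2k_id(identifier: str) -> Iso2kId:
    """Split an Iso2k identifier into its six components."""
    match = _PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a valid 11-character identifier: {identifier!r}")
    return Iso2kId(*match.groups())


# Hypothetical identifier, not a real record.
parsed = parse_iso2k_id("XX99ABCD01A")
print(parsed)
```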

Paleodata metadata
The paleodata metadata fields provide information for each proxy record; a detailed description of key paleodata metadata fields is provided in Table 2. Measured and derived water isotope time series are identified using the paleoData_variableType and paleoData_description fields and should not be confused with the isotope interpretation metadata fields (Sect. 4.3), which more broadly refer to the way each proxy record is interpreted (e.g., speleothem carbonate interpreted as a proxy for the δ 18 O of precipitation). The variable description (paleoData_description) is the general category of material that was measured for its isotopic ratio (e.g., carbonate or terrestrial biomarker). Further details are given by measurementMaterial, which is a more specific description of what was measured (e.g., coral, glacier ice, lake sediment), and measurementMaterialDetail, which provides further specificity of the measurementMaterial, such as mineral, species, or compound. In contrast, the inferredMaterial field indicates the environmental source waters whose isotope variability is inferred (e.g., precipitation, lake water, groundwater) (Fig. 1). The environmental source waters in the inferredMaterial field are not meant to be highly specific (e.g., intracellular leaf water) but are instead broad pools of environmental waters that have direct analogs or counterparts in climate models.

Table 1 (continued).

| Variable | Name of field in database | Description | QC level |
| --- | --- | --- | --- |
| Elevation | geo_elevation | Site elevation in meters relative to mean sea level (− below sea level, + above sea level). | 1 |
| Site name | geo_siteName | Name of the site and locality of nearest geopolitical center or municipality if applicable (i.e., islands retain their names). | 1 |
| Dataset ID | dataSetName | Iso2k-specific identifier assigned to all isotope records from a given site and publication. | 1 |
| Unique record ID | paleoData_iso2kUI | Unique Iso2k identifier assigned to each isotope record to distinguish among records when more than one record exists in the original publication. | 1 |
| LiPD ID | paleoData_TSid | Unique LiPD file identifier for each time series in the database. | 1 |
| LiPD link | lipdverseLink | Link to LiPDverse webpage. | 1 |
| Maximum year | maxYear | Maximum (most recent) year of each isotope record in calendar year (CE). See Table 8 for more chronology metadata. | auto |
| Minimum year | minYear | Minimum (earliest) date of each isotope record in calendar year (CE). See Table 8 for more chronology metadata. | auto |
| Publication DOI | pub1_doi | Digital object identifier for the first publication presenting the isotope record. | 1 |
| Publication citation | pub1_citation | Citation for the first publication presenting the isotope record. | 3 |
| Dataset DOI | datasetDOI | Digital object identifier for dataset assigned by original authors if available. | 3 |
| Dataset URL | paleoData_WDSPaleoUrl | URL linking back to records obtained from the NOAA NCEI data repository. | 3 |

Isotope interpretation metadata
The isotope interpretation metadata fields compile critical information about environmental variables that influence isotopic variability within each record (Table 3). These fields indicate the environmental variable thought to exert dominant control on isotopic variability of the inferred environmental source waters (inferredMaterial) of each record, the mathematical relationship between the isotope interpretation variable and the isotope record, and the season(s) during which this interpretation applies. All isotope interpretation fields in the database are prefaced by isotopeInterpretation. The isotopeInterpretation1_variable field lists the primary driver of isotopic variability in the environmental source waters according to the original publications, for example air temperature or relative humidity (Table 7). For records where multiple variables can explain some fraction of the variability, the isotopeInterpretation2 and isotopeInterpretation3 fields are also populated. The isotopeInterpretation1_direction is a field that gives the sign (positive or negative) of the relationship between the isotope measurements and the environmental variable.

Table 2 (continued).

| Variable | Name of field in database | Description | QC level |
| --- | --- | --- | --- |
| Inferred material | paleoData_inferredMaterial | Source water whose isotope variability is inferred (e.g., surface seawater, lake water, precipitation). See Table 7. | 1 |
| Inferred material group | paleoData_inferredMaterialGroup | Super-group of inferred material; see [...] were present when the archive formed. | |
| Variable type | paleoData_variableType | Indicates whether the isotope value was measured directly, temporally interpolated (e.g., from age tie points for annually banded archives), or inferred (e.g., seawater isotopic variability, inferred from paired δ 18 O and Sr/Ca or δ 18 O and Mg/Ca in marine sediments). This information is also incorporated into paleoData_description. | 3 |
The isotopeInterpretation1_variableGroup field is a simplified super-grouping of terms in the isotopeInterpretation1_variable field in order to facilitate comparisons across different archives and realms, with three options (temperature; isotopic composition of precipitation, i.e., "P_isotope"; or effective moisture). Controlled vocabularies for the metadata fields isotopeInterpretation1_variable and isotopeInterpretation1_variableGroup are standardized across all archive types (Table 7).
The isotope interpretation metadata fields reflect the isotope systematics of the environmental source waters and as such are distinct from the climatic inferences that one can make from a proxy record (Sect. 4.4). In some publications, this distinction is explicitly spelled out. For example, the cave drip water that becomes incorporated into the δ 18 O of speleothem carbonate in Borneo reflects the δ 18 O of water mixed throughout an aquifer system over many months, which ultimately reflects a smoothed version of precipitation δ 18 O (Moerman et al., 2014). In that case, the inferredMaterial is soil and groundwater and the isotopeInterpretation1_variable is δ 18 O precipitation ("P_isotope"). Separately, δ 18 O precipitation at that same study site reflects multiple hydroclimatic processes, such as moisture transport and precipitation amount, that lend it a regional imprint of the El Niño-Southern Oscillation (ENSO) (Moerman et al., 2013), and so the climate interpretation of speleothem δ 18 O is related to ENSO, which would be described separately in the climate interpretation fields (Sect. 4.4). In many publications, the isotope systematics of the environmental source waters and the climate interpretation are stated implicitly rather than explicitly (e.g., by stating that the δ 18 O of speleothem carbonate reflects monsoon intensity or by stating that it reflects local precipitation amount via the amount effect; Dansgaard, 1964). In these cases, the isotopeInterpretation1_variable is still "P_isotope" and information about the climatic interpretation is included in the climate interpretation fields. These distinctions are critical for facilitating comparisons with isotope-enabled climate models, where complex and nonstationary climate-isotope relationships can be examined directly.
For isotopeInterpretation1_seasonality, some proxy sensors and/or archives are interpreted to record a seasonally biased signal, whereas others may record climate at an annual or sub-annual resolution (e.g., corals, some speleothems, sclerosponges, mollusks, wood). If the record is interpreted to be biased towards a specific season, the calendar months corresponding to that season (generally given as the first letter of each month, unless clarification is necessary) are recorded in the metadata field (e.g., MAM, DJFM, Jan). If the record represents an approximately mean annual signal, "annual" is recorded in the seasonality field. For coral records, if the record has sub-annual resolution (e.g., sampled at monthly or bimonthly intervals), but the overall record is not biased to any particular season, "sub-annual" is recorded in the metadata field.
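Seasonality entries of the kinds listed above (month initials such as MAM or DJFM, a month abbreviation such as Jan, or "annual" and "sub-annual") can be translated into calendar month numbers. The helper below is a minimal sketch: its name is hypothetical, it treats "annual" and "sub-annual" as covering all twelve months, and it does not handle every free-text variant that may occur in the database.

```python
MONTH_INITIALS = "JFMAMJJASOND"
MONTH_ABBREVIATIONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def months_from_seasonality(value: str) -> list:
    """Return calendar month numbers (1-12) for a seasonality entry."""
    text = value.strip()
    # Assumption: both labels describe records with no seasonal bias.
    if text.lower() in ("annual", "sub-annual"):
        return list(range(1, 13))
    # A single month written as a three-letter abbreviation, e.g. "Jan".
    if text.title() in MONTH_ABBREVIATIONS:
        return [MONTH_ABBREVIATIONS.index(text.title()) + 1]
    # Consecutive month initials; the doubled string lets seasons wrap
    # around the end of the year (e.g. DJFM).
    if not 2 <= len(text) <= 12:
        raise ValueError(f"cannot interpret seasonality: {value!r}")
    start = (MONTH_INITIALS * 2).find(text.upper())
    if start == -1:
        raise ValueError(f"cannot interpret seasonality: {value!r}")
    return [(start + offset) % 12 + 1 for offset in range(len(text))]


for entry in ("MAM", "DJFM", "Jan", "annual"):
    print(entry, months_from_seasonality(entry))
```

Single-letter entries are rejected because an initial such as "J" is ambiguous, which is presumably why the text notes that clarification is sometimes necessary.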

Climate interpretation metadata
In contrast to the isotope interpretation (Table 3), climate interpretation metadata (Table 4) represent the original authors' expert judgment about the primary climatic controls on the isotope ratios at their study site. Climate interpretation metadata specify either climatic variables (e.g., temperature, precipitation amount) or processes (e.g., the Pacific Decadal Oscillation, Asian monsoon intensity) that the authors interpreted to influence the isotopic composition of the proxy record, and as such they are neither standardized nor quality-controlled. These metadata are included as useful background information but should not serve as a primary filter for users of the Iso2k database. A user might filter records based on the isotope interpretation field, then check the climate interpretation field for a qualitative understanding of which climatic processes may be important for the filtered set of records. For records where the isotopeInterpretation2 and isotopeInterpretation3 metadata are populated (Table 3), the corresponding climateInterpretation2 and climateInterpretation3 metadata may also be provided.
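The suggested workflow, filtering on the standardized isotope interpretation and only then reading the free-form climate interpretation, might look like the sketch below. The field names come from the text; the three records, their identifiers, and the exact spelling of the group labels are invented for illustration and do not reflect how the database files are actually loaded.

```python
from typing import Dict, List

# Hypothetical records; values are invented, field names follow the text.
records: List[Dict[str, str]] = [
    {"paleoData_iso2kUI": "record-1",
     "isotopeInterpretation1_variableGroup": "P_isotope",
     "climateInterpretation1_variable": "monsoon intensity"},
    {"paleoData_iso2kUI": "record-2",
     "isotopeInterpretation1_variableGroup": "temperature",
     "climateInterpretation1_variable": "temperature"},
    {"paleoData_iso2kUI": "record-3",
     "isotopeInterpretation1_variableGroup": "P_isotope"},
]


def filter_by_group(items: List[Dict[str, str]], group: str) -> List[Dict[str, str]]:
    """Keep records whose standardized interpretation group matches."""
    return [item for item in items
            if item.get("isotopeInterpretation1_variableGroup") == group]


# Step 1: primary filter on the standardized isotope interpretation.
selected = filter_by_group(records, "P_isotope")

# Step 2: read the non-standardized climate interpretation as background only.
for item in selected:
    climate = item.get("climateInterpretation1_variable", "not reported")
    print(item["paleoData_iso2kUI"], "->", climate)
```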

Queryable and standardized metadata
To make the database more user-friendly and queryable, some metadata fields contain logical flags (e.g., 0 or 1, true or false), cross-links (e.g., to a corresponding record ID in another PAGES 2k database), or geographic labels (e.g., continent or ocean basin) that allow for easy sorting (Table 6). For example, if a record was included in the PAGES 2k temperature database and reconstructions (Abram et al., 2016;Kaufman, 2014;PAGES 2k Consortium, 2017;Stenni et al., 2017;Tierney et al., 2015), that record is cross-linked to its associated PAGES 2k ID wherever possible, permitting easy database query and analysis of records in only one database and those common to both databases. Approximately 15 % of the records in the Iso2k database were also incorporated into other PAGES 2k compilations, with the most overlap occurring in coral records and high-latitude ice cores. For these records, the extensive metadata can be used to facilitate deeper analyses of the hydroclimatic signals contained in these mainly temperature-dominated isotopic records. For example, with coral δ 18 O records, many of which are included in both the PAGES 2k temperature and Iso2k databases, the isotope interpretation fields denote the relative influence of δ 18 O sw vs. temperature on the isotopic variability of the coral carbonate skeleton.

Chronological control data
Chronological or depth-age metadata provide essential information for isotope records across all archive types, including an age model and the average temporal resolution for each isotope record. For non-annually banded records, age-depth models and radiometric dating information (Table 8) are included where available to facilitate independent age modeling. This information is stored in "chronData" tables that are linked to the measured data ("paleoData") tables. If a record has raw chronology data in the database (e.g., radiometric age determinations), hasChron is set to 1; otherwise this parameter is 0. Similarly, if sample depth data are available (e.g., core depth), hasPaleoDepth is set to 1.

Table 3. Key isotope interpretation metadata. Bold text indicates Level 1 or required fields in the database.

| Variable | Name of field in database | Description | QC level |
| --- | --- | --- | --- |
| Primary isotope interpretation | isotopeInterpretation1_variable | Variable that controls isotopic variability within the record (e.g., "Temperature_air", "d18O seawater"). See Table 7. | 1 |
| Direction of relationship | isotopeInterpretation1_direction | Sign ("positive" or "negative") of the relationship between the isotope values and the isotope interpretation variable. For example, a record with a temperature interpretation may have a decrease in δ 18 O that corresponds to an increase in temperature. | 1 |
| Interpretation group | isotopeInterpretation1_variableGroup | Super-group of isotope interpretations (one of temperature, effective moisture, or precipitation isotope ratio). See Table 7. | 1 |
| Mathematical relation | isotopeInterpretation1_mathematicalRelation | Type of relationship between isotope and climate variable ("linear" or "nonlinear"). | 2 |
| Seasonality | isotopeInterpretation1_seasonality | The calendar months the isotope interpretation applies to, given as the first initial of the months or as "annual" or "sub-annual" where applicable (e.g., corals, speleothems). | 2 |
| Basis | isotopeInterpretation1_basis | Basis for the isotope interpretation of each record as stated in the original publication (text or citation may be given). | 2 |
| Coefficient | isotopeInterpretation1_coefficient | Numerical coefficient with interpretation variable. | 2 |
| Fraction | isotopeinterpretation1_fraction | Fraction of variance explained by given climate variable. | 2 |
To support the information implicit within each record's age-depth model, chronological metadata are provided for all individual age constraints (when available) and these metadata are summarized in Table 8. If available, sample information (thickness and labID) is provided for all age constraints. Each age constraint that is not in radiocarbon years has age in calendar years before 1950 CE and ageUncertainty. Radiocarbon age constraints have age14C in radiocarbon years before 1950 CE and age14Cuncertainty. The materialDated, reservoirAge14C, and reservoirAge14Cuncertainty are also provided for radiocarbon age constraints to allow users to derive their own age-depth models if desired. For radiocarbon ages, we also provide fractionModern, fractionModernUncertainty, delta13C (of the material that was radiocarbon dated), and delta13Cuncertainty when available.
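Because radiocarbon and non-radiocarbon age constraints use different field names, code that reads the chronology tables has to branch on which fields are present. The sketch below shows one way to do that, assuming each age constraint is available as a dictionary keyed by the field names above; the function names and example values are hypothetical. Radiocarbon ages are returned uncalibrated, since calibration needs a calibration curve and is outside the scope of this illustration.

```python
from typing import Dict, Optional, Tuple


def describe_age_constraint(constraint: Dict[str, float]) -> Tuple[str, float, Optional[float]]:
    """Return (kind, age before 1950 CE, uncertainty) for one age constraint."""
    if "age14C" in constraint:
        # Radiocarbon years before 1950 CE; not calibrated here.
        return ("radiocarbon", constraint["age14C"],
                constraint.get("age14Cuncertainty"))
    # Calendar years before 1950 CE.
    return ("calendar", constraint["age"], constraint.get("ageUncertainty"))


def calendar_bp_to_ce(age_bp: float) -> float:
    """Convert calendar years before 1950 CE to a calendar year (CE)."""
    return 1950.0 - age_bp


# Invented example values.
layer = {"age": 450.0, "ageUncertainty": 20.0}
radiocarbon = {"age14C": 1200.0, "age14Cuncertainty": 35.0}

print(describe_age_constraint(layer), calendar_bp_to_ce(layer["age"]))
print(describe_age_constraint(radiocarbon))
```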
Several lake and marine sediment archives contain measurements of radiogenic isotopes (210 Pb, 137 Cs, and/or 239+240 Pu) to constrain the age of the sediment at and near the surface or core top. Where applicable, we provide the isotope activity and the activityUncertainty. For 210 Pb measurements, the supportedActivity field is Y if the activity is supported by 210 Pb production in the surrounding matrix and N if the activity is not supported. The x210PbModel describes the type of model used to determine the age based on the radiogenic isotope measurements.

Table 4. Key climate interpretation metadata.

| Variable | Name of field in database | Description | QC level |
| --- | --- | --- | --- |
| Primary climate interpretation | climateInterpretation1_variable | Climate variables interpreted in each record (queryable freeform text with quotes from original publications; e.g., "salinity", "temperature"). | 2 |
| Primary climate interpretation detail | climateInterpretation1_variableDetail | Provides more information about the climate variable (e.g., sea surface for temperature or salinity). | 2 |
| Climate interpretation relationship direction | climateInterpretation1_direction | Sign ("positive" or "negative") of the relationship between the isotope ratios and climate variable. For example, a record with a temperature interpretation may have a decrease in δ 18 O that corresponds to an increase in temperature. | 2 |
| Climate interpretation basis | climateInterpretation1_basis | Basis for climate interpretation of each record as stated in the original publication. | 2 |
For carbonate systems such as speleothems and corals, U/Th dating is often used. Where available, chronological tables in the database contain information about the 238 U and 232 Th content (U238, Th232), the 230 Th/ 238 U activity ratio (Th230_U238activity), δ 234 U (d234U), and their uncertainties (U_Thactivity_error and d234U_error). Fields such as the initial 234 U/ 238 U (dU234intial) and 230 Th/ 232 Th activity ratios (Th230_Th232ratio) are also included for correcting ages for the initial 234 U/ 238 U activity and detrital thorium contamination, respectively.
The useInAgeModel is a binary field where Y indicates that age constraint was used in the published age model and N indicates that age constraint was not used in the published age model.
The amount and type of uncertainty in each chronology are provided in paleoData_chronologyIntegrationTimeUncertainty and paleoData_chronologyIntegrationTimeUncertaintyType, respectively, while paleoData_chronologyIntegrationTimeBasis outlines how the chronology was constructed. By contrast, the paleoData_sensorIntegrationTime, paleoData_sensorIntegrationTimeBasis, paleoData_sensorintegrationTimeUncertainty, paleoData_sensorIntegrationTimeUncertaintyType, and paleoData_sensorIntegrationTimeUnits fields, where available, describe the amount of time over which a sample integrates isotopic values.

Table 6 (continued).

| Variable | Name of field in database | Description | QC level |
| --- | --- | --- | --- |
| PAGES 2k region | geo_pages2kRegion | The continental (e.g., "SAm" for South America) or ocean (i.e., Ocean) regions corresponding to the PAGES 2k or Ocean2k temperature reconstructions for the records included in those data compilations. | 3 |
| Ocean region | geo_ocean | The ocean region (e.g., Pacific) corresponding to the record site. | 3 |
Key characteristics of Iso2k data records

Spatial, temporal, archival, and isotopic characteristics of data coverage
The Iso2k database contains 759 stable isotope (δ 18 O, δ 2 H, d-excess) records from 506 unique sites. There are 10 archive types, including 143 records from annually banded skeletal carbonate marine archives, i.e., corals (n = 137), sclerosponges (n = 4), and mollusks (n = 2); 210 from glacier ice (n = 206) and ground ice (n = 4); 158 from lake or terrestrial sediments; 99 from marine sediments; 68 from speleothems; and 81 from wood (Fig. 2a). A total of 87 % of the 759 stable isotope records in the database are δ 18 O, and 13 % are δ 2 H, with 12 sites (∼ 2 %) having records of both isotope systems (derived from the same sensor in ice cores or different sensors in lake sediments). In addition to the 759 stable isotope records, the database contains 255 records containing ancillary data (e.g., δ 13 C, Mg/Ca, Sr/Ca). Of the 759 records, 606 are considered "primary" δ 18 O, δ 2 H, d-excess time series (Fig. 2, Table S1 in the Supplement, and Sect. 2.4), including 101 records from annually banded skeletal carbonate marine archives, i.e., corals (n = 95), sclerosponges (n = 4), and mollusks (n = 2); 170 from glacier ice (n = 166) and ground ice (n = 4); 114 from lake or terrestrial sediments; 95 from marine sediments; 47 from speleothems; and 79 from wood. Spatial coverage of the sites in the database is global, but most sites are from low latitudes and Northern Hemisphere midlatitudes (Figs. 2a and 4b). Data availability is low for most of the Southern Hemisphere, with the exception of glacier ice records from Antarctica (Fig. 4b). The temporal coverage increases from about 250 proxy time series near the year 0 CE to more than 400 time series at the beginning of the 20th century (Fig. 2b). The average length and resolution of each δ 18 O time series vary considerably and are archive dependent.
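The record counts quoted above can be checked against the stated totals, and the share of primary time series per archive group follows directly. The short tally below uses only numbers from the text; the dictionary keys are shorthand labels chosen for this illustration.

```python
# Counts of all isotope records and of "primary" time series, as quoted in the text.
all_records = {
    "marine skeletal carbonate": 143,   # corals 137, sclerosponges 4, mollusks 2
    "glacier and ground ice": 210,      # glacier ice 206, ground ice 4
    "lake or terrestrial sediments": 158,
    "marine sediments": 99,
    "speleothems": 68,
    "wood": 81,
}
primary_records = {
    "marine skeletal carbonate": 101,   # corals 95, sclerosponges 4, mollusks 2
    "glacier and ground ice": 170,      # glacier ice 166, ground ice 4
    "lake or terrestrial sediments": 114,
    "marine sediments": 95,
    "speleothems": 47,
    "wood": 79,
}

total_all = sum(all_records.values())
total_primary = sum(primary_records.values())
print(total_all, total_primary)

# Share of records in each archive group that are primary time series.
for name, count in all_records.items():
    share = 100.0 * primary_records[name] / count
    print(f"{name}: {share:.0f} % primary")
```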
Banded, biologically derived archives (corals, sclerosponges, mollusks, and wood) offer the highest resolution (monthly to seasonal) and a temporal extent of 24 to 375 years for corals and 38 to 1030 years for tree records (time span is the 2.5 %-97.5 % quantiles). Layer-counted archives such as glacier ice generally offer annual resolution and a time span between 41 and 1979 years. Other archives have lower resolution but provide more continuous coverage across the CE. The median resolution of records is 12 years per sample for speleothems, 25 years per sample for lake sediments, 28 years per sample for marine sediments, and 97 years per sample for ground ice, and the median time span of records in these archives is > 1200 years. These lower-resolution time series almost exclusively make up the records in the database prior to ∼ 1700 CE, preventing the characteristic drop in coverage in older time periods observed in and described by other PAGES 2k compilations (PAGES 2k Consortium, 2013).

paleoData_measurementMaterial (Level 2 quality-controlled, not fully standardized): coral, mollusk, ostracod, gastropod, glacier ice, aquatic or terrestrial biomarkers (n-alkane, n-alkanoic acid, dinosterol, botryococcene), planktonic foraminifera, cellulose, carbonate, or bulk carbonate.

The records in the Iso2k database capture many aspects of hydroclimate (Fig. 4). The first-order interpretation (isotopeInterpretation1_variable) for 44 % of the δ 18 O and δ 2 H records in the database is "P_isotope", meaning that the δ 18 O and δ 2 H of the inferred material (ice, soil water, seawater, etc.) are primarily driven by the δ 18 O and δ 2 H of precipitation. The first-order interpretation for 26 % of the records in the database is "T_water" or "T_air", meaning that the temperature of water or air is the primary driver of δ 18 O and δ 2 H of the inferred material.
Finally, 24 % of records in the database are primarily driven by some aspect of evaporation or evapotranspiration, collectively referred to as "effective moisture" in the isotopeInterpretation1_variableGroup category. This category includes "d18O_seawater" (driven by ocean circulation and by precipitation/evaporation at the sea surface), "ET" (evapotranspiration), "I_E" (infiltration / evaporation), and "P_E" (precipitation / evaporation) entries for isotopeInterpretation1_variable.
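
The grouping of first-order interpretations described above can be illustrated with a minimal Python sketch. Only the membership of the "effective moisture" group is taken from the text; the other two group labels, the function name, and the toy input list are assumptions made for illustration and do not reproduce the database contents.

```python
from collections import Counter

# Map each first-order interpretation variable to a broader group.
# Only the "effective moisture" membership comes from the text; the labels
# "precipitation isotopes" and "temperature" are illustrative placeholders.
VARIABLE_GROUPS = {
    "P_isotope": "precipitation isotopes",
    "T_water": "temperature",
    "T_air": "temperature",
    "d18O_seawater": "effective moisture",
    "ET": "effective moisture",
    "I_E": "effective moisture",
    "P_E": "effective moisture",
}


def group_shares(variables):
    """Return the percentage of records that fall in each interpretation group."""
    groups = Counter(VARIABLE_GROUPS.get(v, "other") for v in variables)
    total = sum(groups.values())
    return {group: 100.0 * count / total for group, count in groups.items()}


# Toy input: a made-up list of first-order interpretations, not Iso2k data.
toy_variables = [
    "P_isotope", "P_isotope", "T_air", "T_water",
    "P_E", "ET", "I_E", "d18O_seawater",
]
shares = group_shares(toy_variables)
```

Applied to the isotopeInterpretation1_variable entries of the real database, a tally of this kind would be expected to approximate the percentages quoted above.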

Validation
There is currently no existing observational dataset of isotope ratios in all major pools of the water cycle that can serve as a true validation of the Iso2k database. However, the vast majority of ice records in the Iso2k database have an inferred material of "precipitation" and a first-order isotope interpretation of "P_isotope". For these records, the δ 18 O averaged for the 20th century (all data points after 1900 CE) provides a reasonable match with the observed annual average δ 18 O of precipitation from the Global Network of Isotopes in Precipitation (GNIP) (Terzer et al., 2013) (Fig. 5). This provides confidence that the isotopic data contained in the Iso2k database can reasonably be used for analyses such as the calculation of latitudinal gradients in δ 18 O over the CE, even before accounting for seasonal biases and other transformations within the proxy system. We note that while other proxy data types such as speleothems and leaf wax biomarkers are sensitive to P_isotope (and isotopeInterpretation1_variable for many of these records is listed as "P_isotope"; Fig. 4), their most direct inferred materials are meteoric waters such as soil water or groundwater rather than precipitation; further, water isotope values are fractionated by proxy sensors, such that they are not as directly comparable to the GNIP database.

Age uncertainty (ageUncertainty): 1 SD (standard deviation) uncertainty of calendar age.
Radiocarbon age (age14C): Age in radiocarbon years before 1950 CE.
Radiocarbon age uncertainty (age14Cuncertainty): 1 SD uncertainty of radiocarbon age in years.
Fraction modern 14 C activity (fractionModern): Fraction of modern radiocarbon activity.
Fraction modern 14 C activity uncertainty (fractionModernUncertainty): 1 SD uncertainty of fraction of modern radiocarbon activity.
δ 13 C uncertainty (delta13Cuncertainty): 1 SD uncertainty of δ 13 C of material analyzed for radiocarbon.

Figure caption (fragment): ... (Terzer et al., 2013). Antarctica is excluded from this map due to the scarcity of GNIP stations.
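
For the validation described in this section, the 20th-century average of a record is taken over all data points after 1900 CE. A minimal Python sketch of that averaging step follows; the function name and the toy record are hypothetical, and the numbers are invented rather than drawn from Iso2k or GNIP.

```python
from statistics import mean


def twentieth_century_mean(years_ce, values):
    """Average all data points dated after 1900 CE, skipping missing values."""
    selected = [
        value
        for year, value in zip(years_ce, values)
        if year > 1900 and value is not None
    ]
    if not selected:
        raise ValueError("record has no data points after 1900 CE")
    return mean(selected)


# Toy record with made-up values (per mil); not taken from the database.
toy_years = [1850, 1899, 1901, 1950, 1990]
toy_d18o = [-20.0, -21.0, -19.0, -18.0, -20.0]
record_mean = twentieth_century_mean(toy_years, toy_d18o)
```

The resulting value could then be compared with the observed annual average δ 18 O of precipitation at a nearby location, which is the comparison the text describes.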
6 Usage notes

General applications
The Iso2k database is the most comprehensive database of paleo-water isotope records to date for the CE. For the first time, this database allows for the investigation of spatial and temporal hydroclimate variability from regional to global scales across multiple proxy systems. Using the "inferred material" metadata, the database can be directly compared with the output of climate models, allowing investigation of the water cycle in far greater depth than was previously possible.
Alongside the data itself, the detailed "isotope interpretation" metadata fields are the foundation of this database. These fields allow users to understand the processes reflected in the isotope data and filter the database according to particular scientific questions. For example, a user may be interested in the temporal variability of isotope records driven primarily by changes in effective moisture, and the Iso2k standardized vocabulary means that it is straightforward to filter for these records. Note that, for many records in the database, isotopic variability is affected by more than one variable and that these secondary influences may not be trivial when conducting meta-analyses. Although only isotopeInterpretation1 fields have been quality-controlled to the highest level, the subsequent isotope interpretation fields also contain well-curated information that is important for data interpretation.

Example workflow for filtering and querying data records
Records in the Iso2k database are provided as published (i.e., not recalibrated or validated). This preserves the large amount of information contained within water isotope proxy measurements that would be lost if condensed to reconstruct discrete variables. Rather, we leave it to the database users to filter and assess records as needed.
The MATLAB and R serializations contain three variables: "D", "TS", and "sTS". The variable D includes site-level data for each dataset structured in the LiPD format. Datasets in D often contain multiple variables (e.g., stable isotope, ancillary, and chronological data), and represent how LiPD data appear when loaded into the initial environment. For most users, however, a "flattened" version of the database is more useful. We have provided this as the TS variable, where each entry contains an individual time series and its associated metadata. A slightly modified version of TS is included with R and MATLAB, called sTS, which is identical to TS except that the interpretation fields are split by scope ("isotope" or "climate") in order to simplify querying, which may be preferable for some users. The Python serialization contains only D and TS because tools to split by scope were unavailable.
Additional filtering of records should be performed using Level 1 or Level 2 fields; e.g.,
- isotopeInterpretation1_variable = P_isotope (includes only records where the first-order control of isotopic variability is the isotopic composition of precipitation);
- paleoData_description = "carbonate" or "terrestrial biomarker" or "tree ring cellulose" (to extract terrestrial archives sensitive to P_isotope aside from ice cores);
- paleoData_inferredMaterial = "groundwater" or "soil water" or "lake water" (accomplishes similar results to the above).
Additional filtering of records may be useful with other Level 2 fields; e.g.,
- climateInterpretation1_variable = contains "P " or "Precipitation_amount" or "P_amount" (to extract only records where authors' primary climatic interpretation was based on the amount effect).
The sample R, MATLAB, and Python codes provided with this dataset (in the Supplement) provide a similar example to users.
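
As a minimal, self-contained sketch of the kind of filtering described in this section (and not the code provided in the Supplement), the example below treats the flattened TS variable as a list of dictionaries keyed by metadata field name. That representation, the helper functions, the "name" key, and the toy entries are assumptions made for illustration; only the field names and vocabulary terms are taken from the text.

```python
def matches(entry, criteria):
    """Return True if every field in `criteria` holds one of its allowed values."""
    return all(entry.get(field) in allowed for field, allowed in criteria.items())


def filter_ts(ts, criteria):
    """Keep only the time series entries that satisfy all criteria."""
    return [entry for entry in ts if matches(entry, criteria)]


# Toy stand-in for the flattened TS variable: one dict per time series.
# The "name" key and all entries are hypothetical, not real Iso2k records.
toy_ts = [
    {
        "name": "toy_ice_core",
        "isotopeInterpretation1_variable": "P_isotope",
        "paleoData_inferredMaterial": "precipitation",
    },
    {
        "name": "toy_speleothem",
        "isotopeInterpretation1_variable": "P_isotope",
        "paleoData_inferredMaterial": "groundwater",
    },
    {
        "name": "toy_lake_sediment",
        "isotopeInterpretation1_variable": "P_E",
        "paleoData_inferredMaterial": "lake water",
    },
]

# Terrestrial records sensitive to P_isotope, aside from ice cores.
criteria = {
    "isotopeInterpretation1_variable": {"P_isotope"},
    "paleoData_inferredMaterial": {"groundwater", "soil water", "lake water"},
}
selected = filter_ts(toy_ts, criteria)
selected_names = [entry["name"] for entry in selected]
```

Expressing each criterion as a set of allowed values mirrors the "A or B or C" form of the examples above, and additional fields can be added to the criteria without changing the helper functions.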
6.3 Database updates, versioning scheme, and submission of new or updated datasets
This publication marks Version 1.0.0 of the Iso2k database. Following publication, the database will continue to evolve, as new datasets are added (both new studies and previous records that have been missed) and existing data or metadata are extended or, as necessary, corrected. Readers who know of missing datasets are asked to submit them directly through http://lipd.net/playground (last access: 30 July 2020). Database users who find errors in individual datasets can submit proposed edits using the "Edit LiPD file" function at http://lipdverse.org/iso2k/current_version/ (last access: 30 July 2020), or they can use the "Report an issue" option for errors that apply to multiple datasets. More detailed instructions for dataset submission and a link to a LiPD entry template hosted through http://lipd.net/playground (last access: 30 July 2020) will be added to the WDS-NOAA landing page (https://www.ncdc.noaa.gov/paleo/study/29593, last access: 30 July 2020) when they become available.
As the database updates, it will be versioned following the scheme used by other PAGES data collections (Kaufman et al., 2020; McKay and Kaufman, 2014; PAGES 2k Consortium, 2013), with the following format: X1.X2.X3, where X1, X2, and X3 are incrementing integers. When X1 increases, X2 and X3 reset to zero. When X2 increases, X3 resets to zero. X1 represents the number of publications describing the database. X2 increments each time the set of records in the database changes (addition or removal of a dataset). X3 increments when the data or metadata within the dataset change, but the set of records remains the same. Upon updates, extensions, or corrections to the database, rather than issuing errata to this publication, changes will be included in subsequent versions of the database and updated and described through the online data repository.
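
The increment and reset rules of the X1.X2.X3 scheme can be sketched in a few lines of Python. The function name and the three change labels ("publication", "records", "data") are hypothetical and chosen only for this illustration; they are not part of any Iso2k or LiPD tool.

```python
def bump_version(version, change):
    """Return the next X1.X2.X3 version string for a given kind of change.

    change = "publication": a new publication describes the database (X1).
    change = "records":     a dataset was added or removed (X2).
    change = "data":        data or metadata changed, same set of records (X3).
    """
    x1, x2, x3 = (int(part) for part in version.split("."))
    if change == "publication":
        return f"{x1 + 1}.0.0"      # X2 and X3 reset to zero
    if change == "records":
        return f"{x1}.{x2 + 1}.0"   # X3 resets to zero
    if change == "data":
        return f"{x1}.{x2}.{x3 + 1}"
    raise ValueError(f"unknown change type: {change!r}")


# Starting from the version marked by this publication.
after_new_record = bump_version("1.0.0", "records")
after_metadata_fix = bump_version(after_new_record, "data")
```

Under these rules, adding a dataset to Version 1.0.0 gives 1.1.0, and a subsequent metadata correction gives 1.1.1.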

Citation
This Iso2k data descriptor should be cited when the database is used in whole or in part, including its metadata fields, for subsequent studies. We encourage users of the database to not only cite the Iso2k data product but also the original publications of the underlying primary data (Tables S2 and S3). Citation of both the Iso2k data product and the underlying studies is particularly encouraged when analyses make explicit use of individual records or small subsets of records, even though citation of > 400 original studies may not be practical if the entire Iso2k database is used.

Code and data availability
Following the previous PAGES 2k and the Temperature 12k data compilations (Kaufman et al., 2020; PAGES 2k Consortium, 2017), the Iso2k database employs the Linked Paleo Data (LiPD) format (McKay and Emile-Geay, 2016), with serializations available for R, MATLAB, and Python. The LiPD format is machine-readable, with code bases to facilitate input, output, visualization, and data manipulation in R, Python, and MATLAB. Simple visualization and data access (both as LiPD and csv files) is available through the LiPDverse at http://lipdverse.org/iso2k/current_version/ (last access: 30 July 2020) (iso2k1_0_0, 2020). The LiPDverse additionally houses other paleoclimate records and compilations that may be of interest to users of the Iso2k database. The serializations contain all LiPD files included in the current version of the Iso2k database. Serializations of the database can be downloaded from https://doi.org/10.25921/57j8-vs18 (Konecky and McKay, 2020) and from the WDS-NOAA Paleo Data landing page: https://www.ncdc.noaa.gov/paleo/study/29593 (last access: 30 July 2020) (NOAA, 2020). We recommend accessing the database through the WDS-NOAA landing page in order to find up-to-date instructions on using the database.

Conclusions and anticipated applications of the Iso2k database
The global extent, quantity, and quality of metadata included in the Iso2k database allow examination of the multiple variables that impact water isotopes, including moisture source and transport history, temperature, and precipitation amount. These multivariate controls mean that water isotopes contain a wealth of information about climate. Importantly, water isotope signals contained in proxy archives can be modified by local environmental processes, such as evaporation, biosynthetic fractionation, bioturbation in sediments, or diffusion. These archive- or proxy-specific transformations therefore additionally allow for reconstruction of water balance (P − E), different forms of drought (e.g., meteorological, hydrological, or soil moisture), and relative humidity (Rach et al., 2017). It is difficult to tease apart the effects of multiple variables in a single proxy record, but this global compilation of water isotope proxy records from a range of archives will help to overcome this barrier, facilitating extraction of common signals from the noise of individual proxies and providing insights into different aspects of the hydrological cycle at a range of spatial and temporal scales.
The Iso2k database also provides an unprecedented direct comparison for state-of-the-art water-isotope-enabled climate models. Many data-model comparison efforts compare climate model variables such as temperature and precipitation to paleoclimate data; the latter is often a complex and nonlinear signal integration of multiple climate influences, and uncertainties arise from the assumptions that must be made (Dee et al., 2016; Evans et al., 2013). Comparing water isotope fields from climate model outputs to isotope proxy records of the same components of the water cycle circumvents these uncertainties, providing a more direct comparison of proxies and model simulations in the same units.
Model validation on this relatively level playing field will improve estimates of climate models' ability to simulate changes in hydroclimate on long timescales. For those archives that further filter the isotopic signal, proxy system models can aid data-model comparison (Dee et al., 2015; Jones and Dee, 2018). Therefore, the Iso2k database will not only enable global-scale comparisons with isotope-enabled climate models but may also serve as an input database for paleoclimate data assimilation reconstructions such as the Last Millennium Reanalysis (Steiger et al., 2014) and the Paleo Hydrodynamics Data Assimilation (Steiger et al., 2018).