The SISAL database: a global resource to document oxygen and carbon isotope records from speleothems

Stable isotope records from speleothems provide information on past climate changes, in particular information that can be used to reconstruct past changes in precipitation and atmospheric circulation. These records are increasingly being used to provide “out-of-sample” evaluations of isotope-enabled climate models. SISAL (Speleothem Isotope Synthesis and Analysis) is an international working group of the Past Global Changes (PAGES) project. The working group aims to provide a comprehensive compilation of speleothem isotope records for climate reconstruction and model evaluation. The SISAL database contains data for individual speleothems, grouped by cave system. Stable oxygen and carbon isotope (δ¹⁸O, δ¹³C) measurements are referenced by distance from the top or bottom of the speleothem. Additional tables provide information on dating, including information on the dates used to construct the original age model and sufficient information to assess the quality of each data set and to construct a standardized chronology across different speleothems. The metadata table provides location information, information on the full range of measurements carried out on each speleothem and information on the cave system that is relevant to the interpretation of the records, as well as citations for both publications and archived data. The compiled data are available at https://doi.org/10.17864/1947.147.


Introduction
Speleothems are inorganic carbonate deposits (mostly calcite and aragonite) that grow in caves and form from drip water supersaturated with respect to CaCO₃. Speleothems are highly suitable for radiometric dating using uranium-series disequilibrium techniques. Since they form through continuous accretion, speleothems can provide a highly resolved record of environmental conditions, generally with a temporal resolution ranging from seasonal scale to 100 years, depending on sampling resolution.
Speleothem records are one of the types of record widely used to reconstruct past climate changes (see Bradley, 2015 for an overview of other methods). Speleothem growth is, in itself, an indicator of precipitation availability (Ayliffe et al., 1998; Wang et al., 2004), and variations in annual growth increments have been interpreted as an index of precipitation amount (Fleitmann et al., 2004; Polyak and Asmerom, 2001; Trouet et al., 2009). Many different types of measurements have been made on speleothems, but the most common are the stable isotopes of oxygen and carbon (δ¹⁸O, δ¹³C). Although the interpretation of such records can be complicated, for samples that are deposited close to equilibrium, changes in δ¹⁸O are primarily a signal of changes in precipitation amount and source, precipitation temperature, and cave temperature (Affek et al., 2014; Hu et al., 2008; McDermott, 2004; Wang et al., 2008) and have been widely used to reconstruct changing atmospheric circulation patterns (e.g. Bar-Matthews et al., 1999; Cai et al., 2012, 2015; Luetscher et al., 2015; Spötl and Mangini, 2002; Trouet et al., 2009). Changes in δ¹³C are a more indirect signal of precipitation changes. If not affected by non-equilibrium deposition (Baker et al., 1997), δ¹³C can reflect the changing abundance of C₃ and C₄ plants above the cave (Baldini et al., 2008; Dorale et al., 1998) or the availability of soil CO₂ during the dissolution of limestone (Genty et al., 2003; Hendy, 1971; Salomons and Mook, 1986). Speleothem records are widely distributed geographically, and this makes them an ideal type of archive for regional climate reconstructions.
An increasing number of climate models explicitly simulate water isotopes as a tool for characterizing and diagnosing the atmospheric hydrological cycle (Schmidt et al., 2007; Steen-Larsen et al., 2017; Sturm et al., 2010; Werner et al., 2011; Haese et al., 2013). Such models are evaluated against modern observations of the isotopic composition of rainwater (see for example Yoshimura et al., 2008; Steen-Larsen et al., 2017). However, evaluations against palaeo-records such as the δ¹⁸O records from speleothems can be used to provide an "out-of-sample" test (Schmidt et al., 2014) of these models. Thus, in addition to their use for climate reconstruction, speleothem records are a useful addition to the tools that are used for climate-model evaluation.
More than 500 speleothem data sets have been published to date, 70 % of which have been published in the decade since 2007. There have been some attempts to provide syntheses of speleothem data, particularly in the context of providing climate reconstructions or data sets for model evaluation (e.g. Bolliet et al., 2016; Caley et al., 2014; Harrison et al., 2014; Shah et al., 2013). However, these compilations generally lack sufficient information to allow careful screening of the records to ensure the reliability of the climate interpretation or the quality of the dating of the record. Furthermore, none of them provide comprehensive coverage of the globe.
Table 1. Information on speleothem records (entities) in the SISAL_v1 database. Elevation (Elv) is given in metres above sea level and latitude (Lat) and longitude (Long) in decimal degrees. For convenience, we have given the location of each record, although this information is not available in the database itself. Note that the latitude and longitude of Abaliget, Brown's cave and Uamh an Tartair are given correctly here but are incorrect in the SISAL_v1 database.
SISAL (Speleothem Isotope Synthesis and Analysis) is an international working group set up in 2017 under the auspices of the Past Global Changes (PAGES) programme (http://pastglobalchanges.org/ini/wg/sisal, last access: 4 September 2018). The aim of the working group is to compile the many hundreds of speleothem isotopic records worldwide, paying due attention to careful screening and metadata documentation, the construction of standardized age models, and age-model uncertainties, in order to produce a public-access database that can be used for palaeoclimate reconstruction and for climate-model evaluation. In this paper, we document the first publicly available version of the SISAL database, focusing on describing its structure and contents including the information that has been included to facilitate quality control.

Compilation of data
The database contains stable carbon and oxygen isotope measurements made on speleothems, as well as supporting metadata to facilitate the interpretation of these records. All available speleothem data are included, and no attempt was made to screen records on the basis of the time period covered, the resolution of the records, or the quality of the data or age models. Adequate metadata are provided to allow database users to select the records that are suitable for a particular type of analysis. The raw data were either provided by members of the SISAL working group or extracted from data lodged in PANGAEA or from the National Centres for Environmental Information. Additional information on the records was compiled from publications. All the records in the current version of the database (SISAL_v1) are listed and described in Table 1.

Structure of the database
The data are stored in a relational database (MySQL), which consists of 14 linked tables, specifically site, entity, sample, dating, dating lamina, gap, hiatus, original chronology, δ¹³C, δ¹⁸O, entity link reference, reference, composite link entity and notes. Figure S1 shows the relationships between these tables. A detailed description of the structure and content of each of the tables is given below. The details of the predefined lists for all fields can be found in Table S1.
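To illustrate how linked tables of this kind are typically queried, the following minimal sketch builds a heavily simplified stand-in for four of the tables (site, entity, sample and d18O) in an in-memory SQLite database and joins them. It is not the SISAL schema: apart from the table names and sample_id, the column names are assumptions chosen for illustration, and the inserted rows are invented values, not real measurements.

```python
import sqlite3

# Hypothetical, simplified schema: only 4 of the 14 tables, with assumed
# column names. SQLite is used here only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site   (site_id INTEGER PRIMARY KEY, site_name TEXT);
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY,
                     site_id INTEGER REFERENCES site(site_id),
                     entity_name TEXT);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY,
                     entity_id INTEGER REFERENCES entity(entity_id),
                     depth_sample REAL);
CREATE TABLE d18O   (sample_id INTEGER REFERENCES sample(sample_id),
                     d18O_measurement REAL);
""")

# Invented example rows (not taken from the database).
conn.execute("INSERT INTO site VALUES (1, 'Example Cave')")
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)",
                 [(10, 1, "EX-1"), (11, 1, "EX-2")])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(100, 10, 0.5), (101, 10, 1.0), (102, 11, 0.5)])
conn.executemany("INSERT INTO d18O VALUES (?, ?)",
                 [(100, -7.1), (101, -7.4), (102, -6.9)])


def d18o_for_site(connection, site_name):
    """Return (entity name, sample depth, d18O) rows for one cave site."""
    query = """
        SELECT e.entity_name, s.depth_sample, o.d18O_measurement
        FROM site AS st
        JOIN entity AS e ON e.site_id = st.site_id
        JOIN sample AS s ON s.entity_id = e.entity_id
        JOIN d18O   AS o ON o.sample_id = s.sample_id
        WHERE st.site_name = ?
        ORDER BY e.entity_name, s.depth_sample
    """
    return connection.execute(query, (site_name,)).fetchall()


rows = d18o_for_site(conn, "Example Cave")
```

The same pattern (site to entity to sample to measurement) would apply to the δ¹³C table; the actual field names should be taken from Tables S2 to S15.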

Site metadata (table name: site)
A site is defined as the cave or cave system from which speleothem records have been obtained. A site may therefore be linked to several speleothem records, where each record is treated as a separate entity. The site table contains basic metadata about the cave or cave system, including site ID, site name, latitude, longitude, elevation, geology, rock age and monitoring (see Table S2). The elevation is that of the cave itself, not the elevation of the land surface above the cave. Since the elevation of the land surface can be obtained from other sources, we include the cave elevation to facilitate making additional lapse rate corrections for oxygen isotopes for high elevation sites (Bowen and Wilkinson, 2002). This also allows an estimation of the depth of the overburden above the speleothem site, and hence an estimate of the time taken for water to reach the cave. The description of the geology and the age of the rock formation (rock_age) is given because this is important for understanding the degree of permeability of the material above the cave. Primary porosity decreases and fracture flow increases as rocks age, which in turn affects the likely speed at which water flows through the host rock and reaches the cave system. The geology field is also useful because it gives an indication of whether the cave is formed in Mg-rich rocks (e.g. dolomite) and thus whether the speleothems are likely to be formed of aragonite, which would require special consideration in terms of oxygen and carbon isotope comparisons with that of calcite (see also Table S3). Only a limited number of descriptive terms are allowed for each field. The age of rock formation follows the standard era, period, epoch terminology as defined by the International Commission on Stratigraphy in 2015 (Cohen et al., 2015).
The database includes information on whether the cave site has been monitored: positive returns in this field mean that monitoring of in-cave environmental parameters (e.g. cave air temperature) and/or cave drip chemistry has been carried out periodically for at least one entire season (as opposed to one-off measurements of in-cave conditions when the speleothem was collected). The database does not contain monitoring data, but inclusion of this field makes it easier for researchers to contact the original data providers about monitoring information, which can be useful in understanding if a cave is likely to contain speleothems that have been deposited close to isotopic equilibrium.

Entity metadata (table name: entity)
Each speleothem (or composite speleothem record) has a unique identifier and a unique name. The entity metadata table (Table S3) provides information on the cave environment that can affect speleothem formation. This includes the thickness of the cover above the speleothem, which might affect the time taken for water to reach the drip site feeding the speleothem and hence the responsiveness of the record to individual rain events or seasonal patterns of precipitation. The distance of the speleothem from the cave entrance is provided, which, depending on the morphology of a cave, can be a useful indicator of cave ventilation (direct air advection). Ventilation is important as it can control cave air temperature, humidity, evaporation and pCO₂ levels (Frisia et al., 2011; Spötl et al., 2005; Tremaine et al., 2011). The entity table also contains a field to document whether any tests have been carried out to establish whether there is oxygen and carbon isotope quasi-equilibrium between the drip water (CO₂-H₂O system) and the speleothem (CaCO₃). There are several such tests (see for example Hendy, 1971; Johnston et al., 2013; Mickler et al., 2006; Tremaine et al., 2011), but no attempt is made to identify which test has been applied in the database. The drip type (e.g. seepage flow, seasonal drip, vadose flow; Smart and Friederich, 1987) also provides useful hydrological information: seepage flow shows a small inter-annual variability of discharge and the speleothem record will therefore more likely reflect a long-term average state over several years; other drip types, such as seasonal drip, will indicate the potential to record seasonal or individual rainfall events.
The main focus of the SISAL database is stable isotope measurements, but the entity metadata table also contains information on the kinds of measurements that have been made for a specific speleothem. Only the stable isotope measurements are currently archived in the database. However, listing the range of data available from any speleothem will facilitate future updates of the database to include other types of measurements apart from stable isotopes (e.g. trace elements) and will help researchers to locate speleothems on which multiple types of measurements have been made. The entity metadata table contains four fields to facilitate data traceability. The first two fields give the name of the person who was responsible for collating the data, and a DOI or URL for the original data. Some records have been partially or entirely updated since first being published. Although these records are included for data traceability, the entity_status field indicates whether they have been partially updated (e.g. with additional samples and/or improved age models) or completely superseded by a new record. The field corresponding_current indicates which entity (or entities) provides updated information. Information on original publications on specific speleothems is given in the reference table (see Sect. 2.2.11).

Sample metadata (table name: sample)
The sample metadata table (Table S4) contains information on the location of the sample with respect to a reference point, where the reference point can be either the top or the base of the speleothem. It also provides information on the thickness and mineralogy (calcite, secondary calcite, aragonite, vaterite, mixed, not known) of each sample. Since some samples may have mixed mineralogy, it also provides information on whether a correction for aragonite has been applied to δ¹⁸O or δ¹³C, due to different phosphoric acid fractionation factors.

Dating information (table name: dating)
The dating information table (Table S5) provides information on the radiometric dates used to construct the original age model for each of the speleothem entities, including type of radiometric date (e.g. U series), depth of dated sample, thickness of dated sample and sample weight. Dates that are used to anchor sequences that are dated by lamina counting (see Sect. 2.2.5) are included in the dating information table and identified in date type as an event (i.e. the start or end of a laminated sequence). The degree of precision varies between different dating methods and techniques; for example, mass spectrometric U/Th dating generally produces a more precise age than the alpha spectrometry U/Th dating method. The inclusion of the dating method therefore provides a basic measure of the reliability, in terms of analytical precision, of any given date. Sample thickness also influences the dating uncertainty, because thicker samples will integrate more material of different age. Similarly, sample weight can influence precision: samples younger than a few thousand years may contain very low levels of the daughter isotope ²³⁰Th (whose accumulation by radioactive decay provides the measure of the sample's age), and so require more material to provide an accurate and precise result. The content of ²³²Th is included in the dating information table because this value is used for the detrital correction of initial ²³⁰Th. Sample mineralogy is also included because this affects the reliability of individual dates (e.g. samples from re-crystallized secondary calcite are not reliable because of the loss of uranium; Bajo et al., 2016).
We provide both the original uncorrected age and the corrected age for each date. The corrected U/Th age is adjusted for detrital contamination; the corrected calibrated ¹⁴C age is adjusted for dead carbon. The correction factors used to derive the corrected U/Th or ¹⁴C age are included in the dating information table. The decay constant used to calculate the U/Th ages is given because the values used have changed through time (Cheng et al., 2000; Edwards et al., 1987; Ivanovich and Harmon, 1992). The calibration curve used to convert radiocarbon ages to calendar years in the original publication is also given. Several different standards have been used in the original publications for the modern reference state (e.g. BP(1950), b2k, CE/BCE or the year when U/Th chemical separation was performed) but all of these have been converted to BP(1950) in the database.
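The conversion between reference states is simple arithmetic, and the sketch below shows one way it could be written. It is an illustration under stated assumptions, not the procedure used to build the database: the function name and the reference labels are hypothetical, and the BCE branch assumes the historical calendar convention with no year zero.

```python
def to_bp1950(value, reference, chemistry_year=None):
    """Convert an age or calendar year to years before 1950 CE, BP(1950).

    `reference` is a hypothetical label for how `value` was reported:
      "BP(1950)"          age in years before 1950 CE (returned unchanged)
      "b2k"               age in years before 2000 CE
      "CE" / "BCE"        calendar year
      "year_of_chemistry" age in years before `chemistry_year`
    """
    if reference == "BP(1950)":
        return value
    if reference == "b2k":
        # 2000 CE lies 50 years after 1950 CE, so b2k ages are 50 years larger.
        return value - 50
    if reference == "CE":
        return 1950 - value
    if reference == "BCE":
        # Assumption: no year zero, so 1 BCE immediately precedes 1 CE.
        return 1950 + value - 1
    if reference == "year_of_chemistry":
        if chemistry_year is None:
            raise ValueError("chemistry_year is required for this reference")
        return value - (chemistry_year - 1950)
    raise ValueError(f"unknown reference state: {reference!r}")
```

For example, a U/Th age of 1000 years measured relative to chemistry performed in 2010 corresponds to 940 years BP(1950), and the calendar year 1000 CE corresponds to 950 years BP(1950).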
Some of the dates listed for a given entity were not used in the original age model, for example because the dating sample was contaminated with organic material or because of age inversions. The dates excluded from the original age model are flagged in the database (date_used) but the other information on these dates is nevertheless included in the dating information table to ensure transparency.
The geochemical characteristics of the sample provide information that is required to assess the quality or reliability of these dates. The ratio of ²³⁰Th/²³²Th, for example, is a measure of detrital thorium concentration in the sample and thus provides an initial quality control on each date. A ²³⁰Th/²³²Th activity ratio > 300 is considered a good indicator of a reliable date (Hellstrom, 2006); a higher ratio indicates a cleaner sample with higher accuracy. The thorium corrected errors are also included to provide an indication of the magnitude of the correction related to detrital thorium contamination.

Lamina dating information (table name: dating_lamina)
Variations in the drip-water geochemistry and/or quantity or cave conditions may occur at regular intervals, forming laminae of a range of thicknesses usually linked to surface seasonal climate variations. A high-resolution chronology can be established for such records by lamina counting, provided an absolute date is available for either the start or the end of the laminated sequence (e.g. because U/Th dates have been obtained or because the stalagmite was actively growing when collected). The identification of individual laminae can be difficult if they are very thin or of varying width, so best practice is to provide an estimate of the counting uncertainty that propagates from the absolute anchor dates. The lamina dating information table (Table S6) provides the age of each lamina in the sequence and the uncertainty on this dating; the absolute dates used as anchor points are given in the dating information table and identified in the date type field there as an event (see Sect. 2.2.4).
It should be noted that laminae can be formed on a variety of timescales, depending on how frequently the thresholds for the formation of specific fabrics and mineralogies are crossed. Annual laminations are more likely in regions where there is a clear seasonality in climate or cave environment. In other regions, the lamination may be a result of lower- or higher-frequency variations in, for example, hydrologically effective precipitation (e.g. infiltrated waters) or soil CO₂ production. It is imperative to demonstrate that the laminations are annual (see Table S9) before using lamina counting for dating.

Hiatus place mark information (table name: hiatus)
A prolonged cessation of speleothem growth can occur under unfavourable environmental conditions leading to, for example, undersaturation of drip water or cessation of dripping. Growth hiatuses can often be recognized from structural or mineralogical features, or inferred based on absolute dating. Growth hiatuses have to be taken into account in the construction of age models and thus the hiatus place mark table (Table S7) provides information on the location of such features. The hiatus is referenced to the specific depth at which it occurs, and this depth is considered as an imaginary sample that then appears with a specific sample_id in the sample table. There are some cases in which the hiatus depth was not recorded; in these cases the depth was specified as the imaginary mid-point depth between bracketing samples.

Gap place mark information (table name: gap)
When a composite record is created based on more than one individual speleothem from the same cave system, there may be discontinuities in the overlapping time of the individual records. These gaps are not growth hiatuses, but must be identified to facilitate plotting of the records. The gap place mark information table (Table S8) provides information on the location of sample gaps. The gap is referenced to the specific depth at which it occurs, and this depth is considered as an imaginary sample which then appears with a specific sample_id in the sample table. In composite records where sample depths are not given, the location of a gap can be derived from the sample ordering and the absence of isotopic information for a given sample. In fact, this table is empty in version 1 of the database.

Original chronology (table name: original_chronology)
The original chronology table (Table S9) provides an estimate of the age and age uncertainty, according to the original published age model, for each sample on which stable isotope measurements have been made. The table also provides information on the type of age model used in the original publication (e.g. linear interpolation between dates, polynomial fit, Bayesian, StalAge (Scholz and Hoffmann, 2011), COPRA (Breitenbach et al., 2012), OxCal (Bronk Ramsey, 2001)). The fields ann_lam_check and dep_rate_check are included for quality assurance purposes, since they indicate whether the assumption that laminae are truly annual has been explicitly tested.
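Of the age-model types listed above, linear interpolation between dates is the simplest to illustrate. The sketch below assigns an age to each isotope sample depth from a set of dated depths. It is a generic illustration, not the code used in any of the original publications; the function name is hypothetical, depths are assumed to be strictly increasing, and samples outside the dated interval are returned as None rather than extrapolated.

```python
from bisect import bisect_right


def linear_age_model(date_depths, date_ages, sample_depths):
    """Interpolate ages linearly between dated depths.

    date_depths must be strictly increasing; date_ages gives the age at
    each dated depth. Depths outside the dated interval get None.
    """
    ages = []
    for depth in sample_depths:
        if not date_depths[0] <= depth <= date_depths[-1]:
            ages.append(None)  # no extrapolation beyond the dated interval
            continue
        # Index of the dated depth that closes the bracketing interval.
        upper = min(bisect_right(date_depths, depth), len(date_depths) - 1)
        d0, d1 = date_depths[upper - 1], date_depths[upper]
        a0, a1 = date_ages[upper - 1], date_ages[upper]
        ages.append(a0 + (a1 - a0) * (depth - d0) / (d1 - d0))
    return ages


# Invented example: three dates at 0, 10 and 30 mm from the top.
example_ages = linear_age_model([0, 10, 30], [100, 600, 2600], [5, 10, 20, 35])
```

This sketch ignores dating uncertainties and hiatuses, both of which a real age model has to take into account.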

Carbon isotope data (table name: d13C)
The carbon isotope data table (Table S10) contains the carbon isotope measurements. It also provides information on the laboratory precision of each measurement and the standard (PDB or Vienna-PDB) used as a reference.

Oxygen isotope data (table name: d18O)
The oxygen isotope data table (Table S11) contains the oxygen isotope measurements. It also provides information on the laboratory precision of each measurement and the standard (PDB or Vienna-PDB) used as a reference.

Publication information (table name: reference)
This table (Table S12) provides full bibliographic citations for the original references documenting the speleothems, their isotopic records and/or their age models. References on monitoring of the cave may also be provided. There may be multiple publications for a single speleothem record, and all of these references are listed. For convenience, there is also a table (Table S13) that links the publications to the specific entity.

Link composite and entity information (table name: composite_link_entity)
Multiple speleothem records showing a temporal overlap (and a similar signal) can be combined to create a composite record of changes through time. The composite record is treated as a distinct entity in the database. The link composite and entity information table (Table S14) is provided in order to link this composite record to the individual speleothem records from which it was derived. Thus any single composite entity (composite_entity_id) is linked to multiple single entities (single_entity_id).

Notes (table name: notes)
The notes table (Table S15) is provided in order to record additional information regarding the site which cannot be recorded in the fields of the table; this may also include entity specific information.

Quality control
Individual records in the SISAL database were compiled either by the original authors or from published and open-access material by specialists in the collection and interpretation of speleothem records. In this latter case, the data compilers made every attempt to contact original authors to check that the compiled data were correct. The name of the person who compiled the data is included in the database (entity table, contact) so that they can be consulted in the future about queries or corrections. Individual records for the database were subsequently checked by a small number of regional coordinators, to ensure that records were being entered in a consistent way. Prior to entry in the database, the records were automatically checked using specially designed database scripts (in Python) to ensure that the entries to individual fields were in the format expected (e.g. text, decimal numeric, positive integers) or were selected from the predefined lists provided for specific fields. In defining both the formats and the predefined lists, the SISAL working group has taken special care to ensure that the entries are unambiguous. Null values for metadata fields were identified during the checking procedure, and checks were made with the data contributors whenever possible to ensure that null fields genuinely corresponded to missing information.
The database contains information designed to allow an assessment of the quality of an individual record. Thus, the entity metadata table contains information on, for example, the distance of the speleothem from the cave entrance in order to allow the user to assess whether cave temperatures are driven by advection of air or conduction through the bedrock. There are several other factors that can affect ventilation, for example the contrast between the cave and external climates, and cave morphology such as the size of the entrance or the number of entrances. Information on these factors is only rarely given in publications; we assume that this information would be more likely to be available if the original authors thought that ventilation was a significant influence on the speleothem record. Information on distance from the cave entrance is therefore regarded as a minimal indicator of record quality. Other fields that are included to allow the user to select appropriate records include geology, rock age, speleothem type and drip type.
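The automated format checks described above can be pictured with a short sketch. This is not the SISAL checking script, which is not reproduced in this paper; it is a hypothetical illustration for a single sample record. The mineralogy list is the one given for the sample table, while the function names and the field name depth_sample are assumptions.

```python
# Predefined list for sample mineralogy, as given for the sample table.
MINERALOGY = {"calcite", "secondary calcite", "aragonite",
              "vaterite", "mixed", "not known"}


def is_positive_int(value):
    """True for integers > 0 (booleans are rejected explicitly)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def is_decimal(value):
    """True for int or float values (booleans are rejected explicitly)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_sample_record(record):
    """Return a list of format problems found in one sample record."""
    problems = []
    if not is_positive_int(record.get("sample_id")):
        problems.append("sample_id must be a positive integer")
    depth = record.get("depth_sample")  # hypothetical field name
    if not (is_decimal(depth) and depth >= 0):
        problems.append("depth_sample must be a non-negative decimal number")
    if record.get("mineralogy") not in MINERALOGY:
        problems.append("mineralogy must come from the predefined list")
    return problems


good = check_sample_record(
    {"sample_id": 1, "depth_sample": 12.5, "mineralogy": "calcite"})
bad = check_sample_record(
    {"sample_id": -3, "depth_sample": "12,5", "mineralogy": "Calcite"})
```

Returning a list of problems, rather than stopping at the first one, lets a compiler fix every issue in a record in a single pass.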
The database also contains information to allow an assessment of the reliability of the dates used in constructing the original age model. The most important of these fields are those with information on the sample geochemistry (see Sect. 2.2.4), which allows the user to determine whether the samples were sufficiently large and sufficiently pure to yield good U/Th dates. The database also gives information on sample weight, which also addresses this issue. The information on the corrections employed, dating uncertainties and whether the original authors considered the date reliable (and therefore used it in constructing an age model) also provide insights into the reliability of individual chronologies.
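As an illustration of how these fields might be used together, the sketch below keeps only dates that were used in the original age model and whose ²³⁰Th/²³²Th activity ratio exceeds the threshold of 300 mentioned in Sect. 2.2.4. The dictionary keys and the boolean coding of date_used are assumptions made for this example; the actual field names and allowed values are listed in Table S5.

```python
def screen_dates(dates, min_ratio=300.0):
    """Keep dates used in the original age model with a clean thorium ratio.

    Each date is a dict with the hypothetical keys:
      "date_used"         True if the date was used in the original age model
      "th230_th232_ratio" 230Th/232Th activity ratio, or None if not reported
    Dates without a reported ratio are dropped, because they cannot be
    assessed against the threshold.
    """
    kept = []
    for date in dates:
        ratio = date.get("th230_th232_ratio")
        if date.get("date_used") and ratio is not None and ratio > min_ratio:
            kept.append(date)
    return kept


# Invented example values, not taken from the database.
example_dates = [
    {"dating_id": 1, "date_used": True, "th230_th232_ratio": 1250.0},
    {"dating_id": 2, "date_used": True, "th230_th232_ratio": 45.0},
    {"dating_id": 3, "date_used": False, "th230_th232_ratio": 900.0},
    {"dating_id": 4, "date_used": True, "th230_th232_ratio": None},
]
reliable = screen_dates(example_dates)
```

A stricter or looser threshold can be passed through min_ratio; whether to drop or retain dates with no reported ratio is a choice that depends on the analysis.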
The SISAL database is an ongoing effort, and future work will include filling in missing data fields for individual records. Analysis of the data is also useful for verification purposes and may result in corrections of some data. Any such changes to sites and entities included in version 1 of the database will be documented in subsequent updates. The SISAL working group also aims to provide new chronologies in future versions of the database based on Bayesian approaches, namely OxCal (Bronk Ramsey, 1995), COPRA (Breitenbach et al., 2012) and StalAge (Scholz and Hoffmann, 2011).

Overview of contents
The first version of the SISAL database contains 211 022 δ¹⁸O measurements and 127 115 δ¹³C measurements from 371 speleothem records and 10 composites from 174 cave systems. This represents approximately 58 % of the published speleothem records we have identified. Of the 371 speleothem records, 8 have been superseded, 7 provide information for which there are also updates or additional information recorded as separate entities, 6 have dating information but no isotopic records because the individual entities were only used to construct composite records, and 15 do not have age models. The database also contains 6 records that have not been published.
The distribution of sites is global in extent (Fig. 1). The largest share (30 %) of the sites is from Europe (53 sites), and other regions are currently less well represented. The temporal distribution of records is excellent for the past 2000 years (Fig. 2a) and good for the past 22 000 years (Fig. 2b). Altogether, 142 entities record some part of the past 2000 years, 87 of which have a resolution ≤ 10 years between samples on average. There are 253 entities recording some part of the past 22 000 years, including 153 with a resolution of ≤ 100 years between samples on average. The database contains 42 entities from the last interglacial period (115 000 to 130 000 years before present), 41 of which have a resolution of ≤ 1000 years between samples on average (Fig. 2c).
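The average resolution quoted above can be estimated from the sample ages of an entity. The sketch below shows one plausible definition, the mean spacing between consecutive samples that fall inside a time window; the exact calculation used for the figures in this section is not specified here, so the definition and the function name are assumptions.

```python
def mean_resolution(ages_bp, start_bp, end_bp):
    """Mean number of years between consecutive samples within a window.

    ages_bp: sample ages in years BP(1950); start_bp <= end_bp bound the
    window. Returns None if fewer than two samples fall inside the window.
    """
    in_window = sorted(age for age in ages_bp if start_bp <= age <= end_bp)
    if len(in_window) < 2:
        return None
    # (n - 1) intervals span the range between the youngest and oldest sample.
    return (in_window[-1] - in_window[0]) / (len(in_window) - 1)


# Invented example: four samples inside the past 2000 years, one outside.
resolution = mean_resolution([0, 10, 20, 40, 5000], start_bp=0, end_bp=2000)
```

With this definition, an entity would count as having a resolution of ≤ 10 years for the past 2000 years if the returned value is at most 10.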

Conclusions
The SISAL database is based on a community effort to compile isotopic measurements from speleothems to facilitate palaeoclimate analysis. Considerable effort has been made to ensure that there is adequate metadata and quality control information to allow the selection of records appropriate to answer specific questions and to document the uncertainties in the interpretation of these records. The database is publicly available.
The first version of the SISAL database contains 211 022 δ¹⁸O measurements and 127 115 δ¹³C measurements from 371 individual speleothem records, and 10 composites from 174 cave systems. The distribution of sites is global in extent. The temporal distribution is excellent for the past 2000 years and good for the past 22 000 years. There are also records that span the last interglacial period.
The format of the database is designed to facilitate the use of the data for regional- to continental-scale analyses, and in particular to facilitate comparisons with and evaluation of isotope-enabled climate model simulations. The SISAL working group will continue to expand the coverage of the SISAL database and will provide new chronologies based on standardized age models; subsequent versions of the database will be made freely available to the community.
The Supplement related to this article is available online at https://doi.org/10.5194/essd-10-1687-2018-supplement.
Author contributions. LCB is the coordinator of the SISAL working group. KA, SPH, LCB, MD and AB designed the database, drawing on discussions with participants at the first SISAL workshop. KA, SPH, LCB and SAM were responsible for the database construction. NK, SMA, MA, YAB, PB, KB, YB, SC, WD, IGH, JH, ZK, IL, ML, FL, AL, CP-M, RP and NS coordinated the regional data collection. All authors listed as "SISAL Working Group members" provided data or helped to complete data entry. TA and DG contributed original unpublished data to the database. The first draft of the paper was written by KA, SPH, LCB, MB, AB and NK and all authors contributed to the final version.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. SISAL (Speleothem Isotopes Synthesis and Analysis) is a working group of the Past Global Changes (PAGES) programme. We thank PAGES for their support for this activity. Additional financial support for SISAL activities has been provided by the European Geosciences Union (EGU TE Winter call, grant number W2017/413), Irish Centre for Research in Applied Geosciences (iCRAG), European Association of Geochemistry (Early Career Ambassadors program 2017), Geological Survey Ireland, Quaternary Research Association UK, Navarino Environmental Observatory, Stockholm University, Savillex, John Cantle, University of Reading and University College Dublin (Seed Funding award, grant number SF1428). The design and creation of the database has been supported by funding to Sandy P. Harrison from the ERC-funded project GC2.0 (Global Change 2.0: Unlocking the past for a clearer future, grant number 694481) and by funding to Laia Comas-Bru from the Geological Survey Ireland Short Call 2017 (Developing a toolkit for model evaluation using speleothem isotope data, grant number 2017-SC-056). Sandy P. Harrison also acknowledges funding from the JPI-Belmont project "PAlaeo-Constraints on Monsoon Evolution and Dynamics (PACMEDY)" through the UK Natural Environmental Research Council (NERC). We thank SISAL members who contributed their published data to the database and provided additional information when necessary.
Edited by: David Carlson
Reviewed by: two anonymous referees