SISALv2: a comprehensive speleothem isotope database with multiple age–depth models

. Characterizing the temporal uncertainty in palaeoclimate records is crucial for analysing past climate change, correlating climate events between records, assessing climate periodicities, identifying potential triggers and evaluating climate model simulations. The ﬁrst global compilation of speleothem isotope records by the SISAL (Speleothem Isotope Synthesis and Analysis) working group showed that age model uncertainties are not systematically reported in the published literature, and these are only available for a limited number of records (ca. 15 %, n = 107 / 691). To improve the usefulness of the SISAL database, we have (i) improved the database’s spatio-temporal coverage and (ii) created new chronologies using seven different approaches for age– depth modelling. We have applied these alternative chronologies to the records from the ﬁrst version of the SISAL database (SISALv1) and to new records compiled since the release of SISALv1. This paper documents the necessary changes in the structure of the SISAL database to accommodate the inclusion of the new age models and their uncertainties as well as the expansion of the database to include new records and the quality-control measures applied. This paper also documents the age–depth model approaches used to calculate the new chronologies. The updated version of the SISAL database (SISALv2) contains isotopic data from 691 speleothem records from 294 cave sites and new age–depth models, including age–depth temporal uncertainties for 512 speleothems. SISALv2 is available at https://doi.org/10.17864/1947.256 (Comas-Bru et al., 2020a).


Introduction
Speleothems are a rich terrestrial palaeoclimate archive that forms from infiltrating rainwater after it percolates through the soil, epikarst and carbonate bedrock. In particular, stable oxygen and carbon isotope (δ 18 O, δ 13 C) measurements made on speleothems have been widely used to reconstruct regional and local hydroclimate changes.
The Speleothem Isotope Synthesis and Analyses (SISAL) working group is an international effort under the auspices of Past Global Changes (PAGES) to compile speleothem isotopic records globally for the analysis of past climates . The first version of the SISAL database (Atsawawaranunt et al., 2018a, b) contained 381 speleothem records from 174 cave sites and has been used for analysing regional climate changes (Braun et al., 2019a;Burstyn et al., 2019;Deininger et al., 2019;Kaushal et al., 2018;Kern et al., 2019;Oster et al., 2019;Zhang et al., 2019). The potential for using the SISAL database to evaluate climate models was explored using an updated version of the database (SISALv1b; Atsawawaranunt et al., 2019) that contains 455 speleothem records from 211 sites .
SISAL is continuing to expand the global database by including new records (Comas-Bru et al., 2020a). Although most of the records in SISALv2 (79.7 %; Fig. 1a) have been dated using the generally very precise, absolute radiometric 230 Th/U dating method, a variety of age-modelling approaches were employed (Fig. 1b) in constructing the orig-inal records. The vast majority of records provide no information on the uncertainty of the age-depth relationship. However, many of the regional studies using SISAL pointed to the limited statistical power of analyses of speleothem records because of the lack of temporal uncertainties. For example, these missing uncertainties prevented the extraction of underlying climate modes during the last 2000 years in Europe . To overcome this limitation, we have developed additional age-depth models for the SISALv2 records ( Fig. 2) in order to provide robust chronologies with temporal uncertainties. The results of the various age-depth modelling approaches differ because of differences in their underlying assumptions. We have used seven alternative methods: linear interpolation, linear regression, Bchron (Haslett and Parnell, 2008), Bacon (Blaauw and Christen, 2011;Blaauw et al., 2019), OxCal (Bronk Ramsey, 2008Bronk Ramsey and Lee, 2013), COPRA (Breitenbach et al., 2012) and StalAge (Scholz and Hoffmann, 2011). Comparison of these different approaches provides a robust measure of the age uncertainty associated with any specific speleothem record.

Construction of age-depth models: the SISAL chronology
We attempted to construct age-depth models for 533 entities in an automated mode. For eight records, this automated construction failed for all methods. For these records we provide L. Comas-Bru et al.: SISALv2: a comprehensive speleothem isotope database with multiple age-depth models manually constructed chronologies where no age model previously existed and added a note in the database with details on the construction procedure. Age models for 21 records were successfully computed but later dropped in the screening process due to inconsistent information or incompatibility for an automated routine. In total, we provide additional chronologies for 512 speleothem records in SISALv2. The SISAL chronology provides alternative age-depth models for SISAL records that are not composites (i.e. time series based on more than one speleothem record), that have not been superseded in the database by a newer entity and which are purely 230 Th/U dated. We therefore excluded records for which the chronology is based on lamina counting, radiocarbon ages or a combination of methods. This decision was based on the low uncertainties of the age-depth models based on lamina counting and the challenge of reproducing age-depth models based on radiocarbon ages. We made an exception with the case of entity_id 163 (Talma et al., 1992), which covers two key periods -the mid-Holocene and the Last Glacial Maximum -at high temporal resolution. In this case, we calculated a new SISAL chronology based on the provided 230 Th/U dates but did not consider the uncorrected 14 C ages upon which the original age-depth model is based. We also excluded records for which isotopic data are not available (i.e. entities that are part of composites) and entities that are constrained by less than three dates. Additionally, the dating information for 23 entities shows hiatuses at the top and bottom of the speleothem that are not constrained by any date. For these records, we partially masked the new chronologies to remove the unconstrained section(s). Original dates were used without modification in the age-depth modelling.
To allow a comprehensive cross-examination of uncertainties, seven age-depth modelling techniques were implemented here across all selected records. Due to the high number of records (n = 533), all methods were run in batch mode. A preliminary study using the database version v1b demonstrated the feasibility of the automated construction and evaluation of age-depth models using a subset of records and methods (Roesch and . Further details on the evaluation of the updated age-depth models are provided in Sect. 3.2. The seven different methods are briefly described below. All methods assume that growth occurred along a single growth axis. For one entity, where it was previously known that two growth axes exist, we added an explanatory statement in the database. All approaches except StalAge produce Monte Carlo (MC) iterations of the age-depth models. We aimed to provide 1000 MC iterations for each new SISALv2 chronology at https://doi.org/10.5281/zenodo.3816804 , but this was not always possible because some records (n = 12) yield a substantial number of non-monotonic ensembles that were not kept.
Major challenges arise through hiatuses (growth interruptions) and age reversals. We developed a workflow to deal with records with known hiatuses that allowed the construction of age-depth models for 20 % of the records with one or more hiatuses (Roesch and Rehfeld, 2019; details below for each age-depth modelling technique). Regarding the age reversals, we distinguish between tractable reversals (with overlapping confidence intervals) and non-tractable reversals (i.e. where the 2-sigma dating uncertainties do not overlap) following the definition of Breitenbach et al. (2012). Details such as the hiatus treatment and outlier age modification are recorded in a log file created when running the age models. We followed the original author's choices regarding date usage. If an age was marked as "not used" or "usage unknown", we did not consider this in the construction of the new chronologies except in OxCal, where dates with "usage unknown" were considered.
1. Linear interpolation (lin_interp_age) between radiometric dates is the classic approach for age-depth model construction for palaeoclimate archives and was used in 32.1 % of the original age-depth models in SISALv2.
Here, we extend this approach and calculate the age uncertainty by sampling the range of uncertainty of each 230 Th/U age 2000 times, assuming a Gaussian distribution. This approach is consistent with the implementation of linear interpolation in CLAM (Blaauw, 2010) and COPRA (Breitenbach et al., 2012). Linear interpolation was implemented in R (R Core Team, 2019), using the approxExtrap() function in the Hmisc package. We included an automated reversal check that increases the dating uncertainties until a monotonic age model is achieved, similar to that of StalAge (Scholz and Hoffmann, 2011). Hiatuses are modelled following the approach of Roesch and , where rather than modelling each segment separately, synthetic ages with uncertainties spanning the entire hiatus duration are introduced for use in age-depth model construction. These synthetic ages are removed after agedepth model construction. Linear interpolation was applied to 80 % (n = 408/512) of the SISAL records for which new chronologies were developed.
2. Linear regression (lin_reg_age) provides a single bestfit line through all available radiometric ages assuming a constant growth rate. Linear regression was used in 6.7 % of the original SISALv2 age models. As with linear interpolation, age uncertainties are based on randomly sampling the U-series dates to produce 2000 age-depth models (i.e. ensembles). Temporal uncertainties are then given by the uncertainty of the medianbased fit to each ensemble member. If hiatuses are present, the segments in-between were split at the depth of the hiatus without an artificial age. The method is implemented in R using the lm() function from the base package. Linear regression was applied to 36 % (n = 185/512) of the SISAL records for which new chronologies were developed.
3. Bchron (Bchron_age) is a Bayesian method based on a continuous Markov processes (Haslett and Parnell, 2008) and is available as an R package (Parnell, 2018). This method was originally used for only one speleothem record in SISALv2. Since Bchron cannot handle hiatuses, we implemented a new workflow that adds synthetic ages with uncertainties spanning the entire hiatus duration (Roesch and Rehfeld, 2019), as performed with linear interpolation, StalAge and our implementation of COPRA. Bchron provides age-depth model ensembles, of which we have kept the last 2000. We calculate the age uncertainties from the spread of the individual ensembles. Here we use the function bchron() with jitter.positions = true to mitigate problems due to rounded-off depth values. This method has been applied to 83 % (n = 426/512) of the SISAL records for which new chronologies were developed.
4. Bacon (Bacon_age) is a semi-parametric Bayesian method based on autoregressive gamma processes (Blaauw and Christen, 2011;Blaauw et al., 2019). It was used in three of the original chronologies in SISALv2. The R package rBacon can handle both outliers and hiatuses, and apart from giving the median age-depth model, it also returns the Monte Carlo realizations (i.e. ensembles), from which the median age-depth model is calculated. During the creation of the SISAL chronologies, the existing rBacon package (version 2.3.9.1) was updated to improve the handling of stalagmite growth rates and hiatuses. We use this revised version, available on CRAN (https://cran. r-project.org/web/packages/rbacon/index.html, last access: 31 January 2020), to provide a median age-depth model and an ensemble of age model realizations for 65 % (n = 335/512) of the SISAL records for which new chronologies were developed.
5. OxCal (Oxcal_age) is a Bayesian chronological modelling tool that uses Markov chain Monte Carlo (Bronk Ramsey, 2009). This method was used in 4.1 % of the original SISALv2 chronologies. OxCal can deal with hiatuses and outliers and accounts for the non-uniform nature of the deposition process (Poisson process using the P_Sequence command). Here we used the analysis module of OxCal version 4.3 with a default initial interpolation rate value of 1 and an initial model rigidity (k) value of k 0 = 1 with a uniform distribution from 0.01 to 100 for the range of k/k 0 (log 10(k/k 0 ) = (−2, 2)) (Christopher Bronk Ramsey, personal communication, 2019). The initial value of the interpolation rate determines the number of points between any two dates for which an age will be calculated. We subsequently linearly interpolated the age-depth model to the depths of individual isotope measurements. Where multiple dates are given for the same depth for any given entity, the date with the smallest uncertainty was used to construct the SISAL chronology. In the case of asymmetric uncertainties in the dating table, the largest uncertainty value was chosen. We kept the last 2000 realizations of the age-depth models for each entity. We calculate the age uncertainties from the spread of the individual ensembles. Details of the workflow used to construct these chronologies are available in Amirnezhad-Mozhdehi and Comas-Bru (2019). OxCal chronologies are available for 21 % (n = 106/512) of the SISAL records for which new chronologies were developed.
6. COPRA (copRa_age) is an approach based on interpolation between dates (Breitenbach et al., 2012) and was used for 9.7 % of the original SISALv2 chronologies. COPRA is available as a MATLAB package in Rehfeld et al. (2017) with a graphical user interface (GUI) that has interactive checks for reversals and hiatuses. The MATLAB version can handle multiple hiatuses and (to some extent) layer-counted segments. However, age reversals can occur near short-lived hiatuses. To overcome this, we implemented a new workflow in R that adds artificial dates at the location of the hiatuses and prevents the creation of age reversals (Roesch and  as done with linear interpolation, StalAge and Bchron. Additionally, we also incorporated an automated reversal check similar to that already embedded into StalAge (Scholz and Hoffmann, 2011). This R version, copRa, uses the default piecewise cubic Hermite interpolation (pchip) algorithm in R without consideration of layer counting. We calculate the age uncertainties from the spread of the individual ensembles. This approach was used for 76 % (n = 389/512) of the SISAL records for which new chronologies were developed.
7. StalAge (StalAge_age) fits straight lines through three adjacent dates using weights based on the dating measurement errors (Scholz and Hoffmann, 2011). Age uncertainties are iteratively obtained through a Monte Carlo approach, but ensembles are not given in the output. StalAge was used to construct 13.1 % of the original SISALv2 chronologies. The StalAge v1.0 R function has been updated to R version 3.4, and the default outlier and reversal checks were enabled to run automatically. Hiatuses cannot be entered in StalAge v1.0, but the updated version incorporates a treatment of hiatuses based on the creation of temporary synthetic ages following Roesch and . In contrast to other methods, mean ages instead of median ages are reported for StalAge, and the uncertainties are internally calculated and based on iterative fits considering dating uncertainties. StalAge was applied to 62 % (n = 320/512) of the SISAL records for which new chronologies were developed.

Revised structure of the database
The data are stored in a relational database (MySQL), which consists of 15 linked tables: site, entity, sample, dating, dating_lamina, gap, hiatus, original_chronology, d13C, d18O, entity_link_reference, references, compos-ite_link_entity, notes and sisal_chronology. Figure 3 shows the relationships between these tables and the type of each field (e.g. numeric, text). The structure and contents of all tables except the new sisal_chronology  Table 1. Changes were also made to the dating table (dating) to accommodate information about whether a specific date was used to construct each of the age-depth models in the sisal_chronology table (Table 2). We followed the original authors' decision regarding the exclusion of dates (i.e. because of high uncertainties, age reversals or high detrital content). However, some dates used in the orig-  inal age-depth model were not used in the SISALv2 chronologies to prevent unrealistic age-depth relationships (i.e. age inversions). Information on whether a particular date was used for the construction of specific type of age-depth model is provided in the dating table under columns labelled date_used_lin_interp, date_used_lin_reg, date_used_Bchron, date_used_Bacon, date_used_OxCal, date_used_copRa and date_used_StalAge ( Table 2). The dating and the sample tables were modified to accommodate the inclusion of new entities in the database. Specifically, the predefined option lists were expanded, options that had never been used were removed, and some typographical errors in the field names were corrected; these changes are listed in Table 3.

Quality control of individual speleothem records
The quality control procedure for individual records newly incorporated in the SISALv2 database is based on the steps described in Atsawawaranunt et al. (2018a). We have updated the Python database scripts to provide a more thorough quality assessment of individual records. Additional checks of the dating table resulted in modifications in the 230Th_232Th, 230Th_238U, 234U_238U, ini230Th_232Th, 238U_content, 230Th_content, 232Th_content and decay constant fields in the dating table for 60 entities. A summary of the fields that are both automatically and manually checked before uploading a record to the database is available in the Supplement.
Analyses of the data included in SISALv1 (Braun et al., 2019a;Burstyn et al., 2019;Deininger et al., 2019;Kaushal et al., 2018;Kern et al., 2019;Oster et al., 2019;Zhang et al., 2019) and SISALv1b (Comas-Bru et al., 2019) revealed a number of errors in specific records that have now been corrected. These revisions include, for example, updates in mineralogies (sample.mineralogy), revised coordinates (site.latitude and/or site.longitude) and addition of missing information that was previously entered as "unknown". The fields affected and the number of records with modifications are listed in Table 4. All revisions are also documented in Comas-Bru et al. (2020a).

Automation and quality control of the age-depth models in the SISAL chronology
We used an automated approach to age-depth modelling in R because of the large number of records. Roesch and Rehfeld (2019) have described the basic workflow concept and tested it using all of the age-modelling approaches used here except OxCal. The basic workflow involves step-by-step inspection and formatting of the data for the different methods, and the use of predefined parameter choices is specific to each method. Each age-modelling method is called sequentially. An error message is recorded in the log file if a particular age-modelling method fails, and the algorithm then progresses to the next method. If output is produced for a par- Table 3. Changes made to tables other than the sisal_chronology since the publication of SISALv1 (Atsawawaranunt et al., 2018a, b).  Visual summary of quality control of the automated SISAL chronology construction. The evaluation of the age-depth models for each method (x axis) is given for each entity (y axis) that was considered for the construction (n = 533). Black lines mark age-depth models that could not be computed. Age-depth models dropped in the automated or expert evaluation are marked by grey lines. Age-depth models retained in SISALv2 are scored from 1 (only one criterion satisfied) to 3 (all criteria satisfied) in shades of blue. For 503 records alternative age-depth models with uncertainties are provided (green lines) in the "success" column. ticular age-modelling method, these age models are checked for monotonicity. Finally, the output standardization routine writes out, for each entity and age-modelling approach, the median age model, the ensembles (if applicable) and information of which hiatuses and dates were used in the construction of the age models. These outputs are then added to the sisal_chronology table (Table 2). All functions are avail-able at https://github.com/paleovar/SISAL.AM (last access: 23 July 2020). The general approach for the OxCal age models was similar, and step-by-step details and scripts are provided at https://doi.org/10.5281/zenodo.3586280 (Amirnezhad-Mozhdehi and Comas-Bru, 2019). The quality control parameters obtained from OxCal were compared with the rec-  ommended values of the agreement index (A) > 60 % and convergence (C) > 95 % in accordance with the guidelines in Bronk Ramsey (2008), both for the overall model and for at least 90 % of the individual dates. OxCal age-depth models failing to meet these criteria were not included in the sisal_chronology table (Table 2). An overview of the evaluation results for the age-depth models constructed in automated mode is given in Fig. 4. Three nested criteria are used to evaluate them. Firstly, chronologies with reversals (Check 1) are automatically rejected (score −1). Secondly, the final chronology should flexibly follow clear growth rate changes (Check 2) such that 70 % of the dates are encompassed in the final age-depth model within 4-sigma uncertainty (score +1). Thirdly, temporal uncertainties are expected to increase between dates and near hiatuses (Check 3). This criterion is met in the automated screening (score +1) if the interquartile range (IQR) is higher between dates or at hiatuses than at dates. Only       entities that pass all three criteria are considered successful. All age-depth models that satisfied Check 1 were also evaluated in an expert-based manual screening by 10 people. If more than two experts agreed that an individual age-depth model was unreliable or inconsistencies, such as large offsets between the original age model and the dates marked as "used", occurred, the model was not included in the SISAL chronology table. This automatic and expert-based quality control screening resulted in 2138 new age-depth models constructed for 503 SISAL entities.
L. Comas-Bru et al.: SISALv2: a comprehensive speleothem isotope database with multiple age-depth models

Recommendation for the use of SISAL chronologies
The original age-depth models for every entity are available in SISALv2. However, given the lack of age uncertainties for most of the records, we recommend considering the SISAL chronologies with their respective 95 % confidence intervals whenever possible. No single age-depth modelling approach is successful for all entities, and we therefore recommend that all the methods for a specific entity are used together in visual and/or statistical comparisons. Depending on methodological choices, age-depth models compatible with the dating evidence can result in considerable temporal differences for transitions (Fig. 5). For analyses relying on the temporal alignment of records (e.g. cross-correlation), age-depth model uncertainties should be considered using the ensemble of compatible age-depth models as described in, for example, Mudelsee et al. (2012), Rehfeld and Kurths (2014) and Hu et al. (2017).  Table 6). SISALv2 represents 72 % of the existing speleothem records identified by the SISAL working group and more than 3 times the number of speleothem records in the NCEI-NOAA repository (n = 210 as of November 2019; https://www.ncdc.noaa.gov/ data-access/paleoclimatology-data/datasets/speleothem (last access: 20 October 2020), which is the one most commonly used by the speleothem community to make their data publicly available. SISALv2 also contains nine records that have not been published or are only available in PhD theses. The published age-depth models of all speleothems are accessible in the original_chronology metadata table, and our standardized age-depth models are available in the sisal_chronology table for 512 speleothems. Temporal uncertainties are now provided for 79 % of the records in the SISAL database. This is a significantly larger number than in SISALv1b, where most age-depth models lacked temporal uncertainties. Most speleothem records show average 230 Th/U age errors between 100 and 1000 years (Fig. 6), which are only slightly changed by using age-depth modelling software. Nevertheless, when comparing the mean uncertainties of the 230 Th/U ages with those of their corresponding age-depth model, the slope between both parameters is smaller than 1. This indicates that age-depth models tend to reduce uncertainties, especially when dating errors are large, while they increase uncertainties when 230 Th/U age errors are small. This second version of the SISAL database has an improved spatial coverage compared to SISALv1 (Atsawawaranunt et al., 2018b) and SISALv1b ( Fig. 3; Atsawawaranunt et al., 2019). SISALv2 contains most published records from Oceania (80.2 %), Africa (73.7 %) and South America (77.6 %), but improvements are still possible in regions like the Middle East (42.3 %) and Asia (64.8 %; Table 6).

Region
Version 1  olution of 20 isotope measurements in every 100-year slice (Fig. 7a). There are 182 records that cover at least one-third of the Holocene (last 11 700 years BP), with 37 of these covering the whole period with at least one isotope measurement in every 500-year period (Fig. 7b). There are 84 entities during the deglaciation period (21 000 to 11 700 years BP) with at least one measurement in every 500-year time period (Fig. 7b). The Last Interglacial (130 000 to 115 000 years BP) is covered by 47 speleothem records that record at least onethird of this period with, on average, 25 isotope measurements in every 1000-year time slice (Fig. 7c). This updated SISALv2 database now not only provides the basis for comparing a large number of speleothem-based environmental reconstructions on a regional to a global scale but also allows for comprehensive analyses of stable-isotope records on various timescales, from multi-decadal to orbital. coordinated the regional data collection and the age model screening. SFMB, MB and DS provided support for COPRA, Bacon and StalAge, respectively. JF assisted in the quality control procedure of the SISAL database. Figures 1, 4 and 5 were created by CR and KR. Figures 2, 3 and 6 were created by LCB. All authors listed as "SISAL working group members" provided data for this version of the database and/or helped to complete data entry. The first draft of the paper was written by LCB with input by KR and SPH, and all authors contributed to the final version.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. This study was undertaken by SISAL (Speleothem Isotopes Synthesis and Analysis), a working group of the Past Global Changes (PAGES) project, which in turn received support from the Swiss Academy of Sciences and the Chinese Academy of Sciences. We thank SISAL members who contributed their published data to the database and provided additional information when necessary. We thank all experts who engaged in the age-depth model evaluation. The authors would like to acknowledge Avner Ayalon, Jordi López, Bahadur Singh Kotlia and Dennis Rupprecht.
Financial support. The design and creation of v2 of the database were supported by funding to Sandy P. Harrison from the ERCfunded project GC2.0 (Global Change 2.0: Unlocking the past for a clearer future; grant no. 694481) and the Geological Survey Ireland Short Call 2017 (Developing a toolkit for model evaluation using speleothem isotope data; grant no. 2017-SC-056) award to Laia Comas-Bru. Sandy P. Harrison and Laia Comas-Bru received additional support from the ERC-funded project GC2.0 and from the JPI-Belmont project "PAlaeo-Constraints on Monsoon Evolution and Dynamics (PACMEDY)" through the UK Natural Environmental Research Council (NERC). Laia Comas-Bru and Belen Martrat received support from the CSIC scientific international collaboration programme I-LINKA20102 IBCC-lo2k. Kira Rehfeld and Denis Scholz acknowledge support by the Deutsche Forschungsgemeinschaft (DFG; codes RE3994/2-1 and SCHO 1274/11-1).
Review statement. This paper was edited by Thomas Blunier and reviewed by Oliver Bothe and Judson W. Partin.