VARDA (VARved sediments DAtabase) – providing and connecting proxy data from annually laminated lake sediments

Varved lake sediments provide climatic records with seasonal to annual resolution and low associated age uncertainty. Robust and detailed comparison of well-dated and annually laminated sediment records is crucial for reconstructing abrupt and regionally time-transgressive changes as well as for validating spatial and temporal trajectories of past climatic changes. The VARved sediments DAtabase (VARDA) presented here is the first data compilation for varve chronologies and associated palaeoclimatic proxy records. The current version 1.0 allows detailed comparison of published varve records from 95 lakes. VARDA is freely accessible and was created to assess outputs from climate models with high-resolution terrestrial palaeoclimatic proxies. VARDA additionally provides a technical environment that enables users to explore the database of varved lake sediments using a connected data model and can generate a state-of-the-art graphic representation of multi-site comparisons. This allows existing chronologies and tephra events to be reassessed in order to synchronize and compare even distant varved lake records. Furthermore, the present version of VARDA permits the exploration of varve thickness data. In this paper, we report in detail on the data mining and compilation strategies for the identification of varved lakes and the assimilation of high-resolution chronologies, as well as on the technical infrastructure of the database. Additional paleoclimate proxy data will be provided in forthcoming updates. The VARDA graph database and user interface can be accessed online at https://varve.gfz-potsdam.de; all datasets of version 1.0 are available at http://doi.org/10.5880/GFZ.4.3.2019.003 (Ramisch et al., 2019).


Introduction
A major challenge in simulating climate change is validating model outputs with paleoclimatic data. Model-data comparisons on regional to global scales require the integration of paleoclimatic data from single sites into multi-site networks (e.g. Franke et al., 2017). Annually laminated lake sediments provide reliable data for such networks because they offer paleoclimatic information at high temporal resolution with low associated age uncertainty. Due to their annual to seasonal resolution, multi-site networks of varved lake sediments enable investigations of abrupt and regionally time-transgressive climate change on the continents (e.g. Lane et al., 2013; Rach et al., 2014), which is fundamental to understanding past climates, especially of the last glacial cycle (Clement and Peterson, 2008), and to better assessing spatial and temporal trajectories of future climate changes.
Networks of varved lake sediments also provide a means to test contrasting proxy responses to climate change (e.g. Ott et al., 2017; Ramisch et al., 2018; Roberts et al., 2016), further enhancing the robustness of paleoclimatic reconstructions. However, despite their usefulness for the generation of highly resolved multi-site networks, a global synthesis of varve-related paleoclimatic data is still not available.
Various data providers have been developed which offer free access to palaeoclimatic and paleoenvironmental information, including high-resolution terrestrial archives. These include (1) large-scale data repositories such as Pangaea (www.pangaea.de), the National Oceanic and Atmospheric Administration's (NOAA) World Data Service for Paleoclimatology archives (www.ncdc.noaa.gov) and Neotoma (www.neotomadb.org, Williams et al., 2018), and (2) proxy- or time-slice-specific databases like ACER (Sánchez Goñi et al., 2017), the European Pollen Database (Fyfe et al., 2009), the SISAL database (Atsawawaranunt et al., 2018) or the PAGES2k Global 2,000 Year Multiproxy Database (PAGES 2k Consortium, 2017). However, the distribution of information across data providers makes the custom generation of multi-site networks from varved sediments inefficient and time-consuming. Moreover, continuous geochronological development results in frequent updates of fundamental methods such as calibration curves (e.g. Reimer et al., 2004, 2009, 2013) and age-depth modelling algorithms (e.g. Bronk Ramsey et al., 2007; Blaauw and Christen, 2011). Incorporating such changes into existing varve-related datasets requires an interactive approach that is not offered by the fixed data structures of standard relational database management systems. To overcome these limitations, we developed a new, state-of-the-art graph database especially, but not exclusively, for varved sediment records. The database was developed within the German climate modelling initiative PalMod (Latif et al., 2016) to validate the output of comprehensive Earth system models with reliable proxy data from terrestrial and marine (Jonkers et al., 2020) archives. We compiled all available and published varved sediment records and developed criteria for how these data are integrated into the database.

Data mining
We assessed varve-related publications, aided by the literature database of the PAGES varve working group (http://www.pastglobalchanges.org/download/docs/working_groups/vwg/Varve%20publications.pdf), to identify lake archives exhibiting varved sediments and to compile suitable core-related paleoclimatic proxy time series. A comprehensive set of lake sediment records was identified for which proxy data from continuous or floating varve sequences were previously published. All data were collected as raw data from freely available online sources, either from online data repositories (Pangaea, NOAA, and Neotoma) or from data archives within the supplementary materials of online publications. To permanently and unambiguously link each compiled dataset in the database to its original publication, the digital object identifier (DOI) of the publication or of the data provider (if available) was additionally collected and stored.

Data compilation
To ensure an unambiguous identification of the lake record corresponding to a given dataset, we collected and reviewed the required information on lake names and geographic coordinates from the published literature. Table 1 lists required and additional information for lake records included in VARDA. To facilitate searches for lakes in an alphabetically ordered list, the string "Lake" was removed from the name if it appeared at the beginning of the lake name (e.g. "Lake Ammersee" was changed to "Ammersee"). However, exceptions were made if the string "Lake" is an essential feature of the lake name (e.g. "Lake of the Clouds") or if the name is given in a non-English language (e.g. "Lac d'Annecy"). Lake locations were stored as WGS84-referenced geographical coordinates in decimal degrees with 4 decimal places, which corresponds to a precision of ~10 m. This allows a reliable location even of small lakes with a surface area < 1 ha and is especially useful for the dense lake distributions common in large lake districts such as those in Canada or Scandinavia. Since the required precision was not available in most publications, we re-assessed the published geographical locations using ArcGIS and Google Earth. All lake locations refer to the approximate lake centre and are independent of coring locations.
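The naming and coordinate conventions described above can be illustrated with a minimal Python sketch. The function names, the exception set and the constant of roughly 111.32 km per degree of latitude are assumptions made for illustration only; they are not taken from the VARDA code base.

```python
# Hypothetical helpers illustrating the naming and coordinate conventions.
# The exception set below only contains the example named in the text.
KEEP_PREFIX = {"Lake of the Clouds"}  # names in which "Lake" is essential


def normalize_lake_name(name: str) -> str:
    """Drop a leading 'Lake ' unless the name is a listed exception."""
    name = name.strip()
    if name in KEEP_PREFIX:
        return name
    if name.startswith("Lake "):
        return name[len("Lake "):]
    # Non-English names such as "Lac d'Annecy" do not match and stay as-is.
    return name


def round_coordinate(value: float, places: int = 4) -> float:
    """Round a decimal-degree coordinate to the stored number of places."""
    return round(value, places)


def decimal_degree_precision_m(places: int = 4) -> float:
    """Approximate north-south distance of one unit in the last decimal place."""
    metres_per_degree = 111_320.0  # assumed mean length of 1 degree of latitude
    return metres_per_degree * 10 ** (-places)
```

With these assumptions, `decimal_degree_precision_m(4)` evaluates to about 11 m, which agrees with the ~10 m precision stated above.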

Table 1
Sediment composite profiles that were collected from primary literature sources only require a unique identifier (e.g. MON for Lago Grande di Monticchio) within the VARDA database that links a profile to its corresponding lake (see Tab. 2). Additional information encompasses the geographical coordinates of the coring location (fields: Latitude, Longitude), the coring method (e.g. piston corer), the coring date, the water depth at the coring location, as well as the total length of the sediment composite profile with an upper (field: depth start) and lower (field: depth end) depth.

Lake and sediment composite profile meta information
The data compilation followed the basic strategy of collecting proxy data associated with a published sediment composite profile together with information about age-depth models and event layers. A sediment composite profile may consist either of a single core section or of several overlapping core sections combined into a composite profile. The depth scale within a sediment composite profile is referred to as composite depth. Since the availability of data and meta information varied greatly between publications, we classified the available information into required and additional information. The category required encompasses all information that is necessary to a) associate a proxy value at a given depth in a sediment composite profile with a corresponding age and b) uniquely identify the lake, sediment composite profile and original publication for a given dataset. The category additional encompasses all information that extends the data pool for more comprehensive analyses and therefore improves reproducibility, the ability to filter data by specific properties and the quantification of methodological uncertainties. We converted all datasets to default units to provide standardized and thus intercomparable data formats. Tables 1 to 7 provide an overview of data categories and of required and additional information properties, including the default units.

Uncalibrated radiocarbon measurements were collected from the published literature and adapted to the 14C data reporting standards of Millard et al. (2014). This allows efficient reassessments of published chronologies by calibration, age-depth modelling, and age uncertainty estimation (see Table 3). However, reporting standards have not yet been fully adopted in the paleoclimatic community, leading to variations in reported information and to data gaps.
The required information encompasses from left to right (i) the sampling depth (field: composite depth); (ii) the uncalibrated age (field: Age uncalibrated); (iii) the associated measurement error (field: Error); (iv) the error type (e.g. 1 sigma); and (v) the dated material (e.g. wood remains).
The required sampling position refers to the depth within the sediment composite profile, whereas the sampling position within the individual core sections can be stored as additional information. If available, we collected additional information on (i) the corresponding core section label (field: core section); (ii) the section depth (field: section depth); (iii) the lab code; (iv) δ13C data; (v) the measurement method (field: method), e.g. AMS 14C; (vi) the organic carbon content of a sample (field: %C); and (vii) C/N ratios.

Age-depth models and chronologies
Chronologies for varved lake sediments are commonly based on a combination of different dating methods, such as varve counting, radiometric dating (e.g. 14C, 137Cs or 210Pb) and event age-equivalent dating (e.g. correlation to dated volcanic eruptions). Age-depth models provide the time frame for down-core sequences of sediment composite profiles and allow sediment proxy records to be transformed into time series. Initially, most researchers constructed age-depth models by simple linear interpolation between individual chronological points. However, age-depth modelling algorithms such as the OxCal P-Sequence (Bronk-Ramsey, 2007) or Bacon (Blaauw and Christen, 2011) have become more common and perform more complex statistical interpolations.

Table 4

VARDA version 1.0 includes published chronologies that are available in public data repositories. Tables 4 and 5 provide an overview of the required and additional meta information for storing chronologies in VARDA and of the resulting chronological data sheet, respectively. The required information includes a label for the associated sediment composite profile as well as the corresponding data and publication DOI. Additional information will enable rapid reassessments of original chronologies.
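The simple linear interpolation between chronological points mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the tie points are hypothetical, depths are taken to be in mm and strictly increasing, and ages are in years BP. It does not reproduce any chronology stored in VARDA, nor the statistical models of OxCal or Bacon.

```python
from bisect import bisect_right


def interpolate_age(depth_mm, tie_points):
    """Linearly interpolate an age (years BP) for a composite depth.

    tie_points: list of (composite_depth_mm, age_bp) tuples, sorted by
    strictly increasing depth. Returns an integer age (annual precision).
    """
    depths = [d for d, _ in tie_points]
    if not depths[0] <= depth_mm <= depths[-1]:
        raise ValueError("depth lies outside the dated interval")
    i = bisect_right(depths, depth_mm)
    if i == len(depths):  # depth equals the lowermost tie point
        return tie_points[-1][1]
    (d0, a0), (d1, a1) = tie_points[i - 1], tie_points[i]
    fraction = (depth_mm - d0) / (d1 - d0)
    return round(a0 + fraction * (a1 - a0))


# Illustrative tie points only (not real data): depth in mm, age in years BP.
example_points = [(0, -50), (1000, 950), (3000, 4950)]
age_at_2000_mm = interpolate_age(2000, example_points)
```

In this example, a depth of 2000 mm lies halfway between the second and third tie points, so the interpolated age is halfway between 950 and 4950 years BP.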

Table 5
Additional information reports on (i) the age uncertainty; (ii) the presence, type and age of anchor points (e.g. the sediment surface for continuous varve chronologies, or 14C dates or a tephra layer dated elsewhere for floating chronologies); (iii) the applied dating methods (e.g. varve counting, radiometric dating or event layers); (iv) the interpolation method (e.g. linear interpolation or Bayesian age-depth modelling such as OxCal P-Sequence or Bacon); (v) the applied 14C calibration curve (e.g. IntCal09); and (vi) the resulting median resolution of the chronology. [...] available, an uncertainty range expressed as minimum and maximum estimate as additional information (2 sigma as default).
If an uncertainty range was not provided, the range was recalculated using the estimated counting error (if available in the corresponding publication). If depth information for a sediment composite profile was not provided, we either reconstructed an auxiliary composite depth from cumulative sums of continuous varve thickness measurements (if available) or excluded the corresponding chronology from the present data compilation, because time series without a corresponding core depth cannot be updated. The default depth scale unit was set to mm to avoid excessive decimal places in depth reporting. The default age scale unit was set to years BP (before 1950 CE). Ages are restricted to annual precision and are reported as integers (without decimal places).
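The reconstruction of an auxiliary composite depth and the default age unit can be sketched as follows. The function names are hypothetical, the thickness values are invented for illustration, and the sketch assumes a continuous top-to-bottom series of varve thicknesses in mm.

```python
from itertools import accumulate


def auxiliary_composite_depth(varve_thickness_mm):
    """Cumulative sum of varve thicknesses (top to bottom).

    Returns the depth (mm) of the lower boundary of each varve, which can
    serve as an auxiliary composite depth when no depth scale was published.
    """
    return list(accumulate(varve_thickness_mm))


def ce_to_bp(year_ce: int) -> int:
    """Convert a calendar year CE to years BP (BP = before 1950 CE)."""
    return 1950 - year_ce


# Illustrative thickness values only (mm), not real measurements.
depths = auxiliary_composite_depth([1.5, 2.0, 0.5])
```

Here `depths` is `[1.5, 3.5, 4.0]`, and a varve deposited in 2000 CE would carry an age of `ce_to_bp(2000)`, i.e. -50 years BP.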

Isochronous event layers
Isochronous event layers provide precise tie points for the synchronization of proxy time series from different regions and facilitate the construction of multi-site networks. Furthermore, the identification of layers corresponding to dated events, such as volcanic eruptions or geomagnetic excursions, provides additional information for the construction of robust chronologies. For the first version of VARDA, we collected information on reported tephra layers in the sediment composite profiles included in the database. Table 6 provides an overview of the required and additional information on published tephra layers in VARDA. The required information (composite depth, age, age error and dating method) is essential to assign a tephra layer to a given depth in a sediment composite profile and to store the age of the layer as it was reported. Since standards for age reporting of tephra layers vary greatly between studies (e.g. uncalibrated vs. calibrated), information on the dating method and calibration is required in the field "Dating method/Calibration". The required field "Dated in profile?" indicates whether the age of the tephra layer originates from the corresponding sediment composite profile itself (field = true) or whether the age was adopted from the literature (field = false). If the age was adopted from the literature, a DOI of the original publication is required. Further event layers such as geomagnetic excursions will be included in forthcoming versions of VARDA.

Table 6
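The rule that a literature-derived tephra age must carry a DOI can be expressed as a small validation sketch in Python. The class and attribute names are hypothetical stand-ins for the fields listed above, and the values used in the example (including the DOI) are placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TephraLayer:
    """Hypothetical record mirroring the required tephra layer fields."""
    composite_depth_mm: float
    age: int
    age_error: int
    dating_method: str            # field "Dating method/Calibration"
    dated_in_profile: bool        # field "Dated in profile?"
    source_doi: Optional[str] = None

    def __post_init__(self):
        # An age adopted from the literature needs the DOI of its source.
        if not self.dated_in_profile and not self.source_doi:
            raise ValueError("age adopted from literature requires a DOI")


# Placeholder values for illustration only.
layer = TephraLayer(
    composite_depth_mm=1234.0,
    age=10000,
    age_error=50,
    dating_method="varve counting",
    dated_in_profile=False,
    source_doi="10.0000/placeholder",
)
```

Constructing the same record with `dated_in_profile=False` and no `source_doi` raises a `ValueError`, which is one simple way such a requirement could be enforced at import time.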

Proxy data
The technical infrastructure of VARDA is intended to attribute a down-profile record of paleoclimatic proxy data to the corresponding chronology of the sediment composite profile. Therefore, the required information for proxy data sequences is the composite depth and a corresponding proxy measurement, while additional information further describes proxy-specific measurement standards. We adopted the controlled variable vocabulary of the PaST thesaurus for proxy data (World Data Service for Paleoclimatology, https://www.ncdc.noaa.gov/data-access/paleoclimatology-data/past-thesaurus, last access in September 2019). Accordingly, all proxy records will be broadly categorized into biological, sedimentological and geochemical proxy data. In the present version of the database, we included varve thickness data that were found in public data repositories. Table 7 lists the required and additional information concerning varve thickness records. Further proxy data such as stable isotope, pollen or XRF records will be included in forthcoming versions of VARDA.

Table 7

Database

Database design
VARDA is intended to offer a flexible generation of multi-site networks with complex data relations. To store and organize datasets from varved lake archives, we use a graph database. Graph technology in computer science has evolved as part of the NoSQL movement (meaning "Not only SQL") and is based on graph theory, a mathematical concept of expressing objects as interconnected entities, which dates back to the early works of Leonhard Euler in the 18th century (Euler, 1741). In contrast to the fixed data schemes required by relational database management systems (RDBMS), a graph explicitly models relations between data by representing entities as nodes (or vertices), which are described by properties and connected through edges, as shown in Fig. 1 (see also the property graph model). To categorize the nature of a particular entity, one or more labels can be added to the node. Edges can be distinguished by their type and may have properties just like nodes. The ability to add new labels, edges and properties to any entity at any time enables developers to quickly adapt the data model to changing scientific or technical requirements. Neo4j's native query language Cypher is used to read and update the contents of the graph. It allows for an intuitive and flexible generation of queries that are short and readable even for complex patterns (many relationships, circular structures, variable-length paths).
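To make the property graph model more concrete, the following Python sketch represents nodes with labels and properties and typed edges with properties in memory. It is a conceptual illustration only: it does not use Neo4j or Cypher, and the class names, labels (`Lake`, `Profile`), edge type (`HAS_PROFILE`) and property names are assumptions rather than the actual VARDA schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # e.g. {"Lake"}
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    start: Node
    end: Node
    type: str                                     # e.g. "HAS_PROFILE"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph (illustrative, not Neo4j)."""

    def __init__(self):
        self.nodes = []
        self.edges = []

    def add_node(self, *labels, **properties):
        node = Node(set(labels), dict(properties))
        self.nodes.append(node)
        return node

    def add_edge(self, start, type_, end, **properties):
        edge = Edge(start, end, type_, dict(properties))
        self.edges.append(edge)
        return edge

    def neighbours(self, node, type_):
        """Nodes reached from `node` via outgoing edges of the given type."""
        return [e.end for e in self.edges
                if e.start is node and e.type == type_]


graph = PropertyGraph()
lake = graph.add_node("Lake", name="Ammersee")
profile = graph.add_node("Profile", label="hypothetical-profile-id")
graph.add_edge(lake, "HAS_PROFILE", profile)

# Labels and properties can be added later without changing a schema.
lake.labels.add("Varved")
lake.properties["continent"] = "Europe"
```

The last two lines illustrate the flexibility discussed above: new labels and properties are attached to an existing entity without any schema migration.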

Figure 1
The integration of paleoenvironmental datasets from varved lakes into a graph database resulted in a flexible data structure, which allows paleoenvironmental datasets to be connected within a single lake as well as between different lakes. Fig. 1 illustrates the VARDA property graph model schematically and visualizes connections between nodes. The VARDA data model associates each lake with one or more sediment composite profiles, which are connected to one or more datasets.
Datasets, in turn, are connected to a publication, a category (chronology, tephra layer, radiocarbon date or varve thickness record in version 1.0) and various category-specific attributes (as listed in Tab. 1 to 7), which further describe a dataset. All these connections provide the necessary meta information for the actual data points included in a given dataset. Data points from the category tephra layer can additionally connect to an event that is described in more than one lake, for example the Laacher See tephra. The event node makes it possible to connect datasets from different lakes, e.g. for synchronization.
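The role of the event node can be sketched with a short Python example that groups tephra records by event and keeps only events found in at least two lakes. The function name, the input format and the lake names are hypothetical; in VARDA itself this kind of lookup would be a traversal of the graph rather than a scan over tuples.

```python
from collections import defaultdict


def shared_events(tephra_records, min_lakes=2):
    """Return events reported from at least `min_lakes` different lakes.

    tephra_records: iterable of (lake_name, event_name) tuples.
    Result: {event_name: sorted list of lake names}.
    """
    lakes_by_event = defaultdict(set)
    for lake_name, event_name in tephra_records:
        lakes_by_event[event_name].add(lake_name)
    return {
        event: sorted(lakes)
        for event, lakes in lakes_by_event.items()
        if len(lakes) >= min_lakes
    }


# Placeholder lake names; only the event name is taken from the text.
records = [
    ("Lake A", "Laacher See tephra"),
    ("Lake B", "Laacher See tephra"),
    ("Lake A", "hypothetical local tephra"),
]
tie_points = shared_events(records)
```

In this example only the Laacher See tephra qualifies as a potential tie point, because the second event is reported from a single lake.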

Application design
VARDA provides fast access to palaeoclimatic data from varved lakes, irrespective of a user's technical background or operating system. Therefore, the user interface (UI) was designed to be intuitive and reactive with self-explanatory forms and components which immediately respond to the user's actions. It is implemented as an online service, which can be accessed permanently using a web browser.
Overall, the application consists of the web client, a server-side Neo4j graph database and an Application Programming Interface (API) for communication between the client and the database. All software libraries that are integrated into VARDA have free and permissive licenses. The client is built with Vue.js, a JavaScript UI framework which has attracted attention in the developer community since its launch in 2014 due to its versatility and runtime performance. Some features of VARDA integrate other well-documented third-party libraries, such as D3.js for data visualization and OpenLayers for rendering maps (e.g. from OSM) together with vector layers containing spatial data. The client state (e.g. user data and entity cache) and any transactions with the database are handled with Apollo GraphQL, a framework for API communication and state management. The client's component-oriented architecture enables fast development of new features with little interference with existing modules. All source code required by the client is checked, minified and bundled using WebPack for use in the browser.
The web application offers a user interface with optional filters to explore and visualize multi-site networks on demand (see Fig. 2). A universal search field (1 in Fig. 2) can be used to select filters either by region or proxy category. An interactive diagram (2 in Fig. 2) can be used to select a temporal filter by scrolling with the mouse or resizing the light-blue coloured frame (3 in Fig. 2) underneath the main figure.

Figure 2
We added the iconic NGRIP oxygen-isotope (δ18O) record on the GICC05 chronology (Rasmussen et al., 2006; Svensson et al., 2005) as a temporal reference curve for the user. This curve is well known in the paleoclimate community and thus allows easy recognition of the time interval covered by a lake record of interest. In the present version, it does not allow precise correlations between lake records and the NGRIP curve because, for visual clarity, the chronological uncertainties of the latter are not shown. Orange circles (4 in Fig. 2) correspond to tephra layers that have been identified in the sediments of at least two archives. Clicking a circle enables (or disables) the respective filter. The results are updated immediately on the map (5 in Fig. 2) and in the result list (6 in Fig. 2) below whenever a filter is changed. Direct selection of a lake on the map or in the result list guides users to the lake detail view with a list of corresponding core datasets. In version 1.0, all datasets of interest can be downloaded in CSV format.

Data inventory
We identified 186 lakes from the published literature that are described as exhibiting continuous or floating varve sequences in their sediments. We additionally included unvarved sediments from Lake Prespa (Europe), Lake Ohrid (Europe), Laguna Potrok Aike (South America) and Bear Lake (North America) in the compilation due to their long continuous chronologies and good age control from independent dating techniques or the frequent occurrence of tephra layers. In total, 261 datasets for [...] The datasets comprise 70 individual chronologies from 43 lakes, 146 tephra layers from 36 lakes, 118 uncalibrated 14C records from 50 lakes and 55 varve thickness records from 23 lakes. Tab. 8 lists all identified lakes with name, geographical coordinates and available datasets, including the corresponding literature reference.

Table 8
Fig. 3 presents the spatial coverage of lakes and associated datasets included in VARDA 1.0. The identified lakes are located on all continents except Antarctica, with ~56% located in Europe, ~26% in North America, ~8% in Asia, ~5% in Middle and South America, ~3% in Africa, and ~2% in Oceania. The lake distribution shows a distinct emphasis on the mid-latitudes of the Northern Hemisphere, especially the North Atlantic realm. In contrast, only 13 of the 190 lake archives are located in the Southern Hemisphere. [...] layers are reported to occur in more than one lake and are therefore suitable for synchronization.

Conclusion and future developments
VARDA offers a user-friendly and time-efficient way to explore the multitude of paleoenvironmental data from varved lake archives. Due to the integration of precise chronologies and isochrones from tephra event layers into a modern graph database, VARDA offers an easy way to construct regional to global networks of paleoenvironmental information. These multi-site networks can be used, for example, to explore and analyze leads and lags of regional climate change, large-scale patterns in environmental variability or differentiated proxy responses within and between archives. The first version of VARDA presented here includes all technological requirements and tools for future upgrades and developments. Presently, we are working on the integration of (1) an advanced visualization tool, (2) a user-friendly import application and (3) additional proxy data such as stable isotopes and geochemical data as priority goals for the next update. Additionally, the source code of the database application will be made available to the public in a separate contribution. In general, VARDA is intended to be a community-based effort, and we welcome and encourage the participation of varve specialists and the broader paleoenvironmental community in the further development and application of this tool.