A multiproxy database of western North American Holocene paleoclimate records

Holocene climate reconstructions are useful for understanding the diverse features and spatial heterogeneity of past and future climate change. Here we present a database of western North American Holocene paleoclimate records. The database gathers paleoclimate time series from 184 terrestrial and marine sites, including 381 individual proxy records. The records span at least 4000 of the last 12 000 years (median duration of 10 725 years) and have been screened for resolution, chronologic control, and climate sensitivity. Records were included that reflect temperature, hydroclimate, or circulation features. The database is shared in the machine-readable Linked Paleo Data (LiPD) format and includes geochronologic data for generating site-level time-uncertain ensembles. This publicly accessible and curated collection of proxy paleoclimate records will have wide research applications, including, for example, investigations of the primary features of ocean–atmospheric circulation along the eastern margin of the North Pacific and the latitudinal response of climate to orbital changes. The database is available for download at https://doi.org/10.6084/m9.figshare.12863843.v1 (Routson and McKay, 2020).


Introduction
Reconstructing past climate is challenging because it is spatially and temporally complex and because all paleoclimate records are influenced by factors other than climate. Although rarely done, taking advantage of the full breadth of paleoclimatic evidence provides the best possibility of discerning signal from noise. Of all the geologic epochs, the paleoclimate of the Holocene (11.7 kiloannum (ka) to present) has been investigated most extensively. Studying the Holocene is useful, in part, because it serves as a baseline from which to assess natural versus human-forced climate changes. A keyword search on "Holocene" and "climate" returns approximately 21 000 studies globally on the Web of Science. The volume of this previous work, as well as the evolving scientific understanding that it represents, generates organizational challenges related to data validation, extraction, and application.
Here we present a new database of Holocene paleoclimate records from western North America and the adjacent eastern Pacific Ocean. The spatial domain (Fig. 1) extends from tropical Mexico to Arctic Alaska. This region was chosen because (1) it encompasses the large latitudinal range necessary to study effects of orbital changes, the primary climate forcing during the Holocene; (2) it is affected by the major modes of modern Pacific climate variability including the Pacific Decadal Oscillation (Mantua et al., 1997), El Niño-Southern Oscillation (ENSO) (Redmond and Koch, 1991), and the Northern Annular Mode (McAfee and Russell, 2008), among others; (3) it represents a range of climatologies, especially hydroclimate as influenced by the Pacific westerlies and North American monsoon (Adams and Comrie, 1997); (4) it features multiple sources of proxy climate information, including marine sediment, caves, glaciers, and lakes, which are sensitive to changes in wintertime moisture, a key variable for tracking the primary variability of North Pacific ocean-atmospheric circulation; and (5) it is a region of concern for future climate change, considering the large population growth and climate hazards related to, for example, water scarcity in the southern tier (Garfin, 2013) and changing wildfire hazards throughout (e.g., Marlon et al., 2012; Power et al., 2008).
This database is composed of records from individual site-level studies and records that were compiled by previous summaries. Many (42 %) of the records in this database are also included in version 1 of the global Temperature 12k database (Kaufman et al., 2020a). This database adds another 39 temperature-sensitive records, plus 179 records that reflect hydroclimate and circulation changes. The added data were published in various formats and often with little metadata to inform the reuse of the data. Together, this geographically distributed collection of proxy climate records integrates marine and terrestrial realms and forms a network from which to assess the spatial variability of regional climatic change and ocean-atmospheric circulation and to compare with climate model simulations of past climate states.

Data collection
Paleoclimate records located in western North America and the adjacent Pacific Ocean (Fig. 1) were considered for inclusion in the database. They were obtained from public archives in PANGAEA and NOAA's World Data Service (WDS) for Paleoclimatology using the keyword search "Holocene" and record duration searches on NOAA's paleoclimate search engine. The remainder were obtained through either the supplements of publications or directly from individual data generators and are now being made available in digital form as part of this data product. This database builds on several previously published paleoclimate data compilations overlapping the spatial domain encompassed by this study. These include the global Holocene temperature reconstruction of Marcott et al. (2013) (n = 4 records in western North America), Arctic Holocene Transitions database (Sundqvist et al., 2014) (n = 30 records in western North America), a collection compiled to characterize Holocene North American monsoon variability (n = 8 records in common with this database), the Northern Hemisphere dataset used to reconstruct Holocene temperature gradients and mid-latitude hydroclimates (Routson et al., 2019a) (n = 55 records in common with this database), a network of Holocene pollen reconstructions (Marsicek et al., 2018) (n = 71 records in common with this study), two collections of records focused on the last 2 millennia (Rodysill et al., 2018; Shuman et al., 2018) (n = 18 and n = 16 records in common with this study respectively), and the global Temperature 12k database (Kaufman et al., 2020a) (n = 161 records in common with this database).
Figure caption (partial): (c) Spatial distribution of the subset of records sensitive to temperature (n = 200) and (d) the spatial distribution of other records including upwelling, sea ice, glacier extent, dust, circulation, and climate modes (n = 31). (e) Temporal availability of the records in the database by proxy type (proxy general in Supplement Table S1) over the last 12 ka.
Two dust deposition records were included from the global dust compilation (Albani et al., 2015). This database also complements the recently published PAGES (Past Global Changes) global multiproxy database for temperature reconstructions of the Common Era (PAGES 2k Consortium, 2017) and the PAGES global database for water isotopes over the Common Era (Konecky et al., 2020), which are both structured in the same format as this database. A few of the records were not available from the original data generators, and therefore the time series data were digitized from the source publication (as noted in the metadata) using the MATLAB program digitize2.m (Anil, 2020). Digitized records were mainly included to fill geographic gaps in the network of proxy sites.
Other Holocene paleoclimate records were considered but ultimately excluded because they did not satisfy the selection criteria. The majority of excluded records either (1) lacked a clear relation between proxy and climate, (2) were of insufficient duration, (3) possessed large gaps between chronologic control points, or (4) did not meet the sampling resolution criteria. In some instances selection criteria were eased to fill geographic gaps or for reasons justified by the authors in the QC (quality control) comments metadata. Removing records from the database for subjective reasons, such as removing records with outliers, was avoided.

Relation between proxy and climate
Only records with a demonstrated relation to a climate variable were included, as interpreted by the original authors of the site-level studies, but some records are not calibrated to a climate variable. Calibrated records, for example, are presented in temperature units (°C) and precipitation units (mm). Other records are reported in their native proxy variables (e.g., δ¹⁸O, ‰, or sediment mass accumulation, g/cm²/yr). Some calibrated records rely on statistical procedures to determine the relationship between proxy and instrumental data and to infer paleoclimate change, assuming that the processes that control the proxy signal remain constant down core (Tingley et al., 2012; Von Storch et al., 2004). Other calibrations rely on transfer functions based on the correlation of contemporary environmental gradients (e.g., Juggins and Birks, 2012) or the modern analogue technique, which uses the similarity between modern and fossil assemblages (e.g., Guiot and de Vernal, 2007). The original species assemblage data (primarily pollen) for these records are not included in this data product. However, a link to the Neotoma Paleoecology Database dataset ID is provided where available. The Neotoma Paleoecology Database is a community-curated database that is a primary repository for assemblage and other paleoecology data (Williams et al., 2018).
The database also includes proxy records that have not been calibrated to a specific climate variable but that display a clear relation between the proxy and climate. These "relative" climate indicators are useful because they (1) attest to the timing and relative magnitude of change, which is sufficient for many statistical reconstruction methods, especially those that do not assume linearity between proxy and climate variables; (2) can be used in proxy system modeling and in some cases (e.g., δ¹⁸O) can be compared directly to the output of climate models; and (3) provide more complete spatial coverage.

Record duration and resolution
The database aims to document paleoclimate variability on timescales ranging from multi-millennial trends to centennial excursions. However, not all records encompass the entire Holocene epoch. To be included, records must span a duration of at least ca. 4000 years anytime between 0 and 12 ka. To focus on records that can resolve sub-millennial patterns, the database includes those with a sample resolution finer than 400 years (i.e., the median spacing between consecutive samples in the time series is less than 400 years over the past 12 000 years or over the full record length, if shorter).
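The two screening criteria above can be illustrated with a minimal Python sketch. It is not the code used to build the database: the function name, the argument names, and the assumption that sample ages are supplied in years BP are ours, and the approximate ("ca.") 4000-year threshold is treated here as a hard cutoff, which the actual screening did not always do.

```python
from statistics import median

def passes_duration_and_resolution(ages_yr_bp, min_duration=4000,
                                   max_median_spacing=400,
                                   window=(0, 12000)):
    """Return True if a record meets the duration and resolution criteria.

    ages_yr_bp: sample ages in years BP (hypothetical input format).
    """
    # Only samples inside the 0-12 ka window count toward the criteria.
    ages = sorted(a for a in ages_yr_bp if window[0] <= a <= window[1])
    if len(ages) < 2:
        return False
    duration = ages[-1] - ages[0]
    # Median spacing between consecutive samples.
    spacings = [older - younger for younger, older in zip(ages, ages[1:])]
    return duration >= min_duration and median(spacings) < max_median_spacing
```

For example, a record sampled every 100 years from 0 to 10 000 years BP passes both tests, whereas a record sampled every 500 years fails the resolution test.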

Chronologic control
Age control is a fundamental variable underlying proxy records. The database includes the chronologic data necessary for reproducing original age-depth models for records from sediment and speleothem archive types. Chronologic data include depth, uncalibrated radiometric or other dates, analytical errors, and associated corrections where applicable. Other metadata, including material type analyzed and sample identifiers, were included when available. Time series with a maximum of 3000 years between dates within the 0–12 ka interval or with five or more relatively evenly distributed Holocene dates were included in the database. Overall, the age control screening retained a high proportion of available records while recognizing that such coarse age control often precludes the ability to address questions that require fine temporal-scale accuracy (Blaauw et al., 2018).
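The age control criterion can be sketched in Python as follows. This is an illustration under stated assumptions, not the screening code itself: the function and argument names are hypothetical, gaps are measured only between consecutive control points (not from the edges of the window), the same 0–12 ka window is used for both conditions, and the requirement that the dates be "relatively evenly distributed" is not tested because the text does not quantify it.

```python
def passes_age_control(date_ages_yr_bp, max_gap=3000, min_dates=5,
                       window=(0, 12000)):
    """Return True if the age control points satisfy either criterion.

    date_ages_yr_bp: ages of chronologic control points in years BP
    (hypothetical input format).
    """
    dates = sorted(a for a in date_ages_yr_bp if window[0] <= a <= window[1])
    # Criterion 2: five or more dates (evenness is NOT checked here).
    if len(dates) >= min_dates:
        return True
    if len(dates) < 2:
        return False
    # Criterion 1: no more than max_gap years between consecutive dates.
    gaps = [older - younger for younger, older in zip(dates, dates[1:])]
    return max(gaps) <= max_gap
```

A chronology with dates at 500, 3000, 5500, and 8000 years BP passes on the gap criterion, whereas one with dates at 500, 5000, and 9000 years BP fails both.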

Metadata
The database includes a large variety of metadata (Supplement Table S1) to facilitate analyses and reuse. The metadata included in this database are largely consistent with those developed and used in the Temperature 12k database (Kaufman et al., 2020a), with some refinement for hydroclimate-related records. Predominant metadata are subdivided into the following categories:
1. Geographic information includes "site name", "latitude", "longitude", and "elevation". Geodetic data are relative to the WGS84 (World Geodetic System 1984) ellipsoid and in units of decimal degrees. "Country ocean" is generated based on the NASA GCMD (Global Change Master Directory) convention.
2. Bibliographic information includes the DOI (digital object identifier) when available. The original study is typically referenced in "publication 1". "Publication 2" generally corresponds to subsequent publications contributing to record development or reuse.
3. The original data source ("original data citation") is the persistent identifier (URL, Uniform Resource Locator, or DOI) that connects to the publicly accessible repository (e.g., PANGAEA and NOAA WDS paleoclimatology when available). Fields with the entry "wNAm" correspond to records transferred to a public repository for the first time by this study. "Neotoma ID" includes the Neotoma dataset ID when available for the original assemblage data.
4. Metadata describing the proxy record include "archive type", "proxy general", "proxy type", "proxy detail", "calibration method", and "paleo data notes". Archive type corresponds to the physical archive (e.g., lake sediment, marine sediment, peat, and speleothem). Proxy general simplifies plotting figures by grouping similar proxies from proxy type. For example, proxy general for "other biomarkers" includes proxy type TEX86 (tetraether index of 86 carbon atoms) and GDGT (glycerol dialkyl glycerol tetraether) but not alkenones, which are treated separately. Proxy general for "biophysical" includes biogenic silica, tree-ring width, total organic content, chlorophyll, and macrofossils. Proxy general for "other microfossil" includes coccolith, diatom, dinocyst, and foraminifera. Pollen and chironomid records are treated separately. Proxy detail corresponds to specific species or material types. "Calibration method" is the statistical method used for proxy calibration. Paleo data notes include information from the original study to help users understand the proxy record.
5. For climate interpretation, primary "climate variables" include "T" (temperature), "P" (precipitation), and "P-E" (precipitation minus evaporation). Other climate indicators include "MODE" (climate modes such as ENSO), "upwelling" (coastal upwelling), "DUST" (dust deposition), "ICE" (sea ice extent), and "ELA" (glacier equilibrium line altitude). The "interpretation direction" is the sign relation ("positive" or "negative") between the proxy value and the climate variable. Proxy records originally reported as E-P were cataloged as the climate variable of P-E, and the field interpretation direction was inverted from the original interpretation. "Variable name" corresponds to the specific variable type (e.g., "temperature" or "δ¹⁸O"; oxygen-18 isotopes). "Units" correspond to the measurement unit specified in the variable name (e.g., "degC" or "permil"). "Climate variable detail" refines the climate variable field. Temperature records follow the structure of the variable sensed (e.g., "air") at a specific level (e.g., "surface"). Examples include "air@surface", "air@condensation", and "sea@surface". Hydroclimate and some other record types do not always conform as well to this format. Climate variable detail for these records specifies the variable sensed (e.g., "lake level", "runoff", "river flow", and "amount"), at a specific level (e.g., surface). Examples include "lakeLevel@surface" and "runoff@surface". If the variable sensed is the same as the climate variable (e.g., "precipitation"), the field is left blank. In these cases only the level is specified (e.g., "@surface"). In cases where the level was ambiguous, not specified, or not applicable (e.g., "soil moisture", "lake salinity", or "El Niño"), only the variable sensed was specified.
7. Metadata describing the underlying time series data include the youngest and oldest sample ages ("min year" and "max year"), the median sample resolution ("resolution") over the past 12 000 years, and the frequency of age control points ("ages per kyr"), which includes radiocarbon and U-series (uranium) ages.
8. Quality control metadata include "QC certification" and "QC comments". QC certification includes the initials of the co-author of this data descriptor who was responsible for reviewing the screening criteria for records included in the data product. QC comments were written by the person who completed QC to improve reusability of the data.
9. Data access and visualization includes a website link for viewing and downloading the data in .csv (comma-separated value) or LiPD format ("link to LiPDverse").
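Two of the conventions described in the climate interpretation category lend themselves to a short illustration: the "variable@level" structure of climate variable detail, and the sign inversion applied when records reported as E-P are cataloged as P-E. The Python sketch below is a hypothetical helper written for this purpose; the function names and return conventions are assumptions and are not part of the LiPD utilities.

```python
def parse_variable_detail(detail, climate_variable):
    """Split a climate variable detail string into (variable sensed, level).

    A blank variable (e.g., "@surface") means the variable sensed is the
    climate variable itself; a missing level is returned as None.
    """
    sensed, _, level = detail.partition("@")
    return (sensed or climate_variable, level or None)

def to_p_minus_e(climate_variable, direction):
    """Catalog an E-P record as P-E, inverting the interpretation direction."""
    if climate_variable == "E-P":
        flipped = {"positive": "negative", "negative": "positive"}[direction]
        return "P-E", flipped
    # All other variables are returned unchanged.
    return climate_variable, direction
```

With these helpers, "air@surface" is split into the variable sensed ("air") and the level ("surface"), and an E-P record with a positive interpretation direction is cataloged as P-E with a negative direction.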

Database structure and format: Linked Paleo Data (LiPD)
The site-level data and metadata are formatted in the LiPD structure. The LiPD framework comprises JSON-formatted files that are machine-readable with MATLAB, Python, and R packages that enable rapid querying and data extraction (McKay and Emile-Geay, 2016). LiPD encodes the database into a structured hierarchy that allows for explicit descriptions at any level and aspect of the database. Code packages for evaluating the database can be accessed on GitHub (https://github.com/nickmckay/LiPD-utilities, last access: 29 March 2021).

Data visualization
A one-page dashboard for each record is included as a Supplement to this article. The dashboards include the primary information associated with each record including the location, the time series plot, bibliographic reference, and proxy data information (Supplemental dashboards). Each record is also linked to a web page (link to LiPDverse) where the data can be visualized and downloaded in LiPD or text versions. A globally distributed collection of paleoclimate LiPD files is housed at https://lipdverse.org/ (last access: 29 March 2021). This western North American Holocene paleoclimate database is a subset of the records that can be found by choosing wNAm in the LiPDverse browser. The full collection can also be accessed at http://lipdverse.org/wNAm/1_0_0/ (last access: 29 March 2021).

Proxy records and climate variables
The western North American Holocene paleoclimate database includes proxy climate records from 184 different sites. Many "sites" (locations) are represented by more than one proxy "record" (time series). Multiple records from one site often represent different climate variables or reconstruction methods. Pollen assemblages, for example, are often translated into both temperature and moisture variables, sometimes for different seasons. The list of sites is shown by row in Table 1, whereas Supplement Table S1 contains a row for each record. In total, this database comprises 184 sites and 381 records. The records are derived from nine archive types and are based on eight proxy categories (Supplement Table S1). The database includes 259 records from lake sediments, 58 records from marine sediment, and 64 records from other terrestrial archives.
The western North America database includes 84 records that are being transferred to a publicly accessible data repository for the first time with this data product. These include 61 "new" records as follows. Pollen ratio time series reflecting changes in the position of forest boundaries and long-term temperature change were calculated for 23 records. These ratios were computed by the original data generators following methods and rationale described in Jiménez-Moreno et al. (2019) and Johnson et al. (2013). The database also includes 20 precipitation records, which were generated by Marsicek et al. (2018) but not released with that publication. Finally, we have included 18 hydroclimate records based on subsets of packrat midden sites from Harbert and Nixon (2018), following the same methods applied for temperature reconstructions in Kaufman et al. (2020b). Briefly, the Climate Reconstruction Analysis using Coexistence Likelihood Estimation (CRACLE) method was used to infer absolute precipitation given the modern relationship between WorldClim climate data and packrat midden fossil data. In the original paper (Harbert and Nixon, 2018), an overall MAT (mean annual temperature) anomaly that combines all sites is presented. This MAT is calculated by subtracting the WorldClim calibration data for each site and then averaging all inferred temperatures (across space) in discrete time intervals. Here we provide the absolute precipitation from CRACLE, without spatiotemporal averaging, and note that some of the inferred absolute precipitation appears more extreme than precipitation reconstructed from other proxies. For further details and code, please refer to Harbert and Nixon (2018). These midden records are noted in the QC comments column of Supplement Table S1.
The database contains 200 temperature-sensitive records; 150 hydroclimate-sensitive records (e.g., precipitation, P-E, flood frequency, and streamflow); and 31 other records including upwelling, dust, climate mode, and sea ice extent. Marine records are primarily sea surface temperatures, but there are several marine records of other variables including sea ice extent, upwelling strength, and flood frequency. Many (228) of the proxy records are interpreted by the original authors to represent mean annual values of specific climate variables. Others represent individual seasons, primarily with some aspect of summer. Background information including the strengths, weaknesses, and underlying assumptions of the specific proxy types can be found in textbooks devoted to the topic (e.g., Bradley, 2015).

Geographic coverage
The geographic distribution of records within western North America is far from uniform (Fig. 1). The density of all sites is comparatively high in Alaska and the conterminous western United States. In contrast, Mexico is represented by few study sites, mainly because many studies failed to meet the inclusion criteria. Hydroclimate records have the most uniform coverage, albeit with a spatial gap in Mexico. The spatial distribution of temperature records has gaps in Canada, the midwestern United States, Texas, and continental Mexico.

Record length and temporal resolution
Median record duration is 10 725 years, not counting the duration of records beyond 12 000 years. Most of the records (94 %) extend back at least 6000 years, thereby including the frequently modeled 6 ka paleoclimate time slice. The median sample resolution of individual records in the database is 127 years (Fig. 2).
These primary age controls can be used to recalculate the age models for all of the 14 C-based sedimentary sequences and U-series-based speleothems using a systematic approach to addressing age uncertainty.

Uncertainties
A variety of approaches have been used to characterize uncertainties in paleoclimate variables, and there is no standard procedure for either calculating or reporting uncertainties (Sweeney et al., 2018). Generally, calibration and other uncertainties are large relative to the small amplitude of most Holocene climate change, but these uncertainties are less important when investigating the relative magnitude of climate changes rather than the absolute value of a climate variable. Uncertainty arising from differences among records can be explored using a bootstrapped sampling-with-replacement approach (e.g., Boos, 2003; Routson et al., 2019a); however, these ranges reflect a combination of record-level uncertainty and regional climate heterogeneity. In this database we are following other syntheses (Kaufman et al., 2020b; Marcott et al., 2013; Routson et al., 2019a) by applying a single uncertainty estimate for each proxy type (Supplement Table S1). Proxy-specific uncertainties for temperature records follow Kaufman et al. (2020b), as did our approach for calculating uncertainty estimates for the hydroclimate records. For the calibrated hydroclimate records (primarily pollen based), we have calculated average RMSE values from the following references within or adjacent to the study region (Brown et al., 2006; Schoups, 2015, 2019; Harbert and Nixon, 2018; Marsicek et al., 2013). For the 163 uncalibrated records we have estimated the error as ±1 SD (standard deviation) of the Holocene values.
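Two of the uncertainty estimates described above can be sketched in a few lines of Python. This is an illustration only, using the standard library: the function names are hypothetical, the text does not say whether the sample or population standard deviation was used (the sample form is assumed here), and the percentile bootstrap of the mean, with a default of 1000 iterations and a 95 % interval, is one common way to implement sampling with replacement rather than a reproduction of the authors' code.

```python
import random
from statistics import mean, stdev

def uncalibrated_uncertainty(holocene_values):
    """Error estimate for an uncalibrated record: 1 SD of its Holocene values.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    return stdev(holocene_values)

def bootstrap_mean_ci(values, n_iter=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval on the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and store the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_iter))
    lo_idx = int((1 - level) / 2 * n_iter)
    hi_idx = int((1 + level) / 2 * n_iter) - 1
    return means[lo_idx], means[hi_idx]
```

As noted above, a bootstrap range computed across records mixes record-level uncertainty with regional climate heterogeneity, so it should not be read as the uncertainty of any single record.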

Summarizing major trends
Recognizing major climatological differences across the study domain (spanning from tropical Mexico to Arctic Alaska), we have summarized some dominant patterns in the database including climate variables (temperature and hydroclimate), proxy group, and season. Dominant temperature and hydroclimate patterns by proxy group as specified in proxy general in Supplement Table S1 were evaluated (Fig. 3). Only proxy groups with more than 10 records were considered. The records were screened by season to include one record per site ("season general" for "annual" or "summer only" or "winter only"). Records were then binned to 500-year resolution by averaging data points within respective intervals, normalized to a mean of zero and a standard deviation of 1 (z scores), and composited using the median to minimize the influence of outliers. Dominant temperature proxies include chironomids (n = 15), biophysical (n = 17), pollen (n = 130), and isotopes (n = 14). Chironomids show peak warmth in the Early Holocene (ca. 10 ka), followed by a Holocene cooling trend. Biophysical records have more variability, with peak warming at ca. 7 ka. Pollen records show relatively low Holocene variability, with peak warming at ca. 6 ka. Isotopes have the highest Holocene variability and the lowest sample depth and show two intervals of warming (ca. 9 and 4 ka). Dominant hydroclimate proxies include other microfossils (n = 11), biophysical records (n = 46), pollen (n = 57), and isotopes (n = 35). Other microfossils show variable Holocene conditions, with the wettest period in the Early Holocene. This interval, however, has very low sample depth. Biophysical records show only small Holocene hydroclimate changes. Pollen records show a strong Holocene wetting trend, whereas isotope records show variable conditions. Temperature and hydroclimate trends were compared by summer, winter, and annual seasons (Fig. 4).
The records were binned to 500-year resolution by averaging data points within respective intervals and normalized to a mean of zero and a standard deviation of 1 (z scores). Records were then averaged into equal-area (127 525 km²) grids following Routson et al. (2019a). The grids were then combined into a single composite using the median. The most recent 500-year bin was then subtracted, registering the present end to zero. This was done to help compare the seasonal Holocene evolutions. In the Early to Middle Holocene (ca. 12 to 6 ka), summertime and annual temperatures warmed faster than wintertime temperatures, consistent with Northern Hemisphere seasonal insolation forcing (Berger and Loutre, 1991). Temperatures in all seasons show a cooling pattern from ca. 6 ka to the present. Hydroclimate composites show a Holocene-length wetting trend in all seasons, with the largest trend in wintertime.
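The compositing steps in this section (binning, normalizing, taking the median, and registering the most recent bin to zero) can be sketched in Python as follows. The sketch is a simplified illustration, not the authors' processing code: all function names are hypothetical, ages are assumed to be in years BP, the population standard deviation is assumed for the z scores, and the equal-area gridding step of Routson et al. (2019a) is omitted.

```python
from statistics import mean, median, pstdev

def bin_record(ages, values, width=500, max_age=12000):
    """Average values within consecutive bins; empty bins become None."""
    n_bins = max_age // width
    buckets = [[] for _ in range(n_bins)]
    for age, value in zip(ages, values):
        if 0 <= age < max_age:
            buckets[int(age // width)].append(value)
    return [mean(b) if b else None for b in buckets]

def zscore(binned):
    """Normalize binned values to zero mean and unit standard deviation."""
    present = [v for v in binned if v is not None]
    mu, sd = mean(present), pstdev(present)
    if sd == 0:  # constant record: no variability to scale
        return [None if v is None else 0.0 for v in binned]
    return [None if v is None else (v - mu) / sd for v in binned]

def composite(records, width=500, max_age=12000):
    """Median composite of binned, normalized (ages, values) records."""
    standardized = [zscore(bin_record(a, v, width, max_age))
                    for a, v in records]
    comp = []
    for i in range(max_age // width):
        column = [r[i] for r in standardized if r[i] is not None]
        comp.append(median(column) if column else None)
    return comp

def anchor_to_present(comp):
    """Subtract the most recent bin so the composite ends at zero."""
    if comp[0] is None:
        raise ValueError("most recent bin is empty")
    return [None if v is None else v - comp[0] for v in comp]
```

Because each record is normalized before compositing, calibrated and uncalibrated time series can be combined in the same composite, at the cost of losing absolute units.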

Use and limitations
The machine-readable database includes multiple parameters for searching and screening records. The data compilation will form the foundation of new analyses of Holocene climate variability in western North America and will help identify future research priorities, including data-sparse regions. The 381 records in this database will enable studies of Holocene climate on centennial to multi-millennial timescales. At finer timescales, the number of records with sufficient resolution and geochronological control is more limited. For example, 170 records have a median sampling resolution of better than 100 years, and only 26 sites have resolution finer than 10 years. The accuracy and precision of age control can also limit inferences involving correlations and spectral properties of the time series. The availability of the raw chronology data for each record in this database allows users to quantify and incorporate aspects of chronologic uncertainty into their analyses. This database represents a concerted effort to generate a comprehensive data product but is an ongoing effort, with newly published records continuing to be added. Some published records that meet the criteria might have been inadvertently overlooked. Readers who know of missing datasets or who find errors in this version are asked to contact one of the authors so that future versions of the database will be more complete and accurate. Rather than issuing errata to this publication, errors and additions will be included in subsequent versions of the database.
NPM built the data infrastructure and performed data processing. CCR, DSK, and SHA did quality control, term standardization, and database cleaning. CCR and DSK wrote the paper with contributions from the other authors.
Figure caption (partial): (... Table S1). Only proxy types with n > 10 are shown. The composites are produced from normalized (units of standard deviation) records to include both calibrated and uncalibrated time series. Records have been filtered by seasonality (season general for annual, summer only, and winter only), to include one record per site. Shading shows the 95 % bootstrapped confidence interval on the estimate of the mean over 1000 (sampling with replacement) iterations. Gray bars show the number of records contributing to each 500-year bin.
Earth Syst. Sci. Data, 13, 1613-1632, 2021, https://doi.org/10.5194/essd-13-1613-2021