LegacyClimate 1.0: a dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the last 30 kyr and beyond

. Here we describe LegacyClimate 1.0, a dataset of the reconstruction of the mean July temperature ( T July ), mean annual temperature ( T ann ), and annual precipitation ( P ann ) from 2594 fossil pollen records from the Northern Hemisphere, spanning the entire Holocene, with some records reaching back to the Last Glacial Period. Two reconstruction methods, the modern analog technique (MAT) and weighted averaging partial least squares regression (WA-PLS), reveal similar results regarding spatial and temporal patterns. To reduce the impact of precipitation on temperature reconstruction, and vice versa, we also provide reconstructions using tailored modern pollen data, limiting the range of the corresponding other climate variables. We assess the reliability of the reconstructions, using information from the spatial distributions of the root mean squared error in the prediction and reconstruction signiﬁcance tests. The dataset is beneﬁcial for synthesis studies of proxy-based reconstructions and to evaluate the output of climate models and thus help to improve the models themselves. We provide our compilation of reconstructed T July , T ann , and P ann as open-access datasets at PANGAEA (https://doi.org/10.1594/PANGAEA.930512; Herzschuh et al., 2023a). The R code for the reconstructions is provided at Zenodo (https://doi.org/10.5281/zenodo.7887565; Herzschuh et al., 2023b), including the harmonized open-access modern and fossil datasets used for the reconstructions, so that customized reconstructions can be easily established.


Introduction
The comparison of climate model outputs with climate data is essential for model improvements (Eyring et al., 2019). The extratropical Northern Hemisphere is of particular interest because it is known for complex spatial and temporal temperature and precipitation patterns. However, the period for which instrumental observations are available is only of limited use to validate simulations, in particular when assessing the climate response to natural climate drivers, because it is too short and because it is impacted by human-induced greenhouse gas forcing. Climate proxy data derived from natural archives are therefore of great value.
Previous proxy-based climate inferences have contributed to major debates about Holocene climate change. For example, while simulations indicate gradual warming of the Holocene, temperature proxy data syntheses rather support a mid-Holocene optimum, which resulted in the "Holocene conundrum" debate (Liu et al., 2014). While the debate has progressed since, new proxy-based syntheses can help us to understand regional differences and contribute further to the debate. Qualitative proxy-based inferences indicate that the mid-Holocene in the Northern Hemisphere midlatitudes was rather dry and warm when compared with the present day, which is in agreement with modeling outputs (Routson et al., 2019). Also, quantitative precipitation reconstructions from eastern and central Asia unveiled the complex monsoonwesterlies interactions Herzschuh et al., 2019). However, evaluating modeling outputs using proxybased reconstructions is a complex task and strongly depends on the purpose of the proxy data-model comparison study (e.g., the purpose of an evaluation could either target the mean or site-specific changes or it could target relative changes or absolute values, or the purpose could be to infer spatial or temporal climate variability at specific scales). All these types of evaluations require a specific handling of the proxy data and have to be considered for proxy-model comparisons.
Fossil pollen records are well established in their use as a paleoecological and paleoclimatological proxy and are of great value as indicators of past environmental and climatic change over many decades. Considerable efforts have been made to establish regional, continental, and even global data repositories like the North American Pollen Database (NAPD; https:// www.ncei.noaa.gov/products/paleoclimatology, last access: 1 July 2020), the European Pollen Database (EPD; http: //www.europeanpollendatabase.net/index.php, last access: 1 July 2020), and the Neotoma Paleoecology Database (https://www.neotomadb.org/, last access: 1 April 2021; Williams et al., 2018). Pollen data from archives across multiple environmental settings, such as lakes, wetlands, or marine sediments, have been widely used to quantitatively reconstruct past vegetation and climate variables (Birks, 2019;Chevalier et al., 2020). Among land-derived proxy data, pollen is particularly suitable for a temporarily and spatially high-resolution evaluation of climate model simulations of the late Quaternary period. A number of methods have been proposed for making pollen-based climate reconstructions (Chevalier et al., 2020). Among them, classification approaches like the modern analog technique (MAT) or regression approaches like the weighted averaging partial least squares regression (WA-PLS) are most commonly used. MAT and WA-PLS rely on extensive collections of modern spectra. Hence, designing a robust calibration dataset from modern pollen assemblages is a crucial part of the reconstruction process. A suitable calibration dataset should cover a wide range of climatic and environmental gradients in order to represent an empirical relationship between pollen assemblages and climate (Birks et al., 2010;Chevalier et al., 2020). As with fossil pollen records, data syntheses and repositories also exist for modern surface pollen data, e.g., for North America (Whitmore et al., 2005), Eurasia (Davis et al., , 2020, and China (Cao et al., 2013;Herzschuh et al., 2019).
For temperature reconstruction time series, several broadscale syntheses exist; however, they either originate from different proxies (Kaufman et al., 2020a, b) or are restricted to certain continents or regions and/or are poorly documented (Mauri et al., 2015;Marsicek et al., 2018;Routson et al., 2019). Temperature reconstructions from extratropical Asia are mostly lacking. Precipitation syntheses are available from Europe (Mauri et al., 2015), North America (Gajewski et al., 2000), and China and Mongolia (Herzschuh et al., 2019), but hitherto, no global or hemispheric syntheses of quantitative precipitation changes are available for the Holocene.
In a recent effort, we synthesized and taxonomically harmonized pollen records available in the Neotoma Paleoecology Database (Williams et al., 2018) and additional records from China and Siberia (Cao et al., 2013(Cao et al., , 2020 into a global late Quaternary fossil pollen dataset (LegacyPollen 1.0; Herzschuh et al., 2022b) and revised all chronologies of those records using a Bayesian approach that allows for the inference of temporal uncertainties (LegacyAge 1.0; Li et al., 2022). Here, in the third part of a series of interconnected studies, we present the pollen-based reconstruction of the mean July temperature (T July ), mean annual temperature (T ann ), and annual precipitation (P ann ), including the reconstruction and temporal uncertainties in addition to quality measures from 2594 records from the Northern Hemisphere, using WA-PLS and MAT (LegacyClimate 1.0; this study).

Input data
The objective of this study is to create a dataset of quantitative reconstructions of T July , T ann , and P ann , spanning the last 30 kyr and beyond from fossil pollen records. These variables (or variables highly correlated to them) were shown to explain most variance in the modern pollen data (T July ; P ann ) or are typically used in syntheses and proxy-model comparison studies (T ann ). Accordingly, we selected these three variables. We used the fossil dataset compiled in Lega-cyPollen 1.0 (stored on the PANGAEA open-data repository and presented in Herzschuh et al., 2022b) that integrates pollen records archived in the Neotoma Paleoecology Database, a dataset from eastern and central Asia (Cao et al., 2013;Herzschuh et al., 2019), and a dataset from northern Asia (Cao et al., 2020). Ages were taken from the Bacon (Blaauw and Christen, 2011) age-depth models presented in Li et al. (2022;LegacyAge 1.0), and for each record, we provide an ensemble of 1000 realizations of the agedepth model in our data product so that it can be used to account for chronological uncertainty in the reconstructions. As the chronological and reconstruction errors are independent, they can be added in quadrature to obtain the combined error. With this information, users can easily produce curves with all relevant uncertainties (as shown in Appendix Fig. 1).
We compiled the fossil data into four subcontinental datasets for eastern North America (< 104 • W; Williams et al., 2000), western North America, Europe (< 43 • E), and Asia (> 43 • E). We restricted the analyses to the 70 most common taxa on each continent to reduce computational power, after making sure that a higher taxa number would not substantially improve model statistics in climate reconstructions. The number of taxa is limited by the modern training dataset from North America, which contains 70 taxa after applying our taxa harmonization routine (see details in Herzschuh et al., 2022b). We therefore restricted the number of taxa in all fossil datasets to keep the taxa comparable for the reconstructions. To identify the most common taxa, we used Hill's N2 diversity index (i.e., the effective number of occurrences of a species in the dataset; Hill, 1973). For all analyses, square root percentages were used, if not indicated otherwise.
The site-specific T ann , T July , and P ann were derived from WorldClim 2 version 2.1 (spatial resolution of 30 s; ∼ 1 km 2 ; https://www.worldclim.org, last access: 22 November 2021; Fick and Hijmans, 2017), by extracting the climate data at the location of the modern sample sites using the raster package in R (version 3.5-11, Hijmans et al., 2021;R Core Team, 2020). The WorldClim 2 dataset provides spatially interpolated gridded climate data aggregated from weather stations as temporal averages between 1970-2000 (Fick and Hijmans, 2017). We used monthly average temperature data to extract the mean T July and the bioclimatic variables of bio1 (T ann ) and bio12 (P ann ).

Reconstruction methods
Our reconstruction approach included MAT (Overpeck et al., 1985) and WA-PLS (ter Braak and Juggins, 1993), by applying the MAT and WA-PLS functions from the rioja package (version 0.9-21, Juggins, 2019) for R (R Core Team, 2020) on our northern hemispheric fossil pollen synthesis. For each fossil location, we calculated the geographic distance between each modern sampling site and the fossil pollen record using the rdist.earth function from the fields R package (version 10.3, Nychka et al., 2020) and selected a unique calibration set from modern sites within a 2000 km radius. We fixed the radius to 2000 km instead of 1500 km, as suggested in a study in eastern Asia by Cao et al. (2017), because the modern dataset density is rather low in northern Asia. For the reconstruction with MAT, we used the original pollen percentages of the selected fossil pollen taxa, looking for seven analogs between the pollen data and the selected calibration dataset. The dissimilarity between the fossil samples and the modern pollen assemblages was determined by the squaredchord distance of the percentage data (Simpson, 2012;Cao et al., 2014).
In addition to the classic WA-PLS reconstruction, we also propose WA-PLS_tailored. This approach addresses the problem that covariation in climate variables today in space is transferred to the reconstruction, even if the past temporal relationship among the climate variables differs mechanistically. In fact, this approach aims to make use of the full climate space covered by the modern pollen samples, avoiding those samples in the calibration set that cause spatial covariation. This approach is based on the assumption that several climate variables can be reflected in one and the same pollen assemblage because different plant taxa have different optima in temperature and precipitation ranges and might therefore occur with different co-occurrences and abundance patterns. To reconstruct T July , we identified the P ann range reconstructed by WA-PLS and extended it by 25 % to both ends of the modern P ann range in order to reduce the influence of P ann on the T ann and T July reconstruction due to covariation. We applied the same method to the reconstruction of P ann . T ann and T July were tailored by P ann ; P ann was tailored by T July and, additionally, by T ann (illustrated, for an example, in Appendix Fig. 2). Reconstruction uncertainties are provided as root mean square errors (RMSEs) derived from the output in the MAT and WA-PLS functions. Model errors in WA-PLS and MAT are reported as root mean square errors in the prediction (RMSEP) that are derived from the leave-one-out cross-validation.
We provide site-or sample-specific measures of quality, in addition to the error estimates and model statistics, to allow the user to assess the quality of the climate reconstruction dataset. First, we applied a canonical correlation analysis (CCA) to the modern training dataset in order to explore the modern relationship between the pollen spectra and the climate variables and to infer the explained variance in the modern pollen dataset by the target climate variables (ter Braak, 1988) by using the cca function in the vegan R package (version 2.5-7, Oksanen et al., 2020). The ratio between the constrained (λ 1 ) and unconstrained (λ 2 ) explained variance was determined for all modern training datasets used for climate reconstructions. High values of λ 1 vs. λ 2 ( 1) are commonly considered to be an indicator to measure how well the target environmental variable relates to the variation in the modern pollen dataset (e.g., Juggins, 2013). However, most training datasets encompass multiple environmental variables that are often correlated, and additional requirements for such variables would be necessary to explain a significant and independent portion of the variation in the training dataset. While a careful design of the training dataset can help reduce the effect of correlated environmental gradients, it can never eliminate them completely (Juggins, 2013). To infer the analog quality as an indicator of non-analog situations, we calculated the minimum dissimilarity (squaredchord distance) between modern pollen assemblages and fossil pollen assemblages with probability thresholds of 1 % (indicating very good analogs), 2.5 % (good analogs), and 5 % (poor analogs), using the minDC function from the analogue R package (version 0.17-6; Simpson et al., 2021).
A statistical significance test (Telford and Birks, 2011) was applied using the randomTF function in the palaeoSig R package (version 2.0-3; Telford, 2019). In this test, the proportion of variance in the fossil pollen data explained by the reconstructed environmental variable is estimated from redundancy analysis (RDA) and tested against a null distribution generated by replacing the modern training dataset with randomly generated surrogate fields. The surrogate fields were simulated to have realistic spatial autocorrelation by fitting variograms to the WorldClim 2 temperature and precipitation data; 1000-member ensembles were simulated for each variable. A reconstruction is considered statistically significant if the reconstructed variable explains more of the variance than 95 % of the random reconstructions (Telford and Birks, 2011). The reconstructed climate variables were tested as introducing the environmental variable as a single variable in a run, in addition to partialling out the explained variance in the pollen data by the respective other variables.
We used Plantaginaceae (mostly representing the Plantago lanceolata type in Europe) and Rumex type to assess human influence as an indicator of intense herding (Behre, 1988). In addition, we calculated the correlation between the WA-PLS reconstruction of T July , T ann , and P ann and the pollen percentages of Plantaginaceae and Rumex for 9000, 3000 and 1000 cal years BP to assess potential biases in the dataset.
3 Dataset description LegacyClimate 1.0: input data, reconstructions, and reconstruction model statistics LegacyClimate 1.0 provides pollen-based reconstructions and sample-specific reconstruction errors in T ann , T July , and P ann for 2594 fossil pollen records (i.e., a total of 146 067 single pollen samples) from three reconstruction methods (WA-PLS, WA-PLS_tailored, and MAT). Furthermore, we provide the method-specific model metadata and quality measures for each record and each climate variable (Table 1). To ease data handling, the dataset files are separated into western North America, eastern North America, Europe, and Asia.

Dataset assessment
4.1 Spatial and temporal coverage of LegacyClimate 1.0 In total, we provide reconstructions for 2594 fossil pollen records. Among them, 670 records are located in eastern North America, 361 records in western North America, 1075 records in Europe, and 488 records in Asia (Fig. 1). Some records are included that come from marine cores which were taken from the continental shelf. They contain information from source areas from the nearby continents (e.g., fluvially transported material). If users want to focus on terrestrial-only records, then those marine sites could be filtered out by the archive type provided in the metadata. Climate reconstructions for one fossil record in the western North American Dataset on Hawaii (dataset ID 17832; Keālia Pond) could not be performed, as there were no modern training data available within a 2000 km area.
The temporal coverage of the records is rather uneven. A total of 75 and 666 records cover the periods between 30-29 and 15-14 cal kyr BP, respectively (Fig. 2).

Modern relationships between pollen and climate assessed by constrained ordination
Results from CCA applied to modern datasets reveal that T July -constrained ordinations have high λ 1 /λ 2 ratios, indicating a strong relationship between this climate variable and modern pollen assemblages in eastern North America, while low ratios can be found in central Asia. The spatial pattern of λ 1 /λ 2 of ordinations constrained by T ann is similar overall to those of T July , but the ratios are slightly higher for T ann than for T July . Reconstructions for P ann show low ratios in Europe and eastern North America. Areas with high ratios are concentrated in Alaska and East Asia (Fig. 3).

Analog quality
The dissimilarity (squared-chord distance) between modern pollen assemblages and fossil pollen assemblages was calculated and extracted for distinct time slices at 9000, 6000, and 3000 cal years BP. In total, 36.4 % (9000 cal years BP), 39.2 % (6000 cal years BP), and 45.6 % (3000 cal years BP) records indicate a very good (< 1 %) analog quality. The central part of the North American continent, Scandinavia, and central Asia show a very good analog quality for all time slices investigated. Poor (< 5 %) analogs can be found in western Europe, the eastern parts of the United States, and along the East Asian coastline. Nonanalogs (> 5 %) are found for 22.6 % (9000 cal years BP), 20.47 % (6000 cal years BP), and 12.5 % (3000 cal years BP) records, respectively, especially in western Europe, and at 9000 cal years BP in Alaska (Fig. 4). Relative to the modern temperature range, the RMSEP from this region also reveals the lowest fraction. In general, MAT has the lowest mean error fraction relative to the modern temperature range of all three methods. The mean RMSEPs of T July are 1.90±0.63 (MAT), 2.50± 0.73 (WA-PLS), and 2.21 ± 0.75 • C (WA-PLS_tailored) and mean percentages of the T July range are 8.11 ± 1.64 % (MAT), 10.71±1.94 % (WA-PLS), and 10.70±2.60 % (WA-PLS_tailored). Thus, they are slightly smaller than those of T ann but slightly larger as a percentage of the range. The spatial patterns are, however, largely similar to those of T ann .

Prediction errors found in
The mean RMSEPs of P ann are 176.38 ± 51.40 (MAT), 244.48 ± 75.84 (WA-PLS), and 232.71 ± 98.57 mm (WA-PLS_tailored) and mean percentages of P ann range are 6.78± 1.48 % (MAT), 9.27±1.70 % (WA-PLS), and 10.26±2.67 % (WA-PLS_tailored). High RMSEPs are found for western North America, Europe, and along the coastline of Southeast Asia, while the lowest RMSEP values are found for central Asia. A clear division in RMSEPs is found on the North American continent; while the western part of North America (with the exception of Alaska) has a rather high RMSEP, the eastern part of North America has a smaller RMSEP. This pattern is found for all three methods (Fig. 5).

Human impact
We used the abundance of Plantaginaceae and Rumex as indicators of grazing and intense animal husbandry. An overall weak human impact is inferred for North America and northern Asia. The indicators show a strong human impact only in individual records at 9000 cal years BP in China and the Mediterranean region (Fig. 7). The percentage values of Plantaginaceae and Rumex were high, especially in Europe for 3000 and 1000 cal years BP, which indicates the growing  human impact on that region. High Plantaginaceae correlate with low T July and high P ann in central Europe, indicating potential biases in the temperature reconstructions (i.e., temperatures that are too low become reconstructed). Similar corre-lations are found for Rumex, especially in northern Europe (Fig. 8).  To illustrate the difference between mid and late Holocene climate, we calculated the value for the three climate variables at 6 and 1 cal kyr BP, each time taking the average of the interpolated values at those ages for the ensemble of 1000 realizations of the age-depth models (Li et al., 2022). Differences between these time slices reveal warmer and drier conditions during the mid-Holocene compared with late Holocene conditions, especially in eastern North America and also in central and northern Europe. The overall patterns are in good agreement for all three methods but show dif- ferences on a regional scale, especially when comparing the reconstructions with WA-PLS and MAT. For T July , the reconstruction with MAT shows greater temperature differences in western North America and Southeast Asia. Compared to the reconstruction with WA-PLS, there is a reduced cooling from 6 to 1 cal kyr BP in eastern Europe and a warming instead of a cooling in the western Mediterranean region and along the Southeast Asian coastline in MAT. For large areas in North America and Europe, the reconstructions with WA-PLS sug-gest an increase in precipitation from 6 to 1 cal kyr BP. A shift to drier conditions can be found along the southeastern coastline in North America, in the Mediterranean Region, and especially in Southeast Asia. The reconstruction with MAT reveals a gradient from increasing precipitation in southwestern Europe to decreasing precipitation in northeastern Europe. In contrast to the reconstructions with WA-PLS, records along the Southeast Asian coastline suggest an  increase in precipitation with MAT rather than a decrease (Fig. 9). Time series of absolute T ann reconstructions reveal temporal in addition to latitudinal spatial variation on the single continents. Eastern North America and Asia show the most variation in the low latitudes. It is also eastern North America that shows the most pronounced latitudinal gradient. In western North America, the most variation takes place in the high latitudes, while the variation is concentrated to the midlatitudes in Europe. Especially in North America, the warming since the last deglaciation and the beginning of the Holocene is clearly shown in the temporal variation in the time series (Fig. 10).

Assessment of consistency among reconstruction methods
Reconstructions with MAT are, in general, in good agreement with those derived from the WA-PLS. Comparing MAT with WA-PLS, 37.3 % (T July ), 38.9 % (T ann ), and 30.4 % (P ann ) of all records have a positive correlation of r 0.6. Strong positive correlations (r >= 0.9) can mainly be identified in eastern North America, while a weak correlation can be found for large areas in central North America and most of Europe (Fig. 11). WA-PLS_tailored used a reduced modern training dataset (illustrated for an example in Appendix Fig. 2). The tailoring successfully reduced the covariation in temperature and precipitation in the modern dataset, as indicated by the distribution of the correlation coefficient in Fig. 12. Nevertheless, the obtained reconstructions are largely consistent between WA-PLS and WA-PLS tailored, with a correlation of r 0.9 found for 59.2 % of all records for T July , 60.7 % for T ann , and 56.5 % for P ann . Maps showing mean July temperature (T July ), mean annual temperature (T ann ), and annual precipitation (P ann ) records that passed the reconstruction significance test (p < 0.2). Colors indicate the significance level. Records that did not pass the significance level (p 0.2) are shown as gray rectangles.

Data availability
The compilation of reconstructed T July , T ann , and P ann is open access and available from PANGAEA (https://doi.org/10.1594/PANGAEA.930512; see the "Other version" section; Herzschuh et al., 2023a). The dataset files are stored in machine-readable data format (.CSV), which are already separated into western North America, eastern North America, Europe, and Asia for easy access and use.

Code availability
The R code to run the reconstructions for single sites is available from Zenodo (https://doi.org/10.5281/zenodo.7887565; Herzschuh et al., 2023b), including harmonized open-access modern and fossil pollen datasets, so that customized reconstructions can be easily established.

Discussion
7.1 Impact of the fossil pollen data source on LegacyClimate 1.0 quality LegacyClimate 1.0 contains reconstructions of climate variables from fossil pollen data derived from open-access data repositories. The fossil records were derived from multiple natural archives (most commonly, assemblages from continuous lacustrine and peat accumulations; Herzschuh et al., 2022b). Different sizes of lakes and peat areas result in varying sizes of pollen source areas and thus the spatial representativeness of a record. While small lakes and peatlands are considered to provide information about the (extra-)local scale, the regional signal is better represented in pollen assemblages from large lakes (Jackson, 1990;Sugita, 1993). However, taphonomic changes in the records originating, for example, from lake-level changes may impact the reconstructed climate. Pollen from azonal riverine vegetation might be over-represented in fluvially impacted pollen records. Our dataset is based on taxonomically harmonized modern and fossil pollen datasets using a restricted number of taxa. Such an approach guarantees that all records are handled consistently. Although losing taxonomic information when merging taxa together into a higher taxonomic level, it also increases the possibility of matching climate analogs in the modern and the fossil datasets. However, one needs to keep in mind that species with different ecological requirements may be merged together into one genus or family, for example, Pinus species that are restricted to tropical or subtropical areas in China or ones that grow in boreal forests (Cao et al., 2013;Tian et al., 2017).
Along with the pollen assemblages, data repositories also provide chronological information for fossil records. The quality of such chronologies varies strongly with respect to dating methods, calibration, and numerical algorithms for determining an age-depth relationship (Blois et al., 2011;Trachsel and Telford, 2017). Having accurate and precise chronologies is thus of pivotal importance for reconstructing past climate in order to identify spatiotemporal patterns and in order to help evaluate climate model outputs. The advantage of the fossil pollen dataset used for the reconstruc-tion presented here (i.e., LegacyPollen 1.0; Herzschuh et al., 2022b) is that it has harmonized chronologies (LegacyAge 1.0), along with information about uncertainties, in addition to related metadata and scripts that allow a customized reestablishment of the chronologies (Li et al., 2022). Accordingly, we were able to provide sample-specific age uncertainties along with reconstruction uncertainties.

Modern pollen and climate data sources and LegacyClimate 1.0 quality
We a priori selected T July , T ann , and P ann as target variables for our reconstructions. However, we provide λ 1 /λ 2 (i.e., explained variance of the climate variable in the modern pollen dataset relative to the variance explained by the unconstrained first axis; ter Braak, 1988), a commonly used proxy for the assessment of reconstructions. The higher the λ 1 /λ 2 in the spatial modern dataset, the higher the chance that this target climate variable has also impacted vegetation over time and is thus reflected in the variation in the fossil pollen dataset. As a rule of thumb, a ratio of 1 is considered to indicate reliable reconstructions (Juggins, 2013), though useful reconstructions may also be obtained from datasets with lower values. As expected, maps of RMSEPs reveal similar spatial patterns as the results of constrained ordination. Our results indicate that calibration sets from Europe in particular have low ratios and a high RMSEP for all climate variables (despite having a high number of modern samples), likely related to the human impact on the modern and fossil data. Some areas that are known for their sensitivity to precipitation, e.g., East Asia, show low RMSEPs as expected for P ann on the one hand but show a low sensitivity to T ann and T July on the other. For our study we used the, to our knowledge, largest modern dataset ever used in a pollen-based climate reconstruction. For fossil pollen records in areas with an insufficient coverage of modern surface pollen samples (e.g., central Asia or western Siberia), it might be difficult to create a calibration dataset that maps the required variety of environmental and climatic gradients and therefore find enough modern analogs for reconstructions with a classification approach such as MAT. This is indicated by the high RMSEPs as percentages of gradient length in these areas. Our routine uses the modern pollen data from within a radius of 2000 km around the site of the fossil record. The information provided in the reconstruction metadata, including number of modern pollen samples and ranges of reconstructed variables, allows an assessment of the modern dataset used for reconstruction. Our assessments of the modern dataset (e.g., using CCA), the transfer function (e.g., RMSEP), and the reconstruction (e.g., the significance test) also revealed the potential biases in the pollenbased reconstruction and pointed to limitations. Further validation and assessments of the results and more comprehensive uncertainty analyses, e.g., by applying forward modeling approaches (Izumi and Bartlein, 2016;Parnell et al., 2016), would be highly valuable.

Reconstruction method and LegacyClimate 1.0 quality
Overall, the three reconstruction approaches, MAT, WA-PLS, and WA-PLS_tailored, yield rather similar results (i.e., indicated by the overall high correlation between the reconstructions of the different methods; Fig. 11). Accordingly, the major trends at global or continental scales are similar, even if the actual amplitude of change may vary locally. As  each method has its own strengths and weaknesses, there is not one set of reconstructions that is absolutely superior. One advantage of our multi-method reconstruction dataset is that users can identify the methods that are likely to perform best in a selected region and/or specific reconstructions. MAT is often recommended for large-scale studies, but it is highly sensitive to the quality of analogs (Chevalier et al., 2020). Low-analog situations can arise from two causes, namely climate conditions that differ strongly from today (e.g., the low atmospheric CO 2 concentration during the Last Glacial Maximum (LGM; ca. 19.0-26.5 cal kyr BP; Clark et al., 2009;Li et al., 2022) or in regions with limited modern samples (e.g., extratropical Asia). Furthermore, growing human influence on the landscape since the mid to late Holocene, especially in densely settled regions in Europe, contributed to gaps within the potential bioclimatic space of taxa and probably also led to extinction events, especially for disturbance-dependent taxa (Zanon et al., 2018). We report the analog distance for each sample to help identify such situations. From our assessments, we revealed that the analogs' quality is rather good overall, at least for the Holocene, except for western Europe and the British Isles in particular (Fig. 4).
In contrast to MAT, WA-PLS (and most regression techniques in general) model relationships between pollen and climate and are, as such, less sensitive to the low-analog situations (Birks et al., 2010). They are, however, based on some modeling assumptions, such as the unimodality of the response of the pollen taxa to climate (ter Braak and Juggins, 1993). This condition is not always met at the continental scale, primarily because of the limited taxonomic resolution of pollen data that merges several plant species with distinct climate requirements as one single pollen taxon. WA-PLS_tailored has the same limitation, but it has the advantage of reducing the influence of the correlation between variables when reconstructing, for instance, temperature and precipitation. This may be particularly relevant for regions with a temperature-moisture-driven circulation system, such as the East Asian summer monsoon (EASM), which can heavily affect precipitation patterns in certain regions (Herzschuh et al., 2019). Using WA-PLS_tailored also increases the number of records that pass a significance level of p < 0.1 (Telford and Birks, 2011). Providing several reconstructions based on different assumptions also allows exploring, even if only partially, the uncertainties associated with the modeling assumptions (e.g., MAT vs. WA-PLS, the number of analogs, and the type of metric used to compare pollen samples).
The significance tests (see Telford and Birks, 2011) revealed a rather low percentage of reconstructions to be substantial (p < 0.1). However, a failed significance test does not necessarily mean that the reconstruction is not reliable, but the results should be treated more cautiously, as the Telford-Birks test is rather conservative (Luoto et al., 2014;Hébert et al., 2022). Several reasons for possible false negative errors are reported and discussed in the literature, including the test supposedly being sensitive to the size of the training data, a low magnitude of an input climate signal, the trajectory of the core samples through calibration space, or poor analog situations (Luoto et al., 2014;Self et al., 2015;Andrén et al., 2015;Hébert et al., 2022).
All reconstruction methods used in this study heavily rely on extensive collections of modern assemblage data covering diverse climatic and environmental gradients and are applicable on a broad spatial scale. As discussed, all the methods may struggle with some intrinsic characteristics of pollen data and of pollen compilations, including complex species responses, sensitivity to spatial autocorrelation, and limited analogs that may produce poor results in so-called "quantification deserts" (Chevalier, 2019), where fossil pollen is hardly preserved or nearby modern surface pollen samples are missing (Chevalier et al., 2020). However, we designed our datasets so that more methods can be included in our reconstruction scripts (https://doi.org/10.5281/zenodo.7887565; Herzschuh et al., 2023b), such as CREST, an approach that combines presence-only occurrence data from species distribu- Figure 11. Correlation between the time series of the three different reconstruction methods used, namely the weighted averaging partial least squares using a global training set (WA-PLS), WA-PLS using a training set with a limited modern climate range (WA-PLS_tailored), and the modern analog technique (MAT), for the three climate variables of mean July temperature (T July ), mean annual temperature (T ann ), and annual precipitation (P ann ). tion databases instead of modern pollen samples to estimate the responses of pollen taxa to the climate variable to reconstruct to a climate variable (Chevalier et al., 2014;Chevalier, 2022). CREST is, therefore, more independent from the availability of modern pollen samples. Employing inversemodeling through iterative forward modeling (IMIFM; Izumi and Bartlein, 2016) might also be possible in such regions. Its use would be particularly interesting to reconstruct the LGM samples because IMIFM is the only technique that can explicitly take the effect of CO 2 on plants into consideration (Chevalier et al., 2020). The inclusion of CREST and/or IMIFM in such large-scale studies would complement our multi-model reconstruction ensemble by exploring a larger fraction of the "method uncertainty" space in greater detail (e.g., Brewer et al., 2008). Kucera et al. (2005) propose several metrics for a multi-technique approach to assess the uncertainty space, namely correlations between the residuals (observed minus reconstructed values) and between pairs of techniques are used to investigate the similarity in the reconstructions among different techniques. The correlation between the residuals in seasonal reconstructions (e.g., summer and winter temperatures; summer and annual temperatures) can be used to investigate the degree of independence of different seasonal reconstructions. Error rate estimates (RMSEP) determined by the cross-validation of the calibration datasets and the leave-one-out method can be used to compare the calibration of individual transfer function techniques, though it should be considered that error estimates may vary with the choice of the cross-validation procedure (Kucera et al., 2005).

Potential use of LegacyClimate 1.0
Our LegacyPollen 1.0 fossil pollen synthesis (Herzschuh et al., 2022b) contains records from all over the Northern Hemi-sphere extratropics. We used this synthesis to produce our LegacyClimate 1.0 reconstruction dataset, which thus can be used to infer spatiotemporal patterns in climate reconstructions that are not limited to a local or regional scale. Although several hemispheric or global reconstruction studies exist, they have been largely restricted to temperature or have included relatively few records (Marcott et al., 2013;Marsicek et al., 2018;Routson et al., 2019;Kaufman et al., 2020a, b). Our dataset is therefore a valuable addition. It may be used in a multi-proxy approach, to synthesize marine and terrestrial records in order to assess temperature development during the Holocene, and can help to highlight possible interdependencies between oceans and land masses. Globally or hemispherically averaged temperature reconstructions from proxy data indicate peak temperatures during the Holocene thermal maximum around 6000 ka, followed by a pronounced cooling trend toward the late Holocene, which is also visible in our pollen-based reconstructions (Fig. 10). Hence, spatial variability in the Holocene temperature trends (e.g., missing of a pronounced maximum for certain latitudinal bands; delayed thermal maximum on land compared to the ocean) indicate a more local rather than a global Holocene thermal maximum (Kaufman et al., 2020b;Osman et al., 2021;Cartapanis et al., 2022). In contrast, climate models simulate a monotonic warming throughout the Holocene, which resulted in the Holocene conundrum debate (Liu et al., 2014). The debate has since progressed and hints at discrepancies in data-model comparisons due to spatiotemporal dynamics related to heterogeneous responses to climate forcing and feedbacks (i.e., the timing of a Holocene thermal maximum in the Northern Hemisphere extratropics between reconstructions from continental and from marine proxy records; Cartapanis et al., 2022) and sometimes poor spatial averaging due to unevenly distributed proxies. Proxy-only reconstructions often rely on latitudinal binning and weighting, which makes this approach particularly sensitive to latitudinal bands that contain only sparse spatial coverage and thus do not represent a true global average (Osman et al., 2021). Those spatiotemporal dynamics should be considered in a data-model comparison.
Temperature reconstructions often use either mean annual temperatures (Birks, 2019;Bova et al., 2021) or global mean surface temperatures (Marcott et al., 2013;Marsicek et al., 2018;Kaufman et al., 2020a, b). Despite T ann being more commonly used in multi-proxy comparisons, it might be useful to also consider T July , as on a regional scale the mean July temperature (or in general summer temperature) is more important in high latitudes in particular. However, it is argued that proxy-based climate reconstructions are seasonally biased and therefore might be the reason for the observed proxy-model divergence (Liu et al., 2014;Rehfeld et al., 2016;Kaufman et al., 2020b). In this respect, it might help that we provide T July along with T ann reconstructions derived from our tailoring approach, which provides the opportunity to assess seasonal impacts on the reconstruction (especially in the high latitudes) in addition to a consistent reconstruction synthesis.
So far, reconstructions of precipitation have not been implemented on a hemispheric scale. The interconnection between temperature and precipitation (Trenberth, 2011) and its spatiotemporal variation across the Northern Hemisphere is therefore an important aspect of evaluating climate models (Wu et al., 2013;Hao et al., 2019;Herzschuh et al., 2022a). A broad-scale quantitative reconstruction of temperature and precipitation would therefore be of great value for evaluating transient climate model experiments such as TraCE-21K (He, 2010). Figure A1. Reconstruction error (shaded blue) and the chronological error (shaded red) around the reconstruction smoothed by the time uncertainty (i.e., when we interpolate at regular time steps for the 1000 realizations and average over the ensemble; dashed white). The original reconstruction with the median ages is also shown for comparison (solid white); this underlines the point that averaging over the age models only preserves the low frequencies but (unrealistically) smooths out the high frequencies. Figure A2. Example to illustrate the effect of tailoring the modern dataset for the location of Yellow Dog Pond (dataset ID 3047) in eastern North America. (a, b) Reconstruction of T July and P ann with WA-PLS (red) and WA-PLS_tailored (blue). (c, d) Correlation of T July and P ann in the modern dataset and the effect of tailoring the modern dataset (indicated with the red box). Correlations are given for non-tailored (red) and tailored (blue) data.

Appendix A
Author contributions. UH designed the study and reconstruction dataset. CL and TB compiled the metadata and the harmonized pollen dataset. TB wrote the R scripts and ran the analyses under the supervision of UH. UH, TB, and MC wrote the first draft. All authors discussed the results and contributed to the final version of this paper.

Competing interests.
The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. We would like to express our gratitude to all the palynologists and geologists, who, either directly or indirectly by providing their work to the Neotoma Paleoecology Database, contributed pollen data and chronologies to the dataset. The work of the data contributors, the data stewards, and the Neotoma community is gratefully acknowledged. We thank Andrej Andreev, Mareike Wieczorek, and Birgit Heim from the Alfred Wegener Institute for providing information on pollen records and data uploads. We also thank Cathy Jenks for copy-editing a previous version of this work.
Review statement. This paper was edited by Hanqin Tian and reviewed by Patrick Bartlein and one anonymous referee.