LegacyClimate 1.0: a dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the last 30 kyr and beyond

Herzschuh, Ulrike; Böhmer, Thomas; Li, Chenzhi; Chevalier, Manuel; Hébert, Raphaël; Dallmeyer, Anne; Cao, Xianyong; Bigelow, Nancy H.; Nazarova, Larisa; Novenko, Elena Y.; Park, Jungjae; Peyron, Odile; Rudaya, Natalia A.; Schlütz, Frank; Shumilovskikh, Lyudmila S.; Tarasov, Pavel E.; Wang, Yongbo; Wen, Ruilin; Xu, Qinghai; Zheng, Zhuo

doi:https://doi.org/10.5194/essd-15-2235-2023

Articles | Volume 15, issue 6

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data description paper

02 Jun 2023

Final revised paper (published on 02 Jun 2023)
Preprint (discussion started on 10 Feb 2022)

Interactive discussion

Status: closed

RC1:
'Comment on essd-2022-38', Anonymous Referee #1, 07 Mar 2022
Review of LegacyClimate 1.0: A dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the late Quaternary by Herzschuh et al.

The authors provide temperature and precipitation reconstructions based on pollen assemblage time series. They provide three different types of reconstructions and provide a clear description of the methods. The dataset is highly valuable and the manuscript is clearly written and the figures are of high quality (if sometimes a bit small). The manuscript seems to be part of a set of articles (a trilogy?): a manuscript describing the raw pollen data, a manuscript dedicated exclusively to the chronology and the present manuscript about the pollen-derived climate reconstructions. I can to some degree follow the rationale of the sequence, but I think this (last?) article would benefit from a closer integration with the article describing the chronology. The chronology, and importantly its uncertainty, is an integral part of the climate reconstruction that the authors present here. In addition, I have some further recommendations and points that require clarification in a revised manuscript.

Major issues

Integration with chronology:

This manuscript focuses entirely on the reconstruction of temperature and precipitation, yet the time series also have a chronology with associated uncertainty. By separating these two aspects into two manuscripts it becomes unclear how the full uncertainty of the paleoclimate time series can be derived. Looking at the data (on pangaea.de) it seems that the provided error only accounts for the reconstruction, not for the chronology. This is not the full story and the manuscript would be tremendously improved if the authors made this third manuscript of the sequence a true integration of the papers on the chronology and the climate reconstruction. In L341-343 the authors even touch on this possibility, but they refrain from taking the logical next step that would make the data product more useful for other researchers.

This means that the first-order analysis of the time series as shown in figures 5 and 6 should include some combined error resulting from the reconstruction and the chronology and a clear description of the methodology to combine these errors. The provided data sets should also contain uncertainties that reflect both the chronological and the reconstruction errors. This is not a complicated step, but would massively improve the value of the data product.
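
One way to read the combined error requested above is as a Monte Carlo propagation: each sample's age and reconstructed value are perturbed within their stated uncertainties, every perturbed series is interpolated onto a common time grid, and the spread of the ensemble at each grid point is taken as the combined error. The Python sketch below is a hypothetical illustration only, not the authors' method. It assumes independent Gaussian age and reconstruction errors, and the function names (`interpolate`, `combined_uncertainty`) are invented for this example.

```python
import random
import statistics


def interpolate(ages, values, target_age):
    """Linearly interpolate values at target_age; ages must be ascending."""
    if target_age <= ages[0]:
        return values[0]
    if target_age >= ages[-1]:
        return values[-1]
    for i in range(1, len(ages)):
        if target_age <= ages[i]:
            a0, a1 = ages[i - 1], ages[i]
            v0, v1 = values[i - 1], values[i]
            if a1 == a0:
                return v1
            frac = (target_age - a0) / (a1 - a0)
            return v0 + frac * (v1 - v0)
    return values[-1]


def combined_uncertainty(ages, age_sd, values, value_sd, grid, n_draws=1000, seed=0):
    """Return {grid age: (ensemble mean, ensemble sd)} from perturbed series.

    Assumption: age and reconstruction errors are independent and Gaussian.
    Sorting the perturbed ages is a crude way to keep stratigraphic order;
    a real age-depth model ensemble would be preferable.
    """
    rng = random.Random(seed)
    draws = {t: [] for t in grid}
    for _ in range(n_draws):
        sim_ages = sorted(rng.gauss(a, s) for a, s in zip(ages, age_sd))
        sim_vals = [rng.gauss(v, s) for v, s in zip(values, value_sd)]
        for t in grid:
            draws[t].append(interpolate(sim_ages, sim_vals, t))
    return {t: (statistics.mean(d), statistics.stdev(d)) for t, d in draws.items()}
```

With all uncertainties set to zero the ensemble collapses onto the plain interpolated series, which is a convenient sanity check before applying real age and reconstruction errors.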

Meaning of reconstruction differences:

The authors also mention other reconstruction methods (L372), which begs the question why MAT and WA-PLS were chosen. Only because they are widely used, or because they yield superior results?

In addition, the authors provide three different reconstructions for each time series. What I miss is a discussion of how these different reconstructions can be used. Does the difference between them represent additional uncertainty on the reconstruction? How should the user include or use this information? Are certain reconstruction methods better than others? If so, which is to be preferred? If not, how can the (information from the) reconstructions be combined?

Reconstruction quality:

The CCA suggests that only some part of the variance in the training sets is explained by T and precip and the significance testing indicates that a shocking 60-70 % of the reconstructions are basically noise. Whilst the authors go some way and filter out the time series that do not pass the significance test, I feel that the authors hardly mention this, let alone discuss it. I also realise that this manuscript should not analyse the data, but perhaps some discussion is in place, and the different ways in which (pollen) assemblages could be used in paleoclimate science, including forward modelling, could be highlighted.

Land use issues/human influence:

Some of the time series must bear an imprint of human influence. Can the authors briefly discuss to what degree and if and how this influences the reconstructions?

Insufficient explanation and detail in the methods:

2,000 km radius for training set. Please explain why this was done and why the distance is (globally) appropriate.

Why were seven analogues used for MAT? Are the reconstructions weighted to analogue quality, or simply the arithmetic mean of the seven closest analogues?

How is the calibration error determined? Was spatial autocorrelation taken into account? From the code it seems that this is not the case, why?

What is the sample-specific error based on? Why is this provided and not the calibration error?

If I am correct, the tailoring approach serves the purpose of reducing the effect of co-variation between T and P. Please mention this earlier in the methods. I understand the point and that this goes some way to alleviating the problem. But what is done in cases where the correlation is not reduced? After all, there still is a large proportion of the sites for which there is a marked correlation in the training set. Some discussion would be appropriate here.

Please provide more detail on the significance test. How were the random environmental fields generated? By simple permutation, or taking spatial correlation into account? Why?

Why were the tailoring and the significance testing not applied to the MAT reconstructions?

The CCA seems to be the first step in the development of the transfer function model to demonstrate that T and Precip really explain the variance in the assemblages. Would it not be better placed earlier in the description? And why are the implications barely discussed?

How are poor analogues treated? Do they occur at all after the lumping? There is some discussion in L327-332, but it is unclear what the user of the data can do with this information.
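
The questions above about the seven MAT analogues (weighted or arithmetic mean) and about poor analogues can be made concrete with a small Python sketch. It is a generic illustration of the modern analogue technique with the squared-chord distance, not the code used for LegacyClimate 1.0. The function names and the inverse-distance weighting are assumptions made for this example. The returned minimum dissimilarity is the quantity a user could compare against a chosen threshold to flag poor analogues.

```python
import math


def squared_chord_distance(p, q):
    """Squared-chord distance between two assemblages given as proportions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def mat_reconstruct(fossil, modern_assemblages, modern_climate, k=7, weighted=False):
    """Estimate climate for one fossil sample from its k closest modern analogues.

    Returns (estimate, minimum dissimilarity). With weighted=False the estimate
    is the arithmetic mean of the analogues' climate values; with weighted=True
    it is an inverse-distance weighted mean (an assumption of this sketch).
    """
    nearest = sorted(
        (squared_chord_distance(fossil, m), c)
        for m, c in zip(modern_assemblages, modern_climate)
    )[:k]
    if weighted:
        eps = 1e-12  # avoids division by zero for a perfect analogue
        weights = [1.0 / (d + eps) for d, _ in nearest]
        estimate = sum(w * c for w, (_, c) in zip(weights, nearest)) / sum(weights)
    else:
        estimate = sum(c for _, c in nearest) / len(nearest)
    return estimate, nearest[0][0]
```

The two averaging choices can differ noticeably when one analogue is much closer than the rest, which is why the question of weighting matters for interpreting the MAT output.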

Minor issues:

L3: reconsider the use of “late quaternary” in the title. The meaning is actually rather vague and something along the lines of 30,000 years would be more informative.

L108: not sure what the policy is to refer to submitted manuscripts.

L131: please provide a bit more detail on WorldClim 2. For instance, what are the data based on, over what period are the data integrated, etc.

L385: crucially, this manuscript does not describe a fossil pollen data set, but a data set of temperature and precip

L402-404: this seems a somewhat dangerous statement. Are the two reconstructions really independent?

Why is the x axis of figure 6 on a log scale?

Whilst glancing through the code I missed the significance testing and the CCA. (But thumbs up for sharing the code.)
Citation: https://doi.org/10.5194/essd-2022-38-RC1
- AC1: 'Reply on RC1', Chenzhi Li, 17 Oct 2022
  
  The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/essd-2022-38-AC1
RC2:
'Comment on essd-2022-38', Patrick Bartlein, 09 Mar 2022

General comments:

This paper describes a set of pollen-based climate reconstructions for the Northern Hemisphere from the LGM to present. The paper is obviously one of three, one describing the pollen data (Herzschuh et al., submitted, which I couldn’t find), another describing the chronology (Li et al., 2022, ESSD-Disc), and this one, describing the reconstructions. There are obvious redundancies among the papers, and I think readers and potential users of the data will find it frustrating to have to track down three papers.

Overall, the paper is not that well organized, with motivations for some of the analyses (e.g. CCA) not appearing until the results section (Section 4, titled “Dataset assessment”), and tutorial material on the nature of pollen data as a palaeo-archive appearing in the discussion, as opposed to the introduction (and presumably also in the first paper of the series, which, with good cross-referencing among the papers, would make it superfluous here). Perhaps this disorganization arose in parting-out the papers.

There are several overarching issues and questions that should be addressed:

Why were January and annual temperature and annual precipitation chosen as the targets for reconstruction? A more appropriate set of climate variables might be those that mechanistically control vegetation like winter cold, summer warmth, and moisture stress. A lot of the paper is devoted to dealing with the obvious correlation between annual temperature and precipitation, but it is never actually established why this is an issue.

What was the role of the canonical correlation analysis? To simply explore the data perhaps, but in fact it represents an alternative reconstruction approach. In any case, it’s neither clear what the purpose of the analysis is, nor are the results fully explained.

The two reconstruction approaches, weighted-averaging – partial-least-squares (WA-PLS) and the modern analogue technique (MAT), may be frequently applied, but they are not without issues themselves. WA-PLS, as is the case with some other methods, tends to “compress” the reconstructions toward the center of the distribution of the climate data (see Liu et al., 2020, Proc. Royal Soc. A, https://doi.org/10.1098/rspa.2020.0346). This will reduce the amplitude of the time series of the reconstructions. MAT suffers from the no-analogue problem, typically diagnosed by looking at the dissimilarities. The performance of the two approaches is examined in Fig. 3, but there is no attempt to account for the obvious spatial patterns.

A number of the analysis steps are not explained much at all, with the results just briefly described before moving on. In particular, the significance testing in Section 4.2 isn’t fully explained: What is the “take-home message”? What does this analysis say about the usefulness of the reconstructions?

The results are described in terms of mid-Holocene minus present (1.5 to 0.5 ka) long-term mean differences, and some unusual time series plots, but there is no attempt to assess the reasonableness of the reconstructions with respect to paleoclimatic first principles or to compare them with simulations or independent observations.

I think these issues are all basically addressable, and with a little overhauling (i.e. no new analysis, just more complete explanation and discussion), the paper(s) will make a useful contribution.

Specific comments:

line 62: “climate proxy synthesis studies”. Do you mean “syntheses of climate reconstructions” or “syntheses of climate proxies” (i.e. the pollen data)? It’s the former that can be directly compared with climate-model output.

line 71: “The evaluation of climate model outputs…” It’s actually the climate models that are being evaluated in data-model comparisons (of simulations and observations or reconstructions).

line 73: “strong changes in the climate driver” Are you alluding to changes in GHGs during the instrumental record? Changes in insolation, ice-sheet distribution and size, and GHGs between the LGM and present are much larger. For example, the companion CMIP experiment to the LGM is the 4xCO₂ experiment. CO₂ has yet to double from pre-industrial levels.

line 74: “The extratropical Northern Hemisphere … complex spatial and temporal … patterns.” Well, yes, but it’s also where most of the pollen data is from. I don’t think you need to motivate focusing on the Northern Hemisphere extratropics.

line 90: “Regarding the prevalence…”. Just say “Pollen data from … have been used…”

line 94: “high resolution”. Temporal? Spatial? Also, the last millennium is part of the Holocene, and the late-Quaternary, so you might get some push-back from dendroclimatologists about this notion.

line 102: delete “the large” (I think we know extratropical Asia is a large area.)

line 103: Whitmore et al. (2005) describes the modern pollen (and climate) data set for North America, not (paleo) precipitation reconstructions.

line 108: If “Herzschuh et al., submitted” is “LegacyPollen 1.0: A taxonomically harmonized global…” then how is that different from this paper (and the data sets on Zenodo)? Does it describe just the fossil-pollen data, or the modern data set too?

line 110: “Li et al., 2022)”. So there are three papers, 1) the pollen data set, 2) new chronologies, and 3) this paper, right? Why not just say that?

line 116: Why reconstruct temperature and precipitation, as opposed to climate variables that are mechanistically related to vegetation?

line 136: “For consistency with the amount (number?) of taxa…”. This needs to be a little better explained. Why 70 taxa (except for tradition)?

line 147: “2000 km radius”. Why 2000 km?

line 150: “metrics”. Meaning something other than just the squared-chord distance?

line 151: “square-root transformed pollen percentages”. It might be worth pointing out that the same transformation is embedded in the use of the squared-chord distance dissimilarity measure in the MAT approach.

line 156: “co-variation”. Why is this an issue? It might be the case that covariation among predictands wouldn’t be an issue if they were mechanistically related to vegetation, as in the case of variables like MTCO and GDD (Wei et al., 2020, Ecology http://dx.doi.org/10.17864/1947.194).

line 161: “… partialling out the respective other variable”. Please explain.

line 161: “We applied a Canonical Correlation Analysis…”. What were the community, constraining, and conditioning matrices in this analysis? More to the point, what was the objective of this analysis?

line 164: “the ratio … was determined…”. Why and for what purpose?

line 191: Define “RMSEP” on first use in the text.

lines 190-220: What accounts for the spatial variations in RMSEPs? Data density? Data quality (of both the pollen and climate data)? Confounding environmental factors?

line 221: “significance test”. Of what? What hypothesis does the Telford and Birks test address?

line 241: “we subtracted those means from every record”. There are two mean values (6.5 to 5.5 ka and 1.5 to 0.5 ka), and “every record” implies to me the whole data set, LGM to present. Aren’t you just looking at the difference between those two mean values? (And why 1.5 to 0.5 ka?)

line 243: “warmer and drier” Than what? (Which time period is the warmer and drier one?). Throughout this paragraph, the sense of change in climate has to be made explicit. For parallelism, you should adopt a standard way of expressing the changes, e.g. “warmer than present in the mid-Holocene” or “cooling from the mid-Holocene to present” but don’t mix states and trends.

line 250: What’s a “more gradual pattern”?

Figure 6: What exactly is plotted here? Why use a log age axis? An alternative depiction of all of the reconstructions, and their temporal and latitudinal variations, would be a Hovmöller diagram.

Figure 8: I guess we’re supposed to see that there are more correlation coefficients between temperature and precipitation close to zero in the “tailored” analyses. I’ve got nothing against violin plots, but I think a standard histogram would work a lot better.

line 301+: What are the implications of these statistics and their spatial patterns?

lines 315-343: This tutorial on pollen data, chronologies, etc. should probably be in the introduction, not the discussion.

line 378: “numerical mechanisms … reduce the reliability” Please explain.

line 410: “TraCE 21k” is a transient experiment. The model used was CCM 3.
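
The comments on lines 241 and 243 above concern how the mid-Holocene anomaly is defined and which sign it carries. A minimal Python sketch of one unambiguous convention is given here: the anomaly is the mean over 6.5 to 5.5 ka minus the mean over 1.5 to 0.5 ka, so a positive temperature value means "warmer than the reference period in the mid-Holocene". The function names are hypothetical and the code is not taken from the authors' scripts.

```python
def window_mean(ages_ka, values, start_ka, end_ka):
    """Mean of values whose ages fall within [end_ka, start_ka]; None if empty."""
    selected = [v for a, v in zip(ages_ka, values) if end_ka <= a <= start_ka]
    if not selected:
        return None
    return sum(selected) / len(selected)


def mid_holocene_anomaly(ages_ka, values, mh=(6.5, 5.5), ref=(1.5, 0.5)):
    """Mid-Holocene window mean minus reference window mean.

    Positive result: the mid-Holocene value is higher than the reference value.
    Returns None when either window contains no samples.
    """
    mh_mean = window_mean(ages_ka, values, *mh)
    ref_mean = window_mean(ages_ka, values, *ref)
    if mh_mean is None or ref_mean is None:
        return None
    return mh_mean - ref_mean
```

Records that lack samples in either window return `None` rather than a number, so they drop out of a map or summary explicitly instead of silently contributing a biased value.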

Code and data:

I was able to run the example R code without problems. However, the data sets, described and labelled (via the extension) as .csv files (comma-separated values), are instead tab-separated files, which usually have the extension “.tab”, or sometimes “.txt”. This situation prevents a user from getting a quick look at the data using a spreadsheet program.
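
Until the file extensions are corrected, the mismatch described above can be worked around when reading the files programmatically. The Python sketch below, using only the standard library, picks the delimiter from the header line instead of trusting the extension. The helper names and the column names in the usage example are hypothetical and do not reflect the actual dataset headers.

```python
import csv
import io


def detect_delimiter(header_line, candidates=("\t", ",", ";")):
    """Return the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def read_delimited(text):
    """Parse delimited text into a list of dicts, whatever the file extension says."""
    header = text.splitlines()[0]
    delimiter = detect_delimiter(header)
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    # Hypothetical tab-separated content stored in a file named "*.csv"
    sample = "Dataset_ID\tAge\tTJuly\n1\t500\t15.2\n"
    for row in read_delimited(sample):
        print(row)
```

Users of pandas can get the same effect by passing the separator explicitly, for example `pandas.read_csv(path, sep="\t")`.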

P.J. Bartlein

Citation: https://doi.org/10.5194/essd-2022-38-RC2
- AC2: 'Reply on RC2', Chenzhi Li, 17 Oct 2022
  
  The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/essd-2022-38-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Chenzhi Li on behalf of the Authors (20 Oct 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (21 Oct 2022) by Hanqin Tian

RR by Anonymous Referee #1 (25 Nov 2022)

Suggestions for revision or reasons for rejection

This is my second review of Herzschuh et al.’s manuscript “LegacyClimate 1.0: A dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the last 30 ka and beyond”.

In general the authors have done a good job incorporating the comments raised in the previous round. However, I would recommend that more of the response to the review is included in the main text as I think this would make the reasoning for several analysis steps easier to follow.

There are also some points that require additional attention. Most of these I already raised in the previous round of review and I think the manuscript would still benefit from some more discussion and guidance for the user of these data.

The choice of reconstruction methods remains a bit vague and could go beyond stating that these are the most commonly used methods. Surely the authors have good scientific reasons to use these. The argument that by providing the raw pollen data, everyone can make their own reconstruction seems to either downplay the importance of carefully evaluating such reconstructions or make the current manuscript superfluous. As such, I find that a bit of a slippery slope and I encourage the authors to clearly state why they chose these methods.

Towards the end of the revised manuscript the authors mention other reconstruction methods and that they could help to “explore a larger fraction of the “method uncertainty” space”. Exactly how this could be done remains unclear and the authors should more clearly outline an approach on how the difference between the methods can be used to evaluate the reconstructions (see e.g. Kucera et al. 2005).

The authors have carried out several analyses which can be used to evaluate the reconstructions and they should be complimented for this rigorous approach. They provide three different reconstructions, information on the transfer function performance (from the CCA), information about the analogue quality, on the significance of the reconstruction, etc., and indicate that all this information can be used to assess the reconstructions. However, I miss clear guidelines of how this should be done, or what the authors think is the best approach. All these quality measures are provided but in the end not used (e.g. figure 10 contains all time series from a single reconstruction method), so what is the point exactly? I suggest that the authors provide instructions on how to use the data, e.g. indicate which lambda ratios should be omitted, at which analogue distance researchers should ignore the reconstruction, how to use the results of the significance test, how to interpret the difference between the methods, what to do with sites that show marked human influence, etc. Alternatively, they should provide a single reconstruction for each site (with uncertainty) that they think is best (with of course an explanation of the reason why).
The authors explicitly state that the purpose of these reconstructions is to evaluate climate models, but without clear instructions it is unclear how these data can be used for this purpose. The data certainly cannot be used "out of the box" but require processing. This is fine of course and almost always the case with complex data like palaeoclimate data, but it means that users need to be provided with clear guidelines and examples.

Finally, the authors assess the influence of human land use on their reconstructions by looking at the proportion of certain pollen types in the time series. This seems a reasonable approach, as far as I can judge, but the influence of humans not only affects the time series, but also the pollen data set and the climatic variables used for calibration. Perhaps the calibration even more? This is alluded to on page 34, but not really explored. Is it not possible to remove sites with high human influence from the training set? Or are there other approaches, such as (perhaps an outrageous suggestion) calibrating using deeper/older samples assuming relatively stable late Holocene conditions?

And a related question, to what degree are the climate variables averaged over the period between 1970 and 2000 representative of the conditions during deposition of the pollen?

I also imagine that human influence on the core top data also affects the identification of analogues and wonder if it could be the reason for the low analogue quality in Western Europe and the British Isles (page 35)?

These issues should be addressed in a revision.

Minor comments (page numbers refer to version with tracked changes)

Title: …”and beyond.” Why not state the exact duration of the time covered by the synthesis?
Page 4: Holocene conundrum. Is there still a conundrum, or is there mechanistic value in comparing global mean temperatures? The debate has progressed since Liu et al. (Osman et al. 2021; Cartapanis et al. 2022; Kaufman et al. 2020).
Page 4: “Pollen data are the only land-derived proxy … Quaternary period”. I suggest being a bit more specific and toning this down. There is no a priori reason why one could not evaluate a model based on a single or a few observations. High spatio-temporal coverage allows one to investigate different aspects, but I don’t think there is a reason to be offensive to other proxy types.
Page 4-5: “MAT and WA-PLS rely on extensive collections of modern training data.” Does that not hold for any transfer-function based reconstruction?
Page 6: PANGAEA is a data publisher, or repository, not a data base.
Page 6: “We restricted the analyses to the 70 most common taxa to reduce computational power…” this seems odd wording; why would you like to reduce computational power? Is demand meant? More importantly, how was this tested? Please provide details.
Page 8: “... to measure how well the target environmental variable is strongly related…” delete strongly.
Page 8: please provide details of what the minDC function actually does. What are these probability thresholds and what are they based on?
Page 9: significance test. I asked before, how were the random environmental fields generated? Was it by simple permutation or was spatial autocorrelation taken into account? If the environment is spatially autocorrelated, which I imagine is likely the case, a red-noise null should be used instead of the default white-noise null.
Page 9: “In addition, we calculated the correlation between WA-PLS reconstruction of …” please provide a rationale for this analysis that helps to understand why this has been done and how the results should be interpreted.
Page 9: “To ease data handling, the dataset files are separated into…” I disagree that splitting a single file into more than one eases data handling. I would argue that it is easier to filter a data set than combine different ones as I don’t need to load multiple files.
Page 11: “Minimum dissimilarities between modern pollen assemblages and fossil pollen assemblages for each site for MAT” Why not provide this information for each sample, rather than for each site? Like this it does not allow for meaningful filtering of the data since part of a time series may have poor analogues.
Page 11: Located in instead of located from.
Page 12/Fig 1: some of the fossil samples seem to come from marine sites (e.g. the Bay of Biscay or the Caspian Sea). Is that correct and if so, what do these time series tell us about vegetation and climate at that location?
Page 14/Fig 3: and what do values below 1 mean, after all the scale goes down to 0, implying that for some sites none of the variance is constrained?
Page 21/Fig 6: better to show the full range of p values. Or somehow indicate where the sites are that did not pass the test so that we can assess if there are any spatial patterns.
Page 21/Table 2: there are no values for MAT.
Page 22: “only in single records” please rephrase. Single means one.
Page 22: “High Plantaginaceae correlate with low TJuly in Central Europe indicating potential biases” please explain why only negative correlations are a problem, the reasoning is unclear to the non-pollen specialist.
Page 23/Fig 7: the colour scale is not colourblind friendly.
Page 28/Fig 10: which method was used for the reconstructions shown here?
Page 34: “We a priori selected … Tann and Tjuly” it would be good to move this section to earlier in the discussion.
Page 36: “Climate reconstruction data sets like LegacyClimate 1.0 thus…” this is not a logical conclusion, it is only valid for reconstructions with a similar coverage as LegacyPollen 1.0. Reword.
Page 37: “Temperature reconstructions from proxy data indicate peak temperatures during the Holocene Thermal Maximum around 6000 years BP followed by a pronounced cooling trend toward the late Holocene (Kaufman et al., 2020b)” I think this is an oversimplification of the Kaufman reconstruction. They also highlight spatial variability in the Holocene temperature trends. As do Osman et al and Cartapanis et al.
Page 37: “Temperature reconstructions are often derived from sea-surface temperatures as either mean annual temperatures (Birks, 2019; Bova et al., 2021) or global mean surface temperature (Marcott et al., 2013; Marsicek et al., 2018; Kaufman et al., 2020a and 2020b).” That seems an oversimplification of the cited literature. Some of the cited studies included seawater temperature estimates, but not all are exclusively based on SST.
Page 37: “In this respect, it might help…” how confident are the authors about the independence of the Tann and Tjuly reconstructions? In other words, can we really reconstruct seasonality?
Page 37: “So far … hemispheric scale” is that statement still true after publication of https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2022GL099730 by the same lead author?
Page 37: The last two paragraphs of section 5 seem to stand on their own and should be better integrated with the remaining text.
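
The distinction drawn in the page 9 comment above between a white-noise and a red-noise null can be illustrated with a short Python sketch. A white-noise field draws each site's value independently, whereas the red-noise field here is built by smoothing white noise with a distance-decaying kernel, so that nearby sites receive similar values. This is one simple construction chosen for illustration, not the procedure of the significance test used in the manuscript. The function names, the exponential kernel, the planar coordinates, and the length scale are all assumptions of this example.

```python
import math
import random


def white_noise_field(n_sites, rng):
    """Independent standard-normal value at every site (no spatial structure)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_sites)]


def red_noise_field(coords, length_scale, rng):
    """Spatially autocorrelated field: distance-weighted sum of white noise.

    coords are planar (x, y) pairs; real site data would need great-circle
    distances. Each value is rescaled to unit variance.
    """
    z = white_noise_field(len(coords), rng)
    field = []
    for xi, yi in coords:
        weights = [
            math.exp(-math.hypot(xi - xj, yi - yj) / length_scale)
            for xj, yj in coords
        ]
        norm = math.sqrt(sum(w * w for w in weights))
        field.append(sum(w * zj for w, zj in zip(weights, z)) / norm)
    return field


def lag1_correlation(series):
    """Pearson correlation between each value and its neighbour along a transect."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Along a transect of evenly spaced sites, `lag1_correlation` is close to zero for the white-noise field and high for the red-noise field. A reconstruction tested against the former therefore faces a weaker null than one tested against the latter when the true environment is spatially autocorrelated.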

Cartapanis, Olivier, Lukas Jonkers, Paola Moffa-Sanchez, Samuel L. Jaccard, and Anne de Vernal. 2022. “Complex Spatio-Temporal Structure of the Holocene Thermal Maximum.” Nature Communications 13 (1): 1–11.
Kaufman, Darrell, Nicholas McKay, Cody Routson, Michael Erb, Christoph Dätwyler, Philipp S. Sommer, Oliver Heiri, and Basil Davis. 2020. “Holocene Global Mean Surface Temperature, a Multi-Method Reconstruction Approach.” Scientific Data 7 (1): 201.
Kucera, Michal, Mara Weinelt, Thorsten Kiefer, Uwe Pflaumann, Angela Hayes, Martin Weinelt, Min-Te Chen, et al. 2005. “Reconstruction of Sea-Surface Temperatures from Assemblages of Planktonic Foraminifera: Multi-Technique Approach Based on Geographically Constrained Calibration Data Sets and Its Application to Glacial Atlantic and Pacific Oceans.” Quaternary Science Reviews 24 (7-9): 951–98.
Osman, Matthew B., Jessica E. Tierney, Jiang Zhu, Robert Tardif, Gregory J. Hakim, Jonathan King, and Christopher J. Poulsen. 2021. “Globally Resolved Surface Temperatures since the Last Glacial Maximum.” Nature 599 (7884): 239–44.

ED: Reconsider after major revisions (28 Nov 2022) by Hanqin Tian

AR by Chenzhi Li on behalf of the Authors (16 Mar 2023) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (21 Mar 2023) by Hanqin Tian

RR by Anonymous Referee #1 (30 Mar 2023)

ED: Publish as is (05 Apr 2023) by Hanqin Tian

AR by Chenzhi Li on behalf of the Authors (13 Apr 2023) Manuscript

Short summary

Climate reconstruction from proxy data can help evaluate climate models. We present pollen-based reconstructions of mean July temperature, mean annual temperature, and annual precipitation from 2594 pollen records from the Northern Hemisphere, using three reconstruction methods (WA-PLS, WA-PLS_tailored, and MAT). Since no global or hemispheric synthesis of quantitative precipitation changes is available for the Holocene so far, this dataset will be of great value to the geoscientific community.