the Creative Commons Attribution 4.0 License.
LegacyClimate 1.0: a dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the last 30 kyr and beyond
Ulrike Herzschuh
Thomas Böhmer
Chenzhi Li
Manuel Chevalier
Raphaël Hébert
Anne Dallmeyer
Xianyong Cao
Nancy H. Bigelow
Larisa Nazarova
Elena Y. Novenko
Jungjae Park
Odile Peyron
Natalia A. Rudaya
Frank Schlütz
Lyudmila S. Shumilovskikh
Pavel E. Tarasov
Yongbo Wang
Ruilin Wen
Qinghai Xu
Zhuo Zheng
- Final revised paper (published on 02 Jun 2023)
- Preprint (discussion started on 10 Feb 2022)
Interactive discussion
Status: closed
RC1: 'Comment on essd-2022-38', Anonymous Referee #1, 07 Mar 2022
Review of LegacyClimate 1.0: A dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the late Quaternary by Herzschuh et al.
The authors provide temperature and precipitation reconstructions based on pollen assemblage time series. They provide three different types of reconstructions and give a clear description of the methods. The dataset is highly valuable, the manuscript is clearly written, and the figures are of high quality (if sometimes a bit small). The manuscript seems to be part of a set of articles (a trilogy?): a manuscript describing the raw pollen data, a manuscript dedicated exclusively to the chronology, and the present manuscript about the pollen-derived climate reconstructions. I can to some degree follow the rationale of the sequence, but I think this (last?) article would benefit from a closer integration with the article describing the chronology. The chronology, and importantly its uncertainty, is an integral part of the climate reconstruction that the authors present here. In addition, I have some further recommendations and points that require clarification in a revised manuscript.
Major issues
Integration with chronology:
this manuscript focuses entirely on the reconstruction of temperature and precipitation, yet the time series also have a chronology with associated uncertainty. By separating these two aspects into two manuscripts it becomes unclear how the full uncertainty of the paleoclimate time series can be derived. Looking at the data (on pangaea.de) it seems that the provided error only accounts for the reconstruction, not for the chronology. This is not the full story and the manuscript would be tremendously improved if the authors made this third manuscript of the sequence a true integration of the papers on the chronology and the climate reconstruction. In L341-343 the authors even touch on this possibility, but they refrain from taking the logical next step that would make the data product more useful for other researchers.
This means that the first-order analysis of the time series as shown in figures 5 and 6 should include a combined error resulting from the reconstruction and the chronology, together with a clear description of the methodology used to combine these errors. The provided data sets should also contain uncertainties that reflect both the chronological and the reconstruction errors. This is not a complicated step, but it would massively improve the value of the data product.
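A minimal sketch of how both error sources could be propagated together, assuming an ensemble of plausible ages for every sample (for example, draws from an age-depth model) and Gaussian sample-specific reconstruction errors. The function name, its inputs and the synthetic record are hypothetical and are not taken from the LegacyClimate code. Each Monte Carlo draw pairs one chronology with one perturbed reconstruction and interpolates it onto a common time axis; quantiles across draws then give a combined uncertainty band.

```python
import numpy as np


def combine_errors(age_ensemble, recon_mean, recon_se, target_ages,
                   n_draws=1000, seed=0):
    """Propagate chronological and reconstruction uncertainty by Monte Carlo.

    age_ensemble : (n_age_draws, n_samples) plausible ages (ka) for each pollen
                   sample, e.g. draws from an age-depth model (assumed input).
    recon_mean, recon_se : (n_samples,) reconstructed value and its
                   sample-specific standard error (errors assumed Gaussian).
    target_ages  : common time axis (ka) onto which every draw is interpolated.
    Returns the 2.5 %, 50 % and 97.5 % quantiles at each target age.
    """
    rng = np.random.default_rng(seed)
    n_age_draws = age_ensemble.shape[0]
    draws = np.empty((n_draws, target_ages.size))
    for i in range(n_draws):
        ages = age_ensemble[rng.integers(n_age_draws)]   # pick one chronology
        climate = rng.normal(recon_mean, recon_se)       # perturb the reconstruction
        order = np.argsort(ages)                         # np.interp needs increasing ages
        draws[i] = np.interp(target_ages, ages[order], climate[order],
                             left=np.nan, right=np.nan)  # no extrapolation
    return np.nanquantile(draws, [0.025, 0.5, 0.975], axis=0)


# Hypothetical record: 40 samples between 0.5 and 12 ka, an age-model spread of
# 0.2 kyr and a 1.5 degC sample-specific error on annual temperature.
rng = np.random.default_rng(1)
median_ages = np.linspace(0.5, 12.0, 40)
age_ens = median_ages + rng.normal(0.0, 0.2, size=(500, 40))
tann = 5.0 + 2.0 * np.sin(median_ages / 3.0)
tann_se = np.full(40, 1.5)
grid = np.linspace(1.0, 11.0, 21)
q = combine_errors(age_ens, tann, tann_se, grid)
```

Comparing this band with one computed from a single fixed chronology would show how much of the total uncertainty comes from the age model.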
Meaning of reconstruction differences:
The authors also mention other reconstruction methods (L372), which raises the question of why MAT and WA-PLS were chosen. Only because they are widely used, or because they yield superior results?
In addition, the authors provide three different reconstructions for each time series. What I miss is a discussion of how these different reconstructions can be used. Does the difference between them represent additional uncertainty on the reconstruction? How should the user include or use this information? Are certain reconstruction methods better than others? If so, which is to be preferred? If not, how can the (information from the) reconstructions be combined?
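One possible way to fold the three reconstructions into a single estimate, sketched here only as an illustration: treat each method's output as a Gaussian with its sample-specific error and pool them as an equal-weight mixture, so that disagreement between methods adds to the uncertainty (law of total variance). Equal weights and Gaussian errors are assumptions, and the function name is hypothetical.

```python
import numpy as np


def pool_reconstructions(means, ses):
    """Pool several reconstructions of the same samples as an equal-weight mixture.

    means, ses : (n_methods, n_samples) arrays, one row per method
                 (e.g. MAT, WA-PLS and tailored WA-PLS).
    Law of total variance: pooled variance = mean within-method variance
    + variance of the method means (ddof=0, i.e. equal weights).
    """
    means = np.asarray(means, dtype=float)
    ses = np.asarray(ses, dtype=float)
    pooled_mean = means.mean(axis=0)
    within = (ses ** 2).mean(axis=0)
    between = means.var(axis=0)
    return pooled_mean, np.sqrt(within + between)


# Hypothetical example: two samples, three methods.
m, s = pool_reconstructions(means=[[10.0, 12.0], [11.0, 12.0], [12.0, 12.0]],
                            ses=[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
print(m, s)  # m is [11, 12]; s is [sqrt(5/3) ~ 1.29, 1.0]
```

Only the first sample's error widens, because the methods disagree there and agree on the second.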
Reconstruction quality:
The CCA suggests that only part of the variance in the training sets is explained by T and precip, and the significance testing indicates that a shocking 60-70 % of the reconstructions are basically noise. Whilst the authors go some way and filter out the time series that do not pass the significance test, I feel that they hardly mention this, let alone discuss it. I also realise that this manuscript should not analyse the data, but perhaps some discussion is in order, and the different ways in which (pollen) assemblages could be used in paleoclimate science, including forward modelling, could be highlighted.
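To make the significance-test point concrete, here is a toy version of the Telford and Birks random-transfer-function idea on synthetic data: the variance in the fossil assemblages explained by the real reconstruction is compared with that explained by reconstructions trained on random variables. Plain weighted averaging stands in for WA-PLS to keep it short, which is an assumption of this sketch, and the random variables are drawn independently of location, i.e. without spatial structure; a spatially autocorrelated alternative would replace the `rng.uniform` draw with a simulated random field. All function names are hypothetical; the R package palaeoSig provides a full implementation (`randomTF`).

```python
import numpy as np


def wa_fit(spp, env):
    """Weighted-average optimum of each taxon along env (training set)."""
    return (spp * env[:, None]).sum(axis=0) / spp.sum(axis=0)


def wa_predict(spp, optima):
    """Abundance-weighted mean of taxon optima for each sample."""
    return (spp * optima).sum(axis=1) / spp.sum(axis=1)


def var_explained(fossil, recon):
    """Share of fossil-assemblage variance explained by a linear (RDA-like)
    fit of every taxon on one reconstruction."""
    Y = fossil - fossil.mean(axis=0)
    x = recon - recon.mean()
    fitted = np.outer(x, x @ Y) / (x @ x)
    return (fitted ** 2).sum() / (Y ** 2).sum()


def random_tf_test(modern, env, fossil, n_random=199, seed=0):
    """Compare the real reconstruction with reconstructions from random variables."""
    rng = np.random.default_rng(seed)
    observed = var_explained(fossil, wa_predict(fossil, wa_fit(modern, env)))
    null = np.empty(n_random)
    for i in range(n_random):
        fake = rng.uniform(size=env.size)   # spatially unstructured random "climate"
        null[i] = var_explained(fossil, wa_predict(fossil, wa_fit(modern, fake)))
    p_value = (1 + np.sum(null >= observed)) / (n_random + 1)
    return observed, p_value


def simulate(gradient, optima, rng, width=6.0):
    """Synthetic pollen proportions from Gaussian taxon responses plus noise."""
    resp = np.exp(-((gradient[:, None] - optima) ** 2) / (2 * width ** 2))
    resp = resp + rng.uniform(0.0, 0.05, resp.shape)
    return resp / resp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(42)
taxon_optima = rng.uniform(-10.0, 25.0, 25)
env = rng.uniform(-10.0, 25.0, 150)                 # modern climate at training sites
modern = simulate(env, taxon_optima, rng)
fossil = simulate(np.linspace(-5.0, 20.0, 60), taxon_optima, rng)  # record driven by the same variable
obs, p = random_tf_test(modern, env, fossil)
print(f"variance explained = {obs:.2f}, p = {p:.3f}")
```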
Land use issues/human influence:
Some of the time series must bear an imprint of human influence. Can the authors briefly discuss whether, to what degree, and how this influences the reconstructions?
Insufficient explanation and detail in the methods:
- 2,000 km radius for training set. Please explain why this was done and why the distance is (globally) appropriate.
- Why were seven analogues used for MAT? Are the reconstructions weighted to analogue quality, or simply the arithmetic mean of the seven closest analogues?
- How is the calibration error determined? Was spatial autocorrelation taken into account? From the code it seems that this is not the case, why?
- What is the sample-specific error based on? Why is this provided and not the calibration error?
- If I am correct, the tailoring approach serves the purpose of reducing the effect of co-variation between T and P. Please mention this earlier in the methods. I understand the point and that this goes some way to alleviating the problem. But what is done in cases where the correlation is not reduced? After all, there still is a large proportion of the sites for which there is a marked correlation in the training set. Some discussion would be appropriate here.
- Please provide more detail on the significance test. How were the random environmental fields generated: by simple permutation, or taking spatial correlation into account? Why?
- Why were the tailoring and the significance testing not applied to the MAT reconstructions?
- The CCA seems to be the first step in the development of the transfer function model to demonstrate that T and Precip really explain the variance in the assemblages. Would it not be better placed earlier in the description? And why are the implications barely discussed?
- How are poor analogues treated? Do they occur at all after the lumping? There is some discussion in L327-332, but it is unclear what the user of the data can do with this information.
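Several of the methods points above (the 2,000 km radius, the seven analogues, weighted versus unweighted means, and spatial autocorrelation in the calibration error) can be illustrated with one small sketch. It is not the authors' implementation: the inverse-dissimilarity weighting, the h-block exclusion distance, the function names and the synthetic training set are all illustrative assumptions. The squared-chord distance is computed on square-root proportions, which is why MAT already carries the square-root transformation; h-block cross-validation drops every training site within `h_km` of the test site, so the RMSEP is not flattered by nearby, similar sites.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km; lat2/lon2 may be arrays."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat = p2 - p1
    dlon = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def squared_chord(sample, training):
    """Squared-chord distance from one assemblage (proportions) to each training
    assemblage: the squared Euclidean distance of square-root proportions."""
    return ((np.sqrt(sample) - np.sqrt(training)) ** 2).sum(axis=1)


def mat_predict(sample, training, env, k=7, weighted=True, exclude=None):
    """MAT estimate from the k closest analogues, optionally weighted by
    inverse dissimilarity; `exclude` masks training sites that may not be used."""
    d = squared_chord(sample, training)
    if exclude is not None:
        d = np.where(exclude, np.inf, d)
    if np.isfinite(d).sum() < k:
        raise ValueError("fewer than k usable analogues")
    nearest = np.argsort(d)[:k]
    if not weighted:
        return env[nearest].mean()                    # plain arithmetic mean
    w = 1.0 / np.maximum(d[nearest], 1e-12)           # guard against identical assemblages
    return (w * env[nearest]).sum() / w.sum()


def hblock_rmsep(training, env, lat, lon, h_km, k=7, weighted=True):
    """Leave-one-out RMSEP that also drops all sites within h_km of the test
    site (h-block cross-validation); h_km=0 gives ordinary leave-one-out."""
    errors = []
    for i in range(len(env)):
        exclude = great_circle_km(lat[i], lon[i], lat, lon) <= h_km  # includes site i
        pred = mat_predict(training[i], training, env, k, weighted, exclude)
        errors.append(pred - env[i])
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic training set (hypothetical): 300 sites, 20 taxa, one climate variable.
rng = np.random.default_rng(0)
lat = rng.uniform(40.0, 70.0, 300)
lon = rng.uniform(0.0, 60.0, 300)
temp = 30.0 - 0.6 * lat + rng.normal(0.0, 1.0, 300)
opt = rng.uniform(-15.0, 10.0, 20)
resp = np.exp(-((temp[:, None] - opt) ** 2) / 18.0) + 0.01
pollen = resp / resp.sum(axis=1, keepdims=True)
fossil_sample = np.exp(-((0.0 - opt) ** 2) / 18.0) + 0.01
fossil_sample = fossil_sample / fossil_sample.sum()

local = great_circle_km(60.0, 30.0, lat, lon) <= 2000.0    # 2,000 km training radius
estimate = mat_predict(fossil_sample, pollen[local], temp[local], k=7)
rmse_loo = hblock_rmsep(pollen, temp, lat, lon, h_km=0.0)
rmse_h500 = hblock_rmsep(pollen, temp, lat, lon, h_km=500.0)
print(estimate, rmse_loo, rmse_h500)
```

Comparing `rmse_loo` with `rmse_h500` gives a rough indication of how much spatial autocorrelation flatters the apparent skill; the size of that gap is an empirical question for each training set.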
Minor issues:
L3: reconsider the use of “late Quaternary” in the title. The meaning is actually rather vague, and something along the lines of 30,000 years would be more informative.
L108: not sure what the policy is to refer to submitted manuscripts.
L131: please provide a bit more detail on WorldClim 2. For instance, what are the data based on, over what period are the data integrated, etc.
L385: crucially, this manuscript does not describe a fossil pollen data set, but a data set of temperature and precipitation reconstructions.
L402-404: this seems a somewhat dangerous statement. Are the two reconstructions really independent?
Why is the x axis of figure 6 on a log scale?
Whilst glancing through the code I missed the significance testing and the CCA. (But thumbs up for sharing the code.)
Citation: https://doi.org/10.5194/essd-2022-38-RC1
AC1: 'Reply on RC1', Chenzhi Li, 17 Oct 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC1-supplement.pdf
RC2: 'Comment on essd-2022-38', Patrick Bartlein, 09 Mar 2022
General comments:
This paper describes a set of pollen-based climate reconstructions for the Northern Hemisphere from the LGM to present. The paper is obviously one of three, one describing the pollen data (Herzschuh et al., submitted, which I couldn’t find), another describing the chronology (Li et al., 2022, ESSD-Disc), and this one, describing the reconstructions. There are obvious redundancies among the papers, and I think readers and potential users of the data will find it frustrating to have to track down three papers.
Overall, the paper is not that well organized, with motivations for some of the analyses (e.g. CCA) not appearing until the results section (Section 4, titled “Dataset assessment”), and tutorial material on the nature of pollen data as a palaeo-archive appearing in the discussion, as opposed to the introduction (and presumably also in the first paper of the series, which, with good cross-referencing among the papers, would make it superfluous here). Perhaps this disorganization arose in parting-out the papers.
There are several overarching issues and questions that should be addressed:
Why were January and annual temperature and annual precipitation chosen as the targets for reconstruction? A more appropriate set of climate variables might be those that mechanistically control vegetation like winter cold, summer warmth, and moisture stress. A lot of the paper is devoted to dealing with the obvious correlation between annual temperature and precipitation, but it is never actually established why this is an issue.
What was the role of the canonical correlation analysis? Perhaps simply to explore the data, but in fact it represents an alternative reconstruction approach. In any case, it’s neither clear what the purpose of the analysis is, nor are the results fully explained.
The two reconstruction approaches, weighted-averaging – partial-least-squares (WA-PLS) and the modern analogue technique (MAT), may be frequently applied, but they are not without issues themselves. WA-PLS, as is the case with some other methods, tends to “compress” the reconstructions toward the center of the distribution of the climate data (see Liu et al, 2020, Proc. Royal Soc. A, https://doi.org/10.1098/rspa.2020.0346). This will reduce the amplitude of the time series of the reconstructions. MAT suffers from the no-analogue problem, typically diagnosed by looking at the dissimilarities. The performance of the two approaches is examined in Fig. 3, but there is no attempt to account for the obvious spatial patterns.
A number of the analysis steps are not explained much at all, with the results just briefly described before moving on. In particular, the significance testing in Section 4.2 isn’t fully explained: What is the “take-home message”? What does this analysis say about the usefulness of the reconstructions?
The results are described in terms of mid-Holocene minus present (1.5 to 0.5 ka) long-term mean differences, and some unusual time series plots, but there is no attempt to assess the reasonableness of the reconstructions with respect to paleoclimatic first principles or to compare them with simulations or independent observations.
I think these issues are all basically addressable, and with a little overhauling (i.e. no new analysis, just more complete explanation and discussion), the paper(s) will make a useful contribution.
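The compression mentioned in the WA-PLS comment can be illustrated with plain weighted averaging (WA) and classical inverse deshrinking on synthetic data; WA is used here as a simpler stand-in for WA-PLS, which is an assumption of this sketch, and all data are invented. Because deshrunk predictions are least-squares fitted values, the slope of predicted on observed climate equals the calibration R², so values near both ends of the gradient are pulled toward the mean and reconstructed amplitudes shrink.

```python
import numpy as np

rng = np.random.default_rng(3)
env = rng.uniform(0.0, 30.0, 400)                      # observed climate, modern sites
optima = rng.uniform(-5.0, 35.0, 30)
resp = np.exp(-((env[:, None] - optima) ** 2) / (2 * 5.0 ** 2))
counts = rng.poisson(100 * resp) + 1e-3                 # noisy counts; offset avoids empty taxa
spp = counts / counts.sum(axis=1, keepdims=True)

# Weighted averaging: taxon optima, then initial (shrunken) site estimates.
wa_optima = (spp * env[:, None]).sum(axis=0) / spp.sum(axis=0)
initial = (spp * wa_optima).sum(axis=1) / spp.sum(axis=1)

# Classical (inverse) deshrinking: regress observed climate on the initial estimates.
b, a = np.polyfit(initial, env, 1)
pred = a + b * initial

# Slope of predicted on observed climate. For least-squares fitted values this
# slope equals the calibration R^2, so it is < 1 whenever the fit is imperfect.
slope_initial = np.polyfit(env, initial, 1)[0]
slope_pred = np.polyfit(env, pred, 1)[0]
r2 = np.corrcoef(env, pred)[0, 1] ** 2
print(f"slope before deshrinking {slope_initial:.2f}, after {slope_pred:.2f}, R^2 {r2:.2f}")
print(f"SD observed {env.std():.2f} vs SD predicted {pred.std():.2f}")
```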
Specific comments:
line 62: “climate proxy synthesis studies”. Do you mean “syntheses of climate reconstructions” or “syntheses of climate proxies” (i.e. the pollen data)? It’s the former that can be directly compared with climate-model output.
line 71: “The evaluation of climate model outputs…” It’s actually the climate models that are being evaluated in data-model comparisons (of simulations and observations or reconstructions).
line 73: “strong changes in the climate driver”. Are you alluding to changes in GHGs during the instrumental record? Changes in insolation, ice-sheet distribution and size, and GHGs between the LGM and present are much larger. For example, the companion CMIP experiment to the LGM is the 4xCO2 experiment. CO2 has not yet doubled from pre-industrial levels.
line 74: “The extratropical Northern Hemisphere … complex spatial and temporal … patterns.” Well, yes, but it’s also where most of the pollen data is from. I don’t think you need to motivate focusing on the Northern Hemisphere extratropics.
line 90: “Regarding the prevalence…”. Just say “Pollen data from … have been used…”
line 94: “high resolution”. Temporal? Spatial? Also, the last millennium is part of the Holocene, and the late-Quaternary, so you might get some push-back from dendroclimatologists about this notion.
line 102: delete “the large” (I think we know extratropical Asia is a large area).
line 103: Whitmore et al. (2005) describes the modern pollen (and climate) data set for North America, not (paleo) precipitation reconstructions.
line 108: If “Herzschuh et al., submitted” is “LegacyPollen 1.0: A taxonomically harmonized global…” then how is that different from this paper (and the data sets on Zenodo)? Does it describe just the fossil-pollen data, or the modern data set too?
line 110: “Li et al., 2022)”. So there are three papers: 1) the pollen data set, 2) new chronologies, and 3) this paper, right? Why not just say that?
line 116: Why reconstruct temperature and precipitation, as opposed to climate variables that are mechanistically related to vegetation?
line 136: “For consistency with the amount (number?) of taxa…”. This needs to be a little better explained. Why 70 taxa (except for tradition)?
line 147: “2000 km radius”. Why 2000 km?
line 150: “metrics”. Meaning something other than just the squared-chord distance?
line 151: “square-root transformed pollen percentages”. It might be worth pointing out that the same transformation is embedded in the use of the squared-chord distance dissimilarity measure in the MAT approach.
line 156: “co-variation”. Why is this an issue? Covariation among predictands might not be an issue if they were mechanistically related to vegetation, as in the case of variables like MTCO and GDD (Wei et al., 2020, Ecology, http://dx.doi.org/10.17864/1947.194).
line 161: “… partialling out the respective other variable”. Please explain.
line 161: “We applied a Canonical Correlation Analysis…”. What were the community, constraining, and conditioning matrices in this analysis? More to the point, what was the objective of this analysis?
line 164: “the ratio … was determined…”. Why and for what purpose?
line 191: Define “RMSEP” on first use in the text.
lines 190-220: What accounts for the spatial variations in RMSEPs? Data density? Data quality (of both the pollen and climate data)? Confounding environmental factors?
line 221: “significance test”. Of what? What hypothesis does the Telford and Birks test address?
line 241: “we subtracted those means from every record”. There are two mean values (6.5 to 5.5 ka and 1.5 to 0.5 ka), and “every record” implies to me the whole data set, LGM to present. Aren’t you just looking at the difference between those two mean values? (And why 1.5 to 0.5 ka?)
line 243: “warmer and drier”. Than what? (Which time period is the warmer and drier one?) Throughout this paragraph, the sense of change in climate has to be made explicit. For parallelism, you should adopt a standard way of expressing the changes, e.g. “warmer than present in the mid-Holocene” or “cooling from the mid-Holocene to present”, but don’t mix states and trends.
line 250: What’s a “more gradual pattern”?
Figure 6: What exactly is plotted here? Why use a log age axis? An alternative depiction of all of the reconstructions, and their temporal and latitudinal variations, would be a Hovmöller diagram.
Figure 8: I guess we’re supposed to see that there are more correlation coefficients between temperature and precipitation close to zero in the “tailored” analyses. I’ve got nothing against violin plots, but I think a standard histogram would work a lot better.
line 301+: What are the implications of these statistics and their spatial patterns?
lines 315-343: This tutorial on pollen data, chronologies, etc. should probably be in the introduction, not the discussion.
line 378: “numerical mechanisms … reduce the reliability” Please explain.
line 410: “TraCE 21k” is a transient experiment. The model used was CCM 3.
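The difference queried in the comment on line 241 can be written down explicitly. A minimal sketch, assuming each record is a list of sample ages (ka) and reconstructed values and that window boundaries are inclusive; the function names and the example record are hypothetical. The docstring fixes the sign convention, which is the kind of explicitness asked for in the comment on line 243.

```python
import numpy as np


def window_mean(ages, values, young, old):
    """Mean of the samples dated young <= age <= old (ka); NaN if none."""
    ages = np.asarray(ages, dtype=float)
    values = np.asarray(values, dtype=float)
    inside = (ages >= young) & (ages <= old)
    return values[inside].mean() if inside.any() else np.nan


def mid_holocene_anomaly(ages, values, mh=(5.5, 6.5), ref=(0.5, 1.5)):
    """Mean over 6.5-5.5 ka minus mean over 1.5-0.5 ka for one record.
    Positive = higher in the mid-Holocene than in the reference window."""
    return window_mean(ages, values, *mh) - window_mean(ages, values, *ref)


# Hypothetical record (ages in ka, annual temperature in degC).
ages = [0.2, 0.7, 1.1, 1.4, 3.0, 5.6, 6.0, 6.4, 8.0]
tann = [4.0, 4.2, 3.8, 4.0, 4.5, 5.0, 5.4, 5.2, 4.9]
anomaly = mid_holocene_anomaly(ages, tann)           # 5.2 - 4.0, i.e. about +1.2 degC
gap = mid_holocene_anomaly([0.8, 9.0], [1.0, 2.0])   # NaN: no sample in 6.5-5.5 ka
print(anomaly, gap)
```

Written this way, each record contributes a single difference of two window means, and records lacking samples in either window drop out as NaN rather than being silently extrapolated.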
Code and data:
I was able to run the example R code without problems. However, the data sets, described and labelled (via the extension) as .csv files (comma-separated values), are instead tab-separated files, which usually have the extension “.tab”, or sometimes “.txt”. This situation prevents a user from getting a quick look at the data using a spreadsheet program.
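Until the extensions are fixed, a user can detect and convert such files with Python's standard library. A minimal sketch; the column names in the example are hypothetical and not the actual LegacyClimate headers.

```python
import csv
import io


def read_delimited(text):
    """Parse delimited text, choosing tab or comma from the header line."""
    header = text.splitlines()[0]
    delimiter = "\t" if header.count("\t") > header.count(",") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_comma_separated(rows):
    """Write parsed rows back out as genuine comma-separated values."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical file content: tab-separated despite a .csv extension.
text = "Dataset_ID\tAge_ka\tTAnn\n101\t0.5\t4.1\n101\t1.0\t3.9\n"
rows = read_delimited(text)
print(rows[0]["TAnn"])            # 4.1 (as a string)
print(to_comma_separated(rows))
```

For a file on disk, open it with `newline=''` as the `csv` module documentation recommends, or read it with `pandas.read_csv(path, sep='\t')`.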
P.J. Bartlein
Citation: https://doi.org/10.5194/essd-2022-38-RC2
AC2: 'Reply on RC2', Chenzhi Li, 17 Oct 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC2-supplement.pdf