LegacyClimate 1.0: A dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the late Quaternary
- 1Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Polar Terrestrial Environmental Systems, Telegrafenberg A45, 14473 Potsdam, Germany
- 2Institute of Environmental Science and Geography, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- 3Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- 4Institute of Geosciences, Sect. Meteorology, Rheinische Friedrich-Wilhelms-Universität Bonn, Auf dem Hügel 20, 53121 Bonn, Germany
- 5Max Planck Institute for Meteorology, Bundesstrasse 53, 20146 Hamburg, Germany
- 6Institute of Earth Surface Dynamics IDYST, Faculté des Géosciences et l'Environnement, University of Lausanne, Batiment Géopolis, 1015 Lausanne, Switzerland
- 7Alpine Paleoecology and Human Adaptation Group (ALPHA), State Key Laboratory of Tibetan Plateau Earth System, and Resources and Environment (TPESRE), Institute of Tibetan Plateau Research, Chinese Academy of Sciences, 100101 Beijing, China
- 8Alaska Quaternary Center, University of Alaska Fairbanks, Fairbanks, Alaska 99775, USA
- 9Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
- 10Lomonosov Moscow State University, Faculty of Geography, Leninskie Gory 1, 119991 Moscow, Russia
- 11Department of Quaternary Paleogeography, Institute of Geography Russian Academy of Science, Staromonetny lane, 29, 119017, Moscow, Russia
- 12Department of Geography, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- 13Institute for Korean Regional Studies, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- 14Institut des Sciences de l’Evolution de Montpellier, Université de Montpellier, CNRS UMR 5554, Montpellier, France
- 15PaleoData Lab, Institute of Archaeology and Ethnography, Siberian Branch, Russian Academy of Sciences, Pr. Akademika 36 Lavrentieva 17, 630090 Novosibirsk, Russia
- 16Biological Institute, Tomsk State University, Pr. Lenina, 26, Tomsk, 634050, Russia
- 17Lower Saxony Institute for Historical Coastal Research, D-26382 Wilhelmshaven, Germany
- 18Department of Palynology and Climate Dynamics, Albrecht-von-Haller Institute for Plant Sciences, University of Göttingen, Untere Karspüle 2, 37073 Göttingen, Germany
- 19Freie Universität Berlin, Institute of Geological Sciences, Palaeontology Section, Malteserstrasse 74-100, Building D, 12249 Berlin, Germany
- 20College of Resource Environment and Tourism, Capital Normal University, 105 West 3rd Ring Rd N, 100048 Beijing, China
- 21Key Laboratory of Cenozoic Geology and Environment, Institute of Geology and Geophysics, Chinese Academy of Sciences, 19 Beitucheng West Road, Chaoyang District, 100029 Beijing, China
- 22CAS Center for Excellence in Life and Paleoenvironment, 100044 Beijing, China
- 23College of Geographical Sciences, Hebei Normal University, 050024 Shijiazhuang, China
- 24Guangdong Key Lab of Geodynamics and Geohazards, School of Earth Sciences and Engineering, Sun Yat-sen University, 519082 Zhuhai, China
- 25Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), 519082 Zhuhai, China
Abstract. Here we describe LegacyClimate 1.0, a dataset of reconstructions of mean July temperature (TJuly), mean annual temperature (Tann), and annual precipitation (Pann) from 2594 fossil pollen records from the Northern Hemisphere, spanning the entire Holocene with some records reaching back to the Last Glacial. Two reconstruction methods, the Modern Analogue Technique (MAT) and Weighted-Averaging Partial-Least Squares regression (WA-PLS), reveal similar results regarding spatial and temporal patterns. To reduce the impact of precipitation on temperature reconstruction and vice versa, we also provide reconstructions using tailored modern pollen data limiting the range of the corresponding other climate variables. We assess the reliability of the reconstructions using information from the spatial distributions of the root-mean-squared error of prediction and reconstruction significance tests. The dataset is beneficial for climate proxy synthesis studies and for evaluating the output of climate models, and can thus help to improve the models themselves. We provide our compilation of reconstructed TJuly, Tann, and Pann as open-access datasets at PANGAEA (https://doi.pangaea.de/10.1594/PANGAEA.930512; Herzschuh et al., 2021). R code for the reconstructions is provided at Zenodo (https://doi.org/10.5281/zenodo.5910989; Herzschuh et al., 2022), including harmonized open-access modern and fossil datasets used for the reconstructions, so that customized reconstructions can be easily established.
Ulrike Herzschuh et al.
Status: closed (peer review stopped)
-
RC1: 'Comment on essd-2022-38', Anonymous Referee #1, 07 Mar 2022
Review of LegacyClimate 1.0: A dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the late Quaternary by Herzschuh et al.
The authors provide temperature and precipitation reconstructions based on pollen assemblage time series. They provide three different types of reconstructions and provide a clear description of the methods. The dataset is highly valuable and the manuscript is clearly written and the figures are of high quality (if sometimes a bit small). The manuscript seems to be part of a set of articles (a trilogy?): a manuscript describing the raw pollen data, a manuscript dedicated exclusively to the chronology and the present manuscript about the pollen-derived climate reconstructions. I can to some degree follow the rationale of the sequence, but I think this (last?) article would benefit from a closer integration with the article describing the chronology. The chronology, and importantly its uncertainty, is an integral part of the climate reconstruction that the authors present here. In addition, I have some further recommendations and points that require clarification in a revised manuscript.
Major issues
Integration with chronology:
This manuscript focuses entirely on the reconstruction of temperature and precipitation, yet the time series also have a chronology with associated uncertainty. By separating these two aspects into two manuscripts it becomes unclear how the full uncertainty of the paleoclimate time series can be derived. Looking at the data (on pangaea.de) it seems that the provided error only accounts for the reconstruction, not for the chronology. This is not the full story and the manuscript would be tremendously improved if the authors made this third manuscript of the sequence a true integration of the papers on the chronology and the climate reconstruction. In L341-343 the authors even touch on this possibility, but they refrain from taking the logical next step that would make the data product more useful for other researchers.
This means that the first order analysis of the time series as shown in figures 5 and 6 should include some combined error resulting from the reconstruction and the chronology and a clear description of the methodology to combine these errors. The provided data sets should also contain uncertainties that reflect both the chronological and the reconstruction errors. This is not a complicated step, but would massively improve the value of the data product.
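One way such a combined error could be obtained is by Monte Carlo resampling. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes independent Gaussian errors on both age and reconstructed value, keeps stratigraphic order by sorting the perturbed ages, and reports a 95 % envelope on a common age grid. The function names `interp` and `combined_envelope` and all parameter names are hypothetical.

```python
import random
from bisect import bisect_left

def interp(x, xs, ys):
    """Linear interpolation with flat extrapolation; xs must be sorted ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # first index with xs[j] >= x, so xs[j - 1] < x
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + w * (ys[j] - ys[j - 1])

def combined_envelope(ages, age_sd, values, value_sd, grid, n_draws=1000, seed=0):
    """Return (lower, upper) 95 % bounds per grid age from joint age/value resampling.

    Assumption: errors are independent and Gaussian; all inputs are equal-length lists
    ordered from youngest to oldest sample.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Perturb the ages, then sort so samples stay in stratigraphic order.
        a = sorted(rng.gauss(t, s) for t, s in zip(ages, age_sd))
        # Perturb the reconstructed values with their sample-specific error.
        v = [rng.gauss(y, s) for y, s in zip(values, value_sd)]
        # Put this realisation on the common age grid.
        draws.append([interp(g, a, v) for g in grid])
    envelope = []
    for k in range(len(grid)):
        column = sorted(d[k] for d in draws)
        lower = column[int(0.025 * (n_draws - 1))]
        upper = column[int(0.975 * (n_draws - 1))]
        envelope.append((lower, upper))
    return envelope
```

In practice the age draws would more naturally come from the age-model ensemble itself rather than from independent Gaussians, which ignore the correlation between neighbouring ages.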
Meaning of reconstruction differences:
The authors also mention other reconstruction methods (L372), which begs the question why MAT and WA-PLS were chosen. Only because they are widely used, or because they yield superior results?
In addition, the authors provide three different reconstructions for each time series. What I miss is a discussion of how these different reconstructions can be used. Does the difference between them represent additional uncertainty on the reconstruction? How should the user include or use this information? Are certain reconstruction methods better than others? If so, which is to be preferred? If not, how can the (information from the) reconstructions be combined?
Reconstruction quality:
The CCA suggests that only some part of the variance in the training sets is explained by T and precip and the significance testing indicates that a shocking 60-70 % of the reconstructions are basically noise. Whilst the authors go some way and filter out the time series that do not pass the significance test, I feel that the authors hardly mention this, let alone discuss it. I also realise that this manuscript should not analyse the data, but perhaps some discussion is in place, and the different ways in which (pollen) assemblages could be used in paleoclimate science, including forward modelling, could be highlighted.
Land use issues/human influence:
Some of the time series must bear an imprint of human influence. Can the authors briefly discuss to what degree and if and how this influences the reconstructions?
Insufficient explanation and detail in the methods:
- 2,000 km radius for training set. Please explain why this was done and why the distance is (globally) appropriate.
- Why were seven analogues used for MAT? Are the reconstructions weighted to analogue quality, or simply the arithmetic mean of the seven closest analogues?
- How is the calibration error determined? Was spatial autocorrelation taken into account? From the code it seems that this is not the case, why?
- What is the sample-specific error based on? Why is this provided and not the calibration error?
- If I am correct, the tailoring approach serves the purpose of reducing the effect of co-variation between T and P. Please mention this earlier in the methods. I understand the point and that this goes some way to alleviating the problem. But what is done in cases where the correlation is not reduced? After all, there still is a large proportion of the sites for which there is a marked correlation in the training set. Some discussion would be appropriate here.
- Please provide more detail on the significance test. How were the random environmental fields generated? By simple permutation, or taking spatial correlation into account? Why?
- Why were the tailoring and the significance testing not applied to the MAT reconstructions?
- The CCA seems to be the first step in the development of the transfer function model to demonstrate that T and Precip really explain the variance in the assemblages. Would it not be better placed earlier in the description? And why are the implications barely discussed?
- How are poor analogues treated? Do they occur at all after the lumping? There is some discussion in L327-332, but it is unclear what the user of the data can do with this information.
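The difference between an arithmetic mean of the closest analogues and a mean weighted by analogue quality can be made concrete with a small sketch. It is illustrative only and does not reproduce the authors' R code: it assumes assemblages are given as pollen proportions, uses the squared-chord distance, and, as one possible choice, weights analogues by inverse distance. The names `squared_chord` and `mat_estimate` are hypothetical.

```python
import math

def squared_chord(p, q):
    """Squared-chord distance between two assemblages of pollen proportions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def mat_estimate(fossil, modern, climate, k=7, weighted=False):
    """Estimate climate for one fossil sample from its k closest modern analogues."""
    # Rank modern samples by dissimilarity to the fossil sample and keep the k best.
    ranked = sorted((squared_chord(fossil, m), c) for m, c in zip(modern, climate))[:k]
    if not weighted:
        # Plain arithmetic mean of the climate values at the k analogues.
        return sum(c for _, c in ranked) / len(ranked)
    # Inverse-distance weights (an assumption); the small constant guards
    # against division by zero when an analogue is identical.
    weights = [1.0 / (d + 1e-12) for d, _ in ranked]
    return sum(w * c for w, (_, c) in zip(weights, ranked)) / sum(weights)
```

The retained distances in `ranked` are also what a user would inspect to flag poor analogues, for example by comparing them with a threshold derived from the distribution of distances within the modern training set.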
Minor issues:
L3: reconsider the use of “late quaternary” in the title. The meaning is actually rather vague and something along the lines of 30,000 years would be more informative.
L108: not sure what the policy is to refer to submitted manuscripts.
L131: please provide a bit more detail on WorldClim 2. For instance, what are the data based on, over what period are the data integrated, etc.
L385: crucially, this manuscript does not describe a fossil pollen data set, but a data set of temperature and precip.
L402-404: this seems a somewhat dangerous statement. Are the two reconstructions really independent?
Why is the x axis of figure 6 on a log scale?
Whilst glancing through the code I missed the significance testing and the CCA. (But thumbs up for sharing the code.)
-
AC1: 'Reply on RC1', Chenzhi Li, 17 Oct 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC1-supplement.pdf
-
RC2: 'Comment on essd-2022-38', Patrick Bartlein, 09 Mar 2022
General comments:
This paper describes a set of pollen-based climate reconstructions for the Northern Hemisphere from the LGM to present. The paper is obviously one of three, one describing the pollen data (Herzschuh et al., submitted, which I couldn’t find), another describing the chronology (Li et al., 2022, ESSD-Disc), and this one, describing the reconstructions. There are obvious redundancies among the papers, and I think readers and potential users of the data will find it frustrating to have to track down three papers.
Overall, the paper is not that well organized, with motivations for some of the analyses (e.g. CCA) not appearing until the results section (Section 4, titled “Dataset assessment”), and tutorial material on the nature of pollen data as a palaeo-archive appearing in the discussion, as opposed to the introduction (and presumably also in the first paper of the series, which, with good cross-referencing among the papers, would make it superfluous here). Perhaps this disorganization arose in parting-out the papers.
There are several overarching issues and questions that should be addressed:
Why were January and annual temperature and annual precipitation chosen as the targets for reconstruction? A more appropriate set of climate variables might be those that mechanistically control vegetation like winter cold, summer warmth, and moisture stress. A lot of the paper is devoted to dealing with the obvious correlation between annual temperature and precipitation, but it is never actually established why this is an issue.
What was the role of the canonical correlation analysis? To simply explore the data perhaps, but in fact it represents an alternative reconstruction approach. In any case, it’s neither clear what the purpose of the analysis is, nor are the results fully explained.
The two reconstruction approaches, weighted-averaging – partial-least-squares (WA-PLS) and the modern analogue technique (MAT) may be frequently applied, but they are not without issues themselves. WA-PLS, as is the case with some other methods, tends to “compress” the reconstructions toward the center of the distribution of the climate data (see Liu et al., 2020, Proc. Royal Soc. A, https://doi.org/10.1098/rspa.2020.0346). This will reduce the amplitude of the time series of the reconstructions. MAT suffers from the no-analogue problem, typically diagnosed by looking at the dissimilarities. The performance of the two approaches is examined in Fig. 3, but there is no attempt to account for the obvious spatial patterns.
A number of the analysis steps are not explained much at all, with the results just briefly described before moving on. In particular, the significance testing in Section 4.2 isn’t fully explained: What is the “take-home message”? What does this analysis say about the usefulness of the reconstructions?
The results are described in terms of mid-Holocene minus present (1.5 to 0.5 ka) long-term mean differences, and some unusual time series plots, but there is no attempt to assess the reasonableness of the reconstructions with respect to paleoclimatic first principles or to compare them with simulations or independent observations.
I think these issues are all basically addressable, and with a little overhauling (i.e. no new analysis, just more complete explanation and discussion), the paper(s) will make a useful contribution.
Specific comments:
line 62: “climate proxy synthesis studies”. Do you mean “syntheses of climate reconstructions” or “syntheses of climate proxies” (i.e. the pollen data)? It’s the former that can be directly compared with climate-model output.
line 71: “The evaluation of climate model outputs…” It’s actually the climate models that are being evaluated in data-model comparisons (of simulations and observations or reconstructions).
line 73: “strong changes in the climate driver” Are you alluding to changes in GHGs during the instrumental record? Changes in insolation, ice-sheet distribution and size, and GHGs between the LGM and present are much larger. For example, the companion CMIP experiment to the LGM is the 4xCO2 experiment. CO2 has yet to double from pre-industrial levels.
line 74: “The extratropical Northern Hemisphere … complex spatial and temporal … patterns.” Well, yes, but it’s also where most of the pollen data is from. I don’t think you need to motivate focusing on the Northern Hemisphere extratropics.
line 90: “Regarding the prevalence…”. Just say “Pollen data from … have been used…”
line 94: “high resolution”. Temporal? Spatial? Also, the last millennium is part of the Holocene, and the late-Quaternary, so you might get some push-back from dendroclimatologists about this notion.
line 102: delete “the large” (I think we know extratropical Asia is large area.)
line 103: Whitmore et al. (2005) describes the modern pollen (and climate) data set for North America, not (paleo) precipitation reconstructions.
line 108: If “Herzschuh et al., submitted” is “LegacyPollen 1.0: A taxonomically harmonized global…” then how is that different from this paper (and the data sets on Zenodo)? Does it describe just the fossil-pollen data, or the modern data set too?
line 110: “Li et al., 2022)”. So there are three papers, 1) the pollen data set, 2) new chronologies, and 3) this paper, right? Why not just say that?
line 116: Why reconstruct temperature and precipitation, as opposed to climate variables that are mechanistically related to vegetation?
line 136: “For consistency with the amount (number?) of taxa…”. This needs to be a little better explained. Why 70 taxa (except for tradition)?
line 147: “2000 km radius”. Why 2000 km?
line 150: “metrics”. Meaning something other than just the squared-chord distance?
line 151: “square-root transformed pollen percentages”. It might be worth pointing out that the same transformation is embedded in the use of the squared-chord distance dissimilarity measure in the MAT approach.
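The equivalence can be written out explicitly. In the expression below the notation is an assumption introduced for illustration: p_ik is the proportion of taxon k in sample i and m is the number of taxa. The squared-chord distance between samples i and j is the squared Euclidean distance between their square-root transformed assemblages.

```latex
d_{ij} = \sum_{k=1}^{m} \left( \sqrt{p_{ik}} - \sqrt{p_{jk}} \right)^{2}
       = \left\lVert \sqrt{\mathbf{p}_i} - \sqrt{\mathbf{p}_j} \right\rVert_2^{2}
```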
line 156: “co-variation”. Why is this an issue? It might be the case that covariation among predictands wouldn’t be an issue if they were mechanistically related to vegetation, as in the case of variables like MTCO and GDD (Wei, et al., 2020, Ecology http://dx.doi.org/ 10.17864/1947.194
line 161: “… partialling out the respective other variable”. Please explain.
line 161: “We applied a Canonical Correlation Analysis…”. What were the community, constraining, and conditioning matrices in this analysis? More to the point, what was the objective of this analysis?
line 164: “the ratio … was determined…”. Why and for what purpose?
line 191: Define “RMSEP” on first use in the text.
lines 190-220: What accounts for the spatial variations in RMSEPs? Data density? Data quality (of both the pollen and climate data)? Confounding environmental factors?
line 221: “significance test”. Of what? What hypothesis does the Telford and Birks test address?
line 241: “we subtracted those means from every record”. There are two mean values (6.5 to 5.5 ka and 1.5 to 0.5 ka), and “every record” implies to me the whole data set, LGM to present. Aren’t you just looking at the difference between those two mean values? (And why 1.5 to 0.5 ka?)
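Read as a simple difference of two time-slice means, the calculation could look like the sketch below. This is an interpretation of the comment, not the authors' procedure; the function names are hypothetical and ages are assumed to be in ka BP. With this sign convention a positive result means the mid-Holocene was warmer (or wetter) than the 1.5 to 0.5 ka reference.

```python
def slice_mean(ages_ka, values, older, younger):
    """Mean of the values whose ages fall within [younger, older] ka BP, or None."""
    picked = [v for a, v in zip(ages_ka, values) if younger <= a <= older]
    return sum(picked) / len(picked) if picked else None

def mid_holocene_anomaly(ages_ka, values):
    """Mid-Holocene (6.5 to 5.5 ka) mean minus reference (1.5 to 0.5 ka) mean."""
    mid_holocene = slice_mean(ages_ka, values, 6.5, 5.5)
    reference = slice_mean(ages_ka, values, 1.5, 0.5)
    if mid_holocene is None or reference is None:
        return None  # the record does not cover both time slices
    return mid_holocene - reference
```

Returning a single signed number per record keeps the sense of change explicit and avoids mixing states and trends.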
line 243: “warmer and drier” Than what? (Which time period is the warmer and drier one?). Throughout this paragraph, the sense of change in climate has to be made explicit. For parallelism, you should adopt a standard way of expressing the changes, e.g. “warmer than present in the mid-Holocene” or “cooling from the mid-Holocene to present” but don’t mix states and trend.
line 250: What’s a “more gradual pattern”?
Figure 6: What exactly is plotted here? Why use a log age axis? An alternative depiction of all of the reconstructions, and their temporal and latitudinal variations would be a Hovmöller diagram.
Figure 8: I guess we’re supposed to see that there are more correlation coefficients between temperature and precipitation close to zero in the “tailored” analyses. I’ve got nothing against violin plots, but I think a standard histogram would work a lot better.
line 301+: What are the implications of these statistics and their spatial patterns?
lines 315-343: This tutorial on pollen data, chronologies, etc. should probably be in the introduction, not the discussion.
line 378: “numerical mechanisms … reduce the reliability” Please explain.
line 410: “TraCE 21k” is a transient experiment. The model used was CCM 3.
Code and data:
I was able to run the example R code without problems. However, the data sets, described and labelled (via the extension) as .csv files (comma-separated values), are instead tab-separated files, which usually have the extension “.tab”, or sometimes “.txt”. This situation prevents a user from getting a quick look at the data using a spreadsheet program.
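Until the files are relabelled, a user can work around the mismatch by choosing the delimiter from the file content rather than from the extension. The sketch below is one hedged way to do that with the Python standard library; the helper names `detect_delimiter` and `read_rows` are hypothetical, and the heuristic (more tabs than commas in the header line) is an assumption that suits simple tables only.

```python
import csv

def detect_delimiter(header_line):
    """Guess the delimiter from the header: tab if it holds more tabs than commas."""
    return "\t" if header_line.count("\t") > header_line.count(",") else ","

def read_rows(handle):
    """Read a delimited text stream into a list of dicts, whatever its extension says."""
    header = handle.readline()
    handle.seek(0)  # rewind so DictReader sees the header line as well
    reader = csv.DictReader(handle, delimiter=detect_delimiter(header))
    return list(reader)
```

A file on disk would be passed in as `open(path, newline="", encoding="utf-8")`, where `path` points to one of the downloaded tables.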
P.J. Bartlein
-
AC2: 'Reply on RC2', Chenzhi Li, 17 Oct 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC2-supplement.pdf
-
AC2: 'Reply on RC2', Chenzhi Li, 17 Oct 2022
Status: closed (peer review stopped)
-
RC1: 'Comment on essd-2022-38', Anonymous Referee #1, 07 Mar 2022
Review of LegacyClimate 1.0: A dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the late Quaternary by Herzschuh et al.
The authors provide temperature and precipitation reconstructions based on pollen assemblage time series. They provide three different types of reconstructions and provide a clear description of the methods. The dataset is highly valuable and the manuscript is clearly written and the figures are of high quality (if sometimes a bit small). The manuscript seems to be part of a set of articles (a trilogy?): a manuscript describing the raw pollen data, a manuscript dedicated exclusively to the chronology and the present manuscript about the pollen-derived climate reconstructions. I can to some degree follow the rationale of the sequence, but I think this (last?) article would benefit from a closer integration with the article describing the chronology. The chronology, and importantly its uncertainty, is an integral part of the climate reconstruction that the authors present here. In addition, I have some further recommendations and points that require clarification in a revised manuscript.
Major issues
Integration with chronology:
this manuscript focuses entirely on the reconstruction of temperature and precipitation, yet the time series also have a chronology with associated uncertainty. By separating these two aspects into two manuscripts it becomes unclear how the full uncertainty of the paleoclimate time series can be derived. Looking at the data (on pangaea.de) it seems that the provided error only accounts for the reconstruction, not for the chronology. This is not the full story and the manuscript would be tremendously improved if the authors made this third manuscript of the sequence a true integration of the papers on the chronology and the climate reconstruction. In L341-343 the authors even touch on this possibility, but they refrain from taking the logical next step that would make the data product more useful for other researchers.
This means that the first order analysis of the time series as shown in figures 5 and 6 should include some combined error resulting from the reconstruction and the chronology and a clear description of the methodology to combine these errors. The provided data sets should also contain uncertainties that reflect both the chronological and the reconstructions errors. This is not a complicated step, but would massively improve the value of the data product.
Meaning of reconstruction differences:
The authors also mention other reconstruction methods (L372), which begs the question why MAT and WA-PLS were chosen. Only because they are widely used, or because they yield superior results?
In addition, the authors provide three different reconstructions for each time series. What I miss is a discussion of how these different reconstructions can be used. Does the difference between them represent additional uncertainty on the reconstruction? How should the user include or use this information? Are certain reconstruction methods better than others? If so, which is to be preferred? If not, how can the (information from the) reconstructions be combined?
Reconstruction quality:
The CCA suggests that only some part of the variance in the training sets is explained by T and precip and the significance testing indicates that a shocking 60-70 % of the reconstructions are basically noise. Whilst the authors go some way and filter out the time series that do not pass the significance test, I feel that the authors hardly mention this, let alone discuss. I also realise that this manuscript should not analyse the data, but perhaps some discussion in place and the different ways in which (pollen) assemblages could be used in paleoclimate science, including forward modelling, could be highlighted.
Land use issues/human influence:
Some of the time series must bear an imprint of human influence. Can the authors briefly discuss to what degree and if and how this influences the reconstructions?
Insufficient explanation and detail in the methods:
- 2,000 km radius for training set. Please explain why this was done and why the distance is (globally) appropriate.
- Why were seven analogues used for MAT? Are the reconstructions weighted to analogue quality, or simply the arithmetic mean of the seven closest analogues?
- How is the calibration error determined? Was spatial autocorrelation taken into account? From the code it seems that this is not the case, why?
- What is the sample-specific error based on? Why is this provided and not the calibration error?
- If I am correct, the tailoring approach serves the purpose of reducing the effect of co-variation between T and P. Please mention this earlier in the methods. I understand the point and that this goes some way to alleviating the problem. But what is done in cases where the correlation is not reduced? After all, there still is a large proportion of the sites for which there is a marked correlation in the training set. Some discussion would be appropriate here.
- Please provide more detail on the significance test. How were the random environmental fields generated? Simple permutation, or taking spatial correlation into account. Why?
- Why were the tailoring and the significance testing not applied to the MAT reconstructions?
- The CCA seems to be the first step in the development of the transfer function model to demonstrate that T and Precip really explain the variance in the assemblages. Would it not be better placed earlier in the description? And why are the implications barely discussed?
- How are poor analogues treated? Do they occur at all after the lumping? There is some discussion in L327-332, but it is unclear what the user of the data can do with this information.
Minor issues:
L3: reconsider the use of “late quaternary” in the title. The meaning is actually rather vague and something along the lines of 30,000 years would be more informative.
L108: not sure what the policy is to refer to submitted manuscripts.
L131: please provide a bit more detail on WorldClim 2. For instance, what are the data based on, over what period are the data integrated, etc.
L385: crucially, this manuscript does not describe a fossil pollen data set, but a data set of temperature and precip
L402-404: this seems a somewhat dangerous statement. Are the two reconstructions really independent?
Why is the x axis of figure 6 on a log scale?
Whilst glancing through the code I missed the significance testing and the CCA. (But thumbs up for sharing the code.)
-
AC1: 'Reply on RC1', Chenzhi Li, 17 Oct 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC1-supplement.pdf
-
RC2: 'Comment on essd-2022-38', Patrick Bartlein, 09 Mar 2022
General comments:
This paper describes a set of pollen-based climate reconstructions for the Northern Hemisphere from the LGM to present. The paper is obviously one of three, one describing the pollen data (Herzschuh et al., submitted, which I couldn’t find), another describing the chronology (Li et al., 2022, ESSD-Disc), and this one, describing the reconstructions. There are obvious redundancies among the papers, and I think readers and potential users of the data will find it frustrating to have to track down three papers.
Overall, the paper is not that well organized, with motivations for some of the analyses (e.g. CCA) not appearing until the results section (Section 4, titled “Dataset assessment”), and tutorial material on the nature of pollen data as a palaeo-archive appearing in the discussion, as opposed to the introduction (and presumably also in the first paper of the series, which, with good cross-referencing among the papers, would make it superfluous here). Perhaps this disorganization arose in parting-out the papers.
There are several overarching issues and questions that should be addressed:
Why were January and annual temperature and annual precipitation chosen as the targets for reconstruction? A more appropriate set of climate variables might be those that mechanistically control vegetation like winter cold, summer warmth, and moisture stress. A lot of the paper is devoted to dealing with the obvious correlation between annual temperature and precipitation, but it is never actually established why this is an issue.
What was the role of the canonical correlation analysis? To simply explore the data perhaps, but in fact it represents an alternative reconstruction approach. In any case, it’s neither clear what the purpose of the analysis is, nor are the results fully explained.
The two reconstruction approaches, weighted-averaging – partial-least-squares (WA-PLS) and the modern analogue technique (MAT) may be frequently applied, but they are not without issues themselves. WA-PLS, as is the case with some other methods, tends to “compress” the reconstructions toward the center of the distribution of the climate data (see Liu et al., 2020, Proc. Royal Soc. A, https://doi.org/10.1098/rspa.2020.0346). This will reduce the amplitude of the time series of the reconstructions. MAT suffers from the no-analogue problem, typically diagnosed by looking at the dissimilarities. The performance of the two approaches is examined in Fig. 3, but there is no attempt to account for the obvious spatial patterns.
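As an illustration of the no-analogue diagnosis mentioned above, the following is a minimal Python sketch of MAT, not the authors' R implementation. It assumes pollen spectra are given as proportions; the function names, the toy data, the number of analogues `k`, and the dissimilarity `threshold` are all hypothetical choices made for the example.

```python
import math

def squared_chord(p, q):
    """Squared-chord distance between two pollen spectra given as proportions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def mat_reconstruct(fossil, modern_spectra, modern_climate, k=2, threshold=0.3):
    """Average the climate of the k most similar modern samples.

    Returns (estimate, minimum dissimilarity, no_analogue flag).
    k and threshold are illustrative assumptions, not values from the paper.
    """
    ranked = sorted(
        (squared_chord(fossil, m), c)
        for m, c in zip(modern_spectra, modern_climate)
    )
    nearest = ranked[:k]
    estimate = sum(c for _, c in nearest) / len(nearest)
    min_dissimilarity = nearest[0][0]
    return estimate, min_dissimilarity, min_dissimilarity > threshold

# Toy modern calibration set: three taxa, four sites, one climate variable.
modern_spectra = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
modern_climate = [2.0, 3.0, 12.0, 11.0]

# A fossil sample that resembles the first two modern sites.
close = mat_reconstruct([0.65, 0.25, 0.10], modern_spectra, modern_climate)
# A fossil sample unlike anything in the calibration set.
odd = mat_reconstruct([0.0, 1.0, 0.0], modern_spectra, modern_climate)

print(close)
print(odd)
```

In this sketch the second sample still receives a numerical estimate, but its smallest dissimilarity exceeds the threshold, so it is flagged as having no close modern analogue.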
A number of the analysis steps are not explained much at all, with the results just briefly described before moving on. In particular, the significance testing in Section 4.2 isn’t fully explained: What is the “take-home message”? What does this analysis say about the usefulness of the reconstructions?
The results are described in terms of mid-Holocene minus present (1.5 to 0.5 ka) long-term mean differences, and some unusual time series plots, but there is no attempt to assess the reasonableness of the reconstructions with respect to paleoclimatic first principles or to compare them with simulations or independent observations.
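To make the quantity under discussion concrete, here is a minimal Python sketch of a long-term mean difference between a mid-Holocene window (6.5 to 5.5 ka) and a reference window (1.5 to 0.5 ka). The function names and the toy record are hypothetical, and the sign convention (mid-Holocene minus reference) is an assumption made for the example, not taken from the authors' code.

```python
def window_mean(ages_ka, values, older, younger):
    """Mean of the values whose ages (ka BP) fall within [younger, older]."""
    selected = [v for a, v in zip(ages_ka, values) if younger <= a <= older]
    if not selected:
        return None  # the record has no samples in this window
    return sum(selected) / len(selected)

def mid_holocene_anomaly(ages_ka, values):
    """Mid-Holocene (6.5-5.5 ka) mean minus reference (1.5-0.5 ka) mean."""
    mid_holocene = window_mean(ages_ka, values, 6.5, 5.5)
    reference = window_mean(ages_ka, values, 1.5, 0.5)
    if mid_holocene is None or reference is None:
        return None
    return mid_holocene - reference

# Toy record: ages in ka BP and a reconstructed annual temperature in deg C.
ages = [0.6, 1.0, 1.4, 3.0, 5.6, 6.0, 6.4, 9.0]
tann = [4.0, 4.2, 4.1, 4.6, 5.0, 5.2, 5.1, 3.0]

anomaly = mid_holocene_anomaly(ages, tann)
print(anomaly)
```

With this convention a positive value means the mid-Holocene was warmer than the reference window at that site, which is one way of stating the sense of change explicitly.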
I think these issues are all basically addressable, and with a little overhauling (i.e. no new analysis, just more complete explanation and discussion), the paper(s) will make a useful contribution.
Specific comments:
line 62: “climate proxy synthesis studies”. Do you mean “syntheses of climate reconstructions” or “syntheses of climate proxies” (i.e. the pollen data)? It’s the former that can be directly compared with climate-model output.
line 71: “The evaluation of climate model outputs…” It’s actually the climate models that are being evaluated in data-model comparisons (of simulations and observations or reconstructions).
line 73: “strong changes in the climate driver” Are you alluding to changes in GHGs during the instrumental record? Changes in insolation, ice-sheet distribution and size, and GHGs between the LGM and present are much larger. For example, the companion CMIP experiment to the LGM is the 4xCO2 experiment. CO2 has yet to double from pre-industrial levels.
line 74: “The extratropical Northern Hemisphere … complex spatial and temporal … patterns.” Well, yes, but it’s also where most of the pollen data is from. I don’t think you need to motivate focusing on the Northern Hemisphere extratropics.
line 90: “Regarding the prevalence…”. Just say “Pollen data from … have been used…”
line 94: “high resolution”. Temporal? Spatial? Also, the last millennium is part of the Holocene, and the late-Quaternary, so you might get some push-back from dendroclimatologists about this notion.
line 102: delete “the large” (I think we know extratropical Asia is large area.)
line 103: Whitmore et al. (2005) describes the modern pollen (and climate) data set for North America, not (paleo) precipitation reconstructions.
line 108: If “Herzschuh et al., submitted” is “LegacyPollen 1.0: A taxonomically harmonized global…” then how is that different from this paper (and the data sets on Zenodo)? Does it describe just the fossil-pollen data, or the modern data set too?
line 110: “Li et al., 2022”. So there are three papers: 1) the pollen data set, 2) new chronologies, and 3) this paper, right? Why not just say that?
line 116: Why reconstruct temperature and precipitation, as opposed to climate variables that are mechanistically related to vegetation?
line 136: “For consistency with the amount (number?) of taxa…”. This needs to be a little better explained. Why 70 taxa (except for tradition)?
line 147: “2000 km radius”. Why 2000 km?
line 150: “metrics”. Meaning something other than just the squared-chord distance?
line 151: “square-root transformed pollen percentages”. It might be worth pointing out that the same transformation is embedded in the use of the squared-chord distance dissimilarity measure in the MAT approach.
line 156: “co-variation”. Why is this an issue? It might be the case that covariation among predictands wouldn’t be an issue if they were mechanistically related to vegetation, as in the case of variables like MTCO and GDD (Wei, et al., 2020, Ecology http://dx.doi.org/ 10.17864/1947.194
line 161: “… partialling out the respective other variable”. Please explain.
line 161: “We applied a Canonical Correlation Analysis…”. What were the community, constraining, and conditioning matrices in this analysis? More to the point, what was the objective of this analysis?
line 164: “the ratio … was determined…”. Why and for what purpose?
line 191: Define “RMSEP” on first use in the text.
lines 190-220: What accounts for the spatial variations in RMSEPs? Data density? Data quality (of both the pollen and climate data)? Confounding environmental factors?
line 221: “significance test”. Of what? What hypothesis does the Telford and Birks test address?
line 241: “we subtracted those means from every record”. There are two mean values (6.5 to 5.5 ka and 1.5 to 0.5 ka), and “every record” implies to me the whole data set, LGM to present. Aren’t you just looking at the difference between those two mean values? (And why 1.5 to 0.5 ka?)
line 243: “warmer and drier” Than what? (Which time period is the warmer and drier one?). Throughout this paragraph, the sense of change in climate has to be made explicit. For parallelism, you should adopt a standard way of expressing the changes, e.g. “warmer than present in the mid-Holocene” or “cooling from the mid-Holocene to present” but don’t mix states and trend.
line 250: What’s a “more gradual pattern”?
Figure 6: What exactly is plotted here? Why use a log age axis? An alternative depiction of all of the reconstructions, and their temporal and latitudinal variations would be a Hovmöller diagram.
Figure 8: I guess we’re supposed to see that there are more correlation coefficients between temperature and precipitation close to zero in the “tailored” analyses. I’ve got nothing against violin plots, but I think a standard histogram would work a lot better.
line 301+: What are the implications of these statistics and their spatial patterns?
lines 315-343: This tutorial on pollen data, chronologies, etc. should probably be in the introduction, not the discussion.
line 378: “numerical mechanisms … reduce the reliability” Please explain.
line 410: “TraCE 21k” is a transient experiment. The model used was CCM 3.
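On the comment for line 151 above, the equivalence between the squared-chord distance and the square-root transformation can be written out as follows. The notation is an assumption introduced here for illustration: p_{ik} is the proportion of taxon k in sample i, and m is the number of taxa.

```latex
d_{ij} \;=\; \sum_{k=1}^{m} \left( \sqrt{p_{ik}} - \sqrt{p_{jk}} \right)^{2}
       \;=\; \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2^{2},
\qquad x_{ik} = \sqrt{p_{ik}}
```

In other words, the squared-chord distance between two spectra is the squared Euclidean distance between their square-root transformed proportions, so both approaches apply the same transformation to the pollen data.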
Code and data:
I was able to run the example R code without problems. However, the data sets, described and labelled (via the extension) as .csv files (comma-separated values), are instead tab-separated files, which usually have the extension “.tab”, or sometimes “.txt”. This situation prevents a user from getting a quick look at the data using a spreadsheet program.
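As a possible workaround on the user side, the sketch below reads such a file in Python by detecting the delimiter from the header line instead of trusting the extension. The function names, the candidate delimiters, and the column names in the sample text are hypothetical and are not taken from the published data set.

```python
import csv
import io

def detect_delimiter(header_line, candidates=("\t", ",", ";")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)

def read_table(text):
    """Parse delimited text into a list of dicts keyed by the header fields."""
    header = text.splitlines()[0]
    delimiter = detect_delimiter(header)
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

# Sample content of a file named *.csv that is actually tab-separated.
# The column names are made up for this example.
sample = "Dataset_ID\tAge_ka\tTANN\n1\t0.5\t4.1\n1\t1.0\t4.3\n"

rows = read_table(sample)
print(detect_delimiter(sample.splitlines()[0]) == "\t")
print(rows[0]["TANN"])
```

For a file on disk, the same function can be applied to the text returned by reading the file. Users of pandas can instead pass `sep="\t"` to `read_csv`.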
P.J. Bartlein
-
AC2: 'Reply on RC2', Chenzhi Li, 17 Oct 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-38/essd-2022-38-AC2-supplement.pdf
-
Ulrike Herzschuh et al.
Data sets
Northern Hemisphere temperature and precipitation reconstruction from taxonomically harmonized pollen data set with revised chronologies using WA-PLS and MAT (LegacyClimate 1.0). Herzschuh, Ulrike; Böhmer, Thomas; Li, Chenzhi; Cao, Xianyong. https://doi.pangaea.de/10.1594/PANGAEA.930512
Model code and software
LegacyClimate 1.0: A dataset of pollen-based climate reconstructions from 2594 Northern Hemisphere sites covering the late Quaternary. Herzschuh, Ulrike; Böhmer, Thomas; Li, Chenzhi; Chevalier, Manuel; Dallmeyer, Anne; Cao, Xianyong; Bigelow, Nancy H.; Nazarova, Larisa; Novenko, Elena Y.; Park, Jungjae; Peyron, Odile; Rudaya, Natalia A.; Schlütz, Frank; Shumilovskikh, Lyudmila S.; Tarasov, Pavel E.; Wang, Yongbo; Wen, Ruilin; Xu, Qinghai; Zheng, Zhuo. https://doi.org/10.5281/zenodo.5910989
Viewed
- HTML: 819
- PDF: 296
- XML: 28
- Total: 1,143
- BibTeX: 16
- EndNote: 21