The manuscript presents the Legacy Vegetation 1.0 dataset, produced by applying the REVEALS model to the LegacyPollen2.0 dataset (Li et al., 2024) in an ambitious attempt to reconstruct forest cover across the Northern Hemisphere back to just beyond the Holocene (~14 ka). The manuscript confirms that the REVEALS model accounts well for pollen dispersal and production biases when reconstructing forest cover in Europe (as already published; Serge et al., 2023) and, when compared with modern remotely sensed forest data, provides a more realistic vegetation cover than unmodelled pollen. Attempts to apply the REVEALS model at continental scale in North America and Asia suggest that, although the results improve on continental pollen percentage values, robust continental REVEALS vegetation reconstructions are still some way off. Owing to the paucity of regional RPP data or spatial sedimentary coverage, the continental reconstructions are likely not robust enough to be used as palaeoecological forest cover data for training/validating climate-vegetation prediction models, as suggested in the manuscript.
Major Comments:
• No justification is given for the 14 ka cutoff in the time series; please elaborate. Is this a result of historic climate values, the length of the sedimentary records used, or both? A 14 ka cutoff falls within the deglacial period, which is a stretch considering the abrupt climatic change at the onset of the Holocene. The prior LegacyPollen dataset paper (Herzschuh et al., 2022) describes it as such, calling it a deglacial period/transition. Thus, at 14 ka, pollen productivity may not be adequately represented by Holocene interglacial PPEs. The authors should consider stopping at the Holocene boundary.
• Source Area: The selection of an 80% source area is justified by the following: “The primary objective of this calculation is to provide a clear understanding of the scale of the source area for users unfamiliar with pollen data. It highlights the regional nature of lacustrine pollen data and demonstrates the influence of lake size on this source area”, which is a valid comment. However, no justification is given for using 80% pollen source areas; why not 70%, 90% or 95%, etc.? Is the 80% source area just a reflection of the value set for comparing the GPM to the LSM in Theuerkauf et al. (2016), based on Prentice and Webb’s (1986) somewhat arbitrary statement that “…significant amounts of pollen can be derived from far beyond the source area for, say, 80% of the pollen grains” found at a site?
Also, it would be worthwhile to mention that, when using the GPM, source areas have been found to vary greatly between taxa based on their individual fall speeds (Theuerkauf et al., 2016).
The upper end of the 80% source area presented here is 762 km, which seems rather large.
If the purpose of the 80% source area is to show that pollen source area increases with basin radius (Fig 5), I think the words from this section would be better spent elsewhere. The inclusion of an 80% source area here is confusing, and the manuscript may be improved by moving this information to the supplementary information instead.
More importantly, no explanation is given of the dispersal model used to calculate the pollen source area; is it assumed to be the Gaussian plume model with unstable conditions? The choice of dispersal model would substantially affect the results of this ‘source area’ calculation, so this should be noted in the discussion. As the paper is currently lacking in novelty, a good improvement would be to test dispersal models (e.g. changing settings in the GPM, but also trying the Lagrangian stochastic model) in the validation for these regions. This is something of a low-hanging fruit, as the R code for REVEALSinR allows the dispersal model to be changed, but the PPEs would need to be recalculated with the same settings to match the various attempts.
• No comparison to prior European-scale reconstructions, e.g. Serge et al. (2023), is presented. This would be very helpful for showing where the methodology differs from, and how the results compare with, prior reconstructions.
• Section 3.2 is slightly arbitrary/worded strangely; trends in pollen and REVEALS are broadly the same, and what changes is the composition of the vegetation, with PPEs adjusting the pollen values. Cyperaceae is a strange choice to include for the bogs and swamps, as it would be locally deposited around the coring location. Whilst the Peatland setting in REVEALSinR would assume that the Cyperaceae originates from outside of the immediate area surrounding the core, many such pollen grains would be local.
• Figure 11 reveals that, outside of Europe, continental vegetation reconstructions are not yet robust, likely owing to the density of sedimentary records in North America and Asia, the implementation of continentally averaged PPEs, or another factor not explored here; they are nonetheless better than unmodelled pollen reconstructions.
• The discussion section would benefit from sub-sectioning, perhaps into (1) methodological issues/limitations arising from the selected method, with potential solutions or a forward outlook on how a Northern Hemisphere reconstruction may become as robust as prior continental-scale European reconstructions; and (2) more general, forward-looking insights, e.g. those relating to the validation and training of climate-vegetation models, as mentioned early on in the manuscript (Abstract and Introduction). Sub-section (2) may require more nuance and restraint when related back to some of the clearer limitations of the presented method, neatly summarised by the reconstruction error being much greater for North America and Asia than for Europe in Figure 11.
Moreover, there is a lack of discussion surrounding the role of human agency on forest cover through the Holocene, which is likely one of its biggest drivers.
• Capitalisation in the axes and legends of figures should be consistent. Species and genus names need to be italicised throughout.
Other Comments by Line:
L6: Why is 14 ka selected as the cutoff? 11.7 ka or 11 ka would be clearer alternatives requiring less justification.
L8-9: does not read correctly; it should be, e.g., “The source area within which 80% of the pollen originated was calculated for large lakes (>50 ha)”.
L40-45: this is called the Fagerlind Effect.
L44: The R-value and ERV models are constituents of the LRA. The REVEALS model is not merely a “refinement” per se of the ERV. The ERV is a key part of the REVEALS model incorporated in the PPE calculation stage of the LRA.
L53: space needed after Githumbi et al. (2022)
L75: why are Australasian and Latin American pollen databases included here? The reconstruction is for the Northern Hemisphere. See the end of the next comment in relation to this.
L79: why are some records “marine in origin”? Is this a typo? If it relates to the LegacyPollen 2.0 dataset creation, this should be covered in the LegacyPollen2.0 publication, not here, where it causes confusion.
Fig1: consider separating this figure into four map panels, three regions and one hemispheric map as shown currently or reduce the point size slightly.
Fig2: consider making the bars 3 stacked colours between the three categories of sites, large lakes, peatland and small lakes, or adding these as another facet element to the figures.
L85: REVEALS does not need to be defined again.
L100: which version of the DISQOVER package is being used? Also, is REVEALSinR a function rather than a package itself?
L104-130/Tab2: how many model runs (n) occurred for each site? This is not mentioned but is a key parameter of the REVEALSinR function, defaulting to 1000; see the example given in the function information in the most recent DISQOVER package release version available here: https://github.com/MartinTheuerkauf/disqover.
REVEALSinR(
  pollen,
  params,
  tBasin,
  dBasin,
  dwm = "lsm unstable",
  n = 1000,
  regionCutoff = 1e+05,
  ppefun = rnorm_reveals,
  pollenfun = rmultinom_reveals,
  writeresults = FALSE,
  verbose = TRUE
)
n: number of model runs per time slice, by default 1000
L125: To avoid confusion with the usage of (n), as mentioned previously, which is the number of model runs in the REVEALSinR implementation, consider changing the notation for the mixing value to something else.
L160: why was 80% chosen? See the major comment.
Sec3.1: 762 km is rather large as the 80% source area, especially if Zmax is set to 1000 km (Tab 2 region cutoff value). Similarly, the lower value of 155 km is still quite large, even for the GPM (Theuerkauf et al., 2016), which generally has larger pollen source areas for taxa.
L168-9: Betula, Pinus, Acer. Italicisation and case sensitivity need checking throughout the manuscript, including figures and captions.
Fig 7: the key age bin (0 ka – modern) would benefit from being presented as point data rather than a time series. Four bar charts with the three series (pollen vs REVEALS vs remote sensing) might be a good idea. The time series is interesting but is not the impactful point.
L184-185: implies that REVEALS is not working well in North America and Asia for continental averages. The difference should be larger and closer to the remote sensing for the modern modelled pollen values.
Sec3.4: while MAE is useful, I would consider using additional measures like dissimilarity to help identify/quantify the difference between REVEALS vs remote sensing, pollen vs remote sensing, and split into the three continents (Jackson and Williams, 2004; Overpeck et al., 1985). It would be possible to estimate critical values.
Jackson, S.T., Williams, J.W., 2004. Modern analogs in Quaternary paleoecology: Here today, gone yesterday, gone tomorrow? Annu. Rev. Earth Planet. Sci. 32, 495–537. https://doi.org/10.1146/annurev.earth.32.101802.120435
Overpeck, J.T., Webb, T., Prentice, I.C., 1985. Quantitative interpretation of fossil pollen spectra: Dissimilarity coefficients and the method of modern analogs. Quat. Res. 23, 87–108. https://doi.org/10.1016/0033-5894(85)90074-2
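As an illustration of the kind of dissimilarity measure meant in the Sec3.4 comment, the squared chord distance discussed by Overpeck et al. (1985) could be computed between two cover compositions. The notation here is my own and is not taken from the manuscript: p_i and q_i are the proportions of cover class (or taxon) i in the two datasets being compared (e.g. REVEALS vs remote sensing), and k is the number of classes.

```latex
d_{pq} = \sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^{2},
\qquad \sum_{i=1}^{k} p_i = \sum_{i=1}^{k} q_i = 1,
\qquad 0 \le d_{pq} \le 2
```

A value of 0 indicates identical compositions and a value of 2 indicates compositions with no classes in common. Computing this per grid cell or site, separately for each continent, would complement the MAE.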
L202: Europe shows the best results, but trees are still overrepresented here in comparison to the remotely sensed cover; what is driving the overrepresentation of trees here and beyond? This is not unpacked in enough detail in the discussion.
Fig9: REVEALS reconstruction validations for Europe can be compared with the prior validations of Serge et al. (2023).
Fig11: Strongly suggests that robust continental reconstruction in North America and Asia is not possible yet, even when validated just on the arboreal layer.
Sec4: see the main comments section.
L232: a comparison with the results of Serge et al. (2023) is warranted. Should the forest cover for Europe be better if not the same/similar?
L255-260: DNA comments do not make sense in terms of vegetation reconstructions. A reconstruction implies quantity. SedDNA/eDNA data today remain point data, which provide no information on the quantity of vegetation around a site, merely presence and absence.
L269-271: These sentences about the reliability of data are more methodological considerations that belong in the methods or a separate discussion unpacking the limitations of the presented method.