A Global Dataset of Forest Disturbance Regimes Derived from Satellite Biomass Observations
Abstract. Forests play a central role in the global carbon cycle by serving as critical carbon sinks for atmospheric CO₂. Yet the stability and continued capacity of these sinks are increasingly threatened by a growing number of disturbances. Accurately representing the stochastic nature of disturbance remains a major challenge and a key source of uncertainty in our understanding of carbon cycle dynamics. This study presents a novel framework for deriving disturbance regimes characterized by extent (μ), frequency (α), intensity (β), and background mortality (Kb) directly from landscape features of high-resolution satellite biomass data. These regimes reflect the long-term characteristics of disturbance at the landscape scale rather than the properties of any single event. Our analysis inverts the forward model framework developed by Wang et al. (2024), which used a machine learning model trained on a massive synthetic dataset of over 8 million forward model simulations to link known disturbance regimes to spatial biomass patterns. Instead of predicting patterns from regimes, we use observed satellite biomass patterns to infer the underlying disturbance regimes. To ensure robustness, we first identified the optimal spatial resolution for aggregating both simulation and satellite data, minimizing discrepancies in feature value ranges and reducing extrapolation risk. Using this framework, we produced the first globally consistent, observationally constrained dataset of forest disturbance regime parameters and their associated uncertainties, provided both at the 25 × 25 km² tile level and as a gridded 0.25° global product. Additionally, we used a Dissimilarity Index (DIK) to quantify prediction uncertainty and identify potential extrapolation by measuring how far observations diverge from the training set. An empirical evaluation of borderline disturbance regimes supports the assumptions and methodological approach used to build the dataset.
Our global maps of disturbance regimes provide a novel, process-based tool for investigating the coupled dynamics of disturbance, vegetation, and the carbon cycle, with potential applications for improving the representation of stochastic disturbances in large-scale ecosystem models.
The manuscript by Wang et al. entitled “A Global Dataset of Forest Disturbance Regimes Derived from Satellite Biomass Observations” describes a novel global dataset quantifying forest disturbance regimes. The data are derived from inverse modeling constrained with remotely sensed data on global biomass and productivity. The dataset is highly relevant, as disturbances remain ill-constrained in global carbon cycle assessments and poorly incorporated in Earth system models, partly because long-term disturbance data are lacking at the global scale.
I am impressed with this work and would like to commend the authors for the creativity and novelty of their approach. It has the potential to make a strong and important contribution to the field. That being said, there are a number of problems with the current manuscript that need to be addressed in a revised version. I describe my main concerns here and give detailed line-level suggestions below.
First, and probably most important, it is not fully clear to me from the current text how exactly the dataset was derived. The flow chart in Fig. 1 helps, but the text remains vague and lacks detail in many places. To pick one example: the text does not explain how the random forest models were fitted, which variables served as response and predictor variables, or how the random forest hyperparameters were set. Some technical details (e.g., addressing the scale mismatch and projection issues) are described at a high level of detail, while some of the more conceptual (ecological) aspects are glossed over. As a result, it is hard for the reader to understand how the dataset was generated, which limits confidence in the dataset (particularly in combination with the issues with the evaluation presented; see below). I strongly suggest revising this part to describe the process of deriving the dataset more clearly.
Second, notwithstanding the merits of the author consortium in the fields of remote sensing and Earth system modeling, the authors need to read up considerably on disturbance ecology (i.e., the field they aim to contribute to with this work). This is particularly evident in how their target variables are defined, which is not in line with the definitions and terminology commonly used in the field. To pick one example: what is described here as intensity does not describe disturbance intensity but rather disturbance severity. Also, for some of the parameters the units are missing or unclear (see details below). These issues could lead to considerable confusion in the community and strongly diminish the value of the work presented here. I therefore suggest revising the terminology and describing the parameters (and their units) more thoroughly, so that the full potential of the dataset can be realized.
A third issue is the use of GPP data from a single year (2010). This means that the idiosyncratic weather patterns of that particular year (e.g., drought in one place, a heat wave in another, heavy rain in yet another region) will be baked into the data. This, in turn, does not correspond well with the biomass data used, which integrate over many years or decades (and in some cases centuries). I suggest using a multi-year average GPP estimate to increase the robustness of the assessment.
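To illustrate the concern, here is a minimal sketch for a single grid cell. The GPP values are invented for illustration and do not come from the manuscript or from any GPP product. Assume the cell experienced a drought in 2010. A single-year value then deviates markedly from the multi-year mean that would better match the long integration time of the biomass data.

```python
# Hypothetical annual GPP (gC m^-2 yr^-1) for ONE grid cell; invented numbers,
# assuming 2010 was a drought year at this location.
annual_gpp = {2008: 1530.0, 2009: 1500.0, 2010: 1170.0, 2011: 1520.0, 2012: 1480.0}


def multi_year_mean(values):
    """Arithmetic mean of annual values (a simple climatological baseline)."""
    return sum(values) / len(values)


climatology = multi_year_mean(list(annual_gpp.values()))   # 1440.0
# Relative deviation of the single year from the multi-year mean
anomaly_2010 = (annual_gpp[2010] - climatology) / climatology  # -0.1875, i.e. about -19 %

print(climatology, anomaly_2010)
```

A single-year layer would propagate such a deviation directly into the inferred regime parameters for this cell. An average over several years dampens it.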
Some more observations:
I strongly suggest stating early on which disturbances are addressed here. My understanding is that the work covers all canopy mortality events, including both human and natural disturbances, whereas any disturbance that affects only the understory and not the overstory (e.g., low-severity fire), and hence has no strong signature in aboveground biomass, is not considered. This is my guess at what is meant, but readers should not have to guess what the target variable of the assessment is. Please be specific here!
I miss a more reflective discussion of limitations. The spatial aggregation approach is mainly needed because the underlying model lacks a representation of important spatial processes (contagion). This is cleverly handled here and makes sense for the product. However, one could also argue that this approach, which reduces the overall fidelity of the dataset because of the need to average spatially, would not be needed if a more appropriate or more refined underlying model had been used. Also, the evaluation essentially tests against data that were used to generate the products, so it is not an evaluation against independent data. While I understand that this is difficult to do, some tests against local data on disturbance regimes (from dendroecological sources, long-term inventory data, etc.) would have been desirable. Furthermore, while the approach taken here is clever and creative, it remains unclear to me how the effects of the different aspects of disturbance can be teased apart with high certainty, given that at the pixel level disturbance rate, size, and severity all have a similar effect on biomass (a reduction). Early in the text the authors claim that their model can do this well, but why it can, and how well it actually does, remains unclear to me. I am not suggesting that the authors fundamentally change these aspects (this will probably not be possible and would substantially change their work), but a more reflective and nuanced discussion of the limitations of their work would be highly appreciated.
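The identifiability concern can be made concrete with a deliberately simplified, first-order calculation. It is not the authors' model and ignores event size and spatial structure. Assume the long-run fraction of landscape biomass lost per year is approximately rate × severity plus a background term (analogous to Kb). Then quite different regimes give identical mean losses. Whether the spatial biomass features carry enough additional information (e.g., variance or gap structure) to break this tie is exactly what the manuscript should demonstrate.

```python
def mean_annual_biomass_loss(rate, severity, background=0.0):
    """First-order expected fraction of landscape biomass lost per year.

    Simplifying assumption (not the manuscript's model): disturbances hit
    locations with average biomass, so loss = rate * severity + background.
    rate:       fraction of forest area disturbed per year (yr^-1)
    severity:   fraction of biomass killed within the disturbed area, in [0, 1]
    background: background mortality as a fraction of biomass per year (yr^-1)
    """
    return rate * severity + background


# Three hypothetical regimes with the same mean annual loss of 1 % of biomass
frequent_partial = mean_annual_biomass_loss(rate=0.02, severity=0.5)
rare_stand_replacing = mean_annual_biomass_loss(rate=0.01, severity=1.0)
mixed_with_background = mean_annual_biomass_loss(rate=0.01, severity=0.5, background=0.005)

print(frequent_partial, rare_stand_replacing, mixed_with_background)
```

In this simplification the mean loss alone cannot separate the three regimes. Only higher-order spatial statistics could, which is why an explicit test of identifiability would strengthen the paper.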
More detailed line-level comments:
L20: Not only the number of disturbances but also their size and severity are changing, and these contribute equally to the issue.
L34: borderline disturbance regimes – meaning unclear
L54: change detection
L70: I am not familiar with the term “forward-modeling”, which is used quite frequently throughout the text; please explain at first usage what you mean
L129: how is the approach “adaptive”?
L131: I suggest also naming all 17 features explicitly in the main text; perhaps move the table from the supplement to the main text?
L145: Is it robust to use the GPP of a single year here, given that the biomass data integrate over a much longer time horizon? The selected year could have been particularly dry in some places but not in others, which will affect GPP. The weather patterns of that year will therefore be imprinted on the GPP layer, while the biomass pattern is the integral over decades to centuries.
L160: and the contagious nature of disturbances, such as fires spreading through landscapes
L233: Is the base for calculating extent the forest area or the total cell area? This should be specified. Also, I suggest renaming this to “disturbance rate”: in disturbance ecology, extent commonly refers to the area affected (e.g., in ha or km²), whereas the annualized relative area affected (which is what you report here) is referred to as the rate.
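A short worked example shows why the base area must be stated. All numbers are hypothetical (a 25 × 25 km tile with 40 % forest cover). The same disturbed area yields rates that differ by a factor of 2.5 depending on the denominator. The derived rotation period (1 / rate, i.e., the mean time needed to disturb an area equal to the base area, a standard quantity in disturbance ecology) differs accordingly.

```python
def annual_disturbance_rate(disturbed_area_km2, base_area_km2, years):
    """Annualized fraction of the base area disturbed (yr^-1)."""
    return disturbed_area_km2 / (base_area_km2 * years)


cell_area = 625.0    # 25 km x 25 km tile, in km^2
forest_area = 250.0  # hypothetical: 40 % forest cover
disturbed = 50.0     # hypothetical: km^2 disturbed over the observation period
years = 20           # hypothetical observation period

rate_forest = annual_disturbance_rate(disturbed, forest_area, years)  # 0.01 yr^-1
rate_cell = annual_disturbance_rate(disturbed, cell_area, years)      # 0.004 yr^-1

# Extent in the ecological sense: area affected per year (km^2 yr^-1)
extent_per_year = disturbed / years  # 2.5 km^2 yr^-1

# Rotation period: time to disturb an area equal to the base area (years)
rotation_forest = 1.0 / rate_forest  # 100 years
rotation_cell = 1.0 / rate_cell      # 250 years

print(rate_forest, rate_cell, extent_per_year, rotation_forest, rotation_cell)
```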
L234: Frequency is not a pattern in space but a pattern in time. Whether events are small or large is best described by the gap size distribution. Also unclear: what is the unit here? Without it, the parameter is very hard to interpret.
L236: The terms severity and intensity are used interchangeably here, yet they refer to distinctly different properties of the disturbance regime. Please do not use them interchangeably, as this will only confuse readers and decrease the value of your work for the community (see, e.g., Turner 2010, Ecology 91, 2833–2849, Table 1 for definitions of key terms in disturbance ecology). Based on your description, what you report is severity, so please call it that, not intensity. Intensity describes the energy of an event, e.g., the fireline intensity of a fire or the gust wind speed of a storm. The same comment on units applies here: I assume the value is a percentage or a fraction in [0, 1], but this is not stated explicitly. Please clarify!
L238: What exactly is the unit here, i.e., what quantity per year is reported? Biomass loss in g/m²? Percent biomass loss? Something else? Please be explicit, as otherwise it is difficult for the community to understand (and use) your product!
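To show why the unit matters for users: if background mortality is a fractional rate (yr⁻¹), the corresponding absolute loss depends on the local biomass stock. The same parameter value then means very different carbon fluxes. The stocks below are hypothetical, and the sketch assumes a fractional interpretation, which the manuscript does not confirm.

```python
def absolute_background_loss(biomass_mg_per_ha, fractional_rate_per_yr):
    """Convert a fractional mortality rate (yr^-1) to an absolute flux (Mg ha^-1 yr^-1).

    Assumes the reported parameter is a fraction of standing biomass per year;
    if it is already an absolute flux, this conversion does not apply.
    """
    return biomass_mg_per_ha * fractional_rate_per_yr


# Same hypothetical fractional rate (1 % per year), two hypothetical biomass stocks
low_biomass_forest = absolute_background_loss(100.0, 0.01)   # 1.0 Mg ha^-1 yr^-1
high_biomass_forest = absolute_background_loss(300.0, 0.01)  # 3.0 Mg ha^-1 yr^-1

print(low_biomass_forest, high_biomass_forest)
```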
L332: I agree when it comes to the global extent, but there are many regions of the world where the properties of forest disturbance regimes are well understood, e.g., from dendroecological studies and long-term inventories. The statement is therefore not entirely correct.
L334: You used these data to produce the maps, so comparing the data you generated back to the input data used to generate them is not a strong test. It is not at all surprising that you find patterns here! An independent test, even if only for selected regions, would be much stronger.
L335: Unclear: what is a scenario in your context? As far as I can see, you did not conduct any scenario analyses. Consequently, I do not understand what extreme high/low scenarios are. High or low with regard to what?
L359: shows a more transition… meaning unclear
L361: I think the term “scenario” is used wrongly here, see also my comment above