the Creative Commons Attribution 4.0 License.
Global high-resolution forest disturbance type dataset
Abstract. Forests play a pivotal role in global carbon cycling and biodiversity conservation, yet they face increasing disturbances from both anthropogenic and natural drivers. This study presents the first high-resolution (30-m) global forest disturbance dataset (GFD) for 2000–2020, classifying 11 disturbance types by integrating Landsat-based Continuous Change Detection and Classification (CCDC) time-series analysis with spatial metrics and machine learning. A total of 57,000 expert-validated samples were used to train and validate a decision tree model, achieving an overall accuracy of 94.88 %. The results reveal that forestry disturbance (43.79±0.31 %), shifting cultivation (24.32±0.28 %), and forest fires (11.45±0.05 %) dominate global forest loss. There are regional differences in global forest disturbance, such as farmland expansion in South America and Africa, forest fires in northern regions, and shifting cultivation in tropical regions. Disturbed forests span 1,247.06±11.18 Mha, accounting for 30.87 % of the global forest area. Notably, 2.76 % of global forests were newly established, primarily in China, India, and Brazil. Spatial consistency analysis with existing datasets (R²=0.93) confirms the reliability of the GFD product. The GFD dataset advances our understanding of forest dynamics and underscores the need for targeted conservation strategies in an era of escalating environmental change. The 30 m resolution GFD generated by this study is openly available at https://doi.org/10.6084/m9.figshare.28465178 (Liu et al., 2025a).
Status: final response (author comments only)
CC1: 'Comment on essd-2025-346', zhou yuming, 27 Jun 2025
The data reveal the types of global forest disturbances, which is helpful for global intervention and protection according to local conditions, and has guiding significance for forest prediction research at the national scale.
Citation: https://doi.org/10.5194/essd-2025-346-CC1
CC2: 'Comment on essd-2025-346', Shu Fu, 27 Jun 2025
Forests are a massive carbon reservoir, and assessing their carbon disturbances requires comprehensive and detailed identification of forest disturbance types. This research provides reliable data and technical support for evaluating local and even global forest carbon disturbances.
Citation: https://doi.org/10.5194/essd-2025-346-CC2
CC3: 'Comment on essd-2025-346', Zhang Jimin, 27 Jun 2025
This pioneering study delivers the first 30-m resolution global forest disturbance dataset, classifying 11 types via Landsat time-series, spatial metrics, and machine learning. Achieving 94.88% accuracy with 57,000 samples, it quantifies dominant drivers like forestry and wildfires. This resource revolutionizes carbon accounting and conservation planning, offering unmatched precision for global environmental governance.
Citation: https://doi.org/10.5194/essd-2025-346-CC3
CC4: 'Comment on essd-2025-346', Chuhan Ji, 27 Jun 2025
The 30-m GFD dataset is a landmark, with 94.88% accuracy in classifying 11 forest disturbances (2000–2020). It enhances understanding of dynamics, aids carbon/biodiversity studies, and supports targeted conservation via reliable, open-access data.
Citation: https://doi.org/10.5194/essd-2025-346-CC4
CC5: 'Comment on essd-2025-346', Tian Zhao, 27 Jun 2025
This study makes a significant contribution to global forest monitoring by providing the first high-resolution (30 m) dataset of forest disturbance types over two decades, with robust validation and open access. By integrating Landsat-based Continuous Change Detection and Classification (CCDC) with spatial metrics and decision tree algorithms, the authors developed a robust classification framework that achieved an overall accuracy of 94.88%. The resulting dataset not only improves our ability to distinguish among 11 major forest disturbance types at a fine scale, but also provides critical support for carbon accounting, biodiversity conservation, and sustainable land management under global environmental change.
Citation: https://doi.org/10.5194/essd-2025-346-CC5
RC1: 'Comment on essd-2025-346', Ian Evans, 14 Jul 2025
from Ian S. Evans, Durham University, U.K.
GENERAL
It is useful to have maps of world distribution of different forest disturbance types and the authors provide a higher-resolution data set. The results appear reliable and mark a significant contribution to the state of the world’s forests.
13 situations are recognised (Table 1); of these, two ‘weak disturbances’ (drought, pests & diseases) are not considered, so 11 are mapped in Fig.5, including ‘undisturbed’ and ‘newly added forest’. Excluding undisturbed and new leaves 9 types of disturbance, of which 7 are covered in Fig.3 and Table 4 (accuracy of flood and oil palm not being evaluated).
My criticisms are essentially confined to details of presentation and wording. It might be good to have more information on how the types are defined and how time series permit recognition of e.g. recovered areas. On line 133 the treatment of ‘vacant areas’ is worrying: more information on this is needed. How big an area is affected?
PRESENTATION DETAILS
101 ‘… America, South …’ comma missing
132 Insert space before ‘in’
140 ‘Considering …’: this sentence is incomplete; it is just a clause introducing something that is missing.
156 ‘Meanwhile …’ is an incomplete sentence – just a clause. I suggest replacing with ‘Weak disturbances in forest cover are highly time-bound.’
160 Delete ‘are not considered’ - duplication.
166-169 This sentence misuses punctuation (: and ; are repeated). Please re-write.
Fig.3 There is space to replace codes with brief versions of types – e.g. ‘plantation’.
Table 4 118 should be 18
254-260 There should be a space before ±
260 Not a sentence: ‘both …’ implies ‘ …and’
268 ‘Western Siberian Plain in North America’ ??
Fig.4 As each small symbol represents an area (grid square?), the colours must represent density. So ha per … ? Up to 1500 ha, so per at least 39 x 39 km. Please state resolution of this & Fig.5.
Fig.5 ‘Forestry replanting’ is inconsistent with text (lines 284, 288 etc.), other Figures (8 & 9) and Table 1 (‘Forestry disturbance’) and does not seem to be used elsewhere.
Actually ‘forestry disturbance’ is an unfortunate term for just one type of forest disturbance – disturbance as a disturbance type. Could it be replaced throughout by ‘forestry replanting’, ‘recovered disturbance’ or just ‘replanted’?
284-293 Presumably Mha should be M ha
Fig. 6 caption Insert ‘Note varying scales.’
Figs. 6 & 7 maps show density, so it is necessary to state the unit area and (as these are rectangular) its dimensions.
Fig.7 What is the rationale of having red = most in a & b, but red = least in c and d? (For me, a, c and d might be considered ‘good’; b is ‘bad’.) Fig. 6 was consistent with red = most, so readers are going to be confused here.
328-330 This is misleading, based on the inclusion of ‘all’ in Fig.8b. That should be replotted excluding ‘All’. Consistency over the 5 types is thus much less, and the big deviation for Forest fire requires comment.
Figs. 8a and 9a-d: Note that all show highly skewed distributions of both x and y variables. Calculating regressions on logarithmic scales would reduce the influence of the few high values. It would, however, increase the leverage of the numerous small values: a choice has to be made based on the absolute error margins of small versus large values. Perhaps both types of regression should be presented.
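The trade-off between raw-scale and log-scale regression described above can be illustrated with a minimal sketch. The paired areas below are hypothetical and strongly right-skewed; they are not values from the manuscript, and the helper `ols` is an illustrative name rather than anything used by the authors.

```python
import math

def ols(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Hypothetical paired areas per grid cell (assumption, not manuscript data).
reference = [0.5, 0.8, 1.2, 2.0, 3.5, 6.0, 9.0, 40.0, 250.0, 1200.0]
mapped = [1.5, 0.3, 2.5, 0.9, 6.0, 3.0, 15.0, 30.0, 300.0, 1100.0]

# Fit once on the raw values and once on log10-transformed values.
raw_fit = ols(reference, mapped)
log_fit = ols([math.log10(v) for v in reference],
              [math.log10(v) for v in mapped])

print("raw scale: intercept=%.3f slope=%.3f R2=%.3f" % raw_fit)
print("log scale: intercept=%.3f slope=%.3f R2=%.3f" % log_fit)
```

In this toy case the raw-scale R² is driven almost entirely by the two largest pairs, while the log-scale fit exposes the relative scatter among the many small values. Reporting both, as suggested, lets readers judge which error margin matters for their use.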
Citation: https://doi.org/10.5194/essd-2025-346-RC1
RC2: 'Comment on essd-2025-346', Anonymous Referee #2, 28 Jul 2025
General Comment
The manuscript describes a 30-m global forest disturbance dataset (11 disturbance types) for the period 2000 to 2020. Disturbance is derived from Landsat data by applying the CCDC analysis. My comments focus primarily on the accuracy assessment and area estimation components of the work. A primary area of improvement of the manuscript would be to provide a clear articulation of the sampling design used to collect the data for the accuracy assessment and area estimates. Without a clear description of the sampling design and additional details, it is impossible to ascertain how the accuracy and area estimates were obtained.
Specific Comments
1. Additional details related to the sampling design(s) must be provided. It is unclear how specifically the sample of 57,000 30-m sample units were selected for the model training and validation (Lines 72-73). In Section 2.3 (Lines 153-154), the text states that “8 individuals were uniquely responsible for selecting 8 types, while an additional 4 individuals conducted secondary confirmation of the selected samples.” This text seems to be referring to the process of labeling the sample units, not explaining how the sample units were selected. Did these individuals actually choose which sample units (30-m pixels) were in the sample? There is no mention of randomization in the protocol for selecting the sample, and no details presented of whether strata are present, even though later in the manuscript stratified estimation formulas for accuracy metrics are provided (equations 5 through 10). To compound the confusion, the Figure 3 confusion error matrix has a sample size of nearly 17,000, but there is no mention in the text of how these sample units were selected. Is it a random subset of the 57,000 mentioned earlier? Or are these 17,000 sample units entirely independent of the training sample of 57,000? It is essential to describe the sampling design(s) used to select these units.
2. I have several concerns with the Figure 3 confusion matrix, which I will list as separate items as follows:
a) It seems very unlikely that there would be no errors associated with the undisturbed class (which is class 0). Out of 3476 cases, there was never a commission error or omission error of “undisturbed” – this class is perfectly mapped. It seems implausible that disturbed and undisturbed forest can be classified with 100% accuracy.
b) The confusion matrix is presented in terms of sample counts, which is reasonable if the sampling design is simple random. Yet the authors present formulas for stratified sampling (equations 5-10). In particular, equation (5) indicates how the cell proportions should be estimated for a stratified sample, but that formula was not apparently used in the analysis. The confusion matrix should be presented in terms of the estimated pij (cell proportions) when stratified sampling is used. This concern links to comment 1 because the manuscript does not include description of the sampling design.
c) Row and column totals need to be added to Figure 3.
d) It is unclear what the vertical color bar on the right of the figure represents (range from 0 to 40,000). Please remove it or explain what it is.
e) I will identify this comment as purely an opinion, but I am skeptical that a disturbance product can achieve the high accuracies reported. Accurately mapping forest change is exceedingly difficult, so to achieve user’s and producer’s accuracies of over 95% for many of these disturbance types doesn’t seem possible. Comment 2a is related to this same concern.
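To make the point in comment 2b concrete, the following is a minimal sketch of how a count-based confusion matrix is converted to estimated cell proportions under stratified sampling. It assumes the manuscript's equation (5) takes the standard form p_ij = W_i · n_ij / n_i, where W_i is the mapped area proportion of stratum i, as in Olofsson et al. (2014). The three-class counts and weights are hypothetical, and `stratified_accuracy` is an illustrative name.

```python
def stratified_accuracy(counts, weights):
    """Estimate cell proportions and accuracies from a stratified sample.

    counts[i][j]: sample units mapped as class i (the stratum) whose
                  reference label is class j.
    weights[i]:   proportion of the mapped area in class i (sums to 1).
    """
    k = len(counts)
    row_totals = [sum(row) for row in counts]
    # Assumed form of the estimator: p_ij = W_i * n_ij / n_i.
    p = [[weights[i] * counts[i][j] / row_totals[i] for j in range(k)]
         for i in range(k)]
    overall = sum(p[i][i] for i in range(k))
    users = [p[i][i] / sum(p[i]) for i in range(k)]
    producers = [p[j][j] / sum(p[i][j] for i in range(k)) for j in range(k)]
    return p, overall, users, producers

# Hypothetical three-class example (not the Figure 3 matrix).
counts = [[90, 5, 5],
          [10, 80, 10],
          [2, 8, 90]]
weights = [0.7, 0.2, 0.1]

p, overall, users, producers = stratified_accuracy(counts, weights)

# Naive accuracy computed directly from counts, valid only for simple random sampling.
naive_overall = sum(counts[i][i] for i in range(3)) / sum(map(sum, counts))

print("stratified overall accuracy:", round(overall, 4))
print("count-based overall accuracy:", round(naive_overall, 4))
print("stratified producer's accuracies:", [round(v, 4) for v in producers])
```

In this toy matrix the count-based producer's accuracy of the rarest class would be 90/105, whereas the area-weighted estimate is noticeably lower, which is why the choice of estimator depends on knowing the sampling design.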
3. The accuracy estimates reported on page 12 and in Table 4 are also a cause for concern.
a) It is evident that the stratified formulas were not used to estimate producer’s accuracy and overall accuracy. If the sampling design is stratified and the stratified formulas were not used, these estimates would be incorrect.
b) It seems very likely that the standard error values are incorrect for several cases. For example, if we had a simple random sample with a sample size of n=17,000 (approximate sample size of matrix in Figure 3), the standard error of overall accuracy would be SQRT[(0.95)*(0.05)/17000]=0.0033 or 0.33%. The reported standard error for overall accuracy is 2.86% from line 253, nearly 10 times larger. The standard errors for producer’s accuracy of Types 18 and 19 (approximately 20% and 15%) are suspiciously large given the large sample sizes for these two disturbance types. Lastly, the standard errors reported for user’s accuracy also don’t match what I calculate if I apply equation (7) to the data in Figure 3. Please re-check the standard error estimates to confirm.
c) Note that Type 18 in Table 4 is accidentally mis-labeled as “118”
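The back-of-envelope check in comment 3b can be reproduced with a short sketch, assuming simple random sampling, an overall accuracy of 0.95, and n = 17,000 as stated in that comment. The function name is illustrative.

```python
import math

def srs_standard_error(p, n):
    """Standard error of a proportion p estimated from a simple random sample of n units."""
    return math.sqrt(p * (1.0 - p) / n)

se = srs_standard_error(0.95, 17000)
margin_95 = 1.96 * se  # half-width of an approximate 95 % confidence interval

print(f"standard error: {se:.4f} ({100 * se:.2f} %)")
print(f"95 % margin of error: {margin_95:.4f} ({100 * margin_95:.2f} %)")
```

Evaluated literally, the expression gives a standard error of about 0.0017 (0.17 %). The quoted 0.0033 (0.33 %) matches this value multiplied by 1.96, so it appears to correspond to a 95 % margin of error. Either way, the reported 2.86 % is far larger than what a simple random sample of this size would produce.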
4. Table 5 provides estimates of area of the GFD types. Presumably these are from the inadequately described “validation” sample. The Abstract should be revised to clarify what is presented in the manuscript. The manuscript’s title suggests that the primary purpose of the manuscript is to present a new global forest disturbance dataset (i.e., a map). But key parts of the manuscript are sample-based estimates of area, which would use the disturbance map for stratification, but the key data are then the sample and disturbance type labels provided by the expert interpreters. For area estimation the role of the new disturbance map is secondary. If the main objective of the manuscript is to provide this global dataset, then sample-based area estimates would seem unnecessary and only the accuracy results would be necessary to present. This same ambiguity is present in the Conclusion section. Lines 354-358 highlight the map of disturbance. But without any transition flagging the use of sample-based area estimation, Lines 358-360 then report sample-based estimates of area (Table 5) that use only the map through stratification of the sample. Please revise the Abstract and Conclusion to more clearly identify the purpose of the map and the role of sample-based area estimation to the objectives of the manuscript.
Technical Corrections:
1. Line 15: It is not clear whether the number to the right of the +/- is a standard error or a margin of error of a confidence interval. Please identify more clearly.
2. Lines 19-20: The comparison to other datasets provides an evaluation of “agreement” or “consistency” with these other datasets. These other datasets are not “truth”. Therefore, agreement with these other datasets does not “confirm reliability” or convey “accuracy” but instead quantifies consistency with other datasets.
3. Line 43: What specifically is “subjective” about field surveys? The implication is that remote sensing is not subjective, but that would seem dubious because surely there are subjective components of remote sensing as well.
4. Lines 19, 225, 226, 321: This is a minor point, but stating that a comparison is made with “existing” datasets is not meaningful because we obviously cannot make a comparison to a dataset that does not exist. It would be better to use “other datasets” instead of “existing datasets”.
5. Page 10, equation (10): This formula for the standard error of the estimated proportion of area does not match equation (10) presented in Olofsson et al. (2014).
6. Equation (11): The use of “UA” for the standard error will be confusing because it could easily be misread as an abbreviation for “User’s Accuracy” and “UA” provides no obvious connection to standard error.
7. Equation (12): Please check this formula. It seems unlikely that there would be a “bar” above qi (indicating a mean) in the denominator but no “bar” above pi in that same denominator.
8. Line 226: Because these other datasets are not “truth”, comparisons to these datasets would represent “agreement” and “disagreement”. Use of the term “errors” does not seem appropriate here.
9. Line 234: a space should be inserted between “s” and “p” in “asp”.
10. Table 4: state what the +/- columns represent.
11. Line 288: The meaning of “robust” precision is unclear. In what sense can precision be “robust”?
12. Line 326: “MEA” should be “MAE” and the word “only” should be removed from before “13%” as that is a value judgment of magnitude of the disagreement.
13. Panel b) of Figure 8 should be deleted or perhaps converted to a small table. The R^2, MAE, and RMSE values do not make much sense for only 6 data points and the “All Types” case must have a massive influence on the summary statistics.
14. Lines 338-340: “MEA” should be “MAE” in multiple places.
15. Throughout the manuscript the word “samples” is used incorrectly. The definition of “sample” in statistics is that it is a subset of n units selected from the population. The individual elements of that sample are “sample units”, in this case a sample unit is a 30-m pixel. Thus, there are not 57,000 “samples” (e.g., Line 13), but one “sample” consisting of 57,000 sample units or sample pixels. This incorrect use of “samples” should be corrected throughout the manuscript.
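The concern in technical correction 13, that an aggregated “All Types” point dominates summary statistics computed from only six points, can be sketched as follows. The areas are hypothetical and chosen only for illustration; `agreement_stats` is an illustrative helper, not the authors' code.

```python
import math

def agreement_stats(x, y):
    """Return (R2, MAE, RMSE) between paired values x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)
    mae = sum(abs(a - b) for a, b in zip(x, y)) / n
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    return r2, mae, rmse

# Hypothetical areas for five disturbance types (assumption, not manuscript data).
other = [120.0, 60.0, 35.0, 20.0, 10.0]
gfd = [90.0, 75.0, 20.0, 30.0, 12.0]

# Five types only, then the same five plus an "All Types" point equal to their sums.
without_all = agreement_stats(other, gfd)
with_all = agreement_stats(other + [sum(other)], gfd + [sum(gfd)])

print("five types only:  R2=%.3f MAE=%.2f RMSE=%.2f" % without_all)
print("with 'All Types': R2=%.3f MAE=%.2f RMSE=%.2f" % with_all)
```

Because the aggregate point lies far from the others and close to the 1:1 line by construction, adding it raises R² even though agreement for the individual types is unchanged.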
Citation: https://doi.org/10.5194/essd-2025-346-RC2
RC3: 'Comment on essd-2025-346', Anonymous Referee #3, 28 Jul 2025
This manuscript integrates the CCDC time-series change detection method with a CART model to map and identify forest disturbance types at a global scale. I have a few concerns about the validity and robustness of the proposed method.
- The detection of disturbed forest pixels depends solely on the CCDC model. What is the accuracy of the change detection? I wonder whether the change detection error and/or modelling uncertainty of CCDC will affect the subsequent disturbance type mapping. CCDC assumes that the NDVI of every forest pixel can be described by a linear trend term and a harmonic seasonality term (Eq. 1). In fact, not all pixels fit this assumed model, which would affect the fitting performance of CCDC and therefore the subsequent disturbance mapping. Besides CCDC, many other change detection models are available, such as BEAST, BFAST, and LandTrendr. Why did the authors choose CCDC? Would applying a different model lead to the same change detection outcomes?
- It seems that the authors only considered and mapped abrupt forest loss, while gradual forest changes (e.g., forest degradation) and forest gain (e.g., natural regrowth and afforestation) were only mapped.
- Line 80: “CRAT” should be “CART”
- 5. Does the undisturbed area indicate no change has occurred in the pixel? What’s the omission rate (or under-detection rate) of CCDC?
- How does the proposed algorithm perform on Landsat images with dense and consistent cloud coverage (e.g., in tropical areas)?
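The model assumption questioned in the first point above (a linear trend plus a harmonic seasonal term) can be illustrated with a minimal sketch. The manuscript's Eq. (1) is not reproduced on this page, so the single annual harmonic, the synthetic NDVI series, the 3 × RMSE threshold, and the requirement of six consecutive anomalous observations are all illustrative assumptions, not the authors' settings.

```python
import math

def design_row(t, period=1.0):
    """Regressors for intercept, linear trend, and one annual harmonic (t in years)."""
    w = 2.0 * math.pi * t / period
    return [1.0, t, math.cos(w), math.sin(w)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_harmonic(times, values):
    """Least-squares coefficients and RMSE of the trend-plus-harmonic model."""
    rows = [design_row(t) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [v - sum(c * x for c, x in zip(coef, r)) for r, v in zip(rows, values)]
    rmse = math.sqrt(sum(e * e for e in resid) / (len(values) - k))
    return coef, rmse

def predict(coef, t):
    """Model prediction at time t."""
    return sum(c * x for c, x in zip(coef, design_row(t)))

def ndvi(t, i):
    """Synthetic NDVI: trend, annual cycle, and a small deterministic perturbation."""
    return 0.7 - 0.005 * t + 0.15 * math.cos(2 * math.pi * t) + 0.01 * math.sin(7.3 * i)

# Three stable years of 16-day observations (hypothetical series).
step = 16.0 / 365.25
stable_t = [i * step for i in range(69)]
stable_y = [ndvi(t, i) for i, t in enumerate(stable_t)]
coef, rmse = fit_harmonic(stable_t, stable_y)

# Six new observations after a simulated disturbance that lowers NDVI by 0.4.
new_t = [(69 + i) * step for i in range(6)]
new_y = [ndvi(t, 69 + i) - 0.4 for i, t in enumerate(new_t)]
anomalous = [abs(y - predict(coef, t)) > 3.0 * rmse for t, y in zip(new_t, new_y)]
break_detected = all(anomalous)

print("fitted coefficients:", [round(c, 3) for c in coef])
print("break detected:", break_detected)
```

A pixel whose phenology does not follow this functional form would yield a larger RMSE, which raises the anomaly threshold and makes real disturbances harder to flag. That is one way the fitting error could propagate into the disturbance type mapping.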
Citation: https://doi.org/10.5194/essd-2025-346-RC3
Data sets
Global forest main disturbance types between 2000 and 2020 Shidong Liu, Li Wang, Wanjuan Song https://doi.org/10.6084/m9.figshare.28465178
Viewed
- HTML: 499
- PDF: 195
- XML: 38
- Total: 732
- BibTeX: 9
- EndNote: 19