A High-Resolution, Long-Term Global Radar-Based Above-Ground Biomass Dataset from 1993 to 2020
Abstract. Understanding global carbon dynamics and budgets under climate change, land-use shifts, and increasing disturbances remains challenging due to the limitations of existing coarse spatial resolution and short-term or discontinuous biomass datasets. In this study, we generated a new global annual above-ground biomass carbon (AGC) dataset at 8 km spatial resolution from 1993 to 2020. This dataset is derived from satellite radar backscatter data and integrates vegetation and climate information, such as tree cover, tree density, and background climate data, to enhance the accuracy of global AGC mapping. Our dataset estimates an average global above-ground carbon stock of 378 PgC, aligning with other global estimates. We observe a slight gross increase of 1.18 PgC in global vegetation above-ground biomass carbon stocks from 1993 to 2020, with relatively stable variation. This reflects a balance between above-ground biomass carbon gains and losses across different biomes. Temperate and boreal forests are the primary contributors to global vegetation above-ground biomass carbon gains from 1993 to 2020, with increases of 0.4 and 0.5 PgC, respectively. In contrast, gross above-ground biomass carbon losses are predominantly observed in global tropical forests (-10.7 PgC) and global shrublands (-1.0 PgC). This suggests non-forest vegetation may offset the large above-ground biomass losses in tropical forests. Notably, El Niño events in 2015/16 triggered significant pantropical AGC losses of approximately -2.86 PgC, and regions with reported tree mortality events (Hammond et al., 2022) exhibited local AGC density declines of -0.34 MgC/ha. This long-term, temporally continuous, and moderate-resolution dataset provides a valuable resource for understanding biomass carbon dynamics and integrating these processes into Earth System Models. The AGC dataset is openly accessible alongside this manuscript.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-330', Anonymous Referee #1, 07 Sep 2025
- RC2: 'Comment on essd-2025-330', Anonymous Referee #2, 26 Sep 2025
The manuscript presents a global 8 km annual above-ground biomass carbon (AGC) dataset for 1993–2020 by upscaling the radar backscatter with Random Forest (RF). The product is openly shared and represents the longest temporal span among global radar-based AGC maps, which is of potential interest to the carbon-cycle community.
However, the current version falls short of the “high-resolution & high-quality” criteria advertised in the title and abstract, and several methodological and validation issues should be addressed before the dataset can be considered reliable.
1. The title and abstract repeatedly claim “high-resolution,” yet the actual spatial resolution is 8 km—still far coarser than the scale of most forest stands or forest disturbance patches.
2. The manuscript relies exclusively on the ESA-CCI 100 m AGB product as truth for the RF model. However, CCI carries high uncertainties in tropical evergreen and boreal forests, and no quality-layer filtering or uncertainty weighting is applied, thus these errors or uncertainties will propagate directly into the final AGC estimates and degrade their accuracy.
3. The study lacks independent direct validation against field samples such as national forest inventory (NFI) plots, GEDI footprints, or field surveys, leaving the accuracy of the 8 km AGC estimates largely unverified.
4. Leave-one-year-out cross-validation against the ESA-CCI 100 m AGB product for 2017–2020 is insufficient: it may conceal strong temporal autocorrelation and leaves the 1993–2016 period completely unvalidated. The authors should add further validation, discuss the limitations of this extrapolation, and consider stratified modelling.
5. The predictors currently used in the RF model are limited to radar backscatter, tree cover, tree density, and mean annual temperature and precipitation; key carbon-related drivers such as PAR, nitrogen deposition, forest age and CO₂ are missing.
6. A single global model may not be applicable to all regions; it is recommended to develop regional and land-cover-type-specific models.
7. Accuracy is reported only at the global scale; please stratify validation by different land cover types (broad-leaved, needle-leaf, mixed, and so on), and discuss the necessity of developing biome-specific models and the potential gains in prediction accuracy.
8. Add a section analyzing sub-pixel variance within 8 km pixels.
9. A systematic review of existing global or regional AGB or AGC products should be added.
10. The inherent limitations of C-band (saturation ~100 Mg C ha⁻¹) and Ku-band (sensitive to canopy water) are insufficiently discussed. Quantify the fraction of global forests with AGC >100 Mg C ha⁻¹, estimate potential underestimation, and warn users accordingly.
11. No climatic-gradient analysis is presented; please divide the globe into different mean-annual-temperature (MAT) and mean-annual-precipitation (MAP) bins, plot prediction error for each climate zone, and discuss any systematic over- or under-estimation detected between perhumid and semi-arid regions (a minimal sketch of such a binned-error analysis is given after this list).
12. The impact of smoothing window length is not tested; compare 3-month vs. 12-month moving-average AGC anomalies, report any additional short-term losses detected, and discuss the risk of missing rapid events under the 12-month window.
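For point 11, a minimal sketch of what such a climate-binned error analysis could look like is given below; it assumes per-pixel predicted and reference AGC plus MAT and MAP are available in a pandas DataFrame, and all column names and bin edges are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the climate-binned error analysis suggested in point 11.
# Assumes a pandas DataFrame `df` with columns "agc_pred", "agc_ref",
# "mat" (deg C) and "map_mm" (mm/yr); names and bin edges are illustrative.
import numpy as np
import pandas as pd

def error_by_climate_bins(df,
                          mat_edges=(-20, 0, 10, 20, 30),
                          map_edges=(0, 250, 500, 1000, 2000, 4000)):
    df = df.copy()
    df["bias"] = df["agc_pred"] - df["agc_ref"]
    df["mat_bin"] = pd.cut(df["mat"], bins=list(mat_edges))
    df["map_bin"] = pd.cut(df["map_mm"], bins=list(map_edges))
    grouped = df.groupby(["mat_bin", "map_bin"], observed=True)["bias"]
    return grouped.agg(mean_bias="mean",
                       rmse=lambda e: float(np.sqrt(np.mean(e ** 2))),
                       n="size")
```

Reporting mean bias and RMSE per MAT/MAP cell in this way would make any systematic over- or under-estimation along the humidity gradient immediately visible.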
In conclusion, the manuscript oversells the spatial resolution, omits both independent and field-based validation, and under-represents input-data and model uncertainties; furthermore, it lacks results on biome-stratified accuracy and full error propagation. Consequently, I cannot recommend the manuscript for publication in ESSD in its present form.
Citation: https://doi.org/10.5194/essd-2025-330-RC2
- RC3: 'Comment on essd-2025-330', Simon Besnard, 29 Sep 2025
General assessment
The manuscript by Liu et al. presents a new long-term dataset of above-ground biomass carbon (AGC), upscaled using a random forest model trained on ESA-CCI AGB maps. The dataset spans the years 1993-2020 at an 8 km resolution and is openly available. Such a record has the potential to fill important gaps in biomass monitoring, especially in linking disturbance events to global carbon cycle dynamics. The paper is clearly written, and the availability of the dataset is a valuable contribution to the community.
However, the core methodological choice (i.e., training solely on ESA-CCI biomass maps) means that what the authors provide is essentially an emulator of ESA-CCI biomass, extended to coarser resolution but with longer temporal coverage. This distinction is crucial: the product inherits ESA-CCI’s strengths and weaknesses, rather than providing an independent observational constraint. For this reason, I have several concerns regarding the robustness of the methodology, the framing of the product, and the validation strategy, which need to be addressed before the manuscript can be considered for publication in ESSD.
Major comments:
1. Resolution claims.
As also noted by other reviewers, referring to 8 km as “high-resolution” is misleading. While 8 km is finer than the previous radar-based time series (25 km), it remains coarse relative to the scale of disturbances and forest management. The term “moderate resolution” would be more accurate and should be used consistently throughout, including the title and abstract.
2. Training and validation strategy:
The reliance on ESA-CCI biomass (2017-2020) as the sole source of training and validation is problematic. This creates a circular dependency and risks propagating ESA-CCI biases into the time series. Leave-one-year-out cross-validation within 2017-2020 does not address temporal extrapolation to earlier decades. The authors should:
- Assess performance using independent data sources (e.g. NFI, GEDI, plot databases).
- Discuss limitations of back-extrapolation over two decades without ground-based constraints.
- Test spatial cross-validation schemes (e.g., spatial blocking, leave-one-region-out) to quantify prediction into unknown regions (i.e., performance when extrapolating beyond areas represented in the training set); see the sketch after this list.
- Consider feature-space cross-validation approaches (e.g. K-nearest-neighbour splits in predictor space) to better capture performance across the diversity of climate and vegetation conditions.
Beyond aggregate stock comparisons (Table 2), a plot-to-map validation exercise across products should be included (as done in Santoro et al. 2022 and in related IEEE work, doi:10.1109/TGRS.2022.3202559). Such an exercise would reveal regional biases, saturation effects, and biome-specific performance, allowing deviations to be attributed more clearly to differences between global products (e.g., arising from the modelling framework, radar physics, or the ESA-CCI reference data itself).
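To illustrate the spatial-blocking suggestion above, the following is a minimal sketch of a leave-block-out cross-validation, assuming the training pixels sit in a pandas DataFrame with latitude/longitude columns; the predictor names, block size, and random forest settings are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of a spatially blocked (leave-block-out) cross-validation,
# assuming a pandas DataFrame `df` with columns "lat", "lon", the predictors
# below, and an "agc" target. Names, block size, and RF settings are
# illustrative only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GroupKFold

PREDICTORS = ["backscatter", "tree_cover", "tree_density", "mat", "map"]

def spatial_block_cv(df, block_deg=10.0, n_splits=5):
    """Hold out whole lat/lon blocks so test pixels are spatially separated
    from training pixels, instead of randomly interleaved with them."""
    lat_bin = np.floor(df["lat"] / block_deg).astype(int)
    lon_bin = np.floor(df["lon"] / block_deg).astype(int)
    groups = lat_bin * 1000 + lon_bin            # one ID per spatial block

    X, y = df[PREDICTORS].to_numpy(), df["agc"].to_numpy()
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)
        rf.fit(X[train_idx], y[train_idx])
        pred = rf.predict(X[test_idx])
        scores.append({"r2": r2_score(y[test_idx], pred),
                       "rmse": float(np.sqrt(mean_squared_error(y[test_idx], pred)))})
    return pd.DataFrame(scores)
```

Reporting the spread of R² and RMSE across held-out blocks, rather than a single pooled score, would directly expose how performance degrades when extrapolating beyond the training domain.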
3. Model performance:
The reported test metrics (R² up to 0.99, RMSE < 10 MgC/ha) appear unrealistically high. These values reflect consistency with ESA-CCI biomass rather than independent ground truth. As noted above, the approach effectively emulates ESA-CCI biomass, reproducing its spatial patterns at a coarser resolution but extending them over time. Reported performance must therefore be contextualised within ESA-CCI's known accuracy and limitations. For example, recent evaluations of ESA-CCI against NFI and ICESat-2 canopy height data (Santoro et al., 2024, Remote Sensing of Environment, 291:113612) show a more modest agreement (R² < 0.7 in some strata, with systematic biases at both low and high biomass levels). Without this context, the exceptionally high R² values risk being misinterpreted as true accuracy.
In addition, ESA-CCI biomass provides per-pixel standard deviation layers that quantify spatially varying uncertainty. These should be used in training (e.g. as weights or for stratified validation) to prevent high-uncertainty pixels from biasing the learning process.
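One way to act on this suggestion is to pass inverse-variance weights derived from the ESA-CCI per-pixel standard deviation layer to the random forest fit; the sketch below is only an illustration under that assumption, with placeholder array names rather than the authors' implementation.

```python
# Illustrative sketch of inverse-variance weighting of training pixels using
# the ESA-CCI per-pixel AGB standard deviation. Array names and the SD floor
# are assumptions for the example, not the authors' implementation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_sd_weighted_rf(X_train, agb_train, agb_sd_train):
    sd = np.maximum(agb_sd_train, 1.0)    # floor to avoid extreme weights
    weights = 1.0 / sd ** 2                # inverse-variance weights
    weights /= weights.mean()              # rescale for readability only
    rf = RandomForestRegressor(n_estimators=300, n_jobs=-1, random_state=0)
    rf.fit(X_train, agb_train, sample_weight=weights)
    return rf
```

The same standard-deviation layer could also be used to stratify the validation, e.g. reporting metrics separately for low- and high-uncertainty CCI pixels.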
4. Uncertainty representation:
The uncertainty metric provided (variation across RF model variants) does not capture the primary sources of error, such as input bias, temporal extrapolation, or radar saturation. Users will require a more realistic and transparent quantification of uncertainty. I recommend including an error budget table, distinguishing:
- uncertainties inherited from the training data (ESA-CCI per-pixel standard deviation layers, which could be incorporated as weights in the learning process),
- model uncertainty, for example, estimated using quantile random forests to provide prediction intervals (see the sketch after this list),
- uncertainties associated with spatial extrapolation across climates and biomes, ideally assessed with an area of applicability analysis to flag where predictions fall outside the training domain (see paper https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13650).
Presenting both training data uncertainties and model-derived prediction intervals would allow users to interpret the dataset more effectively, identifying where it is robust and where it should be treated with caution.
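A full quantile regression forest would need a dedicated implementation, but the per-tree spread of a fitted scikit-learn random forest can serve as a rough, illustrative stand-in for prediction intervals; the function below is a sketch under that assumption and is not the authors' uncertainty method.

```python
# Rough, illustrative per-pixel prediction interval from the spread of the
# individual trees of an already-fitted scikit-learn RandomForestRegressor.
# This approximates, but is not equivalent to, a quantile regression forest.
import numpy as np

def tree_spread_interval(fitted_rf, X, lower=5.0, upper=95.0):
    """Return (lower, median, upper) percentiles over per-tree predictions."""
    per_tree = np.stack([tree.predict(X) for tree in fitted_rf.estimators_])
    return (np.percentile(per_tree, lower, axis=0),
            np.percentile(per_tree, 50.0, axis=0),
            np.percentile(per_tree, upper, axis=0))
```

Mapping the width of such intervals alongside the training-data uncertainty would give users a first-order picture of where the product is robust and where it is not.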
5. Comparisons with other datasets:
The comparison with existing products is selective and somewhat superficial. For example, the substantial difference between Besnard et al. (2021) (248 PgC vs. 378 PgC) warrants further in-depth analysis. Are discrepancies due to mask definitions, sensor physics, or model setup? A regional, biome-stratified comparison would strengthen the credibility of the dataset. Moreover, Besnard et al. (2021) should not be treated as a dataset. The appropriate benchmark is Santoro et al. (2022, RSE, 279:113114), which provides a global AGB product derived from C-band scatterometer data and validated against inventory plots. This dataset should replace Besnard et al. in the comparison, and side-by-side maps or regional time series would make the comparison much more meaningful.
Minor comments:
L50: The latest ESA-CCI biomass release (version 6) now includes the years 2007, 2010, and 2015–2022. Please update these numbers accordingly.
L86: Please clarify how outlier pixels were removed using standard deviation thresholds. Was this masking applied consistently at the 8 km grid? If so, how can results be meaningfully compared to 25 km products, given that the effective forest mask may differ?
Figures: All figures should report uncertainty ranges (e.g., confidence intervals, standard deviations, prediction intervals), not just central values.
Concluding remarks:
The dataset is potentially valuable, but the manuscript currently oversells its resolution and robustness. The absence of independent validation and the insufficient treatment of radar limitations undermine confidence in its utility. Substantial revisions are necessary to enhance uncertainty quantification, clarify the product's scope, and provide a more balanced interpretation of the results.
Citation: https://doi.org/10.5194/essd-2025-330-RC3
- RC4: 'Comment on essd-2025-330', Anonymous Referee #4, 06 Oct 2025
This study develops a global aboveground biomass (AGC) dataset for 1993–2020 by integrating radar backscatter data, tree cover, tree density, and climatic variables using a random forest model. Overall, the manuscript is well structured, with clear writing and an informative discussion. The random forest model used to predict biomass performs well according to the reported validation results. The newly developed dataset reveals a gradual global increase in AGC, showing good consistency with existing datasets in both temporal trends and spatial patterns, and captures changes following extreme events. This dataset provides valuable support for improving our understanding of global carbon dynamics. However, I have several concerns regarding the methodological description, model validation, uncertainty quantification, and statistical analyses. My major and minor comments are summarized below for clarity.
Major comments:
- As this is a data description paper, the reliability of the dataset is of primary importance. The abstract should include key accuracy metrics (e.g., R², RMSE) of the AGC estimates.
- Some numbers in the abstract do not match those in the main text. For example, the abstract states that global AGC increased by 1.18 Pg C between 1993 and 2020, but this figure does not appear in the main text or supplementary materials. Similarly, the abstract reports AGC increases of 0.4 Pg C and 0.5 Pg C for temperate and boreal forests, respectively; yet in the main text, forest AGC in Eurasia increased by 0.4 Pg C and boreal forests by <0.1 Pg C.
- The radar backscatter data from Tao et al. (2023) have a native resolution of 8.9 km. Why were they resampled to 8 km (or 0.083°)? Please specify the resampling method (e.g., nearest neighbor, bilinear interpolation) and discuss whether resampling could introduce systematic biases that affect AGC estimates.
- The preprocessing section mentions applying a 12-month moving average to smooth VOD data. Please justify this choice and test how different window sizes (e.g., 3, 6, 9, or 12 months) affect annual mean VOD and AGC estimates; a sketch of such a window test is given after the major comments. A 12-month window may overly smooth seasonal variations, especially in deciduous forests.
- The study uses the Hansen et al. (2013) tree cover map in 2000 as a static predictor in the random forest model, yet tree cover is not constant from 1993–2020. This assumption likely introduces bias in regions with substantial forest change.
- The manuscript mentions using global ecological zone and ESA CCI land-cover data to generate land-cover inputs, but does not indicate which year(s) or version(s) were used. Please clarify whether this dataset represents a specific year or a multi-year composite.
- The manuscript employs a leave-one-year-out cross-validation (CV) approach to train and test the machine learning models. Please clarify whether all global vegetated pixels were used for model training.
- The uncertainty assessment currently measures variability between candidate models rather than the inherent uncertainty of the selected optimal model (Rad_Tc_Td_Clim). This approach may not fully capture the inherent uncertainty of the applied model, potentially leading to an overestimation of dataset reliability. Please revise the uncertainty assessment to reflect prediction errors of the optimal model, such as using cross-validation residuals, bootstrap sampling, or input-propagation analysis. Additionally, update the spatial uncertainty map (to show regional differences in uncertainty).
- Random forest performance depends strongly on hyperparameters (e.g., number of trees, max depth, max features, minimum samples per leaf, bootstrap setting). The paper identifies the optimal model (Rad_Tc_Td_Clim) but does not report these parameters or tuning strategy (grid search, random search, Bayesian optimization).
- L201–205 report AGC gain and loss rates of +0.73 and −0.74 Pg C yr⁻¹, which differ from the values shown in Figures 2C and 2D. Please verify and ensure that all figures and text values are consistent, with clear units and time periods.
- The manuscript alternately refers to the 8-km resolution AGC dataset as “high-resolution”, “fine-resolution”, and “moderate-resolution”. For example, the title describes it as “high-resolution,” whereas the abstract refers to it as “moderate-resolution.” Please standardize the terminology throughout the manuscript.
- The study uses 1,186 tree-mortality events (1995–2018) from the International Tree Mortality Network to validate biomass decline detection. However, no spatial distribution map or matching description is provided. Please add a figure showing these events.
- L310 attributes AGC gains to “forest area expansion and improved management,” but Section 2.1.4 specifies that tree cover (Tc) is static at the 2000 baseline. If tree-cover change was not part of the model input, this attribution cannot be directly supported. Please clarify the evidence source or rephrase this statement more cautiously.
- When describing global AGC trends (e.g., Figure 3), it appears that a spatially averaged time series was used, but it is unclear whether this average was area-weighted. If not, the global trend may be biased toward regions with more grid cells (e.g., high latitudes).
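For the area-weighting point above, a cosine-latitude weighted mean on a regular lat/lon grid is the standard remedy; the sketch below is illustrative only, assuming a 2-D AGC density array with NaN over ocean and no-data cells.

```python
# Illustrative cosine-latitude area weighting for a regular lat/lon grid,
# assuming `agc` is a 2-D (lat x lon) array of AGC density with NaN over
# ocean / no-data cells and `lats` is the 1-D latitude vector in degrees.
import numpy as np

def area_weighted_mean(agc, lats):
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(agc)
    weights = np.where(np.isnan(agc), 0.0, weights)   # ignore no-data cells
    return np.nansum(agc * weights) / weights.sum()
```

An analogous weighting (grid-cell area times AGC density) is needed when summing densities to global stocks in Pg C, otherwise high-latitude rows are over-represented.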
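And for the moving-average comment above, the window sensitivity could be screened with something as simple as the following sketch, assuming a monthly VOD series held in a pandas Series with a DatetimeIndex; the variable name and window set are assumptions, not the authors' processing chain.

```python
# Sketch of the suggested window-length sensitivity test: annual means of a
# monthly VOD series after centred moving averages of different lengths.
# `vod_monthly` is assumed to be a pandas Series with a monthly DatetimeIndex.
import pandas as pd

def annual_means_by_window(vod_monthly, windows=(3, 6, 9, 12)):
    out = {}
    for w in windows:
        smoothed = vod_monthly.rolling(window=w, center=True, min_periods=1).mean()
        out[f"{w}-month"] = smoothed.resample("YS").mean()
    return pd.DataFrame(out)
```

Differences between the 3-month and 12-month columns would indicate how much short-lived loss signal the 12-month smoothing removes.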
Minor comments:
L147–148: Check the format of Equation (2).
L161: “vegatation” should be corrected to “vegetation.”
L200: Please verify whether ‘ca.’ is used correctly or is a formatting issue.
L282, L285: Correct “Figure S??” citations; the referenced figure or table supporting these statements appears to be missing.
L359: Remove redundant punctuation “.)”.
Figure 1B: The unit of CV (Coefficient of Variation) in the legend is inconsistent with the formula in Equation (2), please verify whether CV is reported as a decimal or a percentage, and ensure it is consistent with the output of Equation (2).
Figure 1C: Region name clarification needed: (1) Does “Tropical Forest (American)” refer specifically to South America? Please ensure consistency with the statement in L188. (2) Does “Temperate Forest (American)” cover only North America, or the entire American continent (including South American temperate forests)?
Citation: https://doi.org/10.5194/essd-2025-330-RC4
Data sets
A High-Resolution, Long-Term Global Radar-Based Above-Ground Biomass Dataset from 1993 to 2020 Guohua Liu et al. https://doi.org/10.5281/zenodo.15735548
The authors present a dataset of aboveground carbon density (AGC) derived from a long record of satellite scatterometers and auxiliary datasets. The data are provided at a pixel size of approximately 8 km. From these data, several trends are discussed, including fluctuations due to major El Niño events that occurred within the time interval of the acquisitions. The investigation tries to provide new evidence on the carbon cycle since 1992. After reading this manuscript, I have severe doubts concerning the validity of the dataset and find the interpretation of the results very qualitative. Some of the interpretations are, furthermore, either misleading or ambiguous. In its current form, the manuscript is not suitable for publication. It is suggested that the study be revised first. Personally, I encourage the authors to resubmit after having addressed the issues pointed out in this review. Below, the authors can find a list of general comments followed by more line-by-line specific comments.
Specific comments.
L80. What does “covering most global land areas” mean?
L81. The data were regridded to a pixel size of 8 km (not resolution). Make sure that throughout the paper you do not refer to resolution but grid cell size or pixel size. What was the purpose of regridding?
L83. Please revise references as they mostly looked at passive microwave data.
L86. Please be more specific about the detection of outliers.
L88. What is the pixel size of the map by Tootchi et al.?
L89. Please explain the rationale according to which pixels with a given fraction of peatland were masked out. Does this include pixels with trees growing on peatland?
L90. What is your definition of “vegetation”?
L90. Is Harper the correct reference for the CCI land cover map?
L102. Strictly speaking the mean temperature cannot be derived from the min and max values. Please rephrase.
L113. Replace “temporate” with “temperate”.
L114. Before the pixel size was expressed in km, Here in degree. Please use one single unit throughout the paper.
L114. Please explain how you could average a map that consists of a classification.
L120. What is the pixel size of the tree mortality dataset (if applicable)? Could you also specify a range of extent of these events? Are they much larger than the pixel size of the radar data?
L129. Can you be more specific on selecting 3 years (which?) to train the model and 1 (which?) to test. If the training set includes data in the test set, the validation is not independent because the backscatter data was already touched during training.
L143. I suppose that you are predicting annual AGB rather than AGB changes.
L153. Why not call C_sink Delta_C instead? I have difficulty understanding the meaning of a negative C_sink value.
L158-167. Please provide some illustrative results that show the agreement between predicted and reference AGB.
Figure 1. What is the reference for the biomes? Are the uncertainties negligible so that they are not shown in panel (c)?
Figure 2. Panel (B). I assume that the loss of stocks and the recovery in 2014-2018 is associated with the El Niño event. Do you have any means to confirm the magnitude? Could it be that the fluctuation was mostly driven by moisture and the overall magnitude of the loss/gain cycle is smaller?
L272, 278 and 279. Since no definition of the biomass variable mapped was provided, it is confusing to read that AGCD was first attributed to forests, then to global vegetation area and finally to woody vegetation.
L275-276. The explanation makes no sense. Radars and radiometers do not sense anything below the first few centimetres of the ground surface (unless the surface is extremely dry). As such, neither can sense “under-ground” biomass (whatever “under-ground” biomass means).
L278. What is with forests growing on peatlands? Are they included or excluded?
L281 and 285. References to supplementary figures were not updated upon submission.
Table 2. What kind of processing was applied to account for different definitions of biomass in each of the studies?
L310. I suppose that the author means an increase of the AGC stock.
L349, 356 and 406. What does “radar scatter data” mean?
L358-364. The authors provide three explanations for why the data product in 1997 differs from adjacent years. I am not convinced that the explanations hold true. First, it is unclear why “sensor anomalies” and “anomalous radar signals” (what are these actually?) would lead to a severe drop of the estimates. They could well lead to the opposite. Second, I understand that reference data for 1997 probably do not exist and so the result for this particular year cannot be validated. However, is it plausible that the El Niño event in the same year has an immediate effect on the biomass estimates given that the biomass was predicted from an average value of all ERS observations taken in 1997?
L364. Why only in Africa? And why just 1997 if the explanation is the decay of quality of the radar signal?
L366. I would be careful with this interpretation. The dataset by Besnard et al is based on the same type of the data used to predict biomass in this study but the differences between the datasets are large. Besnard et al. and Santoro et al. were validated with reference data at multiple scales. The dataset proposed here has not been validated so that this interpretation needs thorough revision. An in-depth comparison of datasets is suggested.
L373. A single reference to a paper published in 1989 is insufficient to prove that traditional approaches have been widely used.
L375-377. This interpretation can be applied to any RS-based map. As such, it is not unique to this study.
L408. This refers to non-forest ecosystems, which have not been discussed elsewhere in the paper.