the Creative Commons Attribution 4.0 License.
FineKarstAGB: A 30 m resolution aboveground biomass dataset for Southwest China derived by upscaling plot-level inventory using sub-meter GaoFen satellite data
Abstract. Southwest China has emerged as a key global carbon stock due to widespread forest expansion and aboveground biomass (AGB) increases driven by major ecological restoration since 2000, making accurate AGB estimation vital for assessing restoration efficacy. However, existing global and national-scale AGB products exhibit substantial limitations in this region, correlating only weakly with National Forest Inventory (NFI) plot and UAV LiDAR data. This is likely related to the pronounced spatial heterogeneity induced by karst landscapes and large-scale restoration efforts, which exacerbates mixed-pixel effects. To address these challenges, this study proposes a Canopy Structure-driven Multi-feature Fusion Network (CSMF-Net) designed for high-precision AGB estimation in complex regions. The method takes NFI plot data as ground truth and integrates GaoFen imagery, horizontal structure derived from tree crown segmentation, and vertical structure represented by canopy height data. Based on this approach, we generated a fine-grained 30 m AGB dataset (FineKarstAGB) covering four provinces in Southwest China (Yunnan, Guizhou, Guangxi, and Hunan). Accuracy assessment against independent NFI plot data demonstrated robust model performance (r = 0.83, RMSE = 28.51 Mg/ha), with no evidence of saturation in high-biomass regions. Furthermore, a structural consistency assessment using an independent UAV LiDAR-derived Canopy Height Model (CHM) confirmed that FineKarstAGB maintains high ecological consistency with the true forest vertical structure (R² = 0.54). By contrast, other public datasets correlate weakly with both NFI (r < 0.4) and LiDAR data (R² < 0.1). Owing to tree-level segmentation, our dataset also quantifies AGB contributions from sparse trees outside forests, enabling more comprehensive and spatially explicit carbon accounting.
This dataset provides critical support for regional carbon cycle assessments, fine-scale evaluations of ecological restoration outcomes, and progress toward national carbon neutrality targets. The dataset is available at https://doi.org/10.57760/sciencedb.33452 (Li et al., 2026).
Status: final response (author comments only)
- CC1: 'Comment on essd-2026-102', Zhiyu Zhang, 11 May 2026
- RC1: 'Comment on essd-2026-102', Anonymous Referee #1, 12 May 2026
The authors propose a Canopy Structure-driven Multi-feature Fusion Network (CSMF-Net) to produce a 30 m resolution aboveground biomass dataset for Southwest China by upscaling plot-level inventory using sub-meter GaoFen satellite data. Based on independent NFI plot data and UAV LiDAR-derived CHM data, they show that the new regional dataset outperforms existing global and national-scale AGB products and can therefore provide critical support for regional studies.
While the proposed CSMF-Net seems appealing in fusing spectral information with the horizontal and vertical structural characteristics of forests to achieve high-precision AGB estimation, the major issue lies in the temporal gap between the AGB labels and the remote sensing features. The NFI plot data were collected in 2014-2018, whereas the GaoFen imagery was collected in 2023-2024. I do not think that the filtering procedure based on a random forest model (lines 180-185) can mitigate this issue, because the samples used to train and validate the random forest model were themselves subject to uncertainty and lacked temporal information.
The authors use R² and r inconsistently to evaluate agreement. I suggest using the same measure consistently throughout the paper.
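To make the distinction concrete, here is a minimal sketch with hypothetical plot values (not taken from the manuscript). Pearson r measures only linear association, whereas R² computed against the 1:1 line (1 - SS_res / SS_tot) also penalizes systematic bias. For an ordinary least-squares regression of observed on predicted values with an intercept, R² equals r², which is one reason mixing the two metrics across comparisons can be misleading.

```python
import numpy as np

def pearson_r(obs, pred):
    """Pearson correlation between observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.corrcoef(obs, pred)[0, 1])

def r_squared_1to1(obs, pred):
    """Coefficient of determination against the 1:1 line: 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical plot AGB values (Mg/ha), for illustration only
obs = np.array([40.0, 80.0, 120.0, 160.0, 200.0])
pred_biased = 0.5 * obs + 10.0  # perfectly linear in obs, but systematically low

print(pearson_r(obs, pred_biased))       # r is (numerically) 1
print(pearson_r(obs, pred_biased) ** 2)  # r^2, what a regression of obs on pred would report
print(r_squared_1to1(obs, pred_biased))  # negative: the bias is penalized
```

A product can therefore show a high r while its 1:1 R² is poor, so the two numbers are not interchangeable.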
The tree crown segmentation dataset is an important input to the model. How accurately were the trees delineated? It is necessary to report accuracy measures such as F1 score, recall, and precision, and to describe how the accuracy assessment was performed. What does R² (0.82) refer to? It is not a commonly used metric for individual tree segmentation.
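As an illustration of the metrics requested here, a common way to score individual tree segmentation is to match predicted crowns one-to-one to reference crowns by intersection over union (IoU) and then count true positives, false positives, and false negatives. The sketch below assumes axis-aligned bounding boxes in place of crown polygons and an IoU threshold of 0.5; both are simplifying assumptions, not the authors' protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), a stand-in for crown polygons

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_crowns(pred: List[Box], ref: List[Box], thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching by descending IoU; returns (tp, fp, fn)."""
    pairs = sorted(
        ((iou(p, r), i, j) for i, p in enumerate(pred) for j, r in enumerate(ref)),
        reverse=True,
    )
    used_pred, used_ref = set(), set()
    tp = 0
    for score, i, j in pairs:
        if score < thr:
            break  # remaining pairs overlap too little to count as matches
        if i in used_pred or j in used_ref:
            continue  # each crown may be matched at most once
        used_pred.add(i)
        used_ref.add(j)
        tp += 1
    return tp, len(pred) - tp, len(ref) - tp

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical crowns: two detections match references, one is spurious, one reference is missed
ref = [(0, 0, 2, 2), (5, 5, 7, 7), (10, 10, 12, 12)]
pred = [(0, 0, 2, 2), (5.2, 5.2, 7.2, 7.2), (20, 20, 21, 21)]
tp, fp, fn = match_crowns(pred, ref)
print(tp, fp, fn, precision_recall_f1(tp, fp, fn))
```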
Section 2.2.3: Does the UAV LiDAR data cover the same area as the canopy height dataset, or only part of it? Please provide more details about the canopy height dataset, such as collection time, point density, location, and the spatial resolution of the CHM.
Section 2.3.2: Consider adding a map showing the coverage of the UAV LiDAR data. When were the UAV LiDAR data collected? What is the spatial resolution of the UAV LiDAR-derived CHM?
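Why the CHM resolution matters can be seen from how a CHM is typically gridded: each cell takes the highest height-normalized return that falls inside it, so the cell size controls how much crown detail survives. The sketch below assumes points are already normalized to height above ground and uses a hypothetical grid origin and resolution; it is not the processing chain used in the manuscript.

```python
import numpy as np

def rasterize_chm(x, y, z, x0, y0, res, ncols, nrows):
    """Maximum normalized height per grid cell; cells without returns are NaN.

    (x0, y0) is the upper-left corner of the grid, res the cell size in map units.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    col = np.floor((x - x0) / res).astype(int)
    row = np.floor((y0 - y) / res).astype(int)  # row 0 is the top (northern) edge
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    chm = np.full((nrows, ncols), -np.inf)
    # Unbuffered in-place maximum, so several points in one cell keep the highest value
    np.maximum.at(chm, (row[inside], col[inside]), z[inside])
    chm[np.isinf(chm)] = np.nan
    return chm

# Hypothetical normalized points on a 2 x 2 grid with 1 m cells; the last point lies outside the grid
xs = [0.5, 0.6, 1.5, 3.0]
ys = [1.5, 1.4, 0.5, 1.0]
zs = [10.0, 12.0, 8.0, 50.0]
chm = rasterize_chm(xs, ys, zs, x0=0.0, y0=2.0, res=1.0, ncols=2, nrows=2)
print(chm)
```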
The spatial resolution of the ESA CCI product differs from that of the product in this study. How did the authors address the scale mismatch between ESA CCI pixels and NFI plots?
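One way to reduce this kind of scale mismatch is to aggregate the finer map to the coarser product's grid before comparing either of them with plots. The sketch below averages k x k blocks of a 30 m grid and ignores NaN cells; it assumes an integer aggregation factor k, whereas the real ESA CCI grid is not necessarily an integer multiple of 30 m and would then need proper (for example area-weighted) resampling.

```python
import numpy as np

def block_mean(agb, k):
    """Aggregate a fine AGB grid to a coarser grid by averaging k x k blocks (NaN-aware).

    Rows and columns that do not fill a complete block are dropped.
    """
    agb = np.asarray(agb, dtype=float)
    nrows, ncols = agb.shape
    nr, nc = nrows // k, ncols // k
    trimmed = agb[: nr * k, : nc * k]
    # Element [i, a, j, b] of the reshaped array is trimmed[i * k + a, j * k + b]
    blocks = trimmed.reshape(nr, k, nc, k)
    return np.nanmean(blocks, axis=(1, 3))

# Hypothetical 4 x 4 grid of 30 m AGB values aggregated with k = 2 (60 m cells)
fine = np.arange(16, dtype=float).reshape(4, 4)
print(block_mean(fine, 2))
```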
According to Table 1, there is a significant temporal gap between the AGB datasets and the NFI plots. How reliably can the following comparison evaluate the performance of the different products?
Fig. 5: It is difficult to judge whether the differences between products should be attributed to model performance or to the temporal gap.
Fig. 6: The low performance of the other AGB datasets could be caused by their temporal mismatch with the NFI plots.
Line 263: How the uncertainty was predicted should be explained. I could not find the uncertainty map at the given link.
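The manuscript does not say how its uncertainty was predicted. Purely as an example of the kind of description that would help, one widely used approach is a bootstrap ensemble: refit the model on resampled training plots and report the spread of the ensemble predictions per pixel. The sketch below uses a simple linear fit on hypothetical canopy height and AGB values as a stand-in for CSMF-Net; it is an assumption for illustration, not the authors' method.

```python
import numpy as np

def bootstrap_uncertainty(x_train, y_train, x_new, n_boot=200, seed=0):
    """Mean and standard deviation of predictions from a bootstrap ensemble of linear fits."""
    rng = np.random.default_rng(seed)
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = x_train.size
    preds = np.empty((n_boot, x_new.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample training plots with replacement
        slope, intercept = np.polyfit(x_train[idx], y_train[idx], deg=1)
        preds[b] = slope * x_new + intercept
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

# Hypothetical training data: canopy height (m) and plot AGB (Mg/ha)
rng_demo = np.random.default_rng(1)
height = np.linspace(0.0, 10.0, 50)
agb = 12.0 * height + rng_demo.normal(0.0, 10.0, height.size)

# Predict inside the training range (5 m) and far outside it (20 m)
mean, sd = bootstrap_uncertainty(height, agb, np.array([5.0, 20.0]))
print(mean, sd)
```

The spread grows for inputs outside the range covered by training plots, which is the behavior an uncertainty map would be expected to reveal.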
Citation: https://doi.org/10.5194/essd-2026-102-RC1
- CC2: 'Comment on essd-2026-102', Tao Yu, 12 May 2026
Overall, this study develops a novel canopy structure-driven multi-feature fusion network and produces a 30 m resolution aboveground biomass dataset for the karst region of Southwest China. The research topic is highly valuable for regional carbon cycle assessment, ecological restoration evaluation and carbon neutrality targets. The data foundation is solid, the methodological framework is innovative, the accuracy evaluation system is comprehensive, and the results are reliable with clear scientific and application significance. The manuscript is well-structured and merits acceptance after moderate revision.
There are significant differences in spatial distribution between the results of this study and those of other products (Figure 3). What are the reasons for this?
Figure 7(a): There are few AGB values below 50. Is this reasonable?
Citation: https://doi.org/10.5194/essd-2026-102-CC2
- RC2: 'Comment on essd-2026-102', Anonymous Referee #2, 23 May 2026
This study generated an aboveground biomass dataset for Southwest China using high-resolution satellite data. It aims to model biomass from direct biophysical parameters, such as height and crown size. This should be more reliable than datasets based on environmental variables, because intense human activity in this region could affect the statistical relationship between AGB and environmental factors. I agree with the authors on this principle.
Major concerns:
- Study area: The four provinces do not match the typical definition of Southwest China. To align with the commonly used definition, the area should include Sichuan, Chongqing, and Tibet.
- Influence of plot selection: The authors used remote sensing indices (NDVI, crown coverage, and canopy height) and an RF model to exclude plots that deviate strongly from the model predictions. The remaining plots were then used for formal model training and validation. This approach would produce a biased, overestimated accuracy, because the sample selection already used the same information as the model construction. A more scientific approach is to identify the actual causes of the problematic plots and to use a data source independent of the model inputs for plot selection.
- Uncertainty of crown segmentation: The authors mention that the minimum size of a detected crown is one pixel, but how reliable is this? How is the problem of overlapping crowns in dense forests handled?
- Figure 5c: This figure does not show the distribution of NFI AGB within each bin well, so the agreement between predicted AGB and NFI AGB cannot be judged. A violin plot would be better.
- The contributions of the three inputs should be assessed quantitatively.
- Topographic effects should be assessed, since the many hills in this region affect remote sensing observations. I would like to know how the accuracy changes with slope.
- Data problems: I found problems after checking the dataset; please see the attached PDF.
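The plot-selection concern can be illustrated with synthetic data: if plots are screened by their residuals from a model built on the same predictors, the accuracy computed on the retained plots rises even though the model itself has not improved. The sketch below uses a simple linear fit on hypothetical canopy height and AGB values in place of the authors' random forest, and drops the 20% of plots farthest from the prediction; the numbers are illustrative only.

```python
import numpy as np

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(42)
n = 500
height = rng.uniform(2.0, 30.0, n)              # hypothetical canopy height (m)
agb = 6.0 * height + rng.normal(0.0, 40.0, n)   # hypothetical plot AGB (Mg/ha) with noise

# Model built from the same predictor that is later used for screening
slope, intercept = np.polyfit(height, agb, deg=1)
pred = slope * height + intercept
resid = np.abs(agb - pred)

# Screening step: keep only plots within the 80th percentile of absolute residuals
keep = resid <= np.quantile(resid, 0.8)

r2_all = r2(agb, pred)
r2_screened = r2(agb[keep], pred[keep])
print(f"all plots: R2 = {r2_all:.2f}; screened plots: R2 = {r2_screened:.2f}")
```

Because the screened accuracy mainly reflects which plots were removed, an independent screening criterion, as the referee suggests, avoids this inflation.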
- RC3: 'Comment on essd-2026-102', Wenjuan Shen, 26 May 2026
This study used CSMF-Net to combine NFI, GF, tree crown segmentation, and canopy height data to map 30 m AGB in Southwest China. The model was then validated, and the results were compared with existing AGB datasets.
The overall accuracy of the existing tree canopy segmentation dataset is 0.82. However, when the corresponding method is applied to other regions, its accuracy still needs to be reported. The article does not mention this.
When describing the use of GF data in the text, it is necessary to clearly specify which series of data is being referred to.
Section 2.2.3, “the errors are substantially lower than those reported for previously published canopy height products (r ≤ 0.22)”: Which canopy height products exactly are meant? There appears to be more than one. Please clearly state the accuracy value of each product used in the comparison.
Lines 179-181: How can these spectral and structural variables (mean NDVI, crown coverage, and canopy height) be combined with AGB to reduce errors?
Fig. 3: The first part of the workflow, the canopy segmentation data, was adopted from another algorithm. It is necessary to clearly distinguish which parts are original and which are existing algorithms.
Section 3.1: The last sentence already clearly demonstrates the advantages of the CSMF-Net model in AGB prediction. Therefore, there is no need for further verification regarding the purpose of this study. Overall, the research approach is clear and shows a certain degree of innovation. However, many parts of the text are not clearly explained, which affects the reliability of the results.
How many NFI survey plots and how much LiDAR data were used? Their locations should be displayed or marked on a map. Some schematic diagrams are shown in Figure 3, but were only the data at these locations actually used? The text needs to clarify this and provide a spatial display.
What is the specific coverage of the GF data? The article mentions some numbers for the data, but are the data distributed across the whole region? How are areas without data coverage handled?
The text compares the AGB data from this study with many existing biomass datasets. Attention should be paid to whether the definitions of forest in the different datasets are consistent. If the definitions differ greatly, comparing AGB values with them would be of little significance, as differences in definition and spatial distribution would lead to large discrepancies in the results. It is necessary to check whether the forest definitions of all datasets are consistent.
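A simple way to act on this comment is to restrict every comparison to pixels that satisfy one shared forest definition, so that products are compared on the same land. The sketch below uses thresholds loosely modeled on the FAO forest definition (trees taller than 5 m and canopy cover above 10%) and ignores its minimum-area criterion; the thresholds and inputs are assumptions for illustration, not the definition used by any of the compared products.

```python
import numpy as np

def common_forest_mask(height_m, cover_frac, min_height=5.0, min_cover=0.10):
    """Pixels meeting a shared forest definition (thresholds are assumptions)."""
    return (np.asarray(height_m, dtype=float) > min_height) & (
        np.asarray(cover_frac, dtype=float) > min_cover
    )

def compare_within_mask(agb_a, agb_b, mask):
    """Mean difference (a - b) and pixel count over masked pixels valid in both products."""
    a = np.asarray(agb_a, dtype=float)[mask]
    b = np.asarray(agb_b, dtype=float)[mask]
    valid = np.isfinite(a) & np.isfinite(b)
    if not valid.any():
        return float("nan"), 0
    diff = a[valid] - b[valid]
    return float(diff.mean()), int(valid.sum())

# Hypothetical pixels: canopy height (m), canopy cover fraction, and AGB (Mg/ha) from two products
height = np.array([3.0, 6.0, 10.0, 12.0])
cover = np.array([0.5, 0.05, 0.3, 0.8])
agb_a = np.array([10.0, 100.0, 150.0, 200.0])
agb_b = np.array([5.0, 80.0, 140.0, np.nan])

mask = common_forest_mask(height, cover)
print(mask, compare_within_mask(agb_a, agb_b, mask))
```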
Please clearly state the year to which the AGB data produced in this study refer.
The discussion section is not focused enough. The first and second parts both compare the results with other AGB data, but no clear discussion points are identified. For instance, the algorithms of these products differ, and so do their forest definitions. It is necessary to identify the points on which they are comparable.
Section 5.3.3: This section reads like a summary and outlook, and it still needs a closer connection with the validation data.
Citation: https://doi.org/10.5194/essd-2026-102-RC3
Data sets
FineKarstAGB: A Fine-Grained, High-Resolution Dataset of Aboveground Biomass in Southwest China Yixiang Li, Yongqing Bai, and Zhengchao Chen https://doi.org/10.57760/sciencedb.33452
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 406 | 282 | 39 | 727 | 440 | 27 | 37 |
This is innovative work. The study adopted 0.8 m sub-meter Gaofen-2/7 satellite imagery to characterize the highly fragmented and spatially heterogeneous karst landscapes of Southwest China, effectively resolving the mixed-pixel limitation of conventional medium-resolution remote sensing data. The work produced fine-scale individual tree crown segmentation and high-precision canopy height products from sub-meter GaoFen data. By integrating spectral information, horizontal crown structure, and vertical canopy height, the proposed CSMF-Net deep learning model substantially reduced biomass underestimation and saturation in dense forests. The FineKarstAGB dataset accurately quantified the biomass of scattered trees in agroforestry mosaics, rural landscapes, and urban areas, which existing coarse-resolution products ignore.
However, the most obvious limitation of this study is the severe temporal mismatch between the reference data and the remote sensing observations. The field NFI plots were collected during 2014–2018, while the GaoFen imagery and canopy height data were acquired around 2023–2024, creating a time gap of up to about 10 years. Southwest China’s karst area is dominated by young, fast-growing forests under long-term ecological restoration, so stand structure and AGB have changed rapidly over this period. As a result, the historical plot biomass values cannot reflect actual forest conditions in 2024, which inevitably introduces systematic bias and a risk of underestimation into AGB modelling. Although the authors attempted to filter inconsistent samples using high-resolution image features, this strategy can only reduce, not eliminate, the temporal error. The study did not further quantify how much biomass was underestimated over this period, making it difficult to fully evaluate the magnitude of the uncertainty caused by the time inconsistency.
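A first-order estimate of the underestimation described here could come from projecting plot AGB forward to the image year. The sketch below assumes a constant annual relative growth rate (3% per year is a hypothetical value, not one reported in the study) and compound growth, which is a crude simplification since real growth depends on stand age, species, and disturbance.

```python
def projected_agb(agb_survey, survey_year, target_year, growth_rate):
    """Plot AGB projected to target_year, assuming constant compound annual growth."""
    dt = target_year - survey_year
    return agb_survey * (1.0 + growth_rate) ** dt

def relative_underestimate(survey_year, target_year, growth_rate):
    """Fraction by which the un-updated survey value falls short of the projected value."""
    return 1.0 - 1.0 / (1.0 + growth_rate) ** (target_year - survey_year)

# Hypothetical 3 %/yr growth, plots surveyed between 2014 and 2018, imagery from 2024
for year in (2014, 2016, 2018):
    print(year, round(relative_underestimate(year, 2024, 0.03), 3))
```

Even under this simple assumption, the shortfall depends strongly on the survey year, which is why plot-level survey dates matter when interpreting the validation.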