the Creative Commons Attribution 4.0 License.
Unveiling China's Forest Soil properties: High-Resolution, Multi-Depth Mapping of Soil Bulk Density and pH Using Machine Learning Methods
Abstract. Precise monitoring of key forest soil properties is crucial for addressing global challenges like carbon sequestration and soil acidification. However, existing national soil maps, primarily derived from comprehensive ecosystem samples, inadequately represent the distinct characteristics and high spatial heterogeneity of China's vast and diverse forest ecosystems. To bridge this gap, we present the first high-resolution (90-m), forest-specific maps of soil bulk density (BD) and pH across China. Leveraging 4,356 forest soil profiles collected through extensive field surveys and 41 environmental covariates within an optimized Quantile Regression Forests (QRF) framework incorporating forward recursive feature selection (FRFS), we generated wall-to-wall predictions for five standardized depth intervals (0–5, 5–15, 15–30, 30–60, 60–100 cm). Model performance, assessed through 10-fold cross-validation (CV) and independent validation (IV), achieved model efficiency coefficients (MEC) ranging from 0.78 to 0.89 (CV) and 0.60 to 0.66 (IV) for bulk density (BD), and from 0.83 to 0.87 (CV) and 0.71 to 0.81 (IV) for pH, indicating the product's strong capability to capture the spatial variability of forest soil properties across China. The 90-m resolution BD and pH maps contribute to the GlobalSoilMap initiative and provide forest-specific inputs for regional Earth system and land surface models. These products advance the quantification of soil acidification processes and provide critical baseline data for estimating forest soil carbon stocks across China. The dataset is available at https://doi.org/10.57760/sciencedb.25375.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-496', Anonymous Referee #1, 17 Nov 2025
- RC2: 'Comment on essd-2025-496', Anonymous Referee #2, 04 Dec 2025
The authors address a significant topic by mapping key soil properties across China's forests using a comprehensive dataset. The resulting high-resolution products have the potential to be a valuable resource for the scientific community. However, several methodological and descriptive aspects require substantial improvement to ensure the reliability and reproducibility of the findings. I recommend a major revision to help the manuscript reach the high standards required for publication. My main comments on the manuscript are as follows:
General comments:
- Resampling covariates with diverse native resolutions to a 90-m grid introduces significant uncertainty, particularly for inputs derived from coarser scales. This issue warrants a detailed discussion and quantification to assess the reliability of the final high-resolution maps.
- The manuscript provides insufficient discussion on the variable importance for BD and pH across different soil layers. The underlying reasons for these variations require further elaboration.
- Natural and planted forests possess distinct driving mechanisms. Developing separate models for each forest type is advisable to accurately capture these specific variations.
- Providing spatial distribution maps for every covariate listed in Table S1 is advisable. The figures need to clearly display the value ranges for continuous variables and the distinct spatial patterns for each category within categorical variables. Specifying the number of sample points in both the training and validation sets for each categorical variable is recommended.
- Forest age represents a critical covariate. Incorporating this variable into the analysis is advisable to improve model performance.
- Providing the original data is necessary to facilitate the reproducibility of the study by other researchers.
- Comparing the current results with existing soil BD and pH products is recommended. The manuscript needs to clarify specific improvements and explain the reasons for these advancements.
- Presenting the spatial distribution of sample points for both the training and validation sets is necessary. The manuscript should also address whether these distributions are spatially balanced.
- A more detailed description of the raw data is necessary. The manuscript should specify the sample sizes and spatial distributions across different temporal periods, soil types, and forest types.
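The per-category sample counts requested above could be tabulated along the following lines. This is only a minimal sketch: the record layout, the field names `forest_type` and `split`, and the example records are hypothetical placeholders, not the authors' data or code.

```python
def counts_by_class(samples, field):
    """Count training and validation samples for each value of one categorical field."""
    table = {}
    for sample in samples:
        # One row per category, with a counter for each split.
        row = table.setdefault(sample[field], {"train": 0, "validation": 0})
        row[sample["split"]] += 1
    return table


# Hypothetical records; a real table would be built from the profile database.
samples = [
    {"forest_type": "natural", "split": "train"},
    {"forest_type": "natural", "split": "train"},
    {"forest_type": "natural", "split": "validation"},
    {"forest_type": "planted", "split": "train"},
    {"forest_type": "planted", "split": "validation"},
    {"forest_type": "planted", "split": "validation"},
]

table = counts_by_class(samples, "forest_type")
for name, row in sorted(table.items()):
    print(f"{name}: train={row['train']}, validation={row['validation']}")
```

The same function can be applied to any categorical covariate (for example soil type or sampling period) by changing the `field` argument, which makes strongly unbalanced categories easy to spot.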
Minor comments:
- Lines 1-3: The general phrase "Soil properties" is redundant with the specific variables "Bulk Density and pH"; a more concise title would be "High-Resolution, Multi-Depth Mapping of Soil Bulk Density and pH in China's Forests Using Machine Learning".
- Lines 20-21: The claim of being 'first' is inaccurate due to the existence of prior 90-m products, so the text should be revised to focus on the specific contribution to forest ecosystems instead.
- Lines 73-74: The phrase "in heterogeneous" is grammatically incorrect.
- Lines 109-111: This sentence is redundant and should be deleted.
- Lines 111-112: Specify the quality control and data harmonization methods.
- Lines 139-155: The 41 environmental covariates lack necessary citations, and the sources or the data itself should be made accessible to readers to ensure reproducibility.
- Lines 159-160: The number of standardized soil layers should be corrected from four to five.
- Lines 222-223: 𝑞50 denotes the median prediction.
- Lines 429-430: The use of "first" is an absolute claim that is prone to dispute.
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 380 | 57 | 25 | 462 | 39 | 23 | 22 |
The manuscript proposes a high-resolution forest-specific mapping approach for predicting soil bulk density and pH across China. It presents a substantial body of work and addresses a topic of interest, which has the potential to contribute to the field. However, in my opinion, the current manuscript requires major revision before it can be considered for publication. Below are my major concerns:
First of all, after reading the Introduction, I wasn’t fully convinced of the necessity and urgency of this study. The Introduction section begins with very basic background information on forest soil, which is too general to establish a compelling rationale. The excessive introduction about methodology doesn’t effectively build a case for the study’s significance, either. For instance, the entire second paragraph is basically saying “a lot of people have done this”, which may justify methodological reliability but not why this work is needed. The fourth paragraph focuses on the historical development of methodologies, which isn’t the main goal of an Introduction. While building a nationwide forest soil profile database is potentially valuable, the current Introduction does not sufficiently highlight how this study advances beyond simply extracting forest-covered data from existing maps.
Similarly, in the Results section, the authors keep emphasizing that their “patterns align with former maps”, which further raises questions about the novelty and importance of this work. Some findings are presented without statistical validation and are therefore unconvincing. For example, L255: “BD prediction accuracy...peaking at intermediate depths (15–30 cm: MEC = 0.657) with lower accuracy in surface layers (0–5cm: MEC = 0.598) and deep layers (60–100 cm: MEC = 0.656)”. Without testing for statistical significance, how can 0.656 represent “lower accuracy” compared to 0.657? Similarly, statements such as “all predictions maintained negligible bias (|ME| ≤ 0.019) across depth intervals” lack a defined threshold for “negligible”. Descriptions like “Conversely, pH predictions demonstrated superior accuracy: CV maintained strong performance across depths” appear subjective, with no definition of “superior” or “strong”. Likewise, many descriptions are excessive and repetitive (e.g., L268-270, L274-279), which obscures the main message.
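One possible way to attach uncertainty to depth-wise MEC values is a percentile bootstrap over the validation samples. The sketch below assumes MEC is defined in the usual way, as 1 minus the ratio of the residual sum of squares to the total sum of squares of the observations. The data are synthetic and the function names are illustrative; this is not the authors' procedure.

```python
import random


def mec(observed, predicted):
    """Model efficiency coefficient: 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def bootstrap_mec_ci(observed, predicted, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for MEC, resampling sample pairs."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        obs = [observed[i] for i in idx]
        pred = [predicted[i] for i in idx]
        if len(set(obs)) < 2:
            continue  # skip degenerate resamples with zero variance
        stats.append(mec(obs, pred))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Synthetic stand-in for one depth interval (values are NOT from the manuscript).
rng = random.Random(42)
observed = [rng.gauss(1.3, 0.2) for _ in range(200)]
predicted = [o + rng.gauss(0.0, 0.1) for o in observed]

point = mec(observed, predicted)
lower, upper = bootstrap_mec_ci(observed, predicted)
print(f"MEC = {point:.3f}, 95% bootstrap CI = [{lower:.3f}, {upper:.3f}]")
```

If the intervals for two depth layers overlap substantially, a difference such as 0.656 versus 0.657 cannot reasonably be described as lower accuracy. Because each depth has its own validation samples, comparing intervals in this way is only a rough check, and a formal test would need to account for that.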
Additionally, abbreviations (including BD, SD, and the model abbreviations) in tables and figures should be clearly defined in their captions to make them self-explanatory. Why is FRFS introduced in the Introduction but QRF only in the Methods? Table 1 might be presented more clearly as a figure, and its current caption is confusing. The statement in L271 that “BD values increase from the coast inland” is unclear. Figure 6 might benefit from an overall analysis across depths; consider also adding relationships between BD and MAP (or other key covariates) to the supplementary materials. L85 and L91: QRF should be explained at its first mention. L111 is redundant with L108. L251: rephrase “conversely”.
Overall, the manuscript is informative and holds value but requires further refinement. The authors are encouraged to emphasize the importance and novelty of their work more clearly, and to trim redundant descriptions in the Results while demonstrating statistical significance. With careful revision, this manuscript has considerable potential to make a meaningful contribution to the field.