This work is distributed under the Creative Commons Attribution 4.0 License.
Soil organic carbon maps and associated uncertainty at 90 m for peninsular Spain
Abstract. Human activities have significantly disrupted the global carbon cycle, leading to increased atmospheric CO2 levels and altering ecosystems' carbon absorption capacities, with soils serving as the largest carbon reservoirs in terrestrial ecosystems. The complexity and variability of soil properties, shaped by long-term transformations, make it crucial to study these properties at various spatial and temporal scales to develop effective climate change mitigation strategies. However, integrating disparate soil databases presents challenges due to the lack of standardized protocols, necessitating collaborative efforts to standardize data collection and processing to improve the reliability of Soil Organic Carbon (SOC) estimates. This issue is particularly relevant in peninsular Spain, where variations in sampling protocols and calculation methods have resulted in significant discrepancies in SOC concentration and stock estimates. This study aimed to improve the understanding of SOC storage and distribution in peninsular Spain by focusing on two specific goals: integrating and standardizing existing soil profile databases, and modeling SOC concentrations (SOCc) and stocks (SOCs) at different depths using an ensemble machine-learning approach. The research produced four high-resolution SOC maps for peninsular Spain, detailing SOCc and SOCs at depths of 0–30 cm, 30–100 cm, and the effective soil depth, along with associated uncertainties. These maps provide valuable data for national soil carbon management and contribute to compiling Spain's National Greenhouse Gas Emissions Inventory Report. Additionally, the findings support global initiatives like the Global Soil Organic Carbon Map, aligning with international efforts to improve soil carbon assessments. The soil organic carbon concentration (g/kg) maps for the 0–30 cm and 30–100 cm standard depths, along with the soil organic carbon stock (tC/ha) maps for the 0–30 cm standard depth and the effective soil depth, including their associated uncertainties, all at a 90 m pixel resolution (SOCM90), are freely available at https://doi.org/10.6073/pasta/48edac6904eb1aff4c1223d970c050b4 (Durante et al., 2024).
Status: open (until 16 Jan 2025)
RC1: 'Comment on essd-2024-431', Anonymous Referee #1, 06 Dec 2024
The manuscript by Durante et al., submitted to ESSD, is an interesting contribution, particularly due to its impressive dataset on SOC concentration and stocks. However, the modelling approach is not robust and requires significant rethinking. There is abundant literature on the mapping of soil properties and spatial model ensembles, yet it is unclear why the authors have disregarded this body of work. I did not review the results section because the mapping and modelling steps lack rigor and do not make sense. The authors should seek help from a digital soil mapping and spatial modelling expert.
Specific comments:
- Inconsistent data points (L. 185): How are the authors classifying a data point within a pedogenetic horizon as "inconsistent"? Please clarify the criteria used.
- SOC data transformation (L. 189-191): On the one hand, the authors apply a log-transformation to the SOC data; on the other, they remove SOC data from organic soils. This is contradictory and lacks a clear rationale. Why were data from organic soils excluded? This is not a common practice, and the reasoning should be explicitly stated. Additionally, if data from organic soils were excluded, does this mean no predictions were made for organic soils? Please confirm, because the figures in the results show predictions for all soils.
- Conversion factor (L. 198-200): The manuscript should specify how many data points were converted using the factor mentioned. Note that this conversion factor has been widely criticized within the scientific community for being overly general.
- Representativeness (Section 2.1.3): The term "representativeness" is poorly defined in the context of this study, and the entire section lacks coherence. Why are the authors using techniques designed for point patterns when the soil data are not a point pattern? The use of Maxent to evaluate the "representativity" of the data is unclear, especially since other models are used later in the study. What exactly are the authors trying to achieve with this analysis? There have been studies looking at the area of applicability of spatial models (a minimal sketch follows this comment).
- Data input for ML (L. 293-294): The described step seems outdated, as most modern machine learning (ML) techniques can handle both categorical and continuous datasets as input without requiring separate preprocessing.
- Bayesian analysis: Bayesian analysis and Bayesian calibration are techniques for updating parameter distributions and fitting models, not models themselves. Which specific model was used in the Bayesian analysis? This should be explicitly stated. The three techniques for variable selection could be removed and merged with the modelling step, because the optimal variable set depends on the model.
- Model selection and ensemble approach: The modelling approach is unclear. The authors used three models (QRF, EML, and AutoML) combined into an ensemble. However, one of these (QRF) is itself an ensemble of regression trees (a random forest). How was the ensemble constructed? Additionally, the validation step using cross-validation should be applied consistently across all three models. For the final prediction, was the ensemble constructed from all models, or was it fitted to all available data points? Please clarify.
- Uncertainty estimates (L. 406): Some models, such as QRF, return prediction intervals, while others, such as EML, likely return confidence intervals. What uncertainty measures are reported for each model? Additionally, how was the standard deviation derived from the WRF distribution? More details are needed here (see the forest-based interval sketch after this comment).
- Ensemble uncertainty (L. 408-411): The method proposed for handling uncertainty is statistically flawed. Selecting the pixel with the lowest standard deviation from different models is incorrect. Model ensembles should be constructed using specific techniques that integrate predictions from multiple models. Accurately representing uncertainty across models is more complex than the proposed approach (see the mixture-variance sketch after this comment).
- Cross-validation vs. data splitting (Figure 3): Cross-validation should be used instead of data splitting for model evaluation (see the k-fold sketch after this comment).
- R² calculation: How was the R² calculated? Please provide details about the method used (see the sketch after this comment).
Citation: https://doi.org/10.5194/essd-2024-431-RC1
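As an illustration of the area-of-applicability idea raised in the representativeness comment (and not of the authors' workflow), the sketch below computes a simple dissimilarity index in Python: covariates are standardized with the training statistics, each prediction location is scored by its distance to the nearest training sample, and locations beyond a threshold drawn from the training distribution itself are flagged as outside the area of applicability. The function and variable names (`dissimilarity_index`, `X_train`, `X_new`) and the 0.95 quantile threshold are assumptions made for this example only.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dissimilarity_index(X_train, X_new, threshold_quantile=0.95):
    """Flag prediction locations outside the covariate space sampled by the
    training data (a simplified area-of-applicability sketch, illustrative only)."""
    # Standardize all covariates using the training statistics
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against constant covariates
    Zt = (X_train - mu) / sd
    Zn = (X_new - mu) / sd

    # Distance of every training point to its nearest *other* training point
    d_train = cdist(Zt, Zt)
    np.fill_diagonal(d_train, np.inf)
    nearest_train = d_train.min(axis=1)

    # Distance of every new location to its nearest training point,
    # normalized by the average within-training distance
    scale = nearest_train.mean()
    di_new = cdist(Zn, Zt).min(axis=1) / scale

    # Threshold taken from the training distribution (an assumption of this sketch)
    threshold = np.quantile(nearest_train / scale, threshold_quantile)
    return di_new, di_new <= threshold      # True = inside the area of applicability
```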
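On the question of how a standard deviation can be derived from a forest's predictive distribution, the following sketch (a rough analogue of the QRF idea, not the quantregForest implementation the authors may have used) extracts per-tree predictions from a scikit-learn random forest and summarizes them as a mean, a standard deviation, and an empirical prediction interval. Note that tree-to-tree spread is only a proxy: a true QRF pools the training observations stored in each leaf.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_intervals(X_train, y_train, X_new, alpha=0.10):
    """Approximate prediction intervals and a standard deviation from the
    spread of per-tree predictions (QRF-like in spirit, illustrative only)."""
    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    rf.fit(X_train, y_train)

    # One prediction per tree and per new location: shape (n_trees, n_new)
    per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])

    mean = per_tree.mean(axis=0)
    sd = per_tree.std(axis=0)                        # spread of the per-tree predictions
    lower = np.percentile(per_tree, 100 * alpha / 2, axis=0)
    upper = np.percentile(per_tree, 100 * (1 - alpha / 2), axis=0)
    return mean, sd, lower, upper                    # e.g. a 90 % interval for alpha = 0.10
```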
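To make the point about ensemble uncertainty concrete: a standard alternative to picking the per-pixel minimum standard deviation is to treat the member models as a weighted mixture, whose mean is the weighted mean of the member means and whose variance, by the law of total variance, combines the members' own variances with their disagreement. A minimal sketch, assuming the weights come from cross-validation skill:

```python
import numpy as np

def ensemble_mixture(means, sds, weights=None):
    """Combine per-model predictions as a weighted mixture.

    means, sds : arrays of shape (n_models, n_pixels)
    weights    : per-model weights (e.g. from cross-validation skill);
                 equal weights if None.
    """
    means, sds = np.asarray(means, float), np.asarray(sds, float)
    if weights is None:
        weights = np.full(means.shape[0], 1.0 / means.shape[0])
    w = np.asarray(weights, float)[:, None]
    w = w / w.sum()

    # Mixture mean
    m = (w * means).sum(axis=0)
    # Law of total variance: within-model variance + between-model spread
    var = (w * (sds ** 2 + (means - m) ** 2)).sum(axis=0)
    return m, np.sqrt(var)
```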
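The cross-validation suggestion can be illustrated with a short k-fold loop; the data here are synthetic placeholders and the estimator choice is arbitrary, so this shows the evaluation scheme rather than the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic placeholders for covariates and SOC observations
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=500)

# 10-fold cross-validation: every point is predicted exactly once
# by a model that never saw it during fitting.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
y_cv = cross_val_predict(RandomForestRegressor(n_estimators=200, random_state=0),
                         X, y, cv=cv)

rmse = np.sqrt(np.mean((y - y_cv) ** 2))
print(f"10-fold CV RMSE: {rmse:.3f}")
```

For spatially clustered soil profiles, spatial or blocked cross-validation (folds grouped by region or distance) is generally preferable to random folds, which tend to give optimistic error estimates.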
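The R² question matters because two common definitions can diverge on validation data: the coefficient of determination, 1 - SS_res/SS_tot, which penalizes bias and can be negative, and the squared Pearson correlation between observed and predicted values, which ignores bias. A small numerical illustration with hypothetical values:

```python
import numpy as np

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([2.1, 2.9, 4.2, 4.8, 6.3])    # systematically biased predictions

# Coefficient of determination: penalizes bias, can be negative
ss_res = np.sum((obs - pred) ** 2)
ss_tot = np.sum((obs - obs.mean()) ** 2)
r2_cod = 1 - ss_res / ss_tot

# Squared Pearson correlation: ignores bias, always in [0, 1]
r2_corr = np.corrcoef(obs, pred)[0, 1] ** 2

print(f"1 - SS_res/SS_tot   = {r2_cod:.3f}")    # about 0.42
print(f"squared correlation = {r2_corr:.3f}")   # about 0.98
```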
Data sets
Soil organic carbon and associated uncertainty at 90 m resolution for peninsular Spain, P. Durante et al., https://doi.org/10.6073/pasta/48edac6904eb1aff4c1223d970c050b4
Viewed
- HTML: 142
- PDF: 40
- XML: 7
- Total: 189
- Supplement: 9
- BibTeX: 2
- EndNote: 3