the Creative Commons Attribution 4.0 License.
Soil organic carbon maps and associated uncertainty at 90 m for peninsular Spain
Abstract. Human activities have significantly disrupted the global carbon cycle, leading to increased atmospheric CO2 levels and altering ecosystems' carbon absorption capacities, with soils serving as the largest carbon reservoirs in terrestrial ecosystems. The complexity and variability of soil properties, shaped by long-term transformations, make it crucial to study these properties at various spatial and temporal scales to develop effective climate change mitigation strategies. However, integrating disparate soil databases presents challenges due to the lack of standardized protocols, necessitating collaborative efforts to standardize data collection and processing to improve the reliability of Soil Organic Carbon (SOC) estimates. This issue is particularly relevant in peninsular Spain, where variations in sampling protocols and calculation methods have resulted in significant discrepancies in SOC concentration and stock estimates. This study aimed to improve the understanding of SOC storage and distribution in peninsular Spain by focusing on two specific goals: integrating and standardizing existing soil profile databases, and modeling SOC concentrations (SOCc) and stocks (SOCs) at different depths using an ensemble machine-learning approach. The research produced four high-resolution SOC maps for peninsular Spain, detailing SOCc and SOCs at depths of 0–30 cm, 30–100 cm and the effective soil depth, along with associated uncertainties. These maps provide valuable data for national soil carbon management and contribute to compiling Spain's National Greenhouse Gas Emissions Inventory Report. Additionally, the findings support global initiatives like the Global Soil Organic Carbon Map, aligning with international efforts to improve soil carbon assessments. 
The soil organic carbon concentration (g/kg) maps for the 0–30 cm and 30–100 cm standard depths, along with the soil organic carbon stock (tC/ha) maps for the 0–30 cm standard depth and the effective soil depth, including their associated uncertainties, all at a 90-meter pixel resolution (SOCM90), are freely available at https://doi.org/10.6073/pasta/48edac6904eb1aff4c1223d970c050b4 (Durante et al., 2024).
Status: open (until 04 Feb 2025)
RC1: 'Comment on essd-2024-431', Anonymous Referee #1, 06 Dec 2024
The manuscript by Durante et al., submitted to ESSD, is an interesting contribution, particularly due to its impressive dataset on SOC concentration and stocks. However, the modelling approach is not robust and requires significant rethinking. There is abundant literature on the mapping of soil properties and spatial model ensembles, yet it is unclear why the authors have disregarded this body of work. I did not review the results section because the mapping and modelling steps lack rigor and do not make sense. Authors should get help from a digital soil mapping and spatial modelling expert.
Specific comments:
- Inconsistent data points (L. 185)
How are the authors classifying a data point within a pedogenetic horizon as “inconsistent”? Please clarify the criteria used.
- SOC data transformation (L. 189-191)
On the one hand, the authors apply a log-transformation to the SOC data; on the other, they remove SOC data from organic soils. This is contradictory and lacks a clear rationale. Why were data from organic soils excluded? This is not a common practice, and the reasoning should be explicitly stated. Additionally, if data from organic soils were excluded, does this mean no predictions were made for organic soils? Please confirm, because the figures in the results show predictions for all soils.
- Conversion factor (L. 198-200)
The manuscript should specify how many data points were converted using the factor mentioned. Note that this conversion factor has been widely criticized within the scientific community for being overly general.
- Representativeness (Section 2.1.3)
The term “representativeness” is poorly defined in the context of this study, and the entire section lacks coherence. Why are the authors using techniques designed for point patterns when the soil data are not a point pattern? The use of Maxent to evaluate the “representativity” of the data is unclear, especially since other models are used later in the study. What exactly are the authors trying to achieve with this analysis? There have been studies looking at the area of applicability of spatial models.
- Data input for ML (L. 293-294)
The described step seems outdated, as most modern machine learning (ML) techniques can handle both categorical and continuous datasets as input without requiring separate preprocessing.
- Bayesian analysis
Bayesian analysis and Bayesian calibrations are techniques for updating parameter distributions and fitting models, not models themselves. Which specific model was used in the Bayesian analysis? This should be explicitly stated. The three techniques for variable selection could be removed and merged with the modelling step, because the optimal variable set depends on the model.
- Model selection and ensemble approach
The modeling approach is unclear. The authors used three models (QRF, EML, and AutoML) combined into an ensemble. However, one of these (QRF) is itself an ensemble of random forests. How was the ensemble constructed? Additionally, the validation step using cross-validation should be applied consistently across all three models. For the final prediction, was the ensemble constructed from all models, or was it fitted to all available data points? Please clarify.
- Uncertainty estimates (L. 406)
Some models, such as QRF, return prediction intervals, while others, such as EML, likely return confidence intervals. What uncertainty measures are reported for each model? Additionally, how was the standard deviation derived from the WRF distribution? More details are needed here.
- Ensemble uncertainty (L. 408-411)
The method proposed for handling uncertainty is statistically flawed. Selecting the pixel with the lowest standard deviation from different models is incorrect. Model ensembles should be constructed using specific techniques that integrate predictions from multiple models. Accurately representing uncertainty across models is more complex than the proposed approach.
- Cross-validation vs. data splitting (Figure 3)
Cross-validation should be used instead of data splitting for model evaluation.
- R² calculation
How was the R² calculated? Please provide details about the method used.
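To illustrate why the R² definition matters: two commonly reported definitions, the coefficient of determination against the 1:1 line (1 − SSE/SST) and the squared Pearson correlation, can disagree strongly on the same data. The sketch below is a minimal illustration with made-up numbers, not values from the manuscript; the function names are hypothetical and chosen only for this example.

```python
def r2_determination(obs, pred):
    """R² as 1 - SSE/SST, measured against the 1:1 line (penalizes bias)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r2_pearson(obs, pred):
    """R² as the squared Pearson correlation (insensitive to linear bias)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Made-up observations and perfectly correlated but biased predictions.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [0.5 * o + 5.0 for o in obs]

print(r2_determination(obs, pred))  # about 0.3
print(r2_pearson(obs, pred))        # 1.0
```

In this toy case the predictions are a perfect linear function of the observations, so the squared correlation is 1, while the 1:1 coefficient of determination is only about 0.3 because the predictions are biased. Stating which definition was used removes this ambiguity.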
Citation: https://doi.org/10.5194/essd-2024-431-RC1
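As an illustration of the ensemble uncertainty point raised in RC1, the sketch below contrasts per-pixel selection of the model with the lowest standard deviation against one simple way of integrating several models. The integration shown (an equal-weight mixture, with the combined variance taken from the law of total variance) is an assumption chosen for illustration; it is neither the manuscript's method nor one prescribed by the referee. All numbers and function names are hypothetical.

```python
def combine_equal_weight(means, sds):
    """Combine per-model predictions for one pixel as an equal-weight mixture.

    Total variance = mean of the model variances (within-model)
                   + variance of the model means (between-model).
    """
    n = len(means)
    mix_mean = sum(means) / n
    within = sum(s ** 2 for s in sds) / n
    between = sum((m - mix_mean) ** 2 for m in means) / n
    return mix_mean, (within + between) ** 0.5


def pick_lowest_sd(means, sds):
    """Select the single model with the smallest reported SD for one pixel."""
    i = min(range(len(sds)), key=sds.__getitem__)
    return means[i], sds[i]


# Hypothetical predictions (g/kg) from three models for a single pixel.
means = [20.0, 24.0, 28.0]
sds = [2.0, 3.0, 4.0]

print(pick_lowest_sd(means, sds))        # (20.0, 2.0)
print(combine_equal_weight(means, sds))  # mean 24.0, SD about 4.5
```

Selecting the lowest SD reports 2.0 here and ignores the fact that the three models disagree by up to 8 g/kg. The mixture keeps that disagreement as a between-model term, so its SD is larger than any single model's.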
RC2: 'Comment on essd-2024-431', Anonymous Referee #2, 05 Jan 2025
The authors used 8,361 soil profile samples and data on multiple environmental factors to create digital maps of SOC concentration (0-30 cm and 30-100 cm) and stocks (0-30 cm and effective soil depth) for peninsular Spain at 90-m resolution. The authors state that they used an ensemble machine learning approach to generate SOC estimates and their associated uncertainty.
Numerous SOC maps at various resolutions have been published both globally and nationally. However, the authors fail to mention recent advancements in ensemble machine learning-based SOC mapping efforts in the Introduction section. I recommend that the authors thoroughly review the existing SOC DSM literature and clearly identify the knowledge gap that this manuscript aims to address. Additionally, in the appropriate section, the authors should compare their maps with existing SOC estimates, including those from Spain (I remember reviewing an earlier SOC mapping study from Spain), and report the findings appropriately. Throughout the manuscript, several abbreviations are repeated multiple times; the authors should carefully review and minimize redundancy. Overall, I find this work incomplete and out of place, as it does not appropriately engage with the existing literature on this important topic.
Abstract: I didn’t find the Abstract focused, informative or structured. The abstract contains no information about sample size, methodological details, or prediction accuracy. It includes too many irrelevant details that belong in the materials and methods section. The text in L17-26 is unnecessary in the Abstract and should be deleted. Abstracts should also end with a sentence stating who can use the information generated by the study, not with a self-citation. I encourage the authors to read some good-quality SOC DSM papers and rewrite the Abstract accordingly.
Introduction: The Introduction should summarize the existing literature on the topic of investigation and state clearly the knowledge gaps in current efforts. The authors should properly cite and discuss the findings of the existing SOC DSM literature, specifically those studies that have used an ensemble machine learning approach in other parts of the world. This study is not the first to use this approach, and proper appreciation of the existing literature is needed. The current Introduction suggests the authors are unaware of recent developments in SOC DSM that use ML techniques.
Materials and methods:
L179-193: This section is confusing and needs to be properly rewritten. I think Histosols are also soil types. So if authors want to report SOC stocks of peninsular Spain, Histosols must be part of it. If authors want to report SOC only in mineral soils of Spain, then that can’t be the total SOC stocks of Spain as it is presented currently, and authors should clearly mention this in relevant sections of the manuscript.
L214-221: How many samples were not included in the modeling? Was any gap filling approach employed in this study?
L303-350: This section is not relevant to an ML approach. Current ML algorithms can take into account categorical, continuous, and correlated variables.
L373: MLR is incorrectly abbreviated here. MLR is not an ML approach and should not be included in the ensemble ML approach.
L351: Section 2.2.2 is confusing. Please rewrite it and mention which specific models were included in the model ensemble approach applied in this study. I am surprised to see non-ML methods such as MLR included or mentioned here, as I thought this manuscript was using an ensemble ML approach. The current write-up suggests the authors used only two ML approaches (QRF and AutoML); in doing so, the authors cannot produce a robust ML ensemble, and thus the interquartile range.
L416-426: The authors attempt to highlight the uncertainty estimates of SOC stocks throughout the manuscript, but I am surprised to see no robust uncertainty analysis in the text. The authors merely report validation statistics and the interquartile range of the different approaches that they used. I suggest the authors define in the methods section what they mean by the term “uncertainty”. To my knowledge, without proper distributional analyses of each independent variable and of SOC stocks using Monte Carlo simulations, no proper uncertainty analysis can be done.
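For readers unfamiliar with the Monte Carlo propagation mentioned in the comment above, a minimal sketch follows. It assumes the widely used stock equation SOCs (t C/ha) = SOCc (g/kg) × bulk density (g/cm³) × depth (cm) × (1 − coarse fraction) × 0.1, which may differ from the equation used in the manuscript. The input means, standard deviations, normal distributions and function names are all hypothetical and serve only to show the mechanics.

```python
import random


def soc_stock(socc_g_kg, bd_g_cm3, depth_cm, coarse_frac):
    """SOC stock in t C/ha; the 0.1 factor converts g/kg * g/cm3 * cm to t/ha."""
    return socc_g_kg * bd_g_cm3 * depth_cm * (1.0 - coarse_frac) * 0.1


def monte_carlo_stock(n_draws=10000, seed=42):
    """Propagate assumed input uncertainty into the stock for one location."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = []
    for _ in range(n_draws):
        # Hypothetical input distributions, truncated to physical ranges.
        socc = max(0.0, rng.gauss(15.0, 2.0))          # g/kg
        bd = max(0.1, rng.gauss(1.3, 0.1))             # g/cm3
        cf = min(0.9, max(0.0, rng.gauss(0.10, 0.03))) # coarse fraction, 0-1
        draws.append(soc_stock(socc, bd, 30.0, cf))
    draws.sort()

    def quantile(p):
        return draws[int(p * (n_draws - 1))]

    # 5th percentile, median, 95th percentile of the simulated stocks.
    return quantile(0.05), quantile(0.5), quantile(0.95)


p05, p50, p95 = monte_carlo_stock()
print(p05, p50, p95)
```

The width of the 5th to 95th percentile range is one possible operational definition of "uncertainty" for the stock at that location. Correlations between inputs are ignored in this sketch and would need to be handled in a real analysis.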
L549: This manuscript does not have results and discussion section. Is this common for this journal? I will not accept this work unless authors provide a robust discussion in an appropriate section, mentioning how their results compare and contrast with the existing SOC literature, which has produced SOC stock estimates using ensemble ML approach.
Citation: https://doi.org/10.5194/essd-2024-431-RC2
Data sets
Soil organic carbon and associated uncertainty at 90 m resolution for peninsular Spain P. Durante et al. https://doi.org/10.6073/pasta/48edac6904eb1aff4c1223d970c050b4
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
215 | 59 | 10 | 284 | 16 | 9 | 10 |