Soil information and soil property maps for the Kurdistan region, Dohuk governorate (Iraq)

Bellat, Mathias; Zebari, Mjahid; Glissman, Benjamin; Rentschler, Tobias; Sconzo, Paola; Kakhani, Nafiseh; Taghizadeh-Mehrjardi, Ruhollah; Kohsravani, Pegah; Brifany, Bekas; Pfälzner, Peter; Scholten, Thomas

doi:10.5194/essd-2025-418

Preprints

15 Sep 2025

Status: a revised version of this preprint was accepted for the journal ESSD and is expected to appear here in due course.

Abstract. We present the first detailed soil property maps at multiple depths for the northwestern autonomous Kurdistan region of Iraq (Dohuk). A total of 532 soil samples from 122 sites were collected at five depth increments (0–10, 10–30, 30–50, 50–70, and 70–100 cm), and their mid-infrared (MIR) spectra were measured. A subset of 108 samples, selected via Kennard–Stone sampling, was analysed in the laboratory for ten soil properties. A Cubist model was trained on these measured values and used to predict the soil properties of all samples from their MIR spectra. Digital soil mapping was conducted using various machine learning regression techniques (ensemble learning, linear classifier, nearest neighbour classifier, decision trees), trained on the predicted soil properties and using a total of 85 covariates at 25 m pixel resolution, resulting in 50 prediction maps in total. Results were compared with the SoilGrids 2.0 product and a regional texture model. Soil depth was also mapped using a quantile random forest with 26 covariates. Our regional model outperformed global SoilGrids 2.0 predictions in resolution and accuracy, with texture RMSEs (sand: ∑RMSE = 9.35; silt: ∑RMSE = 6.8; clay: ∑RMSE = 10.28) comparable to local models. Quantile random forest achieved the best performance in 51 % of the models, and key predictors included Sentinel 2 SWIR, EVI, NDVI, and SAVI. Spatial patterns reflected the contrast between the flat areas of the Simele and Zakho plains and the shallower and steeper Little Khabur Valley and anticline formations. Furthermore, the soil depth prediction model (R² = 0.57; RMSE = 2.59 cm^-0.5) showed a strong correlation with slope and a similar spatial pattern, with deeper soils in the flat areas of the Simele and Zakho plains and shallow soils in the anticline and strongly erodible areas. Our comprehensive dataset (Bellat et al., 2024a, b, c, d, 2025) offers substantial insights for soil knowledge in the region, as well as for aridic and semi-aridic areas.

Received: 18 Jul 2025 – Discussion started: 15 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 47253 KB)

Supplement (44835 KB)

Status: closed

RC1:
'Comment on essd-2025-418', David G. Rossiter, 29 Oct 2025

essd-2025-418 "Soil information and soil property maps for the Kurdistan region, Dohuk governorate (Iraq)"

Bellat et al.

Review by D G Rossiter 29-Oct-2025
Summary: This exceptionally thorough and well-explained data paper presents details of the soils in the named region based on survey and models. It used modern methods (inference from MIR spectroscopy) as part of the soil properties determination. From this dataset a standard modern digital soil mapping (DSM) exercise was carried out to produce property maps over the study area. The maps were compared to the global SoilGrids v2.0 maps and, not at all surprisingly, had significantly better point evaluation metrics. All results and workflows are available under the FAIR concept. This paper can be a reference for how such a study can be carried out.
Major Comments:
1. I appreciate the thorough review of previous mapping efforts in the region; it is good to have these listed for reference. The brief review of major pedogenetic processes is also appreciated. Similarly, the account of the tectonic development places the study within context. The study's motivation is clear. Adherence to FAIR standards is appreciated. The entire workflow, all sources and products, are available, with DOI, and explained.
2. The Conclusions mainly repeat the Abstract and sections of the Discussion. I would appreciate a broader conclusion about the success of this study, the applicability of this kind of study to similar regions, the issues of global vs. local models, the main limitations to this kind of study, the importance of reproducibility and FAIR, etc. That is, after doing all this work, what do you conclude about the project?
3. Did you consider DSM for soil classes? Perhaps using a DSMART-like approach with your additional observations? This could be compared with Fig. 6. Obviously that is not to do in the paper, but was it considered and if so, why not attempted? Related to this, it is not clear how the soil class (not classification) map (Figure 6) was created. It's implied that this was expert judgement supplemented by observations, but it's not explicit. Also see comments below re: L443.
4. Can you comment on the realism of patterns as seen in Figures 9–14? We have the point evaluation statistics, but the map shows a landscape. Do the elements we see there correspond to reality, of course by expert judgement? Are the fine details revealed by the 25 m resolution realistic or artefacts?
Detailed Comments:
The WRB 2006 has been replaced by WRB 2022: IUSS Working Group WRB: World Reference Base for Soil Resources. International soil classification system for naming soils and creating legends for soil maps, 4th ed., IUSS, Vienna, Austria, 234 pp., 2022. However I think the definitions used in this paper have not changed.
L179-80 RUSLE: how were the parameters calibrated? Were they from one of the earlier (cited) studies? Especially the K value.
L227 "the different index" -> "the different indices"
L233 "We performed a standardisation of the predicted values of the texture on 100 % with TT.normalise.sum function (Moeys et al., 2024) and a additive-log ratio transformation (Aitchison, 1986) with the alr function (Tsagris et al., 2025)." This is not clear. Was the normalization following the MIR inference/wet lab measurements? And then were the alr variables used in the mapping, followed by back-transformation (as is done in SoilGrids v2.0)?
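As context for this question, the sequence the comment describes (normalise the three texture fractions to 100 %, apply the additive log-ratio, model the transformed variables, then back-transform) can be sketched as follows. This is only a minimal Python illustration, not the authors' R workflow: the function names and the sample values are hypothetical, and clay is assumed to be the reference component of the log-ratio.

```python
import math

def normalise_to_100(sand, silt, clay):
    """Rescale three texture fractions so that they sum to 100 %."""
    total = sand + silt + clay
    return tuple(100.0 * v / total for v in (sand, silt, clay))

def alr(parts):
    """Additive log-ratio: log of each part over the last part (the reference)."""
    *numerators, reference = parts
    return [math.log(p / reference) for p in numerators]

def alr_inverse(coords, total=100.0):
    """Back-transform alr coordinates to parts that sum to `total`."""
    expd = [math.exp(c) for c in coords]
    denom = 1.0 + sum(expd)
    return [total * e / denom for e in expd] + [total / denom]

# Hypothetical predicted fractions that sum to 103 % before normalisation.
texture = normalise_to_100(31.0, 48.0, 24.0)
coords = alr(texture)          # two unconstrained variables: ln(sand/clay), ln(silt/clay)
restored = alr_inverse(coords) # sand, silt, clay again, summing to 100 %
```

In a mapping workflow of this kind, the two alr coordinates would be the modelled targets, and the inverse would be applied to the predicted coordinates so that the three mapped fractions always sum to 100 %.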
L235 "close to a normal distribution Liu et al. (2022)" -> "close to a normal distribution (Liu et al., 2022)"
L236 "2021" refers to what?
L258 "relative "simple"" -> "relatively simple"
§2.4.3 and throughout the paper: what is meant by "soil depth"? Is it the solum (zone of pedogenesis) or to bedrock/completely unweathered parent material? This might be better termed "thickness" but "depth" is indeed commonly used. L365 "shallow and deep profiles" implies only the solum, is this correct?
L295 the correct reference for PICP is Eq. (2) of Malone (2011) not 2017. The formula is not found in Malone 2017. Malone, B. P., McBratney, A. B., and Minasny, B.: Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes, Geoderma, 160, 614–626, https://doi.org/10.1016/j.geoderma.2010.11.013, 2011. This equation and the others need definitions of the symbols, although some are standard. For PICP what is "v"? I learn from Malone 2011 it is "the number of observations in the validation [better, evaluation] dataset". What is "PL"? L, U as lower, upper limits can be inferred. Finally, the description "we used the prediction interval coverage probability to evaluate the corresponding prediction within an interval" is not clear. The Malone 2011 description is, to me, clearer: "the PICP is the probability that all observed values fit within their prediction limits".
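To make the symbols discussed in this comment concrete, the following sketch computes a PICP under the usual reading of the metric: the share of the v evaluation observations that lie between their lower and upper prediction limits, expressed as a percentage. The function name and the sample values are hypothetical, and the exact form of the equation in the manuscript is not reproduced here.

```python
def picp(observed, lower, upper):
    """Percentage of observations that fall inside their prediction limits."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have the same length")
    v = len(observed)  # number of observations in the evaluation set
    inside = sum(1 for o, lo, up in zip(observed, lower, upper) if lo <= o <= up)
    return 100.0 * inside / v

# Hypothetical pH observations with their lower and upper prediction limits.
obs = [7.8, 8.1, 7.5, 8.4, 7.9]
low = [7.5, 7.9, 7.6, 8.0, 7.0]
upp = [8.0, 8.3, 8.0, 8.3, 8.5]
coverage = picp(obs, low, upp)
```

Here three of the five observations lie within their limits, so `coverage` is 60. For limits built from a 90 % prediction interval, a value close to 90 would indicate that the stated uncertainty is neither too narrow nor too wide.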
L309 It's interesting that silt is so poorly predicted, yet most of these soils are on the silty edge of the texture triangle. And, clay and sand are in Category A and B. Can you explain why the poor result for silt, even though there is a lot of it and with a good range in these samples? This is mentioned on L387.
L346 "river bakns" -> "river banks". Spell-check.
§4.3 Another interesting comparison with SG2 would be the prediction ranges. SG2 likely smooths more than this study, see Table 7 where the Q1-Q3 range is always much narrower. This can be brought out in the text – the interesting discussion is about global vs. local models. The SG2 maps are much more uniform than the maps from this study.
L387 "should be interpreted with caution—consistent"... with what?
L392 "Abdulrahman et al. 2020 " -> "Abdulrahman et al. (2020)". L390 maybe make it explicit here that this is not a DSM product, rather an expert updating from field work and manual interpretation of remote sensing products (correct?).
L401 the Hazelton & Murphy guidelines are for conventional mapping, not DSM. They are expressed in terms of map scale and cm^2 of printed map. Here the product is digital at 25 m resolution. How is the density here converted to match these guidelines? The argument about cLHS is much more relevant for DSM using machine learning from covariates.
L434 formatting problem with the URL https://mathias-bellat.shinyapps.io/Northern-Kurdistan-map, which goes over a line break so gives a 404 error if not manually adjusted
L443 "shallower resolution " -> "higher resolution"? And what is that resolution? It's nowhere stated. L435 says 1:200 000 scale, which implies polygons with a minimum legible delineation (MLD) of 160 ha (0.4 cm^2 on the map). But L390 "The updated soil classification map (Figure 6) must be interpreted with care, specially at micro-scale (<1:50,000)..." implying a smaller MLD. Figure 6 suggests that this is a polygon map.
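The arithmetic behind the minimum legible delineation quoted in this comment can be checked with a short sketch. It assumes the 0.4 cm^2 map-area convention stated in the comment; the function name is hypothetical.

```python
def mld_hectares(scale_denominator, map_area_cm2=0.4):
    """Ground area (ha) covered by the smallest legible map polygon."""
    ground_cm2 = map_area_cm2 * scale_denominator ** 2  # areas scale with the square
    ground_m2 = ground_cm2 / 10_000                     # 1 m^2 = 10,000 cm^2
    return ground_m2 / 10_000                           # 1 ha = 10,000 m^2

mld_200k = mld_hectares(200_000)  # about 160 ha at 1:200 000
mld_50k = mld_hectares(50_000)    # about 10 ha at 1:50 000
pixel_ha = 25 * 25 / 10_000       # one 25 m pixel covers 0.0625 ha
```

Under this convention a polygon map at 1:200 000 resolves units of roughly 160 ha, a map at 1:50 000 roughly 10 ha, and a single 25 m raster cell 0.0625 ha, which is why the polygon map and the raster products are not directly comparable in spatial detail.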
Figure 3 the inset showing the region is not needed, that has already been shown in Figure 1 and can be found from the coordinates on the main map.
Figure 8 bicolor key y-axis partially obscured

Citation: https://doi.org/10.5194/essd-2025-418-RC1
- AC2: 'Reply on RC1', Mathias Bellat, 20 Dec 2025
  
  We would like to thank Reviewer #1 for his most welcome comments. The improvements from his suggestion will substantially enhance the manuscript's quality and scientific correctness.
  All the questions have been answered in detail in the attached file.
  
  Best regards,
  Mathias Bellat, on behalf of all the co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC2
- AC6: 'Reply on RC1 #2', Mathias Bellat, 18 Feb 2026
  
  Following referee #4's comments, major changes to our methodology required adaptations in the manuscript. Below are the resulting modifications to our replies to previous comments.
  "L233 "We performed a standardisation of the predicted values of the texture on 100 % with TT.normalise.sum function (Moeys et al., 2024) and a additive-log ratio transformation (Aitchison, 1986) with the alr function (Tsagris et al., 2025)." This is not clear. Was the normalization following the MIR inference/wet lab measurements? And then were the alr variables used in the mapping, followed by back-transformation (as is done in SoilGrids v2.0)?"
  We removed this part on the scaling of the covariates of the DSM following the adaptation of our methodology (cf. referee #4's comment).
  "§4.3 Another interesting comparison with SG2 would be the prediction ranges. SG2 likely smooths more than this study, see Table 7 where the Q1-Q3 range is always much narrower. This can be brought out in the text – the interesting discussion is about global vs. local models. The SG2 maps are much more uniform than the maps from this study."
  Changes have been made in this section. SoilGrids 2.0 is now compared to our model without the SD “harmonisation” (removal of the values outside of the SD), and the section has been rewritten to reflect the newly computed methods.
  "To assess model performance, we compared our results with the global SoilGrids 2.0 product (Poggio et al., 2021), focusing on pH, OC, Nt, and texture attributes (Table 7), for three generalised depth intervals (0–30 cm, 30–60/70 cm, and 60/70–100 cm). We scaled our predictions to match the 250 × 250 m resolution of SoilGrids 2.0, with a bilinear method from the terra package (Hijmans et al., 2025).
  Our models predicted higher values of OC and Nt, with respective increases of ≈ 1000% and 300% over those from SoilGrids 2.0. Predicted sand values were also higher (by ≈ 25%), while clay values were slightly lower (by ≈ 15%). The differences in silt and pH values were negligible, with a 3% higher value for silt and 5% lower for pH in our predictions compared to SoilGrids 2.0.
  
  The standard deviation of the SoilGrids 2.0 product is smaller, except for the pH, than for our prediction models (Table 7). This shows a narrower distribution of values, likely due to the wide range of input data used for the SoilGrids 2.0. The diversity of soil types and input data at the global scale makes the SoilGrids 2.0 model respond relatively homogeneously at the regional scale. Furthermore, the SoilGrids 2.0 product shows a more skewed distribution for OC and Nt, with a higher concentration of values near the lower end of the distribution, which is consistent with the known underestimation of these properties in global models (Shi et al., 2025).
  
  We also compared the evaluation metrics of our predicted values with those obtained from SoilGrids 2.0 on an independent data set (Appendix F). Before training the models on the full data set, we split the data, retaining 20% for testing and 80% for training. The models were trained under conditions similar to those of our main prediction models (cf. 2.4.2), before the evaluation metrics were computed on the independent test set. Overall, our models outperformed SoilGrids 2.0 across all evaluation metrics for pH, Nt, OC, sand, silt, and clay at all depth intervals. The only exception was the sand QRF model RMSE score at the 10–30 cm depth interval, which was slightly higher for the SoilGrids 2.0 model at the 15–30 cm interval.”
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC6
RC2:
'Comment on essd-2025-418', Anonymous Referee #2, 14 Nov 2025

This manuscript focuses on regional digital soil mapping in Iraq, using 532 soil samples and 85 covariates to produce soil maps via machine learning. While the modeling approaches are generally appropriate, the work falls short in two critical aspects: (1) Limited Geographical Scope: The investigated region is quite small. Consequently, the resulting dataset has limited implications and applicability for the broader scientific community, despite its location in Iraq. (2) Limited Novelty: The modeling framework adopted is standard practice in digital soil mapping and lacks significant methodological novelty. Given these limitations, specifically the dataset's limited scope and the conventional nature of the modeling, this work does not meet the high standards for originality and impact required for publication in Earth System Science Data.

Citation: https://doi.org/10.5194/essd-2025-418-RC2
- AC1:
  'Reply on RC2', Mathias Bellat, 21 Nov 2025
  We sincerely appreciate the time referee 2 took to read the preprint and to note the appropriateness of the modelling approach in our paper. The reviewer identified two critical aspects of our preprint.
  1) Indeed, the studied area (2,280 km²) is “relatively small” compared with other datasets available in ESSD. However, in other cases, regional to local datasets are also available (e.g. Lorenz et al., 2021; Ardizzone et al., 2023; Błaszczyk et al., 2024). We do think that high-quality regional datasets are necessary to feed and improve other, larger datasets. Furthermore, as referee 2 expressed, data on the Iraq region are critically lacking. No regional dataset, from any kind of observations, is available on Iraq in the whole of ESSD (accessed on 14/11/2025). We do think that underrepresented regions of the globe need and deserve high-quality, standardised data, such as those proposed in this paper and, more generally, in ESSD. The qualitative data presented in the preprint (soil classes map) are also hardly expandable to a larger scale, as regional patterns cannot always be transposed. Finally, the comparison with the SoilGrids 2.0 product used in the study also highlights the poor quality of such global products when dealing with local problems. Hence, we do think a high-quality local dataset is needed and would also demonstrate the scientific interest of major journals, such as ESSD, in a scientifically under-studied country.
  2) Regarding the lack of novelty in the approach, we understand the criticism of referee 2, as no “new” method is developed. However, we do think the novelty lies in the combination of known techniques and in our unique pipeline/workflow. This study is fully reproducible from the sampling strategy to the final maps produced. By combining the sampling strategy, campaign results, FTIR and laboratory measurements, FTIR model predictions, and DSM models, we propose a unique new approach inspired by Malone et al. (2022) but never applied in real conditions at a regional scale.
  We do hope that these answers will encourage referee 2 to reconsider our submission to ESSD.
  
  References used:
  Ardizzone, F., Bucci, F., Cardinali, M., Fiorucci, F., Pisano, L., Santangelo, M., & Zumpano, V. (2023). Geomorphological landslide inventory map of the Daunia Apennines, southern Italy. Earth System Science Data, 15(2), 753–767. https://doi.org/10.5194/essd-15-753-2023
  
  Błaszczyk, M., Luks, B., Pętlicki, M., Puczko, D., Ignatiuk, D., Laska, M., Jania, J., & Głowacki, P. (2024). High temporal resolution records of the velocity of Hansbreen, a tidewater glacier in Svalbard. Earth System Science Data, 16(4), 1847–1860. https://doi.org/10.5194/essd-16-1847-2024
  
  Lorenz, C., Portele, T. C., Laux, P., & Kunstmann, H. (2021). Bias-corrected and spatially disaggregated seasonal forecasts: A long-term reference forecast product for the water sector in semi-arid regions. Earth System Science Data, 13(6), 2701–2722. https://doi.org/10.5194/essd-13-2701-2021
  
  Malone, B., Stockmann, U., Glover, M., McLachlan, G., Engelhardt, S., & Tuomi, S. (2022). Digital soil survey and mapping underpinning inherent and dynamic soil attribute condition assessments. Soil Security, 6, 100048. https://doi.org/10.1016/j.soisec.2022.100048
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC1
RC3:
'Comment on essd-2025-418', Bas Kempen, 23 Jan 2026
REVIEW RESULTS
A comprehensive paper on a critically underrepresented geographical area when it comes to soil profiles and digital soil mapping. Care has been taken with the landscape characterisation, including tectonic development, parent material, climate, vegetation, geomorphology and soils, as well as with the maps and photographs, which allow the reader to really understand the study area. While the methods themselves are not necessarily ‘new’, this is an important application of state-of-the-art methods; the novel part of this study is the study area. The output maps are compared to SoilGrids, a global model with inputs from WoSIS. The study mentions that WoSIS has a low sampling density in this area, which highlights the need for such studies, in addition to the scarcity of other options in the area.
I agree with reviewer 1 that this is a well-written and thorough study. This study is a good example of adhering to FAIR and open metadata standards, not only for the data but also for the methodology, and that is commendable. The methods are comprehensively described and fit their purpose well. All methodological aspects, including the code, are not only made available following FAIR principles but also thoroughly documented and explained at https://mathias-bellat.github.io/DSM-Kurdistan/digital-soil-mapping.html#visualisation-and-comparison-with-soilgrid-product, creating a fine example of a fully reproducible study. The input and output data themselves are a much-needed addition to the more or less non-existent openly available soil data in the area. With this, the manuscript fits well within the scope of ESSD.
One of the reviewers mentioned the limited geographic scope of the paper. I do agree that this scope is limited, but having a DSM study published for a region like Kurdistan (Iraq) is worthwhile and, to me, a welcome addition to the body of literature on this topic, especially given that the authors make their results as well as their data open, from which other DSM efforts (e.g. SoilGrids) can profit. This is much appreciated.
Having said this, I do have a few comments and questions regarding the manuscript, particularly concerning the methodologies, which in my view require some further explanation and clarification. I encourage the authors to address these points, after which I would recommend publication of this article in ESSD.
SPECIFIC COMMENTS
Main comments
Lines 230-231: Could the authors explain the decision to model each depth layer separately instead of developing one model per property using depth as an explanatory variable? Modelling each depth separately is a valid approach, but I would like to understand why the authors took this approach.
Lines 246-247: I do not understand the rationale for combining data splitting (80/20) with (repeated) cross validation. Cross-validation already produces an independent prediction for each data point, from which accuracy metrics can be calculated. What was the motivation for embedding CV within a data-splitting framework? And how does this then work? If CV was performed on the 80% training subset, this would give CV-predictions only for these points. I wonder how predictions for the 20% test set were obtained. Which trained model was used to predict at the points in the test set? In case of normal data splitting this would be the model trained on the training dataset. However, by running CV on the training dataset there is no single trained model but multiple (here 10) fold-specific models.
In addition, the manuscript states that CV was repeated three times? While repetition may improve robustness when data are limited or when using a small number of folds, with 10-fold CV I would expect only minimal variation between the repeats?
Overall, the validation approach seems a bit overcomplicated. The authors may well have had sound reasons for adopting this approach, but in that case the rationale and precise implementation need to be explained more clearly.
Lines 268-273: The description of the ensemble modelling approach is unclear to me and would benefit from additional detail. Specifically, it is not clear which ‘conditions’ (l. 269) are being referred to, what criteria were used to select ‘the best one’, and how the individual model predictions were combined in the ensemble? While relevant literature is cited, I believe that a few additional lines better outlining the implementation of the ensemble approach would improve clarity to the reader.
Other comments
Line 10: The summation signs should be removed I believe.
Line 10-11: Reference is made to ‘local models’ (compared to the regional models the authors developed and the SoilGrids global model). It is unclear to me what the ‘local models’ refer to and what the basis is for the authors' claim.
Line 14: I believe the minus sign in the superscript should be removed, assuming the RMSE values are reported for the transformed depth data. After applying the square-root transformation, the depth unit becomes cm^0.5. The unit of the MSE metric would then be cm, and taking the square root to obtain the RMSE would again give value in cm^0.5. Table 5 also reports the RMSE unit in cm^0.5. I assume the unit of the MAE (Table 5, l. 337) is also cm^0.5? The ‘-0.5’ superscript should be removed I believe (same for line 337).
Line 148: Reference is made to WRB 2006. Can the authors confirm whether this version was used, given that there are more recent versions of the WRB? The latest is from 2022, I believe.
Line 178: “potential soil layer” - inconsistent naming – previous paragraph is “potential soil properties”
Line 185: What are the ‘layers’ here? Are these the soil horizons or are these the layers for which samples were collected (l. 186)? Please clarify.
Line 187: How is ‘topsoil’ defined here? Is this the 0-10 layer? Explain a bit more how the ring samples were taken, e.g. where in the topsoil layer the ring sample was taken: from the top, in the middle of the layer …?
Lines 236-237: References seem to be incomplete. Only years are mentioned (2021, 2022).
Line 243: Could expand on why RFE was not restrictive enough/ restrictive enough for what?
Line 372: Why were SoilGrids extremes removed?
Line 401: Limitations are addressed in the paper – I am not sure if the point density, even though notably higher than WoSIS, still warrants mapping at 25m, the reference to Hazelton and Murphy (referencing cartographic scales) seems a bit of a jump. Instead of referencing Hazelton and Murphy I would rather compare to other regional DSM studies.
The Conclusion section reads like an abstract. Reviewer 1 already commented on this; the authors revised the text accordingly, and I believe that revision addresses this comment sufficiently.
Figure 2: What is a negative site? – soil samples
Technical corrections
Line 21: influence local ecosystems -> influence on local ecosystems

Line 28: includes -> include

Line 33: gives information on its ability to fit or not for agricultural purposes, but also to better understand -> gives information on its ability to fit agricultural purposes and helps to better understand

Line 35: Governate -> governate (not capitalised anywhere else in the paper)

Line 70: cluster -> conditional

Line 75 & 265: Hengl and Robert -> Hengl and MacMillan

Line 74: a raw -> one raw

Line 129: climax -> climate

Line 255: McBradney -> McBratney

Line 229: remove ‘part’

Line 234: a additive -> an additive

Line 247: state of the art the art -> state of the art models

Line 270: approached -> approach

Line 346: bakns -> banks

Line 388: remove ‘consistent’

Line 401: Hazelton and Murphy pg5 -> pg4

Line 410: LU/C -> land use/cover (it is not used previously)

Line 425: profiles depth measurement -> profile depth measurements

Line 441: world -> global

Line 443: shallower resolution -> higher resolution

Table 3: Modis brightness index in the wrong column

Appendix B: units column could be named differently. It contains a mixture of units, formulas, ranges.
Citation: https://doi.org/10.5194/essd-2025-418-RC3
- EC1: 'Note to Authors - Reply on RC3', Giulio G.R. Iovine, 26 Jan 2026
  
  Note to Authors
  According to Referee Bas Kempen, the following line on the RMSE metric unit should be disregarded in his comments:
  <I believe the superscript “-0.5” should be removed (the same applies to line 337).>
  
  Citation: https://doi.org/10.5194/essd-2025-418-EC1
- AC3: 'Reply on RC3', Mathias Bellat, 27 Jan 2026
  
  We would like to thank referee #3 for his comments, which took the previous modifications into account. We are confident that these comments will enhance the overall quality of the manuscript.
  All the questions have been answered in detail in the attached file.
  
  Best regards,
  Mathias Bellat, on behalf of all the co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC3
- AC7: 'Reply on RC3 #2', Mathias Bellat, 18 Feb 2026
  
  Following referee #4's comments, major changes to our methodology required adaptations in the manuscript. Below are the resulting modifications to our replies to previous comments.
  Lines 246-247: I do not understand the rationale for combining data splitting (80/20) with (repeated) cross validation. Cross-validation already produces an independent prediction for each data point, from which accuracy metrics can be calculated. What was the motivation for embedding CV within a data-splitting framework? And how does this then work? If CV was performed on the 80% training subset, this would give CV-predictions only for these points. I wonder how predictions for the 20% test set were obtained. Which trained model was used to predict at the points in the test set? In case of normal data splitting this would be the model trained on the training dataset. However, by running CV on the training dataset there is no single trained model but multiple (here 10) fold-specific models.
  In addition, the manuscript states that CV was repeated three times? While repetition may improve robustness when data are limited or when using a small number of folds, with 10-fold CV I would expect only minimal variation between the repeats?
  Overall, the validation approach seems a bit overcomplicated. The authors may well have had sound reasons for adopting this approach, but in that case the rationale and precise implementation need to be explained more clearly.
  We amended our methodological approach for the DSM computation in line with the comments of referees #3 and #4 by removing the data-splitting step and relying only on the 10-fold CV (repeated three times).
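  The validation scheme described in this reply, a 10-fold cross-validation repeated three times without a separate hold-out split, can be sketched as follows. This is a minimal Python illustration and not the authors' R implementation: the data are simulated, the single-covariate least-squares fit is only a stand-in for the actual learner, and all names are hypothetical.

```python
import random
import statistics

def repeated_kfold_indices(n, k=10, repeats=3, seed=42):
    """Yield (repeat, train_idx, test_idx); every repeat reshuffles all n points."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k disjoint folds covering all points
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield r, train_idx, test_idx

def fit_line(x, y):
    """Least-squares line for one covariate (stand-in for the real learner)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

# Simulated data: 122 sites, one covariate, noisy linear response.
rng = random.Random(0)
x = [rng.uniform(0, 30) for _ in range(122)]
y = [40 - 0.8 * xi + rng.gauss(0, 3) for xi in x]

# One out-of-fold prediction per site and per repeat.
oof = {r: [None] * len(x) for r in range(3)}
for r, train_idx, test_idx in repeated_kfold_indices(len(x)):
    slope, intercept = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
    for i in test_idx:
        oof[r][i] = intercept + slope * x[i]

# Accuracy per repeat, computed from the pooled out-of-fold predictions.
rmse_per_repeat = [
    (sum((p - o) ** 2 for p, o in zip(oof[r], y)) / len(y)) ** 0.5
    for r in range(3)
]
```

  Each site receives exactly one prediction per repeat from a model that never saw it, so accuracy metrics can be computed on all sites, and the spread of `rmse_per_repeat` shows how much the repeats actually differ.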
  
  Lines 268-273: The description of the ensemble modelling approach is unclear to me and would benefit from additional detail. Specifically, it is not clear which ‘conditions’ (l. 269) are being referred to, what criteria were used to select ‘the best one’, and how the individual model predictions were combined in the ensemble? While relevant literature is cited, I believe that a few additional lines better outlining the implementation of the ensemble approach would improve clarity to the reader.
  Due to the change in our methodology this comment no longer applies.
  
  Line 243: Could expand on why RFE was not restrictive enough/ restrictive enough for what?
  We added a supplementary line on this topic.
  “We also performed a recursive feature elimination (RFE; Guyon et al., 2002) on the covariates with the caret package (Kuhn, 2019). For the tested 0–10 cm depth increment, RFE retained more covariates (> 60 for each variable), required more computing time (800%), and provided lower accuracy scores than the Boruta selection.”
  
  Line 372: Why were SoilGrids extremes removed?
  The extremes of SoilGrids were not removed in the new version; only the minimum values present “outliers”, but they do not influence the overall results (Table 7, Appendix F).
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC7
RC4:
'Comment on essd-2025-418', Anonymous Referee #4, 27 Jan 2026
The overall study is interesting and represents a solid piece of work in this research area. I agree with other referees that, although the study is limited to a specific locality, it presents novel data from a region that is generally under-represented in the literature. In this sense, the manuscript has clear value as a data paper and, in my opinion, deserves publication.
I will not comment on the many minor issues already addressed by other referees. However, I do have two major comments that I believe should be carefully addressed prior to publication, as well as several moderate and minor comments that would help improve clarity and rigor.
Major comments
1. Use of multiple models (Section 2.4.3; L. 246–253)
The rationale for using such a large number of models is unclear. A modelling approach should be chosen based on the nature of the data and the study objectives. Here, several individual models are used, alongside an ensemble model, without a clear purpose or benefit being articulated.
In addition:
Quantile Regression Forest (QRF) is not a fundamentally different model from Random Forest; it is simply RF.

The models are not combined in a way that preserves the ability to produce coherent uncertainty maps.

As a result, the current strategy weakens, rather than strengthens, the uncertainty assessment.

I strongly recommend using a single modelling framework. Either:
rely on QRF only, which naturally provides prediction intervals, or

use an ensemble model only if it can return proper and interpretable prediction uncertainty (which is not evident here).

2. Validation metrics and uncertainty evaluation (Section 2.4.4 and Section 3.1)
Section 2.4.4 is currently confused and mixes metrics that evaluate prediction accuracy with those intended to assess prediction uncertainty. These two aspects should be clearly separated and discussed independently.
There is also no need to report such a large number of metrics. A limited set of interpretable and complementary statistics is preferable, for example:
bias,

an error metric (e.g. RMSE or MAE),

variance explained and/or correlation.

Some specific concerns:
Metrics such as RPIQ are not clearly explained. While they are common in spectroscopy, it is unclear whether they are scaled relative to the error or to the original data, which makes interpretation difficult in this context.

The CCC combines correlation and bias, yet bias is also reported separately. It would be clearer to report correlation and bias independently rather than using CCC.

The interpretation and limitations of PICP need to be discussed more carefully; the authors should consult recent work highlighting its shortcomings (e.g. https://www.sciencedirect.com/science/article/pii/S0016706123002628).

Because of these issues, Section 3.1 needs to be rewritten. Statements such as having a CCC of 0.3 or an RPIQ < 1 are difficult to interpret scientifically. The authors should focus on metrics that have a clear statistical meaning and explicitly explain what the reported values imply for model performance and uncertainty.
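The point about CCC combining correlation and bias can be illustrated with a minimal sketch. The values below are hypothetical toy data, not taken from the manuscript, and the function name `agreement_metrics` is an assumption made for illustration. It computes bias, RMSE, Pearson correlation and Lin's concordance correlation coefficient from the same moments, so that a pure offset leaves the correlation at 1 while lowering the CCC.

```python
import math
from statistics import fmean


def agreement_metrics(observed, predicted):
    """Return bias, RMSE, Pearson r and Lin's CCC for paired values."""
    n = len(observed)
    mean_o = fmean(observed)
    mean_p = fmean(predicted)
    # Population (1/n) variances and covariance.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    bias = mean_p - mean_o
    rmse = math.sqrt(sum((p - o) ** 2
                         for o, p in zip(observed, predicted)) / n)
    r = cov / math.sqrt(var_o * var_p)
    # CCC penalises both scatter (through cov) and offset (through bias**2).
    ccc = 2 * cov / (var_o + var_p + bias ** 2)
    return {"bias": bias, "rmse": rmse, "r": r, "ccc": ccc}


# Hypothetical toy data: predictions are perfectly correlated but shifted.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [o + 2.0 for o in observed]
metrics = agreement_metrics(observed, shifted)
print(metrics)
```

In this toy case the correlation stays at 1 while the CCC drops to 0.5, entirely because of the constant offset. Reporting bias and correlation separately makes that cause visible, which a single CCC value does not.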
Major comment 3: where are the maps of uncertainty (or I may have missed something?), authors should report prediction intervals.
Moderate and minor comments
50 and following

The issue is not spatial resolution per se. Any model can theoretically predict at very fine resolution if computationally feasible. The real limitation is data availability in the region of interest or in regions with similar soil-forming factors. Resolution is largely a computational issue and should not be framed as an objective.

Section 2 description

The description of the sampling and laboratory analyses is difficult to follow. It is unclear which samples were analysed in the lab, which were subsampled, and which were analysed using spectroscopy. This section needs to be clarified.

71

The term “cluster Latin hypercube sampling” is confusing. Clustering and LHS are conceptually incompatible. Do the authors mean conditioned Latin hypercube sampling (cLHS)?

Figures placement

Why are all figures placed at the end of the manuscript? This is inconvenient for reviewers and increasingly uncommon, especially given that reviews are conducted digitally.

171

Some of the cited papers argue the opposite of what is stated, namely that cLHS is not an optimal sampling design for spatial mapping.

173

Intervals in cLHS are not equal; this should be corrected.

225

Spelling error in the cited author’s name.

237

Please clarify whether the data were standardized (zero mean, unit variance) or normalized (scaled between 0 and 1).

246

Data splitting should be avoided when cross-validation is already used. Using CV alone would be preferable.
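The distinction raised in the comment on line 237 (standardized versus normalized) can be sketched as follows. This is only an illustration on a hypothetical covariate vector; the function names `standardise` and `normalise` are assumptions and do not refer to any function used in the manuscript.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Centre to zero mean and scale to unit (population) variance."""
    mu = fmean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def normalise(values):
    """Rescale linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical covariate values, for illustration only.
covariate = [2.0, 4.0, 6.0, 8.0]
z = standardise(covariate)
u = normalise(covariate)
print(z)
print(u)
```

Both transformations are linear, but the first is defined by the mean and spread of the data and the second by its extremes, so the second is more sensitive to outliers.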
Citation: https://doi.org/10.5194/essd-2025-418-RC4
- AC5: 'Reply on RC4', Mathias Bellat, 13 Feb 2026
  
  We sincerely thank referee 4 for his comments. Some work was needed to meet his requirements and revise the methodology, but this has now been done.
  We adopted a standard QRF model for all soil properties and depth increments, and reduced the number of metrics to the ones suggested by referee 4.
  All detailed answers are provided in the attached PDF.
  M. Bellat and all the co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC5
RC5:
'Comment on essd-2025-418', Anonymous Referee #5, 02 Feb 2026

Soil characteristic data are crucial for both soil environment research and model parameter input, especially when large amounts of monitoring data are involved. The authors not only collected extensive samples but also conducted regional simulations, achieving a shift from point-based to area-based data collection. However, some statements in the manuscript do not align with the provided tables/figures, and careful revision by the authors is recommended.

The abbreviation for organic carbon is generally OC.
Are the two sides of the soil samples module in Figure 2 conducted simultaneously, yielding different output result maps respectively? For the rectangle at the bottom of the figure and the text below it, which part of the upper diagram do they correspond to? This is not clear.
Does Appendix 1 refer to Table A1 and Table E1 in Appendix B? The order of the appendices seems a bit confusing. Are the details of the method in the supplementary material? Which supplementary materials are you referring to? I can't seem to find the detailed discussion of the method.
2.1 Study area and Figure 3: It would be best to label abbreviations for the two parts on the Figure. Additionally, the blue dashed line representing the 2022 sampling area is just a line segment, not an enclosed region.
2.1.2 Climate and vegetation: Can you provide a distribution map of the vegetation types for the region?
I cannot find Appendix 2.
Line 175: I cannot find Annexe 1
2.3.2: Equation 1 should be provided at the end of this section, with an explanation of what each letter in the Equation represents.
Where’s the Appendix 3?
The meaning of each letter in Equations 2-9 needs to be provided in detail.
Where is Appendix 2? What are the 25 environmental factors included?
Table 4: What do Q1 and Q3 mean?
2.4 Models and pre-process: This section should be described in conjunction with Figure 2. Currently, the method seems somewhat disjointed between parts. Based on Figure 2, it would be better to provide an overall description of how each step unfolds and operates.
3 Result: In the results section, data analysis of the soil characteristics from the collected samples should also be presented. In many cases, actual measured values are more important. Additionally, the data in Table 2 should be expressed in the form of mean ± SD. Additionally, a distribution figure of the measured values could be provided.

Citation: https://doi.org/10.5194/essd-2025-418-RC5
- AC4: 'Reply on RC5', Mathias Bellat, 12 Feb 2026
  
  We would once again like to thank the referee for his important and useful comments. They will undoubtedly improve the quality of the manuscript.
  Regarding the revised Fig. 3, you will find it attached at the end of the full response to the comments.
  M. Bellat and all co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC4

Status: closed

RC1:
'Comment on essd-2025-418', David G. Rossiter, 29 Oct 2025

essd-2025-418 "Soil information and soil property maps for the Kurdistan region, Dohuk governorate (Iraq)"

Bellat et al.

Review by D G Rossiter 29-Oct-2025
Summary: This exceptionally-thorough and well-explained data paper presents details of the soils in the named region based on survey and models. It used modern methods (inference from MIR spectroscopy) as part of the soil properties determination. From this dataset a standard modern digital soil mapping (DSM) exercise was carried out to produce property maps over the study area. The maps were compared to the global SoilGrids v2.0 maps and, not at all surprisingly, had significantly better point evaluation metrics. All results and workflows are available under the FAIR concept. This paper can be a reference for how such a study can be carried out.
Major Comments:
1. I appreciate the thorough review of previous mapping efforts in the region, it is good to have these listed for reference. The brief review of major pedogenetic processes is also appreciated. Similarly for the tectonic development, it places the study within context. The study's motivation is clear. Adherence to FAIR standards is appreciated. The entire workflow, all sources and products, are available, with DOI, and explained.
2. The Conclusions mainly repeat the Abstract and sections of the Discussion. I would appreciate a broader conclusion about the success of this study, the applicability of this kind of study to similar regions, the issues of global vs. local models, the main limitations to this kind of study, the importance of reproducibility and FAIR, etc. That is, after doing all this work, what do you conclude about the project?
3. Did you consider DSM for soil classes? Perhaps using a DSMART-like approach with your additional observations? This could be compared with Fig. 6. Obviously that is not to do in the paper, but was it considered and if so, why not attempted? Related to this, it is not clear how the soil class (not classification) map (Figure 6) was created. It's implied that this was expert judgement supplemented by observations, but it's not explicit. Also see comments below re: L443.
4. Can you comment on the realism of patterns as seen in Figures 9--14? We have the point evaluation statistics, but the map shows a landscape. Do the elements we see there correspond to reality, of course by expert judgement? Are the fine details revealed by the 25 m resolution realistic or artefacts?
Detailed Comments:
The WRB 2006 has been replaced by WRB 2022: IUSS Working Group WRB: World Reference Base for Soil Resources. International soil classification system for naming soils and creating legends for soil maps, 4th ed., IUSS, Vienna, Austria, 234 pp., 2022. However I think the definitions used in this paper have not changed.
L179-80 RUSLE: how were the parameters calibrated? Were they from one of the earlier (cited) studies? Especially the K value.
L227 "the different index" -> "the different indices"
L233 "We performed a standardisation of the predicted values of the texture on 100 % with TT.normalise.sum function (Moeys et al., 2024) and a additive-log ratio transformation (Aitchison, 1986) with the alr function (Tsagris et al., 2025)." This is not clear. Was the normalization following the MIR inference/wet lab measurements? And then were the alr variables used in the mapping, followed by back-transformation (as is done in SoilGrids v2.0)?
L235 "close to a normal distribution Liu et al. (2022)" -> "close to a normal distribution (Liu et al., 2022)"
L236 "2021" refers to what?
L258 "relative "simple"" -> "relatively simple"
\S2.4.3 and throughout the paper: what is meant by "soil depth"? Is it the solum (zone of pedogenesis) or to bedrock/completely unweathered parent material? This might be better termed "thickness" but "depth" is indeed commonly used. L365 "shallow and deep profiles" implies only the solum, is this correct?
L295 the correct reference for PICP is Eq. (2) of Malone (2011) not 2017. The formula is not found in Malone 2017. Malone, B. P., McBratney, A. B., and Minasny, B.: Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes, Geoderma, 160, 614–626, https://doi.org/10.1016/j.geoderma.2010.11.013, 2011. This equation and the others need definitions of the symbols, although some are standard. For PICP what is "v"? I learn from Malone 2011 it is "the number of observations in the validation [better, evaluation] dataset". What is "PL"? L, U as lower, upper limits can be inferred. Finally, the description "we used the prediction interval coverage probability to evaluate the corresponding prediction within an interval" is not clear. The Malone 2011 description is, to me, clearer: "the PICP is the probability that all observed values fit within their prediction limits".
L309 It's interesting that silt is so poorly predicted, yet most of these soils are on the silty edge of the texture triangle. And, clay and sand are in Category A and B. Can you explain why the poor result for silt, even though there is a lot of it and with a good range in these samples? This is mentioned on L387.
L346 "river bakns" -> "river banks". Spell-check.
\S4.3 Another interesting comparison with SG2 would be the prediction ranges. SG2 likely smooths more than this study, see Table 7 where the Q1-Q3 range is always much narrower. This can be brought out in the text -- the interesting discussion is about global vs. local models. The SG2 maps are much more uniform than the maps from this study.
L387 "should be interpreted with caution—consistent"... with what?
L392 "Abdulrahman et al. 2020 " -> "Abdulrahman et al. (2020)". L390 maybe make it explicit here that this is not a DSM product, rather an expert updating from field work and manual interpretation of remote sensing products (correct?).
L401 the Hazelton & Murphy guidelines are for conventional mapping, not DSM. They are expressed in terms of map scale and cm^2 of printed map. Here the product is digital at 25 m resolution. How is the density here converted to match these guidelines? The argument about cLHS is much more relevant for DSM using machine learning from covariates.
L434 formatting problem with the URL https://mathias-bellat.shinyapps.io/Northern-Kurdistan-map, which goes over a line break so gives a 404 error if not manually adjusted
L443 "shallower resolution " -> "higher resolution"? And what is that resolution? It's nowhere stated. L435 says 1:200 000 scale, which implies polygons with a minimum legible delineation (MLD) of 160 ha (0.4 cm^2 on the map). But L390 "The updated soil classification map (Figure 6) must be interpreted with care, specially at micro-scale (<1:50,000)..." implying a smaller MLD. Figure 6 suggest that this is a polygon map.
Figure 3 the inset showing the region is not needed, that has already been shown in Figure 1 and can be found from the coordinates on the main map.
Figure 8 bicolor key y-axis partially obscured
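The PICP discussed in the comment on L295 can be sketched in a few lines. This is a minimal illustration under the usual reading of the metric, namely the percentage of evaluation observations that fall between their lower and upper prediction limits. The numbers are hypothetical, and the function name `picp` is an assumption, not the authors' implementation.

```python
def picp(observed, lower, upper):
    """Percentage of observations lying within [lower, upper] limits."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper)
                 if lo <= o <= hi)
    # len(observed) plays the role of "v", the number of evaluation points.
    return 100.0 * inside / len(observed)


# Hypothetical evaluation points with their prediction limits.
observed = [5.1, 6.3, 7.0, 4.2, 8.8]
lower = [4.0, 6.5, 6.0, 3.0, 7.0]
upper = [6.0, 7.5, 8.0, 5.0, 8.5]
coverage = picp(observed, lower, upper)
print(coverage)
```

For a nominal 90 % prediction interval, a well-calibrated model would be expected to give a PICP close to 90; values well below or above indicate intervals that are too narrow or too wide.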

Citation: https://doi.org/10.5194/essd-2025-418-RC1
- AC2: 'Reply on RC1', Mathias Bellat, 20 Dec 2025
  
  We would like to thank Reviewer #1 for his most welcome comments. The improvements from his suggestion will substantially enhance the manuscript's quality and scientific correctness.
  All the questions have been answered in detail in the attached file.
  
  Best regards,
  Mathias Bellat, on behalf of all the co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC2
- AC6: 'Reply on RC1 #2', Mathias Bellat, 18 Feb 2026
  
  Following referee #4's comments, major changes to the methodology required adaptations in the manuscript. You will find here some modifications regarding previous comments.
  "L233 "We performed a standardisation of the predicted values of the texture on 100 % with TT.normalise.sum function (Moeys et al., 2024) and a additive-log ratio transformation (Aitchison,1986) with the alr function (Tsagris et al., 2025)." This is not clear. Was the normalization following the MIR inference/wet lab measurements? And then were the alr variables used in the mapping, followed by back-transformation (as is done in SoilGrids v2.0)?"
  We removed this part on the scaling of the covariates of the DSM due to the adaptation of our methodology (cf. referee #4 comment).
  "\S4.3 Another interesting comparison with SG2 would be the prediction ranges. SG2 likely smooths more than this study, see Table 7 where the Q1-Q3 range is always much narrower. This can be brought out in the text -- the interesting discussion is about global vs. local models. The SG2 maps are much more uniform than the maps from this study."
  Changes have been made in this section. SoilGrids 2.0 is now compared to our model without the SD “harmonisation” – removal of the value outside of the SD – and the section has been re-written in consequence of the newly computed methods.
  "To assess model performance, we compared our results with the global SoilGrids 2.0 product (Poggio et al., 2021), focusing on pH, OC, Nt, and texture attributes (Table 7), for three generalised depth intervals (0-30 cm, 30-60/70 cm, and 60/70-100 cm). We scaled our predictions to match the 250 x 250 m resolution of SoilGrids 2.0, with a bilinear method from the terra package (Hijmans et al., 2025).
  Our models predicted higher values of OC and Nt, with respective increases of ≈ 1000% and 300% over those from SoilGrids 2.0. Predicted sand values were also higher (by ≈ 25%), while clay values were slightly lower (by ≈ 15%). The differences in silt and pH values were negligible, with a 3% higher value for silt and 5% lower for pH in our predictions compared to SoilGrids 2.0.
  
  The standard deviation of the SoilGrids 2.0 product is smaller, except for the pH, than for our prediction models (Table 7). This shows a narrower distribution of values, likely due to the wide range of input data used for the SoilGrids 2.0. The diversity of soil types and input data at the global scale makes the SoilGrids 2.0 model respond relatively homogeneously at the regional scale. Furthermore, the SoilGrids 2.0 product shows a more skewed distribution for OC and Nt, with a higher concentration of values near the lower end of the distribution, which is consistent with the known underestimation of these properties in global models (Shi et al., 2025).
  
  We also compared the evaluation metrics of our predicted values with those obtained from SoilGrids 2.0 on an independent data set (Appendix F). Before training the models on a full data set, we split the data, retaining 20% for testing and 80% for training. The models were trained under similar conditions to our main prediction models (cf. 2.4.2), before evaluation metrics were computed on the independent test set. Overall, our models outperformed SoilGrids 2.0 across all evaluation metrics for pH, Nt, OC, sand, silt, and clay at all depth intervals. The only exception was the sand QRF model RMSE score at the 10–30 cm depth interval, which was slightly higher for the SoilGrids 2.0 model at the 15-30 cm interval.”
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC6
RC2:
'Comment on essd-2025-418', Anonymous Referee #2, 14 Nov 2025

This manuscript focuses on regional digital soil mapping in Iraq, using 532 soil samples and 85 covariates to produce soil maps via machine learning. While the modeling approaches are generally appropriate, the work falls short in two critical aspects: (1) Limited Geographical Scope: The investigated region is quite small. Consequently, the resulting dataset has limited implications and applicability for the broader scientific community, despite its location is in Iraq. (2) Limited Novelty: The modeling framework adopted is standard practice in digital soil mapping and lacks significant methodological novelty. Given these limitations, specifically the dataset's limited scope and the conventional nature of the modeling, this work does not meet the high standards for originality and impact required for publication in Earth System Science Data.

Citation: https://doi.org/10.5194/essd-2025-418-RC2
- AC1:
  'Reply on RC2', Mathias Bellat, 21 Nov 2025
  We sincerely appreciate the time referee 2 took to read the preprint and highlight the adapted modelling approach in our paper. The reviewer identified two critical aspects of our preprint.
  1) Indeed, the studied area (2,280 km²) is “relatively small” compared with other datasets available in ESSD. However, in other cases, regional to local datasets are also available (e.g. Lorenz et al., 2021; Ardizzone et al., 2023; Błaszczyk et al. 2024). We do think that high-quality regional datasets are necessary to feed and improve other larger datasets. Furthermore, as referee 2 expressed, data on the Iraq region are critically lacking. No regional data set – from any kind of observations – is available on Iraq in the whole of ESSD (accessed on 14/11/2025). We do think that underrepresented regions of the globe need and deserve high-quality, standardised data, such as those proposed in this paper and, more generally, in ESSD. The qualitative data presented in the preprint (soil classes map) can also hardly be extended to a larger scale, as regional patterns cannot always be transposed. Finally, the comparison with the SoilGrids 2.0 product used in the study also highlights the poor quality of such global products when dealing with local problems. Hence, we do think a high-quality local dataset is needed and would also demonstrate the scientific interest of major journals, such as ESSD, in a scientifically under-studied country.
  2) When mentioning the lack of novelty in the approach, we do understand the criticisms of referee 2, as no “new” method is developed. However, we do think the novelty lies in the combination of known techniques and our unique pipeline/workflow. This study is fully reproducible from the sampling strategy to the final map produced. By combining the sampling strategy, campaign results, FTIR and laboratory measurements, FTIR model predictions, and DSM models, we propose a unique new approach inspired by Malone et al. (2022) but never applied in real conditions at a regional scale.
  We do hope that these answers will encourage referee 2 to reconsider the reasons for our submission to ESSD.
  
  References used:
  Ardizzone, F., Bucci, F., Cardinali, M., Fiorucci, F., Pisano, L., Santangelo, M., & Zumpano, V. (2023). Geomorphological landslide inventory map of the Daunia Apennines, southern Italy. Earth System Science Data,15(2), 753–767. https://doi.org/10.5194/essd-15-753-2023
  
  Błaszczyk, M., Luks, B., Pętlicki, M., Puczko, D., Ignatiuk, D., Laska, M., Jania, J., & Głowacki, P. (2024). High temporal resolution records of the velocity of Hansbreen, a tidewater glacier in Svalbard. Earth System Science Data, 16(4), 1847–1860. https://doi.org/10.5194/essd-16-1847-2024
  
  Lorenz, C., Portele, T. C., Laux, P., & Kunstmann, H. (2021). Bias-corrected and spatially disaggregated seasonal forecasts: A long-term reference forecast product for the water sector in semi-arid regions. Earth System Science Data, 13(6), 2701–2722. https://doi.org/10.5194/essd-13-2701-2021
  
  Malone, B., Stockmann, U., Glover, M., McLachlan, G., Engelhardt, S., & Tuomi, S. (2022). Digital soil survey and mapping underpinning inherent and dynamic soil attribute condition assessments. Soil Security, 6, 100048. https://doi.org/10.1016/j.soisec.2022.100048
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC1
RC3:
'Comment on essd-2025-418', Bas Kempen, 23 Jan 2026
REVIEW RESULTS
A comprehensive paper in a critically under represented geographical area when it comes to soil profiles/ digital soil mapping. Care has been taken with the landscape characterisations including tectonic development and parent material climate and vegetation and geomorphology and soils, as well as maps and photographs to allow the reader to really understand the study area. While the methods are not necessarily ‘new’ themselves, it is an important application of state of the art methods, the novel part of this study is the study area. The output maps are compared to SoilGrids, a global model, with inputs from WoSIS, the study mentions that WoSIS has low sampling density in this area, highlighting the need for such studies, in addition to the scarcity of other options in the area.
I agree with reviewer 1 that this is a well written and thorough study. This study is a good example of adhering to FAIR and open metadata standards, for not only the data but the methodology, and that is commendable. The methods are comprehensively described, and fit their purpose well. All methodological aspects including the code are not only made available following FAIR principles, but thoroughly documented and explained at https://mathias-bellat.github.io/DSM-Kurdistan/digital-soil-mapping.html#visualisation-and-comparison-with-soilgrid-product, creating a fine example of a fully reproducible study and the input and output data themselves are a much needed addition to more or less non-existent openly available soil data in the area. With this, this manuscript fits well within the scope of ESSD.
One of the reviewers mentioned the limited geographic scope of the paper. I do agree that this scope is limited but having a DSM study published for a region like Kurdistan (Iraq) is worthwhile and to me a welcome addition to the body of literature on this topic. Especially given that the authors make their results as well as data open from which other DSM efforts (e.g. SoilGrids) can profit. This is much appreciated.
Having said this, I do have a few comments and questions regarding the manuscript, particularly concerning the methodologies, which in my view require some further explanation and clarification. I encourage the authors to address these points, after which I would recommend publication of this article in ESSD.
SPECIFIC COMMENTS
Main comments
Lines 230-231: Could the authors explain the decision to model each depth layer separately instead of developing one model per property with using the depth as an explanatory variable? Modelling each depth separately is a valid approach, but I would like to understand the reason why the authors took this approach.
Lines 246-247: I do not understand the rationale for combining data splitting (80/20) with (repeated) cross validation. Cross-validation already produces an independent prediction for each data point, from which accuracy metrics can be calculated. What was the motivation for embedding CV within a data-splitting framework? And how does this then work? If CV was performed on the 80% training subset, this would give CV-predictions only for these points. I wonder how predictions for the 20% test set were obtained. Which trained model was used to predict at the points in the test set? In case of normal data splitting this would be the model trained on the training dataset. However, by running CV on the training dataset there is no single trained model but multiple (here 10) fold-specific models.
In addition, the manuscript states that CV was repeated three times? While repetition may improve robustness when data are limited or when using a small number of folds, with 10-fold CV I would expect only minimal variation between the repeats?
Overall, the validation approach seems a bit overcomplicated. The authors may well have had sound reasons for adopting this approach, but in that case the rationale and precise implementation need to be explained more clearly.
Lines 268-273: The description of the ensemble modelling approach is unclear to me and would benefit from additional detail. Specifically, it is not clear which ‘conditions’ (l. 269) are being referred to, what criteria were used to select ‘the best one’, and how the individual model predictions were combined in the ensemble? While relevant literature is cited, I believe that a few additional lines better outlining the implementation of the ensemble approach would improve clarity to the reader.
Other comments
Line 10: The summation signs should be removed I believe.
Line 10-11: Reference is made to ‘local models’ (compared to the regional models the authors developed and the SoilGrids global model). It is unclear to me where the ‘local models’ refer to and what the basis is for the claim of the authors.
Line 14: I believe the minus sign in the superscript should be removed, assuming the RMSE values are reported for the transformed depth data. After applying the square-root transformation, the depth unit becomes cm^0.5. The unit of the MSE metric would then be cm, and taking the square root to obtain the RMSE would again give value in cm^0.5. Table 5 also reports the RMSE unit in cm^0.5. I assume the unit of the MAE (Table 5, l. 337) is also cm^0.5? The ‘-0.5’ superscript should be removed I believe (same for line 337).
Line 148: Reference is made to WRB 2006. Can the authors confirm if this was also given that there are more recent versions of the WRB? The latest from 2022 I believe.
Line 178: “potential soil layer” - inconsistent naming – previous paragraph is “potential soil properties”
Line 185: What are the ‘layers’ here? Are these the soil horizons or are these the layers for which samples were collected (l. 186)? Please clarify.
Line 187: How is ‘topsoil’ defined here? Is this the 0-10 layer? Explain a bit more how the ring samples were taken. E.g. where was the ring sample was taken in the topsoil layer: from the top, in the middle of the layer …?
Lines 236-237: References seem to be incomplete. Only years are mentioned (2021,2022)
Line 243: Could expand on why RFE was not restrictive enough/ restrictive enough for what?
Line 372: Why were SoilGrids extremes removed?
Line 401: Limitations are addressed in the paper – I am not sure if the point density, even though notably higher than WoSIS, still warrants mapping at 25m, the reference to Hazelton and Murphy (referencing cartographic scales) seems a bit of a jump. Instead of referencing Hazelton and Murphy I would rather compare to other regional DSM studies.
The Conclusion section reads like an abstract. Reviewer 1 already commented on this and based on that, the authors revised the text and I believe with that revision this comment is addressed sufficiently.
Figure 2: What is a negative site? – soil samples
Technical corrections
Line 21: influence local ecosystems -> influence on local ecosystems

Line 28: includes -> include

Line 33: gives information on its ability to fit or not for agricultural purposes, but also to better understand -> gives information on its ability to fit agricultural purposes and helps to better understand

Line 35: Governate -> governate (not capitalised anywhere else in the paper)

Line 70: cluster -> conditional

Line 75 & 265: Hengl and Robert -> Hengl and MacMillan

Line 74: a raw -> one raw

Line 129: climax -> climate

Line 255: McBradney -> McBratney

Line 229: remove ‘part’

Line 234: a additive -> an additive

Line 247: state of the art the art -> state of the art models

Line 270: approached -> approach

Line 346: bakns -> banks

Line 388: remove ‘consistent’

Line 401: Hazelton and Murphy pg5 -> pg4

Line 410: LU/C -> land use/cover (it is not used previously)

Line 425: profiles depth measurement -> profile depth measurements

Line 441: world -> global

Line 443: shallower resolution -> higher resolution

Table 3: Modis brightness index in the wrong column

Appendix B: units column could be named differently. It contains a mixture of units, formulas, ranges.
Citation: https://doi.org/10.5194/essd-2025-418-RC3
- EC1: 'Note to Authors - Reply on RC3', Giulio G.R. Iovine, 26 Jan 2026
  
  Note to Authors
  According to Referee Bas Kempen, the following line on the RMSE metric unit should be disregarded in his comments:
  <I believe the superscript “-0.5” should be removed (the same applies to line 337).>
  
  Citation: https://doi.org/10.5194/essd-2025-418-EC1
- AC3: 'Reply on RC3', Mathias Bellat, 27 Jan 2026
  
  We would like to thank referee #3 for his comments, which took into account the previous modification. We are confident that these comments will enhance the overall quality of the manuscript.
  All the questions have been answered in detail in the attached file.
  
  Best regards,
  Mathias Bellat, on behalf of all the co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC3
- AC7: 'Reply on RC3 #2', Mathias Bellat, 18 Feb 2026
  
  Following referee #4's comments, major changes to the methodology required adaptations in the manuscript. You will find here some modifications regarding previous comments.
  Lines 246-247: I do not understand the rationale for combining data splitting (80/20) with (repeated) cross validation. Cross-validation already produces an independent prediction for each data point, from which accuracy metrics can be calculated. What was the motivation for embedding CV within a data-splitting framework? And how does this then work? If CV was performed on the 80% training subset, this would give CV-predictions only for these points. I wonder how predictions for the 20% test set were obtained. Which trained model was used to predict at the points in the test set? In case of normal data splitting this would be the model trained on the training dataset. However, by running CV on the training dataset there is no single trained model but multiple (here 10) fold-specific models.
  In addition, the manuscript states that CV was repeated three times? While repetition may improve robustness when data are limited or when using a small number of folds, with 10-fold CV I would expect only minimal variation between the repeats?
  Overall, the validation approach seems a bit overcomplicated. The authors may well have had sound reasons for adopting this approach, but in that case the rationale and precise implementation need to be explained more clearly.
  We amended our methodological approach for the DSM computation in line with the comments of referees #3 and #4: we removed the data split and now rely only on 10-fold CV (repeated 3 times).
  
  Lines 268-273: The description of the ensemble modelling approach is unclear to me and would benefit from additional detail. Specifically, it is not clear which ‘conditions’ (l. 269) are being referred to, what criteria were used to select ‘the best one’, and how the individual model predictions were combined in the ensemble? While relevant literature is cited, I believe that a few additional lines better outlining the implementation of the ensemble approach would improve clarity to the reader.
  Due to the change in our methodology this comment no longer applies.
  
  Line 243: Could expand on why RFE was not restrictive enough/ restrictive enough for what?
  We added a supplementary line on this topic.
  “We also performed a recursive feature elimination (RFE; Guyon et al. 2002) on the covariates with the caret package (Kuhn, 2019). The results were more conservative with the number of covariates selected (> 60 for each variables), longer in time computing capacities (800%), and provided lower accuracy scores compared to Boruta selection, for the tested 0-10 cm depth increment.”
  
  Line 372: Why were SoilGrids extremes removed?
  The SoilGrids extremes were not removed in the new version. Only the minimum values present “outliers”, but they do not influence the global results (Table 7, Appendix F).
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC7
RC4:
'Comment on essd-2025-418', Anonymous Referee #4, 27 Jan 2026
The overall study is interesting and represents a solid piece of work in this research area. I agree with other referees that, although the study is limited to a specific locality, it presents novel data from a region that is generally under-represented in the literature. In this sense, the manuscript has clear value as a data paper and, in my opinion, deserves publication.
I will not comment on the many minor issues already addressed by other referees. However, I do have two major comments that I believe should be carefully addressed prior to publication, as well as several moderate and minor comments that would help improve clarity and rigor.
Major comments
1. Use of multiple models (Section 2.4.3; L. 246–253)
The rationale for using such a large number of models is unclear. A modelling approach should be chosen based on the nature of the data and the study objectives. Here, several individual models are used, alongside an ensemble model, without a clear purpose or benefit being articulated.
In addition:
- Quantile Regression Forest (QRF) is not a fundamentally different model from Random Forest; it is simply RF.
- The models are not combined in a way that preserves the ability to produce coherent uncertainty maps.

As a result, the current strategy weakens, rather than strengthens, the uncertainty assessment.

I strongly recommend using a single modelling framework. Either:
- rely on QRF only, which naturally provides prediction intervals, or
- use an ensemble model only if it can return proper and interpretable prediction uncertainty (which is not evident here).

2. Validation metrics and uncertainty evaluation (Section 2.4.4 and Section 3.1)
Section 2.4.4 is currently confused and mixes metrics that evaluate prediction accuracy with those intended to assess prediction uncertainty. These two aspects should be clearly separated and discussed independently.
There is also no need to report such a large number of metrics. A limited set of interpretable and complementary statistics is preferable, for example:
- bias,
- an error metric (e.g. RMSE or MAE),
- variance explained and/or correlation.

Some specific concerns:
- Metrics such as RPIQ are not clearly explained. While they are common in spectroscopy, it is unclear whether they are scaled relative to the error or to the original data, which makes interpretation difficult in this context.
- The CCC combines correlation and bias, yet bias is also reported separately. It would be clearer to report correlation and bias independently rather than using CCC.
- The interpretation and limitations of PICP need to be discussed more carefully; the authors should consult recent work highlighting its shortcomings (e.g. https://www.sciencedirect.com/science/article/pii/S0016706123002628).

Because of these issues, Section 3.1 needs to be rewritten. Statements such as having a CCC of 0.3 or an RPIQ < 1 are difficult to interpret scientifically. The authors should focus on metrics that have a clear statistical meaning and explicitly explain what the reported values imply for model performance and uncertainty.
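
To make the metrics discussed in this comment concrete, the following minimal sketch computes bias, RMSE and Pearson correlation alongside Lin's CCC and RPIQ, using their commonly cited definitions (RPIQ taken as the interquartile range of the observations divided by RMSE). The function name, the dictionary keys and the toy values are illustrative assumptions; none of this is taken from the manuscript or its code release.

```python
import math
import statistics


def accuracy_metrics(observed, predicted):
    """Return bias, RMSE, Pearson r, Lin's CCC and RPIQ for paired values.

    Assumes both inputs are non-constant (otherwise r is undefined).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n  # mean error, predicted minus observed
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population (1/n) variances and covariance, as used in Lin's CCC.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n

    r = cov / math.sqrt(var_o * var_p)
    # CCC penalises r for differences in mean (bias) and in variance.
    ccc = 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

    # RPIQ: spread of the observed data (Q3 - Q1) relative to the error.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse if rmse > 0 else math.inf

    return {"bias": bias, "rmse": rmse, "r": r, "ccc": ccc, "rpiq": rpiq}


# Toy data (hypothetical): every prediction is exactly 2 units too high.
metrics = accuracy_metrics([10, 20, 30, 40, 50], [12, 22, 32, 42, 52])
print(metrics)
```

In this toy case the correlation is 1 while the CCC is about 0.99, because the constant offset of 2 units enters the CCC denominator. This is the overlap between CCC and bias that the comment points out.
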
Major comment 3: where are the maps of uncertainty (or I may have missed something?), authors should report prediction intervals.
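
As a hedged illustration of how reported prediction intervals can be checked against observations, the sketch below computes the prediction interval coverage probability (PICP), i.e. the share of observations that fall inside their interval. The function name, the inclusive bounds and the toy numbers are assumptions made for illustration only.

```python
def picp(observed, lower, upper):
    """Percentage of observations lying within [lower, upper], bounds inclusive."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(observed)


# Toy data (hypothetical): interval limits as they might come from two quantiles.
observed = [12.0, 25.0, 31.0, 44.0, 58.0]
lower = [10.0, 20.0, 32.0, 40.0, 50.0]
upper = [15.0, 30.0, 38.0, 45.0, 55.0]

coverage = picp(observed, lower, upper)
print(f"PICP = {coverage:.1f} %")
```

Here three of the five observations lie inside their interval, giving a PICP of 60 %. Note that PICP says nothing about interval width: an arbitrarily wide interval reaches 100 % coverage, so the value is only informative when read against the nominal level and the interval widths.
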
Moderate and minor comments
50 and following

The issue is not spatial resolution per se. Any model can theoretically predict at very fine resolution if computationally feasible. The real limitation is data availability in the region of interest or in regions with similar soil-forming factors. Resolution is largely a computational issue and should not be framed as an objective.

Section 2 description

The description of the sampling and laboratory analyses is difficult to follow. It is unclear which samples were analysed in the lab, which were subsampled, and which were analysed using spectroscopy. This section needs to be clarified.

71

The term “cluster Latin hypercube sampling” is confusing. Clustering and LHS are conceptually incompatible. Do the authors mean conditioned Latin hypercube sampling (cLHS)?

Figures placement

Why are all figures placed at the end of the manuscript? This is inconvenient for reviewers and increasingly uncommon, especially given that reviews are conducted digitally.

171

Some of the cited papers argue the opposite of what is stated, namely that cLHS is not an optimal sampling design for spatial mapping.

173

Intervals in cLHS are not equal; this should be corrected.

225

Spelling error in the cited author’s name.

237

Please clarify whether the data were standardized (zero mean, unit variance) or normalized (scaled between 0 and 1).
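
The two alternatives named in this comment differ as sketched below. This is a minimal illustration with hypothetical function names and toy values; it does not show which transformation the manuscript actually applied.

```python
import statistics


def standardize(values):
    """Z-score scaling: zero mean and unit variance (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the values are not all identical
    return [(v - mean) / sd for v in values]


def normalize(values):
    """Min-max scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


# Toy covariate values (hypothetical).
covariate = [2.0, 4.0, 6.0, 8.0]
z_scores = standardize(covariate)
unit_range = normalize(covariate)
print(z_scores)
print(unit_range)
```

Standardized values are centred on zero and unbounded, whereas normalized values are bounded by 0 and 1 and depend on the observed extremes.
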

246

Data splitting should be avoided when cross-validation is already used. Using CV alone would be preferable.
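
The point about relying on cross-validation alone can be sketched as follows: with repeated k-fold CV every sample receives an out-of-fold prediction in each repeat, so no separate hold-out set is needed to compute accuracy metrics. The sketch uses 10 folds and 3 repeats, the configuration mentioned elsewhere in this discussion. The helper names and the mean-only placeholder model are hypothetical and are not the authors' implementation.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=3, seed=0):
    """Yield (repeat, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, train_idx, test_idx


def cross_validated_predictions(x, y, fit, predict, n_folds=10, n_repeats=3, seed=0):
    """Return, for each repeat, one out-of-fold prediction per sample."""
    preds = [[None] * len(y) for _ in range(n_repeats)]
    splits = repeated_kfold_indices(len(y), n_folds, n_repeats, seed)
    for repeat, train_idx, test_idx in splits:
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[repeat][i] = predict(model, x[i])
    return preds


# Placeholder model (assumption): predicts the training mean.
# In practice this would be replaced by the actual learner.
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x_new):
    return model


x = list(range(30))
y = [2.0 * v + 1.0 for v in x]
oof = cross_validated_predictions(x, y, fit_mean, predict_mean)
print(len(oof), "repeats,", len(oof[0]), "out-of-fold predictions each")
```

Accuracy metrics can then be computed per repeat from the out-of-fold predictions. A common choice, assumed here rather than taken from the manuscript, is to refit one final model on all samples for producing the maps.
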
Citation: https://doi.org/10.5194/essd-2025-418-RC4
- AC5: 'Reply on RC4', Mathias Bellat, 13 Feb 2026
  
  We sincerely thank referee #4 for his comments. Some work was needed to meet his requirements and revise the methodology, but this has now been done.
  We adopted a standard QRF model for all soil properties and depth increments, and reduced the number of metrics to the ones suggested by referee #4.
  All detailed answers are provided in the attached PDF.
  M. Bellat and all the co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC5
RC5:
'Comment on essd-2025-418', Anonymous Referee #5, 02 Feb 2026

Soil characteristic data are crucial for both soil environment research and model parameter input, especially when large amounts of monitoring data are involved. The authors not only collected extensive samples but also conducted regional simulations, achieving a shift from point-based to area-based data collection. However, some statements in the manuscript do not align with the provided tables/figures, and careful revision by the authors is recommended.

The abbreviation for organic carbon is generally OC.
Are the two sides of the soil samples module in Figure 2 conducted simultaneously, yielding different output result maps respectively? For the rectangle at the bottom of the figure and the text below it, which part of the upper diagram do they correspond to? This is not clear.
Does Appendix 1 refer to Table A1 and Table E1 in Appendix B? The order of the appendices seems a bit confusing. Are the details of the method in the supplementary material? Which supplementary materials are you referring to? I can't seem to find the detailed discussion of the method.
2.1 Study area and Figure 3: It would be best to label abbreviations for the two parts on the Figure. Additionally, the blue dashed line representing the 2022 sampling area is just a line segment, not an enclosed region.
2.1.2 Climate and vegetation: Can you provide a distribution map of the vegetation types for the region?
I cannot find Appendix 2.
Line 175: I cannot find Annexe 1
2.3.2: Equation 1 should be provided at the end of this section, with an explanation of what each letter in the Equation represents.
Where is Appendix 3?
The meaning of each letter in Equations 2-9 needs to be provided in detail.
Where is Appendix 2? What are the 25 environmental factors included?
Table 4: What do Q1 and Q3 mean?
2.4 Models and pre-process: This section should be described in conjunction with Figure 2. Currently, the method seems somewhat disjointed between parts. Based on Figure 2, it would be better to provide an overall description of how each step unfolds and operates.
3 Result: In the results section, data analysis of the soil characteristics from the collected samples should also be presented. In many cases, actual measured values are more important. The data in Table 2 should be expressed in the form of mean ± SD, and a distribution figure of the measured values could be provided.

Citation: https://doi.org/10.5194/essd-2025-418-RC5
- AC4: 'Reply on RC5', Mathias Bellat, 12 Feb 2026
  
  We would once again like to thank the referee for his important and useful comments. They will undoubtedly improve the quality of the manuscript.
  Regarding the revised Fig. 3, you will find it attached at the end of the full response to the comments.
  M. Bellat and all co-authors.
  
  Citation: https://doi.org/10.5194/essd-2025-418-AC4

Supplement

https://doi.org/10.5194/essd-2025-418-supplement

Data sets

Digital soil mapping predicted on mid-infrared (MIR) spectroscopy measurements in North-Western Kurdistan region, Iraq (netCDF and GeoTIFF files) [dataset] Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973764

Soil bulk density and soil depth from on-site observations in the North-Western Kurdistan region, Iraq [dataset] Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973714

Soil properties in the North-Western Kurdistan region, Iraq, derived from laboratory measurements [dataset] Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973701

Soil properties predicted on mid-infrared (MIR) spectroscopy measurements in North-Western Kurdistan region, Iraq [dataset]. Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973700

Soil information in Kurdistan region, Dohuk governorate (Iraq) Mathias Bellat et al. https://doi.org/10.57754/FDAT.e2k10-sf012

Model code and software

DSM-Kurdistan code release 1.1.0 Mathias Bellat; Nafiseh Kakhani https://github.com/mathias-bellat/DSM-Kurdistan.git

Interactive computing environment

Soil information in Kurdistan region, Dohuk governorate (Iraq), supplementary material Mathias Bellat; Nafiseh Kakhani https://mathias-bellat.github.io/DSM-Kurdistan/

Digital soil maping Mathias Bellat https://mathias-bellat.shinyapps.io/Northern-Kurdistan-map/

Viewed

Total article views: 2,210 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,297	847	66	2,210	125	40	53


Views and downloads (calculated since 15 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	608	16	7	631
Oct 2025	117	97	8	222
Nov 2025	97	107	7	211
Dec 2025	87	90	9	186
Jan 2026	166	190	10	366
Feb 2026	141	139	17	297
Mar 2026	81	208	8	297


Viewed (geographical distribution)

Total article views: 2,180 (including HTML, PDF, and XML) Thereof 2,180 with geography defined and 0 with unknown origin.


Latest update: 21 Mar 2026


Short summary

This dataset presents the first soil maps for the region produced using digital mapping techniques. It includes predictions for ten major physical and chemical soil properties at various depths, plus a map of total soil depth. For each property, we selected the most accurate models and key environmental drivers. In Southwestern Asia and many arid or semi-arid regions, detailed soil data are often missing. This dataset fills that gap, supporting agriculture, research, planning, and local policy.

