the Creative Commons Attribution 4.0 License.
Soil information and soil property maps for the Kurdistan region, Dohuk governorate (Iraq)
Abstract. We present the first detailed soil property maps at multiple depths for the northwestern autonomous Kurdistan region of Iraq (Dohuk). A total of 532 soil samples from 122 sites were collected at five depth increments (0–10, 10–30, 30–50, 50–70, and 70–100 cm), and their mid-infrared (MIR) spectra were measured. A subset of 108 samples, selected via Kennard–Stone sampling, was analysed in the laboratory for ten soil properties. A Cubist model was trained on these measured values and used to predict the soil properties of all samples from their MIR spectra. Digital soil mapping was conducted using various machine learning regression techniques (ensemble learning, linear classifier, nearest neighbour classifier, decision trees), trained on the predicted soil properties and using a total of 85 covariates at 25 m pixel resolution, resulting in 50 prediction maps in total. Results were compared with the SoilGrids 2.0 product and a regional texture model. Soil depth was also mapped using a quantile random forest with 26 covariates. Our regional model outperformed global SoilGrids 2.0 predictions in resolution and accuracy, with texture RMSEs (sand: ∑RMSE = 9.35; silt: ∑RMSE = 6.8; clay: ∑RMSE = 10.28) comparable to local models. Quantile random forest achieved the best performance in 51 % of the models, and key predictors included Sentinel-2 SWIR, EVI, NDVI, and SAVI. Spatial patterns reflected the contrast between the flat areas of the Simele and Zakho plains and the shallower and steeper Little Khabur Valley and anticline formations. Furthermore, the soil depth prediction model (R² = 0.57; RMSE = 2.59 cm-0.5) showed a strong correlation with slope and a similar spatial pattern, with deeper soils in the flat areas of the Simele and Zakho plains and shallow soils in the anticline and strongly erodible areas.
Our comprehensive dataset (Bellat et al., 2024a, b, c, d, 2025) offers substantial insights for soil knowledge in the region, as well as for aridic and semi-aridic areas.
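The Kennard–Stone selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `kennard_stone`, the use of Euclidean distance on the raw feature matrix, and the random toy data are assumptions made for illustration only, since the abstract does not say in which space the distances were computed.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return row indices of X chosen by the Kennard-Stone maximin rule.

    Hypothetical helper for illustration; distances are Euclidean on X as given.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of rows")
    # Squared pairwise distances via the Gram matrix (keeps memory at n x n).
    sq = (X ** 2).sum(axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    # Step 1: start with the two most distant samples.
    i, j = np.unravel_index(np.argmax(d2), d2.shape)
    selected = [int(i), int(j)]
    # Distance from each sample to its nearest already-selected sample.
    min_d2 = np.minimum(d2[i], d2[j])
    # Step 2: repeatedly add the sample farthest from the selected set.
    while len(selected) < n_select:
        min_d2[selected] = -1.0  # selected samples can never be picked again
        nxt = int(np.argmax(min_d2))
        selected.append(nxt)
        min_d2 = np.minimum(min_d2, d2[nxt])
    return selected

# Toy stand-in for spectra: 532 samples, 10 synthetic features (not real data).
rng = np.random.default_rng(0)
toy_spectra = rng.normal(size=(532, 10))
subset = kennard_stone(toy_spectra, 108)
```

In practice such a selection is often run on a reduced representation of the spectra (for example principal component scores) rather than on raw absorbances; whether that was done here is not stated in the abstract.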
Status: open (until 25 Nov 2025)
- RC1: 'Comment on essd-2025-418', David G. Rossiter, 29 Oct 2025
- RC2: 'Comment on essd-2025-418', Anonymous Referee #2, 14 Nov 2025
This manuscript focuses on regional digital soil mapping in Iraq, using 532 soil samples and 85 covariates to produce soil maps via machine learning. While the modeling approaches are generally appropriate, the work falls short in two critical aspects: (1) Limited Geographical Scope: The investigated region is quite small. Consequently, the resulting dataset has limited implications and applicability for the broader scientific community, despite its location in Iraq. (2) Limited Novelty: The modeling framework adopted is standard practice in digital soil mapping and lacks significant methodological novelty. Given these limitations, specifically the dataset's limited scope and the conventional nature of the modeling, this work does not meet the high standards for originality and impact required for publication in Earth System Science Data.
Citation: https://doi.org/10.5194/essd-2025-418-RC2
- AC1: 'Reply on RC2', Mathias Bellat, 21 Nov 2025
We sincerely appreciate the time referee 2 took to read the preprint and to acknowledge that the modelling approach in our paper is appropriate. The reviewer identified two critical aspects of our preprint.
1) Indeed, the study area (2,280 km²) is “relatively small” compared with other datasets available in ESSD. However, regional to local datasets have also been published there (e.g. Lorenz et al., 2021; Ardizzone et al., 2023; Błaszczyk et al., 2024). We think that high-quality regional datasets are necessary to feed and improve larger datasets. Furthermore, as referee 2 noted, data on Iraq are critically lacking: no regional dataset, from any kind of observation, is available for Iraq in the whole of ESSD (accessed on 14/11/2025). We think that underrepresented regions of the globe need and deserve high-quality, standardised data, such as those proposed in this paper and, more generally, in ESSD. The qualitative data presented in the preprint (soil class map) can also hardly be extended to a larger scale, as regional patterns cannot always be transposed. Finally, the comparison with the SoilGrids 2.0 product used in the study also highlights the poor quality of such global products when dealing with local problems. Hence, we think a high-quality local dataset is needed and would also demonstrate the interest of major journals, such as ESSD, in a scientifically under-studied country.
2) Regarding the lack of novelty in the approach, we understand the criticism of referee 2, as no “new” method is developed. However, we think the novelty lies in the combination of known techniques and in our pipeline/workflow. This study is fully reproducible from the sampling strategy to the final maps produced. By combining the sampling strategy, campaign results, FTIR and laboratory measurements, FTIR model predictions, and DSM models, we propose a new approach inspired by Malone et al. (2022) but never before applied in real conditions at a regional scale.
We hope that these answers will encourage referee 2 to reconsider our submission to ESSD.
References used:
- Ardizzone, F., Bucci, F., Cardinali, M., Fiorucci, F., Pisano, L., Santangelo, M., & Zumpano, V. (2023). Geomorphological landslide inventory map of the Daunia Apennines, southern Italy. Earth System Science Data, 15(2), 753–767. https://doi.org/10.5194/essd-15-753-2023
- Błaszczyk, M., Luks, B., Pętlicki, M., Puczko, D., Ignatiuk, D., Laska, M., Jania, J., & Głowacki, P. (2024). High temporal resolution records of the velocity of Hansbreen, a tidewater glacier in Svalbard. Earth System Science Data, 16(4), 1847–1860. https://doi.org/10.5194/essd-16-1847-2024
- Lorenz, C., Portele, T. C., Laux, P., & Kunstmann, H. (2021). Bias-corrected and spatially disaggregated seasonal forecasts: A long-term reference forecast product for the water sector in semi-arid regions. Earth System Science Data, 13(6), 2701–2722. https://doi.org/10.5194/essd-13-2701-2021
- Malone, B., Stockmann, U., Glover, M., McLachlan, G., Engelhardt, S., & Tuomi, S. (2022). Digital soil survey and mapping underpinning inherent and dynamic soil attribute condition assessments. Soil Security, 6, 100048. https://doi.org/10.1016/j.soisec.2022.100048
Data sets
Digital soil mapping predicted on mid-infrared (MIR) spectroscopy measurements in North-Western Kurdistan region, Iraq (netCDF and GeoTIFF files) [dataset] Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973764
Soil bulk density and soil depth from on-site observations in the North-Western Kurdistan region, Iraq [dataset] Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973714
Soil properties in the North-Western Kurdistan region, Iraq, derived from laboratory measurements [dataset] Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973701
Soil properties predicted on mid-infrared (MIR) spectroscopy measurements in North-Western Kurdistan region, Iraq [dataset] Mathias Bellat et al. https://doi.org/10.1594/PANGAEA.973700
Soil information in Kurdistan region, Dohuk governorate (Iraq) Mathias Bellat et al. https://doi.org/10.57754/FDAT.e2k10-sf012
Model code and software
DSM-Kurdistan code release 1.1.0 Mathias Bellat; Nafiseh Kakhani https://github.com/mathias-bellat/DSM-Kurdistan.git
Interactive computing environment
Soil information in Kurdistan region, Dohuk governorate (Iraq), supplementary material Mathias Bellat; Nafiseh Kakhani https://mathias-bellat.github.io/DSM-Kurdistan/
Digital soil mapping Mathias Bellat https://mathias-bellat.shinyapps.io/Northern-Kurdistan-map/
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 783 | 192 | 18 | 993 | 24 | 22 | 26 |
essd-2025-418 "Soil information and soil property maps for the Kurdistan region, Dohuk governorate (Iraq)"
Bellat et al.
Review by D G Rossiter 29-Oct-2025
Summary: This exceptionally-thorough and well-explained data paper presents details of the soils in the named region based on survey and models. It used modern methods (inference from MIR spectroscopy) as part of the soil properties determination. From this dataset a standard modern digital soil mapping (DSM) exercise was carried out to produce property maps over the study area. The maps were compared to the global SoilGrids v2.0 maps and, not at all surprisingly, had significantly better point evaluation metrics. All results and workflows are available under the FAIR concept. This paper can be a reference for how such a study can be carried out.
Major Comments:
1. I appreciate the thorough review of previous mapping efforts in the region; it is good to have these listed for reference. The brief review of major pedogenetic processes is also appreciated. Similarly, the account of the tectonic development places the study within context. The study's motivation is clear. Adherence to FAIR standards is appreciated. The entire workflow, all sources and products, are available, with DOI, and explained.
2. The Conclusions mainly repeat the Abstract and sections of the Discussion. I would appreciate a broader conclusion about the success of this study, the applicability of this kind of study to similar regions, the issues of global vs. local models, the main limitations to this kind of study, the importance of reproducibility and FAIR, etc. That is, after doing all this work, what do you conclude about the project?
3. Did you consider DSM for soil classes? Perhaps using a DSMART-like approach with your additional observations? This could be compared with Fig. 6. Obviously this need not be done in the paper, but was it considered and, if so, why was it not attempted? Related to this, it is not clear how the soil class (not classification) map (Figure 6) was created. It's implied that this was expert judgement supplemented by observations, but it's not explicit. Also see comments below re: L443.
4. Can you comment on the realism of patterns as seen in Figures 9–14? We have the point evaluation statistics, but the map shows a landscape. Do the elements we see there correspond to reality, of course by expert judgement? Are the fine details revealed by the 25 m resolution realistic or artefacts?
Detailed Comments:
The WRB 2006 has been replaced by WRB 2022: IUSS Working Group WRB: World Reference Base for Soil Resources. International soil classification system for naming soils and creating legends for soil maps, 4th ed., IUSS, Vienna, Austria, 234 pp., 2022. However I think the definitions used in this paper have not changed.
L179-80 RUSLE: how were the parameters calibrated? Were they from one of the earlier (cited) studies? Especially the K value.
L227 "the different index" -> "the different indices"
L233 "We performed a standardisation of the predicted values of the texture on 100 % with TT.normalise.sum function (Moeys et al., 2024) and a additive-log ratio transformation (Aitchison, 1986) with the alr function (Tsagris et al., 2025)." This is not clear. Was the normalization following the MIR inference/wet lab measurements? And then were the alr variables used in the mapping, followed by back-transformation (as is done in SoilGrids v2.0)?
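The two steps questioned here (closing the texture fractions to 100 % and applying an additive log-ratio transform, with back-transformation after mapping) can be sketched as follows. This is a minimal illustration, not the manuscript's R code: the function names, the choice of the last column as the alr denominator, and the sample values are assumptions, and the cited R functions may use different defaults.

```python
import numpy as np

def normalise_to_100(texture):
    """Rescale each row (sand, silt, clay) so that it sums to 100 %."""
    texture = np.asarray(texture, dtype=float)
    return 100.0 * texture / texture.sum(axis=1, keepdims=True)

def alr(composition):
    """Additive log-ratio: log of each part over the last part (assumed denominator)."""
    composition = np.asarray(composition, dtype=float)
    return np.log(composition[:, :-1] / composition[:, [-1]])

def alr_inverse(z, total=100.0):
    """Back-transform alr coordinates to a composition summing to `total`."""
    z = np.asarray(z, dtype=float)
    # The denominator part has log-ratio 0 by construction.
    expz = np.exp(np.hstack([z, np.zeros((z.shape[0], 1))]))
    return total * expz / expz.sum(axis=1, keepdims=True)

# Illustrative predicted fractions that do not sum to 100 (not real data).
predicted = np.array([[30.0, 45.0, 20.0],
                      [12.0, 50.0, 41.0]])
closed = normalise_to_100(predicted)   # step 1: closure to 100 %
coords = alr(closed)                   # step 2: two unconstrained variables to model
recovered = alr_inverse(coords)        # step 3: back to sand/silt/clay after mapping
```

Modelling the alr coordinates and back-transforming guarantees that the mapped sand, silt and clay fractions are positive and sum to 100 %, which is the property the reviewer's question about the order of operations turns on.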
L235 "close to a normal distribution Liu et al. (2022)" -> "close to a normal distribution (Liu et al., 2022)"
L236 "2021" refers to what?
L258 "relative "simple"" -> "relatively simple"
§2.4.3 and throughout the paper: what is meant by "soil depth"? Is it the depth of the solum (zone of pedogenesis) or the depth to bedrock/completely unweathered parent material? This might be better termed "thickness" but "depth" is indeed commonly used. L365 "shallow and deep profiles" implies only the solum, is this correct?
L295 the correct reference for PICP is Eq. (2) of Malone (2011) not 2017. The formula is not found in Malone 2017. Malone, B. P., McBratney, A. B., and Minasny, B.: Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes, Geoderma, 160, 614–626, https://doi.org/10.1016/j.geoderma.2010.11.013, 2011. This equation and the others need definitions of the symbols, although some are standard. For PICP what is "v"? I learn from Malone 2011 it is "the number of observations in the validation [better, evaluation] dataset". What is "PL"? L, U as lower, upper limits can be inferred. Finally, the description "we used the prediction interval coverage probability to evaluate the corresponding prediction within an interval" is not clear. The Malone 2011 description is, to me, clearer: "the PICP is the probability that all observed values fit within their prediction limits".
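As a reading aid for the symbols discussed in this comment, the PICP can be sketched as the share of evaluation observations that fall inside their prediction limits. The sketch assumes that "v" is the number of evaluation observations and that "PL" with superscripts L and U denotes the lower and upper prediction limits, as the comment infers; the function name, the percentage scaling, and the numbers are illustrative assumptions.

```python
import numpy as np

def picp(observed, lower, upper):
    """Percentage of observations lying within [lower, upper] prediction limits."""
    observed = np.asarray(observed, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (observed >= lower) & (observed <= upper)
    v = observed.size  # number of observations in the evaluation dataset
    return 100.0 * inside.sum() / v

# Illustrative values only: three of the four observations fall inside their limits.
coverage = picp(observed=[10, 20, 30, 40],
                lower=[5, 22, 25, 35],
                upper=[15, 30, 35, 45])
```

For well-calibrated 90 % prediction intervals, the value returned on a large evaluation set would be expected to lie near 90.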
L309 It's interesting that silt is so poorly predicted, yet most of these soils are on the silty edge of the texture triangle. And, clay and sand are in Category A and B. Can you explain why the poor result for silt, even though there is a lot of it and with a good range in these samples? This is mentioned on L387.
L346 "river bakns" -> "river banks". Spell-check.
§4.3 Another interesting comparison with SG2 would be the prediction ranges. SG2 likely smooths more than this study, see Table 7 where the Q1-Q3 range is always much narrower. This can be brought out in the text -- the interesting discussion is about global vs. local models. The SG2 maps are much more uniform than the maps from this study.
L387 "should be interpreted with caution—consistent"... with what?
L392 "Abdulrahman et al. 2020 " -> "Abdulrahman et al. (2020)". L390 maybe make it explicit here that this is not a DSM product, rather an expert updating from field work and manual interpretation of remote sensing products (correct?).
L401 the Hazelton & Murphy guidelines are for conventional mapping, not DSM. They are expressed in terms of map scale and cm^2 of printed map. Here the product is digital at 25 m resolution. How is the density here converted to match these guidelines? The argument about cLHS is much more relevant for DSM using machine learning from covariates.
L434 formatting problem with the URL https://mathias-bellat.shinyapps.io/Northern-Kurdistan-map, which goes over a line break so gives a 404 error if not manually adjusted
L443 "shallower resolution " -> "higher resolution"? And what is that resolution? It's nowhere stated. L435 says 1:200 000 scale, which implies polygons with a minimum legible delineation (MLD) of 160 ha (0.4 cm^2 on the map). But L390 "The updated soil classification map (Figure 6) must be interpreted with care, specially at micro-scale (<1:50,000)..." implying a smaller MLD. Figure 6 suggests that this is a polygon map.
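The MLD arithmetic in this comment can be checked with a short sketch. It assumes the 0.4 cm^2 minimum map area quoted above; the function name is hypothetical.

```python
def minimum_legible_delineation_ha(scale_denominator, map_area_cm2=0.4):
    """Ground area (ha) covered by the smallest legible map polygon.

    scale_denominator: e.g. 200_000 for a 1:200 000 map.
    map_area_cm2: smallest legible polygon on paper (0.4 cm^2 assumed, as quoted).
    """
    # Areas scale with the square of the linear map scale.
    ground_area_cm2 = map_area_cm2 * scale_denominator ** 2
    # 1 ha = 10^4 m^2 = 10^8 cm^2
    return ground_area_cm2 / 1e8

mld_200k = minimum_legible_delineation_ha(200_000)  # about 160 ha
mld_50k = minimum_legible_delineation_ha(50_000)    # about 10 ha
```

Under the same assumption, a 1:50,000 map gives an MLD of about 10 ha, which makes the gap between the two scales mentioned in the manuscript concrete.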
Figure 3 the inset showing the region is not needed, that has already been shown in Figure 1 and can be found from the coordinates on the main map.
Figure 8 bicolor key y-axis partially obscured