The authors have answered most of the questions raised about the first version of the manuscript. They have improved the manuscript based on the suggestions, but its clarity and the information on uncertainty need further work. It is understandable that quantifying the uncertainty of the variables is quite challenging for the authors. It would take a significant amount of time, especially to find the correct method for each mapped variable or soil property, and it might make an interesting analysis for a separate paper. The presentation of the present state of the dataset is already valuable. The main strength of this manuscript is its description of how the enormous and very valuable soil data collected under the Soviet system could be translated into a dataset that can be used for quantitative analyses. The following should be further described and highlighted for all mapped variables under discussion:
- how the variables were derived, e.g. based on expert rules, by assigning characteristic mean values, or computed with PTFs from the characteristic particle size distribution;
- what the limitations of their use are.
The terminology used in the international literature should still be adopted in the manuscript. This is needed not only for texture, but for all soil science expressions throughout the text and tables; e.g. WRB does not use the term “soil type” but refers to reference soil groups (http://www.fao.org/3/i3794en/I3794en.pdf). Please explain in the text how the WRB reference soil groups and qualifiers were derived, and mention the possible limitations of providing this information.
Perhaps the derived particle size distribution (SOL_SAND, SOL_SILT, SOL_CLAY) could be referred to in the manuscript as the clay, silt and sand content characteristic of the texture class of the unit, and a reference for the conversion method could be added.
The grammar and terminology need further checking, e.g. at P15 L13, P17 L9, L12-13 and P18 L15, among others. Some suggestions are provided under the specific comments.
The map (EstSoil-EH_v1.0.shp) available from https://zenodo.org/record/3473290#.XmdtHkq6qUk contains no information on “wrb_main”, “EST_CRS1-4”, “Huumus”, “ao_hor_thick”, “ao_hor_type” and “geometry”, although these are mentioned in Table 6. Conversely, the map includes “Boniteet” and “Varv”, which are not included in Table 6. You mention that you might not include information on soil fertility. Please harmonize Table 6 and the attributes of the map.
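A mismatch of this kind between Table 6 and the attribute table could be checked systematically with a simple set comparison. The sketch below is illustrative only: the two field lists are short excerpts typed in by hand from the names quoted in this comment, `nlayers` is assumed to appear in both, and `compare_fields` is a helper name introduced here, not part of the dataset or its tooling.

```python
def compare_fields(documented, actual):
    """Return (fields documented but absent from the file,
    fields present in the file but undocumented), both sorted."""
    documented, actual = set(documented), set(actual)
    return sorted(documented - actual), sorted(actual - documented)


# Hand-typed excerpts for illustration; not the complete attribute lists.
table6_fields = ["wrb_main", "Huumus", "ao_hor_thick", "ao_hor_type",
                 "geometry", "nlayers"]
shp_fields = ["Boniteet", "Varv", "nlayers"]

missing_in_map, missing_in_table = compare_fields(table6_fields, shp_fields)
print("In Table 6 but not in the map:", missing_in_map)
print("In the map but not in Table 6:", missing_in_table)
```

In practice the second list would be read from the shapefile itself rather than typed in, so that the comparison can be repeated for every release of the dataset.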
A random check of the database revealed some illogical cases: nlayers=1 while clay, silt, sand and rock content are also given for layer 2 (e.g. orig_fig=194041, 191116), or nlayers=2 while clay, silt, sand and rock content are given for layers 1 and 3 (e.g. orig_fig=372656). It might also be useful not to use zero (0) in rows where there is no data. Please check and correct these.
Please clarify the following in the text; these points were also raised in the first round of the review:
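A consistency check of this kind could be automated along the following lines. This is a minimal sketch under stated assumptions: the column prefixes (`SOL_CLAY`, `SOL_SILT`, `SOL_SAND`, `SOL_ROCK` followed by the layer number), the maximum of four layers, and the treatment of 0 as "no data" are guesses about the attribute table, and the three records are synthetic placeholders rather than rows from the dataset.

```python
PROPS = ("SOL_CLAY", "SOL_SILT", "SOL_SAND", "SOL_ROCK")  # assumed column prefixes
MAX_LAYERS = 4  # assumed maximum number of layers per mapped unit


def layer_inconsistencies(row):
    """List the layers whose content contradicts the record's nlayers value."""
    problems = []
    n = row["nlayers"]
    for layer in range(1, MAX_LAYERS + 1):
        values = [row.get(f"{p}{layer}") for p in PROPS]
        # Assumption: both None and 0 mean "no data" for that property.
        has_data = any(v not in (None, 0) for v in values)
        if layer > n and has_data:
            problems.append(f"layer {layer} has values but nlayers={n}")
        if layer <= n and not has_data:
            problems.append(f"layer {layer} is empty but nlayers={n}")
    return problems


def make_row(rid, nlayers, filled_layers):
    """Build a synthetic record; 25 is an arbitrary placeholder value."""
    row = {"id": rid, "nlayers": nlayers}
    for layer in range(1, MAX_LAYERS + 1):
        for p in PROPS:
            row[f"{p}{layer}"] = 25 if layer in filled_layers else 0
    return row


rows = [
    make_row("a", 1, {1, 2}),  # one layer declared, two filled
    make_row("b", 2, {1, 3}),  # two layers declared, layers 1 and 3 filled
    make_row("c", 2, {1, 2}),  # consistent record
]

flagged = {}
for row in rows:
    problems = layer_inconsistencies(row)
    if problems:
        flagged[row["id"]] = problems

for rid, problems in flagged.items():
    print(rid, problems)
```

Storing a dedicated no-data value instead of 0 would make the `has_data` test unambiguous, since a property can legitimately be zero.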
- How were Estonian soil types translated into WRB reference soil groups and qualifiers? Is there a reference document for it?
- The predictors used in the random forest method could be listed in the materials and methods section.
P2 L9-12: please finish the sentence
P2 L14: please reduce repetition of soil and terrain
P2 L15: please add the meaning of ESD.
P2 L20-21: … soil information derived with machine learning methods …
P2 L23-24: … sand, silt and clay content, amount of coarse fragments, organic carbon content and carbon stocks at seven soil depths … please use the word “content” throughout the text for the above-mentioned soil properties.
P2 L25-26: please consider that SoilGrids also provides a harmonized soil database for Europe, and rephrase the sentence. The following can be deleted, as it is self-evident: “, and also covers Estonia”.
P2 L30-31: it is not clear why the HYPRES dataset is mentioned. In this case the LUCAS or EU-HYDI datasets could be mentioned as well. Please consider the message of the text and revise the sentence accordingly.
P3 L23: … related to water and carbon cycle … is it correct? AWC is missing from the listing.
P3 L27: … There is no countrywide spatial dataset of soil organic carbon content and bulk density for Estonia ... is it correct this way? Please complete this line of thought, e.g. by stating that predictions had to be derived for both soil properties in order to map them.
P3 L27-30: could be moved after L18.
P4 L18-19: instead of soil profiles would it be appropriate to write soil layering?
P4 L20: … related to water and carbon cycle… is it correct?
P4 L21: … from the historical soil maps of Estonia – surveyed between 1949 and 1991 – to support modelling … is it correct?
P5 L1: … based on organoleptic field judgement (feel methods) and …
P5 L6: … combinations have been described considering the texture of soil layers … is it correct?
P6 L2: the following can be deleted: “instead of 9, 21 (9+12) or 108 (9x12)”. It is not clear why 87240 unique values are recorded in the previous dataset. Does it come from the combination of soil type + texture + layering + level of erosion + slope position – similarly to the explanation you provided under previous answers?
P6 L6: … to derive …
P6 L19: please provide reference literature for translating Estonian soil types to WRB reference soil groups and qualifiers.
P7 L6-7: … can also read the analysed depth from the top and bottom depth of the layer and defined them as SOL_Z# …
P7 L8: … the number of layers described in the profile …
P8 L27: … to USDA texture classes … is it correct?
P9 L13: How did you calculate the accuracy of the organoleptic determination of clay content? Did the organoleptic determination not yield the soil texture class? Or did you determine the clay content directly? Please clarify.
P10 L12: … We compared the derived sand, silt and clay content values with two different datasets. … is it correct?
P10 L22, L26: on P13 L20 you mention that Ksat was calculated with Rosetta PTFs, thus it was not derived from EU-SoilHydroGrids. Please revise it.
P10 L27-29: please describe the differences between EstSoil-EH and SoilGrids in more detail.
P12 L15: … as predictor variables for the calculation of SOC and BD …
P12 L19: … in Estonia, random forest (RF) method was …
P12 L21: please note that the number of randomly selected variables at each split and the number of trees in the forest are usually optimized.
P12 L30: the following can be deleted: “for machine learning”.
P13 L3: please list predictor variables.
P13 L5: based on your answer and Equation 4, texture was not used to calculate BD; please delete the following: “texture values and”.
P13 L6: please briefly describe why you chose that PTF to compute BD rather than another one, e.g. because of its applicability, or because the training set used to derive the PTF was similar.
P13 L9: there is no information in the brackets, please check it.
P13 L17: … We included two variables …
P13 L23: … Rosetta3 …
P14 L10: please check the reference, humic or peaty topsoils do not have blocky, platy or massive structure.
P14 L11-12: please provide more information, with a reference, on how the soil structural class was derived. Structure cannot be determined solely from texture and the amount of coarse material.
P14 L1-27: based on the present information, deriving the structural class is a weak point. Therefore, I would suggest not including the USLE K erodibility factor in the EstSoil-EH dataset or in the manuscript.
P16 L10-11: this repeats information given in materials and methods and could therefore be deleted.
P16 L15-24: please put this information in a table.
P16 L16: please state whether this is R2 or something else.
P16 L25: Is it correct that accuracy of BD could not be analysed because there is no Estonian dataset with measured values? If that is true, please mention it. If there is a dataset where accuracy can be calculated, please perform the analysis.
P16 L27: please describe the map of BD as well.
P16 L29-30: this is already mentioned in materials and methods; please delete the sentence.
P17 L1-2: if the structure class cannot be derived with a more robust method, I would suggest deleting the information on USLE K from the entire manuscript and the database.
P18 L14-15: perhaps the following could be considered: … with a reproducible workflow, which is unique in the case of Estonian soil datasets …
P17 L24-25: the phrase “are informed to some extent by previous reports” is not clear; please rephrase it.
P17 L30-31: the sentence starting with “A direct” is not clear; please rephrase it. It would be better to move the sentence starting with “From the point” to the results section.
P17 L31-32: … based on the layering of the original texture code per mapped soil units…
P18 L8-12: It is a very good idea to use an additional class for peat, but the following sentence might be confusing and should therefore be revised or deleted: “From that perspective peat soil units are currently modelled with assumptions to have a similar behaviour to clay hydrologically.” Several studies have shown that the shape of the shrinkage characteristics of peat soils differs significantly from that of clay soils (Van den Akker and Hendriks, 1997; Oleszczuk et al., 2003; Hendriks, 2004).
Figure 1: it could be indicated that parts A, B and C are also directly included in the EstSoil_EH v1.0 dataset, not only indirectly. Please indicate this with further arrows.
Figure 2: please do not use abbreviations in the figure caption, and state what the training sample was used for.
Figure 3: please format the labels of the x and y axes and add units.
Figure 4: please note that there is no heavy clay class in the USDA soil texture terminology (Soil Survey Staff, 1975); otherwise, please add the reference for the USDA texture classes to the text.
Figure 4, 5: please correct topsoil in the legend’s title.
Figure 5: … weight % …, … volume % … in the legends. … National map of a) sand, b) silt and c) clay content, and d) amount of coarse fragments characteristic for the soil texture class of the mapping unit. … or something similar in the caption.
Figure 6: … National map of a) soil organic carbon content and b) bulk density of the first soil layer, derived with pedotransfer functions. … in caption. … weight % … in the legend. Please check whether you have any polygons with BD lower than 0.2 g/cm3. Blue polygons are not visible on the figure, but this might be because only a few small polygons have BD lower than 0.2 g/cm3.
Figure 7: … Map of soil hydraulic parameters a) saturated hydraulic conductivity (Ksat) and b) available water capacity (AWC) in the first soil layer. Ksat was derived with pedotransfer function, AWC was retrieved from the EU-SoilHydroGrids.
Figure 8: please consider above comments on USLE K.
Figure 9: please check that “SoilGrids” is spelled correctly throughout the text and correct it where needed. Please add the meaning of the abbreviations to the caption. What does SGR_K_SAT1 mean? Ksat is not included in SoilGrids. The numbers on the histograms are not visible; please reformat them.
Table 2: please write out the full names of the USDA texture classes. The following can be deleted: “These rules were selected by the authors.”
Table 3: the following can be deleted: “as the singular value required by the SWAT model”.
Table 4: please consider above comments on USLE K.
Table 5: please recheck the descriptive statistics of SOL_BD1; the mean value should not be 0 g/cm3. Please add the units of the variables. Ksat is not included in SoilGrids, so please revise SGR_K_SAT1. If it was derived from EU-SoilHydroGrids, please note that the value available from the download site is saturated hydraulic conductivity (KS) [cm day−1] × 100.
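To illustrate the scaling mentioned above, a stored value would need to be divided by 100 to obtain KS in cm day−1. The sketch below shows only that arithmetic; the function names are introduced here for illustration, the sample value is arbitrary, and the further conversion to mm h−1 is included only on the assumption that an hourly unit is wanted for the model input.

```python
def ks_stored_to_cm_per_day(stored_value):
    """Undo the x100 scaling of the downloaded KS value (result in cm/day)."""
    return stored_value / 100.0


def cm_per_day_to_mm_per_hour(ks_cm_day):
    """Convert cm/day to mm/h: 1 cm = 10 mm, 1 day = 24 h."""
    return ks_cm_day * 10.0 / 24.0


stored = 2400  # arbitrary example value, not taken from the dataset
ks_cm_day = ks_stored_to_cm_per_day(stored)
ks_mm_h = cm_per_day_to_mm_per_hour(ks_cm_day)
print(ks_cm_day, "cm/day =", ks_mm_h, "mm/h")
```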
Table 6: it has to be harmonized with the attribute table of the shp file available from the Zenodo repository. Further notes:
- the table needs to be more coherent, e.g. add both the “description” and the “unit” of each mapped variable; in the present version you provide either the description or the unit of the variable,
- what do you mean by “object” under “data type”?
- please be precise in providing the information, e.g. for some variables you only state that it is a standard deviation; please add of which variable,
- the description of WRB_code is not clear, please clarify it,
- wrb_main might be WRB Reference Soil Group, please revise it,
- USLE_K: please consider above comments on USLE K.
It would be helpful for users if the above information were available as an .xls file from the Zenodo repository of the EstSoil-EH v1.0 dataset.