Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability
Abstract. Spatial soil databases can help model complex phenomena in which soils are decisive, for example, evaluating agricultural potential or estimating carbon storage capacity. The Soil Information System for Latin America and the Caribbean, SISLAC, is a regional initiative promoted by the FAO's South American Soil Partnership to contribute to the sustainable management of soil. SISLAC includes data from 49,084 soil profiles distributed unevenly across the continent, making it the region's largest soil database. However, some problems hinder its use, such as the quality of the data and its high dimensionality. The objective of this research is twofold. First, to evaluate the quality of SISLAC and its data values and generate a new, improved version that meets the minimum quality requirements for use in different practical applications. Second, to demonstrate the potential of improved soil profile databases to generate more accurate information on soil properties, by conducting a case study to estimate the spatial variability of the percentage of soil organic carbon using 192 profiles in a 1473 km² region located in the department of Valle del Cauca, Colombia. The findings show that 15 percent of the existing soil profiles had an inaccurate description of the diagnostic horizons. Further correction of an additional 4.5 percent of existing inconsistencies improved overall data quality. The improved database consists of 41,691 profiles and is available for public use at https://doi.org/10.5281/zenodo.6540710 (Díaz-Guadarrama, S. & Guevara, M., 2022). The updated profiles were segmented using algorithms for quantitative pedology to estimate the spatial variability. We generated one-centimeter-thick segments along each soil profile, then adjusted the values of these segments using a spline-type function to enhance vertical continuity and reliability.
Vertical variability was estimated to a depth of 150 cm, while ordinary kriging was used to predict horizontal variability at three depth intervals, 0 to 5, 5 to 15, and 15 to 30 cm, at 250 m spatial resolution, following the standards of the GlobalSoilMap project. Finally, leave-one-out cross-validation was used to evaluate the kriging model performance, yielding RMSE values between 1.77 % and 1.79 % and R² values greater than 0.5. The results show the usability of the SISLAC database to generate spatial information on soil properties and suggest further efforts to collect a larger amount of data to guide sustainable soil management.
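The segmentation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assigns each one-centimeter slice the value of the horizon that covers it (a step function, with no spline smoothing) and then averages the slices over the three GlobalSoilMap depth intervals named above. The function names `slice_profile` and `interval_mean` and the example horizon values are hypothetical.

```python
from typing import List, Optional, Tuple

# A horizon is (top_cm, bottom_cm, value); the values below are made up.
Horizon = Tuple[int, int, float]


def slice_profile(horizons: List[Horizon], max_depth: int = 150) -> List[Optional[float]]:
    """Return one value per 1-cm slice from 0 to max_depth.

    Slices not covered by any horizon stay None. This is a step function,
    not the spline-type adjustment mentioned in the abstract.
    """
    slices: List[Optional[float]] = [None] * max_depth
    for top, bottom, value in horizons:
        for depth in range(max(top, 0), min(bottom, max_depth)):
            slices[depth] = value
    return slices


def interval_mean(slices: List[Optional[float]], top: int, bottom: int) -> Optional[float]:
    """Average the covered slices between top and bottom (cm)."""
    values = [v for v in slices[top:bottom] if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical profile with three horizons (soil organic carbon, %).
profile = [(0, 12, 3.0), (12, 40, 1.5), (40, 90, 0.5)]
slices = slice_profile(profile)

# Depth intervals named in the abstract: 0-5, 5-15, and 15-30 cm.
gsm_means = {
    f"{top}-{bottom}": interval_mean(slices, top, bottom)
    for top, bottom in [(0, 5), (5, 15), (15, 30)]
}
print(gsm_means)
```

In this sketch the 5 to 15 cm interval straddles the first two horizons, so its mean is a thickness-weighted blend of both values; a spline-based adjustment would replace the abrupt step at 12 cm with a smooth transition.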
Sergio Díaz-Guadarrama et al.
Status: final response (author comments only)
RC1: 'Comment on essd-2022-291', Jinshi Jian, 08 Oct 2022
- AC1: 'Reply on RC1', Sergio Diaz, 28 May 2023
RC2: 'Comment on essd-2022-291', José Lucas Safanelli, 12 Oct 2022
- AC2: 'Reply on RC2', Sergio Diaz, 28 May 2023
Revised database of the Soil Information System of Latin America and the Caribbean, SISLAC https://doi.org/10.5281/zenodo.6540710
The manuscript “Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability” submitted to ESSD describes a method to identify the main problems that occur systematically in the SISLAC profiles across Latin American countries and provides a workflow to identify errors in SISLAC. The authors carefully checked the errors in the SISLAC database and provide a quality-improved SISLAC. This work shows the potential of improved soil databases for generating spatial information such as SOC or any other property that has been surveyed in existing regional or national scale soil datasets, and it has the potential to improve global scale soil datasets. I have only a few minor suggestions for the authors to consider and correct. Other than that, I believe this work contributes to improving the quality of an existing soil dataset and is important to the data science community.
Some minor suggestions:
Line 160: what about sites that coincide with their respective country but may have other issues?
Line 162: Figure 3c is an example of inverted coordinates, so why is it marked as correct (√) in the figure?
Line 174: can you explain when and why gaps exist?
Line 314: “This work is a effort” should be “This work is an effort”.
Lines 314-324: this paragraph discusses how improving SISLAC contributes to better data in the region (national results such as Colombia, Ecuador, and Argentina). What about its contribution to global soil datasets? Is SISLAC included in global soil datasets such as SoilGrid, SoilGrid2, HWSD? Can the approach used in this study be applied to improve global soil datasets, and if so, how?
Line 322: “Y. Zhang (2020)” should be “Zhang (2020)”, check this issue for the entire manuscript, please.
Discussion: I suggest adding subheadings to improve the readability of the discussion.
Captions of some tables and figures are too brief; the necessary descriptions should be added to make the tables and figures self-explanatory.
Table 1: the caption ends with a period (.), but the caption of Table 2 does not; the same issue occurs in the figures. Please check all figure and table captions.
Table 2: PDDL, ODC-By, ODC-ODbL, CC-BY, CC-BY-NC, CC-BY-NC-ND; these are all acronyms and should be explained.
Table 4: can you also give an example in which gaps between layers exist?
Table 5: “Assign the value of the upper limit of the last layer plus 10”: please explain why “plus 10”.
Table 6: for the first case (Organic layer), I see no difference between “Inconsistency” and “Correction Guideline”. Should the top be “-5” in the correction guideline column (i.e., the organic layer should be -5 to 0)?
Figure 3: panels a, b, and c are explained in the brackets, but why is there no description of panel d? Panel c is an example of inverted coordinates, so why is it labeled √?
Figure 8: this figure does not look correct. Should the y axis be “Residual” rather than “Predicted values”? What do the dashed and solid lines represent? They should be explained in the figure caption. Why is the solid line necessary in this figure?
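For readers following the comments on Tables 4 and 5 and Figure 3, the kinds of checks in question can be illustrated with a minimal Python sketch. It is not the manuscript's workflow. Only the “upper limit of the last layer plus 10” rule is taken from the Table 5 text quoted above; the function names, the example layers, and the bounding box `EXAMPLE_BOX` (rough, illustrative limits for Colombia) are assumptions.

```python
from typing import List, Optional, Tuple

# A layer is (top_cm, bottom_cm); the bottom of the last layer may be missing.
Layer = Tuple[int, Optional[int]]
# A bounding box is (lat_min, lat_max, lon_min, lon_max) in decimal degrees.
Box = Tuple[float, float, float, float]


def fill_missing_bottom(layers: List[Layer]) -> List[Tuple[int, int]]:
    """Give a last layer without a lower limit the value 'upper limit plus 10'."""
    fixed = []
    for i, (top, bottom) in enumerate(layers):
        if bottom is None:
            if i != len(layers) - 1:
                raise ValueError("only the last layer may lack a lower limit")
            bottom = top + 10
        fixed.append((top, bottom))
    return fixed


def find_gaps(layers: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (bottom of upper layer, top of lower layer) wherever layers do not touch."""
    return [
        (upper[1], lower[0])
        for upper, lower in zip(layers, layers[1:])
        if lower[0] > upper[1]
    ]


def in_box(lat: float, lon: float, box: Box) -> bool:
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def looks_inverted(lat: float, lon: float, box: Box) -> bool:
    """True when the point is outside the box but inside it once lat and lon are swapped."""
    return not in_box(lat, lon, box) and in_box(lon, lat, box)


# Hypothetical profile: a 5-cm gap between 20 and 25 cm, last bottom missing.
layers = [(0, 20), (25, 60), (60, None)]
fixed = fill_missing_bottom(layers)
gaps = find_gaps(fixed)

# Rough, illustrative bounding box for Colombia (assumption, not from the manuscript).
EXAMPLE_BOX = (-4.3, 13.5, -79.1, -66.8)

print(fixed)  # last layer now ends 10 cm below its top
print(gaps)   # the single gap between the first two layers
print(looks_inverted(-76.5, 3.4, EXAMPLE_BOX))
```

A bounding-box test of this kind only flags candidates for review: a point can fall inside the box while still lying outside the country's actual border, which relates to the question raised about Line 160.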