Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability
- 1Departamento de Agronomía, Facultad de Ciencias Agrarias. Universidad Nacional de Colombia, Bogotá, Colombia
- 2Centro de Geociencias - Universidad Nacional Autónoma de México Campus Juriquilla, Querétaro, 76230, México
- 3University of California, Riverside, Department of Environmental Sciences, Riverside CA. 92507, USA
- 4United States Department of Agriculture, Soil Salinity National Laboratory, Riverside CA. 92507, USA
- 5FAO, Viale delle Terme di Caracalla, Rome, Italy
- 6Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. Tibaitatá, Bogotá, CO-0571, Colombia
- 7Facultad de Ciencias/ Universidad Nacional Autónoma de Honduras, Honduras
- 8Departamento de Agronomía, Edif. CITEIIB. Universidad de Almería. Almería, 04120, España
- 9Dirección General de Asuntos Ambientales Agrarios, Ministerio de Desarrollo Agrario y Riego, Perú
- 10Subdirección Agrología, Instituto Geográfico Agustín Codazzi, Bogotá, Colombia
- 11Servicio Agrícola y Ganadero, Santiago de Chile, Chile
- 12Embrapa Solos, Rio de Janeiro, 22460-000, Brasil
- 13Direccion General de Recursos Naturales, Ministerio de Ganadería, Agricultura y Pesca, Montevideo, Uruguay
- 14Facultad de Ciencias Agrarias de la Universidad Nacional de Asunción, Asunción, Paraguay
- 15Sociedad Boliviana de la Ciencia del Suelo, La Paz, Bolivia
- 16Department of Agroecology, Faculty of Science and Technology, Aarhus University, Tjele, DK-8830 Denmark
- 17Ministerio de Agricultura y Ganadería, Quito, 170516, Ecuador
- 18Facultad de Agronomía e INBA (CONICET/UBA), Universidad de Buenos Aires, Buenos Aires, 1417, Argentina
- 19Estación Experimental Agropecuaria Cerro Azul, Instituto Nacional de Tecnología Agropecuaria, Misiones, Argentina
- 20Subdirección de Geografía, Instituto Geográfico Agustín Codazzi - IGAC, Bogotá, 111321, Colombia
- 21Secretaría de Agricultura y Desarrollo Rural, México
- 22Ministerio de Agricultura, Ganadería y Pesca (MAGYP), Argentina
- 23Departamento de Ingeniería y Suelos, Facultad de Ciencias Agronómicas, Universidad de Chile, Santiago, Chile
- 24Instituto de Investigación Agropecuaria de Panamá, Ciudad de Panamá, Panamá
- 25Departamento de Ciencias del Suelo y Ordenamiento Territorial, Universidad Nacional de Asunción, Paraguay
- 26Ministerio de Medio Ambiente, Santo Domingo, República Dominicana
- 27Instituto de Suelos (CIRN), Instituto Nacional de Tecnología Agropecuaria, Hurlingham, Buenos Aires, B1686, Argentina
- 28Instituto de Innovación en Transferencia y Tecnología Agropecuaria, San José, Costa Rica
- 29Ministerio de Ambiente y Recursos Naturales, Guatemala
- 30Universidad Central de Venezuela, Maracay, Venezuela
Abstract. Spatial soil databases can help model complex phenomena in which soils are decisive, for example, evaluating agricultural potential or estimating carbon storage capacity. The Soil Information System for Latin America and the Caribbean, SISLAC, is a regional initiative promoted by the FAO's South American Soil Partnership to contribute to the sustainable management of soil. SISLAC includes data from 49,084 soil profiles distributed unevenly across the continent, making it the region's largest soil database. However, some problems hinder its use, such as the quality of the data and its high dimensionality. The objective of this research is twofold. First, to evaluate the quality of SISLAC and its data values and generate a new, improved version that meets the minimum quality requirements to be used for different interests or practical applications. Second, to demonstrate the potential of improved soil profile databases to generate more accurate information on soil properties, by conducting a case study to estimate the spatial variability of the percentage of soil organic carbon using 192 profiles in a 1473 km² region located in the department of Valle del Cauca, Colombia. The findings show that 15 percent of the existing soil profiles had an inaccurate description of the diagnostic horizons. Further correction of an additional 4.5 percent of existing inconsistencies improved overall data quality. The improved database consists of 41,691 profiles and is available for public use at https://doi.org/10.5281/zenodo.6540710 (Díaz-Guadarrama, S. & Guevara, M., 2022). The updated profiles were segmented using algorithms for quantitative pedology to estimate the spatial variability. We generated segments one centimeter thick along each soil profile, then adjusted the values of these segments using a spline-type function to enhance vertical continuity and reliability. Vertical variability was estimated to a depth of 150 cm, while ordinary kriging was used to predict horizontal variability at three depth intervals, 0 to 5, 5 to 15, and 15 to 30 cm, at 250 m spatial resolution, following the standards of the GlobalSoilMap project. Finally, leave-one-out cross-validation was used to evaluate the kriging model performance, yielding RMSE values between 1.77 % and 1.79 % and R² values greater than 0.5. The results show the usability of the SISLAC database to generate spatial information on soil properties and suggest further efforts to collect more data to guide sustainable soil management.
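To make the segmentation step in the abstract concrete, the following is a minimal sketch of expanding horizon data into 1-cm slices and averaging them over the three GlobalSoilMap depth intervals mentioned. It is an illustration only: the function names and the example profile are hypothetical, and it uses a simple step function per horizon instead of the spline-type adjustment the authors describe.

```python
def slice_profile(horizons, max_depth=150):
    """Expand (top_cm, bottom_cm, value) horizons into one value per 1-cm slice.

    Slices not covered by any horizon stay None (a gap in the profile).
    Assumption: each horizon value is constant over its depth range.
    """
    slices = [None] * max_depth
    for top, bottom, value in horizons:
        for depth in range(max(top, 0), min(bottom, max_depth)):
            slices[depth] = value
    return slices


def interval_mean(slices, top, bottom):
    """Mean of the non-missing 1-cm slices in [top, bottom); None if all are missing."""
    vals = [v for v in slices[top:bottom] if v is not None]
    return sum(vals) / len(vals) if vals else None


# Hypothetical profile: three horizons with SOC in percent.
profile = [(0, 12, 3.0), (12, 40, 1.5), (40, 90, 0.5)]
slices = slice_profile(profile)

# Depth intervals named in the abstract (cm).
for top, bottom in [(0, 5), (5, 15), (15, 30)]:
    print(f"{top}-{bottom} cm: {interval_mean(slices, top, bottom)}")
```

A spline fitted through the slice values would smooth the abrupt steps at horizon boundaries; the slicing and interval averaging shown here would stay the same.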
Sergio Díaz-Guadarrama et al.
Status: open (until 12 Feb 2023)
RC1: 'Comment on essd-2022-291', Jinshi Jian, 08 Oct 2022
The manuscript “Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability” submitted to ESSD describes a method to identify the main problems that occur systematically in SISLAC profiles across Latin American countries and provides a workflow to identify the errors in SISLAC. The authors carefully checked the errors in the SISLAC database and provided a quality-improved SISLAC. This work shows the potential of improved soil databases for generating spatial information such as SOC or any other property that has been surveyed in existing regional or national scale soil datasets, and it has the potential to improve global scale soil datasets. I have only a few minor suggestions for the authors to consider and correct. Other than that, I believe this work contributes to improving the quality of an existing soil dataset and that it is important to the data science community.
Some minor suggestions:
Line 160: what about sites that coincide with their respective country but may have other issues?
Line 162: Figure 3c is an example of inverted coordinates, so why was it marked as correct in the figure (marked as √)?
Line 174: can you explain when and why gaps exist?
Line 314: “This work is a effort” should be “This work is an effort”.
Line 314-324: this paragraph discusses how improving SISLAC contributes to better data in the region (national results such as Colombia, Ecuador, and Argentina). What about its contribution to global soil datasets? Is SISLAC included in global soil datasets such as SoilGrid, SoilGrid2, or HWSD? Can the approach used in this study be applied to improve global soil datasets, and if so, how?
Line 322: “Y. Zhang (2020)” should be “Zhang (2020)”, check this issue for the entire manuscript, please.
Discussion: I suggest that subtitles can be added to increase the readability of the discussion.
Captions of some tables and figures are too simple, and the necessary descriptions should be added to make the tables and figures self-explanatory.
Table 1: it has a period (.) at the end of the table caption, but Table 2 does not have one. The same issue applies to figures; please check all figure and table captions.
Table 2: PDDL, ODC-By, ODC-ODbL, CC-BY, CC-BY-NC, CC-BY-NC-ND; those are all acronyms, they should be explained.
Table 4: can you also give an example where gaps between layers exist?
Table 5: “Assign the value of the upper limit of the last layer plus 10”; please explain why “plus 10”.
Table 6: for the first case (Organic layer), I see no difference between “Inconsistency” and “Correction Guideline”. Should the top be “-5” in the correction guideline column? (i.e., organic layer should be -5 to 0).
Figure 3: panels a, b, and c are explained in the brackets, so why is there no description of panel d? Panel c is an example of inverted coordinates, so why is it labeled as √?
Figure 8: this figure does not look correct. Should the y axis be “Residual” rather than “Predicted values”? What are the dashed lines and solid lines? They should be explained in the figure caption. Why is the solid line necessary in this figure?
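Several of the comments above concern gaps between layers and missing lower limits. As an illustration of what such a check can look like, here is a minimal sketch that flags gaps, overlaps, and missing bottoms in a list of layers. The function name, the issue labels, and the example depths are hypothetical and are not taken from the manuscript's workflow.

```python
def check_layers(layers):
    """Flag depth inconsistencies in a profile.

    layers: list of (top_cm, bottom_cm) tuples sorted by top depth;
    bottom_cm may be None when the lower limit was not recorded.
    Returns a list of (layer_index, issue) tuples.
    """
    issues = []
    for i, (top, bottom) in enumerate(layers):
        if i > 0:
            prev_bottom = layers[i - 1][1]
            if prev_bottom is not None:
                if top > prev_bottom:
                    issues.append((i, "gap"))       # unsampled depth between layers
                elif top < prev_bottom:
                    issues.append((i, "overlap"))   # layers share a depth range
        if bottom is None:
            issues.append((i, "missing bottom"))
        elif bottom <= top:
            issues.append((i, "bottom not below top"))
    return issues


# Hypothetical profile: a 5 cm gap, an overlap, and a last layer with no bottom.
example = [(0, 20), (25, 40), (35, 60), (60, None)]
print(check_layers(example))
```

A report of this kind separates profiles that can be repaired by a rule from those that need manual inspection.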
RC2: 'Comment on essd-2022-291', José Lucas Safanelli, 12 Oct 2022
General comments:
The paper “Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability” describes the effort of gathering and harmonizing Latin American soil data from historical surveys, which was promoted by FAO's South American Soil Partnership and involved several collaborators across the region. The authors presented a quality assessment analysis, described a new improved version of the dataset, and demonstrated the potential of SISLAC for generating new soil information through digital soil mapping. This type of work is important in order to document soil data integration efforts and the best practices for harmonizing heterogeneous soil datasets. In addition, it makes clear that adjusting data that can be simply corrected, rather than removing it, has an enormous impact on the final number of samples and potentially the spatial representation across a region. Overall, the authors did a great job in describing their quality analysis, but I was not convinced by the results from digital soil mapping. I think the authors could rather explore the dataset with a denser descriptive analysis, avoiding a predictive approach (which was very simple and suboptimal). I don’t have any major objection to its publication. However, I think that a moderate revision of the second goal is required before reaching a final decision. Finally, I congratulate the authors for making the improved SISLAC dataset available on a public persistent repository (Zenodo) with an open-access license.
Specific comments:
Although the first introduction paragraphs describe what soil is and how they form, the current structure seems a bit overloaded to me. For example, the first three sentences have a lot of information that is hard to grasp at first moment. I would suggest starting from line 72 and relocating those first sentences after explaining the soil importance, bringing the definitions after a gentler introduction.
The data are well described. I was able to access their online website (http://54.229.242.119/sislac/es) and check some soil profiles. However, I had some issues with signing up to the portal (could not confirm my email address to log in). The public access does not have any download button, but it seems the user can copy and paste single profile tabular data. They do not mention any application programming interface (API) in this data section, which is a characteristic of modern web 2.0 platforms (https://en.wikipedia.org/wiki/Web_2.0). I would suggest at least discussing data distribution through APIs and explaining in the manuscript if this feature is planned as a potential improvement of future SISLAC versions.
It is not clear in the manuscript if the SISLAC from their website is the older or the improved version.
When navigating their website, I found that many samples come from the WoSIS snapshot of 2016. There are other datasets, such as the SISINTA. I just wonder if the authors could provide an overview of the original sources (WoSIS, SISINTA, etc.) similarly to what they did with country numbers. This new table could be placed as supplementary material to help readers quickly evaluate the difference between SISLAC and other available public datasets, such as WoSIS.
How do the authors expect to update SISLAC when newer versions of the original sources are released? Have they automated the quality analysis keeping in mind new updates or has this current work involved a workforce for manual inspection?
Why did the authors define 150 cm as the bottom limit instead of 200 cm? 200 cm is an arbitrary convention from pedology but at least is the standard limit of GlobalSoilMap. A simple justification would be enough in my view, as reprocessing the data would be very expensive.
Both goodness-of-fit equations have minor mistakes, although the result will not be affected because the differences between observed and predicted values are squared. However, the sum of squared residuals should use observed minus predicted in both the RMSE and the R2 numerator.
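For reference, the two metrics can be sketched with residuals written as observed minus predicted, as the comment requests. This is a minimal illustration, assuming R2 is defined as one minus the ratio of the residual sum of squares to the total sum of squares; the manuscript's exact definition is not reproduced here, and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error with residuals defined as observed - predicted."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def r2(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted SOC values (percent).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 3.0, 3.5]
print(rmse(obs, pred), r2(obs, pred))
```

Because each residual is squared, swapping the order of subtraction leaves the RMSE unchanged, which is why the reviewer notes the reported results are not affected. Under this definition, R2 is not symmetric in its two arguments, since the total sum of squares is computed from the observed values only.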
The authors did a good job of describing and reporting their quality assessment analysis. I wonder if they used some published guidelines or proposed those based on the issues they faced in the project development. I think this data description paper and methods can help many other efforts for soil data integration and harmonization.
I only have serious concerns about the results from the data usability section. The authors provided reasonable summary statistics and visualizations. However, the cross-validation statistics are very intriguing, at least from the current scatterplot visualization. In my view, it is impossible to get moderate to good R2 from the scatter distribution they plotted, especially for the third panel where they reached an R2 of 0.83. All the fitted lines are almost flat, with a narrower predicted variance compared to the original values. In addition, when many data points overlap, it is common to present a scatterplot with point density, making it possible to evaluate the linear trend around the fitted line. The bias of these models is really high, so other performance metrics like Lin’s concordance correlation coefficient (CCC) would indicate a potentially unsatisfactory performance. Therefore, I’m not convinced by the results from this data usability section and even ask the authors whether they are willing to keep these results in their manuscript. Instead of presenting these questionable results from digital soil mapping or another predictive approach, I think the authors could rather crunch the dataset with a denser exploratory data analysis with summary statistics, multivariate data analysis using PCA in combination with grouping factors (coloring by color, biome, or any other physical information), some spatial statistics (like Moran's index, or even screening variograms for the whole region), etc. In my opinion, those results would be a better fit for the manuscript type, which is a data description paper. If they follow this suggestion, I think they should adjust the paper title.
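The point about Lin's concordance correlation coefficient can be illustrated with a short sketch. It uses the standard definition, CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), with population (divide by n) variances. The data are hypothetical and chosen so that predictions are perfectly correlated with observations but have a much narrower spread, the pattern the reviewer describes.

```python
def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical case: predictions follow the observations linearly
# (Pearson r = 1) but are compressed toward the mean.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [2.6, 2.8, 3.0, 3.2, 3.4]
print(lin_ccc(obs, pred))  # well below 1 despite the perfect correlation
```

Unlike a correlation coefficient, CCC penalizes both a shift in the mean and a mismatch in variance, so a nearly flat fitted line lowers it even when points fall on a straight line.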
The discussion is well developed; however, I would only suggest adjusting it if the digital soil mapping results are revised.
Technical corrections:
Overall, the paper is clear and well-structured. I’m not a native English speaker, but I think the readers would benefit from a proofread version of the paper.
In line 214, I think the authors should define ordinary kriging as an interpolation method rather than a method to estimate SOC, e.g.: “On the other hand, ordinary kriging (OK) was used for horizontal variability assessment, a method frequently used to spatially predict SOC …”
Data sets
Revised database of the Soil Information System of Latin America and the Caribbean, SISLAC. Sergio Díaz-Guadarrama and Mario Guevara. https://doi.org/10.5281/zenodo.6540710
Viewed
- HTML: 444
- PDF: 116
- XML: 14
- Total: 574
- BibTeX: 8
- EndNote: 9