the Creative Commons Attribution 4.0 License.
Improving the Latin America and Caribbean Soil Information System (SISLAC) database enhances its usability and scalability
Sergio Díaz-Guadarrama
Viviana M. Varón-Ramírez
Iván Lizarazo
Mario Guevara
Marcos Angelini
Gustavo A. Araujo-Carrillo
Jainer Argeñal
Daphne Armas
Rafael A. Balta
Adriana Bolivar
Nelson Bustamante
Ricardo O. Dart
Martin Dell Acqua
Arnulfo Encina
Hernán Figueredo
Fernando Fontes
Joan S. Gutiérrez-Díaz
Wilmer Jiménez
Raúl S. Lavado
Jesús F. Mansilla-Baca
Maria de Lourdes Mendonça-Santos
Lucas M. Moretti
Iván D. Muñoz
Carolina Olivera
Guillermo Olmedo
Christian Omuto
Sol Ortiz
Carla Pascale
Marco Pfeiffer
Iván A. Ramos
Danny Ríos
Rafael Rivera
Lady M. Rodriguez
Darío M. Rodríguez
Albán Rosales
Kenset Rosales
Guillermo Schulz
Víctor Sevilla
Leonardo M. Tenti
Ronald Vargas
Gustavo M. Vasques
Yusuf Yigini
Yolanda Rubiano
Download
- Final revised paper (published on 11 Mar 2024)
- Preprint (discussion started on 14 Sep 2022)
Interactive discussion
Status: closed
-
RC1: 'Comment on essd-2022-291', Jinshi Jian, 08 Oct 2022
The manuscript “Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability”, submitted to ESSD, describes a method to identify the main problems that occur systematically in SISLAC profiles across Latin American countries and provides a workflow to identify the errors in SISLAC. Finally, the authors carefully checked the errors in the SISLAC database and provided a quality-improved SISLAC. This work shows the potential of improved soil databases for generating spatial information such as SOC or any other property that has been surveyed in existing regional or national-scale soil datasets, and it has the potential to improve global-scale soil datasets. I have only a few minor suggestions for the authors to consider and correct. Other than that, I believe this work contributes to improving the quality of an existing soil dataset and that it is important to the data science community.
Some minor suggestions:
Line 160: what about sites whose coordinates coincide with their respective country but that may have other issues?
Line 162: Figure 3c is an example of inverted coordinates, so why is it marked as correct in the figure (marked as √)?
Line 174: can you explain when and why gaps exist?
Line 314: “This work is a effort” should be “This work is an effort”.
Lines 314-324: this paragraph discusses how improving SISLAC contributes to better data in the region (national results such as Colombia, Ecuador, and Argentina). What about its contribution to global soil datasets? Is SISLAC included in global soil datasets such as SoilGrids, SoilGrids 2.0, or HWSD? Can the approach used in this study be applied to improve global soil datasets, and if so, how?
Line 322: “Y. Zhang (2020)” should be “Zhang (2020)”, check this issue for the entire manuscript, please.
Discussion: I suggest that subtitles can be added to increase the readability of the discussion.
Captions of some tables and figures are too simple, and the necessary descriptions should be added to make the tables and figures self-explanatory.
Table 1: the caption ends with a period (.), but the caption of Table 2 does not; the same issue occurs in the figures. Please check all figure and table captions.
Table 2: PDDL, ODC-By, ODC-ODbL, CC-BY, CC-BY-NC, CC-BY-NC-ND; those are all acronyms, they should be explained.
Table 4: can you also give an example in which gaps between layers exist?
Table 5: “Assign the value of the upper limit of the last layer plus 10”, need to explain why “plus 10”.
Table 6: for the first case (Organic layer), I see no difference between “Inconsistency” and “Correction Guideline”. Should the top be “-5” in the correction guideline column? (i.e., organic layer should be -5 to 0).
Figure 3: panels a, b, and c are explained in the brackets, but why is there no description of panel d? Panel c is an example of inverted coordinates, so why is it labeled as √?
Figure 8: this figure does not look correct. Should the y axis be “Residual” rather than “Predicted values”? What are the dashed and solid lines? They should be explained in the figure caption. Why is the solid line necessary in this figure?
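Two of the checks raised above (gaps between layers, and inverted coordinates) can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function names, the `(top_cm, bottom_cm)` layer representation, and the rough bounding box are all assumptions made for illustration only.

```python
def find_layer_issues(layers):
    """Report gaps and overlaps between consecutive layers of one profile.

    `layers` is a list of (top_cm, bottom_cm) tuples (hypothetical format).
    Returns a list of (kind, from_cm, to_cm) tuples.
    """
    ordered = sorted(layers)
    issues = []
    for (_, prev_bottom), (next_top, _) in zip(ordered, ordered[1:]):
        if next_top > prev_bottom:
            # Depth interval with no recorded layer.
            issues.append(("gap", prev_bottom, next_top))
        elif next_top < prev_bottom:
            # Two layers claim the same depth interval.
            issues.append(("overlap", next_top, prev_bottom))
    return issues


def coordinates_look_swapped(lat, lon, bbox):
    """Return True if (lat, lon) falls outside the bounding box but the
    swapped pair (lon, lat) falls inside it.

    `bbox` is (min_lat, max_lat, min_lon, max_lon); a bounding box is only
    a crude stand-in for a real country boundary.
    """
    def inside(a, b):
        return bbox[0] <= a <= bbox[1] and bbox[2] <= b <= bbox[3]

    return not inside(lat, lon) and inside(lon, lat)


# Illustrative values only: a rough box around a hypothetical country.
EXAMPLE_BBOX = (-5.0, 14.0, -82.0, -66.0)

profile = [(0, 20), (20, 45), (60, 90)]   # nothing recorded between 45 and 60 cm
layer_issues = find_layer_issues(profile)
swapped = coordinates_look_swapped(-74.1, 4.6, EXAMPLE_BBOX)
```

A point that passes the country test can still carry other problems (for example, a wrong but in-country location), which is why a check like this only flags candidates for manual review.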
Citation: https://doi.org/10.5194/essd-2022-291-RC1 -
AC1: 'Reply on RC1', Sergio Diaz, 28 May 2023
We thank you very much for the time you have spent on our manuscript. In the attached document we respond to your suggestions, which served to improve our manuscript. There are two important changes in the document. The first is that, in a joint effort with FAO's Latin America and the Caribbean Soil Partnership, the databases available in the region were reviewed and we were able to consolidate a larger database, which has grown from 41,000 records to almost 67,000. This is reflected in the new manuscript along with the new DOI of the dataset, which is made available to the soil science community under the FAIR (Findable, Accessible, Interoperable and Reusable) principles. The second is that, at the suggestion of another reviewer, the digital soil mapping part has been excluded in order to focus on the description of the SISLAC database and the methodology used for its analysis. Once again, we thank you for your time and comments, which have enriched this work.
Kind regards,
SISLAC Team
-
AC3: 'Reply on RC1', Sergio Diaz, 09 Jan 2024
Dear Reviewer,
Following some adjustments, we have enhanced the consistency of the database and conducted a principal component analysis (PCA). Consequently, we have redirected the focus of the article towards data cleaning and description. We believe that these modifications will strengthen the overall quality of our submission. We sincerely apologize for any inconvenience caused by the extended development time.
Best regards,
SISLAC Team
-
RC2: 'Comment on essd-2022-291', José Lucas Safanelli, 12 Oct 2022
General comments:
The paper “Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability” describes the effort of gathering and harmonizing Latin American soil data from historical surveys, which was promoted by FAO's South American Soil Partnership and involved several collaborators across the region. The authors present a quality assessment analysis, describe a new, improved version of the dataset, and demonstrate the potential of SISLAC for generating new soil information through digital soil mapping. This type of work is important for documenting soil data integration efforts and best practices for harmonizing heterogeneous soil datasets. In addition, it makes clear that adjusting data that can simply be corrected, instead of removing them, has an enormous impact on the final number of samples and potentially on the spatial representation across a region. Overall, the authors did a great job in describing their quality analysis, but I was not convinced by the results from digital soil mapping. I think the authors could instead explore the dataset with a denser descriptive analysis, avoiding a predictive approach (which was very simple and suboptimal). I don’t have any major objection to publication. However, I think that a moderate revision of the second goal is required before reaching a final decision. Finally, I congratulate the authors for making the improved SISLAC dataset available in a public persistent repository (Zenodo) with an open-access license.
Specific comments:
Although the first paragraphs of the introduction describe what soils are and how they form, the current structure seems a bit overloaded to me. For example, the first three sentences contain a lot of information that is hard to grasp at first. I would suggest starting from line 72 and relocating those first sentences after explaining the importance of soil, bringing the definitions in after a gentler introduction.
The data are well described. I was able to access their online website (http://54.229.242.119/sislac/es) and check some soil profiles. However, I had some issues with signing up to the portal (could not confirm my email address to log in). The public access does not have any download button, but it seems the user can copy and paste single profile tabular data. They do not mention any application programming interface (API) in this data section, which is a characteristic of modern web 2.0 platforms (https://en.wikipedia.org/wiki/Web_2.0). I would suggest at least discussing data distribution through APIs and explaining in the manuscript if this feature is planned as a potential improvement of future SISLAC versions.
It is not clear in the manuscript if the SISLAC from their website is the older or the improved version.
When navigating their website, I found that many samples come from the WoSIS snapshot of 2016. There are other datasets, such as the SISINTA. I just wonder if the authors could provide an overview of the original sources (WoSIS, SISINTA, etc.) similarly to what they did with country numbers. This new table could be placed as supplementary material to help readers quickly evaluate the difference between SISLAC and other available public datasets, such as WoSIS.
How do the authors expect to update SISLAC when newer versions of the original sources are released? Have they automated the quality analysis keeping in mind new updates or has this current work involved a workforce for manual inspection?
Why did the authors define 150 cm as the bottom limit instead of 200 cm? 200 cm is an arbitrary convention from pedology, but at least it is the standard limit of GlobalSoilMap. A simple justification would be enough in my view, as reprocessing the data would be very expensive.
Both goodness-of-fit equations have minor mistakes, although the result will not be affected because the difference between observed and predicted values is squared. However, the residual in the sum of squares should be observed minus predicted in both the RMSE and the R2 numerator.
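For reference, the conventional forms with the residual written as observed minus predicted are given below. The manuscript's own equations are not reproduced in this comment, so the symbols are assumptions: $y_i$ is an observed value, $\hat{y}_i$ the corresponding prediction, $\bar{y}$ the mean of the observations, and $n$ the number of observations.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Because $(y_i - \hat{y}_i)^2 = (\hat{y}_i - y_i)^2$, swapping the order changes neither statistic, which is why the numerical results are unaffected.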
The authors did a good job of describing and reporting their quality assessment analysis. I wonder if they used some published guidelines or proposed those based on the issues they faced in the project development. I think this data description paper and methods can help many other efforts for soil data integration and harmonization.
I only have serious concerns about the results from the data usability section. The authors provided reasonable summary statistics and visualizations. However, the cross-validation statistics are very intriguing, at least judging from the current scatterplot visualization. In my view, it is impossible to get a moderate to good R2 from the scatter distribution they plotted, especially for the third panel, where they reached an R2 of 0.83. All the fitted lines are almost flat, with a narrower predicted variance compared to the original values. In addition, when many data points overlap, it is common to present a scatterplot with point density, making it possible to evaluate the linear trend around the fitted line. The bias of these models is really high, so other performance metrics such as Lin’s concordance correlation coefficient (CCC) would likely indicate unsatisfactory performance. Therefore, I’m not convinced by the results from this data usability section and even ask the authors whether they are willing to keep these results in their manuscript. Instead of presenting these questionable results from digital soil mapping or another predictive approach, I think the authors could rather crunch the dataset with a denser exploratory data analysis with summary statistics, multivariate data analysis using PCA in combination with grouping factors (coloring by color, biome, or any other physical information), some spatial statistics (like Moran's index, or even screening variograms for the whole region), etc. In my opinion, those results would be a better fit for the manuscript type, which is a data description paper. If they follow this suggestion, I think they should adjust the paper title.
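To make the point about flat fitted lines concrete, here is a minimal Python sketch of Lin's concordance correlation coefficient. The function name and the toy numbers are illustrative assumptions; they are not taken from the manuscript or its data. The population (divide by n) form of the variances and covariance is used.

```python
def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient:
    2 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    denominator = var_o + var_p + (mean_o - mean_p) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov / denominator


# Toy example: predictions follow the observations perfectly in rank and
# linear trend, but are squeezed into a narrow range (a nearly flat line).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.8, 2.9, 3.0, 3.1, 3.2]
ccc_flat = lins_ccc(observed, predicted)
ccc_perfect = lins_ccc(observed, observed)
```

In this toy case the predictions are perfectly linearly correlated with the observations, yet the CCC stays far below 1 because the predicted variance is much smaller than the observed variance. This is the kind of disagreement that a correlation-based statistic alone would not reveal.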
The discussion is well developed; however, I would only suggest adjusting it if the digital soil mapping results are revised.
Technical corrections:
Overall, the paper is clear and well-structured. I’m not a native English speaker, but I think readers would benefit from a proofread version of the paper.
In line 214, I think the authors should define ordinary kriging as an interpolation method rather than a method to estimate SOC, e.g.: “On the other hand, ordinary kriging (OK) was used for horizontal variability assessment, a method frequently used to spatially predict SOC …”
Citation: https://doi.org/10.5194/essd-2022-291-RC2 -
AC2: 'Reply on RC2', Sergio Diaz, 28 May 2023
Dear Editor (RC2)
We appreciate your time in reviewing our manuscript. Following your suggestions, we have decided to exclude the digital soil mapping part in order to focus on the database and its description. In addition, after working together with FAO's Latin America and the Caribbean Soil Partnership over the last few months, we were able to consolidate a larger database, which has grown from 41 thousand records to a little more than 66 thousand after the revision and incorporation of other soil databases available in the region. This is reflected in the new manuscript together with the new DOI of the dataset (https://doi.org/10.5281/zenodo.787673). This is undoubtedly great news for the soil science community in the region, and we hope that with other similar efforts this database will continue to grow and that digital cartographic products supported by these data can be generated. Once again, we thank you for your time, and we send herewith the response to your comments.
Kind regards,
SISLAC Team
-
AC4: 'Reply on RC2', Sergio Diaz, 09 Jan 2024
Dear Reviewer,
As previously mentioned, the article has been redirected to emphasize the database. In accordance with your suggestions, a Principal Component Analysis (PCA) of attributes with the highest availability has been conducted. Consequently, we have refocused the article on data cleaning and description. We trust that these changes will enhance the solidity of our contribution. We sincerely apologize for any inconvenience caused by the extended development time.
Best regards,
SISLAC Team