the Creative Commons Attribution 4.0 License.
Providing quality-assessed and standardised soil data to support global mapping and modelling (WoSIS snapshot 2023)
Abstract. Snapshots derived from the World Soil Information Service (WoSIS) are served freely to the international community. These static datasets provide quality-assessed and standardised soil profile data that can be used to support digital soil mapping and environmental applications at broad scale levels. Since the release of the preceding snapshot in 2019, new ETL (Extract, Transform, Load) procedures for screening, ingesting and standardising disparate source data have been developed. In conjunction with this, the WoSIS data model was overhauled, making it compatible with the ISO 28258 and Observations and Measurements (O&M) domain models. Additional procedures for querying, serving, and downloading the publicly available standardised data have been implemented using open software (e.g. GraphQL API). Following up on a short discussion of these methodological developments, we discuss the structure and content of the “WoSIS 2023-snapshot”. A range of new soil datasets was shared with us, registered in the ISRIC World Data Centre for Soils (WDC-Soils) data repository, and subsequently processed in accordance with the licences specified by the data providers. An important effort has been the processing of forest soil data collated in the framework of the EU-HoliSoils project. We paid special attention to the standardisation of soil property definitions, description of the soil analytical procedures, and standardisation of the units of measurement. The “2023 snapshot” considers the following soil chemical properties: total carbon, organic carbon, inorganic carbon (total carbonate equivalent), total nitrogen, phosphorus (extractable-P, total-P, and P-retention), soil pH, cation exchange capacity, and electrical conductivity. It also considers the following physical properties: soil texture (sand, silt, and clay), bulk density, coarse fragments, and water retention. All properties are grouped according to analytical procedures that are operationally comparable.
Method options are defined for each analytical procedure (e.g. pH measured in water, KCl or CaCl2 solution, molarity of the solution, and soil/solution ratio). For each profile we also provide the original soil classification (i.e. FAO, WRB and USDA system with their version) and pedological horizon designations as far as these have been specified in the source databases. Three measures for “fitness-for-intended-use” are provided to facilitate informed data use: a) positional uncertainty of the profile’s site location, b) possible uncertainty associated with the operationally defined analytical procedures, and c) date of sampling. The most recent (i.e. dynamic) dataset, called wosis_latest, is freely accessible via various webservices. To permit consistent referencing and citation we also provide a static snapshot (in casu, December 2023). This snapshot comprises quality-assessed and standardised data for 228 k geo-referenced profiles. The data come from 174 countries and represent more than 900 k soil layers (or horizons) and over 6 million records. The number of measurements for each soil property varies greatly between profiles and with depth, generally depending on the objectives of the initial soil sampling programmes. In the coming years, we aim to gradually fill gaps in the geographic distribution of the profiles, as well as in the soil observations themselves, subject to the sharing of a wider selection of “public” soil data by prospective data contributors. The WoSIS 2023-snapshot is archived and freely available at https://doi.org/10.17027/isric-wdcsoils-20231130 (Calisto et al., 2023).
Status: final response (author comments only)
RC1: 'Comments on 'Providing quality-assessed and standardised soil data to support global mapping and modelling (WoSIS snapshot 2023)'', Alessandro Samuel-Rosa, 04 May 2024
GENERAL COMMENTS
ISRIC World Soil Information has long been recognized for its efforts in collecting, organizing, and disseminating quality soil data for the international community. The current paper, the third in the series, succinctly describes the procedures adopted by ISRIC for cleaning and disseminating global soil profile data. The international community will certainly appreciate the release of the third static WoSIS snapshot, providing quality-assessed and standardized data for 228k geo-referenced profiles. Similar to the previous two manuscripts, the present manuscript aims to inform soil data producers and users about the current status of WoSIS and the availability of standardized point soil data for digital soil mapping and earth system modeling. Overall, I believe that the manuscript delivers this information in a clear, concise, and organized manner.
The present manuscript is the third in a series of manuscripts describing WoSIS snapshots. As ISRIC continues its role in collating more soil data from multiple sources, we expect to see new snapshots and accompanying manuscripts in the coming years. For this reason, I believe that the present manuscript should be prepared in a manner that presents the history of WoSIS and how its soil data processing strategies have evolved over time. The current paper reads more like a report and provides insufficient context regarding its relation to the previous two versions. While informative for someone familiar with WoSIS, it does not adequately educate new users about the database's past history. Adding more historical context to the manuscript would greatly enhance its comprehensiveness and usefulness.
The manuscript could benefit from further discussion on the future trajectory of WoSIS. The initial snapshot in 2016 contained 96k soil profiles, with a significant increase to 196k profiles in the second snapshot in 2019. However, the third snapshot in 2023, four years later, only brought the count to 228k profiles, accompanied by a notable rise in restricted soil profiles within WoSIS's internal database. Given the finite nature of soil data, I anticipate that future snapshots may exhibit smaller increases in data availability. This trend mirrors our experiences in Brazil with efforts to rescue legacy soil data for inclusion in the Brazilian Soil Data Repository (Soil Data). I am curious to know if you share this expectation and whether the manuscript will address strategies already in place or those to be implemented to continue enhancing soil data availability.
I would like to offer specific feedback on three sections of the manuscript, which align with my earlier observations on the manuscript's overall quality. I trust that you will find my comments valuable.
SPECIFIC COMMENTS
WoSIS data model and workflow
While I appreciate the explanation of the data model used in WoSIS, I believe the section could benefit from additional context. It's unclear whether users of the 2023 snapshot need a comprehensive understanding of the data model to utilize the data effectively. Therefore, I suggest explaining to data users how the data model impacts them. It would be helpful to include a figure or table, as the topic is quite technical and may require some abstraction. It's important to consider that your data users may not be database experts.
You mention that the cleaned and standardized (processed) data are copied into the WoSIS database and subsequently removed from the staging area. However, it's unclear what happens to the source raw data. Is it still accessible, or is only the processed data published? I would appreciate more information on this aspect, including whether there is any form of data versioning in place. Additionally, while discussing data versioning, please elaborate on how you document the data processing steps. For example, is there a script stored along with the data, and is it released alongside the data as well?
You mention that the data model was improved and new ETL procedures were developed, resulting in changes to the workflow. However, it's not clear why these changes were necessary. I suggest providing at least a brief explanation of why these changes were deemed necessary and how they contribute to enhancing WoSIS. It's important to illustrate how these changes benefit data users. For instance, what advantages do these developments offer? Additionally, how do the new ETL procedures differ from the previous ones? Given that various data integrity checks were already in place, what aspects are genuinely new, and why are they significant for users of the 2023 snapshot? Furthermore, how is the ETL procedure checklist managed? Is there a published document outlining this process? Providing more information and context for your readers would be beneficial.
Overall, I find this section to be overly technical, which may not be informative for data users unfamiliar with databases and data processing. Additionally, it could be enhanced by clearly highlighting the effective changes or improvements compared to the previous version. For instance, did the list of standardized soil properties remain the same, or were there any modifications? If so, why were these changes made? Are there any new data provisioning forms, or are the same channels as before still being utilized? Furthermore, I recommend providing URLs for the assets in the text to facilitate access to the data. Ensuring easy access to information is crucial for users.
You mention that all datasets shared with ISRIC are initially registered in the ISRIC Data Repository along with their metadata. However, it's unclear whether the metadata is open. This question arises from the explanation that third parties seeking access to "restricted" datasets would need to contact the source data provider. Accessing metadata would enable identification of restricted datasets and their owners. I recommend providing more information on this aspect, including the URL of the ISRIC Data Repository where one can find the source raw data (and possibly the metadata) shared with ISRIC.
Data screening, quality control and standardisation
Please specify if there are any changes in the consistency checks compared to the previous version.
Please specify if there are any changes in the screening for duplicate profiles compared to the previous version.
Please describe how the list shown at the top of page 10 differs from the content of subsection '2.4 Soil properties standardised'.
You provide three measures for fitness-for-intended-use. How is this different from the previous snapshot?
Spatial distribution of soil profiles and number of observations
I think that you should present a map with the spatial distribution of samples of the previous snapshot as well. This would give a better idea of the gains with the current snapshot.
Citation: https://doi.org/10.5194/essd-2024-14-RC1
AC1: 'Reply on RC1', Niels Batjes, 04 Jun 2024
RC2: 'Comment on essd-2024-14', Anonymous Referee #2, 05 Jul 2024
I have reviewed the ESSD paper entitled "Providing quality-assessed and standardised soil data to support global mapping and modelling (WoSIS snapshot 2023)".
The paper is well written and provides a comprehensive description and technical guidance on the available data. As it is a continuation of previous "snapshots", the documentation base is solid and it is nice to see the continuous improvement by Dr Batjes and his team in terms of quality assurance and expansion of the dataset. My comments are minor, also because reviewer 1 has already provided excellent feedback and the authors have responded to these issues (which I will not repeat unless my opinion on the comment or feedback differs). The large number of profiles added is impressive, but I think the paper could benefit from being clearer about some things and highlighting potential problems.
- Uncertainty assessment. I like the idea of the multi-stage uncertainty assessment in terms of location, time of sampling, methods. However, given that the intended data users here will often not be soil scientists (which is good, soil data should be used), I think it is important to explain more clearly why these limitations are important. In addition, clear warnings should be given to non-soil scientists about the incomplete coverage of certain land uses, and about soil data from the global south (certain sub-Saharan African or Arctic regions have not seen much improvement in data coverage since the last snapshot), or also information about which soil layers are covered (most data are probably still rich in topsoil data, but not in subsoil). If pedotransfer functions are used, it is important to know to what extent the profiles are genetically sound, or whether they are highly disturbed or not representative of a particular soil region. I know this is a lot to ask of the authors and you may disagree with what the purpose of this paper is, but I feel there is a risk that some soil profiles here may be interpreted as representative when we know they are clearly not, for the reasons and examples I have given above.
- Related to this point, the maps provided by the authors in response to a similar comment from reviewer 1 are a good first step towards more information about where we have soil data and where we don't (at least in this database). However, I think more clarity could be provided by providing some sort of meta-analysis on such clear limitations as which regions can be mapped from, or whether soil profiles are reasonably representative of a region. In my opinion, the authors need to make it clear whether it makes sense to colour an area on these snapshot maps at all when we know that we only have a handful of profiles and no systematic soil data from these regions (anything yellow on the maps provided is essentially no data). Again, these limitations may be clear to soil scientists, but are often overlooked when the data are used by other research communities who will have a strong interest in soil data from these regions (which should be encouraged if these limitations are understood).
- Similar to the tables for the total number and spatio-temporal variation of profiles in the database, I would find it useful to have more information available in terms of surface vs. subsurface data, land use, climate zone or soil type.
- Data inaccessibility: I find it quite shocking how much data is still not freely available, even from well-funded regions such as the EU, where essentially all soil data production is funded by taxpayers, no matter what the opinion of the individual data producer may be. This is not the fault of the authors, of course, but perhaps WoSIS needs to think about a mechanism to enforce true open access to all data (I have no idea what this would be, but if tens of thousands of profiles are not fully accessible, something is wrong with the system and against the spirit of open access).
- This is more a question of interest or something to consider for the future: How should the dataset be viewed given the growing discrepancy between the time of sampling and assessment of soil parameters and the reliability of the values for a modern user? As we know that soil properties change over time, does this mean that we need to 'phase out' certain parameters from profiles where we know that they may be significantly different today than they were decades ago? On a related note, Table 3 shows that more than a quarter of all profiles have no date. I think that's almost as bad as not knowing where these profiles are. Are these data points worth keeping at all, or will they cause confusion over time?
Citation: https://doi.org/10.5194/essd-2024-14-RC2
AC2: 'Reply on RC2', Niels Batjes, 08 Jul 2024
Data sets
Standardised soil profile data for the world (WoSIS, December snapshot) [Dataset] Luis Calisto, Luis M. de Sousa, and Niels H. Batjes https://doi.org/10.17027/isric-wdcsoils-20231130
Viewed
- HTML: 1,200
- PDF: 245
- XML: 49
- Total: 1,494
- BibTeX: 32
- EndNote: 42