the Creative Commons Attribution 4.0 License.
Time-series of Landsat-based bi-monthly and annual spectral indices for continental Europe for 2000–2022
Abstract. This paper describes the production and evaluation of an Analysis Ready and Cloud Optimized (ARCO) data cube for continental Europe (including Ukraine, the UK, and Turkey), derived from the Landsat Analysis Ready Data version 2 (ARD V2) produced by the Global Land Analysis and Discovery team (GLAD) and covering the period from 2000 to 2022. The data cube consists of 17 TB of data at a 30-meter resolution and includes bimonthly, annual, and long-term spectral indices on various thematic topics, including: surface reflectance bands, Normalized Difference Vegetation Index (NDVI), Soil Adjusted Vegetation Index (SAVI), Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), Normalized Difference Snow Index (NDSI), Normalized Difference Water Index (NDWI), Normalized Difference Tillage Index (NDTI), minimum Normalized Difference Tillage Index (minNDTI), Bare Soil Fraction (BSF), Number of Seasons (NOS), and Crop Duration Ratio (CDR). The data cube was developed with the intention of providing a comprehensive feature space for environmental modeling and soil, vegetation, and land cover mapping. To evaluate its effectiveness for this purpose, the quality of the produced time series was assessed by: (1) visual examination for artifacts and inconsistencies, (2) plausibility checks with ground survey data, and (3) predictive modeling tests, using soil organic carbon (SOC) regression and land cover (LC) classification as examples. The results of visual examination indicate that the gap-filled product is complete and consistent, except for winter periods in northern latitudes and high-altitude areas, where dense cloud and snow cover complicate gap-filling and many artifacts therefore remain. The plausibility results further show that the indices effectively help differentiate landscapes and crop types: the BSF index showed a strong negative correlation (-0.73) with crop coverage data, effectively detecting soil exposure. The minNDTI index had a moderate positive correlation (0.57) with the Eurostat tillage practices survey data, indicating that it carries valuable information on tillage intensity. The detailed temporal resolution and long-term characteristics provided by different tiers of predictors in this data cube proved to be important for both soil organic carbon regression and LC classification experiments based on the 60,723 LUCAS observations: long-term characteristics (tier 4) were particularly valuable for predictive mapping of SOC and LC, ranking at the top of the variable importance assessment. Crop-specific indices (NOS and CDR) provided limited value for the tested applications, possibly due to noise or insufficient quantification methods. The data cube is made available under a CC-BY license and will be continuously updated.
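As a rough illustration of how several of the indices listed in the abstract are typically computed, the sketch below uses the commonly cited textbook definitions of NDVI, SAVI, and NDTI. The band assignments, the 0 to 1 reflectance scale, and the SAVI soil factor of 0.5 are assumptions made for this example; the abstract does not state the exact formulas or scaling used in the data cube.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    # Common definition: (NIR - Red) / (NIR + Red).
    return normalized_difference(nir, red)


def savi(nir, red, soil_factor=0.5):
    # Common definition with soil adjustment factor L (assumed 0.5 here);
    # expects reflectance on a 0-1 scale.
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndti(swir1, swir2):
    # Common definition: (SWIR1 - SWIR2) / (SWIR1 + SWIR2).
    return normalized_difference(swir1, swir2)


# Hypothetical reflectance values for three pixels (not taken from the data cube).
red = np.array([0.05, 0.20, 0.0])
nir = np.array([0.45, 0.30, 0.0])
print(ndvi(nir, red))
print(savi(nir, red))
```

The third pixel has zero reflectance in both bands, so its NDVI is left as NaN rather than raising a division warning.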
Status: final response (author comments only)
-
RC1: 'Comment on essd-2024-266', Anonymous Referee #1, 11 Oct 2024
Although there are many online platforms and data distribution systems for Landsat, I appreciate OpenGeoHub’s effort in producing their own version of Landsat analysis-ready data. All processing code and metadata are openly available in line with open-access principles, which greatly benefits the community in reusing the data. The inclusion of precalculated annual and long-term indices offers an opportunity for improved environmental modeling and mapping. Also, this is a dataset that will evolve over time with new mapped years and the potential inclusion or refinement of the current list of products. That said, I don’t have any major objection to its publication, but I think we would greatly benefit from further clarification, which can also improve the quality of the data and the paper.
- Why did the authors produce bimonthly products rather than monthly or even 16-day intervals? One of the greatest advantages of Landsat over Sentinel and other EO data is the long time-series archive. Good data are available from 1984 onward, starting with Landsat 5. As this is an evolving data cube, please consider producing at least a monthly time series, as this would allow better integration with other products like climate data. This also helps in capturing temporal changes in terrestrial ecosystems.
- Please clarify if the ARCO bimonthly bands represent a true measurement from a scene or a synthetic value based on statistics, like an average. If it is a synthetic value, do you expect it will reduce the capacity to assess fine spatiotemporal changes in the landscape, as the synthetic image mixes different pixels with distinct temporal and spatial indexes?
- Bare soil fraction: There are several papers indicating that NDVI alone cannot separate bare soil pixels well, as dry vegetation has a very similar spectral profile compared to soils. The classification must use a combination of indices and, in some cases, land use masks. Please check the methods of Rogge et al. (2018), Diek et al. (2017), Safanelli et al. (2020), and Heiden et al. (2022).
Rogge, D., Bauer, A., Zeidler, J., Mueller, A., Esch, T., & Heiden, U. (2018). Building an exposed soil composite processor (SCMaP) for mapping spatial and temporal characteristics of soils with Landsat imagery (1984–2014). In Remote Sensing of Environment (Vol. 205, pp. 1–17). Elsevier BV. https://doi.org/10.1016/j.rse.2017.11.004.
Diek, S., Fornallaz, F., Schaepman, M. E., & De Jong, R. (2017). Barest Pixel Composite for Agricultural Areas Using Landsat Time Series. In Remote Sensing (Vol. 9, Issue 12, p. 1245). MDPI AG. https://doi.org/10.3390/rs9121245.
Safanelli, J. L., Chabrillat, S., Ben-Dor, E., & Demattê, J. A. M. (2020). Multispectral Models from Bare Soil Composites for Mapping Topsoil Properties over Europe. In Remote Sensing (Vol. 12, Issue 9, p. 1369). MDPI AG. https://doi.org/10.3390/rs12091369.
Heiden, U., d’Angelo, P., Schwind, P., Karlshöfer, P., Müller, R., Zepp, S., Wiesmeier, M., & Reinartz, P. (2022). Soil Reflectance Composites—Improved Thresholding and Performance Evaluation. In Remote Sensing (Vol. 14, Issue 18, p. 4526). MDPI AG. https://doi.org/10.3390/rs14184526.
Citation: https://doi.org/10.5194/essd-2024-266-RC1 -
AC1: 'Reply on RC1', Xuemeng Tian, 18 Oct 2024
Thank you for recognizing the efforts of OpenGeoHub in producing a unique version of Landsat analysis-ready data. We deeply appreciate your constructive suggestions. We agree that there is room for further development, and we aim to refine and expand the data cube to make it even more robust and adaptable for various environmental monitoring and modeling applications. We hope that the data cube will prove increasingly useful to others and that it will foster additional research and applications within and beyond our current projects.
RE to Q1
Thank you for this insightful suggestion. We fully recognize the benefits of achieving monthly granularity and it is indeed our goal. Our choice of a bimonthly resolution initially came from the challenge of gap filling, as finer resolutions like monthly or 16-day intervals tend to have larger gaps due to cloud cover and other data quality issues (this issue is discussed in detail in Consoli et al. 2024). By aggregating all available scenes over a longer period, such as two months in our case, before applying gap-filling techniques, we significantly reduce the spatial gaps, thereby simplifying the gap-filling process. Additionally, for our current environmental modeling scenarios, soil organic carbon prediction and land cover classification, a two-month resolution is adequate (see e.g. Tian et al. 2024).
Our goal is to extend the temporal coverage of this data cube while achieving finer temporal resolution, e.g. monthly or even 16-day granularity. We are currently exploring more sophisticated gap-filling options to enable finer temporal resolutions without compromising data quality, including utilizing spatially close valid pixels and implementing tailored gap-filling strategies for different environmental strata. We aim to continuously update and refine this data cube to support a broader range of environmental monitoring and modeling applications within and beyond OpenGeoHub’s work. However, this could take months and requires significant resources; on the other hand, we believe that the choice of bimonthly or monthly temporal support does not affect the content of our article in terms of the methods applied, the main results, and data usability.
- Consoli, D., Parente, L., Simoes, R., Şahin, M., Tian, X., Witjes, M., ... & Hengl, T. (2024). A computational framework for processing time-series of Earth Observation data based on discrete convolution: global-scale historical Landsat cloud-free aggregates at 30 m spatial resolution. Submitted to PeerJ; https://dx.doi.org/10.21203/rs.3.rs-4465582/v2
- Tian, X., de Bruin, S., Simoes, R., Isik, M. S., Minarik, R., Ho, Y. F., ... & Hengl, T. (2024). Spatiotemporal prediction of soil organic carbon density for Europe (2000–2022) in 3D+T based on Landsat-based spectral indices time-series. Submitted to PeerJ; https://doi.org/10.21203/rs.3.rs-5128244/v1
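The argument in the reply to Q1, that pooling scenes over a longer window leaves fewer pixels without any valid observation, can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the number of scenes, the 30 % clear-sky probability, and the independence between scenes are hypothetical and do not come from the data cube.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clear-sky mask: 6 scenes x 10,000 pixels, True = valid observation.
# The 30 % clear-sky probability is invented for illustration only.
valid = rng.random((6, 10_000)) < 0.3


def gap_fraction(mask):
    """Share of pixels with no valid observation in any scene of the window."""
    return 1.0 - mask.any(axis=0).mean()


monthly_gap = gap_fraction(valid[:3])  # first three scenes, roughly one month
bimonthly_gap = gap_fraction(valid)    # all six scenes, roughly two months
print(f"monthly gap: {monthly_gap:.3f}, bimonthly gap: {bimonthly_gap:.3f}")
```

Because the two-month window contains every scene of the one-month window, its gap fraction can never be larger; how much smaller it is in practice depends on how strongly cloud cover is correlated between scenes.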
RE to Q2
This is a good point and we will add a clarification in the methods section. In our dataset, the ARCO bimonthly bands represent a synthetic value, specifically a weighted average derived from several scenes (usually 6–7 scenes) available within and adjacent to the two-month period. The weights are assigned based on the clear-sky fraction of each tile, which is calculated from the number of available, non-cloudy pixels. This approach minimizes image gaps because even a single observed pixel during the period fills in the gaps; additionally, using tile-quality-based weights effectively reduces the impact of potential cloud contamination in available pixels. This ensures that our data remains based on actual observations as much as possible, not reconstructions from past data.
However, in situations where more frequent observations are crucial, this approach could potentially limit our ability to detect rapid changes. To address this concern, as suggested by the reviewer in the previous question, we are actively working to refine our methodologies and migrate to monthly composites. This will enhance temporal resolution and allow our approach to be adapted to a broader range of scenarios.
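The weighted averaging described in this reply can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the array layout, and the use of NaN to mark cloudy or missing pixels are assumptions made for the example.

```python
import numpy as np


def weighted_composite(scenes, clear_sky_fraction):
    """Weighted per-pixel mean over scenes, ignoring NaN (masked) pixels.

    scenes: array of shape (n_scenes, height, width), NaN where cloudy or missing.
    clear_sky_fraction: array of shape (n_scenes,), one weight per scene tile.
    """
    scenes = np.asarray(scenes, dtype=float)
    weights = np.asarray(clear_sky_fraction, dtype=float)[:, None, None]
    observed = ~np.isnan(scenes)
    # A scene's weight only counts at pixels it actually observed.
    w = np.where(observed, weights, 0.0)
    weighted_sum = (np.where(observed, scenes, 0.0) * w).sum(axis=0)
    weight_total = w.sum(axis=0)
    # Pixels with no observation in any scene stay NaN for later gap filling.
    out = np.full(weight_total.shape, np.nan)
    np.divide(weighted_sum, weight_total, out=out, where=weight_total > 0)
    return out


# Hypothetical example: three scenes covering a 1 x 2 pixel tile.
scenes = np.array([
    [[0.2, np.nan]],
    [[0.4, np.nan]],
    [[np.nan, np.nan]],
])
composite = weighted_composite(scenes, clear_sky_fraction=[0.9, 0.3, 0.5])
print(composite)
```

In this toy case the first pixel is pulled toward the value from the clearer scene, while the second pixel, never observed, remains a gap.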
RE to Q3
This is indeed a valuable point. In fact, several of the references mentioned, such as Rogge et al. (2018) and Diek et al. (2017), are already cited in our paper and have influenced our methodology. We will study and consider adding other references that you have listed above. Our use of the Bare Soil Fraction (BSF) is intended primarily as a proxy to indicate the general bareness of pixels, rather than to identify bare soil with high accuracy. The BSF is derived from NDVI time series analysis and serves as an initial approach. As discussed in our paper, we plan to incorporate additional data sources and indices in future analyses to more accurately quantify soil bareness, following the suggestions and methodologies recommended by the reviewer.
Once again, thank you for your thorough review and for facilitating the advancement of our work. We look forward to integrating your valuable feedback into our dataset.
Citation: https://doi.org/10.5194/essd-2024-266-AC1
-
RC2: 'Comment on essd-2024-266', Anonymous Referee #2, 07 Nov 2024
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2024-266/essd-2024-266-RC2-supplement.pdf
-
RC3: 'Comment on essd-2024-266', Anonymous Referee #3, 12 Nov 2024
General comments
The availability of analysable large geodata is of great importance for lowering the inhibition threshold for potential users and thus enabling informed and comprehensible decisions and political action. Against this background, the work represents an important contribution that I recommend for publication, apart from minor suggestions for changes (see below). Against the background of Data-Fitness-For-Use/Data-Fitness-For-Purpose assessment approaches (Lacagnina et al., 2022; Pôças et al., 2014; Wentz & Shimizu, 2018; Yang et al., 2013) and the associated trustworthiness of geodata (products) (Lokers et al., 2016), however, I would ask the authors also to inform how metadata or additional information can help potential users to assess the suitability of the datasets for further use.
Specific comments
- Page 2, Line 28: Spelling error: Copernicus Data Spac Ecosystem ⇒Copernicus Data Space Ecosystem
- Figure 1: Spelling error: Per-pixel count of available value ⇒Per-pixel count of available values
- Figure 2: To make the manuscript easier to read, I suggest placing the figure at the beginning of chapter 2.3 and briefly explaining the basic methodological process with reference to the corresponding subsections.
- Page 4, Line 20: Could you elaborate on your perception/definition of the term (spatial) “plausibility” and differentiate it from “accuracy/uncertainty”?
- Page 11, Line 20: Could you add a reference for “typical CRC values for each tillage type”?
- Page 12: Could you provide a kind of principle workflow for both modelling experiments?
- Section 2.4.4: You may consider deleting section 2.4.4 or integrating elements into the results section. For example, the explanation seems somewhat contrived: “These visual representations complement the statistical analysis by highlighting spatial patterns that may”. In addition, the paragraph on page 12, line 5 can be used as an introduction to the results section.
- Page 13, lines 13–26: Although in my view there is no need to list the formulas of the validation metrics (F1-score, CCC), references should at least be mentioned.
- Page 36, lines 22–34: It is not entirely clear to me why emphasis is placed on the supposed limitations of the Bare Soil Composite (BSC). In principle, BSC represent a filtered view of the Landsat and Sentinel-2 time series with a focus on agricultural areas in order to identify stable soil patterns. The “accusation” of regional applicability also does not reflect the complexity of digital soil mapping (DSM), as the transferability of DSM approaches depends on many factors such as the representativeness of soil samples, suitable explanatory variables, or DSM models that take into account the spatial variability of soil landscapes (e.g., Broeg et al., 2024). In this respect, BSF products face the same challenge. More relevant would be a discussion of differences in the generation of BSC and BSF products. This concerns, for example, approaches to temporal-dynamic filtering taking phenology into account (Zepp et al., 2023), which would be a nice feature of your products in the future.
- Page 38, lines 20-21: This result is in line with Zepp et al., 2023.
- Page 38, section 4.3: Both use cases represent current topics. I would therefore welcome it if the discussion referred to a few relevant works.
- Page 39, lines 6ff: Could you support your conclusions on the feature importance and selection together with scientific references?
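For readers unfamiliar with the two validation metrics named in the comment on page 13, the sketch below gives the standard textbook forms of Lin's concordance correlation coefficient (CCC) and the F1-score for a single class. It is a generic illustration with hypothetical inputs; how the manuscript averages F1 across land cover classes is not stated in this discussion, so that is left out.

```python
import numpy as np


def concordance_correlation(observed, predicted):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * covariance / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def f1_score(true_labels, predicted_labels, positive):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    t = np.asarray(true_labels) == positive
    p = np.asarray(predicted_labels) == positive
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical values, not taken from the manuscript.
soc_observed = [1.0, 2.0, 3.0]
soc_predicted = [2.0, 3.0, 4.0]
print(concordance_correlation(soc_observed, soc_predicted))

lc_true = ["crop", "crop", "forest", "forest"]
lc_pred = ["crop", "forest", "forest", "forest"]
print(f1_score(lc_true, lc_pred, positive="crop"))
```

Unlike the Pearson correlation, the CCC penalizes a constant offset between predictions and observations, which is why the shifted predictions above score below 1 even though they are perfectly correlated.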
References
- Broeg, T., Don, A., Gocht, A., Scholten, T., Taghizadeh-Mehrjardi, R., & Erasmi, S. (2024). Using local ensemble models and Landsat bare soil composites for large-scale soil organic carbon maps in cropland. Geoderma, 444, 116850. https://doi.org/10.1016/j.geoderma.2024.116850
- Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M.-E., Cappiello, C., Biehlmaier, O., Wright, L., Schubert, C., Bertino, A., Thiemann, H., & Dennis, R. (2022). Towards a data quality framework for EOSC authorship community (tech. rep.). EOSC Association. https://doi.org/10.5281/zenodo.7515816
- Lokers, R., Knapen, R., Janssen, S., van Randen, Y., & Jansen, J. (2016). Analysis of Big Data technologies for use in agro-environmental science. Environmental Modelling & Software, 84, 494–504. https://doi.org/10.1016/j.envsoft.2016.07.017
- Pôças, I., Gonçalves, J., Marcos, B., Alonso, J., Castro, P., & Honrado, J. P. (2014). Evaluating the fitness for use of spatial data sets to promote quality in ecological assessment and monitoring. International Journal of Geographical Information Science, 28(11), 2356–2371. https://doi.org/10.1080/13658816.2014.924627
- Wentz, E. A., & Shimizu, M. (2018). Measuring spatial data fitness-for-use through multiple criteria decision making. Annals of the American Association of Geographers, 108(4), 1150–1167. https://doi.org/10.1080/24694452.2017.1411246
- Yang, X., Blower, J. D., Bastin, L., Lush, V., Zabala, A., Masó, J., Cornford, D., Díaz, P., & Lumsden, J. (2013). An integrated view of data quality in Earth observation. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1983), 20120072. https://doi.org/10.1098/rsta.2012.0072
- Zepp, S., Heiden, U., Bachmann, M., Möller, M., Wiesmeier, M., & VanWesemael, B. (2023). Optimized bare soil compositing for soil organic carbon prediction of topsoil croplands in Bavaria using Landsat. ISPRS Journal of Photogrammetry and Remote Sensing, 202, 287–302. https://doi.org/10.1016/j.isprsjprs.2023.06.003
Citation: https://doi.org/10.5194/essd-2024-266-RC3
Data sets
Landsat-based Spectral Indices for pan-EU 2000-2022 Xuemeng Tian, Davide Consoli, Leandro Parente, Yu-Feng Ho, and Tomislav Hengl https://doi.org/10.5281/zenodo.10776891
Model code and software
AI4SoilHealth/SoilHealthDataCube: v20240726-1 Xuemeng Tian, Davide Consoli, Martijn Witjes, Leandro Parente, and Yu-Feng Ho https://doi.org/10.5281/zenodo.12907281
Viewed
- HTML: 428
- PDF: 102
- XML: 15
- Total: 545
- BibTeX: 6
- EndNote: 8