the Creative Commons Attribution 4.0 License.
HyBEAR: A Hyperspectral Benchmark for Bare Soil Detection
Abstract. Detecting bare soil areas is an important step in the analysis of Earth observation data for a variety of Precision Agriculture (PA) applications focused on quantifying soil properties and assessing soil quality. In this paper, we introduce the HyBEAR benchmark, a novel large-scale collection of high-resolution hyperspectral aerial images (with a 2 m ground sampling distance) accompanied by manual bare soil annotations verified by domain experts. The bare soil detection problem is usually tackled at the pixel level, meaning that detection methods classify each pixel as either bare soil or background. In contrast, we provide pixel-level annotations for entire agricultural parcels (if a parcel is labeled as bare soil, all pixels within that parcel are labeled accordingly), and aim to support the development of methods that identify entire fields with no vegetation. Such fields commonly undergo further analysis to determine specific soil parameters and characteristics that are important when planning PA activities such as fertilization. The HyBEAR 🐻 benchmark includes (i) the largest-to-date (108,064,591 pixels, corresponding to 43,225 hectares) and most heterogeneous dataset for bare soil detection, and (ii) a validation procedure (training-test splits and quality metrics) together with baseline results obtained for a set of machine learning bare soil detection models. From the FULL collection of 1954 images in HyBEAR, which we divided into 5 spatially disjoint folds, we additionally selected a random, stratified subset (MINI) that may be useful for designing and verifying bare soil detection algorithms. Overall, HyBEAR is a step toward standardizing the way the community builds and compares bare soil detection algorithms in a thorough, reproducible, and unbiased way.
Status: final response (author comments only)
- RC1: 'Comment on essd-2026-64', Anonymous Referee #1, 25 May 2026
- RC2: 'Comment on essd-2026-64', Nataliia Kussul, 25 May 2026
General comments
The manuscript presents HyBEAR, a hyperspectral benchmark dataset for bare soil detection at the agricultural field scale. The topic is relevant for Earth system science data, precision agriculture, hyperspectral remote sensing, and machine learning applications. The paper clearly motivates the need for bare soil detection as a preprocessing step for soil property estimation and other precision agriculture workflows.
The dataset is potentially valuable for the community. It provides high-resolution airborne hyperspectral imagery with a ground sampling distance of 2 m and 430 spectral bands, manually prepared and expert-verified bare soil annotations, predefined cross-validation folds, baseline machine learning results, and code and data availability through Zenodo. The size of the dataset is also substantial for this specific task.
I appreciate that the authors provide both FULL and MINI versions of the dataset. This is useful because the FULL dataset is relatively large, while the MINI version allows users to test workflows and reproduce some baseline results without immediately downloading and processing the full 96 GB dataset.
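The abstract describes MINI as a random, stratified subset, and a fixed seed is what would make such a selection repeatable. The sketch below shows one way to do this with only the Python standard library; the patch names and the modulo-based stratum (a stand-in for something like fold or soil-ratio bin) are hypothetical placeholders, not the authors' actual procedure.

```python
import random
from collections import defaultdict


def stratified_subset(items, stratum_of, fraction, seed):
    """Draw roughly `fraction` (0 < fraction <= 1) of every stratum using a fixed seed."""
    rng = random.Random(seed)  # private RNG so global random state is untouched
    by_stratum = defaultdict(list)
    for item in items:
        by_stratum[stratum_of(item)].append(item)
    selected = []
    for stratum in sorted(by_stratum):  # sorted for a deterministic iteration order
        members = sorted(by_stratum[stratum])
        k = max(1, round(fraction * len(members)))  # keep at least one item per stratum
        selected.extend(rng.sample(members, k))
    return sorted(selected)


# Hypothetical example: 20 patches, stratum = patch number modulo 5 (a fold proxy).
patches = [f"p{i:02d}" for i in range(20)]
mini = stratified_subset(patches, lambda name: int(name[1:]) % 5, fraction=0.25, seed=2026)
```

Publishing the seed, the stratification variables, and the sampling fraction alongside the subset would let anyone regenerate exactly the same MINI list.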
I also checked the data availability and basic usability of the supplementary materials. The DOI is valid, the data can be downloaded without an additional request, and the files can be opened both in QGIS and using the provided code. The MINI dataset and the provided trained models allow reproduction of the reported results for Logistic Regression and SVM, and the reproduced values are consistent with the reported tables. This is an important positive aspect of the manuscript.
However, I think the manuscript requires revision before publication. The main issues concern metadata completeness, reproducibility of the full baseline workflow, the limited spatial diversity of the dataset, the interpretation of the cross-validation protocol, and the mismatch between the stated “whole-field bare soil” objective and the primarily pixel-level baseline evaluation.
Specific comments
Data availability, metadata, and usability
The data are available through Zenodo and can be downloaded without restriction. The file structure is generally understandable, and the TIFF/GT files can be opened in standard geospatial software and through Python. This is a strong point of the paper.
However, the metadata description should be improved. The hyperspectral image files contain 430 bands, but the bands are not clearly named or documented in a user-friendly way. For a hyperspectral benchmark, the mapping between band index and wavelength is essential. At the moment, users need to inspect the code to find out, for example, which bands are used for the RGB visualization. This creates an unnecessary barrier to reuse, especially for users who are not already familiar with this sensor configuration.
I recommend adding a separate metadata file, for example bands.csv or wavelengths.csv, with band index, central wavelength, spectral range, and sensor source if applicable. The manuscript should also clearly state whether this wavelength information is encoded inside the TIFF metadata or only provided externally.
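As an illustration of how such a file could be consumed, here is a minimal sketch assuming a hypothetical `bands.csv` with columns `band_index`, `center_wavelength_nm`, `fwhm_nm`, and `sensor`. The column names and the four example wavelengths are invented for illustration and do not come from the HyBEAR package. The file should also state whether indices are 0- or 1-based, since GDAL-based tools such as QGIS and rasterio number bands from 1.

```python
import csv
import io


def load_bands(csv_text):
    """Parse a hypothetical bands.csv into (band_index, center_wavelength_nm) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row["band_index"]), float(row["center_wavelength_nm"])) for row in reader]


def nearest_band(bands, target_nm):
    """Return the band index whose center wavelength is closest to target_nm."""
    return min(bands, key=lambda b: abs(b[1] - target_nm))[0]


# Illustrative rows only; real values must come from the sensor documentation.
EXAMPLE_CSV = """band_index,center_wavelength_nm,fwhm_nm,sensor
0,450.0,5.0,VNIR
1,550.0,5.0,VNIR
2,650.0,5.0,VNIR
3,860.0,5.0,VNIR
"""

bands = load_bands(EXAMPLE_CSV)
# Approximate red, green, and blue bands for an RGB preview.
rgb = [nearest_band(bands, nm) for nm in (640.0, 550.0, 460.0)]
print(rgb)  # [2, 1, 0]
```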
Ground truth quality and annotation protocol
The description of the ground truth preparation is generally clear. The authors explain that the annotation process used RGB, CIR, and NDVI representations, followed by expert verification. The distinction between SOIL and MAYBE-SOIL during the annotation process is useful, and the conversion of ambiguous cases into final SOIL or NON-SOIL classes is reasonable.
Nevertheless, the annotation protocol should be described in more detail. Since this is a benchmark dataset, confidence in the ground truth is central. The authors should clarify whether several annotators independently labeled the same areas, whether inter-annotator agreement was assessed, and how disagreements or uncertain cases were resolved. It would also be useful to provide more explicit rules for sparse vegetation, crop residues, shadows, soil moisture effects, roads, field margins, and other ambiguous cases.
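Inter-annotator agreement is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes it for two annotators labeling the same parcels; the parcel labels are hypothetical, and the same function applies to flattened pixel masks.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal class frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical class; kappa is undefined, 1.0 by convention here
    return (observed - expected) / (1.0 - expected)


# Hypothetical parcel-level labels from two annotators.
annotator_1 = ["SOIL", "SOIL", "NON-SOIL", "NON-SOIL", "SOIL", "NON-SOIL"]
annotator_2 = ["SOIL", "NON-SOIL", "NON-SOIL", "NON-SOIL", "SOIL", "NON-SOIL"]
kappa = cohens_kappa(annotator_1, annotator_2)  # 5/6 raw agreement, kappa = 2/3
```

Reporting kappa per region or per ambiguous category (sparse vegetation, crop residues, and so on) would show where the annotation rules are least stable.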
At present, the trust in the GT relies mainly on the statement that the annotations were manually checked and verified by domain experts. This is acceptable as a starting point, but for a benchmark dataset the quality control procedure should be more transparent.
Whole-field bare soil concept versus pixel-level evaluation
One of the most interesting aspects of the paper is the claim that HyBEAR supports bare soil detection at the level of entire agricultural fields rather than only isolated bare-soil pixels. This is important and relevant for precision agriculture workflows, where the decision is often made at the field or parcel level.
However, the baseline experiments are still mainly formulated as pixel-wise classification. The models operate on 430-band spectral vectors for individual pixels, and the reported metrics are also pixel-level metrics. This creates a methodological tension between the stated “whole-field bare soil” objective and the actual evaluation protocol.
I recommend that the authors clarify this point. If the dataset is intended to support whole-field bare soil detection, then field-level or parcel-level evaluation metrics should be added. For example, the authors could aggregate pixel predictions within each field and evaluate whether the field is correctly classified as bare soil or non-bare soil. This would make the benchmark more consistent with the stated objective.
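A minimal sketch of the suggested field-level evaluation, assuming each valid pixel carries a field (parcel) identifier and a predicted bare-soil probability. The identifiers, probabilities, and the mean-probability rule with a 0.5 threshold are illustrative choices; a pixel majority vote would be an equally plausible alternative.

```python
from collections import defaultdict


def aggregate_by_field(field_ids, pixel_probs, threshold=0.5):
    """Average per-pixel bare-soil probabilities within each field and threshold the mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for fid, p in zip(field_ids, pixel_probs):
        sums[fid] += p
        counts[fid] += 1
    return {fid: sums[fid] / counts[fid] >= threshold for fid in sums}


def field_accuracy(predicted, reference):
    """Fraction of reference fields whose aggregated label matches the reference label."""
    return sum(predicted[fid] == reference[fid] for fid in reference) / len(reference)


# Hypothetical pixels from three fields (no-data pixels assumed already removed).
field_ids = [1, 1, 1, 2, 2, 3, 3, 3]
probs = [0.9, 0.8, 0.4, 0.2, 0.3, 0.6, 0.5, 0.3]
reference = {1: True, 2: False, 3: True}  # True = whole field is bare soil

aggregated = aggregate_by_field(field_ids, probs)  # {1: True, 2: False, 3: False}
acc = field_accuracy(aggregated, reference)        # 2 of 3 fields correct
```

Field-level scores computed this way can diverge noticeably from pixel-level ones, because large fields dominate pixel counts while each field counts once here.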
At minimum, the manuscript should explicitly state that the current baselines are pixel-wise baselines and do not fully exploit the field-level nature of the annotations.
Spatial diversity and cross-validation protocol
The five-fold cross-validation protocol is useful and clearly described. The use of spatially disjoint folds is appropriate and helps reduce the risk of overly optimistic results due to spatial leakage.
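One simple sanity check for spatially disjoint folds is that no field appears in patches assigned to different folds. The sketch below assumes a hypothetical mapping from each patch to its fold and to the field IDs it intersects. Passing this check is necessary but not sufficient: adjacent or overlapping patches can still leak spatially correlated information.

```python
def find_leaking_fields(fold_of_patch, fields_in_patch):
    """Return field IDs that occur in patches assigned to more than one fold."""
    folds_per_field = {}
    for patch, fold in fold_of_patch.items():
        for fid in fields_in_patch.get(patch, ()):
            folds_per_field.setdefault(fid, set()).add(fold)
    return sorted(fid for fid, folds in folds_per_field.items() if len(folds) > 1)


# Hypothetical patch-to-fold and patch-to-field assignments.
fold_of_patch = {"patch_a": 0, "patch_b": 1, "patch_c": 1}
fields_in_patch = {"patch_a": ["f1", "f2"], "patch_b": ["f2", "f3"], "patch_c": ["f4"]}
leaks = find_leaking_fields(fold_of_patch, fields_in_patch)  # ["f2"] spans folds 0 and 1
```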
However, the interpretation of this protocol should be more cautious. The dataset is large in terms of patches and pixels, but it comes from only two geographic locations in southern Poland, acquired on the same day and within a short time interval. Therefore, the dataset has limited spatial, seasonal, and agroecological diversity.
This limitation is visible in the results. Fold 0 behaves quite differently from Folds 1–4 for several models. In many cases, performance on Fold 0 is substantially lower, especially in terms of sensitivity and F1-score. This suggests that Fold 0 is a more challenging cross-location test case, whereas Folds 1–4 may mainly reflect variability within the second scene rather than fully independent geographic generalization.
I recommend that the authors tone down claims about robustness and generalization. The dataset is valuable, but it should be presented as a benchmark based on two spatial areas rather than as a broadly representative benchmark for different soil types, regions, seasons, and agricultural systems. A dedicated limitations paragraph would be very helpful.
No-data pixels and patch composition
During inspection of the dataset, I noticed that some patches contain a large proportion of no-data pixels. The manuscript states that no-data pixels are encoded as -9999 and can be filtered automatically. This is technically acceptable, but the effect of no-data pixels on the dataset statistics and evaluation should be described more clearly.
The authors should clarify whether no-data pixels are excluded from all reported metrics, including ACC, F1, IoU, MCC, and AUC. It would also be useful to provide statistics on the distribution of valid pixels per patch, or at least to indicate whether patches with a very high no-data fraction were retained without additional filtering.
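The sketch below shows what excluding no-data pixels from the threshold-based metrics looks like, using the -9999 code stated in the manuscript. For simplicity it treats each pixel as a single value; in the 430-band cubes one would presumably flag a pixel as no-data from its spectrum, which is an assumption here. The pixel values and labels are hypothetical. ROC-AUC is omitted because it needs continuous scores rather than hard predictions, but the same mask would apply.

```python
import math

NODATA = -9999  # no-data code stated in the manuscript


def valid_fraction(pixels):
    """Share of pixels in a patch that are not no-data."""
    return sum(v != NODATA for v in pixels) / len(pixels)


def masked_metrics(pixels, y_true, y_pred):
    """ACC, F1, IoU and MCC over valid pixels only (1 = bare soil, 0 = background)."""
    tp = fp = tn = fn = 0
    for v, t, p in zip(pixels, y_true, y_pred):
        if v == NODATA:
            continue  # no-data pixels never enter the confusion matrix
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("patch contains only no-data pixels")
    err = tp + fp + fn
    # F1 and IoU are undefined when there are no positives at all; 0.0 is a convention here.
    f1 = 2 * tp / (2 * tp + fp + fn) if err else 0.0
    iou = tp / err if err else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": (tp + tn) / n, "F1": f1, "IoU": iou, "MCC": mcc}


# Hypothetical patch: two of six pixels are no-data.
pixels = [0.1, -9999, 0.3, 0.2, -9999, 0.5]
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1]
metrics = masked_metrics(pixels, y_true, y_pred)  # computed from 4 valid pixels
```

Reporting `valid_fraction` for every patch would also give the requested distribution of valid pixels per patch.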
Reproducibility of baseline results
The provided code and trained models allow reproduction of some reported results, at least for the MINI dataset and for Logistic Regression and SVM. This is a positive point.
However, it appears that the complete training code for the baseline models is not provided. If only model evaluation code and trained models are available, then the benchmark is only partially reproducible. For an ESSD data paper, and especially for a benchmark dataset, it would be preferable to provide the full training workflow: preprocessing, normalization, sampling strategy, handling of class imbalance, random seeds, model hyperparameters, fold handling, and model training scripts.
If the authors decide not to provide full training code, this limitation should be clearly stated. Otherwise, I recommend adding the missing training scripts to the Zenodo package or to a linked repository.
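One lightweight way to make each baseline run reproducible is to archive a machine-readable record of its settings next to the trained model. The dataclass below is a hypothetical example: the field names and values (model, normalization, class weighting, hyperparameters) are illustrative, not the authors' settings. In a real pipeline the same seed would also be passed to NumPy and to the estimator, for example via the `random_state` argument of scikit-learn models.

```python
import json
import random
from dataclasses import asdict, dataclass, field


@dataclass
class BaselineRunConfig:
    """Hypothetical record of what is needed to rerun one baseline on one fold."""
    model: str
    test_fold: int
    seed: int
    normalization: str
    class_weight: str
    hyperparameters: dict = field(default_factory=dict)


def start_run(cfg):
    """Seed Python's RNG and return the config serialized as JSON for archiving."""
    random.seed(cfg.seed)
    return json.dumps(asdict(cfg), sort_keys=True)


cfg = BaselineRunConfig(
    model="LogisticRegression",
    test_fold=0,
    seed=42,
    normalization="per-band z-score",
    class_weight="balanced",
    hyperparameters={"C": 1.0, "max_iter": 1000},
)
record = start_run(cfg)  # store this JSON string alongside the trained model
```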
Baseline results and interpretation
The baseline results are useful. It is interesting that relatively simple linear models, especially Logistic Regression and SVM, perform very well and sometimes outperform more complex tree-based models. This suggests that spectral information alone is already highly informative for this task.
At the same time, the qualitative examples show that some patches remain difficult, especially in Fold 0. The manuscript would benefit from a deeper discussion of failure cases. It would be helpful to explain whether these errors are likely caused by spectral differences between P1 and P2, illumination changes, soil moisture, crop residues, shadows, boundary ambiguity, or annotation uncertainty.
Length and focus of the manuscript
The manuscript is generally well structured, but some parts of the introduction and literature review could be shortened. Since this is a data description article, the paper would benefit from a stronger focus on dataset construction, annotation quality, metadata, data access, reproducibility, limitations, and reuse scenarios.
Technical corrections
- All formulas in Section 2.5 should be checked and reformatted. The current rendering contains several notation errors: missing equality signs after metric names, missing or unclear fraction bars, missing arithmetic operators, and incomplete brackets/parentheses. Please revise the entire section to ensure that all metric formulas are mathematically correct and properly typeset.
- Please use consistent terminology for ROC, AUC, and ROC-AUC. In the tables, the column is labeled “ROC”, but it appears to refer to AUC or ROC-AUC.
- There is a typo: “othophotops” (presumably “orthophotos”).
- Please add a clear table or metadata file with band indices and wavelengths.
- Please clarify whether the TIFF files contain wavelength metadata internally.
- Please provide the random seed and the exact stratification procedure used to select the MINI subset.
- Please explicitly state whether no-data pixels are excluded from all reported metrics.
- Please add the training scripts for the baseline models, or clearly state that only evaluation scripts and trained models are provided.
- In Figure 5, it would be useful to add a short explanation of why IMG_0022_F0 is a difficult case and what this example shows about cross-location generalization.
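For reference when revising Section 2.5, the standard definitions of the metrics discussed above are given below, with TP, TN, FP, and FN counted over valid (non-no-data) pixels and bare soil as the positive class. The manuscript's own notation may differ, and the symbols here are the usual textbook ones rather than those of the paper.

```latex
\begin{aligned}
\mathrm{ACC} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{SEN} &= \frac{TP}{TP + FN}, \qquad \mathrm{PREC} = \frac{TP}{TP + FP}, \\
\mathrm{F1} &= \frac{2\,\mathrm{PREC}\cdot\mathrm{SEN}}{\mathrm{PREC} + \mathrm{SEN}} = \frac{2\,TP}{2\,TP + FP + FN}, \\
\mathrm{IoU} &= \frac{TP}{TP + FP + FN}, \\
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}, \\
\mathrm{ROC\text{-}AUC} &= \int_{0}^{1} \mathrm{TPR}\; d(\mathrm{FPR}).
\end{aligned}
```

Naming the last quantity ROC-AUC consistently in both the text and the tables would resolve the “ROC” column ambiguity.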
Recommendation
Overall, I consider HyBEAR a valuable and relevant dataset for ESSD. The data are accessible, the file structure is mostly clear, and part of the baseline results can be reproduced. However, before publication, the manuscript should be substantially revised to improve metadata completeness, clarify the annotation quality control procedure, provide the full training workflow or clearly state the limits of reproducibility, better address the limited spatial diversity of the dataset, and align the evaluation protocol more closely with the stated whole-field bare soil detection objective.
Citation: https://doi.org/10.5194/essd-2026-64-RC2
Data sets
HyBEAR 🐻 A. Wijata et al. https://zenodo.org/records/17607898
Model code and software
HyBEAR 🐻 A. Wijata et al. https://zenodo.org/records/17607898
Interactive computing environment
HyBEAR 🐻 A. Wijata et al. https://zenodo.org/records/17607898
This paper provides a curated benchmark of hyperspectral images for bare soil recognition, along with the cross-validation procedures and a comparison of different ML models (and their code) used for this task.
The paper is very well written; it is clear that the authors have been working on this topic for a long time and have revised the document carefully. The contribution is notable, as this kind of annotated benchmark is hard to find in open data repositories. The results shown in the paper are sound and the validation procedure is methodologically correct.
Still, there is room for some improvement and clarification, as listed below:
Introduction section. It might be better to split this section in two, with the new one named “Background”, “Related Work”, or similar. In this way, the context and goal of the paper (in the introduction) and the state of the art and the contribution (in the new section) would be clearer.
For the NO-DATA pixels in the images, it should be clarified why no data were captured in these areas. Is this due to a limitation of the device used, or to absent or noisy data?
Can the cost of the hyperspectral data collection be quantified? It would be useful to report this information in Section 2.
In Section 2.2, when describing the difficulties in identifying different elements in Figure 2 (e.g., dirt roads, tree shadows), it would be very useful to circle or otherwise mark these elements in the figure itself, so that the problem can be spotted quickly.
Regarding the results, an explanation of the low performance of the DT, Random Forest, and AB models on Fold 0 is missing. Could it be due to the fact that this fold has the lowest proportion of soil pixels?
An explicit paragraph describing the limitations of this work is needed, as well as a clearer description of specific applications that could benefit from this work.