Austrian NIR Soil Spectral Library for Soil Health Assessments
Abstract. The rise in demand for soil data and information calls for quick and cost-effective methodologies to quantify soil properties. This is particularly important in the realm of restoring soil health in Europe. Near-infrared (NIR) spectroscopy has demonstrated the ability to predict specific soil properties with high accuracy whilst being less costly and time-consuming than traditional methods. To fill gaps in national spectroscopic soil data, we compiled the first Austrian NIR Soil Spectral Library (680–2500 nm) based on legacy samples (n = 2129), covering all environmental zones of Austria. We then applied partial least square regression modelling to test the usability of the dataset for soil health assessments at its current stage. Our analysis revealed that, at the present time, the Austrian NIR Soil Spectral Library is not suitable to predict most of the 14 soil properties with sufficient accuracy. Nevertheless, total nitrogen, CaCO3, organic carbon and clay showed satisfactory results (R2 > 0.7). Most importantly, the dataset containing sample meta-data (e.g., land use type, environmental zone or zip code), laboratory reference values and NIR spectra with 1 nm resolution can be used as a foundation for further spectral analysis and modelling. We make this work openly accessible to actively contribute to closing soil data gaps and promote the expansion of soil spectral libraries as a basis for soil health assessments.
General Comments
This manuscript presents a valuable and timely contribution to the field of soil science and digital soil mapping. The development of the first open-access Austrian NIR soil spectral library (SSL) fills a significant data gap and aligns perfectly with current European initiatives (e.g., EU Soil Mission, Soil Monitoring Law) that demand cost-effective tools for monitoring soil health. The study is well-structured, the methodology is sound and thoroughly described, and the data is made openly available, which is highly commendable. While the current predictive performance of the PLSR models for most properties is reported as insufficient for replacing routine lab analyses, the library itself represents a crucial foundational resource for the scientific community. The manuscript is therefore suitable for publication in Earth System Science Data after minor revisions to clarify certain aspects and strengthen the discussion.
Specific Comments
Abstract and Short Summary:
L13-15 (Short Summary): The statement "the accuracy was insufficient compared to routine laboratory analyses" is very general. Consider rephrasing to be more specific and balanced, e.g., "For most properties, the accuracy is currently insufficient... although several key properties (TN, SOC, CaCO₃, clay) showed promising predictive potential (R² > 0.7)."
L28-30 (Abstract): The same applies here. The phrase "is not suitable to predict most of the 14 soil properties with sufficient accuracy" could be tempered to "showed limited accuracy for predicting many of the 14 soil properties", followed immediately by the positive results for TN, SOC, CaCO₃ and clay.
Introduction:
L53-55: The sentence "Based on the increasing requirements... are in demand" is a bit awkward. Suggest rephrasing for clarity: "The increasing requirements for soil health assessments... are creating a demand for less cost-intensive alternative methods."
Soil sample selection:
L89-90: "For one sample, the location and environmental zone are unknown (Sample_number 743)." It is good practice to state how this sample was handled in the spatial analysis (e.g., was it excluded from Fig. 1?). Please clarify.
Figure 1: The figure is essential. Please ensure that in the final version, the map is of high resolution and the circle sizes for sample counts are clearly distinguishable in the legend.
Dataset creation and description:
L108-110: "Providing coordinates was not possible because the dataset includes samples sent in by private individuals..." This is a crucial point for the FAIRness of the data, because missing coordinates limit reuse and interoperability in spatial applications. The decision is well justified, but it should also be stated explicitly in the "Data availability" section as a limitation of the dataset.
L111-112: "Sampling depths are reported in columns 8 and 9". Please specify what these two columns represent (e.g., "upper depth" and "lower depth", or "min depth" and "max depth").
Chemical and physical reference analysis:
Table 1: The minimum and maximum values for silt content (5% and 75.7%, respectively) seem unusual given the range of the clay fraction. Could the authors please double-check these values for potential typographical errors?
L142-146: The paragraph explaining the SOC < 7% subset is critical for understanding the modelling choices. This rationale should be briefly restated in Section 5 ("Spectroscopic modelling") when the models are introduced, as it is key to interpreting the results in Table 2.
Spectral measurement and preprocessing:
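L159: "The first forward derivative was applied to remove noise." Derivatives are typically used to enhance spectral features and remove baseline offsets; on their own they tend to amplify high-frequency noise, and noise removal is usually achieved by smoothing. Please clarify the intended purpose here and state whether a smoothing step was applied before the derivative.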
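To illustrate the distinction, here is a minimal Python sketch on synthetic data (not the authors' spectra). The helper names forward_difference and moving_average are hypothetical, a unit wavelength step is assumed, and the moving average merely stands in for whatever smoother the authors may have used. It suggests that a forward difference cancels a constant baseline offset but increases the spread of uncorrelated noise, whereas smoothing first reduces it.

```python
import random
import statistics


def forward_difference(spectrum):
    """First forward derivative, d[i] = x[i+1] - x[i] (unit step assumed)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def moving_average(spectrum, window=5):
    """Centred moving average with an odd window; edge points are dropped."""
    half = window // 2
    return [
        sum(spectrum[i - half:i + half + 1]) / window
        for i in range(half, len(spectrum) - half)
    ]


random.seed(0)

# Synthetic, hypothetical "spectrum": a slow linear trend plus random noise.
clean = [0.001 * i for i in range(200)]
noise = [random.gauss(0.0, 0.01) for _ in clean]
noisy = [c + n for c, n in zip(clean, noise)]
shifted = [v + 0.5 for v in noisy]  # same signal with a constant baseline offset

# 1) The derivative is unaffected by the constant offset.
offset_removed = all(
    abs(a - b) < 1e-9
    for a, b in zip(forward_difference(noisy), forward_difference(shifted))
)

# 2) Differencing pure noise increases its standard deviation.
noise_sd = statistics.pstdev(noise)
deriv_noise_sd = statistics.pstdev(forward_difference(noise))

# 3) Smoothing before differencing lowers the noise in the derivative.
smoothed_deriv_sd = statistics.pstdev(forward_difference(moving_average(noise)))

print("offset removed by derivative:", offset_removed)
print("derivative noisier than raw noise:", deriv_noise_sd > noise_sd)
print("smoothing first reduces derivative noise:", smoothed_deriv_sd < deriv_noise_sd)
```

If a smoothing filter was part of the preprocessing, reporting its type and parameters alongside the derivative would resolve the ambiguity.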
Spectroscopic modelling:
L175-178: The explanation for handling the "clay in suspension" validation set is clear and logical.
L179: Please specify the version of the prospectr package used for reproducibility.
L182: "UCal™ Chemometric Software". If this is commercial software, please provide the company and location (e.g., Unity Scientific, MA, USA) for completeness.
Model performance & Figure 4:
L200-201: "clay analyzed in suspension had a small coefficient of determination (R2=0.58, SEP=2.71)". An R² of 0.58 is actually quite respectable for soil spectroscopy, especially for a physical property. Consider using a more neutral term like "moderate" instead of "small".
Figure 4: The plots are excellent and very informative. Please ensure all axis labels are clearly visible in the final version. The unit for labile C (mg kg⁻¹) is cut off in the provided preprint.
Usability of the Austrian NIR Soil Spectral Library:
L222-223: "the predictive quality is currently insufficient compared to routine laboratory analyses." This is a key conclusion. It would be helpful to provide a specific threshold or benchmark the authors have in mind for "sufficient" accuracy (e.g., RPD > 2, or a required SEP for practical application).
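For reference, the sketch below shows one common way to compute the statistics mentioned in this comment. The function name and the five sample values are hypothetical and purely illustrative. SEP is taken here as the bias-corrected standard deviation of the residuals and R² as 1 − SS_res/SS_tot; the authors' software may use different definitions, which is a further reason to state them explicitly.

```python
import math
import statistics


def prediction_metrics(reference, predicted):
    """Return bias, SEP, RMSEP, R2 and RPD for paired reference/predicted values.

    Assumed definitions (conventions differ between software packages):
      SEP   = bias-corrected standard deviation of residuals (n - 1 denominator)
      RMSEP = root mean squared error of prediction
      R2    = 1 - SS_res / SS_tot
      RPD   = sample standard deviation of the reference values / SEP
    """
    n = len(reference)
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = sum(residuals) / n
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    rmsep = math.sqrt(sum(e ** 2 for e in residuals) / n)
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    ss_res = sum(e ** 2 for e in residuals)
    r2 = 1.0 - ss_res / ss_tot
    rpd = statistics.stdev(reference) / sep
    return {"bias": bias, "sep": sep, "rmsep": rmsep, "r2": r2, "rpd": rpd}


# Hypothetical laboratory values and model predictions (illustration only).
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 39.0, 48.0]

metrics = prediction_metrics(reference, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting RPD (or a comparable ratio) next to R² and SEP in Table 2 would let readers judge each property against whichever threshold the authors adopt.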
L230-235: The suggestions for improvement are excellent. To make this section even stronger, consider structuring it into a short, bulleted list or a separate paragraph titled "Recommendations for Future Work".
Data availability:
As mentioned above, please add a note here about the lack of precise coordinates due to privacy concerns, acknowledging this as a limitation for certain spatial applications.