the Creative Commons Attribution 4.0 License.
GSSM-10 (Global 10-m Surface Soil Moisture) Derived from Multi-Sensor Data and Ensemble Learning
Abstract. Currently available satellite-based soil moisture monitoring systems do not meet the spatial resolution requirements of a wide range of applications. This limitation is particularly critical for agricultural water management, risk assessment for extreme events, and hydrological modeling. This work addresses the spatial limitations of satellite soil moisture remote sensing by developing GSSM-10, a global 10 m resolution surface soil moisture (SSM) dataset derived from multi-sensor data integrated within an ensemble machine learning framework. The input datasets span diverse data types (active microwave, multispectral, thermal infrared, and land elevation), providing a robust and comprehensive basis for estimating SSM. The ensemble model combines TabNet, Random Forest (RF), and Extreme Gradient Boosting (XGBoost) and was trained on ground-truth data from the International Soil Moisture Network (ISMN). In 5-fold cross-validation, the ensemble model achieved an R² of 0.8344, a bias of –0.0001 m³/m³, an RMSE of 0.0433 m³/m³, and an ubRMSE of 0.0433 m³/m³. On a held-out test set, it achieved similar accuracy, with an R² of 0.8591, a bias of –0.0002 m³/m³, and an RMSE and ubRMSE of 0.0401 m³/m³. An interactive web platform has been developed for data access, visualization, and download, enabling broad adoption by researchers, practitioners, and policymakers. By providing globally consistent, high-resolution SSM estimates, GSSM-10 represents a significant advancement in satellite-based soil moisture monitoring for environmental and agricultural applications.
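The four accuracy metrics quoted in the abstract are related: for paired estimates and in situ observations, ubRMSE² = RMSE² − bias², so with a bias of only –0.0001 m³/m³ the RMSE and ubRMSE coincide to four decimals. A minimal sketch of the standard metric definitions in Python follows; it is not the authors' evaluation code, and the function name `ssm_metrics` is hypothetical.

```python
import math


def ssm_metrics(obs, est):
    """Return (R^2, bias, RMSE, ubRMSE) for paired soil moisture values in m3/m3."""
    n = len(obs)
    resid = [e - o for o, e in zip(obs, est)]        # estimate minus observation
    bias = sum(resid) / n                            # mean error
    rmse = math.sqrt(sum(r * r for r in resid) / n)  # root-mean-square error
    # Unbiased RMSE removes the mean offset: ubRMSE^2 = RMSE^2 - bias^2
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    mean_obs = sum(obs) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return r2, bias, rmse, ubrmse


# With the cross-validation numbers from the abstract, the bias barely changes the RMSE:
print(round(math.sqrt(0.0433 ** 2 - 0.0001 ** 2), 4))  # 0.0433
```

Because the bias is two orders of magnitude smaller than the RMSE, nearly all of the reported error is random rather than systematic.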
Status: open (until 22 Nov 2025)
RC1: 'Comment on essd-2025-511', Anonymous Referee #1, 16 Oct 2025
AC1: 'Reply on RC1', Nuo Xu, 17 Oct 2025
We sincerely appreciate the reviewer’s insightful comments. The global 10-m soil moisture dataset we developed is indeed too large to be presented as a single figure within the manuscript. Moreover, our product is generated dynamically: wherever valid satellite observations exist for a given date, soil moisture can be estimated at that location. However, since satellite coverage and cloud conditions vary, it is not possible to produce a truly global map for a single date. Therefore, we designed a real-time interactive tool that allows users to click any location on a map, choose a specific date, and instantly generate the corresponding surface soil moisture map.
Regarding the online platform, due to funding limitations, we have not yet been able to purchase cloud server resources. Given the extended peer-review process, we have temporarily released the complete website source code to ensure transparency and reproducibility. Once the cloud server is acquired, we will immediately deploy the platform online and make it publicly accessible. We expect to complete this deployment within the next month (November).
We also thank the reviewer for the helpful suggestions regarding the manuscript text:
- We will add explicit information on the temporal coverage and spatial resolution of the dataset in the Abstract.
- We will include a brief mention of SAR-based soil moisture retrieval methods on page 2, lines 15–40, to provide better context on high-resolution approaches.
- We will correct the spacing issue noted on page 2, line 35.
All these textual revisions will be incorporated in the next revision of the manuscript.
Citation: https://doi.org/10.5194/essd-2025-511-AC1
CC1: 'Comment on essd-2025-511', Yi Yu, 21 Oct 2025
Dear Xu et al.,
I am a postdoc working on ecohydrology and remote sensing. I have been quite interested in this work, so please bear with me. However, I share the same concern as R1: I cannot see a "true" global SSM map as promised in this work. The Zenodo link just gives the code and the trained model. I acknowledge that your reply to R1 sounds plausible: it is almost impossible to provide such a 10 m global SSM map (over the long term) due to the unrealistically large data volume. However, should this work then be published as a "method paper" rather than a "data paper"? To me, your proposed online platform could be a promising workflow used by many users worldwide, much as many researchers highlight their methods as executable workflows through GEE or other accessible platforms.
Nonetheless, I look forward to the deployment of your proposed platform in Nov 2025, which could be a valuable contribution to the soil moisture community. Best of luck.
Citation: https://doi.org/10.5194/essd-2025-511-CC1
AC2: 'Reply on CC1', Nuo Xu, 30 Oct 2025
We sincerely thank you for your encouraging comments. You have raised a point regarding the distinction between a data paper and a method paper. Our primary goal is to publish a data paper, because the presented product, the 10 m resolution surface soil moisture dataset, is a fully trained and operationally ready data product. The current version enables global-scale generation of soil moisture maps at any desired location and date where valid satellite observations are available.
We truly appreciate your recognition of the platform’s potential and share your enthusiasm for its usefulness to the soil moisture and ecohydrology community. As mentioned, we expect to deploy the online platform in November 2025, and it will be freely accessible for researchers worldwide.
Citation: https://doi.org/10.5194/essd-2025-511-AC2
RC2: 'Comment on essd-2025-511', Anonymous Referee #2, 24 Oct 2025
OVERVIEW
The paper describes a machine learning algorithm to obtain soil moisture from remote sensing and elevation data at high resolution.
GENERAL COMMENTS
The paper is fairly well written but unclear in some parts. The topic is undoubtedly relevant to the readership of ESSD. The proposed algorithm appears to be a promising way of obtaining high-resolution soil moisture data, but I emphasise the word 'promising'. Indeed, I have several major issues with the paper (I have indicated the importance of each comment).
- MAJOR: The paper has two major issues with the title. Firstly, it does not show a global soil moisture dataset, and secondly, it does not show a dataset with a resolution of 10 m. The first issue has already been raised by previous reviewers and I believe the term 'global' should be removed from the title and text. The second point is, to me, even more scientifically relevant. A recent paper discussing this issue can be found here: Brocca et al. (2024, https://doi.org/10.1016/j.scitotenv.2024.174087). Providing soil moisture data at a resolution of 10 m does not necessarily mean that the actual resolution of the data is 10 m; this must be tested, but this has not been done in the paper. Figure 5 shows some high-resolution images, but there is no evidence that the obtained soil moisture data are representative of a spatial resolution of 10 m.
Therefore, in my opinion, the title of the paper must be changed.
- MAJOR: In relation to the point regarding the actual spatial resolution of the data, the actual resolution of some of the datasets mentioned in the introduction is not 1 km, as shown in Brocca et al. (2024). Similarly, the 30-m resolution of the Vergopolan et al. (2021) dataset is incorrect. These aspects should be described more critically in order to provide the readership with a scientifically reliable assessment of the currently available soil moisture datasets and their actual spatial and temporal resolution.
- MODERATE: The reason for considering the water cloud model is unclear to me. I would expect the deep learning techniques to identify the relationship between Sentinel-1 sigma zero data and soil moisture on their own. Did the authors try omitting the water cloud model? What were the results?
- MODERATE: Figure 4 is barely readable and should be improved. Moreover, I didn’t expect to find the “elevation” and “location” features ranked so high in the feature importance analysis. I believe the authors should also show time series of the retrieved and observed soil moisture data to really understand the quality of the proposed algorithm.
- MAJOR: In the comparison with SMAP-HydroBlocks it is unclear why the R^2 performance in the original paper of SMAP-HydroBlocks soil moisture is around 0.5, whereas here the performance in terms of R^2 is -0.4253. How is it possible to have negative R^2 values? This analysis should be revised and better described.
- MAJOR: In the comparison with S^2MP the sample size is N=14. Why? Where do the data come from? For which locations? This analysis with only 14 data points is not robust; therefore, it should be removed.
- MAJOR: Section 4 (the application section) should be revised significantly. The 'After-fire assessment' section simply shows two soil moisture maps, one taken before and one taken after a wildfire. What evidence is there to suggest that these maps are accurate? The same applies to the other two analyses: agriculture and flood monitoring. For agriculture, an area where irrigation occurs should be shown alongside actual irrigation data in order to make a valid comparison. In the flood section, it simply shows that, after the heavy rainfall in May, the soil is wetter than it is two months later, after a dry period. I would comment, 'Let's hope.' This analysis, along with its applications, could be crucial in accurately assessing the spatial and temporal resolution of the data. However, actual data on irrigation, flooding or wildfire areas should be considered (see again the examples in Brocca et al., 2024).
As a very positive note, the availability of the source code in Zenodo and GitHub is very important and highly appreciated.
RECOMMENDATION
Based on the above comments, the paper requires significant revision before it can be considered for publication. Its focus should be changed to make it more scientifically sound.
Citation: https://doi.org/10.5194/essd-2025-511-RC2
AC3: 'Reply on RC2', Nuo Xu, 30 Oct 2025
MAJOR: The paper has two major issues with the title. Firstly, it does not show a global soil moisture dataset, and secondly, it does not show a dataset with a resolution of 10 m. The first issue has already been raised by previous reviewers and I believe the term 'global' should be removed from the title and text. The second point is, to me, even more scientifically relevant. A recent paper discussing this issue can be found here: Brocca et al. (2024, https://doi.org/10.1016/j.scitotenv.2024.174087). Providing soil moisture data at a resolution of 10 m does not necessarily mean that the actual resolution of the data is 10 m; this must be tested, but this has not been done in the paper. Figure 5 shows some high-resolution images, but there is no evidence that the obtained soil moisture data are representative of a spatial resolution of 10 m.
Therefore, in my opinion, the title of the paper must be changed.
Response: We acknowledge the reviewers’ comments regarding the “global SSM” terminology, and indeed we are not providing 10 m resolution global SSM maps. Let us elaborate on the challenges that have prevented us from generating global maps similar to the products provided by SMAP and others. First, working at 10 m resolution with multiple satellite datasets and with machine and deep learning models imposes a heavy computational load. Besides, a global 10 m resolution map could not be loaded into any GIS or map visualization tool on a standard computer. The fine spatial scale we adopted (10 m resolution) gave us a clear advantage over existing models, mainly because the in situ soil moisture sensors that most SSM models rely on for calibration and validation provide point measurements. As such, we were able to minimize the spatial variability within single pixel boundaries, resulting in better performance. Unfortunately, this has prevented us from generating large maps due to computational and visualization constraints. In addition, many of the satellite images are acquired at different times; producing a global map from high-resolution satellite images with low temporal resolution would therefore not be meaningful, as the resulting map would not be temporally consistent. In the manuscript, we stressed the usefulness of such a high-resolution model for small-scale applications, such as mapping in-field soil moisture variability for precision agriculture, and for larger-extent applications such as district-level irrigation management and wildfire, flood, and landslide management.
We also fully agree with the reviewer’s comment and with Brocca et al. (2024). This is a common challenge for satellite remote sensing models that rely on point or local ground measurements. Thank you for raising this issue; we will make sure to include this point in our discussion of methodological limitations.
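To give a sense of the data volume mentioned in this response, a back-of-envelope estimate for a single global 10 m layer is sketched below. It assumes a global land area of about 1.489 × 10⁸ km² and one uncompressed float32 value (4 bytes) per pixel; both are illustrative assumptions, not figures from the manuscript, and `global_layer_size` is a hypothetical helper.

```python
def global_layer_size(land_area_km2=1.489e8, pixel_size_m=10.0, bytes_per_pixel=4):
    """Rough pixel count and uncompressed size of one global land layer (illustrative)."""
    pixels_per_km2 = (1000.0 / pixel_size_m) ** 2  # 10,000 pixels per km^2 at 10 m
    n_pixels = land_area_km2 * pixels_per_km2
    size_bytes = n_pixels * bytes_per_pixel
    return n_pixels, size_bytes


n_pixels, size_bytes = global_layer_size()
print(f"{n_pixels:.3e} pixels, {size_bytes / 1e12:.1f} TB per layer")
# 1.489e+12 pixels, 6.0 TB per layer
```

For comparison, the same land area on a 9 km grid needs about 810,000 times fewer pixels, which is why coarse products can be distributed as single global files while a 10 m product cannot easily be.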
MAJOR: In relation to the point regarding the actual spatial resolution of the data, the actual resolution of some of the datasets mentioned in the introduction is not 1 km, as shown in Brocca et al. (2024). Similarly, the 30-m resolution of the Vergopolan et al. (2021) dataset is incorrect. These aspects should be described more critically in order to provide the readership with a scientifically reliable assessment of the currently available soil moisture datasets and their actual spatial and temporal resolution.
Response: We thank the reviewer for raising this important and insightful point. We fully agree that the nominal grid spacing of a dataset does not necessarily represent its actual (effective) spatial resolution, as discussed in Brocca et al. (2024).
In our study, the 10 m sampling resolution corresponds to the native spatial resolution of the Sentinel-1 SAR (VV, VH, incidence angle) and Sentinel-2 optical (B1–B12, NDVI, NDMI) inputs used in the model (see Table 2). These datasets directly provide physical measurements at 10 m scale rather than being downscaled from coarser products. To maintain a uniform spatial grid across all input features, the Landsat-8/9 and DEM (ALOS DSM / SRTM) datasets, originally at 30 m resolution, were resampled to 10 m. This harmonization ensures feature consistency but does not alter the intrinsic resolution of the original data.
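The point that resampling the 30 m inputs onto the 10 m grid adds no spatial detail can be illustrated with a minimal sketch. It assumes nearest-neighbour resampling (the response does not state which method was used), under which each 30 m pixel simply becomes a 3 × 3 block of identical 10 m pixels; the helper name `upsample_nearest` and the input values are hypothetical.

```python
def upsample_nearest(grid, factor=3):
    """Replicate each cell of a 2-D list into a factor x factor block (nearest neighbour)."""
    out = []
    for row in grid:
        wide_row = [value for value in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


landsat_30m = [[290.1, 292.4], [288.7, 295.0]]  # e.g. surface temperature in K (made-up values)
grid_10m = upsample_nearest(landsat_30m)
print(len(grid_10m), len(grid_10m[0]))           # 6 6
# The 10 m grid contains no values that were not already in the 30 m grid.
print({v for row in grid_10m for v in row} == {v for row in landsat_30m for v in row})  # True
```

Bilinear or cubic resampling would produce smoother transitions between blocks, but the interpolated values are still functions of the 30 m data alone, so the intrinsic resolution is unchanged either way.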
We acknowledge that the effective spatial resolution of the retrieved soil moisture may differ from the nominal 10 m, depending on land-surface heterogeneity, vegetation cover, and radar sensitivity. We will explicitly discuss this distinction in the revised manuscript and cite Brocca et al. (2024), emphasizing that while the GSSM-10 dataset provides 10 m sampling resolution, its effective spatial representativeness should be interpreted in the context of sensor physics and spatial autocorrelation of soil and vegetation properties.
We also agree that the “30-m” label for SMAP-HydroBlocks refers to the gridded sampling rather than a demonstrated effective spatial resolution at 30 m. In the revised introduction, we will describe SMAP-HB as a 30-m gridded product downscaled from coarse-footprint SMAP observations using HydroBlocks.
MODERATE: The reason for considering the water cloud model is unclear to me. I would expect the deep learning techniques to identify the relationship between Sentinel-1 sigma zero data and soil moisture on their own. Did the authors try omitting the water cloud model? What were the results?
Response: We thank the reviewer for raising this important point. The inclusion of the Water Cloud Model (WCM) in our approach follows the same rationale as in the S²MP algorithm (El Hajj et al., 2017; Lozac’h et al., 2020). In both cases, the WCM is not used as a separate retrieval model but rather as a physical constraint that accounts for vegetation attenuation and separates the soil and vegetation contributions to the radar backscatter. In S²MP, a synthetic database is first generated using a parameterized WCM coupled with the modified Integral Equation Model (IEM), and this database is then used to train the neural network that estimates soil moisture from Sentinel-1 backscatter, incidence angle, and NDVI inputs.
Similarly, in our framework, the WCM correction helps reduce vegetation-related scattering effects and ensures that the machine learning model learns relationships grounded in the underlying radar–soil–vegetation physics rather than purely statistical correlations. This hybrid design (WCM + ML) improves model generalization across different land-cover conditions and observation dates, as demonstrated in previous S²MP studies and confirmed in our experiments.
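For readers less familiar with the Water Cloud Model, the sketch below shows its commonly used form in linear backscatter units, σ⁰ = A·V·cosθ·(1 − γ²) + γ²·σ⁰_soil with two-way attenuation γ² = exp(−2·B·V / cosθ), and how it can be inverted to estimate the soil contribution. It assumes a single vegetation descriptor V (for example NDVI) used in both vegetation terms. The values of A and B are placeholders, not the NDVI-dependent parameterization of Equations 7 to 10 in the manuscript, and whether GSSM-10 applies the correction exactly this way is not stated here.

```python
import math


def db_to_linear(sigma_db):
    return 10.0 ** (sigma_db / 10.0)


def linear_to_db(sigma_lin):
    return 10.0 * math.log10(sigma_lin)


def _wcm_terms(V, theta_deg, A, B):
    """Two-way attenuation gamma^2 and vegetation backscatter for the WCM."""
    cos_t = math.cos(math.radians(theta_deg))
    gamma2 = math.exp(-2.0 * B * V / cos_t)
    sigma_veg = A * V * cos_t * (1.0 - gamma2)
    return gamma2, sigma_veg


def wcm_forward(sigma_soil, V, theta_deg, A, B):
    """Total backscatter (linear) = vegetation term + attenuated soil term."""
    gamma2, sigma_veg = _wcm_terms(V, theta_deg, A, B)
    return sigma_veg + gamma2 * sigma_soil


def wcm_soil(sigma_total, V, theta_deg, A, B):
    """Invert the WCM for the soil contribution (linear units).

    The result can be <= 0 when the modelled vegetation term exceeds the
    observation, in which case it cannot be converted back to dB.
    """
    gamma2, sigma_veg = _wcm_terms(V, theta_deg, A, B)
    return (sigma_total - sigma_veg) / gamma2


# Placeholder parameters (hypothetical): A = 0.1, B = 0.2, NDVI = 0.5, incidence angle 39 deg
vv_total = db_to_linear(-11.0)
vv_soil = wcm_soil(vv_total, V=0.5, theta_deg=39.0, A=0.1, B=0.2)
print(round(linear_to_db(vv_soil), 2))
```

Note that if A and B are piecewise functions of NDVI, any jump in them propagates directly into the estimated soil term, which is the concern raised by Referee #3 about the discontinuity at NDVI = 0.8.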
MODERATE: Figure 4 is barely readable and should be improved. Moreover, I didn’t expect to find the “elevation” and “location” features ranked so high in the feature importance analysis. I believe the authors should also show time series of the retrieved and observed soil moisture data to really understand the quality of the proposed algorithm.
Response: We thank the reviewer for this valuable suggestion. In the revised manuscript, we will include representative time-series plots comparing the retrieved soil moisture from our model with the in-situ observations from selected ISMN stations. These plots will illustrate how the model captures temporal variations in soil moisture.
MAJOR: In the comparison with SMAP-HydroBlocks it is unclear why the R^2 performance in the original paper of SMAP-HydroBlocks soil moisture is around 0.5, whereas here the performance in terms of R^2 is -0.4253. How is it possible to have negative R^2 values? This analysis should be revised and better described.
Response: We thank the reviewer for carefully noting this point. The negative R² value results from using the scikit-learn r2_score function in Python, which computes the coefficient of determination, 1 − SS_res/SS_tot. This quantity becomes negative whenever the estimates have a larger sum of squared errors than simply predicting the mean of the observations. This does not indicate a computational issue, but rather reflects differences in data resolution, scale, and representativeness between the two datasets at the validation sites.
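A small numerical illustration of how the coefficient of determination (the quantity computed by scikit-learn's `r2_score`) can be negative while the squared Pearson correlation is perfect. The values are made up; the point is that a constant offset between a product and the in situ data is penalized by R² = 1 − SS_res/SS_tot but not by r², so if different studies report different statistics under the name "R²", their values are not directly comparable.

```python
def r2_score_manual(obs, est):
    """Coefficient of determination, 1 - SS_res / SS_tot (same definition as sklearn's r2_score)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r2(obs, est):
    """Squared Pearson correlation coefficient."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov ** 2 / (var_o * var_e)


obs = [0.10, 0.20, 0.30, 0.40]              # in situ SSM, m3/m3 (made-up)
est = [o + 0.15 for o in obs]               # a product with a constant +0.15 m3/m3 offset
print(round(r2_score_manual(obs, est), 3))  # -0.8
print(round(pearson_r2(obs, est), 3))       # 1.0
```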
In our comparison, SMAP-HydroBlocks (SMAP-HB) was evaluated using the same ISMN locations employed for validating GSSM-10. These sites may differ from the original SMAP-HB validation network, which can explain the lower correlation observed here. Our intention is not to question the quality of SMAP-HB, which remains a well-established and physically based product, but to demonstrate that local-scale correspondence can vary when comparing a 10 m sampling product (GSSM-10) with one derived from coarser-scale SMAP observations.
To ensure transparency, we attached an additional table listing the time, location, ISMN observations, GSSM-10 estimates, and SMAP-HB values used in this comparison. In the supplementary table, each ISMN record is paired with the corresponding GSSM-10 and SMAP-HB estimates. For SMAP-HB, multiple soil moisture values may appear for the same date and location because the product consists of seven ensemble members (sample1–sample7), each representing a distinct model realization. To ensure consistency in the comparison, we computed the ensemble mean across these members and used this averaged value in all statistical analyses. We will also revise the manuscript to clearly describe the comparison setup and emphasize that our analysis highlights differences in scale and spatial representativeness, rather than questioning the validity of SMAP-HB itself.
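The pairing described in this response (averaging the seven SMAP-HB ensemble members for each ISMN record before computing statistics) can be sketched as follows. The record layout, station names, dates, and values are hypothetical; only the averaging over sample1 to sample7 follows the response.

```python
from collections import defaultdict
from statistics import fmean


def ensemble_mean_pairs(ismn, gssm10, smap_hb_members):
    """Pair each ISMN record with GSSM-10 and the mean of the SMAP-HB members.

    ismn, gssm10: dict mapping (station, date) -> soil moisture (m3/m3)
    smap_hb_members: iterable of (station, date, member_name, value) rows
    """
    members = defaultdict(list)
    for station, date, _member, value in smap_hb_members:
        members[(station, date)].append(value)
    pairs = []
    for key in sorted(ismn):
        if key in gssm10 and key in members:  # keep only fully matched records
            pairs.append((key, ismn[key], gssm10[key], fmean(members[key])))
    return pairs


ismn = {("STN_A", "2021-06-01"): 0.21}
gssm10 = {("STN_A", "2021-06-01"): 0.23}
rows = [("STN_A", "2021-06-01", f"sample{i}", v)
        for i, v in enumerate([0.18, 0.20, 0.19, 0.22, 0.21, 0.17, 0.23], start=1)]
print(ensemble_mean_pairs(ismn, gssm10, rows))
```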
MAJOR: In the comparison with S^2MP the sample size is N=14. Why? Where do the data come from? For which locations? This analysis with only 14 data points is not robust; therefore, it should be removed.
Response: We thank the reviewer for this comment. The sample size (N = 14) in the comparison with S²MP is small because the S²MP product covers only limited regions. To ensure a fair comparison, we selected only sites where S²MP data, ISMN in-situ measurements, and our 10 m soil moisture estimates were all available for the same dates and locations. These overlap areas are very limited, and the 14 samples represent the maximum number of valid coincidences we could obtain. We acknowledge that this sample size is small, and we will clarify this limitation in the revised manuscript, emphasizing that this analysis serves as a preliminary consistency check rather than a robust statistical evaluation.
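Two points in this exchange can be made concrete. First, the coincidence criterion described in the response (same location and date in S²MP, ISMN, and GSSM-10) amounts to intersecting three keyed datasets. Second, with N = 14 the uncertainty of any correlation statistic is large: using the standard Fisher z-transformation, a hypothetical correlation of r = 0.8 from 14 pairs has an approximate 95 % confidence interval of about 0.47 to 0.93. The dataset keys, values, and the value of r below are illustrative assumptions.

```python
import math


def coincident_keys(*datasets):
    """(station, date) keys present in every dataset."""
    common = set(datasets[0])
    for ds in datasets[1:]:
        common &= set(ds)
    return sorted(common)


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


s2mp = {("S1", "2020-05-01"): 0.25, ("S2", "2020-05-13"): 0.31}
ismn = {("S1", "2020-05-01"): 0.27, ("S3", "2020-05-01"): 0.19}
gssm = {("S1", "2020-05-01"): 0.26, ("S2", "2020-05-13"): 0.30}
print(coincident_keys(s2mp, ismn, gssm))  # [('S1', '2020-05-01')]
lo, hi = pearson_ci(0.8, 14)
print(f"{lo:.2f} to {hi:.2f}")            # 0.47 to 0.93
```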
MAJOR: Section 4 (the application section) should be revised significantly. The 'After-fire assessment' section simply shows two soil moisture maps, one taken before and one taken after a wildfire. What evidence is there to suggest that these maps are accurate? The same applies to the other two analyses: agriculture and flood monitoring. For agriculture, an area where irrigation occurs should be shown alongside actual irrigation data in order to make a valid comparison. In the flood section, it simply shows that, after the heavy rainfall in May, the soil is wetter than it is two months later, after a dry period. I would comment, 'Let's hope.' This analysis, along with its applications, could be crucial in accurately assessing the spatial and temporal resolution of the data. However, actual data on irrigation, flooding or wildfire areas should be considered (see again the examples in Brocca et al., 2024).
Response:
We thank the reviewer for this valuable comment. We agree that the current presentation of Section 4 could be misinterpreted as a validation analysis, while our intention was to provide illustrative examples demonstrating potential applications and visualization of the proposed global 10 m soil moisture dataset under different contexts.
In the revised manuscript, we will clarify this purpose explicitly at the beginning of Section 4 and emphasize that these examples are intended to showcase possible use cases, rather than to quantitatively validate the dataset. We will slightly expand the discussion to describe how these examples reflect the dataset’s ability to capture spatial and temporal changes in soil moisture related to wildfires, irrigation, and flooding. We believe this clarification will better align the section with the objectives of a data paper and the scope of ESSD.
RC4: 'Reply on AC3 - Additional point', Anonymous Referee #2, 30 Oct 2025
In view of having an interactive discussion, I would like to raise a few points.
- The authors replied that it is not possible to obtain a global product due to computational issues. Therefore, the term 'global' should be removed from the title.
- It is the actual resolution of the data that matters, not the sampling. S1 and S2 have a nominal resolution of around 10 m, but it is well known that S1 is highly noisy at this resolution, and that S2 has limited correlation with soil moisture. Therefore, the actual resolution of the dataset should be investigated. If it is much larger (as I expect), the envisaged applications might not be possible.
- How dense are the time series temporally? A time series plot should be provided for comparison with the SMAP-HB and S^2MP products.
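The noise argument in the second point above can be illustrated with a small simulation under the textbook assumption of fully developed speckle, where single-look SAR intensity is exponentially distributed with a coefficient of variation (CV) of 1. Averaging N independent looks or pixels reduces the CV to about 1/√N, so a reliable signal usually requires aggregating many 10 m pixels, which coarsens the effective resolution. The sketch ignores the multilooking already applied in Sentinel-1 GRD processing and any spatial correlation between pixels, so the numbers are only indicative.

```python
import random
import statistics


def speckle_cv(n_looks, n_samples=20000, seed=1):
    """CV of the mean of n_looks independent unit-mean exponential intensities."""
    rng = random.Random(seed)
    values = [
        sum(rng.expovariate(1.0) for _ in range(n_looks)) / n_looks
        for _ in range(n_samples)
    ]
    return statistics.pstdev(values) / statistics.fmean(values)


for n in (1, 4, 25):
    # simulated CV next to the theoretical 1 / sqrt(N)
    print(n, round(speckle_cv(n), 2), round(1 / n ** 0.5, 2))
```

In this idealized setting, reaching a CV of 0.1 would need about 100 independent samples, i.e. roughly a 100 m × 100 m block of independent 10 m pixels.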
Citation: https://doi.org/10.5194/essd-2025-511-RC4
RC3: 'Comment on essd-2025-511', Anonymous Referee #3, 27 Oct 2025
This is a very well written paper describing the application of three different machine learning (ML) models to estimate soil moisture from multiple satellite data sets, a digital terrain model and geographic coordinates. Unfortunately, the scientific value in this work is not clear. Essentially, it is an exercise in fitting ML models to available in situ data from the International Soil Moisture Network (ISMN). Given that the design of the experiment leads to overfitting of the models, the obtained statistical metrics appear impressive at first sight. However, the chosen method and obtained results do not make much physical sense.
Here are my main comments:
- The authors used a randomly selected 80% of the ISMN data for training and 20% for testing. If I interpret this correctly, then every single ISMN station went into model training, making it an almost trivial exercise to estimate soil moisture for the remaining 20% of the data from any station. While one may argue that such an approach is common practice in some ML fields, this is certainly not the case in the field of remote sensing of soil moisture. Many of the available satellite-based soil moisture data sets are even derived without any external soil moisture data sets, focusing on exploiting the physical information content of the data.
- Feature importance is almost exactly opposite to what one would expect based on physical considerations. Backscatter should be more important than vegetation, which in turn should be more important than elevation.
- Including geographic coordinates in the feature space allowed the models to fine tune the absolute SM values to the ISMN stations. While I can understand this was necessary to improve the metrics, what can be learned from this? At least, soil maps, climate data and/or land cover data sets should have been used as features for describing static spatial patterns.
- As already pointed out by the other reviewers, if the global 10m soil moisture data set is announced in the title and abstract then such a data set should also be readily accessible.
- A key argument of the study is that high spatial resolution is of key importance in applications. However, high temporal sampling is even more important for capturing the highly dynamic surface soil moisture field.
- The authors do not explain if all features need to be available in order to estimate soil moisture. If this is the case then the retrieval would be limited to cloud-free conditions. If this is not the case then it is hard to imagine that the resulting time series are consistent (which is a major claim made).
- It is not explained where the parameterization shown in Equations 7 to 10 comes from. Note that all parameters (Avv, Bvv, Avh, and Bvh) have a discontinuity at NDVI = 0.8. This must lead to inconsistencies in space and time, contrary to what the authors claim.
- It is hard to believe that the GSSM-10 data are so much better than the SMAP-HydroBlocks data set, which rests on both SMAP and a powerful land surface model.
- In all applications discussed in Section 4, vegetation seems to be the main driver of the observed changes in the derived soil moisture maps (e.g. in the use case for the fire in Los Angeles County). While in some instances soil moisture and vegetation dynamics may indeed be well correlated, this is not universally the case. For example, after a fire, soil moisture may even go up.
- It is important to distinguish between spatial resolution and spatial sampling. E.g., SMAP data may be sampled at 9 km, but the actual resolution may be closer to the original resolution of the SMAP brightness temperature measurements (about 36 km).
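The distinction between sampling and resolution in the last point can be illustrated with a 1-D toy example: a 2 km wide wet strip observed by an instrument with a 36 km footprint, with the smoothed field then sampled every 9 km. The box-car footprint and all numbers are simplifying assumptions for illustration, not a model of the SMAP antenna pattern.

```python
def boxcar(field, width):
    """Mean over a window of `width` cells centred on each cell (truncated at edges)."""
    half = width // 2
    out = []
    for i in range(len(field)):
        window = field[max(0, i - half): min(len(field), i + half)]
        out.append(sum(window) / len(window))
    return out


field = [0.20] * 200            # 1 km cells, background SSM in m3/m3
field[100] = field[101] = 0.40  # 2 km wide wet strip
observed = boxcar(field, 36)    # 36 km footprint
samples = observed[::9]         # 9 km sampling grid
wet = [(i * 9, round(v, 3)) for i, v in enumerate(samples) if v > 0.2001]
print(wet)  # [(90, 0.211), (99, 0.211), (108, 0.211), (117, 0.211)]
```

Four consecutive 9 km samples report the same, strongly diluted anomaly: the finer sampling grid does not recover any detail that the 36 km footprint has already averaged out.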
Minor comments
3rd paragraph in introduction: So far, there is little evidence that most 1 km soil moisture data sets add significant value over existing coarser-resolution soil moisture data sets. Therefore, there is as yet no basis to write in such generic terms that “These high-resolution global datasets represent significant progress in capturing soil moisture at much finer scales than earlier global products.”
Table 2: What is the effective temporal sampling of Sentinel-2 and Landsat (considering cloud cover)?
Page 6, line 15: Explain the term “dynamic”.
Page 7, lines 22ff: It is not true that “conventional methods” use static WCM parameters. Most studies also use LAI and NDVI to describe dynamic changes in the vegetation cover.
Section 3.3.1: S2MP is a scientific algorithm that is per se not “limited in geographic scope” to those areas mentioned by the authors. Therefore, at a fundamental level, there is no basis to write that “This extensive spatial and temporal coverage makes GSSM-10 more suitable for operational applications in regions where in situ data are sparse and where S2MP is unavailable, thus offering broader utility for global soil moisture monitoring and large-scale environmental assessments.”
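Regarding the Table 2 question on effective temporal sampling, a simple estimate treats each optical overpass as independently cloud-free with probability p, so the expected gap between usable acquisitions is the nominal revisit divided by p. The 5-day figure is the nominal revisit of the two-satellite Sentinel-2 constellation at the equator; the values of p are hypothetical, and real cloud cover is seasonally and spatially correlated, so this is only a first-order sketch.

```python
import random


def expected_clear_gap(revisit_days, p_clear):
    """Expected days between cloud-free acquisitions (geometric waiting time)."""
    return revisit_days / p_clear


def simulated_clear_gap(revisit_days, p_clear, n_overpasses=100000, seed=0):
    """Monte Carlo check: mean spacing of cloud-free overpasses."""
    rng = random.Random(seed)
    clear_days = [k * revisit_days for k in range(n_overpasses) if rng.random() < p_clear]
    gaps = [b - a for a, b in zip(clear_days, clear_days[1:])]
    return sum(gaps) / len(gaps)


for p in (0.8, 0.5, 0.3):
    print(p, round(expected_clear_gap(5, p), 1), round(simulated_clear_gap(5, p), 1))
```

If all features must be present for a retrieval, the joint availability of Sentinel-1, Sentinel-2, and Landsat on matching dates would lengthen these gaps further.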
Citation: https://doi.org/10.5194/essd-2025-511-RC3
AC4: 'Reply on RC3', Nuo Xu, 30 Oct 2025
- The authors used a randomly selected 80% of the ISMN data for training and 20% for testing. If I interpret this correctly, then every single ISMN station went into model training, making it an almost trivial exercise to estimate soil moisture for the remaining 20% of the data from any station. While one may argue that such an approach is common practice in some ML fields, this is certainly not the case in the field of remote sensing of soil moisture. Many of the available satellite-based soil moisture data sets are even derived without any external soil moisture data sets, focusing on exploiting the physical information content of the data.
Response:
We sincerely thank the reviewer for this very important comment. We fully agree that using a purely random train–test split could lead to spatial overlap between samples from the same ISMN stations, potentially resulting in overoptimistic accuracy estimates. To address this, we have reorganized the dataset using a station-based split, ensuring that no station appears in both training and testing sets.
Our original dataset included 699 ISMN stations, and after data quality screening and matching with valid satellite observations, 269 stations remained. Among these, 215 stations (150,338 samples) were used for training and 54 stations (32,826 samples) were reserved exclusively for testing.
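The station-based split described in this response can be sketched as a group-wise partition in which whole stations, not individual samples, are assigned to training or testing. A minimal version in plain Python is shown below, assuming each sample carries a station identifier (the field name `station` and the data are hypothetical); scikit-learn's `GroupShuffleSplit` and `GroupKFold` implement the same idea. With 269 stations and a 20 % test fraction, this yields 54 test and 215 training stations, matching the counts reported in the response.

```python
import random


def station_split(samples, test_frac=0.2, seed=42):
    """Split samples so that no station appears in both training and test sets."""
    stations = sorted({s["station"] for s in samples})
    random.Random(seed).shuffle(stations)
    n_test = round(len(stations) * test_frac)
    test_ids = set(stations[:n_test])
    train = [s for s in samples if s["station"] not in test_ids]
    test = [s for s in samples if s["station"] in test_ids]
    return train, test


# Hypothetical data: 269 stations with 3 samples each
samples = [{"station": f"ST{k:03d}", "ssm": 0.2} for k in range(269) for _ in range(3)]
train, test = station_split(samples)
train_ids = {s["station"] for s in train}
test_ids = {s["station"] for s in test}
print(len(train_ids), len(test_ids), train_ids.isdisjoint(test_ids))  # 215 54 True
```

Splitting by station rather than by sample is what prevents the model from seeing a test station's local offset during training, which is the concern raised by Referee #3.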
The new station-independent evaluation confirms the robustness of our models even on completely unseen locations:
- TabNet: R² = 0.8441, RMSE = 0.0423 m³/m³
- Random Forest: R² = 0.8500, RMSE = 0.0415 m³/m³
- XGBoost: R² = 0.6747, RMSE = 0.0611 m³/m³
- Ensemble: R² = 0.8567, RMSE = 0.0406 m³/m³
These are preliminary results obtained without any additional fine-tuning, yet they remain consistent with the earlier random-split evaluation. This demonstrates that the models are not overfitted and possess strong generalization capability across completely independent ISMN stations. We will include this updated analysis and clarify the station-based validation procedure in the revised manuscript.
- Feature importance is almost exactly opposite to what one would expect based on physical considerations. Backscatter should be more important than vegetation, which in turn should be more important than elevation.
Response:
We thank the reviewer for this insightful comment. We also initially expected backscatter (VV, VH) to be the dominant predictors based on physical understanding, as they are directly sensitive to surface roughness and moisture. However, the feature importance results from our models consistently indicated higher relative contributions from variables such as elevation and geographic coordinates.
This does not necessarily contradict physical reasoning but rather reflects the data-driven nature of ensemble learning models. Elevation and location variables capture large-scale climatic and soil-type gradients that strongly influence the background soil moisture regime, while backscatter and vegetation indices describe short-term variations. Because the ISMN stations are spatially distributed across diverse climatic zones, the model uses elevation and location features to represent broader environmental context, whereas Sentinel-1 backscatter contributes to finer temporal variability.
We will clarify this interpretation in the revised manuscript and explicitly discuss the complementary roles of physical (backscatter-based) and contextual (topographic and geographic) predictors in explaining soil moisture variability.
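One way to check this interpretation on the training data is to split the variance of the in-situ SSM into a between-station part, which any feature that is constant at a station (elevation, latitude, longitude) can in principle explain, and a within-station part, which only time-varying inputs such as VV/VH backscatter can explain. The function below is an illustrative diagnostic, not part of the published workflow; the record layout is an assumption and the four values are synthetic.

```python
from statistics import mean


def variance_partition(records):
    """Split total SSM variance into between-station and within-station parts.

    records: list of (station_id, ssm) pairs. The two parts sum to the
    population variance of all ssm values.
    """
    by_station = {}
    for station, ssm in records:
        by_station.setdefault(station, []).append(ssm)
    grand_mean = mean(ssm for _, ssm in records)
    n = len(records)
    station_means = {s: mean(v) for s, v in by_station.items()}
    # Variance explainable by a per-station constant (e.g. elevation, coordinates)
    between = sum(len(v) * (station_means[s] - grand_mean) ** 2
                  for s, v in by_station.items()) / n
    # Temporal variability around each station's own mean
    within = sum((x - station_means[s]) ** 2
                 for s, v in by_station.items() for x in v) / n
    return between, within


records = [("A", 0.10), ("A", 0.12), ("B", 0.30), ("B", 0.32)]
between, within = variance_partition(records)
print(f"between-station share: {between / (between + within):.2f}")   # 0.99
```

A large between-station share in the ISMN training set would be consistent with static features receiving high importance scores, while a large within-station share would favor the dynamic inputs.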
- Including geographic coordinates in the feature space allowed the models to fine-tune the absolute SM values to the ISMN stations. While I can understand that this was necessary to improve the metrics, what can be learned from this? At a minimum, soil maps, climate data and/or land cover data sets should have been used as features for describing static spatial patterns.
Response: We thank the reviewer for this valuable suggestion. Climate conditions are already partially captured through geographic coordinates and elevation, which ranked highly in the feature importance analysis and implicitly represent large-scale climatic gradients. Regarding land cover, since it varies substantially over time—especially across agricultural regions—we preferred to use vegetation indicators (such as NDVI, NDMI, and spectral bands) that dynamically reflect seasonal vegetation changes, rather than static land-cover maps.
We agree that incorporating soil texture is an excellent suggestion. In the next revision, we will include soil texture (from the SoilGrids dataset), along with climate zone and land cover classes as additional static features to improve the representation of environmental and edaphic conditions. If their resolution and consistency prove sufficient, these datasets will be integrated into the final model; otherwise, we will discuss their limitations and potential influence on model performance in the revised manuscript.
- As already pointed out by the other reviewers, if the global 10m soil moisture data set is announced in the title and abstract then such a data set should also be readily accessible.
Response:
We fully agree that a global 10 m soil moisture dataset should be readily accessible. As explained in our earlier responses, the complete dataset cannot be stored or displayed as a single global file due to its massive data volume and dynamic generation process. Instead, we developed an interactive online platform that allows users to generate and download soil moisture maps for any location and date where valid satellite data are available. The full website code is shared on Zenodo and GitHub for transparency, and the live platform will be publicly available once cloud resources are secured (expected November 2025).
We acknowledge the reviewers’ comments regarding the term “global SSM.” In this study, we do not provide static, wall-to-wall global maps comparable to SMAP, but rather a globally applicable framework that can produce 10 m soil moisture estimates wherever satellite inputs exist. Generating global mosaics at 10 m resolution is computationally prohibitive, as multi-sensor fusion with machine and deep learning involves immense data volumes and asynchronous acquisitions, making temporally consistent global mapping impractical.
The chosen 10 m resolution provides a key advantage for local applications, better matching point-scale in-situ sensors and minimizing sub-pixel variability, but it also limits large-scale map generation. We will clarify these points in the revised manuscript and emphasize that GSSM-10 is primarily intended for high-resolution, regional to field applications such as precision agriculture, irrigation management, and local hazard monitoring rather than for static global products.
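A back-of-the-envelope calculation illustrates the data volume behind this argument. The figures are assumptions: roughly 149 million km² of land surface, a single float32 soil-moisture band, no compression and no overviews.

```python
LAND_AREA_KM2 = 149e6      # approximate global land area in km² (assumption)
PIXEL_AREA_M2 = 10 * 10    # one 10 m x 10 m pixel
BYTES_PER_PIXEL = 4        # float32, one band, uncompressed (assumption)

pixels = LAND_AREA_KM2 * 1e6 / PIXEL_AREA_M2   # km² -> m², then divide by pixel area
terabytes = pixels * BYTES_PER_PIXEL / 1e12
print(f"{pixels:.2e} pixels, ~{terabytes:.1f} TB per global snapshot")
# 1.49e+12 pixels, ~6.0 TB per global snapshot
```

This is about 1.5 × 10¹² pixels and roughly 6 TB for a single date of the output layer alone, before counting the multi-band Sentinel-1, Sentinel-2, Landsat and DEM inputs that must be processed to produce it.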
- A key argument of the study is that a high spatial resolution is of key importance in applications. However, even more important is a high temporal sampling to capture the highly dynamic surface soil moisture field.
- The authors do not explain if all features need to be available in order to estimate soil moisture. If this is the case then the retrieval would be limited to cloud-free conditions. If this is not the case then it is hard to imagine that the resulting time series are consistent (which is a major claim made).
Response (to comments 5 and 6):
We thank the reviewer for these important comments. We agree that while high spatial resolution is valuable for many applications, temporal sampling is equally critical for capturing the highly dynamic nature of surface soil moisture. In our approach, all relevant satellite features must be available to estimate soil moisture for a given location and date. Consequently, retrievals are limited to periods with valid and cloud-free optical observations, which restricts the temporal density of the dataset.
In regions or seasons with frequent cloud cover and rainfall, only a few dates per month—or in some cases none—yield valid soil moisture estimates. Conversely, during clear-sky and dry seasons, the number of valid retrievals can exceed five per month. We will add this clarification in the revised manuscript to explicitly discuss how data availability varies temporally and how this affects the consistency of the time series.
This limitation is inherent to the use of optical sensors. As the reviewer correctly noted, microwave observations have the advantage of operating under cloudy conditions, and we will elaborate on this complementary potential in the Discussion section.
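Because every input must be present, the achievable retrieval calendar is the intersection of the individual sensors' usable dates. The sketch below makes that explicit: it treats Sentinel-1 dates as anchors and allows a tolerance window for matching the optical scenes. Both the anchoring and the hypothetical `max_gap_days` parameter are assumptions for illustration, not the manuscript's matching rule, and the dates are synthetic.

```python
from datetime import date


def valid_retrieval_dates(anchor_dates, other_sources, max_gap_days=0):
    """Anchor dates (e.g. Sentinel-1 passes) on which every other source has a
    usable observation within `max_gap_days`; other_sources is a list of date sets."""
    valid = []
    for d in sorted(anchor_dates):
        if all(any(abs((d - o).days) <= max_gap_days for o in source)
               for source in other_sources):
            valid.append(d)
    return valid


# Synthetic acquisition calendars for one pixel in June 2024
s1 = {date(2024, 6, 1), date(2024, 6, 13), date(2024, 6, 25)}
s2_clear = {date(2024, 6, 1), date(2024, 6, 14), date(2024, 6, 21)}
landsat_clear = {date(2024, 6, 2), date(2024, 6, 26)}

print(valid_retrieval_dates(s1, [s2_clear, landsat_clear], max_gap_days=1))
# [datetime.date(2024, 6, 1)]
```

Widening the tolerance increases the number of retrievals but pairs observations that may not reflect the same surface state, which bears directly on the time-series consistency question raised by the reviewer.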
- It is not explained where the parameterization shown in Equations 7 to 10 comes from. Note that all parameters (Avv, Bvv, Avh, and Bvh) experience a discontinuity at NDVI = 0.8. This must lead to inconsistencies in space and time, again contrary to what the authors claim.
Response:
We thank the reviewer for this valuable comment. The parameterization in Equations (7)–(10) follows the dynamic Water Cloud Model (WCM) formulations proposed by Baghdadi et al. (2017, 2019) and Rawat et al. (2021). These studies established empirical relationships between the vegetation parameters (A and B) and NDVI by calibrating the WCM over various land-cover types using Sentinel-1 data and in-situ measurements. We adopted these published formulations to allow the scattering and attenuation coefficients to vary with vegetation density in a physically consistent way.
We acknowledge the reviewer’s concern about the apparent discontinuity at NDVI = 0.8. In practice, however, the NDVI field is spatially and temporally continuous, and transitions around this threshold occur gradually; thus, no visible discontinuity appears in the retrieved backscatter or soil-moisture maps. The NDVI = 0.8 break simply reflects the empirical regime change between moderate and dense vegetation, as originally defined in Baghdadi et al. (2019).
We will highlight in the revised manuscript that these coefficients are derived from the above references, that the NDVI = 0.8 threshold originates from empirical calibration, and that the resulting backscatter corrections remain spatially and temporally consistent in practice.
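For reference, the standard Water Cloud Model in linear units is σ⁰ = A·V1·cosθ·(1 − T²) + T²·σ⁰_soil, with T² = exp(−2·B·V2 / cosθ). The sketch below assumes NDVI for both V1 and V2 and uses a piecewise (A, B) that switches at NDVI = 0.8. The coefficient values are placeholders chosen only to make the regime change visible; they are not those of Equations (7)–(10) or of the cited studies. The hypothetical `step_at_break` helper is one way to quantify the reviewer's concern: it compares the corrected soil backscatter for the same observation just below and at the threshold.

```python
import math

NDVI_BREAK = 0.8  # regime change between moderate and dense vegetation


def wcm_params(ndvi):
    """Placeholder piecewise (A, B) for VV; NOT the coefficients of Eqs. (7)-(10)."""
    if ndvi < NDVI_BREAK:
        return 0.10 + 0.05 * ndvi, 0.20 + 0.10 * ndvi
    return 0.25, 0.40


def _vegetation_terms(ndvi, theta_deg):
    a, b = wcm_params(ndvi)
    cos_t = math.cos(math.radians(theta_deg))
    t2 = math.exp(-2.0 * b * ndvi / cos_t)       # two-way attenuation, V2 = NDVI
    sigma_veg = a * ndvi * cos_t * (1.0 - t2)    # volume scattering, V1 = NDVI
    return sigma_veg, t2


def wcm_forward(sigma_soil, ndvi, theta_deg):
    """Total backscatter (linear units) = vegetation term + attenuated soil term."""
    sigma_veg, t2 = _vegetation_terms(ndvi, theta_deg)
    return sigma_veg + t2 * sigma_soil


def wcm_soil(sigma_total, ndvi, theta_deg):
    """Invert the WCM to recover the soil backscatter contribution."""
    sigma_veg, t2 = _vegetation_terms(ndvi, theta_deg)
    return (sigma_total - sigma_veg) / t2


def step_at_break(sigma_total, theta_deg, eps=1e-6):
    """Change in retrieved soil backscatter when NDVI crosses the threshold."""
    below = wcm_soil(sigma_total, NDVI_BREAK - eps, theta_deg)
    above = wcm_soil(sigma_total, NDVI_BREAK, theta_deg)
    return above - below
```

With these placeholder coefficients the step is large by construction; evaluating `step_at_break` with the published coefficients over the observed range of σ⁰ and incidence angles would document how large the transition actually is.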
- It is hard to believe that the GSSM-10 data are so much better than the SMAP-HydroBlocks data set, which builds on both SMAP and a powerful land surface model.
Response:
We appreciate the reviewer’s thoughtful comment and understand the skepticism. The higher accuracy of the GSSM-10 product relative to SMAP-HydroBlocks (SMAP-HB) primarily reflects differences in spatial resolution and data representation, rather than any contradiction with the physical soundness of SMAP-HB. SMAP-HB operates on a 30 m grid, derived from downscaled SMAP (~9 km) observations through a land surface modeling framework, whereas GSSM-10 directly integrates multi-sensor data from Sentinel-1, Sentinel-2, Landsat, and terrain features at native 10 m resolution. This allows GSSM-10 to better represent local heterogeneity around ISMN sites, which are inherently point-scale measurements, resulting in higher statistical agreement at those locations.
We emphasize that the purpose of this comparison is not to discredit SMAP-HB or any other model, as each approach has its own strengths and suitable applications. We included SMAP-HB in our comparison precisely because it is a well-established, physically based, microwave-driven dataset, providing an important benchmark for evaluating optical–SAR fusion methods like GSSM-10. The results simply highlight the ongoing challenge of reconciling differences between satellite-derived soil moisture and in-situ point measurements, especially across different spatial scales.
To ensure full transparency, we have added a supplementary table listing the time, location, ISMN in-situ observations, GSSM-10 predictions, and SMAP-HB estimates used in the comparison. In this table, multiple SMAP-HB soil moisture values may appear for the same date and location because the SMAP-HB dataset includes several ensemble members, each representing an independent model realization with slightly different initial or parameter conditions. For the reported comparison, we computed the ensemble mean across these members and used this averaged value in all statistical analyses. We will also clarify these points in the revised manuscript to avoid any misinterpretation and to emphasize the complementary nature of both products.
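The averaging step described above can be sketched as follows. The tuple layout, station IDs and values are hypothetical; the point is that members are averaged per date and location before the mean is paired with the ISMN observation, so each in-situ measurement contributes one comparison regardless of the number of ensemble members.

```python
from collections import defaultdict
from statistics import mean


def ensemble_mean(rows):
    """Average SMAP-HB ensemble members for each (date, station) key.

    rows: iterable of (date, station_id, member_id, ssm) tuples.
    """
    members = defaultdict(list)
    for day, station, _member, ssm in rows:
        members[(day, station)].append(ssm)
    return {key: mean(values) for key, values in members.items()}


def pair_with_ismn(ismn, smap_hb_mean):
    """Align ISMN observations and ensemble means on their shared keys."""
    keys = sorted(ismn.keys() & smap_hb_mean.keys())
    return [ismn[k] for k in keys], [smap_hb_mean[k] for k in keys]


rows = [("2020-05-01", "S01", 1, 0.20), ("2020-05-01", "S01", 2, 0.24),
        ("2020-05-02", "S01", 1, 0.18)]
ismn = {("2020-05-01", "S01"): 0.21, ("2020-05-03", "S01"): 0.19}
obs, est = pair_with_ismn(ismn, ensemble_mean(rows))
```

The paired lists can then be scored with the same metrics used for GSSM-10.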
- In all applications discussed in Section 4, vegetation seems to be the main driver of the observed changes in the derived soil moisture maps (e.g. in the use case for the fire in Los Angeles County). While in some instances soil moisture and vegetation dynamics may indeed be well correlated, this is not universally the case. For example, after a fire, soil moisture may even go up.
Response:
Regarding the application examples in Section 4, we agree that vegetation dynamics can strongly influence surface soil moisture patterns, and that the relationship is not always direct or consistent, particularly in post-fire conditions. The examples presented in our manuscript are intended only as illustrative demonstrations of potential applications of the GSSM-10 dataset, rather than as detailed physical analyses of each process. We acknowledge that post-fire soil moisture dynamics can be complex, and a comprehensive study focusing specifically on wildfire-affected areas would require dedicated investigation, which we plan to explore in future work.
- It is important to distinguish between spatial resolution and spatial sampling. E.g., SMAP data may be sampled at 9 km, but the actual resolution may be closer to the original resolution of the SMAP brightness temperature measurements (about 36 km).
Response:
We also appreciate the reviewer’s clarification on the distinction between spatial resolution and spatial sampling. In our study, the 10 m sampling resolution corresponds to the native spatial resolution of Sentinel-1 SAR (VV, VH, incidence angle) and Sentinel-2 optical (B1–B12, NDVI, NDMI) inputs used in the model (see Table 2). For consistency across all inputs, the Landsat-8/9 and DEM (ALOS DSM / SRTM) datasets, originally at 30 m resolution, were resampled to 10 m to align with the Sentinel grid. This step harmonizes feature alignment but does not artificially increase the effective resolution. We acknowledge that the effective spatial resolution of the retrieved soil moisture may differ from the nominal 10 m depending on surface heterogeneity and sensor sensitivity. In the revised manuscript, we will explicitly discuss this distinction and cite Brocca et al. (2024), clarifying that while our dataset provides 10 m sampling resolution, the actual spatial representativeness of the soil moisture estimates should be interpreted in the context of the input sensors’ physical footprints and the spatial autocorrelation of land-surface properties.
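The point that resampling to the Sentinel grid does not add detail can be seen in the simplest case, nearest-neighbor resampling by a factor of three (30 m to 10 m). The resampling method used in the manuscript is not stated here, so nearest-neighbor is an assumption; bilinear or cubic resampling would produce smoother values but likewise no information at scales finer than 30 m.

```python
def upsample_nearest(grid, factor=3):
    """Replicate each coarse pixel into a factor x factor block (e.g. 30 m -> 10 m)."""
    return [[value for value in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


coarse = [[0.10, 0.20],
          [0.30, 0.40]]              # 2 x 2 grid of 30 m pixels
fine = upsample_nearest(coarse)      # 6 x 6 grid of 10 m pixels
distinct = {v for row in fine for v in row}
print(len(fine), len(fine[0]), len(distinct))   # 6 6 4
```

The 10 m grid has nine times as many pixels, but still only the four independent values of the original 30 m grid.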
Minor comments
- 3rd paragraph in introduction: So far there is little evidence that most 1 km soil moisture data sets add significant value over existing coarser-resolution soil moisture data sets. Therefore, there is as yet no basis to write in such generic terms that “These high-resolution global datasets represent significant progress in capturing soil moisture at much finer scales than earlier global products.”
Response:
We thank the reviewer for this constructive comment. We agree that there is still limited quantitative evidence demonstrating the added value of most existing 1 km soil moisture datasets compared with coarser-resolution products. In the revised manuscript, we will rephrase the statement to more cautiously reflect the current understanding.
When we refer to the “added value” of high-resolution soil moisture datasets, we mainly emphasize their potential application in areas where fine spatial detail is essential—such as agricultural management, where field boundaries and irrigation patterns cannot be effectively captured with resolutions coarser than 1 km. We will clarify this intended meaning in the Introduction to avoid overgeneralization.
- Table 2: What is the effective temporal sampling of Sentinel-2 and Landsat (considering cloud cover)?
Response:
We will add information on the effective temporal sampling of the optical datasets, accounting for cloud contamination. Sentinel-2 has a nominal revisit of 2–3 days at mid-latitudes but typically yields 5–10 clear observations per month, depending on location and cloud conditions, whereas Landsat-8/9, with a combined 8-day nominal cycle, yields 2–4 clear scenes per month in most regions. This clarification will be added to Table 2 and the accompanying text.
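The relationship between nominal revisit and effective sampling is simple arithmetic: acquisitions per month multiplied by the probability that a given acquisition is cloud-free. The clear-sky fractions below are illustrative assumptions chosen to fall within the ranges quoted above, and treating cloud cover as independent between dates is a simplification.

```python
def expected_clear_scenes(revisit_days, clear_fraction, period_days=30):
    """Expected usable acquisitions per period, assuming cloud-free chances are
    independent between acquisitions."""
    return period_days / revisit_days * clear_fraction


# Illustrative clear-sky fractions (assumptions, not measured values)
s2 = expected_clear_scenes(revisit_days=2.5, clear_fraction=0.5)
landsat = expected_clear_scenes(revisit_days=8, clear_fraction=0.6)
print(round(s2, 2), round(landsat, 2))   # 6.0 2.25
```

In practice the clear-sky fraction varies strongly with region and season, which is why effective sampling is best reported as a range per location rather than as a single number.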
- Page 6, line 15: Explain the term “dynamic”.
Response:
We will clarify that the term dynamic refers to the temporal adjustment of Water Cloud Model (WCM) parameters (A and B) according to the vegetation index (NDVI) at each acquisition date, allowing vegetation scattering and attenuation effects to vary with growth stage.
- Page 7, lines 22ff: It is not true that “conventional methods” use static WCM parameters. Most studies also use LAI and NDVI to describe dynamic changes in the vegetation cover.
Response:
We thank the reviewer for this correction. We will revise the text to acknowledge that many previous WCM-based studies also employed dynamic vegetation parameters such as NDVI or LAI. Our intention was to highlight that our implementation uses an explicitly NDVI-dependent empirical formulation rather than fixed coefficients, and we will adjust the wording to reflect this more accurately.
- Section 3.3.1: S2MP is a scientific algorithm that is per se not “limited in geographic scope” to those areas mentioned by the authors. Therefore, at a fundamental level, there is no basis to write that “This extensive spatial and temporal coverage makes GSSM-10 more suitable for operational applications in regions where in situ data are sparse and where S2MP is unavailable, thus offering broader utility for global soil moisture monitoring and large-scale environmental assessments.”
Response:
We agree that S²MP, as an algorithm, is not inherently restricted in geographic scope. Our statement referred to the currently available S²MP products, which are only publicly released for a few regions (parts of France, Morocco, Germany, and the USA). We will clarify this in the revised manuscript by noting that while the algorithm itself is globally applicable, the existing processed datasets have limited geographic availability compared with the global coverage of GSSM-10.
Citation: https://doi.org/10.5194/essd-2025-511-AC4
AC5: 'Reply on RC3', Nuo Xu, 30 Oct 2025
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-511/essd-2025-511-AC5-supplement.zip
Data sets
GSSM-10 (Global 10-m Surface Soil Moisture) Nuo Xu, Andre Daccache, and Arman Ahmadi https://github.com/RSNuo/Global-10-m-Surface-Soil-Moisture-Maps.git
Model code and software
Ensemble Learning (TabNet, Random forest, XGBoost) Nuo Xu, Andre Daccache, and Arman Ahmadi https://github.com/RSNuo/Global-10-m-Surface-Soil-Moisture-Maps.git
Interactive computing environment
Jupyter Notebooks Nuo Xu, Andre Daccache, and Arman Ahmadi https://github.com/RSNuo/Global-10-m-Surface-Soil-Moisture-Maps.git
The motivation of this paper is that the coverage of soil moisture data is presently limited to specific regions. Therefore, there is a need to develop a soil moisture product that combines global coverage with high spatial resolution.
However, after reading the manuscript, I found that this work does not present a complete global 10-meter resolution surface soil moisture dataset. Instead, it mainly describes a data fusion methodology that integrates multiple data sources. The paper only shows a few examples based on Sentinel-2, and I did not find any global-scale maps.
Although the authors mention that “An interactive web platform has been developed for data access, visualization, and download, enabling broad adoption by researchers,” I did not find any evidence of a user-friendly interface. It appears that users may need to run the process themselves, which limits accessibility.
Additional Comments:
Abstract: I suggest including information about the temporal coverage and spatial resolution of the dataset.
Page 2, lines 15–40: It might be helpful to briefly mention SAR-based soil moisture retrieval methods, as they offer higher spatial resolution and could address some of the limitations discussed here.
Page 2, line 35: Please add a space.