the Creative Commons Attribution 4.0 License.
GSSM: A global seamless soil moisture dataset from 1981 to 2022 matching CCI to SMAP with a novel bias correction method
Abstract. Surface soil moisture is vital for Earth's environmental and energy cycles. However, remote sensing soil moisture data that combine a long temporal extent, global seamless spatial coverage, and near-real-time updates are still rare. Here, we provide a global seamless soil moisture dataset from July 1981 to December 2022, produced by matching CCI with SMAP through a novel soil moisture bias correction method (fitting beta CDF matching, BCDF) and filling the gaps in the corrected soil moisture with the XGBoost algorithm and various soil moisture covariates. The new dataset, abbreviated as GSSM, has been validated against in situ observations, the original CCI and SMAP data, and simulated gap areas. Results demonstrate that 1) GSSM has accuracy similar to SMAP, and both are more accurate than the original CCI data when compared with in situ observations at 399 global sites (averaged R=0.72, averaged ubRMSE<0.05); 2) GSSM has global spatial coverage, filling the gaps of the original CCI data through various soil moisture covariates (in artificial gap verification, averaged R>0.86, averaged ubRMSE<0.04); 3) GSSM has the same temporal variation characteristics as the original CCI dataset and can be combined with SMAP to obtain a long-term, near-real-time soil moisture dataset. Thus, GSSM provides long-term and seamless soil moisture data, paving the way for research on environmental disasters and water cycle processes.
Status: open (extended)
RC1: 'Comment on essd-2024-200', Anonymous Referee #1, 15 Oct 2024
The study proposes a new dataset, called GSSM ("Gap-filled S, which first scales ESA CCI COMBINED v8.1 soil moisture against 9km SMAP observations and then fills gaps using an XGBoost machine learning approach with environmental drivers as input. Although gap-filling remotely sensed (soil moisture) data can be useful for several applications, and developing enhanced gap-filling approaches is an important field of research, the proposed methodology and evaluation do not allow any well-founded conclusions to be drawn on the quality of the dataset.
First of all, the authors do not seem to be familiar with the data they are trying to gap-fill and hence the motivation of the study is unsound. As described in the product documentation, ESA CCI v8.1 already assimilates SMAP observations (e.g. https://catalogue.ceda.ac.uk/uuid/ff890589c21f4033803aa550f52c980c/). Instead, the authors write (line 80) that "... SMAP data have the potential to be integrated into existing long-term ESA CCI products to form a more reliable and useful product...".
Most of the paper (the methodology, results, and discussion sections alike) is about the bias correction between ESA CCI and SMAP. First of all, the justification of this step is unclear: is it because you want to provide ESA CCI in the climatology of the 9km NASA SMAP product? As mentioned before, SMAP is already used in ESA CCI. Second, the role of daily versus monthly data is unintelligible: is the bias correction based on monthly resampled data? If so, how was this resampling done? Or was the bias correction based on monthly data and the scaling functions then applied to daily data, as suggested in line 165? Note that monthly data are much smoother and have fewer extreme values than daily data, so one cannot easily transfer bias correction functions from monthly to daily data. This is all very confusing.
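The monthly-versus-daily concern can be illustrated with a minimal sketch on synthetic data. It uses a simple mean/standard-deviation rescaling as a stand-in for CDF matching (it is not the BCDF method of the manuscript), and all series, noise levels, and function names are hypothetical. A scaling function fitted on monthly means and then applied to daily values inflates the daily variability, because the monthly averages have suppressed the day-to-day noise of the source series.

```python
import math
import random
import statistics


def monthly_means(daily, days_per_month=30):
    """Average a daily series over fixed 30-day blocks (simplified 'months')."""
    return [statistics.fmean(daily[i:i + days_per_month])
            for i in range(0, len(daily), days_per_month)]


def fit_linear_scaling(src, ref):
    """Return (gain, offset) that match the mean and std of src to those of ref."""
    gain = statistics.pstdev(ref) / statistics.pstdev(src)
    offset = statistics.fmean(ref) - gain * statistics.fmean(src)
    return gain, offset


rng = random.Random(0)
n_days = 360 * 5

# Shared seasonal signal (synthetic, in volumetric units).
season = [0.25 + 0.08 * math.sin(2 * math.pi * d / 360) for d in range(n_days)]

# Hypothetical source product: damped seasonality, large day-to-day noise.
src_daily = [0.05 + 0.5 * s + rng.gauss(0, 0.04) for s in season]
# Hypothetical reference product: full seasonality, small day-to-day noise.
ref_daily = [s + rng.gauss(0, 0.01) for s in season]

# Scaling fitted on monthly means versus scaling fitted on daily values.
gain_m, off_m = fit_linear_scaling(monthly_means(src_daily), monthly_means(ref_daily))
gain_d, off_d = fit_linear_scaling(src_daily, ref_daily)

scaled_with_monthly = [gain_m * x + off_m for x in src_daily]
scaled_with_daily = [gain_d * x + off_d for x in src_daily]

std_ref = statistics.pstdev(ref_daily)
std_monthly_fit = statistics.pstdev(scaled_with_monthly)
std_daily_fit = statistics.pstdev(scaled_with_daily)

print(f"reference daily std:            {std_ref:.4f}")
print(f"daily std, monthly-fitted gain: {std_monthly_fit:.4f}")
print(f"daily std, daily-fitted gain:   {std_daily_fit:.4f}")
```

In this toy setting only the daily-fitted scaling reproduces the reference daily variability; how strongly the effect appears in real data depends on the noise characteristics of both products.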
The success of the bias correction is assessed by validating the native and the bias-corrected CCI data against SMAP data, which previously served as the scaling reference. No wonder that scores like the bias and the RMSE improve (e.g. Fig. 5). You basically compare SMAP with SMAP. The fact that the R and ubRMSE do not change with scaling tells you that the scaling itself does not significantly improve the dataset. As a consequence, the entire section can be considered redundant.
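A minimal sketch on synthetic data shows why validating against the scaling reference is uninformative for bias and RMSE. It again assumes a simple mean/standard-deviation rescaling rather than the manuscript's BCDF method, and all series and names are hypothetical. Under such a linear rescaling the Pearson R is unchanged by construction and the bias against the reference vanishes by construction; the unbiased RMSE follows from ubRMSE² = RMSE² − bias².

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bias(x, ref):
    """Mean difference between x and the reference."""
    return statistics.fmean(x) - statistics.fmean(ref)


def rmse(x, ref):
    """Root-mean-square difference between x and the reference."""
    return math.sqrt(statistics.fmean([(a - b) ** 2 for a, b in zip(x, ref)]))


def ubrmse(x, ref):
    """Unbiased RMSE, using ubRMSE^2 = RMSE^2 - bias^2."""
    return math.sqrt(max(rmse(x, ref) ** 2 - bias(x, ref) ** 2, 0.0))


rng = random.Random(42)
truth = [rng.uniform(0.05, 0.45) for _ in range(2000)]

# Hypothetical reference and source products observing the same truth.
ref = [t + rng.gauss(0, 0.02) for t in truth]
src = [0.15 + 0.6 * t + rng.gauss(0, 0.02) for t in truth]

# Rescale the source so that its mean and std match the reference.
gain = statistics.pstdev(ref) / statistics.pstdev(src)
offset = statistics.fmean(ref) - gain * statistics.fmean(src)
scaled = [gain * s + offset for s in src]

r_before, r_after = pearson_r(src, ref), pearson_r(scaled, ref)
bias_before, bias_after = bias(src, ref), bias(scaled, ref)
rmse_before, rmse_after = rmse(src, ref), rmse(scaled, ref)

print(f"R:    {r_before:.4f} -> {r_after:.4f}")
print(f"bias: {bias_before:.4f} -> {bias_after:.4f}")
print(f"RMSE: {rmse_before:.4f} -> {rmse_after:.4f}")
```

An informative assessment would instead compare both versions against data that played no role in the scaling, such as in situ observations.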
Why are some results assessed only for selected regions, and why are some firm conclusions (e.g. on the performance of the various CDF-matching implementations) based on only five points worldwide?
line 165: step 4 in the methodology mentions the application of a post-processing freeze/thaw masking to the gap-filled data. Why? One of the main reasons for the initial gaps in ESA CCI is the flagging of spurious retrievals under frozen conditions. This is why the data are masked for such events, so the gaps are there for a good reason. To me, reintroducing gaps after all the gap-filling effort is questionable.
lines 201-202: if there is only one observation pair between ESA CCI and SMAP, you use a nearest neighbour for scaling. But if you have 2 or 3 or another limited number of observations, how reliable is your standard deviation in those cases?
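The sample-size question can be made concrete with a small Monte Carlo sketch. It assumes independent Gaussian values with a known standard deviation, which is an idealisation; the function name and parameter values are hypothetical. The function returns the scatter of the sample standard deviation across many trials, relative to the true value.

```python
import random
import statistics


def std_estimate_spread(n_pairs, true_std=0.05, n_trials=5000, seed=0):
    """Relative scatter of the sample std when only n_pairs values are available."""
    rng = random.Random(seed)
    estimates = [
        statistics.stdev([rng.gauss(0.25, true_std) for _ in range(n_pairs)])
        for _ in range(n_trials)
    ]
    return statistics.pstdev(estimates) / true_std


# Scatter of the estimated std for very small and moderately large samples.
spread = {n: std_estimate_spread(n) for n in (2, 3, 30)}
for n, value in spread.items():
    print(f"n = {n:2d}: relative scatter of estimated std = {value:.2f}")
```

With two or three values the estimate scatters by roughly half of the true standard deviation in this idealised setting, which suggests that a minimum sample size for the scaling parameters should be stated and justified.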
The data sections are very short, hardly any dataset characteristics are given, e.g. what sensors were used in ESA CCI, what flags were applied, what sort of gaps are found and what are their causes? A more careful examination of the input dataset characteristics could have prevented many of the deficient analyses made and conclusions drawn in this paper.
A proper validation of the gap-filling performance, which should be the core of the paper, is entirely lacking. In section 3.4 you perform a presumed validation of the gap-filling method, but this approach is incorrect: from the scaled and gap-filled data (you call them "Original values") you remove some regions and then use XGBoost again to predict these removed areas ("Predicted values"). Next, you compare the "Original values" with the "Predicted values". Obviously, it is not surprising that these values correspond very well, as you basically assess how well XGBoost repredicts the predicted values. This is not a sound validation of the accuracy of the filled gaps, and it is not surprising at all that the scores are so high.
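The difference between an independent hold-out test and a circular one can be sketched as follows. This is a toy illustration on synthetic data: a one-covariate least-squares line stands in for XGBoost, the covariate and all names are hypothetical, and the sketch does not claim to reproduce the manuscript's exact procedure.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def rmse(pred, obs):
    """Root-mean-square difference between predictions and targets."""
    return math.sqrt(statistics.fmean([(p - o) ** 2 for p, o in zip(pred, obs)]))


rng = random.Random(1)
n = 500
covariate = [rng.uniform(0, 1) for _ in range(n)]  # hypothetical predictor
observed = [0.1 + 0.3 * c + rng.gauss(0, 0.03) for c in covariate]  # real retrievals

# Withhold 100 real observations as artificial gaps; train on the rest.
idx = list(range(n))
rng.shuffle(idx)
held_out, train = idx[:100], idx[100:]

slope, intercept = fit_line([covariate[i] for i in train],
                            [observed[i] for i in train])
predicted = [slope * covariate[i] + intercept for i in held_out]

# Independent test: predictions versus the withheld real observations.
rmse_independent = rmse(predicted, [observed[i] for i in held_out])

# Circular test: treat model output as the "original", refit, and repredict it.
model_filled = [slope * c + intercept for c in covariate]
slope2, intercept2 = fit_line([covariate[i] for i in train],
                              [model_filled[i] for i in train])
repredicted = [slope2 * covariate[i] + intercept2 for i in held_out]
rmse_circular = rmse(repredicted, [model_filled[i] for i in held_out])

print(f"RMSE against withheld observations: {rmse_independent:.4f}")
print(f"RMSE against model-filled values:   {rmse_circular:.2e}")
```

In the circular case the error collapses to numerical noise because the model only reproduces its own output, whereas the independent test recovers an error close to the observation noise.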
The authors spend many words on a very generic introduction, e.g. about microwave sensors that are available for the retrieval of soil moisture. Besides, the introduction contains several false claims, e.g. lines 43-44: "there are three methods to obtain high-accuracy soil moisture data with global seamless spatial characteristics ... traditional ground-based measurements ..." -> In situ data are not seamless at all. Line 58: it is claimed that remote sensing "...has become the most promising way to obtain data in long-term series, near-real-time, and high spatial coverage", yet most studies still show the superiority of reanalysis data over remote sensing soil moisture.
Line 73: CCI data are updated only once a year through ESA, that's correct. However, C3S is responsible for its operational production and regular update, every 10 days.
The individual networks of the ISMN should all be properly acknowledged.
The discussion section mostly presents new results, not a discussion.
Citation: https://doi.org/10.5194/essd-2024-200-RC1
Data sets
GSSM: A global long term seamless soil moisture dataset (1981-2022) Hao Sun and Yunjia Wang https://data.tpdc.ac.cn/en/disallow/0f28a9b5-92eb-470a-80fe-472aa50a136f
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
465 | 97 | 22 | 584 | 18 | 20 |