the Creative Commons Attribution 4.0 License.
An 8-day composited 36 km SMAP soil moisture dataset from 1979 to 2015 produced using a random forest and historical CCI data
Abstract. Soil moisture (SM) plays a significant role in many natural and anthropogenic systems that are essential to supporting life on Earth. Accurate measurement and assessment of global changes in soil moisture, including long-term historical assessment, is therefore of great value. Because the on-board cycles and detailed parameters of the various sensors differ, the European Space Agency established the Climate Change Initiative (CCI) program to harmonize the available multisource SM data, producing long time-series surface SM datasets from 1978 to the present. However, the Soil Moisture Active Passive (SMAP) mission, launched in 2015, has shown more satisfactory performance both in spatial accuracy and in capturing patterns of temporal change. In this paper, a random forest (RF) model is proposed to extend the superior SMAP dataset back in time using the corresponding CCI time series; the extended dataset is named RF_SMAP. We assumed that the temporal changes in the SMAP dataset are generally similar to those in the available CCI dataset. Accordingly, the RF model was constructed using the CCI SM v05.2 data and then transferred to predict the RF_SMAP dataset. The available in-situ SM data and the real SMAP data from 2015 to 2019 were used as references to validate the predicted RF_SMAP data. Compared with the CCI dataset, the predicted RF_SMAP dataset was found to be closer to both the in-situ SM data and the real SMAP data. Thus, the RF_SMAP dataset was shown to be a reliable substitute for the historical CCI dataset. The new long time-series RF_SMAP dataset, which will be available to download, will be of great value for research and applications such as climate assessment, agricultural planning, food insecurity monitoring, and drought assessment and monitoring.
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on essd-2022-137', Anonymous Referee #1, 26 Jul 2022
The authors of this study used a random forest model combined with historical CCI soil moisture estimates from multiple satellites and sensors to incorporate the strengths of SMAP soil moisture data from 2015, producing an 8-day composite, 36 km SMAP-like soil moisture data set from 1979. Overall, this study has the potential to provide an updated global soil moisture data set for a range of research and applications. However, the proposed approach rests on an important assumption: that the temporal variability of the CCI soil moisture estimates is similar to that of the SMAP data. This should be fully investigated, because these are two different sources of information. The authors may want to seriously consider the spatiotemporal consistency among the different data sets prior to conducting such an analysis. Second, there is no report of statistical metrics comparing the random forest-based SMAP estimates with in situ measurements. Readers may want to know whether these estimates achieve a basic acceptable accuracy, in terms of an overall error of less than 0.04 m3 m-3 for the volumetric soil moisture estimates. I hope the authors have a chance to rethink this study and have more time to redo the analysis prior to resubmission.
Citation: https://doi.org/10.5194/essd-2022-137-RC1
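The statistical comparison requested in RC1 could be sketched as follows. This is a minimal illustration, not the authors' validation code: the function name `validation_metrics` and the sample values are hypothetical, and the referee does not say which error metric the 0.04 m3 m-3 figure refers to. The sketch reports both RMSE and the unbiased RMSE (ubRMSE), on the assumption that ubRMSE is the intended metric, since it is the one commonly quoted against that threshold for SMAP.

```python
import math


def validation_metrics(estimates, observations):
    """Bias, RMSE, ubRMSE and Pearson r for paired soil moisture values (m3 m-3).

    Pairs in which either value is None (a missing retrieval or a gap in the
    in situ record) are skipped.
    """
    pairs = [(e, o) for e, o in zip(estimates, observations)
             if e is not None and o is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two valid pairs")
    n = len(pairs)
    mean_e = sum(e for e, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n

    bias = mean_e - mean_o
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in pairs) / n)
    # ubRMSE removes the mean bias: ubRMSE^2 = RMSE^2 - bias^2
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))

    cov = sum((e - mean_e) * (o - mean_o) for e, o in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_e > 0 and var_o > 0:
        r = cov / math.sqrt(var_e * var_o)
    else:
        r = float("nan")
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}


# Hypothetical 8-day composites at one grid cell (not real data).
estimates = [0.12, 0.18, 0.25, 0.31, None, 0.22]
observations = [0.10, 0.15, 0.24, 0.27, 0.20, 0.21]

metrics = validation_metrics(estimates, observations)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print("meets 0.04 m3 m-3 target (ubRMSE):", metrics["ubrmse"] < 0.04)
```

Reporting these numbers per station or per climate zone, rather than as one global figure, would speak more directly to the referee's concern.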
RC2: 'Review of "An 8-day composited 36 km SMAP soil moisture dataset from 1979 to 2015 produced using a random forest and historical CCI data" by Haoxuan Yang et al. submitted to ESSD', Anonymous Referee #2, 18 Sep 2022
Yang et al. present a study recalculating global soil moisture data of the Soil Moisture Active Passive mission from data of the Climate Change Initiative programme and the International Soil Moisture Network by means of a Random Forest. The resulting dataset is intended to extend back to 1979 at a resolution of 36 km, as 8-day composites.
In general, such a dataset appears to be a valuable and worthwhile aim of a study. However, the applied methods fail to address substantial questions about the validity of the data. Most fundamentally, the approach assumes that the barely five years of data overlap between 2015 and 2019 can describe the global relationship for 1979 to 2015, although this is exactly the period in which global change starts to become traceable in data, and although global change is known to happen non-uniformly across the globe. As such, validation of the derived data requires more detailed analyses than the rather rough screening presented here. I would not expect overall RMSE statistics to be applicable for the desired outcome.
Given the large amount of data in the ISMN database, the seasonality at most locations already providing very simple first-order dynamics, and the very broad generalisation of a single soil moisture value over a 36 km grid cell, I would be very interested in the ability of the model to reproduce deviations from the overall patterns. Since the data are spatially and temporally dependent/correlated at least to some degree, maybe LSTMs or other sorts of machine learning are more appropriate (cf. Fang et al. 2017, https://doi.org/10.1002/2017GL075619; Abbes et al. 2019, https://doi.org/10.1109/IGARSS.2019.8898418; Breen et al., https://doi.org/10.3390/make2030016; Zang et al. 2022, https://doi.org/10.1080/10106049.2022.2105406)?
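One way to test whether a product reproduces deviations from the seasonal pattern, rather than only the seasonal cycle itself, is to correlate anomalies instead of raw values. The sketch below is an illustration under stated assumptions and is not taken from the manuscript or the review: the helper names (`anomalies`, `pearson`, `anomaly_correlation`), the use of a per-period mean as the climatology, and the synthetic series are all hypothetical.

```python
import math
from collections import defaultdict


def anomalies(values, periods):
    """Subtract the multi-year mean of each composite period from its values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, p in zip(values, periods):
        sums[p] += v
        counts[p] += 1
    climatology = {p: sums[p] / counts[p] for p in sums}
    return [v - climatology[p] for v, p in zip(values, periods)]


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def anomaly_correlation(estimates, observations, periods):
    """Correlation after removing each series' own seasonal climatology."""
    return pearson(anomalies(estimates, periods),
                   anomalies(observations, periods))


# Synthetic example: three years, four composite periods per year.
seasonal = [0.10, 0.20, 0.30, 0.20]
obs_offsets = [0.03, -0.03, 0.0]   # wet year, dry year, normal year
est_offsets = [-0.01, 0.01, 0.0]   # estimate gets the year-to-year sign wrong

periods, observations, estimates = [], [], []
for obs_off, est_off in zip(obs_offsets, est_offsets):
    for p, s in enumerate(seasonal):
        periods.append(p)
        observations.append(s + obs_off)
        estimates.append(s + est_off)

raw_r = pearson(estimates, observations)
anom_r = anomaly_correlation(estimates, observations, periods)
print(f"raw correlation:     {raw_r:.3f}")
print(f"anomaly correlation: {anom_r:.3f}")
```

In this synthetic case the raw correlation is high because both series share the seasonal cycle, while the anomaly correlation is strongly negative, which is the kind of failure an overall RMSE or raw correlation would hide.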
Moreover, there are already a number of global soil moisture products derived by machine learning. Among others, there are Sungmin and Orth 2021 (https://doi.org/10.1038/s41597-021-00964-1) at 0.25 degree resolution from 2000 to 2019 and Martens et al. 2017 (https://doi.org/10.1080/10106049.2022.2105406) ranging back to 1980 (GLEAM v3.6a). Given that these examples use very different and likely more sophisticated approaches, the authors should state clearly what advances their dataset provides. Hence, in addition to the evaluation stated above, this opens a second area of analysis, which has not been addressed yet.
With substantial deficits in these three domains (extrapolation from 5 years to a non-stationary system, evaluation of performance beyond mean characteristics of climate zones, evaluation against other soil moisture products), the manuscript and the data deserve fundamental revisions before publication.
Citation: https://doi.org/10.5194/essd-2022-137-RC2
Data sets
The 8-day composited 36 km SMAP soil moisture dataset from 1979 to 2015. Haoxuan Yang, Qunming Wang, Wei Zhao, and Peter M. Atkinson. https://doi.org/10.6084/m9.figshare.17621765
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
906 | 193 | 52 | 1,151 | 53 | 53 |