An 8-day composited 36 km SMAP soil moisture dataset from 1979 to 2015 produced using a random forest and historical CCI data

Yang, Haoxuan; Wang, Qunming; Zhao, Wei; Atkinson, Peter M.

doi:10.5194/essd-2022-137

Preprints

https://doi.org/10.5194/essd-2022-137

08 Jun 2022

Status: this preprint has been withdrawn by the authors.

Abstract. Soil moisture (SM) plays a significant role in many natural and anthropogenic systems that are essential to supporting life on Earth. Accurate measurement and assessment of changes in SM globally, including long-term historical assessment, is therefore of great value. Because the on-board cycles and detailed parameters of the various sensors differ, the European Space Agency established the Climate Change Initiative (CCI) program to harmonize the available multisource SM data, producing long time-series surface SM datasets from 1978 to the present. However, the Soil Moisture Active Passive (SMAP) mission, launched in 2015, has shown better performance both in spatial accuracy and in capturing patterns of temporal change. In this paper, a random forest (RF) model was proposed to extend the superior SMAP dataset back in time (the extended product is named RF_SMAP) using the corresponding CCI time-series. We assumed that temporal changes in the SMAP dataset are generally similar to those in the available CCI dataset. Accordingly, the RF model was constructed using the CCI SM v05.2 data and then transferred to predict the RF_SMAP dataset. The available in-situ SM data and the real SMAP data from 2015 to 2019 were used as references to validate the predicted RF_SMAP data. Compared with the CCI dataset, the predicted RF_SMAP dataset was closer to both the in-situ SM data and the real SMAP data. Thus, the RF_SMAP dataset was shown to be a reliable substitute for the historical CCI dataset. The new long time-series RF_SMAP dataset, which will be available to download, will be of great value for a range of research applications, such as climate assessment, agricultural planning, food insecurity monitoring, and drought assessment and monitoring.
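
The abstract does not state the predictors, the hyperparameters or the software used, so the following is only a minimal sketch of the general idea and not the authors' implementation. A regression random forest (bootstrap-sampled, depth-limited CART trees, each split chosen among a random subset of features, predictions averaged) is fitted on co-located pairs from the overlap period, with CCI-derived features as input and SMAP soil moisture as target, and is then applied to historical CCI values. The function names, the synthetic linear CCI-to-SMAP relationship in the usage lines, and the hyperparameters (`n_trees`, `max_depth`, `min_leaf`, a feature subset of about p/3) are all hypothetical.

```python
import random
from statistics import fmean


def _best_split(X, y, f, min_leaf):
    """Best (sse, threshold) split of the samples on feature f, or None."""
    n = len(y)
    order = sorted(range(n), key=lambda i: X[i][f])
    tot_s, tot_q = sum(y), sum(v * v for v in y)
    s = q = 0.0
    best = None
    for k in range(1, n):  # the first k sorted samples go to the left child
        v = y[order[k - 1]]
        s += v
        q += v * v
        lo, hi = X[order[k - 1]][f], X[order[k]][f]
        if k < min_leaf or n - k < min_leaf or lo == hi:
            continue
        # Sum of squared errors of both children, from running sums.
        sse = (q - s * s / k) + ((tot_q - q) - (tot_s - s) ** 2 / (n - k))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2)
    return best


def _grow(X, y, depth, max_depth, min_leaf, mtry, rng):
    """Grow one CART regression tree: leaves are floats, nodes are tuples."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return fmean(y)
    best = None
    for f in rng.sample(range(len(X[0])), mtry):  # random feature subset
        cand = _best_split(X, y, f, min_leaf)
        if cand is not None and (best is None or cand[0] < best[0]):
            best = (cand[0], f, cand[1])
    if best is None:
        return fmean(y)
    _, f, t = best
    children = []
    for goes_left in (True, False):
        idx = [i for i in range(len(y)) if (X[i][f] <= t) == goes_left]
        children.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                              depth + 1, max_depth, min_leaf, mtry, rng))
    return (f, t, children[0], children[1])


def fit_forest(X, y, n_trees=20, max_depth=6, min_leaf=3, seed=0):
    """Bootstrap-aggregated trees; each split considers about p/3 features."""
    rng = random.Random(seed)
    mtry = max(1, len(X[0]) // 3)
    trees = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(y)), k=len(y))  # bootstrap sample
        trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                           0, max_depth, min_leaf, mtry, rng))
    return trees


def predict(trees, x):
    """Average the leaf values that x reaches in every tree."""
    leaves = []
    for node in trees:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if x[f] <= t else right
        leaves.append(node)
    return fmean(leaves)


# Hypothetical overlap-period pairs: [CCI value] -> co-located SMAP value.
cci_overlap = [[0.05 + 0.4 * i / 199] for i in range(200)]
smap_overlap = [0.9 * row[0] + 0.02 for row in cci_overlap]  # synthetic
forest = fit_forest(cci_overlap, smap_overlap)
print(predict(forest, [0.25]))  # historical CCI value -> SMAP-like estimate
```

Because every leaf stores a mean of training targets, a forest of this kind cannot predict values outside the range of SMAP values seen during the overlap period: historical CCI conditions wetter or drier than anything observed from 2015 to 2019 are effectively clamped to that range, which bears on the extrapolation concern raised by Referee #2.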

Received: 22 Apr 2022 – Discussion started: 08 Jun 2022

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 6501 KB)

Interactive discussion

Status: closed

RC1: 'Comment on essd-2022-137', Anonymous Referee #1, 26 Jul 2022

The authors of this study tried to use a random forest model combined with historical CCI soil moisture estimates from multiple satellites and sensors to incorporate the strengths of SMAP soil moisture data from 2015, producing an 8-day composite, 36 km SMAP-like soil moisture dataset from 1979. Overall, this study has the potential to provide an updated global soil moisture dataset for a range of research and applications. However, the proposed approach rests on an important assumption: that changes in the temporal variability of the CCI soil moisture estimates are similar to those of the SMAP data. This should be fully investigated, because these are two different sources of information. The authors may want to seriously consider the spatiotemporal consistency among the different datasets before conducting such an analysis. Second, no statistical metrics are reported for the comparison between the random forest-based SMAP estimates and the in situ measurements. Readers will want to know whether these estimates achieve a basic acceptable accuracy, i.e. an overall error of less than 0.04 m³ m⁻³ for volumetric soil moisture. I hope the authors have the chance to rethink this study and more time to redo the analysis before resubmission.

Citation: https://doi.org/10.5194/essd-2022-137-RC1
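
The accuracy question raised in RC1 can be answered with standard pointwise metrics computed over matched estimate/in-situ pairs. The sketch below is illustrative only: `sm_metrics` is a hypothetical name, and applying the 0.04 m³ m⁻³ threshold to the unbiased RMSE (ubRMSE, where ubRMSE² = RMSE² − bias²) follows the SMAP mission's accuracy requirement; the referee's "overall error" could equally be read as RMSE, so both are returned.

```python
import math


def sm_metrics(estimates, in_situ, target=0.04):
    """Bias, RMSE and unbiased RMSE (m3 m-3) of estimates against in-situ data."""
    pairs = [(e, o) for e, o in zip(estimates, in_situ)
             if not (math.isnan(e) or math.isnan(o))]  # matched, valid pairs only
    if not pairs:
        raise ValueError("no valid matched pairs")
    n = len(pairs)
    bias = sum(e - o for e, o in pairs) / n
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in pairs) / n)
    # ubRMSE^2 = RMSE^2 - bias^2; clamp tiny negative round-off before sqrt.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    return {"n": n, "bias": bias, "rmse": rmse, "ubrmse": ubrmse,
            "meets_target": ubrmse <= target}


print(sm_metrics([0.21, 0.26, 0.30, float("nan")], [0.18, 0.25, 0.27, 0.22]))
```

Separating bias from ubRMSE matters here because a product can have a systematic offset relative to point sensors while still tracking their dynamics well.
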
RC2: 'Review of "An 8-day composited 36 km SMAP soil moisture dataset from 1979 to 2015 produced using a random forest and historical CCI data" by Haoxuan Yang et al. submitted to ESSD', Anonymous Referee #2, 18 Sep 2022

Yang et al. present a study recalculating global soil moisture data of the Soil Moisture Active Passive mission from data of the Climate Change Initiative programme and the International Soil Moisture Network by means of a random forest. The resulting dataset is intended to extend back to 1979 at a resolution of 36 km, as 8-day composites.

In general, such a dataset appears to be a valuable and worthwhile aim for a study. However, the applied methods do not address substantial questions about the validity of the data. Most fundamentally, the approach assumes that barely five years of overlapping data (2015 to 2019) can describe the global relationship for 1979 to 2015, although this is precisely the period in which global change starts to become traceable in the data, and although global change is known to occur non-uniformly across the globe. As such, validation of the derived data requires more detailed analyses than the rather rough screening presented here. I would not expect overall RMSE statistics to be adequate for the desired outcome.
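
One simple way to probe how well a relationship learned within the short overlap transfers between years (an assumption offered for illustration, not a procedure prescribed by the referee or used in the preprint) is to hold out whole years rather than randomly chosen 8-day samples. Neighbouring composites are correlated, so a random split lets near-duplicates of each test sample leak into training. Note that this only tests transfer within 2015 to 2019; it cannot by itself demonstrate that the relationship holds back to 1979. `leave_one_year_out` is a hypothetical helper.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices), one fold per year."""
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)
    for held_out in sorted(by_year):
        train = [i for i, year in enumerate(years) if year != held_out]
        yield held_out, train, by_year[held_out]


# Hypothetical acquisition years of 8-day samples in the 2015-2019 overlap.
sample_years = [2015, 2016, 2015, 2017, 2018, 2019, 2019]
for year, train_idx, test_idx in leave_one_year_out(sample_years):
    print(year, len(train_idx), len(test_idx))
```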

Given the large amount of data in the ISMN database, the fact that seasonality at most locations already provides very simple first-order dynamics, and the very broad generalisation of a single soil moisture value over a 36 km grid cell, I would be very interested in the ability of the model to reproduce deviations from the overall patterns. Since the data are spatially and temporally dependent/correlated, at least to some degree, might LSTMs or other kinds of machine learning be more appropriate (cf. Fang et al. 2017, https://doi.org/10.1002/2017GL075619; Abbes et al. 2019, https://doi.org/10.1109/IGARSS.2019.8898418; Breen et al., https://doi.org/10.3390/make2030016; Zang et al. 2022, https://doi.org/10.1080/10106049.2022.2105406)?
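
Whether a model reproduces deviations from the dominant seasonal pattern is commonly checked by correlating anomalies after removing each series' mean seasonal cycle, alongside the raw correlation. The sketch below is not the referee's or the authors' procedure; it assumes series keyed by (year, 8-day period index) with 46 periods per year, and all names are hypothetical. The synthetic estimate has the correct seasonal cycle but inverted anomalies, so its raw correlation stays high while its anomaly correlation is strongly negative.

```python
import math
import random
from collections import defaultdict


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def anomalies(series, keys):
    """Subtract each composite period's multi-year mean (its climatology)."""
    by_period = defaultdict(list)
    for year, period in keys:
        by_period[period].append(series[(year, period)])
    clim = {p: sum(v) / len(v) for p, v in by_period.items()}
    return [series[(year, period)] - clim[period] for year, period in keys]


def raw_and_anomaly_r(est, obs):
    """Correlation of raw values and of seasonal anomalies on matched keys."""
    keys = sorted(est.keys() & obs.keys())  # (year, period) present in both
    raw = pearson([est[k] for k in keys], [obs[k] for k in keys])
    anom = pearson(anomalies(est, keys), anomalies(obs, keys))
    return raw, anom


rng = random.Random(1)
seasonal = {p: 0.25 + 0.1 * math.sin(2 * math.pi * p / 46) for p in range(46)}
noise = {(y, p): rng.uniform(-0.03, 0.03) for y in range(2015, 2019) for p in range(46)}
obs = {k: seasonal[k[1]] + a for k, a in noise.items()}
est = {k: seasonal[k[1]] - a for k, a in noise.items()}  # right season, wrong anomalies
print(raw_and_anomaly_r(est, obs))
```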

Moreover, a number of global soil moisture products derived by machine learning already exist. Among others, there are O and Orth 2021 (https://doi.org/10.1038/s41597-021-00964-1), at 0.25° resolution from 2000 to 2019, and Martens et al. 2017 (https://doi.org/10.5194/gmd-10-1903-2017), reaching back to 1980 (GLEAM v3.6a). Given that these examples use very different and likely more sophisticated approaches, the authors should state clearly what advances their dataset provides. Hence, in addition to the evaluation described above, this opens a second area of analysis that has not yet been addressed.

With substantial deficits in these three domains (extrapolation from 5 years to a non-stationary system, evaluation of performance beyond the mean characteristics of climate zones, and evaluation against other soil moisture products), the manuscript and the data require fundamental revision before publication.

Citation: https://doi.org/10.5194/essd-2022-137-RC2

Data sets

The 8-day composited 36 km SMAP soil moisture dataset from 1979 to 2015, Haoxuan Yang, Qunming Wang, Wei Zhao, and Peter M. Atkinson, https://doi.org/10.6084/m9.figshare.17621765
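
The dataset is distributed as 8-day composites, but the compositing rule is not described on this page. The sketch below assumes one common convention for a single grid cell (an assumption, not the authors' documented procedure): windows restart on 1 January, giving 46 windows per year with a short final window, and each composite is the mean of the valid daily retrievals that fall in it. `composite_8day` is a hypothetical name.

```python
import datetime as dt
import math
from collections import defaultdict


def composite_8day(daily):
    """Mean of valid daily values per (year, 8-day window), windows from 1 Jan.

    daily maps datetime.date -> soil moisture (m3 m-3); NaN marks no retrieval.
    Windows with no valid day are simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily.items():
        if math.isnan(value):
            continue  # skip days without a retrieval
        window = (day.timetuple().tm_yday - 1) // 8  # 0..45; window 45 is short
        sums[(day.year, window)] += value
        counts[(day.year, window)] += 1
    return {key: sums[key] / counts[key] for key in sums}


example = {dt.date(2016, 1, 1): 0.20, dt.date(2016, 1, 8): 0.30,
           dt.date(2016, 1, 9): float("nan"), dt.date(2016, 12, 31): 0.10}
print(composite_8day(example))
```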

Viewed

Total article views: 1,758 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,363	314	81	1,758	110	134

Views and downloads (calculated since 08 Jun 2022)

Month	HTML	PDF	XML	Total
Jun 2022	158	27	3	188
Jul 2022	66	14	7	87
Aug 2022	37	13	2	52
Sep 2022	48	20	3	71
Oct 2022	33	4	1	38
Nov 2022	31	9	0	40
Dec 2022	20	7	1	28
Jan 2023	10	2	0	12
Feb 2023	16	9	0	25
Mar 2023	16	2	2	20
Apr 2023	15	6	0	21
May 2023	42	6	0	48
Jun 2023	35	1	1	37
Jul 2023	31	9	1	41
Aug 2023	40	1	0	41
Sep 2023	58	5	1	64
Oct 2023	38	9	1	48
Nov 2023	14	3	0	17
Dec 2023	10	3	3	16
Jan 2024	21	4	1	26
Feb 2024	17	7	4	28
Mar 2024	29	16	4	49
Apr 2024	16	2	2	20
May 2024	15	3	6	24
Jun 2024	36	3	2	41
Jul 2024	10	0	3	13
Aug 2024	15	3	4	22
Sep 2024	5	2	0	7
Oct 2024	7	1	0	8
Nov 2024	11	1	0	12
Dec 2024	10	2	0	12
Jan 2025	9	5	5	19
Feb 2025	10	2	0	12
Mar 2025	7	3	1	11
Apr 2025	15	13	3	31
May 2025	7	9	2	18
Jun 2025	9	17	0	26
Jul 2025	8	7	4	19
Aug 2025	44	7	0	51
Sep 2025	276	5	2	283
Oct 2025	19	13	2	34
Nov 2025	32	26	5	63
Dec 2025	17	13	5	35

Viewed (geographical distribution)

Total article views: 1,706 (including HTML, PDF, and XML). Of these, 1,706 have a defined geography and 0 are of unknown origin.

Latest update: 27 Dec 2025

Short summary

A random forest (RF) model was proposed to extend the superior SMAP dataset back to cover 1979 to 2015 using the corresponding CCI time-series; the extended product is named RF_SMAP. The new long time-series RF_SMAP dataset, which will be available to download, will be of great value for a range of research applications, such as climate assessment, agricultural planning, food insecurity monitoring, and drought assessment and monitoring.

