ChinaCropSM1 km: a fine 1 km daily soil moisture dataset for dryland wheat and maize across China during 1993–2018

Soil moisture (SM) is a key variable of the regional hydrological cycle and has important applications for water resource and agricultural drought management. Most global soil moisture products have been retrieved from microwave remote sensing data. However, spatially explicit and time-continuous soil moisture information with a high resolution is still rarely available at the national scale. In this study, we generated a 1 km soil moisture dataset for dryland wheat and maize in China (ChinaCropSM1 km) over 1993–2018 through a random forest (RF) algorithm based on numerous in situ daily observations of soil moisture. We used in situ observations (181 327 samples) from the agricultural meteorological stations (AMSs) across China, with 164 202 samples for training and the remaining 17 125 samples for independent testing. An irrigation module was first developed according


Introduction
Soil moisture (SM) is closely associated with droughts and floods and consequently agricultural production (Tao et al., 2003). Therefore, SM information at a high resolution is critical to improve crop yield prediction (Prasad et al., 2006; Chakrabarti et al., 2014) and drought impact assessment (Sheffield, 2004). However, SM data that combine a high temporal resolution (e.g., daily records spanning more than a decade) with a high spatial resolution are still unavailable across China, especially for dry croplands.
SM can be obtained in several ways, including in situ observations (Walker et al., 2004; Bogena et al., 2007), remote sensing retrieval (Mohanty et al., 2017; Wei et al., 2019) and process-based model simulations (Vergopolan et al., 2020; Ahmed et al., 2021). Field observations provide the most accurate SM but are expensive and time-consuming, and there are large uncertainties from extrapolating the limited observations into larger regions with high heterogeneity (Collow et al., 2012; Crow et al., 2012). Microwave sensors have been applied to retrieve SM in recent years (Schmugge et al., 2002; Wigneron et al., 2003; Amazirh et al., 2018). The microwave sensors can only monitor near-surface SM (0-10 cm) (Eagleman and Lin, 1976; Jackson et al., 1982). Passive microwave sensors can monitor daily SM but with a coarse resolution (25-40 km), compared with a high spatial resolution (10-30 m) and a coarser repetition interval (15-25 d) for active sensors (Eagleman and Lin, 1976; Jackson et al., 1982; Mallick et al., 2009). Such SM products have large uncertainties due to the limitations of satellite coverage and downscaling methods, although they can easily cover large regions compared with in situ observations (Loew et al., 2013; Su et al., 2016; Peng et al., 2017). Deriving the SM from model simulation is also challenging because of its high requirements in input data and computing ability, as well as large uncertainties from model parameters (Wang and Qu, 2009; Yilmaz et al., 2012; Petropoulos et al., 2015). In addition, many studies have found that irrigation, as an additional water supply source other than precipitation, reduces soil albedo (Chen and Dirmeyer, 2019), increases heat capacity (Wang et al., 2019), alters local SM (Lawston et al., 2017), and affects the water and energy budget (Shen et al., 2013). However, few studies have taken irrigation into account in developing SM data products at the national or global scale (Drewniak et al., 2013; Qiu et al., 2016a). Therefore, it is critical yet challenging to improve SM accuracy at both spatial and temporal resolutions.
As one part of the Climate Change Initiative (CCI), the European Space Agency (ESA) published a long-term surface SM dataset, and the latest version (v06.1) covered the period of 1978-2020 (https://www.esa-soilmoisture-cci.org/, last access: 10 April 2022) (Dorigo et al., 2017; Gruber et al., 2019; Preimesberger et al., 2021). The ESA CCI SM products are consistent with the observed values at some grassland and farmland sites in China (Liu et al., 2011; Albergel et al., 2013; Dorigo et al., 2015, 2017); however, they have a coarse spatial resolution (∼27 km) and many coverage gaps (Llamas et al., 2020; Guevara et al., 2021). More recently, based on multiple neural networks, the global remote-sensing-based surface soil moisture (RSSSM) dataset covering 2003-2018 at 0.1° resolution was developed by using Soil Moisture Active Passive (SMAP) SM as the primary training target. The RSSSM improved the coefficient of determination (R²) by 0.46 and the root mean squared error (RMSE) by 0.083 m³ m⁻³, with a 10 d resolution (Chen et al., 2021). In 2020, another new SM dataset in China from 2002 to 2018 was derived from different passive microwave SM products and model-based downscaling techniques (Meng et al., 2021). With an improved correlation coefficient (r) of 0.84 and an unbiased root mean squared error (ubRMSE) of 0.056 m³ m⁻³, the new dataset has a 0.05° spatial resolution and a monthly time resolution. These SM products have contributed largely to related agricultural studies and management; however, they are still too coarse to assess agricultural drought risk and predict crop yield accurately.
Although numerous efforts have been devoted to developing SM products, major concerns should be addressed: (1) agricultural management activities such as irrigation have not been fully considered by previous studies, especially in countries such as China with extensive irrigated areas (Zhu et al., 2013); (2) both the spatial and temporal (e.g., daily) resolutions of SM products need to be improved for regional agricultural management; and (3) the SM accuracy needs to be further improved. In recent years, in situ observations have become available (Li et al., 2005). Some new methods, such as machine learning, are increasingly applied to many fields and have been shown to be robust in incorporating multiple sources of data to develop spatiotemporal datasets (Ahmad et al., 2010; Srivastava et al., 2013; Im et al., 2016).
Therefore, our main objectives in the study were to develop a novel method to generate a daily 1 km SM dataset for dry croplands across China based on numerous field observations, to evaluate their accuracy and compare them with current products, and to explore the spatiotemporal characteristics of soil moisture for dryland wheat and maize. We anticipate that our methods and datasets will be valuable for agricultural drought monitoring and crop yield forecasting.

Study area
The study area is dominated by dryland crops such as wheat and maize in China, with complex cultivation methods (Wu and Li, 2012) and various irrigation activities (Huang et al., 2015). According to the annual harvesting areas of crops across mainland China from 2000 to 2015 (Luo et al., 2020a, b), maize and wheat are the two main crops in China, accounting for 35.4 % of the total harvested area (FAOSTAT,

In situ SM observations
The in situ SM observation data (http://data.cma.cn, last access: 18 April 2021) from 1993 to 2018 were obtained from agricultural meteorological stations (AMSs) in China, which recorded the location, crop type, phenology, soil depth and SM. SM was measured at depths of 10 cm and 20 cm at each AMS on the 8th, 18th and 28th of each month. For each sample, crop phenology was observed and recorded by well-trained agricultural technicians in experimental fields (the average field size was 0.15 ha) and then checked and qualified by the Chinese Agricultural Meteorological Monitoring System (CAMMS). The AMSs are generally located in areas with relatively homogeneous soil properties. In addition, the crops were well managed, with irrigation applied according to weather variability and crop growth status, which makes the crop SM records largely representative of the overall level of the corresponding pixels (1 km × 1 km) (Zhang et al., 2020; Li et al., 2021). The first layer (0-10 cm) has been widely used to investigate the spatial and temporal characteristics of SM and validate SM retrieved from microwaves across China (Lacava et al., 2012; Zeng et al., 2015; Liu et al., 2018; Fang et al., 2020).
We collected the in situ observations of maize (287 sites) and wheat (240 sites), with a total of 181 327 samples (maize: 36 226 samples for the 0-10 cm soil layer, 36 245 samples for the 10-20 cm soil layer; wheat: 54 396 samples for the 0-10 cm soil layer, 54 460 samples for the 10-20 cm soil layer).

Environmental factors
The environmental factors were classified into site features and gridded features, both of which include meteorological data (MD), day of year (DOY), classified irrigation (CIR), soil properties (SPs), remote sensing data (RSD) and geographical information (GI) (Table 1).
where C_i, P_j, D_k and SMI_ijk are crop type, phenology, soil depth and the evaluation index of relative soil moisture (SMI) corresponding to crop type i, phenology j and soil depth k. SMI is a threshold to determine when irrigation is applied (Table 2), which was released by the Ministry of Water Resources of China (CNMWR) (http://www.mwr.gov.cn, last access: 10 July 2022) in July 2012. SP includes sand, silt, gravel, organic carbon, clay contents, soil pH and bulk density, obtained from Harmonized World Soil Database Version 1.2 (http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/, last access: 18 August 2021). The original 30 arcsec raster spatial resolution data were resampled to a 1 km resolution based on nearest neighbor interpolation, and the site-related SPs were extracted from values to points using ArcGIS 10.5 software (ESRI).
GI includes latitude (lat), longitude (lon), moisture index (im) (Thornthwaite, 1948) and river vector data, provided by the Data Centre for Resources and Environmental Sciences, Chinese Academy of Sciences (http://www.resdc.cn/Default.aspx, last access: 18 April 2021). The distance from each AMS to river networks at all levels (R4, R5, R12) in China was calculated using the Euclidean distance analysis method.
https://doi.org/10.5194/essd-15-395-2023 Earth Syst. Sci. Data, 15, 395-409, 2023
(Rodell et al., 2004) as a climatological reference (Wagner et al., 2012). The active/passive products were the integration of the scatterometer- and radiometer-based SM retrievals, while the ESA CCI SM product is the fusion of both the active and passive products. We used the v05.2 product for comparison because of its advantages compared with active/passive products (Liu et al., 2012; Dorigo et al., 2017).

Variable selection and data treatment
For the site-related variables, we used the "extract values to points" tool in ArcGIS 10.5 to transfer the 1 km raster information of the environmental data (i.e., SP, RSD and GI) to the AMS points and saved the resulting point attributes in CSV format as a dataset of environmental factors. We then deleted the variables with high multicollinearity (|r| > 0.5) according to the factor stacks (Figs. S3 and S4). As a result, 11 independent variables (pre, pre10, DOY, CIR, T_REF_BULK, R4, im, pet, lat, lon and fc) were selected because they characterize well the impacts of meteorology, time, irrigation, soil properties and geography on regional SM. We used the "Euclidean distance" option of the spatial analyst tools in ArcGIS 10.5 to obtain the variables related to river networks in China (Danielsson, 1980).
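The collinearity screening described above can be illustrated with a minimal sketch. It assumes a greedy rule (keep a variable only if its |r| with every variable already kept does not exceed 0.5) and uses made-up toy columns; the function names, the priority order and the data are hypothetical and are not taken from the authors' workflow.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences
    (both must have non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_collinear(columns, threshold=0.5):
    """Greedily keep variables whose |r| with every already-kept variable
    does not exceed the threshold. `columns` maps name -> list of values;
    insertion order sets the priority (an assumption of this sketch)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson_r(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: "tmax" nearly duplicates "tmean", "pre" does not.
toy = {
    "tmean": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "tmax": [2.1, 3.0, 4.2, 4.9, 6.1, 7.0],
    "pre": [3.0, 0.0, 5.0, 1.0, 4.0, 2.0],
}
selected = drop_collinear(toy)
print(selected)
```

With a greedy rule the outcome depends on the order in which variables are visited, so in practice the order would be chosen to favor the variables considered most meaningful.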
We also applied the kriging interpolation method to obtain precipitation-related variables (e.g., pre and pre10) from CNMSs. Thereafter, all gridded maps were projected to the WGS84 UTM zone 45N coordinate system (EPSG: 32645) and resampled to the same spatial resolution (1 km).

Model development
Ensemble learning aggregates a collection of algorithms to make predictions, which generally performs better than using any single algorithm alone (Brownlee, 2016). Random forest (RF) is a typical ensemble learning algorithm that can be used to build predictive models for both classification and regression purposes. RF fits an ensemble of models by first training a multitude of decision trees and then obtaining predictions by an average or vote over all individual trees (Breiman, 2001). The algorithm introduces extra randomness when growing trees and searches for the best split feature among a random subset of features. This technique results in greater tree diversity, generally yielding an overall better model (Hutengs and Vohland, 2016; Lagomarsino et al., 2017). In addition, the bagging method, which constructs multiple training subdatasets by resampling the original dataset with replacement, is employed to reduce the variance and overfitting (Díaz-Uriarte and Alvarez de Andrés, 2006; Zhang et al., 2018). Its high accuracy and stability in agricultural fields have been substantiated in several previous studies, especially for predicting grain yield, identifying crop planting areas and mapping soil properties (Hengl et al., 2015; Jeong et al., 2016; Sun et al., 2019).
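The bagging idea can be made concrete with a small sketch. It is not a random forest: each ensemble member here is simply the mean of its bootstrap sample, which is enough to show resampling with replacement, the out-of-bag set and the final averaging. All names are hypothetical.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement; indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def bagged_mean_predictor(y, n_estimators=50, seed=0):
    """Toy ensemble: every 'tree' is the mean of its own bootstrap sample,
    and the ensemble prediction is the average over all members."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_estimators):
        in_bag, _ = bootstrap_indices(len(y), rng)
        members.append(sum(y[i] for i in in_bag) / len(in_bag))
    return sum(members) / len(members)

# Hypothetical soil moisture values (m3 m-3).
sm = [0.12, 0.18, 0.25, 0.31, 0.22, 0.15]
prediction = bagged_mean_predictor(sm)
```

In a real random forest each member would be a decision tree grown on its bootstrap sample, with a random subset of features examined at every split.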
Hyperparameters are very important for optimizing the performance of an RF model. They are initially set to default values, and their appropriateness needs to be checked, or potentially better values found, during the development of an RF regression (RFR). The important hyperparameters include the following:
- n_estimators: the number of trees that the algorithm builds before taking the majority vote or the average of predictions (a high number of trees increases the performance and makes the predictions more stable but demands more computations);
- max_features: the maximum number of features that the random forest considers on a per-split level (the split criterion is based on variance for regression);
- min_samples_leaf: the minimum number of samples required at a leaf node;
- max_samples: the ratio of samples used for training each tree.
We applied the 10-fold cross-validation method to tune the four hyperparameters to avoid overfitting the RF models (Fig. S5).Additionally, we used this 10-fold cross-validation to evaluate model performance (Fig. S7).
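A minimal sketch of 10-fold cross-validation follows, written against a generic `fit`/`predict` pair so that it does not depend on any particular library. The helper names and the shuffling scheme are assumptions of this sketch; the study itself was implemented in MATLAB.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds.
    Yields (train_indices, validation_indices) pairs."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cross_validated_rmse(fit, predict, X, y, k=10, seed=0):
    """Average validation RMSE over k folds for any model exposed through
    fit(X, y) -> model and predict(model, X) -> list of predictions."""
    scores = []
    for train, val in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in val])
        mse = sum((p - y[i]) ** 2 for p, i in zip(preds, val)) / len(val)
        scores.append(mse ** 0.5)
    return sum(scores) / len(scores)
```

For hyperparameter tuning, `cross_validated_rmse` would be evaluated once per candidate setting, and the setting with the lowest average error retained.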
The detailed irrigation module is shown in Fig. S2. Given that SM is highly sensitive to irrigation application for dryland wheat and maize in China, we first used RF classification (RFC) to build an irrigation module. This module aimed to predict whether irrigation was applied at a given location and assigned the response variable "1" for irrigation and "0" for no irrigation, using the same environmental indicators as those used in producing ChinaCropSM1 km as predictor variables.
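The binary irrigation label can be sketched as a threshold lookup. The threshold values below are placeholders and are not those of Table 2, and the direction of the comparison (irrigation flagged when relative soil moisture falls below the threshold) is an assumption of this sketch rather than a rule confirmed by the text.

```python
# Hypothetical thresholds of relative soil moisture (%), keyed by
# (crop, phenology, soil depth in cm). Placeholder values only.
SMI_THRESHOLDS = {
    ("wheat", "jointing", 10): 65.0,
    ("wheat", "jointing", 20): 70.0,
    ("maize", "tasseling", 10): 70.0,
}

def classify_irrigation(crop, phenology, depth_cm, relative_sm):
    """Return CIR = 1 when relative soil moisture is below the threshold
    for this crop, phenology and depth (assumed direction), otherwise 0."""
    threshold = SMI_THRESHOLDS[(crop, phenology, depth_cm)]
    return 1 if relative_sm < threshold else 0

dry_label = classify_irrigation("wheat", "jointing", 10, 50.0)
wet_label = classify_irrigation("wheat", "jointing", 10, 80.0)
```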
The response variable (classified irrigation, CIR) was calculated from the irrigation threshold (Table 2) and in situ information, including crop type, phenology and soil depth. Then, we used the predicted CIR as an additional predictor, together with the other key predictor variables, to drive the RFR for predicting SM. Considering the regional differences in SM, we randomly sampled in situ SM observations (90 % for training and 10 % for testing) in each agricultural zone to develop the RF model. In total, 98 576 (65 626) and 10 280 (6845) observations were used for training and testing the model for wheat (maize), respectively. All these point samples were used to develop the point-scale SM models, which were then applied to predict the gridded SM from the 1 km raster environmental variables (Fig. 2).
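The per-zone random split can be sketched as follows. The sample representation, the `zone_of` accessor and the rounding of the test share are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def split_by_zone(samples, zone_of, test_fraction=0.1, seed=0):
    """Randomly hold out test_fraction of the samples within each zone,
    so that every zone contributes to both training and testing."""
    rng = random.Random(seed)
    by_zone = defaultdict(list)
    for s in samples:
        by_zone[zone_of(s)].append(s)
    train, test = [], []
    for zone in sorted(by_zone):
        members = by_zone[zone][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical samples: (zone name, sample id).
samples = [("A", i) for i in range(100)] + [("B", i) for i in range(50)]
train, test = split_by_zone(samples, zone_of=lambda s: s[0])
```

Splitting inside each zone keeps the regional proportions of the full dataset in both subsets, which a single global random split does not guarantee.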
The hyperparameters in the optimal model were determined as 50, 1, 1 and 4 for n_estimators, max_samples, min_samples_leaf and max_features, respectively, according to the highest accuracy during training (Fig. S5). We implemented these processes in MATLAB 9.8.0 (R2020a). More information can be found in the MATLAB help center (https://www.mathworks.com/help/stats/regressionlearner-app.html, last access: 26 May 2022).
The feature importance was evaluated for the RF model with the greatest regression accuracy by permuting the out-of-bag predictor observations using the MATLAB "oobPermutedPredictorImportance" function (https://www.mathworks.com/help/, last access: 26 May 2022). We also used the method to measure the importance of each predictor variable when predicting ChinaCropSM1 km.
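The principle behind permutation importance can be sketched without any library: shuffle one predictor column at a time and record how much the prediction error grows. This sketch scores a generic model on a given dataset rather than on out-of-bag samples, so it illustrates the idea and does not reproduce the MATLAB function; all names and the toy model are hypothetical.

```python
import random

def mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE after shuffling one predictor column at a time."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical toy model that uses only the first predictor.
toy_model = lambda row: 2.0 * row[0]
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
```

A predictor the model ignores receives a score of zero, while a predictor the model relies on receives a positive score.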

Evaluation metrics for validation and comparison
The in situ observations provide the most accurate SM, and all performance measures were calculated using the testing dataset for evaluation purposes. All SM products were evaluated against the in situ observations (testing dataset) according to five metrics: root mean square error (RMSE; m³ m⁻³), bias (m³ m⁻³), unbiased RMSE (ubRMSE; m³ m⁻³), explained variance (R²) and the correlation coefficient (r), which are defined in Eqs. (2)-(6), where the overbar indicates the mean, P_i is the ith predicted SM from the products, O_i is the ith in situ observed SM, N is the total number of observations, and σ_O and σ_P are the standard deviations of the in situ observed and predicted SM, respectively. In addition, we compared our four subsets of data with RSSSM and ESA CCI SM separately by evaluating their spatial and temporal accuracies related to in situ surface SM observations (Tables S1 and S2).
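The five metrics can be computed as in the following sketch, which uses their conventional definitions with the symbols above: bias as the mean of P_i − O_i, RMSE as the square root of the mean squared difference, ubRMSE as the square root of RMSE² − bias², and r as the covariance divided by σ_P σ_O. The form used for R² (one minus the ratio of the residual to the total sum of squares) is an assumption, since other definitions are also in use.

```python
import math

def evaluate(pred, obs):
    """Return bias, RMSE, ubRMSE, R2 and r for paired predictions and
    observations (conventional definitions; the R2 form is an assumption)."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    ss_res = sum((p - o) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    bias = mean_p - mean_o
    rmse = math.sqrt(ss_res / n)
    # Guard against tiny negative values caused by rounding.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    r2 = 1.0 - ss_res / ss_tot
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs)) / n
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    sd_o = math.sqrt(ss_tot / n)
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse,
            "r2": r2, "r": cov / (sd_p * sd_o)}

# Hypothetical values (m3 m-3): predictions offset by a constant 0.02.
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.12, 0.22, 0.32, 0.42]
metrics = evaluate(pred, obs)
```

In this toy case the error is purely systematic, so the bias and RMSE coincide while ubRMSE is essentially zero and r is essentially one, which shows what removing the bias from the RMSE isolates.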
Results and discussion

Validation of ChinaCropSM1 km products
The scatterplots between the predicted SM and the observations are displayed by soil layers and crops (Fig. 3). We found that the SM predicted by the RF model agreed well with the in situ SM observations, with ubRMSE of 0.028-0.037, bias from −0.0011 to 0.0009 and r of 0.925-0.944. Additionally, the mean bias in predicting SM for wheat was negative (Fig. 3a, b), while that for maize was positive (Fig. 3c, d). These findings suggest that maize SM was overestimated, while that for wheat was underestimated. The absolute values of mean bias and RMSE in predicting SM at the topsoil depth (0-10 cm) for both crops were relatively larger than those for a soil depth of 10-20 cm (e.g., RMSE of 0.036 vs. 0.028). This result indicates that the RF model performed better in predicting the SM content in the 10-20 cm layer than in the 0-10 cm layer, which was consistent with previous studies (O and Orth, 2020).

The improvement of ChinaCropSM1 km products with an irrigation module
Interestingly, all prediction accuracies of SM were consistently improved for both crops and depths (Fig. 4) compared with those without an irrigation module (Table S5). Specifically, R² values increased by 6.8 %-9.7 %, and RMSE values decreased by 16 %-23 % (Table S5). Among these, R² values for maize SM were slightly improved compared with those of wheat, and RMSE for maize decreased more than that of wheat. This finding further suggests that the irrigation water requirements of maize are higher than those of wheat, which is consistent with the fact that summer maize requires large amounts of water to produce high yields (Karrou et al., 2012).

The significant scores of different factors for simulating SM
It is critical to select which independent variables are involved in a model, neither too many nor too few, while simultaneously avoiding multicollinearity among them. We deleted 7 variables due to their high correlations (|r| > 0.5), leaving the 11 variables selected (Figs. S3 and S4). Surprisingly, the top scorer was the irrigation factor (CIR), followed by pre10 (accumulated precipitation over the antecedent 10 d) and fc (field capacity) (Fig. 5). The importance of current daily precipitation for SM differed markedly between wheat and maize, as did that of DOY. However, all other factors show less importance in SM simulations. Compared with the significant roles of precipitation-related variables (e.g., pre10, pre) on SM in most rainfall-fed areas, irrigation shows overwhelming impacts on dryland soil moisture across China (Qiu et al., 2016b). Such results highlight that monitoring management activities more accurately, including irrigation times, areas and quantities, will further improve irrigation modules, consequently improving SM simulations (Wu et al., 2020; Zhang et al., 2015, 2022).

The temporal and spatial patterns between ChinaCropSM1 km and the in situ SM observations
The SM values in ChinaCropSM1 km were significantly correlated with the in situ SM observations, with a mean r of 0.92, 0.94, 0.93 and 0.94 for wheat (0-10 cm), wheat (10-20 cm), maize (0-10 cm) and maize (10-20 cm), respectively, during the whole growing period (Fig. 6). The spatial correlation coefficients for wheat at 10-20 cm were generally higher than those for the surface SM (0.94 vs. 0.92), and the coefficients at both soil depths were significantly higher in April and September (Fig. 6a, b). We attributed the high spatial correlations of surface SM to irrigation impacts because April and September are planting times for both spring and winter wheat. The better relationships further substantiated that the irrigation module developed in our SM model improves the simulation accuracy for surface SM. Consistently, the spatial correlation coefficients for maize at the 10-20 cm depth were higher than those for the 0-10 cm depth (0.94 vs. 0.93) (Fig. 7c, d). At the sowing (April), heading (July) and milking (August) stages, maize usually demands a large water supply. The spatial correlation coefficient for maize SM at both soil depths from May to August was lower than the mean value, potentially due to the lack of irrigation applications (Yin et al., 2016) (Fig. 6).
We further analyzed the temporal pattern of SM accuracy in different regions (Fig. 7). The median r values for the Huang-Huai-Hai Plain and the northern arid and semiarid regions were higher than those in other agricultural regions because of the larger training samples. Our findings further substantiated that a larger training sample size leads to a higher temporal accuracy, indicated by a higher r and a lower RMSE (Fig. S6). Conversely, the poor performance in the Yunnan-Guizhou Plateau might be caused by smaller training samples (Fig. S6).

Comparisons between ChinaCropSM1 km and public global SM products
We further compared our ChinaCropSM1 km with the two popular products by evaluating their spatiotemporal accuracy related to in situ surface SM observations. The evaluation indices of each product are summarized in Table 3, in which the mean values for ChinaCropSM1 km (all r > 0.90, RMSE < 0.04) are consistently the best and shown in bold, whereas RSSSM and ESA CCI SM had r < 0.50 and RMSE > 0.1.
To match the different spatial resolutions of the three products, we calculated the averages of all in situ observations in the same pixel (e.g., 1 km, 27 km or 0.1°) to make their spatiotemporal accuracies comparable. Interestingly, all indices consistently indicated the higher accuracy of our product (e.g., r 0.94, bias 0.005, RMSE 0.034, ubRMSE 0.034) (Fig. 9). The RSSSM dataset significantly underestimated SM with an averaged bias of −0.114, accompanied by a higher RMSE of 0.150. ESA CCI SM performed better than RSSSM, which is derived from Soil Moisture Active Passive (SMAP) (Entekhabi et al., 2010) (e.g., RMSE 0.11 vs. 0.15), and we ascribed such improvement partly to some corrections based on in situ observations for ESA CCI SM (Dorigo et al., 2017). Such results highlight that SM products derived solely from remote sensing satellites should be corrected with ground observations. Additionally, neither RSSSM nor ESA CCI SM considered irrigation activities; thus, their spatial correlations with ground observations are far lower than those of our products (r 0.944 vs. 0.381 and 0.256) (Fig. 8). Our study strongly substantiates that an irrigation module should be taken into account when developing SM simulation models for producing SM products.
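The pixel-level averaging of in situ observations can be sketched as a simple grouping step. The sketch assumes square pixels on a projected grid, with coordinates and pixel size in the same units; the function name and data layout are hypothetical.

```python
import math
from collections import defaultdict

def average_by_pixel(observations, pixel_size):
    """Average point observations that fall in the same square pixel.
    `observations` is an iterable of (x, y, value), with x and y in the
    same units as pixel_size (e.g., km in a projected coordinate system)."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, value in observations:
        key = (math.floor(x / pixel_size), math.floor(y / pixel_size))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical stations: (x in km, y in km, SM in m3 m-3).
stations = [(0.2, 0.3, 0.20), (0.8, 0.9, 0.30), (5.0, 5.0, 0.40)]
fine = average_by_pixel(stations, pixel_size=1.0)
coarse = average_by_pixel(stations, pixel_size=27.0)
```

At the 1 km pixel size the third station keeps its own pixel, whereas at 27 km all three stations are averaged into a single value, which is why a coarse product is compared against a smoother reference.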

Discussion and conclusions
We developed a daily 1 km soil moisture dataset based on numerous field observations (181 327 samples) from 1993-2018, which significantly enriches the current SM datasets available. ChinaCropSM1 km shows higher spatial and temporal resolution and accuracy than the popular global SM products. Additionally, to date, few studies have provided a daily SM product with such a high resolution, combining different soil depths and an irrigation module. ChinaCropSM1 km is the first SM product with a high spatial resolution (∼1 km) at depths of 0-10 and 10-20 cm in croplands in China produced by compiling ground observations and using the RF method. Our ChinaCropSM1 km predicted by the RF model agreed well with in situ SM observations (ubRMSE ranges from 0.028-0.037, bias ranges from −0.0011 to 0.0009, r ranges from 0.925-0.944, and R² ranges from 0.860-0.895). An irrigation module was first developed according to crop type (i.e., wheat, maize), soil depth (0-10 cm, 10-20 cm) and phenology. All prediction accuracies of SM were consistently improved (R² values increased by 6.8 %-9.7 %, RMSE decreased by 16 %-23 %) for both crops and depths. Additionally, ChinaCropSM1 km generally has advantages over other popular gridded SM products (RSSSM and ESA CCI SM) when their spatiotemporal accuracy is evaluated with in situ SM as the benchmark. Our ChinaCropSM1 km has relatively higher accuracy (all r > 0.90, RMSE < 0.04), while RSSSM and ESA CCI SM showed r < 0.50 and RMSE > 0.1.
The ChinaCropSM1 km dataset is credible and accurate according to the results compared with the public datasets; however, some limitations still exist in our study. First, the limited AMS irrigation records may lead to uncertainty in the irrigation factor predictions. More detailed irrigation information will help to improve irrigation module performance. Second, our method for generating cropland SM is applicable to other regions and crops, but more environmental variables will be required considering that SM variabilities are complex processes controlled by many factors (Famiglietti et al., 2008; Qin et al., 2013; Guevara and Vargas, 2019), especially for irrigation activities. For example, to more accurately characterize irrigation activities, many field samples are required at both spatial and temporal resolutions. Other auxiliary data on crop growth, classification and management (e.g., irrigation frequency, amount and method) will benefit the development of our irrigation module and the accurate derivation of SM datasets. Third, to provide SM data that are as extensive as possible, a constant layer integrating all pixels planted with wheat/maize during 2000-2015 (http://dx.doi.org/10.17632/jbs44b2hrk.2, Luo et al., 2020b) was applied to generate our ChinaCropSM1 km. Such merged areas could lead to uncertainties in their spatial distributions because annual wheat/maize planting areas are dynamic over time. To avoid the uncertainties, potential users should mask our products with explicit annual wheat/maize planting maps to obtain accurate SM data including spatial dynamic information. Fourth, different splitting methods during training and testing affect model performance. Selecting a splitting method to improve the generalization performance is dependent on the data. Generally, the larger the size of the data, the smaller the effect of the splitting methods on the results (Birba, 2020). Additionally, more advanced algorithms are potential alternatives to random forest, given its strong dependence on inputs (Breiman, 2001; Rasmussen, 2004). Improving irrigation modules should focus on details such as irrigation amount and frequency, which will significantly help to verify and improve the accuracy of both irrigation and SM predictions. We anticipate that a more accurate SM dataset will be produced by applying the approach to other crops and areas in the future with all the above improvements.
Author contributions. FC, HZ, ZZ, JH, JC, YL, LZ, JZ, FT and JX contributed to the design of this research. FC and ZZ collectively prepared the manuscript with contributions from all coauthors. JH, JC, YL, LZ, JZ, JX and FT revised the manuscript. FC and HZ developed the model code.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figure 2. Flow chart for producing ChinaCropSM1 km with an irrigation module.

Figure 3. Comparison between the predicted soil moisture (ChinaCropSM1 km) and in situ samples by crops and depths (cm). (a) wheat (0-10 cm), (b) wheat (10-20 cm), (c) maize (0-10 cm) and (d) maize (10-20 cm). The red lines are the trend lines, the color bar indicates the point density, and the black lines represent the 1 : 1 lines.

Figure 4. Comparison of soil moisture accuracy with and without an irrigation module.

Figure 5. The importance scores of 11 independent variables and the irrigation factor (CIR).

Figure 9. Boxplot of the temporal (a, c) and spatial (b, c) accuracies for ChinaCropSM1 km, RSSSM and ESA CCI SM by r, bias, RMSE and ubRMSE. These evaluation indices were calculated by comparing the three products with in situ SM observations; the comparison period for ChinaCropSM1 km and RSSSM is from 2003 to 2018, and for ChinaCropSM1 km and ESA CCI SM it is 1995-2018.

Table 1. Environmental factors used in the study, including meteorological data (MD), day of year (DOY), classified irrigation (CIR), soil properties (SP), remote sensing data (RSD) and geographical information (GI).
Note: REF_BULK: soil bulk density; PH_H2O: hydrogen ion concentration; GRAVEL: volume percentage of crushed stone; T: topsoil layer.The dashed line represents no default values.

Table 3. Summary of means of evaluation indices (r, bias, RMSE and ubRMSE) of three products (ChinaCropSM1 km, RSSSM and ESA CCI SM), with better performance highlighted in bold. All products were compared with in situ surface observations (0-10 cm).
Figure 8. Time series of comparison between in situ SM observations and products.