A seamless global daily 5 km soil moisture product from 1982 to 2021 using AVHRR satellite data and an attention-based deep learning model
Abstract. Soil moisture (SM) data records longer than 30 years are critical for climate change research and various applications. However, only a few such long-term global SM datasets exist, and they often suffer from large biases, low spatial resolution, or spatiotemporal incompleteness. Here, we generated a consistent and seamless global SM product from 1982 to 2021 using deep learning (DL) by integrating four decades of Advanced Very High Resolution Radiometer (AVHRR) albedo and land surface temperature products with multi-source datasets. Considering the temporal autocorrelation of SM, we explored two types of DL models suited to sequential data: three long short-term memory (LSTM)-based models, namely the basic LSTM, Bidirectional LSTM (Bi-LSTM), and Attention-based LSTM (AtLSTM), and a Transformer model. We also compared the performance of the DL models with that of the tree-based eXtreme Gradient Boosting (XGBoost) model, known for its high efficiency and accuracy. Our results show that all four DL models outperformed the benchmark XGBoost model, particularly at high SM levels (> 0.4 m³ m⁻³). The AtLSTM model achieved the highest accuracy on the test set, with a coefficient of determination (R²) of 0.987 and a root mean square error (RMSE) of 0.011 m³ m⁻³. These results suggest that both using temporal information and adding an attention module can improve the accuracy of SM estimation. Subsequent analysis of the attention weights revealed that the AtLSTM model automatically learned the necessary temporal information from adjacent positions in the sequence, which is critical for accurate SM estimation. The best-performing AtLSTM model was then adopted to produce a four-decade seamless global SM dataset at 5 km spatial resolution, denoted as the GLASS-AVHRR SM product.
Validation of the GLASS-AVHRR SM product using 45 independent International Soil Moisture Network (ISMN) stations prior to 2000 yielded a median correlation coefficient (R) of 0.73 and a median unbiased RMSE (ubRMSE) of 0.041 m³ m⁻³. When validated against SM datasets from three post-2000 field-scale COsmic-ray Soil Moisture Observing System (COSMOS) networks, the median R values ranged from 0.63 to 0.79, and the median ubRMSE values ranged from 0.044 to 0.065 m³ m⁻³. Further validation across 22 upscaled 9 km Soil Moisture Active Passive (SMAP) core validation sites indicated that the product captured the temporal variations in measured SM well and remained unaffected by the large wet biases present in the input European reanalysis (ERA5-Land) SM product. Moreover, characterized by complete spatial coverage and low biases, this four-decade, 5 km GLASS-AVHRR SM product exhibited high spatial and temporal consistency with the 1 km GLASS-MODIS SM product, and contained much richer spatial detail than both the long-term ERA5-Land SM product (0.1°) and the European Space Agency Climate Change Initiative combined SM product (0.25°). The annual average GLASS-AVHRR SM dataset from 1982 to 2021 is available at https://doi.org/10.5281/zenodo.14198201 (Zhang et al., 2024), and the complete product can be freely downloaded from https://glass.hku.hk/casual/GLASS_AVHRR_SM/.
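
The validation statistics quoted above (R, RMSE, and ubRMSE) can be illustrated with a minimal sketch. The function name `validation_metrics` and the sample values below are hypothetical and are not taken from the GLASS-AVHRR processing chain; the formulas are the standard definitions, with ubRMSE computed as the RMSE after the mean of each series has been removed.

```python
import math


def validation_metrics(estimated, observed):
    """Return bias, RMSE, ubRMSE and Pearson R for two paired SM series.

    Illustrative helper only (hypothetical name); inputs are equal-length
    sequences of volumetric soil moisture values.
    """
    n = len(estimated)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length series with at least 2 samples")

    mean_est = sum(estimated) / n
    mean_obs = sum(observed) / n

    # Bias: mean difference between estimate and observation.
    bias = mean_est - mean_obs

    # RMSE: root of the mean squared difference, bias included.
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)

    # ubRMSE: same as RMSE, but on anomalies (each series minus its own mean),
    # so a constant offset between the two series does not contribute.
    ubrmse = math.sqrt(
        sum(((e - mean_est) - (o - mean_obs)) ** 2
            for e, o in zip(estimated, observed)) / n
    )

    # Pearson correlation coefficient R.
    cov = sum((e - mean_est) * (o - mean_obs) for e, o in zip(estimated, observed))
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    if var_est == 0 or var_obs == 0:
        raise ValueError("R is undefined for a constant series")
    r = cov / math.sqrt(var_est * var_obs)

    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}


# Made-up example: the estimate tracks the observations with a constant
# wet offset of 0.05 (values are illustrative, not real station data).
observed = [0.10, 0.20, 0.30, 0.25]
estimated = [o + 0.05 for o in observed]
metrics = validation_metrics(estimated, observed)
```

In this made-up case the constant offset produces an RMSE of about 0.05 but a ubRMSE of essentially zero and an R of essentially one, which is why ubRMSE and R describe how well temporal variations are captured, while the bias has to be assessed separately.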