https://doi.org/10.5194/essd-2024-109
17 May 2024
Status: this preprint is currently under review for the journal ESSD.

Optimal feature selection for improved ML based reconstruction of Global Terrestrial Water Storage Anomalies

Nehar Mandal, Prabal Das, and Kironmala Chanda

Abstract. Understanding long-term terrestrial water storage (TWS) variations is vital for investigating hydrological extreme events, managing water resources, and assessing climate change impacts. However, the short data record from the Gravity Recovery and Climate Experiment (GRACE) and its follow-on mission (GRACE-FO) poses challenges for comprehensive long-term analysis. In this study, we reconstruct TWS anomalies (TWSA) for the period January 1960 to December 2022, thereby filling data gaps between the GRACE and GRACE-FO missions and generating a complete dataset for the pre-GRACE era. The workflow involves identifying optimal predictors from land surface model (LSM) outputs, meteorological variables, and climatic indices using a novel Bayesian Network (BN) technique for grid-based TWSA simulations. Climate indices, such as the Oceanic Niño Index and Dipole Mode Index, are selected as optimal predictors for a large number of grids globally, along with TWSA from LSM outputs. Convolutional Neural Network (CNN), Support Vector Regression (SVR), Extra Trees Regressor (ETR), and Stacking Ensemble Regression (SER) models are evaluated at each grid location, and the most effective algorithm is identified to achieve optimal reproducibility. Globally, ETR performs best for most of the grids, a pattern also seen at the river-basin scale, particularly for the Ganga-Brahmaputra-Meghna, Godavari, Krishna, Limpopo, and Nile river basins. The simulated TWSA (BNML_TWSA) outperforms the TWSA from LSM outputs when evaluated against GRACE datasets. Improvements are particularly noted in river basins such as the Godavari, Krishna, Danube, and Amazon, with median values of the correlation coefficient, Nash-Sutcliffe efficiency, and RMSE for all grids in the Godavari basin, India, being 0.927, 0.839, and 63.7 mm, respectively.
A comparison with TWSA reconstructed in recent studies indicates that the proposed BNML_TWSA outperforms them globally as well as for all the 11 major river basins examined. The presented dataset is published at https://doi.org/10.6084/m9.figshare.25376695 (Mandal et al., 2024) and updates will be published when needed.
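
The three evaluation metrics named in the abstract (correlation coefficient, Nash-Sutcliffe efficiency, and RMSE) can be illustrated with a minimal sketch. The function names and the sample series below are hypothetical and are not taken from the authors' code or dataset; the formulas are the standard textbook definitions, and the preprint may apply them with additional preprocessing.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mo = _mean(obs)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sq_dev = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sq_err / sq_dev


def rmse(obs, sim):
    """Root-mean-square error, in the same units as the inputs (e.g. mm)."""
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return math.sqrt(sq_err / len(obs))


# Hypothetical monthly TWSA values (mm) at one grid cell, for illustration only.
grace_twsa = [1.0, 2.0, 3.0, 4.0]
simulated_twsa = [2.0, 3.0, 4.0, 5.0]

print(pearson_r(grace_twsa, simulated_twsa))
print(nse(grace_twsa, simulated_twsa))
print(rmse(grace_twsa, simulated_twsa))
```

Note that a constant bias leaves the correlation unchanged while lowering the Nash-Sutcliffe efficiency and raising the RMSE, which is one reason the three metrics are usually reported together.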

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 26 Jul 2024)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on essd-2024-109', Anonymous Referee #1, 17 Jun 2024

Latest update: 28 Jun 2024
Short summary
Optimal features among hydroclimatic variables and land surface model (LSM) outputs are selected using a novel Bayesian network (BN) approach for simulating Terrestrial Water Storage Anomalies (TWSA). TWSA is simulated using ML models (CNN, SVR, ETR, and Stacking Ensemble Regression), and gridwise leader models are identified globally. TWSA is reconstructed (BNML_TWSA) with the selected leader models from January 1960 to December 2022 to generate a continuous global gridded dataset.
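
As a rough illustration of how gridwise leader models might be identified, the sketch below picks, for each grid cell, the model with the highest skill score. The selection criterion (a single score such as validation NSE, higher is better), the function name, the grid keys, and all numbers are assumptions made for this example; the summary does not state which criterion the authors use.

```python
def select_leader_models(scores_by_model):
    """Return {grid_id: model_name} with the best-scoring model per grid cell.

    scores_by_model maps a model name to {grid_id: score}, where a higher
    score is assumed to mean better skill (hypothetical criterion).
    """
    best = {}  # grid_id -> (model_name, score)
    for model, grid_scores in scores_by_model.items():
        for grid_id, score in grid_scores.items():
            current = best.get(grid_id)
            if current is None or score > current[1]:
                best[grid_id] = (model, score)
    return {grid_id: model for grid_id, (model, _) in best.items()}


# Hypothetical skill scores at two grid cells keyed by (lat, lon).
scores = {
    "CNN": {(19.5, 78.5): 0.60, (-3.5, -60.5): 0.80},
    "SVR": {(19.5, 78.5): 0.55, (-3.5, -60.5): 0.62},
    "ETR": {(19.5, 78.5): 0.70, (-3.5, -60.5): 0.50},
    "SER": {(19.5, 78.5): 0.65, (-3.5, -60.5): 0.75},
}

leaders = select_leader_models(scores)
print(leaders)
```

The reconstruction for each cell would then use only that cell's leader model, so different regions can be served by different algorithms.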