the Creative Commons Attribution 4.0 License.
Optimal feature selection for improved ML based reconstruction of Global Terrestrial Water Storage Anomalies
Abstract. Understanding long-term terrestrial water storage (TWS) variations is vital for investigating hydrological extreme events, managing water resources, and assessing climate change impacts. However, the limited data duration from the Gravity Recovery and Climate Experiment (GRACE) and its follow-on missions (GRACE-FO) poses challenges for comprehensive long-term analysis. In this study, we reconstruct TWS anomalies (TWSA) for the period Jan 1960 to Dec 2022, thereby filling data gaps between the GRACE and GRACE-FO missions as well as generating a complete dataset for the pre-GRACE era. The workflow involves identifying optimal predictors from land surface model (LSM) outputs, meteorological variables, and climatic indices using a novel Bayesian Network (BN) technique for grid-based TWSA simulations. Climate indices, like the Oceanic Niño Index and Dipole Mode Index, are selected as optimal predictors for a large number of grids globally, along with TWSA from LSM outputs. The most effective machine learning (ML) algorithms among Convolutional Neural Network (CNN), Support Vector Regression (SVR), Extra Trees Regressor (ETR), and Stacking Ensemble Regression (SER) models are evaluated at each grid location to achieve optimal reproducibility. Globally, ETR performs best for most of the grids, which is also observed at the river-basin scale, particularly for the Ganga-Brahmaputra-Meghana, Godavari, Krishna, Limpopo, and Nile river basins. The simulated TWSA (BNML_TWSA) outperformed the TWSA from LSM outputs when evaluated against GRACE datasets. Improvements are particularly noted in river basins such as the Godavari, Krishna, Danube, and Amazon, with median values of the correlation coefficient, Nash-Sutcliffe efficiency, and RMSE for all grids in the Godavari basin, India, being 0.927, 0.839, and 63.7 mm, respectively.
A comparison with TWSA reconstructed in recent studies indicates that the proposed BNML_TWSA outperforms them globally as well as for all the 11 major river basins examined. The presented dataset is published at https://doi.org/10.6084/m9.figshare.25376695 (Mandal et al., 2024) and updates will be published when needed.
Status: final response (author comments only)
RC1: 'Comment on essd-2024-109', Anonymous Referee #1, 17 Jun 2024
Review comments for the manuscript "Optimal feature selection for improved ML based reconstruction of Global Terrestrial Water Storage Anomalies" of Mandal et al.:
The manuscript describes the derivation of a global long-term terrestrial water storage anomalies data-set, derived from a blending of GRACE satellite observations and global land surface models based on Bayesian networks and machine learning methods.
Such long-term information about terrestrial water storage variations is valuable and can help to assess long-term trends and to localize extremes. Therefore, I think the manuscript is relevant for publication. However, in its current form it fails to address the uncertainties of the product, and it is structured like a classic scientific study rather than a data description paper. The structure of the dataset itself is not suitable for efficient use in its current form and needs revision. Thus, for a publication in ESSD, my concerns are as follows.
1) From the title it does not become clear what dataset you would like to advertise. Should it be the TWSAs or the optimal features? I guess it's the TWSAs that you would finally like to advertise, so you should find a new title along the lines of "ML-based ... long-term terrestrial water storage anomalies from satellite and land-surface model data ...". The term "Optimal feature selection" does not generate an association with a data product, at least for me.
2) In the abstract, you write that you reconstruct TWSA, but you don't specify the grid type and the spatial resolution of the produced dataset. Do you only provide the gridded dataset or also basin aggregates? You should provide this information already in the abstract, although very briefly, so that the reader knows what to expect. Further, I suggest adding a data description section that explains the structure and content of the final data product in the repository.
3) The term "optimal predictors" is mentioned in the title and introduction, but the explanation in the methods section (3, 3.1) is not fully clear. What are the optimal predictors? Are they a subset of your full predictors list? Do you drop training data sets? Are the optimal predictors the ones that have the maximum impact (weight) in the ML algorithms? This should be made clearer, and the benefit of knowing the optimal predictors should be outlined.
4) You are not always consistent with your vocabulary: in 3.1 you introduce the term "features" for what you previously named "predictors". I think you should keep a single term here (and mention the term "feature" only once, maybe in brackets, if this is needed because it is well known in the community).
5) Reproducibility: to make the creation of your dataset reproducible, you should at least mention which software tools you used for the machine learning and possibly publish the configurations alongside your dataset or in another DOI-based repository.
6) The selection of evaluation metrics may not be ideal for the global evaluations. CC will always be high for regions with a clear annual amplitude, whereas for deserts with less variation and seasonality it is hard to get a good score in CC. NSE is especially designed for assessing peak flows. Maybe KGE would be better suited here. And wouldn't a directed error metric like ME provide additional insight into over- or underestimation tendencies?
7) I think the introduction and Section 4.6 could be shortened a bit in favor of a data(set) description section.
8) For many of the references, DOIs are missing. Several DOI links are incorrect, with duplicated parts in their URLs.
9) Your results should be evaluated in the light of the uncertainties of GRACE-based water storage anomalies, e.g., https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021JB022081. There are several different GRACE solutions available which have different levels of uncertainty (https://doi.org/10.1029/2023JB026908), so why did you select exactly one of them, and how would the uncertainties of the GRACE product propagate into your BNML_TWSA product? The characterization of the uncertainties of your gridded and aggregated datasets would be important with respect to deriving any long-term trends.
10) The structure of the published dataset does not follow any data standards. Further, the name of the downloadable zip file, Mississipi_Data.zip, does not match the global grids it contains. You should use descriptive filenames, modern standard data formats (e.g., CDF-conforming (netCDF) self-describing data, GeoTIFF, ...), and a self-describing tree structure. You can get inspiration, for instance, from other publications in ESSD.
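To make comment 6 concrete, the following is a minimal Python sketch of the four metrics mentioned there (CC, NSE, KGE, and ME). It is an illustration only, not the authors' evaluation code: the function names and the sample values are hypothetical, and KGE is written in its original 2009 form, which is an assumption about which variant would be used. Note that the KGE bias term divides by the observed mean, which is close to zero for anomaly series, so it would need care when applied to TWSA.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def mean_error(obs, sim):
    """Mean error (bias); positive values mean the simulation overestimates."""
    return _mean([s - o for o, s in zip(obs, sim)])


def pearson_cc(obs, sim):
    """Pearson correlation coefficient between observations and simulation."""
    o_bar, s_bar = _mean(obs), _mean(sim)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_s = sum((s - s_bar) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; the denominator uses the observations."""
    o_bar = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency, 2009 form (assumed variant).

    Caveat: beta divides by the observed mean, which is ill-conditioned
    for anomaly series whose mean is near zero.
    """
    o_bar, s_bar = _mean(obs), _mean(sim)
    r = pearson_cc(obs, sim)
    sd_o = math.sqrt(_mean([(o - o_bar) ** 2 for o in obs]))
    sd_s = math.sqrt(_mean([(s - s_bar) ** 2 for s in sim]))
    alpha = sd_s / sd_o   # variability ratio
    beta = s_bar / o_bar  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical sample values: the simulation has a constant offset of +2.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 22.0, 32.0, 42.0]
for name, fn in [("ME", mean_error), ("CC", pearson_cc), ("NSE", nse), ("KGE", kge)]:
    print(name, round(fn(obs, sim), 3))
```

In this example the constant offset leaves CC at 1, while ME, NSE, and KGE all register the bias, which is the kind of additional insight a directed metric can provide.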
Minor things:
L86/87: no commas in large numbers
Table 1: Provide not only publications for the data-sets but also the DOI references where they can be obtained; add the acronyms / abbreviations that you later use in the analysis and figures (e.g. Fig. 2, NTWSA, CTWSA)
L199: you name three types of ML algorithms, but then four are listed and described
L274: That's the third different usage of P in the manuscript (Probability, Precipitation, and Prediction in the evaluation metrics)
L279: A grid is usually defined as a collection of adjacent pixels. You are using the term grid instead of pixel. I suggest to change it to either pixel or grid cell / cells.
Fig. 2: Expand acronyms in the figure caption, make the caption more explanatory. From the colors it appears that several optimal predictors overlap for the same regions / pixels
Fig. 3: Avoid red and green in the same figure (colorblind check, you can use https://www.color-blindness.com/coblis-color-blindness-simulator/ for checking)
L333: grid-based -> pixel-based
Fig. 6: describe gray bars in figure caption (gaps in GRACE solutions)
Fig. 7: Change the 1:1 line to non-dashed gray with a thicker linewidth to make it distinguishable from the data. Use colors with better contrast for BNML and CTWSA
Abstract L18: remove "and updates will be published when needed"
Citation: https://doi.org/10.5194/essd-2024-109-RC1
RC2: 'Comment on essd-2024-109', Anonymous Referee #2, 03 Sep 2024
General comments
Mandal et al. describe a data-driven reconstructed global product of TWS, namely BNML_TWSA. A Bayesian Network technique is used to find an optimal set of predictors. Then, several ML algorithms are trained at each grid cell. The authors choose the best ML algorithm for each grid cell to fore- and hind-cast TWS across the globe. Compared with several existing reconstructed TWS products and estimates by land surface models, the new product shows better agreement with GRACE observations at grid cell and basin scales, and is capable of capturing some historical hydroclimatic extreme events. Based on the evaluation results, the authors conclude that the newly developed TWS product is reliable and can be used for hydroclimatological studies.
Overall, I find the manuscript easy to follow. The topic is relevant to the journal, as TWS is an influential variable in many aspects of Earth system functioning. The evaluation across space and time done by the authors is informative for seeing how well the ML-derived product performs. However, I still have some concerns regarding the robustness of the new product, mainly arising from the missing discussion of (sources of) uncertainty and the way the product is evaluated. Please find the comments below. I hope they are helpful for improving the manuscript.
Major comments:
- The (potential) source(s) of uncertainty need to be discussed, which the current version of the manuscript lacks. There could be several potential sources of uncertainty. First, I wonder if there is any overfitting issue in the product, as it is fully based on ML and the learning has been done for each grid cell in the study domain. In addition, I wonder if there is any way for BNML_TWSA to capture human impact on TWS and its trend. If not, then it can either be regarded as a source of uncertainty and be specified, or be precluded from the model training. If yes, then the authors may add explanations.
Lastly, as BNML_TWSA highly depends on TWSA estimates by selected LSMs, the common errors of LSMs, such as the phase shift in the mean seasonal cycle (e.g., Bibi et al., 2024) or worse performance in (semi-)arid regions, could propagate to the results. The performance of BNML_TWSA can also potentially be influenced by the precipitation product used.
Bibi, S., Zhu, T., Rateb, A., Scanlon, B. R., Kamran, M. A., Elnashar, A., Bennour, A., and Li, C.: Benchmarking multimodel terrestrial water storage seasonal cycle against Gravity Recovery and Climate Experiment (GRACE) observations over major global river basins, Hydrol. Earth Syst. Sci., 28, 1725–1750, https://doi.org/10.5194/hess-28-1725-2024, 2024.
- I think that readers or potential data users can benefit from additional evaluations. The better performance of BNML_TWSA is arguably to be expected, as 1) it uses TWSA from LSM(s) as a predictor for many grid cells, 2) the authors choose the best ML algorithm among those trained and tested for each grid cell, and 3) the model is evaluated using TWSA time series. For example, the GRACE-REC by Humphrey and Gudmundsson (2019) that the authors used in the comparison is calibrated against the detrended and deseasonalized TWSA time series, and therefore may not be as good as BNML_TWSA by design. The results from the evaluation can be seen as the strength of BNML_TWSA, but BNML_TWSA can also benefit from being fairly evaluated against variables that it is not trained with. Evaluating BNML_TWSA using independent variables is important to prove its ability to extrapolate, because the product has already partly learned the GRACE TWSA information via CTWSA, which assimilates GRACE TWSA observations (this fact should have been noted in the main text, I think). As a possible, but not necessarily the only, way of performing the additional evaluation, the authors could repeat the evaluation done by Humphrey and Gudmundsson (2019).
In that paper, GRACE-REC is evaluated with several independent datasets including sea level budget, streamflow measurements, and basin-scale water balance. Repeating this evaluation would also work well for comparing BNML_TWSA with GRACE-REC. Another way could be to evaluate BNML_TWSA at seasonal and interannual (i.e., detrended and deseasonalized) temporal scales. Evaluation at these temporal scales is valuable because they are of strong interest in multiple communities (e.g., carbon cycle, hydrology, and climate), so the results can be informative for both the product itself and potential users; it can also be regarded as a fairer way to examine BNML_TWSA, as it is not trained at these temporal scales.
- I think that the title is misleading. What is the role of the feature selection (i.e., BN) in BNML_TWSA? The title suggests that using the optimal set of features given by the BN is the key to improving the ML-based product in the study, but no corresponding section or explanation can be found in the current version of the manuscript. So, the contribution of deploying the BN technique to the quality of BNML_TWSA could be elaborated further, or the title could be updated.
- What is the value of examining multiple algorithms for each grid cell? It is clearly reported in the manuscript (e.g., Figure 3) that the spatial pattern of the leading model is very heterogeneous. However, it has not been reported how different the performance of the tested models is, and what the differences are in the actual estimates. I wonder whether the actual influence on the resulting time series would be minor (e.g., based on a comparison between the current BNML_TWSA and another BNML_TWSA using the algorithm with the poorest performance for each grid cell), as many ML models usually show similarly good performance.
- The results and discussion section is mostly about presenting how good the performance metrics for BNML_TWSA are; it could go deeper to provide more insight into the product's applicability.
Evaluating more diverse aspects, as suggested in the second bullet point, would help to improve this.
- The dataset provided includes estimates for the Mississippi river basin only, while one would expect estimates for all global land grid cells, according to the title and abstract.
Minor and technical comments:
- The current manuscript has many in-text narrative citations that are wrongly used, e.g., L311, L334-335, and so on.
- The introduction can be improved by better addressing the motivation to have a new product. Currently, it introduces the TWS variable and examples of (ML-based) TWS reconstruction studies. The authors state that using a feature selection process is a novelty of BNML_TWSA and that testing multiple ML algorithms is important; these can be, but are not necessarily, reasons to have a new product. The authors could better present the motivation by showing why having (or lacking) feature selection and multiple ML algorithms is critical for users and their science. Alternatively, showing in which respects existing reconstructed TWS products are less reliable/robust could better show the motivation. This will also help with having a focused presentation in the results section.
Specific comments
Abstract
Introduction
L23: There can be more references, especially ones done at the global scale.
L24: This sentence needs more clarification. Which physical processes are missing? What is their influence on the estimates, and in which respect?
L29: I think that Mo et al. (2022) is not a proper reference for the sentence. The study reports a new product; it does not examine the human and climatic impact on the water cycle.
L41: The reconstruction by Humphrey et al. (2017) is at the global scale. Only the example application is for the Amazon Basin.
L32-59: This paragraph basically lists previous studies with a few sentences for each.
I wonder if this is the best way of storytelling for readers.
L60: The authors could first list what the categories are.
L77-78: As mentioned above, there needs to be more elaboration on the importance of the feature selection for the TWSA reconstruction and the applications.
Data and Processing
L108: This contradicts the sentence in L78-80.
L112-113: Are there any reasons to choose Noah and CLSM specifically?
L125-126: Why don't LSMs fully use the information? This can be elaborated further.
L129: I think that the time period of analysis hasn't been specified before.
L143: Although CLSM provides TWS directly, the authors should be able to refer to other materials to know which processes CLSM accounts for to calculate TWS. Please add this description, and also please update Table 1 accordingly.
Eq. 1: Sun et al. (2019) mentioned that Noah does not account for surface water storage. Please clarify this.
L149: "may be" or "can be"? It's a bit weird to use "may be" in this sentence.
L152-154: It was not clear to me whether the prior months are for P and T or for TWSA, too. I suggest rephrasing the sentence.
L153: Why aren't the prior months used for the climate indices?
L155: SVR and ETR need to be introduced with their full names.
Figure 1: For the correlation coefficient, CC has been used throughout the manuscript, instead of R.
L231: 'built', not 'build', I think.
L233: Is the feature selection procedure in ETR independent of the BN?
L266-267: For each grid cell, does the BN give the optimal set of predictors to each ML algorithm, or is the set common to all ML algorithms? If it's the former, how can one be sure that different ML algorithms share the same optimal set of predictors?
Eq. 5: On the right-hand side, does the denominator use Pi-Pbar or Oi-Obar? The typical NSE equation uses observations for the denominator. Please check this.
Results and Discussions
Figure 2: Would it be reasonable to interpret the results as the emerging importance of the variables to the global TWSA?
For example, can the results be interpreted as showing that the North Atlantic Oscillation has the least influence on the collective gridwise TWSA among the three modes of climate variability?
L291: Please see the comment for Figure 1.
Figure 3: Is it expected that ETR would emerge as the leader? Is there any possible explanation for this, based on the nature of each algorithm?
L306-307: It is not clear that BNML_TWSA performs better than LSMs, especially for the case of CTWSA. This could be made clearer with a histogram of metrics or a map of the differences in metrics between BNML_TWSA and LSMs.
Figure 4: It should be mentioned that BNML_TWSA also shows significant biases in cold or arid regions, and even in a wet region (e.g., a part of the Congo Basin).
Figure 6: What does the shaded area stand for? What is the rationale for BNML_TWSA capturing the GRACE TWSA trend, for example, in the Indus and GBM Basins, where the TWSA trend would largely be affected by human activity?
Section 4.5: It is great to prove the ability of BNML_TWSA to capture hydrological extreme events that the ML models were not informed with. However, this section only mentions a specific type of event, floods. One could compare the historical TWSA time series of several basins in different climate zones with the corresponding time series of climate indices, drought indices, or precipitation.
Figure 9: I feel that the way the ability of BNML_TWSA to capture the historical flood events is shown can be seen as inappropriate. For example, in the map of the USA, I can see many other grid cells as bluish as the ones in the green box. Does it mean that all the grid cells as bluish as the ones in the green box are flooded?
Section 4.6: It is recommended to compare BNML_TWSA and previous studies using independent datasets (please see the first major comment).
Also, it needs to be noted that GRACE-REC is calibrated against detrended and deseasonalized GRACE TWSA.
(may not be good at capturing the local TWSA correctly, if GLDAS is calibrated using TWS? BNML_TWSA largely depends on GLDAS LSMs which cannot be reliable at finer spatial scales than the original GRACE spatial resolution) <-- need to check how GLDAS LSMs simulate TWS
Conclusions
L432-434: I think that this point can be mentioned in the main text, possibly with a deeper discussion (i.e., implications of the distribution of selected gridwise predictors for the global TWSA and a possible explanation).
Data
- Please provide the unit and the file naming convention.
- This should also be noted in the main text with the number or portion of the grid cells: "# For very few grids BN is not identifying any predictors or only one predictor. Grids with zero or one optimal predictor identified by BN are trained using all 15 potential predictors ('P', 'T', 'NOAH_TWSA', 'CLSM_TWSA', 'DMI', 'NAO', 'ONI', 'P1', 'T1', 'NOAH_TWSA_1', 'CLSM_TWSA_1', 'P2', 'T2', 'NOAH_TWSA_2', 'CLSM_TWSA_2')." Also, I would exclude these grid cells with zero or one predictor identified from Figure 2, if it makes a significant difference.
Citation: https://doi.org/10.5194/essd-2024-109-RC2
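The evaluation at the interannual scale suggested in the major comments of RC2 (detrended and deseasonalized series) could be prepared along the following lines. This is a minimal Python sketch under stated assumptions: a linear trend is removed by least squares, and the mean monthly cycle is then subtracted. The function names are hypothetical, the input is assumed to be a gap-free monthly series, and this is not necessarily the procedure of Humphrey and Gudmundsson (2019) or of the manuscript.

```python
def detrend(series):
    """Remove a least-squares linear trend from an evenly spaced series."""
    n = len(series)
    t_bar = (n - 1) / 2
    y_bar = sum(series) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_bar + slope * (t - t_bar)) for t, y in enumerate(series)]


def deseasonalize(series, period=12):
    """Subtract the mean seasonal cycle (one mean per calendar position)."""
    climatology = []
    for m in range(period):
        values = series[m::period]  # all samples sharing calendar position m
        climatology.append(sum(values) / len(values))
    return [y - climatology[t % period] for t, y in enumerate(series)]


def interannual(series, period=12):
    """Detrend first, then remove the seasonal cycle (assumed order)."""
    return deseasonalize(detrend(series), period)


# Hypothetical three-year monthly series: trend plus a repeating cycle.
example = [0.5 * t + (t % 12) ** 2 for t in range(36)]
residual = interannual(example)
print(len(residual))
```

Both the reconstruction and the GRACE series would be passed through the same function before metrics are computed on the residuals, so that neither product is favored by its trend or seasonal cycle.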