the Creative Commons Attribution 4.0 License.
Multidecadal reconstruction of terrestrial water storage changes by combining pre-GRACE satellite observations and climate data
Abstract. The Gravity Recovery And Climate Experiment (GRACE) and its follow-on mission, GRACE-FO, have observed global mass changes and transports, expressed as terrestrial water storage anomalies (TWSA), for over two decades. However, for climate model evaluation, climate change attribution and other applications, multi-decadal TWSA time series are required. This need has triggered several studies on reconstructing TWSA via regression approaches or machine learning techniques, with the help of predictor variables such as rainfall, land or sea surface temperature. Here, we combine such an approach, for the first time, with large-scale time-variable gravity information from geodetic satellite laser ranging (SLR) and Doppler Orbitography by Radiopositioning Integrated on Satellite (DORIS) tracking. The new reconstruction TWSTORE (Terrestrial Water STOrage REconstruction) is formulated in a GRACE-derived empirical orthogonal functions (EOFs) basis and complemented with the Löcher et al. (2025) approach, in which global gravity fields are solved from SLR ranges and DORIS observations in EOF space for the pre-GRACE time frame. Our approach is highly modular, allowing different data sets to be used at several steps in the workflow.
We reconstruct GRACE-like TWSA for the global land, excluding Greenland and Antarctica, from 1984 onward. We find that the new combined reconstruction inherits information from the geodetic method, mainly at longer timescales. In contrast, at the seasonal scale, the climate-driven reconstruction and the geodetic product are already surprisingly consistent. In comparison to other reconstructions, we thus find major differences mainly at the multi-decadal timescale. All in all, our study confirms the presence of significant changes in storage trends, showing that GRACE-derived results should not be extrapolated to the past. The reconstructed fields and corresponding uncertainty information are available at https://doi.org/10.5281/zenodo.15827789 (Hacker, 2025). We also derive evaporation based on the water balance equation and the presented reconstruction for 11 river basins. The corresponding time series are available at https://doi.org/10.5281/zenodo.16643628 (Gutknecht, 2025).
Status: final response (author comments only)
-
RC1: 'Comment on essd-2025-461', Anonymous Referee #1, 22 Dec 2025
-
AC1: 'Reply on RC1', Charlotte Hacker, 02 Feb 2026
Answer to the reviewer:
Review of Multidecadal reconstruction of terrestrial water storage changes by combining pre-GRACE satellite observations and climate data by Hacker et al. submitted to Earth System Science Data.
In this study, Hacker et al. provide a new GRACE-like TWSA reconstruction back to 1984. The major contribution is to involve the SLR/DORIS-derived TWSA as additional input to keep the fidelity of reconstructed products. The benefits of this approach have been demonstrated through multiple internal/external validations.
Reconstructing GRACE-like TWSA products is generally important for the community, especially for analyzing long-term variations. This study makes an important contribution to the community by involving the alternative geodetic observations in the pre-GRACE era. I only have several rather moderate comments before the manuscript can be published. Most of them are related to possible improvements of the evaluation approaches.
Thank you for the positive assessment.
General concerns: The evaluations/validations can be further improved. As a data description paper for ESSD, rigorous validation is quite important. However, the evaluations in 4.1 and 4.2 are not fully independent as the SLR/GRACE data are used for inputs/targets, while 4.3 only presents the internal comparisons. The authors should try to perform more independent evaluations so that the audience will be more confident in the data. Here are two thoughts from my side: 1. An independent comparison can be performed by computing the TWS-derived global mean sea level (GMSL) variations, like Fig. 7 in Gentner et al. (2025). The data can be obtained from Frederikse et al. (2020). To this end, the trend and inter-annual signals contained in TWSTORE can be better demonstrated, and the benefits of including SLR/DORIS observations should be better argued.
Thank you for the suggestion. We incorporated a corresponding subsection within our results section, comparing the recommended data set from Frederikse et al. (2020) and the GMSL data product produced within the ESA Sea Level Budget Closure CCI project (Horwath et al. 2022) with our reconstruction. We also derived the Pearson correlation, RMSD, and trend for the GMSL product, GRACE, SLR-DORIS, and the other reconstructions used in section 4. We find good agreement between the GMSL-based TWS and our reconstruction, which increases our confidence in our data set.
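The three agreement metrics named in this reply (Pearson correlation, RMSD, and linear trend) can be illustrated with a minimal Python sketch. The function names and the toy numbers are assumptions made for illustration; they are not taken from the manuscript or its processing chain, and real series would first have to be aligned to common epochs and units.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmsd(x, y):
    """Root-mean-square difference between two equally long series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def linear_trend(t, y):
    """Ordinary least-squares slope of y against time t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den

# Toy example (hypothetical values): two short series at common epochs.
epochs = [2003.0, 2004.0, 2005.0, 2006.0]          # decimal years
reference = [1.0, 2.5, 2.0, 4.0]                    # e.g. a reference series, mm
candidate = [0.8, 2.9, 2.2, 3.6]                    # e.g. a reconstruction, mm
print(pearson(reference, candidate), rmsd(reference, candidate))
print(linear_trend(epochs, reference), linear_trend(epochs, candidate))
```

The slope here comes from a plain straight-line fit; whether seasonal terms are co-estimated with the trend is a separate design choice that this sketch does not address.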
2. Section 5.2 in the manuscript actually provides another good way for evaluating the data based on water balance. It might be a good idea to perform a global study based on some global dataset of precipitation, evapotranspiration, and runoff, and report some quantitative metrics. It is basically an extension of the current section 5.2 to the whole globe.
We appreciate the suggestion for an extended global evaluation, which we fully support in principle. However, we would like to emphasize the fact that during the submission process we were explicitly asked by the editor to ensure that the manuscript would strongly focus on novel data-product publishing, and not be an analysis-heavy evaluation or iteration. As a compromise that should balance (1) manuscript length, (2) showcase potential use cases and (3) evaluation, we had decided for the initial submission to restrict this section to the carefully selected three cases (out of 11 total assessed, where uninterrupted coverage is even given for all budget components). In fact, the very purpose of the selection was to highlight the behaviour in different climatologically conditioned areas and, hence, also motivate future research involving TWSTORE; but explicitly NOT to provide a complete new analysis which would go far beyond the scope of the paper.
A validation against ‘global’ datasets of the named variables would be restricted even more to model- and reanalysis-heavy products, not observations. Given the well-known deficient hydrological trend behaviour of such models, we do not see a clear benefit here, especially when weighing the necessary effort against the clear journal requirements mentioned above.
However, we also understand that potential users of TWSTORE data might (and should) demand sufficient quantitative information and require the released set to be adequately tested in its temporal and global extent. Therefore, besides adding substantially more TWS-validation metrics in the previous section itself, we rephrased the water balance section so that it is now more obvious to the reader that we assessed more than the three “showcased” basins presented in the main document. To provide more quantitative metrics from our globally distributed selection of catchments, we have now added Pearson correlation coefficients with three time series of other ET data sets over the same integration areas for which we had derived ET series via the budget equation for the full 1984-2020 time span: (1) GLEAM, (2) ERA5, and (3) a budget approach in which GPCC is replaced with ERA5 total precipitation. We believe that this addition clearly helps to rank the potential of the reconstruction, and we are confident that it is an acceptable way to balance the need for improved quantitative assessment with the focus on releasing the main data product, i.e. without inflating the scope of the manuscript.
Besides, the comparison with other reconstructions mentioned in lines 71 – 73 is relevant. However, this comparison could be further improved to better demonstrate the contribution of this study. It is clearly mentioned in Humphrey and Gudmundsson (2019) that their reconstruction is only “climate-driven” and the long-term trends are removed. And the trends in Li et al. (2019) are based on linear extrapolation to the past, so they are also not that reliable. It would be good to involve some more recent studies that made additional efforts to reconstruct long-term variability, such as Yin et al. (2023), Palazzoli et al. (2025), Gentner et al. (2025). All those products are publicly available and easily accessible. Moreover, some more quantitative comparison results in addition to the current Section 4.3 are appreciated.
Thank you for the comment. We added the reconstructions by Palazzoli et al. (2025) and Mandal et al. (2025) to our discussion. We omitted the one by Yin et al. (2023) because only about 10% of the grid points in the downloaded file have uninterrupted time series. Furthermore, we chose to focus on long-term signals, namely trend, acceleration, and the interannual signal, as the reconstructions show high agreement for the seasonal and subseasonal signal components, consistent with Hacker and Kusche (2024). We derived a trend, acceleration, and interannual signal based on area-averaged time series of (sub)continents and included those results in the discussion. We hope that by adding the time-series-based metric, a more quantitative comparison is achieved.
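One common way to obtain a trend and an acceleration from an area-averaged series is a joint least-squares fit of an offset, a linear term, a quadratic term, and an annual sinusoid. The sketch below is only an illustration under that assumption: the reply does not state the authors' exact parameterisation, so the function names, the factor 0.5 on the quadratic term, and the choice of the series mean as reference epoch are all hypothetical.

```python
import math

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_trend_acceleration(t, y):
    """Fit y = a + b*dt + 0.5*c*dt**2 + annual cosine/sine, with dt = t - mean(t).

    t is in decimal years; returns trend b, acceleration c, and annual amplitude.
    """
    t0 = sum(t) / len(t)                              # reference epoch (assumption)
    rows = [[1.0, ti - t0, 0.5 * (ti - t0) ** 2,
             math.cos(2 * math.pi * ti), math.sin(2 * math.pi * ti)] for ti in t]
    k = 5
    # Normal equations (A^T A) p = A^T y of the least-squares problem.
    N = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    _, trend, accel, c, s = solve(N, rhs)
    return {"trend": trend, "acceleration": accel,
            "annual_amplitude": math.hypot(c, s)}
```

Applied to a monthly series, `fit_trend_acceleration(t, y)["trend"]` is in the units of `y` per year; the residual after removing the fitted terms could then serve as a starting point for isolating interannual variability, although the filtering the authors actually applied is not described here.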
Specific comments: [1] line 41: Please be clearer if you mean the missing input uncertainties, or the missing uncertainty information associated with those reconstructions, or maybe both.
Indeed, the uncertainty in the reconstructions is also limited by the missing input uncertainty. However, the statement relates to the reconstructions. We clarified the statement to: “On the downside, uncertainty information for the derived reconstructions may be missing or ambiguous, as it is challenging and complex to represent the various types of uncertainty in machine learning predictions and to assess the effects of unknown biases in climate data.”
[2] lines 43 – 44: “learned relations between climate and water storage can almost certainly not be transferred straightforwardly to the past” is a great motivation to involve the observations (SLR/DORIS) to further constrain the past reconstructions. Please highlight this point a bit more.
We added the following statement: “To overcome this issue, we stabilize our reconstruction using gravity fields observed via satellite-geodetic tracking data from the 80s onward.”
[3] Section 2 Methods. A schematic diagram showing the whole workflow would be beneficial for the audience to understand the full method, especially the relationship between EOF analysis and the three selected regression methods.
Thank you for the suggestion. We added a flowchart to improve the understandability of our methodology and the connections between its steps.
[4] lines 116 – 120: Dividing the whole world into different basins can indeed enhance the local performance. But will this strategy also introduce some discontinuities at the basin boundaries? Please comment on it. It may relate to the argument on lines 124 – 129. Is the missing cross-basin correlation a desired feature of the method or maybe a limitation?
Thank you for raising this valid point. To some extent it is a desired feature that keeps the problem manageable. However, the basin-wise approach introduces potential signal discontinuities at basin boundaries, affecting signal magnitude and across-basin spread. We experimented with the number and size of basins, but there is no optimal basin size: selecting basin sizes is always a trade-off between the detail of the recovered signal and signal degradation due to reducing the signal to a specific polygon. Signal inconsistencies at the boundaries are partly mitigated by the data combination, as the polygons used for reconstruction and data combination are not identical. We added: “The downside of the basin-wise approach is the potential for signal inconsistencies between bordering basins. However, this effect is partly negated by the data combination between the preliminary reconstruction and the SLR-DORIS data, as the basins used in the data combination are spatially larger and differ from those used in the reconstruction (figures A1 and A2).” to the statement in lines 116-120. The limitation of missing cross-basin correlation is addressed in the conclusions, in response to the second reviewer's comments.
[5] Fig. 2: Some of the known strong linear-trend signals are associated with high RMSDs (such as Alaska or the North of India), but some of them are not (such as the High Plain Aquifer and North China Plain). Could you please comment on the reasons behind these differences?
Sure, this is an interesting point. For the North China Plain, the RMSD is around 60-80 mm for the GRACE period and 140-160 mm for GRACE-FO, indicating a mismatch across both time frames that has increased over time. Looking at the time series of GRACE/-FO, TWSTORE, SLR-DORIS, and the preliminary reconstruction over this region, a substantial decrease in TWSA due to groundwater extraction, droughts, and drier conditions (Li et al., 2021; Jasechko et al., 2024; Long et al., 2025) is notable in GRACE/-FO and SLR-DORIS. The preliminary reconstruction and TWSTORE, however, exhibit stable TWSA, resulting in high RMSD values. Now, two questions arise: (1) why does the preliminary reconstruction not pick up the substantial decrease in TWSA, notable in GRACE, which at least is reflected in precipitation and soil moisture, and (2) why is the time series not “corrected” by the SLR-DORIS data? To answer (1): I strongly suspect the reason to be a mix of the (short) training period, the selected predictors, and some influence of the PCA, which together lead to a stable TWSA prediction after the training period. To answer (2): For the data combination, the North China Plain is “absorbed” into a bigger overarching region (Figures B1 and C1) to accommodate the lower SLR-DORIS resolution, which dampens the TWSA decline. A change in the overarching region and reaggregation of the North China Plain might lead to a different outcome in the data combination, with TWSTORE following the decline in TWSA measured by SLR-DORIS.
For the High Plains Aquifer, the RMSD values range from 20 to 80 mm for the two time frames. The values result from a mismatch in seasonal amplitude (mostly underestimation during the GRACE period; overestimation during the GRACE-FO period). The overall decline, caused by low natural recharge relative to extraction, is also clearly visible in the reconstruction.
We added: “We find RMSD values of around 120-180 mm between the reconstruction and GRACE-FO for certain aquifers that experience (accelerated) groundwater decline (Jasechko et al., 2024), such as the North China Plain, the Middle Chao Phraya basin, the Lower Zayandeh-Rud basin, and the Southern High Plains. The mismatch is mainly due to our preliminary reconstruction not accounting for the accelerated declines in TWSA, which were less severe during the training period.” to the section.
References:
Long, D., Xu, Y., Cui, Y. et al. Unprecedented large-scale aquifer recovery through human intervention. Nat Commun 16, 7296 (2025). https://doi.org/10.1038/s41467-025-62719-5
Jasechko, S., Seybold, H., Perrone, D. et al. Rapid groundwater decline and some cases of recovery in aquifers globally. Nature 625, 715–721 (2024). https://doi.org/10.1038/s41586-023-06879-8
Li, X., Ren, G., You, Q., Wang, S., and Zhang, W. (2021). Soil moisture continues declining in north China over the regional warming slowdown of the past 20 years. J. Hydrometeorol. 11, 3001–3015. doi:10.1175/jhm-d-20-0274.1
[6] Page 19 / Section 5.1: This section includes many short paragraphs. Please consider merging some of them.
We merged some paragraphs from the discussion, reducing the total number while maintaining some structure.
References
Frederikse et al. (2020). The causes of sea-level rise since 1900. https://doi.org/10.1038/s41586-020-2591-3
Gentner et al. (2025). DeepRec: Global Terrestrial Water Storage Reconstruction Since 1941 Using Spatiotemporal-Aware Deep Learning Model. https://doi.org/10.22541/essoar.175138855.54947789/v1
Palazzoli et al. (2025). GRAiCE: reconstructing terrestrial water storage anomalies with recurrent neural networks. https://doi.org/10.1038/s41597-025-04403-3
Yin et al. (2023). GTWS-MLrec: global terrestrial water storage reconstruction by machine learning from 1940 to present. https://doi.org/10.5194/essd-15-5597-2023
Citation: https://doi.org/10.5194/essd-2025-461-AC1
-
RC2: 'Comment on essd-2025-461', Anonymous Referee #2, 22 Dec 2025
This review is for the article titled “Multidecadal reconstruction of terrestrial water storage changes by combining pre-GRACE satellite observations and climate data” by Hacker et al., submitted to Earth System Science Data.
General comments
The manuscript presents a new global reconstruction of terrestrial water storage anomalies (TWSA) back to 1984, combining climate-based regression with SLR/DORIS constraints in a GRACE-derived EOF space. This extension into the pre-GRACE era is valuable, and the paper is generally clear and methodologically innovative.
However, confidence in the reconstruction would benefit from stronger validation and clearer discussion of methodological choices. In particular, the validation strategy, the comparison with existing reconstructions, and the interpretation of long-term variability could be expanded to better support the conclusions. Comparisons with other reconstructions are useful but could more clearly explain why products diverge and how those differences affect interpretation.
Overall, the dataset has clear potential, but the manuscript would be strengthened by a more cautious interpretation of long-term variability and by additional independent evaluation. Therefore, I recommend a moderate revision addressing the points below.
Specific comments
- In Section 3.2, the rationale for resampling all climate datasets to a 0.5° spatial resolution, except for SST, is not clearly explained. Please clarify why SST is treated differently and how this choice affects comparability across predictors.
- The manuscript compares TWSTORE against earlier TWSA reconstructions (Humphrey and Gudmundsson, 2019; Li et al., 2021; Chandanpurkar et al., 2022). While these products are widely used, more recent reconstructions are now available (e.g., Yin et al., 2023; Palazzoli et al., 2025) that also span the TWSTORE period. The authors should explain why these newer datasets were not included and consider adding them to strengthen the comparative assessment.
- Section 4 would benefit from quantitative support. When discussing differences in trends, accelerations, annual amplitude and phase, and seasonal/inter-annual variability, the text largely narrates what is visible in the spatial maps in Figure 1. Including numerical metrics (e.g., basin-averaged values, global means, or RMSE differences) would allow readers to assess the magnitude of improvements attributable to the combined reconstruction.
- In Section 4.2, the comparison with GRACE/GRACE-FO is summarized through correlation and RMSE maps (Figures 2–3). It would be helpful to accompany these maps with representative numerical values. For example, when stating that correlations are generally lower in arid and high-elevation regions, please provide average correlations over those climate classes so that the reported patterns can be interpreted quantitatively.
- The same applies to Section 4.3. When comparing TWSTORE against other reconstructions, please provide summary statistics (e.g., spatially averaged correlations, regional performance metrics) rather than only visual differences.
- Throughout the manuscript, it remains unclear whether TWSTORE consistently outperforms existing reconstructions in reproducing GRACE or other independent products. Several statements appear to assume superiority, yet quantitative evidence/empirical support is limited. In comparative analyses, please indicate whether TWSTORE yields objectively better agreement with reference datasets relative to alternative products, and specify where performance is similar or weaker.
- The Conclusions section should explicitly acknowledge limitations of the approach and of the TWSTORE product, e.g., methodological assumptions, regional weaknesses, remaining uncertainty in long-term trends, and known issues with SLR/DORIS coverage.
Technical corrections
- Please ensure consistent use of “GRACE/FO” (sometimes “GRACE/-FO” appears in the text) throughout the manuscript. Also, the name appears before being defined (definition at line 71), while it should be introduced at first usage.
- In Section 4, subsection 4.1 is missing.
- Please clarify why Section 4 is not presented as part of the Results. Aren’t results presented in Section 4 core evaluation results as well?
- Figures appear out of order: Figure 2 is discussed after Figure 3. Please reorder figures (or discussion) to follow narrative flow.
- Section 5 includes several methodological explanations that belong in Methods or Introduction (e.g., lines 367–375, 421–431, 431–439). Consider relocating these to avoid mixing interpretation with procedural detail.
- At line 393, the manuscript refers to Figure 1 in Kusche et al. (2016) without context. Readers unfamiliar with the paper will not know what is being compared. Please summarize the relevant features of that figure in the text.
- Line 439 contains an incomplete sentence: “For terrestrial water storage changes, we evaluate TWSTORE”. Please revise for clarity.
References
Palazzoli, I., Ceola, S. & Gentine, P. GRAiCE: reconstructing terrestrial water storage anomalies with recurrent neural networks. Sci Data 12, 146 (2025). https://doi.org/10.1038/s41597-025-04403-3
Yin, J., Slater, L. J., Khouakhi, A., Yu, L., Liu, P., Li, F., Pokhrel, Y., and Gentine, P.: GTWS-MLrec: global terrestrial water storage reconstruction by machine learning from 1940 to present, Earth Syst. Sci. Data, 15, 5597–5615, https://doi.org/10.5194/essd-15-5597-2023, 2023.
Citation: https://doi.org/10.5194/essd-2025-461-RC2
AC2: 'Reply on RC2', Charlotte Hacker, 02 Feb 2026
Answers and comments to the reviewer
This review is for the article titled “Multidecadal reconstruction of terrestrial water storage changes by combining pre-GRACE satellite observations and climate data” by Hacker et al., submitted to Earth System Science Data.
General comments
The manuscript presents a new global reconstruction of terrestrial water storage anomalies (TWSA) back to 1984, combining climate-based regression with SLR/DORIS constraints in a GRACE-derived EOF space. This extension into the pre-GRACE era is valuable, and the paper is generally clear and methodologically innovative.
However, confidence in the reconstruction would benefit from stronger validation and clearer discussion of methodological choices. In particular, the validation strategy, the comparison with existing reconstructions, and the interpretation of long-term variability could be expanded to better support the conclusions. Comparisons with other reconstructions are useful but could more clearly explain why products diverge and how those differences affect interpretation.
Overall, the dataset has clear potential, but the manuscript would be strengthened by a more cautious interpretation of long-term variability and by additional independent evaluation. Therefore, I recommend a moderate revision addressing the points below.
Thank you for the positive assessment and the constructive comments and suggestions.
Specific comments
-
In Section 3.2, the rationale for resampling all climate datasets to a 0.5° spatial resolution, except for SST, is not clearly explained. Please clarify why SST is treated differently and how this choice affects comparability across predictors.
In contrast to the other predictors, which are given over land on the same grid as the GRACE data, the SST data are part of the far-field observations. Therefore, they do not need the same resolution as the predictors over land, and no resampling is required, which avoids the additional error source that resampling would introduce. The comparability across SST and the other predictors is not affected by differences in resolution, as SST primarily serves as a driver of the other predictors. We clarified the statement to: “All data sets were trimmed to the period 1984-2020, averaged or summed to monthly data, and, except for the SST data, resampled to a 0.5° grid. SST data were not resampled as the variable is part of the far-field variables.”
-
The manuscript compares TWSTORE against earlier TWSA reconstructions (Humphrey and Gudmundsson, 2019; Li et al., 2021; Chandanpurkar et al., 2022). While these products are widely used, more recent reconstructions are now available (e.g., Yin et al., 2023; Palazzoli et al., 2025) that also span the TWSTORE period. The authors should explain why these newer datasets were not included and consider adding them to strengthen the comparative assessment.
We added the global reconstructions by Palazzoli et al. (2025) and Mandal et al. (2025). We omitted the one by Yin et al. (2023) because only about 10% of the grid points in the downloaded file have uninterrupted time series. Furthermore, we chose to focus on long-term signals, namely trend, acceleration, and the interannual signal, as Hacker and Kusche (2024) illustrated that reconstructions show high agreement for the seasonal and subseasonal signal components. We adjusted Figure 4 and its discussion accordingly.
-
Section 4 would benefit from quantitative support. When discussing differences in trends, accelerations, annual amplitude and phase, and seasonal/inter-annual variability, the text largely narrates what is visible in the spatial maps in Figure 1. Including numerical metrics (e.g., basin-averaged values, global means, or RMSE differences) would allow readers to assess the magnitude of improvements attributable to the combined reconstruction.
Thank you for the suggestion. We included a table with trend, acceleration, seasonal amplitude and phase, subseasonal and interannual signal magnitude, based on area-averaged time series of the preliminary reconstruction, TWSTORE, and the SLR-DORIS data for different (sub)continents. The boundaries of the (sub)continents are based on the HydroBasins level 01 product available from HydroSheds. We adjusted the discussion in the section accordingly to include the results from the area-averaged time series.
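For readers who want to reproduce area-averaged series from the gridded product, the following Python sketch shows a latitude-weighted mean over a region mask on a regular latitude-longitude grid. It rests on two assumptions that the reply does not confirm: that grid cells are weighted by the cosine of latitude (a standard approximation of cell area on such grids), and that the region is supplied as a boolean mask, here a hypothetical stand-in for a rasterised HydroBASINS polygon.

```python
import math

def area_average(field, lats, mask):
    """Cosine-latitude weighted mean of field[i][j] over cells where mask[i][j] is True.

    field: 2D list (latitude x longitude) for one epoch, e.g. TWSA in mm
    lats:  cell-centre latitudes in degrees, one per row of field
    mask:  2D list of booleans selecting the region (hypothetical input)
    """
    num = 0.0
    den = 0.0
    for i, lat in enumerate(lats):
        w = math.cos(math.radians(lat))     # relative cell area on a regular grid
        for j, value in enumerate(field[i]):
            if mask[i][j]:
                num += w * value
                den += w
    return num / den

def area_average_series(fields, lats, mask):
    """Apply area_average to a list of monthly fields to get one time series."""
    return [area_average(f, lats, mask) for f in fields]
```

Calling `area_average_series` on the monthly fields yields one value per month, which can then be passed to whatever trend or amplitude estimator is preferred.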
-
In Section 4.2, the comparison with GRACE/GRACE-FO is summarized through correlation and RMSE maps (Figures 2–3). It would be helpful to accompany these maps with representative numerical values. For example, when stating that correlations are generally lower in arid and high-elevation regions, please provide average correlations over those climate classes so that the reported patterns can be interpreted quantitatively.
We added numerical values throughout the discussion presented in section 4.2.
-
The same applies to Section 4.3. When comparing TWSTORE against other reconstructions, please provide summary statistics (e.g., spatially averaged correlations, regional performance metrics) rather than only visual differences.
Similar to the adjustments made in section 4.1, we included a table of trends, accelerations, and interannual signal values based on area-averaged time series for the same regions used in section 4.1. We adjusted the discussion to include the area-averaged time series.
-
Throughout the manuscript, it remains unclear whether TWSTORE consistently outperforms existing reconstructions in reproducing GRACE or other independent products. Several statements appear to assume superiority, yet quantitative evidence/empirical support is limited. In comparative analyses, please indicate whether TWSTORE yields objectively better agreement with reference datasets relative to alternative products, and specify where performance is similar or weaker.
Thank you for this valid point. As pointed out in the conclusions, “We recognize that evaluating our own and others' reconstructions is a challenging task, and we invite readers to propose new ideas for this purpose.” It is challenging to quantify superiority, as there are no GRACE-like measurements before 2002, except for the SLR-DORIS data, which is used in the data combination and is therefore not suited for validating TWSTORE. Furthermore, there is a large variety of performance metrics for quantifying superiority, which depend very much on the intended application. We toned down the statements, hoping to achieve a less biased presentation.
-
The Conclusions section should explicitly acknowledge limitations of the approach and of the TWSTORE product, e.g., methodological assumptions, regional weaknesses, remaining uncertainty in long-term trends, and known issues with SLR/DORIS coverage.
Thank you for raising this point. We adjusted the conclusion accordingly, adding limitations that arise from the basin-wise EOF approach (“The potential downside of the basin-wise [EOF] approach is that correlations across basins are missed, and signal characteristics might change with the basin size.”) and from the data combination with SLR-DORIS (“While the SLR-DORIS data adds temporal reliability to our reconstruction, making it valuable for validation, it shows a slight decrease in spatial resolution.”). These add to the already mentioned issues with the long-term trend, the representativeness of the training period for long-term reconstructions, and higher uncertainties in the final reconstruction for 1984-1992.
Technical corrections
-
Please ensure consistent use of “GRACE/FO” (sometimes “GRACE/-FO” appears in the text) throughout the manuscript. Also, the name appears before being defined (definition at line 71), while it should be introduced at first usage.
Thank you for noticing. We ensured the consistent use of “GRACE/-FO” throughout the manuscript.
-
In Section 4, subsection 4.1 is missing.
Actually, subsection 4.1 is not missing. However, it might be missed, as the section is situated at the top of page 12 and directly followed by figure 1.
-
Please clarify why Section 4 is not presented as part of the Results. Aren’t results presented in Section 4 core evaluation results as well?
The rationale for distinguishing between sections 4 and 5 is that section 5, the results section, addresses independent validation of the datasets. In section 4, no independent validation is performed. The impact of the data combination is assessed based on the data used in the reconstruction and the validation against GRACE. Comparing the reconstruction to other existing reconstructions is also not an independent validation, as those reconstructions use similar datasets or algorithms.
-
Figures appear out of order: Figure 2 is discussed after Figure 3. Please reorder figures (or discussion) to follow narrative flow.
Thank you for the comment. We reordered the discussion accordingly.
-
Section 5 includes several methodological explanations that belong in Methods or Introduction (e.g., lines 367–375, 421–431, 431–439). Consider relocating these to avoid mixing interpretation with procedural detail.
We initially intended to refrain from describing validation methods in the methodological description of the TWSTORE data set, to avoid confusion with the reconstruction methodology that is the manuscript’s main objective; this practice is also common in recent publications in the same field and journal (cf. Mandal et al., 2025). However, to comply with common standards of scientific writing, we have now moved the procedural descriptions of the a-posteriori evaluation to a new dedicated “Validation methods” subsection, right before the Data section. This way, the respective paragraphs in the results and validation sections now focus more clearly on the results themselves and highlight potential use cases of the new dataset.
-
At line 393, the manuscript refers to Figure 1 in Kusche et al. (2016) without context. Readers unfamiliar with the paper will not know what is being compared. Please summarize the relevant features of that figure in the text.
Thank you for pointing this out. We rewrote the part in question as follows: “Figure 1 in Kusche et al. (2016) shows peaks and lows at the one-in-five-year level for the GRACE data from 2003-2015, the corresponding time series, and the empirical and fitted probability density functions. When compared to Fig. 1 in Kusche et al. (2016), we observe close similarities in the spatial patterns of the one-in-five-year levels, with signal peaks in the Central Amazon and the Mississippi basin, and low events being concentrated in the Amazon and the Zambezi basin, although the return levels in the present study are generally lower than those reported in Kusche et al. (2016).”
-
Line 439 contains an incomplete sentence: “For terrestrial water storage changes, we evaluate TWSTORE”. Please revise for clarity.
We modified the paragraph to state the procedure and the global locations evaluated more clearly. In particular, the paragraph has been moved up to the dedicated validation methods section, and we added the information that we use central differences to derive TWS change.
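To illustrate what deriving TWS change by central differences means, the following is a minimal sketch and not the authors' implementation. It assumes gap-free, evenly spaced monthly TWSA values; the function name `tws_change_central` and the example numbers are hypothetical.

```python
def tws_change_central(twsa, dt_months=1.0):
    """Approximate dS/dt at interior time steps with central differences.

    twsa: sequence of evenly spaced TWSA values (e.g. mm equivalent water height).
    Returns a list of the same length. The first and last entries are None,
    because a central difference needs a neighbour on both sides.
    """
    n = len(twsa)
    change = [None] * n
    for t in range(1, n - 1):
        # (S[t+1] - S[t-1]) / (2 * dt): change per month, centred on month t
        change[t] = (twsa[t + 1] - twsa[t - 1]) / (2.0 * dt_months)
    return change


# Hypothetical monthly TWSA values in mm, for illustration only.
example_twsa = [10.0, 14.0, 20.0, 18.0, 12.0]
example_change = tws_change_central(example_twsa)
```

Because the difference is centred on month t, the resulting storage change is aligned in time with monthly flux data such as precipitation or runoff, at the cost of losing the first and last time step.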
References
Palazzoli, I., Ceola, S., and Gentine, P.: GRAiCE: reconstructing terrestrial water storage anomalies with recurrent neural networks, Sci. Data, 12, 146, https://doi.org/10.1038/s41597-025-04403-3, 2025.
Yin, J., Slater, L. J., Khouakhi, A., Yu, L., Liu, P., Li, F., Pokhrel, Y., and Gentine, P.: GTWS-MLrec: global terrestrial water storage reconstruction by machine learning from 1940 to present, Earth Syst. Sci. Data, 15, 5597–5615, https://doi.org/10.5194/essd-15-5597-2023, 2023.
Citation: https://doi.org/10.5194/essd-2025-461-AC2
Data sets
Multidecadal statistical reconstruction of GRACE (Gravity Recovery And Climate Experiment) like terrestrial water storage anomalies (TWSA) incorporating geodetic tracking data Charlotte Hacker et al. https://doi.org/10.5281/zenodo.15827789
Catchment-Averaged Monthly Evaporation Timeseries 1984–2020 Derived from GRACE-like TWS Change via Terrestrial Water Budgets Benjamin Gutknecht et al. https://doi.org/10.5281/zenodo.16643628
Review of “Multidecadal reconstruction of terrestrial water storage changes by combining pre-GRACE satellite observations and climate data” by Hacker et al., submitted to Earth System Science Data.
In this study, Hacker et al. provide a new GRACE-like TWSA reconstruction back to 1984. The major contribution is the use of SLR/DORIS-derived TWSA as an additional input to preserve the fidelity of the reconstructed products. The benefits of this approach have been demonstrated through multiple internal/external validations.
Reconstructing GRACE-like TWSA products is generally important for the community, especially for analyzing long-term variations. This study makes an important contribution by incorporating alternative geodetic observations in the pre-GRACE era. I only have several rather moderate comments before the manuscript can be published. Most of them relate to possible improvements of the evaluation approaches.
General concerns:
The evaluations/validations can be further improved. For a data description paper in ESSD, rigorous validation is quite important. However, the evaluations in 4.1 and 4.2 are not fully independent, as the SLR/GRACE data are used as inputs/targets, while 4.3 only presents internal comparisons. The authors should try to perform more independent evaluations so that the audience will be more confident in the data. Here are two thoughts from my side:
1. An independent comparison can be performed by computing the TWS-derived global mean sea level (GMSL) variations, like Fig. 7 in Gentner et al. (2025). The data can be obtained from Frederikse et al. (2020). In this way, the trend and inter-annual signals contained in TWSTORE could be demonstrated more clearly, and the benefits of including SLR/DORIS observations could be argued more convincingly.
2. Section 5.2 in the manuscript actually provides another good way for evaluating the data based on water balance. It might be a good idea to perform a global study based on some global dataset of precipitation, evapotranspiration, and runoff, and report some quantitative metrics. It is basically an extension of the current section 5.2 to the whole globe.
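The two suggestions above rest on simple conversions, sketched here under stated assumptions rather than as the procedure of the manuscript or of the cited studies. The first function converts a land water mass change into a barystatic GMSL change; the second computes evaporation as a water balance residual, E = P - R - dS/dt. The ocean area and water density are assumed round values, and all function and variable names are hypothetical.

```python
OCEAN_AREA_M2 = 3.61e14   # approximate global ocean area in m^2 (assumed value)
WATER_DENSITY = 1000.0    # freshwater density in kg m^-3 (assumed value)


def land_mass_to_gmsl_mm(delta_mass_gt):
    """Convert a land water mass change (Gt) into a barystatic GMSL change (mm).

    Water gained on land is lost from the ocean, hence the minus sign.
    """
    delta_mass_kg = delta_mass_gt * 1.0e12           # 1 Gt = 1e12 kg
    height_m = delta_mass_kg / (WATER_DENSITY * OCEAN_AREA_M2)
    return -height_m * 1000.0                        # m -> mm


def evaporation_residual(precip, runoff, storage_change):
    """Water balance residual E = P - R - dS/dt.

    All inputs are equally long sequences in the same units (e.g. mm/month).
    """
    return [p - r - ds for p, r, ds in zip(precip, runoff, storage_change)]


# Hypothetical basin-averaged values in mm/month, for illustration only.
example_evap = evaporation_residual([100.0, 80.0], [30.0, 20.0], [10.0, -5.0])
```

With these assumed constants, a land water gain of roughly 361 Gt corresponds to about 1 mm of GMSL fall, which is the usual rule of thumb for such comparisons.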
In addition, the comparison with other reconstructions mentioned in lines 71 – 73 is relevant. However, this comparison could be further improved to better demonstrate the contribution of this study. Humphrey and Gudmundsson (2019) clearly state that their reconstruction is only “climate-driven” and that the long-term trends are removed. The trends in Li et al. (2019) are based on linear extrapolation to the past, so they are also not that reliable. It would be good to include some more recent studies that made additional efforts to reconstruct long-term variability, such as Yin et al. (2023), Palazzoli et al. (2025), and Gentner et al. (2025). All those products are publicly available and easily accessible. Moreover, some more quantitative comparison results in addition to the current Section 4.3 would be appreciated.
Specific comments:
[1] line 41: Please be clearer if you mean the missing input uncertainties, or the missing uncertainty information associated with those reconstructions, or maybe both.
[2] lines 43 – 44: “learned relations between climate and water storage can almost certainly not be transferred straightforwardly to the past” is a great motivation to involve the observations (SLR/DORIS) to further constrain the past reconstructions. Please highlight this point a bit more.
[3] Section 2 Methods. A schematic diagram showing the whole workflow would be beneficial for the audience to understand the full method, especially the relationship between EOF analysis and the three selected regression methods.
[4] lines 116 – 120: Dividing the whole world into different basins can indeed enhance the local performance. But will this strategy also introduce some discontinuities at the basin boundaries? Please comment on it. It may relate to the argument on lines 124 – 129. Is the missing cross-basin correlation a desired feature of the method or maybe a limitation?
[5] Fig. 2: Some of the known strong linear-trend signals are associated with high RMSDs (such as Alaska or the North of India), but some of them are not (such as the High Plains Aquifer and North China Plain). Could you please comment on the reasons behind these differences?
[6] Page 19 / Section 5.1: This section includes many short paragraphs. Please consider merging some of them.
References
Frederikse et al. (2020). The causes of sea-level rise since 1900. https://doi.org/10.1038/s41586-020-2591-3
Gentner et al. (2025). DeepRec: Global Terrestrial Water Storage Reconstruction Since 1941 Using Spatiotemporal-Aware Deep Learning Model. https://doi.org/10.22541/essoar.175138855.54947789/v1
Palazzoli et al. (2025). GRAiCE: reconstructing terrestrial water storage anomalies with recurrent neural networks. https://doi.org/10.1038/s41597-025-04403-3
Yin et al. (2023). GTWS-MLrec: global terrestrial water storage reconstruction by machine learning from 1940 to present. https://doi.org/10.5194/essd-15-5597-2023