the Creative Commons Attribution 4.0 License.
State-of-the-art hydrological datasets exhibit low water balance consistency globally
Abstract. The proliferation and diversification of hydrological datasets have significantly advanced hydrological research. However, the coherence across these datasets remains poorly understood, hindering the comparability of findings derived from different data sources and variables. Here, we demonstrate that state-of-the-art hydrological datasets exhibit overall low consistency when evaluated through the lens of water balance – specifically, the relationship between variations in soil moisture and the difference between precipitation, evapotranspiration, and runoff. Our analysis reveals that satellite-based precipitation datasets generally show the highest consistency, while gauge-based datasets perform better in densely monitored regions of the Northern Hemisphere. For evapotranspiration, runoff, and soil moisture, reanalysis datasets demonstrate broader areas of higher consistency compared to gauge- or satellite-based products. Spatial patterns of consistency are strongly influenced by aridity and temperature, which affect measurement and modelling accuracy, while vegetation cover further modulates the performance of soil moisture datasets. Notably, dataset consistency has improved significantly in northern mid-latitudes over recent decades, likely reflecting advancements in observational technologies and the effects of climate warming. These findings underscore the importance of continued efforts to enhance dataset coherence and reliability for robust hydrological assessments.
Status: closed
-
RC1: 'Comment on essd-2025-376', Anonymous Referee #1, 24 Sep 2025
Manuscript: ESSD-2025-376
Huang et al. contribute to understanding the limitations of hydrological datasets (ground-, satellite-, and reanalysis-based) in capturing the relationship between monthly variations in soil moisture (SM) and the difference between precipitation (P), evapotranspiration (ET), and runoff (R) at a pixel scale around the world. Additionally, the manuscript’s results contribute to identifying the most suitable datasets for different geographical and ecological regions, which is important for reducing uncertainty in ecological, climatological, and hydrological studies using the evaluated datasets.
Overall, I found the paper well written and organized, and suitable for publication in ESSD, but I have some comments that should be addressed before it is considered for publication. In particular, some work is required to improve the clarity of the methods and results sections: (i) explain how lateral flows and water table depth may bias the proposed water balance at the pixel scale, potentially contributing to the low water balance consistency reported in the manuscript; (ii) provide a clearer explanation of the linear relationship between SM, P, ET, and R, $(P - ET - R)_s = k\,\Delta SM_s$, at the monthly scale, including the potential limitations of assuming a linear relationship.
Major comments:
- Line 191: The proposed water balance equation does not include some fluxes that may strongly affect hydrological dynamics at the pixel scale and may contribute to the low water balance consistency reported in the manuscript. For example, lateral fluxes (both inputs and outputs) can significantly influence variations in soil moisture (SM) and runoff (R) at the pixel scale, particularly in low-elevation areas and along river channels (e.g., Fan et al., 2013; Miguez-Macho and Fan, 2025; Nobre et al., 2011). Similarly, SM dynamics are strongly influenced by water table depth (WTD). Therefore, the authors should explain how excluding lateral flows and WTD could bias the results. In this regard, I also suggest examining whether and how the runoff datasets capture lateral flows and groundwater dynamics at the pixel scale.
- Line 191: The linear regression between SM and P–ET–R may also introduce bias into your results. Because your analysis is performed at a monthly scale, the hydrological response of each water balance component may occur at different rates due to, e.g., seasonality (dry vs wet season or summer vs winter) or soil saturation. Therefore, I encourage the authors to provide a more detailed explanation of why the linear assumption is appropriate for the analysis, as well as its limitations.
- Line 205: The coefficient of determination (R²) of the linear regression model quantifies how well P-ET-R explains the variability of SM. However, you can include a bias metric (e.g., mean water balance error = $\sum_{i=1}^{m}(P - ET - R - SM)$) to further examine the consistency of hydrological datasets.
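The two diagnostics discussed in the major comments above can be illustrated with a minimal sketch. It assumes monthly time series for a single pixel, all in the same units, and reads the suggested bias metric as the mean of (P - ET - R - ΔSM); that reading, the function name `water_balance_consistency`, and the synthetic data are assumptions for illustration and are not taken from the manuscript or its code.

```python
import numpy as np

def water_balance_consistency(p, et, r, sm):
    """Return (R^2, mean closure error) for one pixel's monthly series.

    Hypothetical helper: R^2 comes from an ordinary least squares fit of
    (P - ET - R) against the month-to-month change in soil moisture.
    """
    p, et, r, sm = (np.asarray(a, dtype=float) for a in (p, et, r, sm))
    d_sm = np.diff(sm)          # storage change between consecutive months
    flux = (p - et - r)[1:]     # net flux for the month ending at each step
    k, b = np.polyfit(d_sm, flux, 1)   # fit flux = k * d_sm + b
    resid = flux - (k * d_sm + b)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((flux - flux.mean()) ** 2)
    mean_error = np.mean(flux - d_sm)  # bias-type metric, in flux units
    return r2, mean_error

# Purely synthetic example: storage changes plus a constant 5-unit closure bias.
rng = np.random.default_rng(0)
n = 120
d_sm_true = rng.normal(0.0, 10.0, n)
sm = 100.0 + np.concatenate([[0.0], np.cumsum(d_sm_true)])
p = rng.uniform(50.0, 150.0, n + 1)
et = rng.uniform(20.0, 60.0, n + 1)
net = np.concatenate([[0.0], d_sm_true]) + 5.0 + rng.normal(0.0, 2.0, n + 1)
r = p - et - net  # synthetic runoff chosen so that P - ET - R equals `net`

r2, mean_error = water_balance_consistency(p, et, r, sm)
print(f"R^2 = {r2:.3f}, mean closure error = {mean_error:.2f}")
```

In this synthetic case R² stays high even though the balance is off by a constant amount every month, which is the kind of systematic error a bias metric would expose and a correlation-type metric would not.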
Minor comments:
- Lines 47 – 68: You should provide further information about the general advantages and disadvantages of ground-based, satellite, and reanalysis datasets to characterize ET, runoff, and soil moisture as you did for precipitation.
- Lines 72, 237, and 253: I encourage the authors to use another expression instead of the term “water variables” to avoid confusion.
- Line 89: Please clarify that R² corresponds to the coefficient of determination.
- Lines 128 – 135: Soil moisture estimates were obtained from different depth profiles (< 2 cm, 0-50 cm, 0 – 100 cm, and > 100 cm). How well correlated are the variations in SM among these depth profiles? Did you consider extracting total water storage from GRACE (https://grace.jpl.nasa.gov/mission/grace/)?
- Line 124: Could you explain the use of linear interpolation in the dataset resampling process? Did you consider using bilinear interpolation?
- Line 229: An additional factor that may influence your analysis is the urban area fraction. Did you examine its effect on dataset performance?
- Line 245: Please specify for which period you extract tree cover data.
- Lines 313-320: Recently, Vargas Godoy et al. (2025) evaluated the global performance of several precipitation datasets, identifying the best product at different spatial scales. Your manuscript and the results of Vargas Godoy et al. (2025) agree that IMERG and MSWEP are the best products around the world. However, I am curious about the high R² that you reported for PERSIANN-CDR (Fig. 1), because Vargas Godoy et al. (2025) and several regional analyses suggest that PERSIANN-CDR exhibits low accuracy compared to ground observations. Thus, I suggest providing a potential explanation for its high performance.
- Lines 440 – 442: Interestingly, reanalysis products show the best performance in terms of SM. Could you extend your explanation about these results and the potential reasons behind the lower performance of satellite-based products?
- Lines 445-450: Why is the lowest consistency observed at the annual scale?
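For the resampling question raised in the minor comments above, the following sketch shows what bilinear interpolation on a regular latitude/longitude grid amounts to: linear interpolation along one axis followed by linear interpolation along the other. It is only an illustration under the assumption of a rectilinear grid with increasing coordinates; the function name `bilinear_resample` and the toy grid are hypothetical and do not describe the resampling actually used in the manuscript.

```python
import numpy as np

def bilinear_resample(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field (lat x lon) onto a target grid.

    Assumes src_lat and src_lon are strictly increasing. np.interp clamps
    target points outside the source range to the edge values.
    """
    field = np.asarray(field, dtype=float)
    # Step 1: interpolate every latitude row along longitude.
    tmp = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate every resulting column along latitude.
    out = np.array([np.interp(dst_lat, src_lat, col) for col in tmp.T]).T
    return out

# Toy grid carrying a function of the form a*lat + b*lon + lat*lon,
# which bilinear interpolation reproduces exactly inside each cell.
src_lat = np.array([0.0, 1.0, 2.0])
src_lon = np.array([0.0, 1.0, 2.0, 3.0])
field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :] + src_lat[:, None] * src_lon[None, :]

dst_lat = np.array([0.25, 1.5])
dst_lon = np.array([0.5, 2.75])
resampled = bilinear_resample(field, src_lat, src_lon, dst_lat, dst_lon)
expected = 2.0 * dst_lat[:, None] + 3.0 * dst_lon[None, :] + dst_lat[:, None] * dst_lon[None, :]
print(resampled)
```

Interpolating along a single axis only would leave the other axis at its original resolution, so the distinction matters whenever both grid dimensions change during resampling.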
Technical corrections:
- Please check whether the figure colors are suitable for color-blind readers.
References
Fan, Y., Li, H., and Miguez-Macho, G.: Global Patterns of Groundwater Table Depth, Science, 339, 940–943, https://doi.org/10.1126/science.1229881, 2013.
Miguez-Macho, G. and Fan, Y.: A global humidity index with lateral hydrologic flows, Nature, 644, 413–419, https://doi.org/10.1038/s41586-025-09359-3, 2025.
Nobre, A. D., Cuartas, L. A., Hodnett, M., Rennó, C. D., Rodrigues, G., Silveira, A., Waterloo, M., and Saleska, S.: Height Above the Nearest Drainage – a hydrologically relevant new terrain model, Journal of Hydrology, 404, 13–29, https://doi.org/10.1016/j.jhydrol.2011.03.051, 2011.
Vargas Godoy, M. R., Markonis, Y., Thomson, J. R., Ballarin, A. S., Perri, S., Miao, C., Sun, Q., Hanel, M., Papalexiou, S. M., Kummerow, C., Oki, T., and Molini, A.: Which Precipitation Dataset to Choose for Hydrological Studies of the Terrestrial Water Cycle?, Bulletin of the American Meteorological Society, BAMS-D-24-0306.1, https://doi.org/10.1175/BAMS-D-24-0306.1, 2025.
Citation: https://doi.org/10.5194/essd-2025-376-RC1
AC1: 'Reply on RC1', Junguo Liu, 27 Jan 2026
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-376/essd-2025-376-AC1-supplement.pdf
-
RC2: 'Comment on essd-2025-376', Anonymous Referee #2, 08 Oct 2025
My review comments are structured as follows: Overall Assessment, Major Strengths, and Recommendations for Improvement.
I. Overall Assessment
This paper presents a systematic evaluation of the water balance consistency of 47 state-of-the-art hydrological datasets (precipitation, evapotranspiration, runoff, and soil moisture) using 8,294 independent combinations. The methodology is rigorous, the data coverage is extensive, and the study holds significant scientific and practical value. It reveals a widespread lack of water balance consistency in current global hydrological datasets and provides an in-depth analysis of the spatial patterns, influencing factors, and temporal trends. The manuscript is well-structured, the methods are transparent, and the results are credible. I recommend acceptance after minor revisions.
II. Major Strengths
1. High Novelty: This is the first study to systematically assess the consistency of multi-source, multi-variable hydrological datasets from a water balance perspective, filling a critical gap in the current literature.
2. Methodological Rigor:
a. The use of independent dataset combinations effectively avoids spurious consistency arising from the use of the same model or forcing data.
b. The use of adjusted R² as the consistency metric mitigates errors introduced by unit inconsistencies between variables.
c. The application of SHAP for factor attribution enhances the interpretability of the results.
3. Comprehensive Data Coverage: The inclusion of gauge-based, satellite-based, and reanalysis products ensures broad spatiotemporal coverage and strong representativeness.
4. Insightful and Actionable Results:
a. Clearly identifies the strengths and weaknesses of different data sources across various regions and climatic conditions.
b. Highlights the significant impact of soil moisture data depth on consistency.
c. Reveals an improvement in dataset consistency in mid-to-high latitude regions of the Northern Hemisphere in recent decades.
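The point about adjusted R² and unit inconsistencies can be checked with a small sketch. It assumes a simple linear regression with an intercept, for which R² equals the squared Pearson correlation, and uses the standard adjusted R² formula; the exact formulation used in the manuscript is not given here, so the function name `adjusted_r2` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def adjusted_r2(x, y, n_predictors=1):
    """Adjusted R^2 for a simple linear regression of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r2 = np.corrcoef(x, y)[0, 1] ** 2   # equals R^2 for one predictor
    n = x.size
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Synthetic data: the same response expressed in two different units.
rng = np.random.default_rng(1)
x = rng.normal(size=60)
y_mm = 3.0 * x + rng.normal(size=60)   # e.g. a depth in millimetres
y_m = y_mm / 1000.0                    # the same quantity in metres

adj_mm = adjusted_r2(x, y_mm)
adj_m = adjusted_r2(x, y_m)
print(adj_mm, adj_m)
```

Rescaling either variable changes the fitted slope but not the correlation, so the metric is unaffected by a constant unit conversion. It does not, however, correct for differences that are not a pure rescaling, such as soil moisture layers of different depth.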
III. Recommendations for Improvement
1. Clarifications in the Methods Section
Handling Soil Moisture Depth Differences: While the manuscript states that ΔSM represents "change," the response of soil moisture at different depths to P-ET-R varies. It would be beneficial to clarify if any normalization or sensitivity analysis was performed for ΔSM across different depths.
Temporal Scale Analysis: The significant differences in consistency between daily and annual scales warrant further discussion of the underlying physical mechanisms (e.g., high noise at daily scales, strong smoothing effects at annual scales).
2. Deepening the Results and Discussion
Root Causes of Low Consistency: Beyond the mentioned observational errors and model structures, could factors like surface-groundwater exchange or human activities (e.g., irrigation, reservoir regulation) also contribute? Expanding the discussion on this point would be valuable.
Mechanisms Behind Spatial Consistency Patterns: For instance, is the low consistency in high-latitude regions linked to insufficient representation of processes like snowpack and permafrost? Further interpretation in the context of existing literature is recommended.
3. Figures and Presentation
Figure 1: The meaning of the asterisk * and dashed lines in the boxplots should be explicitly stated in the figure caption.
Figure 2: The grey areas, indicating "multiple datasets show similar performance or low consistency," would benefit from having the specific thresholds for "similar" and "low" defined in the caption or figure.
Supplementary Material: Briefly mentioning the names of the best/worst performing datasets from Figures S13–S28 in the main text would help readers quickly grasp key findings.
4. Language and Formatting
Some sentences are quite long; breaking them up would improve readability.
Terminology should be checked for consistency (e.g., unified use of "gauge-based" vs. "station-based").
IV. Recommendation
Recommendation: Minor Revision
This manuscript makes a pioneering contribution to the evaluation of hydrological datasets. It is scientifically sound, its conclusions are robust, and it provides crucial insights for hydrological model development, data fusion, and climate change research. I recommend acceptance after the authors address the points above.
-
AC2: 'Reply on RC2', Junguo Liu, 27 Jan 2026
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-376/essd-2025-376-AC2-supplement.pdf
-
RC3: 'Comment on essd-2025-376', Anonymous Referee #3, 26 Oct 2025
This study comprehensively evaluates the water balance consistencies among many state-of-the-art geospatial datasets on P, ET, and R. While it represents lots of work, clearly demonstrating the power of big data geospatial analysis, and the manuscript is generally well-written, there are several major concerns that should be addressed.
- The method for evaluating the water balance inconsistencies may need better justifications or some back-up analyses. While delta_SM is a reasonable proxy for the storage changes, it is still insufficient to capture those mass changes related to lakes/reservoirs/snow/glaciers, or those from underground. Therefore, the authors may need to explore the use of GRACE data to support their methods. I am afraid that some of the major conclusions for the high-latitude changes may be compromised if using GRACE.
- There are many useful insights regarding the performance of different datasets; however, it seems this study does not directly contribute a new dataset itself. According to my understanding, ESSD’s scope is more data-centered. In this regard, can the authors clarify what new datasets they may be able to contribute to the community?
- The authors seem to overlook several past studies on similar topics (e.g., https://link.springer.com/chapter/10.1007/978-3-319-32449-4_4 and relevant citing references).
- Although water balance closure is indeed important, there are occasions where water balance is violated because of the unobserved loss/addition of water. In most cases, this points to errors in the datasets, but in some cases, it may point toward new hydrological insights. The authors may need to briefly discuss the limitations of their assumption that ‘water balance consistency is directly associated with good dataset performance’.
Citation: https://doi.org/10.5194/essd-2025-376-RC3
AC3: 'Reply on RC3', Junguo Liu, 27 Jan 2026
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-376/essd-2025-376-AC3-supplement.pdf
Model code and software
Assess water balance consistency of state-of-the-art hydrological datasets, Hao Huang and René Orth, https://github.com/HowHuang/WaterBalanceConsistency
Viewed
- HTML: 1,705
- PDF: 290
- XML: 59
- Total: 2,054
- Supplement: 189
- BibTeX: 64
- EndNote: 90