This work is distributed under the Creative Commons Attribution 4.0 License.
Multi-spatial scale assessment and multi-dataset fusion of global terrestrial evapotranspiration datasets
Abstract. Evapotranspiration (ET) is an important component of the terrestrial water cycle, carbon cycle, and energy balance. Currently, there are four main types of ET datasets: remote sensing–based, machine learning–based, reanalysis–based, and land–surface–model–based. However, most existing ET fusion datasets rely on a single type of ET dataset, limiting their ability to effectively capture regional ET variations. This limitation hinders accurate quantification of the terrestrial water balance and understanding of climate change impacts. In this study, the accuracy and uncertainty of thirty ET datasets (across all four types) are evaluated at multiple spatial scales, and a fusion dataset, BMA-ET, is produced using the Bayesian model averaging (BMA) method with a dynamic weighting scheme that accounts for different vegetation types and for years not covered by all ET datasets. Using ET from FLUXNET as the reference, the study recommends remote sensing– and machine learning–based ET datasets, especially Model Tree Ensemble Evapotranspiration (MTE) and Penman-Monteith-Leuning (PML), although the optimal selection depends on season and vegetation type. At the basin scale, land–surface–model–based ET datasets show lower relative uncertainty than the other types. At the global scale, uncertainty is lower in regions with larger ET, such as the Amazon, Central and Southern Africa, and Southeast Asia. The BMA-ET dataset accurately captures trends and seasonal variability in ET, showing an increasing global terrestrial trend of 0.21 mm·yr−1 over the study period. BMA-ET has higher correlation coefficients and lower root-mean-square errors than most individual ET datasets. Validation against FLUXNET ET shows that the correlation coefficients at more than 70 % of the flux sites exceed 0.8. Overall, BMA-ET provides a comprehensive, long-term resource for understanding global ET patterns and trends, addressing the limitations of prior ET fusion efforts. Free access to the dataset can be found at https://doi.org/10.6084/m9.figshare.28034666.v1 (Wu and Miao, 2024).
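For context on the fusion step: Bayesian model averaging weights each candidate dataset by its posterior probability of matching the reference observations, and the fused estimate is the weighted combination of the members. Below is a minimal Python sketch of one common way such weights are estimated (an expectation-maximization loop with Gaussian member likelihoods); it is an illustration under those assumptions, not the authors' exact implementation, and all names are placeholders.

```python
import numpy as np

def bma_weights(members, reference, n_iter=500, tol=1e-8):
    """Estimate BMA weights by expectation-maximization, assuming each
    member's errors are Gaussian (illustrative sketch, not the paper's code).

    members   : (n_models, n_obs) candidate ET estimates
    reference : (n_obs,) reference ET (e.g., FLUXNET observations)
    """
    n_models, _ = members.shape
    err2 = (members - reference) ** 2              # squared errors per member
    w = np.full(n_models, 1.0 / n_models)          # start from equal weights
    s2 = err2.mean(axis=1)                         # per-member error variance
    for _ in range(n_iter):
        # E-step: posterior probability that each member is "best" per point
        lik = np.exp(-0.5 * err2 / s2[:, None]) / np.sqrt(2 * np.pi * s2[:, None])
        z = w[:, None] * lik
        z /= z.sum(axis=0, keepdims=True)
        # M-step: update weights and variances from the posteriors
        w_new = z.mean(axis=1)
        s2 = (z * err2).sum(axis=1) / z.sum(axis=1)
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new
    return w

# Fused estimate: weighted combination of the member datasets
# et_fused = bma_weights(members, reference) @ members
```

Per the abstract, the paper derives such weights separately for each vegetation type and uses a dynamic scheme for years in which not all member datasets exist; presumably the weights of the available members are renormalized in those years.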
Status: final response (author comments only)
RC1: 'Comment on essd-2024-600', Anonymous Referee #1, 17 Feb 2025
GENERAL COMMENTS
The research entitled "Multi-spatial Scale Assessment and Multi-dataset Fusion of Global Terrestrial Evapotranspiration Datasets" meticulously evaluated the accuracy and uncertainty inherent in thirty ET datasets at multiple spatial scales. These datasets encompass a variety of methodologies, including those derived from remote sensing–based, machine learning–based, reanalysis–based, and land–surface–model–based. Then the study produced a fusion ET dataset (BMA-ET) using BMA method and dynamic weighting scheme for different vegetation types. The article is well-written and demonstrates strong logical coherence. However, I am doubt about the purpose of this study. As the authors have pointed out, “there are large discrepancies among ET estimates from different methods”, I am wondering how does the research handle the uncertainty between different types of ET datasets. Due to differences in algorithm frameworks and input data, the uncertainty of estimation results varies. The ET Fusion not only combines the advantages of different models, but also integrates uncertainty and even enhances errors. Regarding this, the author did not provide a solution. For a global ET dataset, data availability is more important than validation accuracy, and the results and novelty do not reach the desired level, which I do not think meet the requirements of ESSD. Thus, I recommend rejection. Please see my specific comments below.SPECIFIC COMMENTS
1) I think the most significant problem with this research is that all of the machine-learning ET models and some other models (GLASS, PML, etc.) have been calibrated with ground observations from FLUXNET. The BMA-ET generated in this study again used FLUXNET observations to fuse the thirty ET datasets, which poses a data-reuse problem, and the estimates may even be overfit.
2) How did the authors handle the estimation accuracy of station-sparse areas such as South America and Africa during the fusion process?
3) BMA is not an advanced fusion algorithm. The GLASS v4.0 product integrated five ET algorithms using BMA in 2014 and was upgraded to v5.0 using a deep-learning algorithm in 2022. Which version of the GLASS product was fused in this study? Why don't the authors consider using deep-learning fusion algorithms?
4) Table 2 shows that the spatial resolutions of the 30 ET datasets are different. How did the author solve the problem of spatial scale mismatch during the fusion process?
5) The 30 ET datasets cover different time ranges. How was the fusion carried out for years in which some ET datasets are missing?
6) What are the spatial and temporal resolutions of BMA-ET? How were the mismatches with the 30 input ET datasets handled?
7) Is the observation interval of the FLUXNET ground measurements half an hour? How were the observations processed to the monthly scale? Are nighttime observations used? (One plausible aggregation chain is sketched after this list.)
8) In line 181: What do the 10 sites refer to? Do they refer to 60 % of the CRO sites? Please explain more clearly.
9) In section 2.2 (lines 176-195), "The ET fusion datasets for each vegetation type were spliced to obtain the final global ET fusion dataset". How were the boundaries of the vegetation types obtained at the regional scale? What is their accuracy? Have the authors considered the fusion errors caused by land-cover classification errors?
10) In Figure 2, "the 12 vegetation cover types do not cover the entire study area. For areas not covered, an equal weighting approach was taken". Is this weighting scheme reasonable?
11) In Figure 4, the 30 ET datasets were evaluated in detail, and Table 3 gives guidelines for the use of the ET datasets. So, in the BMA-ET fusion process, were all 30 ET datasets used for fusion, or only the recommended ones? If, as the authors state, the accuracies of the RA and LSM datasets are not good, why are they still used in the fusion?
12) In lines 237-238, the RS and ML ET datasets are recommended based on the site-scale validation results, whereas in lines 256-257 the ML ET datasets have greater TCH relative uncertainty. Do these two conclusions conflict? Please provide a detailed explanation.
13) In Figure 1, the common period of coverage for all ET datasets is 1982–2011. How did this study produce the BMA-ET dataset from 1980 to 2020?
14) In lines 355-356, the study recommends the RS- and ML-based ET datasets (especially MTE and PML) based on the evaluation results. So why does BMA-ET merge all 30 ET datasets? Would it be better to merge only MTE and PML?
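Regarding comment 7, one plausible (not necessarily the authors') chain from half-hourly FLUXNET latent heat to monthly ET is sketched below. The column names follow the FLUXNET2015 FULLSET convention (TIMESTAMP_START, gap-filled latent heat LE_F_MDS in W m-2), and the latent heat of vaporization is approximated as a constant; both are assumptions made for illustration.

```python
import pandas as pd

LAMBDA = 2.45e6  # latent heat of vaporization, J kg-1 (approximate constant)

def monthly_et_from_fluxnet(csv_path, min_coverage=0.8):
    """Aggregate half-hourly FLUXNET latent heat flux to monthly ET (mm).

    Assumes FLUXNET2015 FULLSET columns TIMESTAMP_START (YYYYMMDDHHMM) and
    LE_F_MDS (W m-2). Illustrative sketch only.
    """
    df = pd.read_csv(csv_path, na_values=[-9999])
    df.index = pd.to_datetime(df["TIMESTAMP_START"], format="%Y%m%d%H%M")

    # W m-2 over a 30-min interval -> kg m-2 (= mm) of evaporated water;
    # the sum below includes nighttime intervals, where LE is small or negative
    et_mm = df["LE_F_MDS"] * 1800.0 / LAMBDA

    monthly = et_mm.resample("MS").sum(min_count=1)
    coverage = et_mm.notna().resample("MS").mean()
    return monthly.where(coverage >= min_coverage)  # mask poorly covered months
```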
Citation: https://doi.org/10.5194/essd-2024-600-RC1
RC2: 'Comment on essd-2024-600', Anonymous Referee #2, 24 Feb 2025
General comments
The work "Multi-spatial scale assessment and multi-dataset fusion of global terrestrial evapotranspiration (ET) datasets" presents a detailed comparison of 30 global-scale evapotranspiration datasets and uses Bayesian model averaging to create a new weighted ensemble dataset. The paper is logically structured and clear overall. The comparison of such a large sample of ET datasets, and the evaluation at a range of scales, are alone interesting, novel, and valuable. For the dataset to be useful, however, more methodological detail is needed about the pre-processing used to align all datasets to a consistent spatial and temporal basis, beyond the descriptions provided in the supplementary material. In addition, more detail is needed on the robustness of the BMA approach to key assumptions, namely land cover change, land cover classification uncertainty at the resolutions presented, and BMA model validation. Many such questions would be far easier to review and provide feedback on if (annotated) code used to generate the dataset were provided. These additions, as well as a correction of the 1 degree resolution dataset, are recommended before publication in ESSD.
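To make the alignment request concrete, here is a minimal xarray sketch of one standard recipe (monthly-mean resampling plus bilinear regridding to a common 0.5 degree grid). The file paths and the variable name et are placeholders, and this is not claimed to be the authors' actual pipeline; for coarsening fine grids, a conservative (area-weighted) scheme such as xesmf would be preferable to bilinear interpolation.

```python
import numpy as np
import xarray as xr

# Common 0.5-degree target grid (cell centers)
target = xr.Dataset(coords={
    "lat": np.arange(-89.75, 90, 0.5),
    "lon": np.arange(-179.75, 180, 0.5),
})

def align(path, var="et"):
    """Bring one ET dataset onto the common spatial grid and monthly time base.

    Assumes the file has time, lat, and lon coordinates; all names are
    placeholders. Sketch only.
    """
    ds = xr.open_dataset(path)
    da = ds[var].resample(time="MS").mean()  # monthly means
    return da.interp(lat=target.lat, lon=target.lon, method="linear")

# aligned = [align(p) for p in ["member1.nc", "member2.nc"]]  # placeholders
```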
Specific comments
- There is a problem with the 1 degree dataset starting at approximately timestep 262 (figure attached to the review).
- Figure 7: What resolution and spatial interpolation methods were used to fill in data gaps in producing Figure 7? A detail of the mean annual field derived from the 0.5 degree data is attached (see the attached notebook for the full-size image; the 500 px restriction on image attachments is rather limiting here).
- Line 178: How was the training/validation split done for evaluating BMA performance? Given the sparsity of flux sites, why wasn't cross-validation considered? How sensitive are the model weights to the training sample? (See the sketch after this list.)
- Line 191: Can you quantify or estimate distributions of typical land cover changes at the appropriate dataset resolution as a basic test of model sensitivity to the stationary land cover assumption?
- Line 245: It isn't clear why correlations here are based on mean annual values and elsewhere (Figure S18) based on monthly data, making it more difficult to interpret the different comparisons presented (i.e. site, basin, global scales).
- Line 284: Comparing typical MAE values with the stated trend, what is the uncertainty in the 0.21 mm/yr trend line? How significant is the magnitude (and precision) of this trend compared with typical variability due to error?
- A basic attempt to replicate Figure S23 was unsuccessful. There is likely a simple explanation for the substantial offset (~20mm) but it is much more laborious to investigate without the full replication code. A copy of the code used to generate the figures presented in this review is attached.
- Figure 8: What units are these models compared in, i.e., is the standard deviation in mm/year? Was some kind of normalization/standardization done to make the reference dataset's standard deviation exactly 1?
- Figure 8: What is the advantage of the BMA-ET dataset over the GLDAS-VIC dataset, or other datasets with similar correlation, lower RMSE, and standard deviation closer to the reference dataset?
- Line 313: What is the sensitivity of model performance to typical differences / uncertainties introduced by spatial scale mismatch?
- Section 4.2: More discussion of how data leakage was avoided is needed. How is training data independent of validation data in each comparison?
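On the cross-validation point (third bullet above), the stability of the BMA weights could be probed with a site-level K-fold re-fit along these lines; this reuses the bma_weights helper sketched after the abstract, and all names are illustrative.

```python
import numpy as np

def weight_stability(members, reference, site_ids, k=5, seed=0):
    """Re-estimate BMA weights on K site-level folds to gauge their spread.

    Splitting by site rather than by observation keeps each site wholly
    inside one fold, avoiding site-level leakage. Sketch only; relies on
    the bma_weights helper sketched earlier on this page.
    """
    rng = np.random.default_rng(seed)
    sites = rng.permutation(np.unique(site_ids))
    weights = []
    for held_out in np.array_split(sites, k):
        train = ~np.isin(site_ids, held_out)
        weights.append(bma_weights(members[:, train], reference[train]))
    weights = np.asarray(weights)            # shape (k, n_models)
    return weights.mean(axis=0), weights.std(axis=0)
```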
RC3: 'Comment on essd-2024-600', Anonymous Referee #3, 25 Feb 2025
This study applies BMA to merge multiple ET datasets, but two fundamental issues must be addressed before publication in ESSD: potential data leakage and the assumption of independent errors among datasets. If unresolved, the study lacks the necessary rigor for acceptance.
- ML datasets (e.g., FLUXCOM, MTE) trained on FLUXNET are also used for BMA likelihood estimation, raising concerns about inflated weights. Has the author evaluated this effect? Clearly identify datasets incorporating FLUXNET and assess their influence on BMA weights. If necessary, limit their posterior weights or introduce independent validation datasets.
- BMA assumes independent errors, but ML datasets share training data, RS datasets rely on MODIS, and LSMs use similar climate forcings. Has the author assessed inter-dataset correlations and their impact on weight allocation?
- Consider introducing a covariance matrix into the BMA likelihood function (e.g., computed from Pearson correlation matrices of the FLUXNET residuals) so that it accounts for inter-dataset correlations. Compare the weight distributions before and after this adjustment. Alternatively, cluster highly correlated datasets (e.g., FLUXCOM, MTE) and down-weight them collectively (a starter sketch follows this list).
- The study applies a 60 %-40 % FLUXNET station split for BMA training and validation. Consider implementing K-fold cross-validation or leave-one-out validation to assess the stability of the BMA weights across different training subsets.
- Bootstrap resampling of the FLUXNET data could estimate confidence intervals for the BMA weights and ET estimates. If the dataset dependencies are strong, the current uncertainty estimates may be underestimated. Try adding confidence intervals (e.g., 95 % CI) to the BMA-ET results in figures such as Fig. 7 or Fig. S23 and discuss the implications.
- SI L355, Table S1: The TRENDY model dataset link is inaccessible. Is it publicly available? Clarify the access restrictions and provide an alternative link if possible.
- L125, Table 2: The citation Tian et al. (2015) may not be the best reference for the DLEM ET data. Consider citing Pan et al. (2015) or Friedlingstein et al. (2023) (the TRENDY v12 reference), which are more relevant to the ET estimates.
- L156: I suggest using FLUXNET2015 (2012-2014) data to supplement the site validation and evaluate BMA-ET performance. Additionally, explore AmeriFlux or ICOS (2015-2020) data for further validation, enhancing the credibility of the extended period.
- L274: The color scheme of Figure 6(a) appears cluttered. Align it with subplot (b) by using consistent colors (RS datasets in red, ML datasets in yellow, etc.). This improves clarity and direct comparison.
- L300, Figure 8: same as above.
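As a starting point for the covariance and clustering suggestion above, the residual correlation matrix and a simple cluster-based down-weighting could look like the following sketch; the 0.8 correlation threshold and all names are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def residual_correlation(members, reference):
    """Pearson correlation matrix of member residuals against the reference."""
    return np.corrcoef(members - reference)   # shape (n_models, n_models)

def cluster_downweight(weights, corr, threshold=0.8):
    """Cluster members whose residual correlation exceeds the threshold and
    shrink each member's weight by its cluster size, so near-duplicate
    datasets are not double-counted. Illustrative sketch only.
    """
    # Condensed distance vector (1 - |r|) in scipy's expected ordering
    dist = 1.0 - np.abs(corr[np.triu_indices_from(corr, k=1)])
    labels = fcluster(linkage(dist, method="average"),
                      t=1.0 - threshold, criterion="distance")
    adjusted = np.array([w / (labels == c).sum()
                         for w, c in zip(weights, labels)])
    return adjusted / adjusted.sum()          # renormalize to sum to 1
```

Comparing the weight distributions before and after such an adjustment, as suggested above, would show directly how much the FLUXNET-trained members dominate the posterior.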
Citation: https://doi.org/10.5194/essd-2024-600-RC3
Data sets
A new global terrestrial evapotranspiration dataset from multi-datasets fusion based on Bayesian model averaging covering 1980-2020 (BMA-ET) Yi Wu and Chiyuan Miao https://doi.org/10.6084/m9.figshare.28034666.v1