the Creative Commons Attribution 4.0 License.
A benchmark dataset for global evapotranspiration estimation based on FLUXNET2015 from 2000 to 2022
Abstract. Evapotranspiration (ET) is a crucial component of the terrestrial hydrological cycle. Latent heat flux (LE, the energy equivalent of ET, in W/m²) observed by the eddy covariance (EC) technique, known as LEEC, is widely recognized as a highly accurate benchmark for global ET estimation. There is an increasing need for long time-series benchmark data to support climate change analysis, the construction of new models, and the validation of new products. However, existing LEEC datasets, such as FLUXNET2015, face significant challenges due to limited observation periods and extensive data gaps. This hinders their application. To address these issues, we developed a gap-filling and prolongation framework for LEEC data and established a benchmark dataset for global ET estimation from 2000 to 2022 across 64 sites at various time scales. The framework comprises three parts: site selection and data pre-processing, generation of gap-filled half-hourly/hourly LE data, and generation of prolonged daily LE data. We selected 64 sites from FLUXNET2015 using rigorous filtering criteria. A novel bias-corrected random forest (RF) algorithm serves as the gap-filling and prolongation algorithm of the framework, producing seamless half-hourly and daily LE data. The framework performs well in both hourly gap-filling and daily prolongation, with median RMSEs of 32.84 W/m² and 16.58 W/m², respectively. Compared with the original RF and the marginal distribution sampling (MDS) algorithm, the bias-corrected algorithm significantly improved gap-filling performance for long gaps and extreme values. The results demonstrate robust prolongation performance with respect to both prolongation direction and temporal stability. The data distribution of our gap-filled dataset is highly consistent with that of the FLUXNET2015 dataset.
In conclusion, we publish, for the first time, a benchmark dataset for global ET estimation based on FLUXNET2015 covering 2000 to 2022. The dataset provides data support for ET modelling, water-carbon cycle monitoring, and climate change analysis. It is freely available from the following repository: https://doi.org/10.5281/zenodo.13853409 (Li et al., 2024b).
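Because LE is reported as an energy flux while ET is often wanted as a water depth, a unit conversion is a common first step when working with data of this kind. The sketch below is illustrative only: it assumes a constant latent heat of vaporization of 2.45 MJ/kg (a common approximation near 20 °C) and uses the fact that 1 kg of water per m² corresponds to 1 mm of depth. The constant and the function name `le_to_et_mm_per_day` are assumptions made for this example and do not come from the manuscript.

```python
# Assumed constant: latent heat of vaporization of water near 20 degC, in J/kg.
LAMBDA_J_PER_KG = 2.45e6
SECONDS_PER_DAY = 86_400


def le_to_et_mm_per_day(le_w_m2, lam=LAMBDA_J_PER_KG):
    """Convert a daily-mean latent heat flux (W/m^2) to ET (mm/day).

    W/m^2 is J/(s*m^2); dividing by J/kg gives kg/(s*m^2), and
    1 kg of water per m^2 is a layer 1 mm deep.
    """
    return le_w_m2 * SECONDS_PER_DAY / lam


if __name__ == "__main__":
    # Express the reported daily median RMSE (16.58 W/m^2) as a water depth.
    print(round(le_to_et_mm_per_day(16.58), 3), "mm/day")
```

A temperature-dependent latent heat would change the result by only a few percent, so the constant is usually adequate for a quick comparison.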
Status: final response (author comments only)
RC1: 'Comment on essd-2024-460', Anonymous Referee #1, 24 Jan 2025
The research article discusses the development of a benchmark dataset for global evapotranspiration (ET) estimation, addressing limitations in existing latent heat flux (LE) data from the FLUXNET2015 dataset. Current datasets suffer from short observation periods and significant data gaps, hindering climate change analysis and model validation. To overcome these challenges, the authors created a gap-filling and prolongation framework that generates seamless half-hourly and daily LE data from 2000 to 2022 across 64 sites. They employed a novel bias-corrected random forest algorithm for improved data accuracy, achieving a median RMSE of 32.84 W/m² for hourly and 16.58 W/m² for daily data. The resulting dataset enhances ET modeling, water-carbon cycle monitoring, and climate change research.
The study is one of the pioneering efforts to utilize a bias-corrected random forest approach to enhance data gap-filling performance. I suggest minor revisions to address some specific questions before proceeding with publication.
Some minor issues:
Figure 3 – The diagram shows two RF models being trained and evaluated. Please indicate in the Model training box that LE and Bias (without the single quote) serve as the observational ground-truth labels.
In the Model validation box, only the predicted values are shown, not the true values. Please add that the true LE and Bias are used to evaluate the performance of RF1 and RF2, and indicate the performance metrics used to validate each model.
Figure 4 – It is hard to conclude that the bias-corrected RF performs better than the other two approaches, because the mean RMSE values of the three are very close to each other in the figure. Consider adding data labels for the mean RMSE values to highlight the findings. The same applies to Figure 5.
Line 185 – Please elaborate on how the best hyperparameters were chosen from the 64 models. 64 models with 64 sets of parameters are obtained. For sites with similar land types, are those models combined into one unified model by averaging the parameters, or do they still use different sets of parameters? Please explain in more detail.
In the discussion section, please add the potential limitations of this study in terms of variable importance, sensitivity, and stability.
Citation: https://doi.org/10.5194/essd-2024-460-RC1
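The two-model structure that the referee describes for Figure 3 (RF1 predicting LE, RF2 predicting Bias) resembles a generic two-stage residual correction, which can be sketched as follows. This is a sketch under stated assumptions, not the authors' algorithm: a k-nearest-neighbour mean stands in for the random forest so that the example runs on the standard library alone, the second stage reuses the same predictor as the first, and the data are synthetic. The names `fit_knn_mean` and `fit_bias_corrected` are hypothetical.

```python
from statistics import mean


def fit_knn_mean(xs, ys, k=5):
    """Stand-in learner (NOT a random forest): predict the mean target
    of the k training points nearest to the query."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def fit_bias_corrected(xs, ys, fit=fit_knn_mean):
    """Two-stage residual correction: final prediction = LE' + Bias'."""
    model1 = fit(xs, ys)                            # stage 1 predicts LE'
    bias = [y - model1(x) for x, y in zip(xs, ys)]  # Bias = LE - LE'
    model2 = fit(xs, bias)                          # stage 2 predicts Bias'
    return model1, (lambda x: model1(x) + model2(x))


# Synthetic data: the target grows quickly, so an averaging learner
# underestimates the largest values.
xs = list(range(11))
ys = [float(x * x) for x in xs]
raw, corrected = fit_bias_corrected(xs, ys)

if __name__ == "__main__":
    print("true:", ys[-1], "raw:", raw(10), "corrected:", corrected(10))
```

Averaging learners tend to pull predictions toward the mean, which is why extreme values are the usual motivation for a second, residual stage. In practice the residuals would normally come from out-of-bag or cross-validated predictions rather than in-sample ones.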
RC2: 'Comment on essd-2024-460', Anonymous Referee #2, 17 Apr 2025
This study presents a 23-year long-term benchmark ET dataset (2000–2022) based on global FLUXNET2015 observations. The dataset effectively addresses critical gaps in existing ET records at both hourly and daily scales, while also extending the time span. This makes it highly valuable for validating ET models and satellite-derived ET products. Therefore, this work has significant importance for the ET research community and is suitable for ESSD. Below are some minor suggestions to further enhance the quality of the manuscript.
- Line 14: “terrestrial”
- Line 19: The sentence “This hinders their application.” is too short.
- Line 45: “With the abundance of data and the development of models” the sentence is not very clear.
- Line 50: “as” is missing in “Since LEEC data are considered…”
- Line 62: “hopes” seems not suitable
- Eq. 2-3: Ta and Td are better than ta and td. Also, ata is easily mistaken for a single symbol.
- Line 109: The sentence “Its spatial resolution…” needs to be refined and polished.
- Line 112: replace “contained” with “includes”
- Line 123: “if” -> “when”
- Line 129: What does the sentence “thus we chose 2 more sites with relatively good data quality” mean?
- Line 212: Why did the authors prolong the daily ET rather than the hourly ET?
- Line 304: What’s the difference between the forward and the backward prolongation?
- For the datasets, what is the difference between the aggregated_daily and the prolonged_daily_200_2022?
Citation: https://doi.org/10.5194/essd-2024-460-RC2
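For context on the referee's last question, one plausible reading of a daily aggregate is the mean of all 48 gap-filled half-hourly LE values in a calendar day. The sketch below illustrates that reading only. Whether it matches how the dataset's aggregated_daily files were built is an assumption, and the function name `aggregate_daily` and the rule of dropping incomplete days are hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_daily(timestamps, le_values, steps_per_day=48):
    """Average half-hourly LE (W/m^2) per calendar day.

    Days with fewer than `steps_per_day` records are dropped, so a
    daily mean is never computed from a partial day.
    """
    by_day = defaultdict(list)
    for ts, le in zip(timestamps, le_values):
        by_day[ts.date()].append(le)
    return {
        day: mean(vals)
        for day, vals in sorted(by_day.items())
        if len(vals) == steps_per_day
    }


# Synthetic series: two complete days plus one stray record on day three.
start = datetime(2000, 1, 1)
stamps = [start + timedelta(minutes=30 * i) for i in range(97)]
values = [50.0] * 48 + [100.0] * 48 + [75.0]
daily = aggregate_daily(stamps, values)

if __name__ == "__main__":
    for day, le in daily.items():
        print(day, le)
```

Under this reading, a prolonged daily series would differ in that values outside the observation period are model predictions rather than aggregates of half-hourly data.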
Data sets
A benchmark dataset for global evapotranspiration estimation based on FLUXNET2015 from 2000 to 2022 (V1.0) Wangyipu Li et al. https://doi.org/10.5281/zenodo.13853409
Viewed
- HTML: 471
- PDF: 71
- XML: 12
- Total: 554
- BibTeX: 23
- EndNote: 17