A benchmark dataset for global evapotranspiration estimation based on FLUXNET2015 from 2000 to 2022
Abstract. Evapotranspiration (ET) is a crucial component of the terrestrial hydrological cycle. Latent heat flux (LE, equivalent to ET in W/m²) observed by the eddy covariance (EC) technique, known as LEEC, is widely recognized as a highly accurate benchmark for global ET estimation. There is an increasing need for long time-series benchmark data to support climate change analysis, the construction of new models, and the validation of new products. However, existing LEEC datasets such as FLUXNET2015 have limited observation periods and extensive data gaps, which hinders their application. To address these issues, we developed a gap-filling and prolongation framework for LEEC data and established a benchmark dataset for global ET estimation from 2000 to 2022 across 64 sites at various time scales. The framework comprises three parts: site selection and data pre-processing, generation of gap-filled half-hourly/hourly LE data, and generation of prolonged daily LE data. We selected the 64 sites from FLUXNET2015 based on a rigorous filtering criterion. A novel bias-corrected random forest (RF) algorithm serves as the gap-filling and prolongation algorithm of the framework and produces seamless half-hourly and daily LE data. The framework performs well in both hourly gap-filling and daily prolongation, with median RMSEs of 32.84 W/m² and 16.58 W/m², respectively. Compared with the original RF and marginal distribution sampling (MDS) algorithms, the bias-corrected RF algorithm significantly improves gap-filling performance for long gaps and extreme values. The results also show that prolongation performance is robust across prolongation directions and stable over time. The data distribution of our gap-filled dataset is highly consistent with that of the FLUXNET2015 dataset.
In conclusion, this work presents, for the first time, a benchmark dataset for global ET estimation based on FLUXNET2015 from 2000 to 2022. The dataset provides data support for ET modelling, water-carbon cycle monitoring, and climate change analysis. It is freely available from the following repository: https://doi.org/10.5281/zenodo.13853409 (Li et al., 2024b).
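Because the abstract reports LE in W/m² as an equivalent of ET and summarizes accuracy as RMSE, the following minimal sketch illustrates both calculations. It is not the authors' code: the function names are hypothetical, and the latent heat of vaporization (2.45 MJ/kg, a commonly used value near 20 °C) is an assumption rather than a constant taken from the dataset.

```python
import math

# Assumption: latent heat of vaporization of water near 20 degC, in J/kg.
LAMBDA_V = 2.45e6
SECONDS_PER_DAY = 86400


def le_to_et_mm_per_day(le_w_m2: float) -> float:
    """Convert a daily mean latent heat flux (W/m^2) to ET (mm/day).

    W/m^2 * s/day gives J/m^2/day; dividing by J/kg gives kg/m^2/day.
    One kilogram of water spread over one square metre is a 1 mm layer,
    so kg/m^2/day is numerically equal to mm/day.
    """
    return le_w_m2 * SECONDS_PER_DAY / LAMBDA_V


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Under the assumed constant, 1 W/m² corresponds to about 0.035 mm/day, so the reported daily median RMSE of 16.58 W/m² would be roughly 0.58 mm/day. The same `rmse` helper can be applied to observed and gap-filled LE series at a site to reproduce this kind of metric.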