Synthesis of Global Actual Evapotranspiration from 1982 to 2019

Abstract. As a linkage among water, energy, and carbon cycles, global actual evapotranspiration (ET) plays an essential role in agriculture, water resource management, and climate change. Although it is difficult to estimate ET over a large scale and for a long time, several global ET datasets are available, each with uncertainty arising from the assumptions behind its algorithms, parameters, and inputs. In this study, we propose a long-term synthesized ET product at a kilometer spatial resolution and monthly temporal resolution from 1982 to 2019. Through a site-pixel evaluation of 12 global ET products over different time periods, land surface types, and conditions, the high-performing products were selected for synthesis of the new dataset using high-quality flux eddy covariance (EC) data covering the entire globe. According to the study results, the Penman-Monteith Leuning (PML), operational Simplified Surface Energy Balance (SSEBop), Moderate Resolution Imaging Spectroradiometer (MODIS, MOD16A2105), and Numerical Terradynamic Simulation Group (NTSG) ET products were chosen to create the synthesized ET set. The proposed product agreed well with flux EC ET over most of the comparison levels, with a maximum ME (RME) of 13.94 mm (17.13%) and a maximum RMSE (RRMSE) of 38.61 mm (47.45%). Furthermore, the product performed better than local ET products over China, the United States, and the African continent and provided ET estimates across all land cover classes.

Although flux EC ET is commonly flawed, particularly concerning energy balance closure at some sites (Foken, 2008; Helgason and Pomeroy, 2012), relatively short periods, and sparse spatial coverage, it is the most direct method for measuring the exchange between the surface and the atmosphere in different ecosystems (Foken et al., 2012; Baldocchi, 2014). Thus, several studies have validated ET products at the site-pixel level against flux EC ET as the observed reference, either globally (Leuning et al., 2008; Zhang et al., 2010; Ershadi et al., 2014; Michel et al., 2016) or in specific regions: Asia (Kim et al., 2012); South Africa (Majozi et al., 2017); Europe (Ghilain et al., 2011; Hu et al., 2015); North America (Jiménez et al., 2009; Mu et al., 2011); Europe and the United States (Miralles et al., 2011b); the United States (Vinukollu et al., 2011b; Velpuri et al., 2013; Xu et al., 2019); and China (Jia et al., 2012; Liu et al., 2013; Chen et al., 2014b; Tang et al., 2015; Yang et al., 2017; Li et al., 2018). Few previous studies have focused on merging ET products to create an ensemble ET product (Vinukollu et al., 2011a; Mueller et al., 2013; Badgley et al., 2015). These studies used all available ET products and created merged products with a low spatial resolution. Vinukollu et al. (2011a) generated an ensemble of six global ET datasets at a daily time scale and 0.5°×0.5° (≈55 km) spatial resolution for the period 1984-2007 using two surface radiation budget products and three models (i.e., surface energy balance, revised Penman-Monteith, and modified Priestley-Taylor). They reported that the ensemble simple mean value was reasonable; however, it was generally highly biased globally. Mueller et al. (2013) presented two monthly global ET products that differed in their input ET members and temporal coverage. The first dataset consisted of 40 datasets for the period 1989-1995, while the second dataset merged 14 datasets from 1989 to 2005. Their ET was derived from satellite and/or in situ observations (diagnostic), calculated via land surface models (LSMs) driven with observation-based forcing, or output from atmospheric reanalysis. Hence, they provided four merged synthesis products, one including all datasets and three including the datasets of each category (i.e., diagnostic, LSM, and reanalysis). They introduced the first benchmark products for global ET and found that their multi-annual variations showed realistic responses and were consistent with previous findings. Badgley et al. (2015) also produced an ensemble of global ET estimates.
From the aforementioned studies, we can report three findings: (1) no single ET product performed better than all others over different land surface types and conditions, (2) no study generated a single dataset for users, and (3) the created ensemble ET products relied on several individual ET products and were not based on the products with the best performance.
This work adds to the growing scientific literature by using a high-quality dataset from global flux towers to validate and inter-compare different global ET products and to understand their behavior within defined land cover types, elevation levels, and climatic classes. Moreover, we attempt to build an ensemble ET product that has a minimum level of uncertainty over as many conditions as possible. The study has two objectives: (1) to assess global ET products with in situ data derived from global flux towers across a variety of land surface types and conditions to gain a better understanding of the disparities among datasets and (2) to synthesize an ensemble global ET product with minimum uncertainties over more land surface types and climate systems, at monthly, annual, and interannual time steps, and for a longer period.

Evapotranspiration
Twelve global ET datasets were explored in the current study (Table 1 and Appendix A).

Flux EC data
Comprehensive flux EC ET data from 645 sites (Fig. 1 and Table 3) belonging to AmeriFlux, FluxNET, EuroFlux, AsiaFlux, and ChinaFlux were collected and processed to examine the performance of the different estimated ET products. The downloaded EC data are half-hourly text-type data, and the flux EC ET records range in length from 1 year (12 months) to 21 years (252 months) within the period from 1994 to 2019. A gap-filling technique was applied to the downloaded in situ EC data (Reichstein et al., 2005). The EC flux sites are spatially distributed over heterogeneous underlying surfaces, corresponding to different land cover types according to the International Geosphere-Biosphere Programme (IGBP) classification system, which is recorded in the attribute data of each flux site. The in situ measured ET (mm day⁻¹) can be obtained from the half-hourly average latent heat flux (LE, W m⁻²) through Eq. (1) (Su, 2002), where $\overline{LE}$ (W m⁻²) is the daily average of the half-hourly average latent heat flux and λ is the latent heat of evaporation. λ varies with air temperature in hydrologic or agricultural system modeling but only to a small extent (Walter et al., 2001), and its value acts directly on the accuracy of the in situ measured ET. Considering that changes in air temperature have very limited impacts on the in situ measured ET (Henderson-Sellers, 1984; Li et al., 2018), λ is fixed at the constant value of 2.45 MJ kg⁻¹ in the calculation above (Walter et al., 2001).
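The conversion from latent heat flux to a water depth can be illustrated with a minimal sketch. It is not a reproduction of Eq. (1), which did not survive in this text; it is the standard unit conversion, assuming that LE is an energy flux in W m⁻² (J s⁻¹ m⁻²), that 1 kg m⁻² of water equals a depth of 1 mm, and that λ is held at 2.45 MJ kg⁻¹. The function and constant names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
LAMBDA_MJ_PER_KG = 2.45  # constant latent heat of evaporation quoted in the text


def le_to_et_mm_per_day(le_half_hourly_w_m2):
    """Convert half-hourly latent heat flux values (W m-2) to daily ET (mm day-1).

    Assumption: 1 kg of water per square metre corresponds to a depth of 1 mm.
    """
    values = list(le_half_hourly_w_m2)
    if not values:
        raise ValueError("no latent heat flux values supplied")
    le_mean = sum(values) / len(values)             # daily mean LE, J s-1 m-2
    energy_mj = le_mean * SECONDS_PER_DAY / 1e6     # MJ m-2 day-1
    return energy_mj / LAMBDA_MJ_PER_KG             # kg m-2 day-1, i.e. mm day-1


# A constant flux of 100 W m-2 over 48 half-hours is roughly 3.5 mm day-1.
print(round(le_to_et_mm_per_day([100.0] * 48), 2))
```

With a fixed λ, the conversion reduces to multiplying the daily mean LE by about 0.0353, which is why the choice of λ acts directly on the derived ET.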

Aridity index
The mean global aridity index dataset was produced by Zomer et al. (2008) using WorldClim global climate data. The aridity index was estimated as the mean annual precipitation divided by the mean annual potential evapotranspiration, the latter calculated with the Hargreaves equation. The spatial resolution is a 0.0083°×0.0083° (≈1 km) grid cell (Trabucco and Zomer, 2018), and the data can be downloaded from the following website: https://cgiarcsi.community/data/global-aridity-and-pet-database/
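As a small illustration of the ratio described above, the sketch below computes an aridity index and maps it to a climate class. The class thresholds are an assumption: they follow the widely used UNEP breakpoints (0.2, 0.5, 0.65), with hyper-arid merged into arid to give four classes, and may differ from the values in Table 4, which are not reproduced in this text. All names are hypothetical.

```python
def aridity_index(mean_annual_precip_mm, mean_annual_pet_mm):
    """Aridity index = mean annual precipitation / mean annual potential ET."""
    if mean_annual_pet_mm <= 0:
        raise ValueError("potential evapotranspiration must be positive")
    return mean_annual_precip_mm / mean_annual_pet_mm


def climate_class(ai):
    """Map an aridity index to one of four classes.

    Assumption: UNEP-style thresholds; Table 4 of the study may use others.
    """
    if ai < 0.2:
        return "arid"          # includes hyper-arid in this sketch
    if ai < 0.5:
        return "semi-arid"
    if ai < 0.65:
        return "dry sub-humid"
    return "humid"


ai = aridity_index(450.0, 1500.0)   # 0.3
print(ai, climate_class(ai))
```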

Elevation data
The Shuttle Radar Topography Mission (SRTM) data were provided at a resolution of one arc-second and void-filled (Farr et al., 2007). For the geographic areas outside the SRTM coverage area, the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010), which have a resolution of 7.5 arc-seconds, were used (Danielson and Gesch, 2011).
The magnitude of ME (the absolute value) is used as a bias indicator (Mu et al., 2011; Yang et al., 2017), while its sign indicates whether the ET products overestimate or underestimate the flux EC ET values. The accuracy of each ET product can be described by the RMSE (Miralles et al., 2011b; Hu et al., 2015). Moreover, the relative values of ME and RMSE are used for a fairer comparison between ET products among different regions and periods (Majozi et al., 2017). In addition, correlation coefficients (R values) are used to measure the strength of the relation between flux EC ET and the ET products (Ghilain et al., 2011; Hu et al., 2015), and with the aid of the Taylor score (TS), the overall performance of each product can be described well (Taylor, 2001; Mu et al., 2011).
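The six metrics can be sketched as follows. The exact formulas are not given in this text, so the definitions below are assumptions: ME is the mean of (estimated minus observed), the relative metrics are expressed as a percentage of the mean observed ET, and TS follows Eq. (4) of Taylor (2001) with the maximum attainable correlation R0 set to 1. The function name is hypothetical.

```python
import math


def validation_metrics(estimated, observed):
    """Return ME, RME, RMSE, RRMSE, R and TS for paired ET series (mm)."""
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    me = mean_e - mean_o                      # sign: + overestimate, - underestimate
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated) / n)
    if sd_o == 0 or sd_e == 0:
        raise ValueError("series must not be constant")
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed)) / n
    r = cov / (sd_e * sd_o)
    sigma = sd_e / sd_o                       # normalized standard deviation
    r0 = 1.0                                  # assumption: maximum attainable R
    ts = 4 * (1 + r) / ((sigma + 1 / sigma) ** 2 * (1 + r0))
    return {
        "ME": me,
        "RME": 100 * me / mean_o,             # assumption: relative to mean observed
        "RMSE": rmse,
        "RRMSE": 100 * rmse / mean_o,
        "R": r,
        "TS": ts,
    }


print(validation_metrics([12.0, 22.0, 32.0], [10.0, 20.0, 30.0]))
```

Under these definitions a product with a constant offset keeps R and TS at 1 while ME and RMSE reveal the bias, which is why the study ranks products on all six metrics rather than on correlation alone.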
To rank the ET products, lower ME, RME, RMSE, and RRMSE values and higher R and TS values are desired, i.e., lower biases and higher accuracies. Each level matrix has 6 validation metrics in rows and 26 comparison levels in columns, the last of which are the elevation levels (24-26); thus, the total number of cells is 156 for each level. Each cell in the matrix holds the one of the twelve ET products that belongs to this level. Then, to select ET data for further synthesis, the number and percentage of occurrences of each ET product in the matrices (Fig. 2b) of levels one and two were calculated (Fig. 2c). ET products were ranked in descending order based on the occurrence percentage over levels one and two (the last column in Fig. 2c). Finally, the first two or three highly ranked ET products were selected for incorporation into the ensemble ET. For that, the selected ET products were resampled to a comparable spatial resolution if needed, and their average was used as the synthesized ET value.
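The selection procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: error metrics are ranked by absolute value, ties keep input order, and the product names and toy values are hypothetical.

```python
from collections import Counter

HIGHER_IS_BETTER = {"R", "TS"}


def top_two(scores, metric):
    """Return the level-one and level-two products for one matrix cell.

    scores maps product name -> metric value at one comparison level.
    """
    if metric in HIGHER_IS_BETTER:
        ranked = sorted(scores.items(), key=lambda item: -item[1])
    else:
        # Assumption: ME, RME, RMSE and RRMSE are ranked by magnitude.
        ranked = sorted(scores.items(), key=lambda item: abs(item[1]))
    return ranked[0][0], ranked[1][0]


def occurrence(matrix):
    """Count how often each product fills a level-one or level-two cell.

    matrix maps (metric, comparison_level) -> {product: value}.
    """
    level_one, level_two = Counter(), Counter()
    for (metric, _level), scores in matrix.items():
        first, second = top_two(scores, metric)
        level_one[first] += 1
        level_two[second] += 1
    return level_one, level_two, level_one + level_two


def ensemble_mean(*products):
    """Cell-wise simple mean of equally long ET sequences."""
    return [sum(values) / len(values) for values in zip(*products)]


toy = {
    ("R", "01"): {"A": 0.8, "B": 0.7, "C": 0.6},
    ("ME", "01"): {"A": -1.0, "B": 5.0, "C": 0.5},
}
one, two, total = occurrence(toy)
print(total.most_common())           # product A occurs most often overall
print(ensemble_mean([1.0, 3.0], [3.0, 5.0]))
```

In the study the matrix has 6 × 26 = 156 cells per level, so the totals are expressed out of 312 cells.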

Assessment of existing global ET datasets
Figure 3 shows that seasonality exists and is captured well by all ET datasets, with some exceptions over barren land, permanent snow and ice, and arid areas (not shown). Except for MOD16A2 in February, which has a TS value above 0.60, the TS, as with the R-value, declines from January, reaches its minimum in May, and then increases again starting in August. Figures 4 and 5 show that these products capture intra-annual ET variations but vary in their performance according to the selected validation metrics, which also vary among the months (from January to December). Over croplands, grasslands, and forests, PML is the best product for R (TS) and RMSE (RRMSE). Additionally, it has the highest TS over water bodies. SSEBop, GLEAM33a, SEBS, NTSG, and GLDAS20 obtained the desired ME (RME) over croplands, grasslands, forests, water bodies, and barren land and permanent snow and ice, respectively.
GLEAM33a also represents the highest R (TS) with the lowest RRMSE, while GLDAS20 has the smallest RMSE over barren land and permanent snow and ice. In addition, GLDAS20 has the lowest RMSE, while SSEBop has the highest R and lowest RRMSE over water bodies; see the figure of ET products against flux EC ET aggregated for all sites for each land cover type (croplands: (1); grasslands: (2); forests: (3); water bodies: (4)).

Validation by climate classes
Table 6 shows that PML, GLDAS20, and SEBS are the three best-performing ET products according to their occurrence in level one; GLDAS20, PML, and MOD16A2105 according to their occurrence in level two; and PML, GLDAS20, and SSEBop according to the total occurrence in levels one and two. For example, PML yielded the best validation metrics (the lowest ME, RME, RMSE, and RRMSE as well as the highest R and TS) over 83 (53%) and 24 (15%) cells in levels one and two, respectively; thus, the total count was 107 (34%) cells. Accordingly, the three best-performing ET products over most conditions are PML, followed by GLDAS20 (level one: 10 (6%); level two: 37 (24%); total: 47 (15%)) and SSEBop (level one: 12 (8%); level two: 15 (10%); total: 27 (9%)).
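The percentages quoted above follow from the matrix size: each level has 6 × 26 = 156 cells, so a total over both levels is taken out of 312. A short check, using only the level-one and level-two counts from the text (the function name is hypothetical):

```python
CELLS_PER_LEVEL = 6 * 26   # 6 validation metrics x 26 comparison levels


def occurrence_share(level_one, level_two):
    """Return counts and rounded percentages for one ET product."""
    total = level_one + level_two
    return {
        "level_one_pct": round(100 * level_one / CELLS_PER_LEVEL),
        "level_two_pct": round(100 * level_two / CELLS_PER_LEVEL),
        "total": total,
        "total_pct": round(100 * total / (2 * CELLS_PER_LEVEL)),
    }


for name, counts in {"PML": (83, 24), "GLDAS20": (10, 37), "SSEBop": (12, 15)}.items():
    print(name, occurrence_share(*counts))
```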
Since the three best-performing ET products differ in their spatial resolution and algorithms, we introduced an ensemble mean product at a 1000 m × 1000 m spatial resolution that spans from 2003 to 2017 (15 years) and relies on remotely sensed models (PML and SSEBop). It should be noted that although SEBS has one point more than SSEBop in level one, it has 7 fewer points than SSEBop in level two (5%). In addition, SSEBop has a higher spatial resolution than that of SEBS. In the same manner, SSEBop and MOD16A2105 have the same performance in terms of total count (27 (9%)), but SSEBop is higher by 5 points in level one.
As Table 7 shows, the ensemble ET product cannot perform best across all conditions: it had a total count of 50%, followed by PML (44%). Comparing the ensemble mean in Table 7 with PML in Table 6, the total count increased from 34% to 50% (+16 percentage points), indicating that the ensemble mean, which was created from PML and SSEBop, improved on the PML performance across all conditions, while PML itself still performs best in 44% of the cells.
To introduce an ensemble product before 2003, PML and SSEBop were first excluded, and the same steps were repeated. Table 8 shows that the best-performing products are GLDAS20, MOD16A2105, and NTSG in terms of the total count. Since the last two products are based on remote sensing, they were selected to create the ensemble product before 2003 at a 1000 m × 1000 m spatial resolution. Although GLDAS20 agreed well over 42% and had the lowest maximum ME among all datasets (9.73 mm), NTSG was selected to provide the ET estimates before 2001 because it had a higher spatial resolution, so it could capture more spatial details than GLDAS20.
Table 9 shows that the ensemble ET for 2001 and 2002 performed better than the original ET products, with values of 62%, 38%, and 50% for level one, level two, and the total, respectively. For the period before 2001, NTSG can be used from 1982 to 2000, or GLDAS20 can be used instead. Hence, a remote-sensing-based long-term ensemble ET can be synthesized from PML and SSEBop between 2003 and 2017 and from MOD16A2105 and NTSG between 2001 and 2002. SSEBop can be used from 2018 onward, while NTSG can be used before 2001.
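The resulting period rule can be written as a small lookup. This is a sketch of the rule as stated in the text (NTSG for 1982-2000, the mean of MOD16A2105 and NTSG for 2001-2002, the mean of PML and SSEBop for 2003-2017, and SSEBop for 2018-2019); the function names and the sample values are hypothetical.

```python
def ensemble_members(year):
    """Return the ET products that contribute to the synthesized ET in a year."""
    if not 1982 <= year <= 2019:
        raise ValueError("the synthesized ET covers 1982-2019")
    if year <= 2000:
        return ("NTSG",)
    if year <= 2002:
        return ("MOD16A2105", "NTSG")
    if year <= 2017:
        return ("PML", "SSEBop")
    return ("SSEBop",)


def synthesized_et(year, et_by_product):
    """Simple mean of the contributing products for one pixel and month (mm)."""
    members = ensemble_members(year)
    return sum(et_by_product[name] for name in members) / len(members)


print(ensemble_members(2001))
print(synthesized_et(2010, {"PML": 50.0, "SSEBop": 70.0}))   # 60.0
```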

Synthesized global ET product
Figure 13 shows that in July, except over barren land, permanent snow and ice, and arid areas (not shown), the maximum value of the synthesized ET lies between SSEBop, which yields the largest ET during all months, and PML. Hence, the long-term monthly synthesized ET performance is affected by PML and SSEBop more than by NTSG and MOD16A2105, as mentioned in Sect. 4.2.2.
Table 10 provides the average synthesized ET for each month (mm month⁻¹) and for land cover types, aridity index classes, and elevation levels (mm year⁻¹). The average annual ET from 1982 to 2019 is 567 mm year⁻¹. July has the maximum synthesized ET (Fig. 13). Table 10 also provides the average annual ET for land cover types calculated from flux sites. Across land cover types, ET is highest over croplands, followed by forests and grasslands, with average synthesized ET values of 597, 548, and 542 mm year⁻¹, respectively. Low synthesized ET values across arid areas (average = 392 mm year⁻¹) can be attributed to low vegetation cover. It should be noted that Table 10 does not represent a perfect calculation of ET over each land cover class because the flux sites were not evenly distributed among the classes; for instance, there were 35 (5%) flux sites in arid areas but 361 (56%) in humid areas.
Figure 14 shows the decadal (1982-1989, 1990-1999, 2000-2009, and 2010-2019) and long-term (1982-2019) average synthesized ET maps worldwide, except for Antarctica. Regarding the spatial distribution, the highest ET is found in Malaysia, Singapore, and Indonesia and in the northern part of South America. During the first and second decades, the synthesized ET is based on the NTSG product alone; thus, the same spatial distribution was observed.
Although PML and SSEBop are the main contributors to the synthesized ET between 2003 and 2017, there is little difference in the spatial distributions, and higher ET can be observed during 2010-2019 over the northern parts of South America.
Table 11 shows statistics of the maps provided in Fig. 14 for all continents except Antarctica. The standard deviation is highest over Africa, followed by Oceania and Asia. The mean synthesized ET is highest over South America, followed by Oceania and Africa. The maximum synthesized ET is recorded over Asia, followed by Africa and Australia. The shares of total ET are 29.1%, 21.7%, 19.9%, 16.7%, 7.9%, 4.2%, and 0.5% for Asia, South America, Africa, North America, Europe, Australia, and Oceania, respectively.

Validation of the synthesized ET
Figures 15-18 show that the synthesized ET agreed well with the observed data, where the R (TS) ranged between 0.70 (0.85) and 0.78 (0.89), except at the annual time step (Fig. 15b) and over barren land and permanent snow and ice (not shown), where R (TS) was 0.65 (0.81) and 0.68 (0.80), respectively. Based on the ME sign, the synthesized ET underestimated the observations only over water bodies. The magnitude of ME (RME) ranged between 0.54 mm (1.05%) and 6.76 mm (16.62%), while the RMSE (RRMSE) ranged from 20.95 mm (45.22%) to 30.12 mm (59.61%). According to the regression line equations, without exception, the synthesized ET overestimated the flux EC ET at lower ET values and underestimated it at higher ET values. As mentioned above, even the long-term synthesized ET cannot perform best across all comparison levels (Tables 12 and 13).
During the periods 2018-2019 and before 2001, the synthesized ET performance came from the original SSEBop and NTSG datasets, respectively. The ensemble mean has a total count of 50% over the periods 2003-2017 and 2001-2002 compared to the original datasets, indicating that it can perform better than the other ET products over half of all comparison levels (see Tables 7 and 9).

Table 10. The average decadal synthesized ET by month (mm month⁻¹) and by land cover types, aridity index classes, and elevation levels (mm year⁻¹).

Discussion
Since global land ET plays a paramount role in the hydrological cycle, its accurate estimation is essential for further studies. Although many global ET products have been derived from remote sensing models, land surface models, and hydrological models, they differ in their algorithms, parameterization, and temporal span, and none of them covers a long period at a reasonable spatial resolution with low uncertainty. In this study, we combined the best-performing, currently available global ET products at a reasonable spatial resolution (kilometer) into one consistent global ET dataset covering a long temporal period. Users can rely on this dataset without consulting other datasets or performing additional assessments.
We used a high-quality dataset of global flux towers for a site-pixel-level validation of global ET products (Leuning et al., 2008; Zhang et al., 2010; Ershadi et al., 2014; Michel et al., 2016) to assess them and select the best products for creating a synthesized ET covering a long temporal period. For that, a matrix of 6 validation criteria and 26 comparison levels was created, and then levels one and two of the validation metrics were used to select the best-performing products. Finally, the synthesized ET was created as the simple mean of the products that performed best over the different periods.
Among all global ET products investigated in this study, the products that performed best are PML, GLDAS20, SSEBop, MOD16A2105, GLDAS21, SEBS, and NTSG (Table 6). From the perspective of all comparison levels, the performance of these products varied, and no single product performed well across all land surface types and conditions (Vinukollu et al., 2011a; Li et al., 2018). PML is the ET product with the highest agreement, with lower ME (RME) and RMSE (RRMSE) values, followed by the synthesized ET (Tables 12 and 13); however, it should be noted that PML estimates span a 15-yr period, while the synthesized ET presents longer estimates from 1982 to 2019 (38 years).
The main advantage of the new dataset is that, for the first time, a synthesized remotely sensed ET product with a reasonable spatial resolution and lower long-term uncertainties has been provided, where the maximum absolute ME (RME) and RMSE (RRMSE) values are 13.94 mm (17.13%) and 38.61 mm (47.45%), respectively. Furthermore, it agreed well (R > 0.70) in 62% of all comparison levels (Table 14). This dataset can provide ensemble ET estimates for all land cover types; for example, MOD16A2105 does not provide ET estimates over water bodies and desert areas, whereas other products do. Moreover, a comparison of the synthesized ET with the CR, SSEBop, and FAO WaPOR ET products over China, the United States, and the African continent showed that the synthesized ET outperformed these products in terms of higher agreement, higher accuracy, and lower biases. Hence, the synthesized ET can play an essential role, especially for regional and global scale studies, over a long time (1982-2019).
Because the synthesized ET during the first and second decades and the year 2000 is based on NTSG resampled to a 1 km spatial resolution to be comparable with the other products, future improvements may focus on the statistical downscaling of NTSG during this period. Moreover, since different datasets were selected for different periods due to data availability, future improvements may also focus on adjusting the ensemble means, particularly for long-term pixel-based studies.

Data availability
All data used in this study are freely available; see Sect. 2 and Appendix A. The synthesized ET is available at https://doi.org/10.7910/DVN/ZGOUED (Elnashar et al., 2020) and as a GEE application at the following link: https://elnashar.users.earthengine.app/view/synthesizedet. In addition, it can be accessed in the GEE JavaScript editor (the updated link is embedded in the GEE application interface). Through this application, the user can query, display, and download the synthesized ET. It should be noted that the SSEBop and NTSG datasets are not available in Earth Engine, so they were uploaded as assets in GEE for this purpose.

Conclusion
In the current study, a site-pixel-level validation was conducted for global ET products across a variety of land surface types and conditions to select the best-performing ET products and then produce a global long-term synthesized ET dataset. To apply a comprehensive evaluation from different perspectives, land cover types, climate, and elevation were classified into five, four, and three classes, respectively. According to six comprehensive validation criteria, the evaluated ET products were ranked based on the lowest error metrics and the highest accuracy and consistency over the different classification levels to choose the ensemble members for the different periods.
The average annual ET from 1982 to 2019 is 567 mm year⁻¹. Although no product performed best in terms of all selected validation criteria at all classification levels, the products rank, in descending order of performance, as PML, GLDAS20, SSEBop, MOD16A2105, GLDAS21, SEBS, and NTSG.

TerraClimate ET
TerraClimate ET is estimated based on a monthly one-dimensional soil water balance for global terrestrial surfaces, which incorporates evapotranspiration, precipitation, temperature, and interpolated plant extractable soil water capacity. The water balance model is very simple and does not account for heterogeneity in vegetation types or their physiological responses to changing environmental conditions (Abatzoglou et al., 2018). TerraClimate estimates are provided at a monthly temporal resolution from 1958 to 2018 and on 0.041°×0.041° (≈5 km) grid cells.
Badgley et al. (2015) used a Priestley-Taylor Jet Propulsion Lab (PT-JPL) model with 19 different combinations of forcing data to produce global ET estimates from 1984 to 2006 at a 1°×1° (≈100 km) spatial resolution. The ensemble ET members changed according to the number of products available each year, which ranged between 4 and 12 members for 1999/2000 and 2001/2002, respectively. Their study focused on the uncertainty in global ET estimates resulting from each class of input forcing datasets.

Figure 1. Spatial distribution of 645 in situ flux EC sites across the world.

Figure 5. Monthly validation metrics (ME (mm): (a); RME (%): (b); RMSE (mm): (c); RRMSE (%): (d); R: (e); TS: (f)) of ET products against flux EC ET for all sites (legend as Figure 3k).

4.1.3 Validation by all sites' annual ET
Figure 6 shows that all ET products overestimate the observed ET, with two exceptions: SEBS and MOD16A2. In all environmental conditions, PML has the highest R (TS) and the lowest ME (RME) and RMSE (RRMSE). Figures 4 and 6 indicate that the error metrics at the annual scale are consistent with those at the monthly time step. The lowest and highest absolute values of ME (RME) for monthly ET exist in MOD16A2105 (SEBS) and FLDAS, respectively, while those for annual ET exist in PML and FLDAS, respectively. Furthermore, PML yields the largest R and TS values for monthly and annual ET, but the minimum values of R and TS were registered with TerraClimate and MOD16A2 for monthly and annual ET, respectively. This result may be attributed to the aggregation of monthly ET into annual values.

4.2.2 Contribution of ET datasets to the synthesized ET
The synthesized ET dataset was created at a 1000 m × 1000 m spatial resolution from 1982 to 2019 based on remotely sensed ET products. PML, SSEBop, MOD16A2105, and NTSG were combined to create the new dataset. Since SSEBop and MOD16A2105 have a 1000 m × 1000 m spatial resolution, PML was upscaled by pixel averaging and NTSG was downscaled by nearest neighbor resampling in GEE. The synthesized ET comes entirely from SSEBop for the years 2018 and 2019 and from NTSG from 1982 to 2000, while for the years 2001 and 2002 it is the simple mean of MOD16A2105 and NTSG. Finally, between 2003 and 2017, the value is the simple mean of PML and SSEBop. Since the synthesized ET performance was governed by the ET product(s) used in each year from 1994 to 2019 (25 years), for which EC flux data were available, most of the performance comes from PML and SSEBop for the 15 years from 2003 to 2017 (60%), from MOD16A2105 and NTSG for 2 years (2001 and 2002; 8%), from SSEBop alone for the years 2018 and 2019 (8%), and from NTSG for 7 years (24%) from 1994 to 2000.
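The two resampling directions mentioned above can be illustrated on small grids. This is a plain-Python sketch of block averaging and nearest neighbor replication, not the GEE operations used in the study; the integer scale factors and function names are hypothetical, and real grids would also need alignment and no-data handling.

```python
def upscale_by_mean(grid, factor):
    """Aggregate a fine grid to a coarser one by averaging factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid size must be a multiple of the factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def downscale_nearest(grid, factor):
    """Nearest neighbor: repeat every coarse cell factor x factor times."""
    fine = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


print(upscale_by_mean([[1, 3], [5, 7]], 2))      # one coarse cell, mean of four
print(downscale_nearest([[1, 2]], 2))            # each cell repeated 2 x 2
```

Nearest neighbor replication adds no spatial detail to the coarse product, which is consistent with the later suggestion that NTSG could be statistically downscaled instead.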

Figure 14. Decadal and long-term synthesized ET; the last panel shows the continental divisions used to create Table 11, accompanied by the percentage of ET over each continent for the period 1982-2019, except Antarctica. Use the following link to the GEE application to preview these maps: https://elnashar.users.earthengine.app/view/synthesizedet/

Figure 15. Monthly (a) and annual (b) synthesized ET against flux EC ET aggregated for all sites.

Figure 19 presents a monthly comparison between the synthesized ET and the country-based ET products over China and the United States as well as over the African continent. In general, the synthesized ET returned higher agreement (R and TS) and accuracy (RMSE) with the flux EC ET than did the other ET products (CR, SSEBop, and FAO WaPOR). Moreover, it has lower biases over the United States and the African continent.

Figure 19. Monthly comparison between the synthesized ET (a, c, and e) and the CR (b), SSEBop (d), and FAO WaPOR (f) ET products against flux EC ET aggregated for all sites over China (a and b), the USA (c and d), and the African continent (e and f).

Of the twelve global ET datasets (Table 1 and Appendix A), five used the Moderate Resolution Imaging Spectroradiometer (MODIS) as input: two versions (V6 and V105) of the Global Evapotranspiration Project (MOD16A2), Penman-Monteith Leuning ET (PML), the operational Simplified Surface Energy Balance ET (SSEBop), and the Surface Energy Balance System (SEBS). One dataset, from the Numerical Terradynamic Simulation Group (NTSG), used the Advanced Very High-Resolution Radiometer (AVHRR) as input. The remainder mainly use meteorological datasets as direct input, including field measurements such as TerraClimate and reanalysis datasets such as FLDAS and GLDAS. The algorithm used in the 12 global ET datasets is mainly the Penman-Monteith model, except for FLDAS and GLDAS, which use LSMs, and TerraClimate, which uses a soil water balance model. Priestley-Taylor is used by NTSG to estimate evaporation from open water, while Penman evapotranspiration is used in PML for water body, snow, and ice evaporation. SSEBop, SEBS, NTSG, and GLEAM are individually managed, while the other ET products, as well as the elevation data, are available from GEE.

Table 1. Global ET products.

Table 2. Regional ET products.

Table 3. Summary of 645 in situ EC flux sites.

Table 4. Climate classification according to the global aridity index values.

There are 6 validation metrics: R, TS, ME, RME, RMSE, and RRMSE. The values of the 6 metrics are categorized into levels. Level one holds, for each metric, the best value (the highest R and TS and the lowest ME, RME, RMSE, and RRMSE), while level two holds the second-best value. To that end, R and TS were sorted in descending order and ME, RME, RMSE, and RRMSE in ascending order (Fig. 2a); the ET product corresponding to each validation metric was then saved in a new table used to fill in Fig. 2b.
The current study proposes three steps to develop a synthesized global ET dataset. First, the ET datasets are compared based on the 6 validation metrics to generate a matrix that indicates levels one and two of the validation metrics of all ET products over all comparison levels (Fig. 2b). For each level, there are 6 validation metrics in rows and 26 ET values of different periods and underlying conditions in columns (comparison levels), including the monthly average (01), annual average (02), individual months (January-December: 03-14), land cover types (15-19), climate classes (20-23), and elevation levels (24-26).

Validation by all sites' monthly ET
The maximum ET occurs during July and differs according to each ET dataset. Generally, MOD16A2 represents the minimum estimated ET across all conditions, while SSEBop represents the maximum ET across all conditions except over humid regions and at elevations between 500 …

Figure 4 shows that only SEBS and MOD16A2 underestimate flux EC ET. PML is the dataset that best agrees with the observed ET, and it has the lowest RMSE (RRMSE). MOD16A2105 returned the smallest absolute ME, while SEBS yielded the smallest RME. Figure 5 shows that there are interannual differences in the performance of the ET products. MOD16A2 shows negative MEs and RMEs for all months, with larger biases during March, April, and May, while FLDAS shows positive MEs and RMEs for all months, with larger biases during March, April, May, June, and July. For other products, the ME and RME signs vary among months; for instance, the ME and RME values of GLDAS21 are negative (underestimated) during February, September, and November and positive (overestimated) in the remaining months, with larger biases during March, April, May, June, and July.

The RMSE declines from January to February, then increases until July, and declines again until November. The minimum RMSE values occur during February, November, and December, while the maximum values occur during June, July, and August. For instance, the RMSE in July ranges from 36.28 mm to 52.41 mm for FLDAS and PML, respectively, while it ranges from 17.08 mm to 21.68 mm for PML and SEBS, respectively. The RRMSE declines from January, reaches its minimum in June, and then increases again until December, except for SEBS in December. The highest values of RRMSE (>80%) occur in January, February, November, and December, except for SEBS in December, while the lowest values (<60%) occur in June, July, and August. The R-value declines from January and reaches its minimum in May; it then increases starting in August. Except for MOD16A2, all products have an R-value greater than 0.60 during January, February, November, and December. SEBS has the lowest R-value during March, April, May, and June, while PML yields the highest R-value during all months except January and December.

Second, in dry sub-humid areas, SSEBop (Fig. 9c3) and GLDAS21 (Fig. 9e3) overestimate under both lower and higher ET values. Applying the rule of the highest R (TS) and lowest error metrics, MOD16A2 does not play any role; additionally, TerraClimate and GLEAM33b each contribute only once, with the lowest RRMSE in February and the highest TS in March, respectively.

Table 5. Levels one and two validation metrics of the 12 ET products for monthly (01), annually (02), interannual (January- …

Table 6. The occurrence of the 12 ET products based on Table 5.

Table 7. The occurrence of PML and SSEBop products and their ensemble mean between 2003 and 2017.

Table 8. The occurrence of all ET products except PML and SSEBop products.

Table 9. The occurrence of NTSG and MOD16A2105 products and their ensemble mean during 2001 and 2002.

Table 11. Statistics of the decadal and long-term synthesized ET (mm).

Table 13. Same as Table 6 but MOD16A2 replaced by the synthesized ET and based on Table 12.

Table 14. Percentage of R more than 0.70 and the maximum absolute values of ME (mm), RME (%), RMSE (mm), and RRMSE (%) across all comparison levels (01-26) of the high-performing ET products and the synthesized ET.

The synthesized ET used SSEBop ET for the years 2018 and 2019 and NTSG from 1982 to 2000 because NTSG is the only remotely sensed global ET product available for that period and has a good spatial resolution compared to GLDAS20. It is the simple mean of MOD16A2105 and NTSG for the years 2001 and 2002 and the simple mean of PML and SSEBop between 2003 and 2017 (see Tables 7 and 9).

The ET synthesized from PML, SSEBop, MOD16A2105, and NTSG agreed with the flux EC ET with R-values higher than 0.70 over 62% of all comparison levels, with a maximum ME (RME) of 13.94 mm (17.13%) and a maximum RMSE (RRMSE) of 38.61 mm (47.45%); it is a remote-sensing-based ET product spanning 1982 to 2019 with the highest agreement and accuracy and lower biases over most land surface types and conditions. It performs well when compared with country-based and continental ET products over China, the United States, and the African continent. However, the further synthesis of local ET products is encouraged if regional ET products are available. The results from this study provide a better understanding of the high-performing ET products in each land cover type, elevation level, and climate region as well as at monthly, annual, and interannual time steps. Hence, this …

… Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), a quasi-global rainfall dataset designed for seasonal drought monitoring and trend analysis (McNally et al., 2017). FLDAS is provided at a 0.1°×0.1° (≈10 km) spatial resolution and monthly temporal resolution during the period 1982-2019.