the Creative Commons Attribution 4.0 License.
A Globally Seamless Terrestrial Evapotranspiration Dataset Retrieved by a Nonparametric Approach with Remote Sensing and Reanalysis Datasets
Abstract. Evapotranspiration (ET) serves as a key indicator of the water exchange between the Earth’s surface and atmosphere, significantly influencing the hydrological cycle, surface energy cycle, and carbon cycle. Existing remote sensing models for estimating ET usually require resistance parameterizations. In this study, we proposed the Remote Sensed Non-Parametric (RSNP) model, which leverages the nonparametric (NP) and Surface Flux Equilibrium-nonparametric (SFE-NP) approaches and adopts remote sensing and reanalysis datasets of meteorological and surface parameters as model inputs. We estimate global monthly ET from 2001 to 2019 at a spatial resolution of 0.1° with the RSNP model. Validation against FLUXNET sites globally yields a Root Mean Square Error (RMSE) of 23 mm/month (278 mm/yr), while regional-scale validation against water-balance ET yields an RMSE of 113 mm/yr. In addition, the produced ET dataset shows high accuracy over forest surfaces and captures spatial details of land surface ET. Furthermore, compared with ETMonitor, PEW and PML_V2, our dataset is continuous and seamless, making it suitable for global research. This study contributes to the advancement of global ET estimation and informs future water balance studies. The dataset presented in this article has been published in the National Tibetan Plateau Data Center at https://doi.org/10.11888/Terre.tpdc.301343 (Pan, 2024).
Status: open (until 17 Apr 2025)
RC1: 'Comment on essd-2024-495', Anonymous Referee #1, 28 Jan 2025
General Comment
This paper describes a “globally seamless ET dataset…with remote sensing and reanalysis data”. The data is openly available at the National Tibetan Plateau Data Center for the period 2001-2019.
I believe that the paper requires a severe revision of its content, as it lacks most of the details behind the methodology adopted, details on the characteristics of the final products are difficult to find (is it monthly or daily? Is the model applied directly on monthly data or aggregated afterward?), and several key details on the validation are difficult to follow. In addition, numerous typos and unclear sentences can be found throughout the text (see examples in the specific comments below).
However, the most notable drawback of the dataset resides in its inception. The study claims that this is a “remote sensing based” dataset that “overcome the need for pre-defined parameters”. Regarding the first point, I found it really difficult to see this as a remote sensing product, as the vast majority of the inputs came from ERA5-Land. The remote sensing contribution is limited to emissivity and albedo only. This would not have been a major issue (besides the need to reword some of the model descriptions), but it highlights the second major problem of this dataset. The method uses skin LST from ERA5-Land. These data are not observed from satellite; they are modelled within the land surface component of the reanalysis system. This means that the skin temperature depends on the parameterization used in ERA5-Land, the same parameterization that you are claiming to avoid. Following this consideration, the relationships used (Eqs. 1-1 and 1-2) act only as a simplified version of the PM approach, where the skin temperature is derived from the more complex (and heavily parameterized) H-TESSEL.
Overall, the dataset may still have some useful applications related to multi-model assessment, but three key points need to be addressed: 1) a much better contextualization of the modelling framework and scope in view of the above-mentioned issue; 2) a much better description of the methodology, including differences from already existing approaches; and 3) an improved (especially in consistency) evaluation of the dataset against other similar products (e.g., ET from ERA5-Land itself).
Specific comments
Title: A seamless global…
L17. Water exchange
L18-19. Hydrological, surface energy, and carbon cycles.
L20. Resistances. This is difficult to follow out of context into an abstract. Please reword.
L27. Explain acronyms.
L40. To conduct.
L41. Conducting. Repetition
L44. Sequentially?
L50-53. Add references to these datasets.
L54. Use consistent units for pixel size.
L58. Datasets available, they often…
L62. Metrology?
L62-63. In this sentence, it is not clear which problem (or problems) this dataset is trying to solve.
L66. A lot of repetitions (non-parametric) and unclarified terms (what is the role of Hamilton’s principle here).
L66-69. This sentence is unclear, please reword.
L76. I suggest introducing the methodology first, as explaining the data used without introducing what they are used for makes the text difficult to follow.
L78. As the inputs of
L79. To estimate ET…. Daily? Monthly? Not clear.
L80. At a spatial resolution
L81. Longwave radiation.
L87. Resampled… how? Especially land use, which is categorical.
Table 1. This table is not referenced in the text, as far as I can tell.
Table 1. Please clarify which input is from remote sensing and which from reanalysis. Also, please separate the model inputs from other data used for validation and analysis. The “data usage” column may not be seen by a reader, especially when things are mixed (2 retrievals, then validation, then retrieval again, …).
Table 1. Aridity index.
L93. Was the closure forced on the data? Which method?
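On the closure question, one widely used option is the Bowen-ratio-preserving correction, in which H and LE are rescaled by (Rn − G)/(H + LE) so that their sum matches the available energy while their ratio stays unchanged. The sketch below is only an illustration of that technique; the function name and the single-record form are assumptions, operational products typically compute the factor over aggregated time windows, and the manuscript may have used a different method or none.

```python
def bowen_ratio_closure(rn: float, g: float, h: float, le: float) -> tuple[float, float]:
    """Force energy-balance closure by scaling H and LE so that H + LE = Rn - G,
    keeping the Bowen ratio H/LE unchanged. All fluxes in W m-2.
    Hypothetical helper for illustration only."""
    turbulent = h + le
    if turbulent <= 0:
        raise ValueError("H + LE must be positive to apply the correction")
    # Closure correction factor: available energy over measured turbulent fluxes.
    factor = (rn - g) / turbulent
    return h * factor, le * factor


# Example with made-up tower values: 450 W m-2 available, 360 W m-2 measured.
h_corr, le_corr = bowen_ratio_closure(rn=500.0, g=50.0, h=120.0, le=240.0)
print(h_corr, le_corr)
```

Because the same factor multiplies both fluxes, the correction assigns the whole closure gap proportionally; stating whether this (or a residual-LE approach) was used matters for the FLUXNET validation statistics.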
L100. Some more details on this dataset are needed. A section (2.3) of just few lines is not acceptable.
Section 2.4. Same as before, some more details are needed. Modelling approach, main inputs, similarity in either inputs or methods, etc.
L116. Nearest neighbour method.
L116. “the differences…” at which time scale? Daily? Monthly? Again, not clear.
L122. Based on the Hamilton of microstate system,… Please reword, not clear the role of this principle on your method.
Table 2. This table is not referenced in the text.
L130. Some more details on the formulations are needed. The reader needs to understand the basis of this approach without having to read another full paper. For instance, the first term (Rn − G) is related to the available energy, and it is common to all ET approaches, but what about the other terms? What does (Ts⁴ − Ta⁴) represent? And the logarithmic term with Gs?
Eqs. (4) and (5) are not really needed, as they are basic physics. Please expand, instead, on the peculiarity of your method compared to other approaches. How do you “avoid parameters”?
L156. Why is the resampling needed? Just for a different projection? Not clear.
L157. How were water bodies, etc. excluded?
L158. Here there is the first reference to the monthly scale, but it should be made clearer and it should be reported much earlier. Also, is the approach designed for the monthly scale? Are Eqs. (1-1) and (1-2) valid at the monthly temporal scale?
L162. Aridity index.
L163. Aridity. Please fix throughout the text.
L163-165. Why 0.65 is used? Please add some reference to support this choice.
Fig. 3. The goal of the upper part of the figure (steps 1 to 5) is not clear. Is this just for gap filling? Very little is said about that in the main text.
L168. This sentence is unclear. How was this evaluated?
L173-176. This part of the pre-processing is very confusing. It needs rewording and expanding.
Section 3.3 I found this section mostly unnecessary.
L182-183. This sentence is not clear.
L186. This reference is not needed. These are standard metrics used in validation, not specifically introduced in that research.
L206. Absolute value
L227-229. This appears to be just 1 point. What is the point of comparing this case with all the others? This analysis is very weak.
Fig. 8. Differences among models seem mostly systematic, so what is the point of showing multiple years? Wouldn’t it be better to show the average year? Regional results would also be useful. Global average data are somewhat difficult to analyse.
L260. What is the difference? Please quantify. This is true for the entire results section, where qualitative statements such as “is higher” are often not accompanied by quantification.
L262-265. From the map in Fig. 9, this difference between your dataset and the others is not clear. Also, here and later, there is a lot of emphasis on ET over the desert (missing values, values different than 0, etc.). Is this really that important? Are you expecting notable differences in the water budget over these regions?
L274. Consistent behaviour with latitude.
L284. Seam?
L295-297. This result, and Fig. 10, raises the question: did you use the same dates for the analyses in Figs. 8 and 9 for all datasets? Average values should be computed on the same samples, so if you used a different number of dates for each dataset (based on availability or coverage) the results will be biased just for that and not for the differences in methodology.
As an example, if one dataset tends to have gaps during cloudy days, its average ET will be higher just because those cloudy days are not included. Please ensure consistency in the results reported.
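The consistency check requested above can be sketched as follows: each dataset is averaged only over the time steps at which every dataset has a valid value, so differences in coverage cannot masquerade as differences in methodology. The function name, the use of `None` for missing months, and the example values are hypothetical.

```python
from typing import Optional


def common_sample_means(series: dict[str, list[Optional[float]]]) -> dict[str, float]:
    """Average each dataset only over time steps where every dataset is valid.
    None marks a missing value (hypothetical convention for this sketch)."""
    lengths = {len(v) for v in series.values()}
    if len(lengths) != 1:
        raise ValueError("all series must cover the same time steps")
    n = lengths.pop()
    # Indices at which no dataset is missing: the shared sample.
    common = [t for t in range(n) if all(v[t] is not None for v in series.values())]
    if not common:
        raise ValueError("no time step is valid in every dataset")
    return {name: sum(v[t] for t in common) / len(common) for name, v in series.items()}


# Made-up monthly ET (mm/month): dataset B misses months 2 and 4.
print(common_sample_means({"A": [10.0, 20.0, 30.0, 40.0], "B": [12.0, None, 28.0, None]}))
```

Here a naive mean would give 25 for A and 20 for B, while the common-sample means are both 20, which is the kind of bias the comment warns about.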
L298-299. This statement is confusing. Mu refers to 24% of land surface. Where the 81% comes from? What is a middle-high latitude?
L302. Most of the missing values in the other datasets seem related to deserts. How much can the water balance be compromised there when ET is mostly 0 anyway? The missing data is an important point, but it is more relevant in regions where ET is different than 0 when the data are missing. I would focus on these conditions to highlight your point.
L305. Shifts.
Fig. 11. Monthly availability… How is this monthly? Not clear.
L317. This dataset is seamless because it is not a remote sensing-based product. If you tried to use skin LST from satellite, then you would have an RS dataset, but with some gaps.
L320. This statement is very confusing to me, first because skin LST from ERA5-Land relies on these resistances, and second because the methodology does not explain how the method gets rid of the resistances.
L327. Our dataset…
L328-332. This sentence is very confusing. Please reword.
L333. ERA5-Land already has an ET product. You should include in your analysis a comparison with that product, as it is based on mostly the same forcings and is produced together with the skin LST used in this study. What is the added value of your methodology compared with what is already there?
L337. The residual surface energy balance approach is neglected here, which is surprising as it is the most “direct” method for assessing ET from remote sensing.
L337. “…may have similar systematic uncertainty”. Is your method so different from PM? Many of the same factors play a role in this method and in PM. In the text, you also confirm many similarities between your dataset and the others. A better description of the methodology can help in understanding the key differences and why it should not be affected by the same systematic differences as the other methods.
Citation: https://doi.org/10.5194/essd-2024-495-RC1
RC2: 'Comment on essd-2024-495', Anonymous Referee #2, 10 Mar 2025
The manuscript provides a new global ET dataset, which is meaningful for detecting long-term global ET variation and for water resource monitoring. The dataset is then validated against FLUXNET sites and water-balance ET data. The subject of the manuscript is within the scope of ESSD. It is worth publishing in the journal after a major revision addressing the comments below:
- The concept of the “Nonparametric Approach” mentioned in the title should be briefly explained in the Abstract.
- L 28: “our dataset offers a continuous and seamless ET dataset suitable for global research.” The word “dataset” is used repeatedly, and the sentence should be simplified.
- L 28: “This study contributes to the advancement of global ET estimation and informs future water balance studies.” It is too general and lacks specific descriptions of your contributions.
- L 37: Please extend the description with how and why the distribution of these flux sites across the global land surface influenced the accuracy of estimating global ET.
- L 39: The word “conduct” is repeatedly used, please rewrite the sentence.
- L 64: Although the last paragraph of Introduction mentions the non-parametric method (NP) and the surface flux equilibrium-non-parametric method (SFE-NP), there is no detailed explanation of the principles of these methods and their advantages over the traditional parametric methods. Nor is it explained how these methods avoid the complex parametric process and how they improve the accuracy and applicability of ET estimation.
- L72: In the Introduction section, although the RSNP model is mentioned, there is no detailed explanation of how this model solves the problems of existing models or of its unique contribution to global ET estimation. The research goals should be more specific and clearer.
- L84: Please explain how the 1 km resolution of MODIS land cover data was reconciled with the 0.1° resolution of other datasets. Was any downscaling or upscaling applied, and if so, what methods were used?
- L109: The RSNP model’s input data are mainly from ERA5-Land, and ERA5-Land also provides a data set of actual ET. However, the section of cross-validation of RSNP does not reflect the comparison with ERA5-Land.
- L110: Several acronyms (e.g., PT-JPL) are introduced without full definitions upon first mention, which hinders readability for non-specialist audiences. Ensure all abbreviations are spelled out at first occurrence.
- L138: There is an error in Equation 2 for calculating net surface radiation; it should be revised.
- L180: “Direct validation is composed of validation at the point scale and validation at the basin scale”. It is necessary to elaborate on the specific differences and complementarities of these two validation methods, and to state which aspects of the model's validity and reliability each of them verifies.
- L196: Figure 4 shows the scatter density with stretched colors, but a color bar indicating whether red or blue represents high or low density is missing.
- L228: "RSNP has certain advantages in monitoring basin or regional ET on a global scale", but it does not specify what these advantages are. Similar general statements in the article should be thoroughly proven and expanded.
- L330: The expression “unsatisfactory performance” is not specific enough. It is suggested to change it to “limited accuracy”.
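Regarding the land-cover question in this list (1 km MODIS land cover versus the 0.1° grid), categorical data are usually aggregated with a majority (mode) rule rather than averaged, since class codes have no arithmetic meaning. The sketch below assumes, hypothetically, that the land-cover layer has already been placed on a grid whose cells nest exactly into the target cells by an integer `factor`; the real 1 km to 0.1° mapping involves reprojection and is not an integer block. Reporting per-class fractions in each coarse cell is a common alternative.

```python
from collections import Counter


def majority_aggregate(grid: list[list[int]], factor: int) -> list[list[int]]:
    """Aggregate a categorical grid by taking the most frequent class in each
    factor x factor block. Ties go to the class encountered first in the block
    (Counter.most_common preserves first-encountered order for equal counts)."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c] for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out


# Hypothetical class codes on a 2 x 4 fine grid aggregated by a factor of 2.
print(majority_aggregate([[1, 1, 2, 3], [1, 4, 3, 3]], 2))
```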
Citation: https://doi.org/10.5194/essd-2024-495-RC2
RC3: 'Comment on essd-2024-495', Thomas Van Niel, 13 Mar 2025
Ms. Ref. No.: essd-2024-495
Title: A Globally Seamless Terrestrial Evapotranspiration Dataset Retrieved by a Nonparametric Approach with Remote Sensing and Reanalysis Datasets
Authors: Suyi Liu, Xin Pan, Jie Yuan, Kevin Tansey, Zi Yang, Zhanchuan Wang, Xu Ding, Yuanbo Liu, Yingbao Yang
Overview
The study introduces a global terrestrial evapotranspiration (ET) dataset (2001–2019, 0.1° resolution) using the Remote Sensed Non-Parametric (RSNP) model, which avoids complex parameterization by leveraging nonparametric (NP) and Surface Flux Equilibrium-Nonparametric (SFE-NP) approaches with remote sensing and reanalysis data. Validation against FLUXNET and water-balance ET showed comparable accuracy to existing datasets (ETMonitor, PML_V2, PEW). RSNP offered more complete global coverage by reducing missing values, especially in arid regions.
I personally learned a great deal from my own research into the Hamiltonian approach used and came away inspired to test new ideas. However, almost none of this understanding came directly from the paper, which largely glosses over, arguably, the most compelling reason to publish the work. Because of the novelty of the approach used to generate the dataset, I would very much like to see this paper published. However, it would require a substantial effort to make it ready for publication, in my opinion. I describe 5 major comments/concerns that I have about the manuscript in its current state. These should be explicitly addressed in the author's response. The intent of my comments is only to help improve the manuscript. I then provide a list of minor issues.
Major Comments/Concerns:
1.) Insufficient Explanation of the Hamiltonian Approach:
The manuscript does not provide sufficient detail on the Hamiltonian microstate approach, making it unclear why this method was chosen over a standard deterministic model. The novelty of this approach is underemphasized, despite it representing a fundamental departure from traditional surface energy balance (SEB) modelling. The lack of explanation makes it difficult for readers to fully understand the rationale behind this choice and assess its advantages. I only realized the significance of the approach after questioning the formulation of Eq. (1-1) and (1-2) and conducting my own research. The authors should provide a much clearer and more detailed explanation of the Hamiltonian (variational) method and explicitly highlight how it differs from conventional deterministic SEB modelling approaches. Strengthening this discussion would better justify its use and emphasize the novelty of the study.
2.) Further thinking/justification of the surface partitioning constraint:
The function, ln(Ts/Ta), that is in both Eq. (1-1) and (1-2) would seem to me to be very insensitive to change within a realistic range of naturally occurring terrestrial land surface and air temperatures. I feel like it would, thus, fail to effectively scale energy partitioning. For example, when I calculate the output for temperatures in Kelvin for a few realistic terrestrial temperature examples, I get:
- Ts = 308.15 K (35°C), Ta = 298.15 K (25°C) → ln(Ts/Ta) ≈ 0.033
- Ts = 288.15 K (15°C), Ta = 298.15 K (25°C) → ln(Ts/Ta) ≈ -0.034
As can be seen from the two examples above, the function's output is very small. The reason is that, with temperatures in Kelvin, the difference between Ts and Ta is very small relative to either Ts or Ta, so the ratio is only negligibly different from unity, and the ln of values near one is always small. This would, subsequently, make it behave almost linearly and prevent the function from capturing the expected nonlinear shift in energy partitioning from latent heat to sensible heat as the surface dries. Additionally, if temperatures are expressed in degrees Celsius, the function becomes physically invalid, as it involves taking the logarithm of a ratio that can be negative or have a zero denominator. These issues suggest to me that ln(Ts/Ta) is not a suitable scaling function for partitioning surface energy fluxes within the Hamiltonian framework. Apologies if I’ve got this wrong. I’d appreciate hearing from the authors specifically if I’ve made a mistake in my interpretation. If I am right, then at the very best, this function is doing almost nothing to partition the sensible and latent heat fluxes. The authors may be better off looking into a more appropriate function of Ts and Ta, which might improve the partitioning of latent and sensible heat fluxes. If this function has nearly no impact on the model, then what does it say about the reason the model outputs very reasonable ET estimates? Is it because the ERA5 data are doing most of the work? I discuss this more below.
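The check above can be reproduced with a few lines: for temperatures in kelvin, ln(Ts/Ta) stays very close to its first-order term (Ts − Ta)/Ta, i.e. it behaves like a scaled temperature difference. This sketch only evaluates the term in isolation; whether it matters for partitioning depends on the other terms of Eqs. (1-1) and (1-2), which are not reproduced here. The 45 °C case and the function name are hypothetical additions.

```python
import math


def log_temperature_ratio(ts_k: float, ta_k: float) -> float:
    """ln(Ts/Ta); both temperatures must be in kelvin (strictly positive).
    In Celsius the ratio can be negative or undefined, so that is rejected."""
    if ts_k <= 0 or ta_k <= 0:
        raise ValueError("temperatures must be in kelvin (strictly positive)")
    return math.log(ts_k / ta_k)


# Compare the exact log ratio with its linear approximation ln(1 + x) ~ x.
for ts_c, ta_c in [(35.0, 25.0), (15.0, 25.0), (45.0, 25.0)]:
    ts, ta = ts_c + 273.15, ta_c + 273.15
    exact = log_temperature_ratio(ts, ta)
    linear = (ts - ta) / ta  # first-order Taylor term
    print(f"Ts={ts_c:4.1f} C, Ta={ta_c:4.1f} C: ln={exact:+.4f}, linear={linear:+.4f}")
```

Even for a 20 K surface-air difference, the exact and linear values differ only in the third decimal place, which supports the reviewer's point about near-linearity.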
3.) Justification for a New Global ET Model:
One of the key questions that arises is whether a new global ET model is truly needed, particularly given that the proposed dataset appears to perform similarly to existing models. The primary stated advantage of the dataset is that it is gap-free, but this claim is not inherently compelling, as the seamless nature of the data appears to be a result of gap-filling through averaging rather than a fundamentally new methodological breakthrough. The authors should clarify what specific advancements their approach offers beyond convenience, particularly in relation to existing global ET datasets. After a very quick web search, I found several existing global datasets (see below). The list of datasets is not intended to be comprehensive. The authors should, in my opinion, include a more comprehensive summary of the current global ET datasets and then justify the need for a new one. A table that summarises the available datasets and classifies them into groups by some relevant criteria would be very helpful.
- Global land surface evapotranspiration monitoring by ETMonitor model driven by multi-source satellite earth observations https://www.sciencedirect.com/science/article/pii/S0022169422010149
- A global dataset of terrestrial evapotranspiration and soil moisture dynamics from 1982 to 2020 https://www.nature.com/articles/s41597-024-03271-7
- On the divergence of potential and actual evapotranspiration trends: An assessment across alternate global datasets https://doi.org/10.1002/2016EF000499
- A global terrestrial evapotranspiration product based on the three-temperature model with fewer input parameters and no calibration requirement Earth Syst. Sci. Data, 14, 3673–3693, 2022 https://doi.org/10.5194/essd-14-3673-2022
- A Comprehensive Evaluation of Five Evapotranspiration Datasets Based on Ground and GRACE Satellite Observations: Implications for Improvement of Evapotranspiration Retrieval Algorithm https://www.mdpi.com/2072-4292/13/12/2414
- Multi-scale evaluation of global evapotranspiration products derived from remote sensing images: Accuracy and uncertainty https://www.sciencedirect.com/science/article/pii/S0022169422005571
- Global Evapotranspiration Datasets Assessment Using Water Balance in South America https://www.mdpi.com/2072-4292/14/11/2526
- GLEAM4 https://repository.kaust.edu.sa/items/0980d173-e356-48b9-9bae-19c81d830eb7
4.) Unclear Justification for Chosen Comparison Datasets:
Following on from the previous comment, the study evaluates the ET dataset against three other global products, but the rationale for selecting these particular datasets is not provided. The omission of GLEAM, which is a widely used and well-validated ET dataset, is notable. The authors should justify their choice of evaluation datasets: do they represent distinct modelling approaches or different data sources? Establishing a clear logic for dataset selection is necessary to ensure that the validation is robust and meaningful. A clear justification of the global ET dataset comparison would strengthen the study and make its need and value more obvious.
5.) Heavy Reliance on ERA5 Reanalysis Data:
The model's substantial dependence on ERA5 reanalysis data is a concern, as it suggests that the ET estimates may be heavily influenced by the input data rather than providing a new contribution to the scientific community. Additionally, ERA5-Land already provides a latent heat flux product, which raises an important question: How different is the new model’s ET output from ERA5’s latent heat flux? A direct comparison between the study’s ET dataset and ERA5’s latent heat flux should be included to assess the degree of similarity and potential redundancy.
Minor comments
6.) Title: “Seamless” is probably not the right word for what you mean. Something like gap-free might be easier to immediately understand. I didn’t know what you meant by seamless until several pages into the document.
7.) Line 18: "hydrology cycle" → "hydrological cycle"
8.) Line 35: "metrological" → "meteorological"
9.) Line 61: "applicability and accuracy of them have not been incrementally improved" → "applicability and accuracy have not improved significantly"
10.) Line 92: "access the accuracy of monthly ET retrieved by remote sensing method." → "assess the accuracy of monthly ET retrieved by the remote sensing method."
11.) Line 116: "nearest-image resampling method." → "nearest-neighbor resampling method."
12.) Line 125: "regions(Hsieh et al., 2022; Yang et al., 2016)." → Missing space before citation.
13.) Line 137: "expressed as(Bisht et al., 2005)" → Missing space before citation
14.) Line 206: "valud" → "value"
15.) Line 209 & 210: "retreival" → "retrieval"
Citation: https://doi.org/10.5194/essd-2024-495-RC3
RC4: 'Comment on essd-2024-495', Chaolei Zheng, 18 Mar 2025
I read the manuscript “A Globally Seamless Terrestrial Evapotranspiration Dataset Retrieved by a Nonparametric Approach with Remote Sensing and Reanalysis Datasets” with great interest. The newly generated global ET data from the RSNP model are a great contribution to the ET community. While the manuscript is generally well written and clear, I do have some specific comments and requests for clarification of the presented analyses.
The ETMonitor ET dataset is seamless at daily resolution, and it even includes open water evaporation and snow/ice sublimation over the terrestrial surface. I’m not sure exactly why the reported ratio of available pixels is low in some regions. It should be noted that an extremely low ET value (e.g., zero) in the ETMonitor product is valid, and the missing value is set as ‘-1’ in the ETMonitor product. Please double check to make sure zero is not treated as unavailable data during the analysis.
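The check requested above amounts to counting only the declared fill value (here the ‘-1’ mentioned for ETMonitor) as missing, and never zero. The helper below is a hypothetical sketch; it also treats NaN as missing, and the fill value of each product should be taken from its own documentation.

```python
import math


def available_ratio(values: list[float], fill_value: float = -1.0) -> float:
    """Fraction of valid samples in a pixel's time series.
    Only the fill value (and NaN) count as missing; zero ET is a valid value."""
    if not values:
        raise ValueError("no samples")
    valid = sum(1 for v in values if v != fill_value and not math.isnan(v))
    return valid / len(values)


# Made-up desert pixel: two zero-ET days, one small positive value, one gap.
series = [0.0, 0.0, 0.2, -1.0]
print(available_ratio(series))                         # zero counted as valid
print(sum(1 for v in series if v > 0) / len(series))   # the mistake to avoid
```

The second line shows how treating zero as "no data" would understate availability over arid regions, which is exactly where the manuscript reports low availability for other products.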
The authors mentioned the gaps in desert regions specifically in the introduction. However, it should be noted that the basic equilibrium assumption of SFE fails under extreme conditions, which will lead to large uncertainty. The RSNP model will suffer from a similar problem, since it combines SFE and NP. It is also noticeable that all validation sites are located in vegetation-covered regions and none is located in desert or sparsely vegetated regions, so the validation cannot illustrate this problem. This also raises concerns about the reliability of the global seamless dataset in this study.
Title: ‘Globally’ should be ‘Global’.
‘Existing remote sensing models for estimating ET necessitate the parameterization of resistance parameters’ does not necessarily indicate a problem. Resistances can effectively reflect the regulation of ET by the land surface or atmospheric status.
The RSNP model has already been published in other journals. It is inappropriate to say ‘In this study, we proposed the Remote Sensed Non-Parametric (RSNP) model’ in the abstract.
Does ‘remote sensing and reanalysis datasets of meteorological and surface parameters’ mean remote sensing dataset of surface parameters and reanalysis datasets of meteorological parameters?
How many sites are used for validation?
Define each abbreviation at its first appearance.
Besides soil evaporation and vegetation transpiration, terrestrial ET also includes evaporation from open water bodies and from rainfall intercepted by the canopy.
Line 53-54: improper citation.
Line 62-63: This is not well connected to the previous sentence, which addresses the problem of data gaps when the datasets are applied in relevant studies.
GLASS data provide black-sky albedo and white-sky albedo, whereas blue-sky albedo is needed in Eq. (2). How did the authors convert GLASS albedo to blue-sky albedo?
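One commonly used approximation, shown here only as a sketch and not necessarily what the authors did, blends the two albedos by the fraction of diffuse skylight S: blue-sky albedo = (1 − S) × black-sky + S × white-sky. The diffuse fraction would have to come from another source (for example reanalysis radiation components); the function name and example values are hypothetical.

```python
def blue_sky_albedo(black_sky: float, white_sky: float, diffuse_fraction: float) -> float:
    """Blue-sky albedo as a diffuse-fraction-weighted mix of black-sky
    (direct-beam) and white-sky (fully diffuse) albedo."""
    if not 0.0 <= diffuse_fraction <= 1.0:
        raise ValueError("diffuse_fraction must be in [0, 1]")
    return (1.0 - diffuse_fraction) * black_sky + diffuse_fraction * white_sky


# Made-up values: clear-ish sky with 20 % diffuse radiation.
print(blue_sky_albedo(black_sky=0.15, white_sky=0.20, diffuse_fraction=0.2))
```

Under fully direct illumination the result reduces to the black-sky value and under fully diffuse illumination to the white-sky value, which is why the diffuse fraction has to be stated for monthly inputs.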
Table 1 should be reorganized, and some listed datasets are not model forcing.
Please describe the quality control process for flux tower data.
Does ‘the nearest-image resampling method’ mean ‘nearest neighbor resampling method’? To resample from 500 m or 1 km to 10 km, the average resampling method should be adopted.
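The difference between the two resampling choices is that nearest neighbour keeps a single fine pixel per coarse cell, while average resampling uses all of them. A minimal block-mean sketch for continuous fields (e.g. albedo or emissivity) follows; it assumes, hypothetically, that the fine grid nests into the coarse grid by an integer `factor`, and it ignores NaN gaps so that one missing sub-pixel does not blank the whole coarse cell.

```python
import math


def block_mean(grid: list[list[float]], factor: int) -> list[list[float]]:
    """Aggregate a continuous field by averaging the valid (non-NaN) pixels
    in each factor x factor block; blocks with no valid pixel become NaN."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            vals = [grid[r][c] for r in range(r0, r0 + factor)
                    for c in range(c0, c0 + factor) if not math.isnan(grid[r][c])]
            out_row.append(sum(vals) / len(vals) if vals else math.nan)
        out.append(out_row)
    return out


nan = math.nan
print(block_mean([[0.1, 0.3, nan, nan], [0.2, 0.2, nan, nan]], 2))
```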
The coefficients in Eq(3) are wrong. According to GLEAM, it should be 0.25 for bare soil and 0.05 for tall canopy. Please double check the model application.
Eq(5): Different equation should be adopted to estimate water vapor pressure when the surface is covered by snow/ice (either permanent or temporarily).
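As a sketch of this suggestion, assuming Eq. (5) is a Magnus-type saturation vapour pressure formula over water: the over-water form below uses the FAO-56 coefficients and the over-ice form the Tetens/Murray (1967) coefficients. How a pixel is flagged as snow/ice covered (the `over_ice` switch) is a hypothetical choice left to the model.

```python
import math


def saturation_vapour_pressure_kpa(t_celsius: float, over_ice: bool = False) -> float:
    """Magnus-type saturation vapour pressure in kPa.
    Over water: FAO-56 coefficients; over ice: Tetens/Murray (1967) coefficients."""
    if over_ice:
        return 0.61078 * math.exp(21.875 * t_celsius / (t_celsius + 265.5))
    return 0.6108 * math.exp(17.27 * t_celsius / (t_celsius + 237.3))


# At -10 C the over-ice value is roughly 10 % lower than the over-water value.
print(saturation_vapour_pressure_kpa(-10.0), saturation_vapour_pressure_kpa(-10.0, over_ice=True))
```

The two forms coincide near 0 °C and diverge below freezing, so using the over-water form for snow- or ice-covered surfaces overestimates the surface saturation vapour pressure there.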
Line 154: ‘0.05°’ -> ‘0.1°’ ?
Section 3.2 mixes too much information, and it is recommended to reorganize it. The framework of ET estimation should be moved to somewhere in the front, rather than the last part. The processing of BBE should move to the data Section.
The footprint of the flux tower observations does not match the 0.1° pixels of the estimated ET, and the related uncertainty should be acknowledged.
Line 185: The correlation coefficient is generally expressed as R, not R². Please also check this citation.
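For reference, the standard validation metrics can be sketched as below, reporting the Pearson correlation R and its square separately. Note that the squared Pearson R equals the coefficient of determination only for a least-squares linear fit with intercept, not for agreement with the 1:1 line, and R² hides the sign of the correlation. The function name and data are hypothetical.

```python
import math


def validation_metrics(obs: list[float], est: list[float]) -> dict[str, float]:
    """Bias, RMSE, Pearson correlation R and its square for paired samples."""
    n = len(obs)
    if n != len(est) or n < 2:
        raise ValueError("need at least two paired samples")
    bias = sum(e - o for o, e in zip(obs, est)) / n
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / n)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = cov / math.sqrt(var_o * var_e)
    return {"bias": bias, "rmse": rmse, "r": r, "r2": r * r}


# A perfectly anti-correlated estimate still has R^2 = 1.
print(validation_metrics([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))
```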
Line 204: precision indicators?
Line 206: ‘valud’?
Line 227: ‘arid index is over 1.0’ means arid or humid?
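The ambiguity comes from two opposite conventions. With the UNEP aridity index AI = P/PET, values above 1.0 are humid and drylands are AI < 0.65 (so a 0.65 threshold would coincide with the UNEP dry sub-humid/humid boundary); with the Budyko-type dryness index PET/P, values above 1.0 are water-limited. The sketch below shows both; function names are hypothetical, and boundary conventions (< versus ≤) vary between sources.

```python
def unep_aridity_class(p_mm: float, pet_mm: float) -> str:
    """Classify using the UNEP aridity index AI = P / PET (annual totals, mm)."""
    if pet_mm <= 0:
        raise ValueError("PET must be positive")
    ai = p_mm / pet_mm
    if ai < 0.05:
        return "hyper-arid"
    if ai < 0.20:
        return "arid"
    if ai < 0.50:
        return "semi-arid"
    if ai < 0.65:
        return "dry sub-humid"
    return "humid"


def budyko_dryness_index(p_mm: float, pet_mm: float) -> float:
    """Dryness index PET / P; values above 1 indicate water-limited conditions."""
    if p_mm <= 0:
        raise ValueError("P must be positive")
    return pet_mm / p_mm


# Same made-up site, two conventions: AI = 0.25 (semi-arid), dryness index = 4.0.
print(unep_aridity_class(300.0, 1200.0), budyko_dryness_index(300.0, 1200.0))
```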
Section 4.3.1: How is the global average ET estimated if the dataset is not seamless?
Discussions: Please double check the issue of missing values (see my above comments) and revise it accordingly. The estimated daily ET in the desert is generally very small (but still larger than 0). However, to save the disk storage, the ET data is stored in integer format (rather than float point format) after multiplying by a scaling factor, which is common when publishing the high-resolution data. Consequently, those very small ET value may be stored as zero in the published dataset, which is still valid.
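The storage effect described above can be illustrated with a scale factor: very small positive values round to an integer count of 0, which decodes to a valid 0.0, while only the fill value decodes to missing. The scale factor, fill value and function names are hypothetical; real products document their own packing attributes, and a fill value inside the valid range (e.g. -1 with possible negative ET) would collide with real data.

```python
import math


def encode_scaled(value: float, scale_factor: float, fill_value: int = -1) -> int:
    """Pack a float into an integer count; NaN (missing) becomes the fill value."""
    if math.isnan(value):
        return fill_value
    return round(value / scale_factor)


def decode_scaled(count: int, scale_factor: float, fill_value: int = -1) -> float:
    """Unpack an integer count; only the fill value is treated as missing."""
    if count == fill_value:
        return math.nan
    return count * scale_factor


# A made-up desert value of 0.004 mm/day is stored as 0 with a 0.01 scale factor.
stored = encode_scaled(0.004, 0.01)
print(stored, decode_scaled(stored, 0.01))
```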
There is no evidence or quantitative assessment on how the gaps impact the water resources or water-energy-carbon nexus in this study.
Line 336-340: This is important, but needs a more comprehensive discussion.
Citation: https://doi.org/10.5194/essd-2024-495-RC4
Data sets
Global seamless terrestrial evapotranspiration dataset (2001-2019) Xin Pan, Suyi Liu, and Jie Yuan https://doi.org/10.11888/Terre.tpdc.301343
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
405 | 100 | 15 | 520 | 11 | 19 |