the Creative Commons Attribution 4.0 License.
A benchmark dataset for half-hourly evapotranspiration estimation in China from 2000 to 2024
Abstract. Latent heat flux (LE) provides a direct representation of terrestrial evapotranspiration (ET) and plays a critical role in hydrological cycle studies, land surface model development, and the evaluation of remotely sensed evapotranspiration products. Although flux observations based on the eddy covariance technique are widely regarded as essential benchmark data for evapotranspiration estimation, existing ChinaFlux observations are generally limited by short observation periods and extensive data gaps, which substantially constrain their applicability in long-term change analyses and multi-scale studies. To address these limitations, we developed a gap-filling and temporal prolongation framework specifically designed for half-hourly LE and established a continuous ground-based benchmark dataset covering China for the period 2000–2024 based on observations from 50 ChinaFlux sites. The framework is built upon an automated machine learning approach (AutoML-H2O) and integrates ERA5-Land reanalysis data with MODIS vegetation indices, enabling accurate gap-filling within observation periods and reliable prolongation beyond observation intervals. Comprehensive evaluations demonstrate that the AutoML framework achieves high accuracy at the half-hourly scale across different gap-length scenarios, with an overall correlation coefficient (CC) of 0.862 and a root mean square error (RMSE) of 33.75 W m-2, and it substantially outperforms conventional methods under long-gap conditions of 7 d and 30 d. The forward and backward prolongation results show high consistency (CC values of 0.902 and 0.896, respectively) and exhibit robust temporal stability under varying training data lengths. Multi-timescale validations further indicate that the prolonged LE data reasonably reproduce diurnal variations, seasonal cycles, and interannual variability from half-hourly to daily and monthly scales. 
Comparisons with ChinaFlux observations under strict quality control reveal good consistency across different temporal scales, underlying surface types, and climate zones. SHAP-based interpretability analysis indicates that energy supply consistently dominates LE variability, while vegetation state and water availability modulate their relative importance under different environmental conditions. Overall, we present the first continuous half-hourly ground-based LE benchmark dataset covering China for the period 2000–2024. This dataset provides essential data support for the evaluation of remotely sensed ET products, land surface model validation, and studies of regional water–energy cycles and climate change, and it is freely available via the following repository: https://doi.org/10.5281/zenodo.18194590 (Qian et al., 2026).
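The abstract summarises gap-filling skill with CC and RMSE. Below is a minimal sketch of how these two metrics are commonly computed for half-hourly LE. It assumes that CC is the Pearson correlation and that both metrics are evaluated over paired, non-missing half-hours. How the manuscript pools values across sites and gap scenarios is not stated here.

```python
import numpy as np

def cc_rmse(obs, pred):
    """Pearson correlation coefficient (CC) and root mean square error (RMSE).

    Pairs where either value is NaN are dropped first, so gappy series are allowed.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    valid = ~(np.isnan(obs) | np.isnan(pred))  # keep half-hours where both exist
    o, p = obs[valid], pred[valid]
    cc = np.corrcoef(o, p)[0, 1]
    rmse = np.sqrt(np.mean((p - o) ** 2))  # same units as LE (W m-2)
    return float(cc), float(rmse)

# Made-up half-hourly LE values in W m-2; the NaN pair is ignored
obs = [10.0, 120.0, 250.0, np.nan, 80.0]
pred = [15.0, 110.0, 240.0, 200.0, 90.0]
cc, rmse = cc_rmse(obs, pred)
```

Because RMSE carries the units of LE, its size grows with the magnitude of the flux itself. A low RMSE at a dry, low-LE site is therefore not directly comparable with the RMSE at a humid site.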
Status: final response (author comments only)
RC1: 'Comment on essd-2026-22', Anonymous Referee #1, 16 Mar 2026
AC1: 'Author Reply on RC1', Lifeng Wu, 30 Mar 2026
Dear Reviewer, thank you very much for your valuable comments, which have greatly helped improve the manuscript ESSD-2026-22.
We have carefully addressed all comments one by one and revised the manuscript and supplementary materials accordingly. For clarity and convenience, our detailed responses are provided in the attached PDF file rather than in this text box.
The attached file is: Author_response_to_reviewer1_ESSD-2026-22.pdf
We have not submitted the revised manuscript at this stage, as the system indicates that the revised version should not be uploaded here. However, in the point-by-point response document, we have clearly described all revisions in detail, so that the changes can be fully understood without directly consulting the manuscript.
If the revised manuscript is indeed required, please feel free to let us know, and we will upload it promptly. We apologize for any inconvenience this may cause.
Thank you again for your valuable comments.
RC2: 'Comment on essd-2026-22', Anonymous Referee #2, 02 Jun 2026
Major comments. I congratulate the authors on creating a broad and valuable data set. Bringing together data from so many flux sites and applying common gap-filling methods is a valuable contribution to earth system science. While I applaud the effort, I do have a number of concerns.
1. The documentation of methods is not up to the standards required for publication. Many of the data sets and data processing methods brought to bear in this data product are not cited. Citations for methods and data sets must be provided. If the methods are unique to this manuscript, they must be documented in this manuscript. Examples of these methods and data set that must be cited in order for this manuscript to be suitable for publication include flux data processing and machine learning methods and the ERA-5 and MODIS data products.
2. Many of the analyses presented, particularly the time series comparisons, are only semi-quantitative and accompanied by statements that aren’t quantifiable or precise. While some examples of the data product are suitable for a manuscript documenting the data set, I find the time series figures and associated discussion to be excessive.
3. The methods that are used to fill and prolong the data are not clear. The methodology presents five different versions of AutoML but never explains what is finally done to create the filled data set or why this one version is chosen. This is critical to the end product.
4. Documentation for the eddy covariance (EC) flux sites is lacking, almost entirely, from the document. The EC flux sites all have instruments and data processing methods, in addition to site metadata (soils, vegetation, terrain) that are unknown in the present document. Many scientists have contributed to this data collection. Their methods must be documented. The scientists who created these data should be credited for their contributions. While I admire the monumental effort of assembling these data and this manuscript, I cannot support the publication of a data product based on undocumented original data sets.
5. Finally, I am concerned about the inclusion of investigator gap-filled data in what is fundamentally a gap-filling exercise. The document states that some site data were gap-filled by the EC flux investigators and that these data were retained as “true observations” out of necessity. I understand that this might be necessary, but I am concerned that if these data are not flagged this will degrade the data set. I recommend that, if at all possible, the investigator gap-filled data should be excluded from the data set and this gap-filling exercise. Gap-filled data are not “true observations.” The result is fitting a model to a model. If this is not possible, I suggest that, at a minimum, those sites that include investigator gap-filled data must be flagged so that a potential data product user would have the option to avoid using those data. Finally, I suggest that those sites with gap-filled data should be excluded from all analyses that are used to evaluate this gap-filling methodology.
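The minimum requirement in comment 5 is to flag sites whose provider-delivered LE is already gap-filled and to leave them out of method evaluation. That step can be sketched as follows. The record layout and the `investigator_filled` field are hypothetical. The dataset's actual metadata fields are not described on this page.

```python
from dataclasses import dataclass

@dataclass
class SiteRecord:
    site_id: str               # hypothetical site identifier
    investigator_filled: bool  # hypothetical flag: True if provider LE was already gap-filled
    n_half_hours: int          # length of the delivered half-hourly record

def split_by_provenance(sites):
    """Separate sites with purely observed LE from sites with investigator gap-filling."""
    clean = [s for s in sites if not s.investigator_filled]
    flagged = [s for s in sites if s.investigator_filled]
    return clean, flagged

sites = [
    SiteRecord("SITE_A", False, 17520),
    SiteRecord("SITE_B", True, 17520),
    SiteRecord("SITE_C", False, 8760),
]
clean, flagged = split_by_provenance(sites)
# Evaluate gap-filling skill on `clean` only; publish `flagged` with a site-level flag.
```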
I recognize the importance of this work and this data product and I would like to encourage the authors. I believe, however, that these major points must be addressed for this manuscript and data product to be suitable for publication.
Many of these major comments are reflected in the detailed comments that follow.
Detailed comments.
1. Line 31. “SHAP-based”
Please avoid using undefined acronyms in the abstract.
2. Lines 53-54. “With the expansion of spatial and temporal scales”
Scales of what? And perhaps domain might be a better term for what you have in mind? I would also suggest that the term “scale” is overused and often ambiguous. Domain and resolution are often more precise terms.
3. Line 67-68. “long periods of missing observations due to instrument maintenance, energy balance non-closure,”
I don’t think of energy balance non-closure being a source of missing data. It is a systematic problem with EC flux measurements that exists throughout EC flux measurement records.
4. Line 71-72. “Although the marginal distribution sampling (MDS) approach has been widely adopted as the official gap-filling algorithm”
My apologies, I was not aware of any “official” gap filling algorithm. I don’t believe there is any regulatory agency identifying EC flux data methods. Can you please provide a citation for this method and its broad adoption? You address this later in the introduction. Perhaps these sections of the introduction should be merged.
5. Lines 73-74. “it often fails to accurately reconstruct the magnitude and extremes of LE under long-gap conditions or during periods of extreme environmental variability.”
Citation(s) please? I am not aware of this information or these studies. You address this later in the introduction. Perhaps these sections of the introduction should be merged.
6. Lines 74-75, “Moreover, the overall observation durations of existing flux sites are relatively short, and conventional gap-filling methods do not support temporal prolongation”
This statement contradicts the earlier message about long gaps in the Fluxnet2015 data record. Please be consistent with the data and with your own message.
7. More significantly, I would not call extrapolating a set of measurements beyond their temporal extent with a model an observation.
8. Lines 116-117. I suggest starting a new paragraph at some point where you introduce the idea of “data-driven” approaches. I would take care to distinguish this from the other methods you have cited since all of them use data. “Data-driven” alone does not distinguish among methods, in my opinion.
9. Line 123. “Automated machine learning (AutoML), by systematically exploring model architectures and optimizing hyperparameters,”
Please include a citation or multiple citations describing this method. The following text explaining the benefits of this approach should also be expanded to include appropriate citations.
10. Section 2.1
The entire paragraph introducing ChinaFlux has no citations. Many of the methods for data processing that are listed are published and the publications documenting those methods should be cited. Please add appropriate citations documenting this resource and its methods. Please add citations for the ChinaFlux data set(s).
11. If the network, its data sets and its methods are not published, this is a problem.
12. Line 161. “Previous studies have demonstrated…”
Please cite at least a representative subset of those “previous studies.”
13. Figure 1a. The elevation scale should be eliminated. It is not relevant to the information presented about the sites (vegetation cover). It is colorful but misleading - it can confuse the reader regarding what is presented concerning site locations.
14. Figure 1b. This figure is difficult to understand. The same variable exists on 2 axes. More explanation of how to read this would be helpful.
15. Figure 1b. I believe this figure is saying that 15 sites have nearly zero observational gaps. In my experience with EC flux measurements this is nearly impossible. Rainfall and stable atmospheric conditions essentially always lead to some significant fraction (~20%?) of data loss. How is it possible to have sites with less than 5% missing data?
16. Figure 1d. Please include units on the x-axis.
17. Lines 172-173. “Among the final set of sites, 40 provide quality-controlled original LE observations, while the remaining 10 sites offer continuous LE time series that have already been gap-filled within their observation periods.”
Does this mean that the gap filled data are included in the “observations?” If the data from these sites does not clarify what are observations vs. what is gap-filled that is a concern. Please clarify what methods have been followed here and what metadata are available for these sites. Ideally you are not gap-filling data that have already been gap-filled. I strongly recommend that you start only with observed fluxes before doing any gap filling.
18. I suggest that you do not include sites if they have only gap-filled flux records and you cannot identify the true observations.
19. I could accept the following compromise: If sites have gap filling that cannot be identified, please clearly identify these sites in your final data product. This would allow a data user to avoid these sites in their analyses.
20. Appendix A: Table A1. I see that Table A1 has no citations and no investigators associated with these flux sites. If this data set is the only published record of these data, then perhaps this is appropriate. I am concerned, however, that each site should have metadata describing the site, its instrumentation and its data processing methods that is not available in this manuscript. Please include references where readers can find the details about these sites, their investigators and their methods. If these references do not exist, I am concerned about the value of this data set. Flux records with biological and environmental data limited to “forest” have limited value. It is true that many land surface characteristics can be gathered from satellite data sets but many cannot. If you do not have access to these data concerning the sites, please explain how this data set retains its value, and add this discussion to the manuscript.
21. Table A1: I am also very puzzled by the sites with a “missing ratio” of zero (as previously noted). How is this possible? What is the definition of the “missing ratio”?
22. Table A1: Several column headings have words that are broken across rows (e.g. “Longitude” split into “Longitud” and “e”). Please correct this.
23. Table A1: Is elevation the altitude above sea level of the ground at the flux tower sites? What are the heights of the towers above ground?
24. Section 2.2.
Please include citations for the ERA-5 system and data sets. Web links are not sufficient documentation for these products. The lack of citations hinders readers from reviewing the details of the data products, limits the reproducibility and traceability of this work, and does not acknowledge the years of effort needed to make these data products publicly accessible.
25. Section 2.3.
Please include citations for the MODIS data products that are being used here. As with ERA-5, web links are not sufficient documentation. The lack of citations hinders readers from reviewing the details of the data products, limits the reproducibility and traceability of this work, and does not acknowledge the years of effort needed to make these data products publicly accessible.
26. Lines 221-222. “Given that the AutoML framework is primarily based on tree-based algorithms…”
The AutoML code should be documented and accessible.
27. Section 2.4.
As with the previous two sections, no citations for the methodology are presented. For example, lines 229-231 note that data were quality screened but no details of the methods are presented and no citation for existing methods are given. This is not sufficient documentation for a data set. This is not at all reproducible.
28. The flowchart, Figure 2, is very helpful. I hope that more details (e.g. “Determine optimal model”) are described later in the text.
29. The QC flags appear to differentiate filled data from prolonged data. This is very helpful. I do not support, however, calling investigator-filled data “True observational data.” This point remains ambiguous. Please clarify. If investigator-filled data cannot be identified, please label all sites with this problem. As a data user I would not want to include these sites in analysis. Ideally I suggest that the authors exclude investigator-filled data since they are conducting a filling exercise.
30. Lines 242-244. “During the AutoML search process, multiple commonly used regression algorithms, including tree-based models and their ensemble variants, were evaluated, and the optimal model was automatically selected based on validation performance.”
What defines optimal? This detail must be explained. There are many metrics that can be used to define optimality.
31. There appear to be five independent gap-filling methods explained. Lines 250-256 explain one approach, based on random removal of data. Section 2.4.2 describes five more approaches, based on elimination of gaps of four different lengths. Each approach is reportedly optimized and applied to each site. How are five different gap-filling procedures applied to each flux tower site?
32. Lines 278-279. Please include citations for these “benchmark methods.” If these are routinely used for flux tower gap filling, please provide the citations documenting this work. Please provide citations for the methods you are using.
33. Line 290. Three metrics are presented, with citations, this is helpful. The way that these are used to determine optimal performance, however, is not explained. Please explain. (The text notes that four metrics are employed, but only three are mentioned in the text. Please correct this error.)
34. Section 2.4.3. Similar to the previous section, a number of strategies for testing the prolongation results are described but the performance metrics are not specified. In addition, the decisions made based on the comparison of performance metrics are not specified. Testing the methodology is good, but explaining the metrics and explaining how the tests will be used is also necessary.
35. Figure 3. The authors present box and whisker plots but have not specified the population being analyzed. Is there a single value of each metric for each of the flux towers? This must be explained in the figure caption.
36. Figure 4 presents five different versions of AutoML gap filling. Which one is used to construct the data set presented by this manuscript? Or does this manuscript present a data set that uses five different gap-filling methods? If so, what guidance is presented for potential users of these data?
37. Results. The results immediately focus on comparing AutoML to other methods without showing any of the results of the process used (e.g. sections 2.4 and 2.5 of the methods) to optimize the AutoML approach.
38. Lines 374-375. “This pattern reflects the increased uncertainty of evapotranspiration processes under humid and subtropical climate conditions.”
Your RMSE metric is dimensional. Performance based on these metrics will also be a function of the magnitude of the flux. It is not surprising that the RMSE for arid sites is smaller than the same metrics for humid subtropical sites.
39. Section 3.1.2 introduces a metric, the ability to reproduce the shape of the diel cycle, that is not defined in the methods. This metric needs to be explained in the methods. The reason for selecting this metric should also be described in the methods. Why is this metric added?
40. Figure 5. The gaps are not 30 days long. Please explain how this is an illustration of 30-day gap filling.
41. Figure 5. The comparison across methods is not clear. I suggest a more statistical evaluation of the daily cycle - a mean daily cycle and variability, for example. I cannot support the conclusions in the text regarding the superiority of the AutoML results based on the results shown in this format.
42. Figure 5. It appears (e.g. Fig. 5a) that AutoML sometimes underestimates the observed fluxes and creates a very smoothed representation of the fluxes. It would be helpful if the authors could describe whether or not this method creates a time series that is artificially smoothed in comparison to true flux measurements. True flux measurements include random sampling error (e.g. Lenschow and Stankov, 1986; Richardson et al., 2006). If AutoML does not reproduce this inherent feature of an EC flux time series this should be described.
43. Figure 7. The population of points is not described. Please explain what each point on (a) or (b) represents, and how many points make up (a) through (f).
44. Figure 8. Please define the x-axis more precisely in the figure caption and/or axis label. “Year” is not clear enough to be readily and rapidly understood.
45. Lines 454-455. “Overall, the prolonged time series reasonably reproduce the diurnal cycles and amplitude characteristics of the observed LE, exhibiting stable temporal continuity and physically consistent structures.”
46. These claims are very qualitative. The simple time series plots in Figure 9 are not sufficient to justify quantitative statements about the performance of this data product. I suggest that more quantitative metrics be used to illustrate the quality of the reproduction of the diel cycle of fluxes.
47. Lines 465-467. “the half-hourly time series examples in Fig. 9 demonstrate that the proposed prolongation framework is able to stably reproduce high-frequency LE variability and diurnal cycle structures at the half-hourly scale, thereby providing a reliable basis for subsequent aggregation to daily and monthly timescales.”
Flux bias is needed for evaluating aggregation to daily and monthly time scales. The time series comparison does not inform this purpose effectively.
48. Section 3.3.2. A time series comparison is an interesting visual but it does not provide quantitative understanding regarding the performance of the AutoML model. I am uncertain of the value of showing a subset of sites and only a qualitative comparison.
49. Figure 10. Some of the sites show relatively poor agreement with the model either in particular years ((g), 2014) or over the entire sequence (daily correlation, (f)). This isn’t discussed in the text.
50. Lines 492-493. “Fig. 11 presents time series comparisons between prolonged monthly LE and observation-based aggregated LE at several representative sites.”
Text of this variety belongs in the figure caption. Please remove all descriptions of the figures from the text and place them in the figure caption. This issue exists at many points in the document.
51. Lines 500-501. “some uncertainty remains in monthly LE estimates.”
This is not a helpful statement. Please explain “some uncertainty.”
52. Figure 11. I am uncertain of the value of this figure in the manuscript.
53. Lines 518-522. “The medians and distribution ranges of the two datasets are comparable, indicating that the gap-filling and temporal prolongation procedures do not introduce evident systematic biases. This consistency is generally stable across most underlying surface types and climate zones, with relatively larger dispersion observed only in high-variability ecosystems such as desert and shrubland, yet without any directional bias.”
“Comparable”, “high-variability ecosystems” and “relatively larger dispersion” are all vague terms. “high consistency” is also used earlier in this paragraph. These are not useful analyses. The degree to which this data product represents true observations should be described concisely and quantitatively.
54. Figure 13. I am concerned about the degree to which the “ChinaFlux” data (the x-axis) might be a filled data product. This is especially true at monthly to annual time scales where gap filling is essential. These become comparisons between gap filling algorithms, not between observations and a gap-filling algorithm. At minimum I would require any data on the x-axis to have a minimum fraction of true observations, and for that threshold to be stated clearly in the text. I would also eliminate all gap-filled data from the hourly data product comparison.
55. Lines 526-527. Please move this text to the figure caption.
56. Lines 527-529. “As the temporal scale progresses from daily to monthly and annual, the agreement between the two datasets further improves and dispersion decreases markedly, suggesting that the prolongation results effectively preserve the evapotranspiration characteristics reflected by ChinaFlux observations in a long-term mean sense.”
The reduction in variability with time averaging is true when averaging flux observations (and many other data) and is not a metric that the algorithm is effective.
57. Lines 536-540. This is a generic statement that is not helpful. Please delete.
58. Figure 14. The quantitative metrics in this figure are not defined. The population of points (?) shown within the figure is not defined. More information is needed for this to be useful in the manuscript.
59. Figure 15. Same comments as for figure 14.
60. Section 4.3. This section says nothing that I find valuable to publish. I don’t object to the statements, but they bring no new or insightful information to the reader.
61. Section 5.
3) Auxiliary data. I support the inclusion of the ERA-5 and MODIS data at the sites that was used to create the LE data product. I would strongly suggest, however, that the site-level observations (e.g. of radiative fluxes, sensible heat flux, atmospheric and soil state variables) also be included in this data product.
62. I do not see any indication that site metadata is provided in the data set. A description of the site ecosystem, terrain and soil characteristics is very important for data interpretation. Please add these metadata to the data product.
63. Section 6. The conclusions section is a summary of results. The conclusions should be the “take home” message for the readers of the manuscript. The summary of results belongs in the abstract. Detailed results belong in the results section. I suggest that this section should be rewritten.
Citation: https://doi.org/10.5194/essd-2026-22-RC2
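Comments 41, 42 and 47 ask for a statistical comparison of the daily cycle and for flux bias, rather than visual time series. Below is a minimal sketch of those two quantities. It assumes 48 half-hourly time-of-day slots per day and NaN for missing values. The function names are illustrative and are not taken from the manuscript.

```python
import numpy as np

HALF_HOURS_PER_DAY = 48

def mean_diel_cycle(values, slot):
    """Mean and standard deviation of a half-hourly series for each time-of-day slot (0-47)."""
    values = np.asarray(values, dtype=float)
    slot = np.asarray(slot)
    means = np.full(HALF_HOURS_PER_DAY, np.nan)
    stds = np.full(HALF_HOURS_PER_DAY, np.nan)
    for k in range(HALF_HOURS_PER_DAY):
        v = values[(slot == k) & ~np.isnan(values)]  # drop missing values in this slot
        if v.size:
            means[k] = v.mean()
            stds[k] = v.std()
    return means, stds

def mean_bias(obs, pred):
    """Mean of (pred - obs) in W m-2 over half-hours where both values exist."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ok = ~(np.isnan(obs) | np.isnan(pred))
    return float(np.mean(pred[ok] - obs[ok]))

# Synthetic two-day example: the prediction is 1 W m-2 too low everywhere
slot = np.tile(np.arange(HALF_HOURS_PER_DAY), 2)
obs = np.concatenate([np.arange(48.0), np.arange(48.0) + 2.0])
pred = obs - 1.0
obs_mean, obs_std = mean_diel_cycle(obs, slot)
pred_mean, pred_std = mean_diel_cycle(pred, slot)
bias = mean_bias(obs, pred)
```

Comparing `pred_std` with `obs_std` slot by slot gives one indication of whether a filled series is smoother than the measurements. A persistently lower spread would be consistent with the smoothing raised in comment 42.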
AC2: 'Reply on RC2', Lifeng Wu, 09 Jun 2026
Dear Reviewer, thank you very much for your valuable comments, which have greatly helped improve the manuscript ESSD-2026-22.
We have carefully addressed all comments one by one and revised the manuscript and supplementary materials accordingly. For clarity and convenience, our detailed responses are provided in the attached PDF file rather than in this text box.
The attached file is: Author_response_to_reviewer2_ESSD-2026-22.pdf
We have not submitted the revised manuscript at this stage, as the system indicates that the revised version should not be uploaded here. However, in the point-by-point response document, we have clearly described all revisions in detail, so that the changes can be fully understood without directly consulting the manuscript.
If the revised manuscript is indeed required, please feel free to let us know, and we will upload it promptly. We apologize for any inconvenience this may cause.
Thank you again for your valuable comments.
Data sets
A benchmark dataset for half-hourly evapotranspiration estimation in China from 2000 to 2024 L. Qian et al. https://doi.org/10.5281/zenodo.18194590
General comments:
Thanks to the authors for their valuable contribution of in situ latent heat flux data from various Chinese ecosystems across major climate zones.
The manuscript describes a continuous dataset based on half-hourly ET data from eddy covariance measurements complemented by site-specific ML approaches to fill data gaps and extend the time series beyond the observational periods.
The article is appropriate to support the publication of the data set, but could be shortened in some parts, as indicated by the examples below.
The dataset is unique and useful in terms of its duration and its treatment of Chinese ET flux data. Making such a dataset publicly available in open-access repositories is relevant, even though the gap-filled and prolonged dataset could be recreated if the observed data and the software code were available.
The aggregates for daily, monthly and yearly products could be omitted, as those time series could easily be derived by the users. If those files are published, I’d recommend changing the naming of the individual files within each archive: each file name should contain the time resolution, as is done for the compressed archives.
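As the reviewer notes, users can derive daily values themselves. Below is a minimal sketch for daily ET in mm. It assumes the half-hourly files provide gap-free LE in W m-2 and uses a constant latent heat of vaporization of about 2.45 MJ kg-1. In reality this value varies slightly with temperature, and the dataset's own daily variable may instead be mean LE in W m-2.

```python
import numpy as np

LAMBDA = 2.45e6                 # latent heat of vaporization, J kg-1 (assumed constant, ~20 °C)
SECONDS_PER_HALF_HOUR = 1800.0

def daily_et_mm(le_half_hourly):
    """Convert one day of 48 half-hourly LE values (W m-2) to ET in mm d-1.

    1 kg of water over 1 m2 is a 1 mm layer, so ET[mm] = LE * dt / lambda.
    """
    le = np.asarray(le_half_hourly, dtype=float)
    if le.shape != (48,):
        raise ValueError("expected 48 half-hourly values for one day")
    return float(np.sum(le * SECONDS_PER_HALF_HOUR / LAMBDA))

et = daily_et_mm(np.full(48, 100.0))  # constant 100 W m-2 gives roughly 3.5 mm d-1
```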
Within a framework as established for this manuscript, I would expect some more information related to uncertainty of the final flux products that could e.g. be derived from random repetitions of the procedures.
Data quality:
The 25-year-long ET time series is of good quality, and the input data from MODIS and ERA5 for the variables that drive ET also seem reasonable.
Specific comments:
What about uncertainty estimates of the gap-filled ET fluxes e.g. based on random repetitions of the procedures?
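One way to obtain the requested uncertainty from random repetitions is to refit the gap-filling model on bootstrap resamples and report the spread of the predictions. The sketch below substitutes an ordinary least-squares model for the AutoML learner, so it shows the resampling idea only and not the manuscript's pipeline. The driver and data are synthetic. This spread reflects model and sampling uncertainty, not the random measurement error of the EC fluxes.

```python
import numpy as np

def bootstrap_gapfill(x_train, y_train, x_gap, n_rep=100, seed=0):
    """Ensemble mean and spread of gap-fill predictions from bootstrap refits.

    A linear least-squares model stands in for the AutoML learner.
    """
    rng = np.random.default_rng(seed)
    y_train = np.asarray(y_train, dtype=float)
    A_train = np.column_stack([np.ones(len(y_train)), x_train])  # intercept + drivers
    A_gap = np.column_stack([np.ones(len(x_gap)), x_gap])
    preds = np.empty((n_rep, A_gap.shape[0]))
    for r in range(n_rep):
        idx = rng.integers(0, len(y_train), size=len(y_train))  # resample with replacement
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds[r] = A_gap @ coef
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
rn = rng.uniform(0.0, 600.0, 500)            # hypothetical net radiation driver, W m-2
le = 0.6 * rn + rng.normal(0.0, 20.0, 500)   # synthetic LE with random error
fill_mean, fill_sd = bootstrap_gapfill(rn, le, np.array([100.0, 300.0, 500.0]))
```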
Several references in the text are missing, even though listed in the ‘References’ section. Please check.
Naming of the model: AutoML-H2O or H2O AutoML or only AutoML? Please use the abbreviation consistently.
Specific remarks related to the text:
Line 26: ‘…conventional methods for long-gap conditions of 7 or 30 days, respectively.’? You might reformulate the sentence as those gap conditions are artificially introduced. Under normal conditions, data gaps vary in duration.
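The 7 d and 30 d scenarios are artificial gaps imposed on observed data, as the reviewer points out. Below is a minimal sketch of imposing one such gap and selecting the positions to score. It assumes a half-hourly series with NaN for missing values. The random placement of a single contiguous block is an assumption and is not necessarily the manuscript's protocol.

```python
import numpy as np

HALF_HOURS_PER_DAY = 48

def insert_artificial_gap(le, gap_days, rng):
    """Mask a contiguous block of `gap_days` days in a half-hourly LE series.

    Returns the masked copy and a boolean mask of removed positions that
    originally held observations (only those can be used for scoring).
    """
    le = np.asarray(le, dtype=float)
    n_gap = int(gap_days * HALF_HOURS_PER_DAY)
    if n_gap >= le.size:
        raise ValueError("gap longer than series")
    start = int(rng.integers(0, le.size - n_gap + 1))  # any start that fits the series
    mask = np.zeros(le.size, dtype=bool)
    mask[start:start + n_gap] = True
    masked = le.copy()
    masked[mask] = np.nan
    return masked, mask & ~np.isnan(le)

le = np.arange(48 * 60, dtype=float)  # 60 synthetic days without real gaps
masked, eval_mask = insert_artificial_gap(le, 7, np.random.default_rng(42))
```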
Line 30: does the ‘strict quality control’ relate to the EC data from measurements or to the modelled ET data?
Line 65: ‘EC observations provide half-hourly measurements of latent heat flux (LE) in combination with its driving variables,…’. (EC itself delivers only fluxes)
Line 69: ‘Taking the FLUXNET2015 dataset as an example,…’
Lines 85ff: You claim that Chinese flux sites are underrepresented in integrative flux analysis. Might this also be due to the fact that data from ChinaFlux are usually not accessible for non-Chinese researchers? Data sent to the FLUXNET portal should be accessible via the FLUXNET Shuttle (https://data.fluxnet.org/). Also see Papale, D.: Ideas and perspectives: enhancing the impact of the FLUXNET network of eddy covariance sites, Biogeosciences, 17, 5587–5598, https://doi.org/10.5194/bg-17-5587-2020, 2020.
Line 131 and later line 160: is the quality control related to EC data? What are the site selection criteria?
Lines 158ff: references for the processing and quality assurance steps should be added.
Lines 160 and 161ff: references and details for ChinaFlux and FLUXNET procedures as well as previous studies should be added.
Line 178: did you use the data from the 10 sites with pre-gap-filled time series in the same way as the data from the other 40 sites? So only artificial gaps introduced? Are the gap-filled data of those sites marked as such? If these gap-filled data are used for training, ‘no new information is generated’.
Lines 182-183: ‘The half-hourly LE data form the foundation for subsequent gap-filling,…’ is that what you want to say? If yes, please reformulate accordingly.
Line 193: citation for ERA5 data missing in the text (e.g.: Copernicus Climate Change Service (2022): ERA5-Land hourly data from 1950 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.e2161bac (Accessed on 11‑11‑2025))
Line 203: I had no clue about the ERA5-Land data, but is Rn really net solar radiation, which would depend on albedo? Instead, Rn is commonly used for net radiation including longwave components. Later in the text (line 571, line 577) Rn is used for net radiation.
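The reviewer's distinction can be written out using the standard surface radiation balance. Let SW↓ be incoming shortwave radiation, α the surface albedo, and LW↓ and LW↑ the incoming and outgoing longwave radiation. Net radiation is then

```latex
R_n = \underbrace{(1-\alpha)\,SW_{\downarrow}}_{SW_{\mathrm{net}}}
    + \underbrace{LW_{\downarrow} - LW_{\uparrow}}_{LW_{\mathrm{net}}}
```

Net solar radiation is therefore only the shortwave term. Which ERA5-Land variables the manuscript combines into Rn is not stated in this review.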
Line 213: reference for ‘official documentation’?
Lines 210ff: citation for MODIS products
Line 216: what is the temporal resolution of MODIS products (‘relatively low’)? And is the assumption of constant NDVI and LAI ‘within each compositing period’ sufficient for fast growing plants, disturbances etc.?
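Common MODIS vegetation index and LAI products are 16-day and 8-day composites, respectively. The manuscript's exact products are not named here. Below is a minimal sketch contrasting the "constant within each compositing period" assumption with linear interpolation between composites. Timestamps are in fractional days, and composites are assumed to be anchored at their period start. Anchoring at mid-period is an equally plausible choice.

```python
import numpy as np

def composite_to_half_hourly(comp_days, comp_values, target_days, method="step"):
    """Map composite-period values (e.g. NDVI or LAI) onto half-hourly timestamps.

    'step' holds each composite constant until the next period starts;
    'linear' interpolates between composite values placed at their start days.
    """
    comp_days = np.asarray(comp_days, dtype=float)
    comp_values = np.asarray(comp_values, dtype=float)
    target_days = np.asarray(target_days, dtype=float)
    if method == "step":
        idx = np.searchsorted(comp_days, target_days, side="right") - 1
        idx = np.clip(idx, 0, len(comp_days) - 1)  # before first period: use first value
        return comp_values[idx]
    if method == "linear":
        return np.interp(target_days, comp_days, comp_values)  # clamps beyond the ends
    raise ValueError("method must be 'step' or 'linear'")

days = [0.0, 8.0, 16.0]   # hypothetical 8-day composite start days
lai = [0.2, 0.6, 0.4]
t = [0.0, 4.0, 8.0, 12.0, 20.0]
step = composite_to_half_hourly(days, lai, t, "step")
linear = composite_to_half_hourly(days, lai, t, "linear")
```

The step version reproduces the constant-value assumption the reviewer questions. The linear version avoids abrupt jumps at period boundaries but still cannot resolve changes faster than the compositing interval, such as disturbances.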
Line 228: re-formulate: ‘Each flux tower site was treated…’ – it is not the tower that is of relevance.
Line 230: How was quality screening performed? According to standard FLUXNET and ChinaFlux procedures? See above
Lines 240ff: was the modelling tool adopted from somewhere else (reference!) and what is the contribution of the authors?
Line 249: Which models were selected by AutoML? Very different models? How many different models?
Lines 280ff: ERA5 data are used instead of onsite measured meteorological variables. How do those compare with locally measured data? What about the additional uncertainty?
Line 316: the repetitive sentence can be removed here (These prolonged datasets provide the basis for subsequent construction and analysis of multi-temporal-scale LE products.)
Lines 332-333: does that mean, in case there would be only one half-hour value missing within a 1 d aggregate, the 1 d value would get the flag F? Hardly any ET time series from EC measurements is complete due to unfavourable atmospheric conditions. As a result, any daily (7 day, monthly, respectively) aggregate is based on a mixture of measured and gap-filled data. How are the aggregates flagged then?
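One answer to the reviewer's question is to publish the composition of each aggregate rather than a single flag. Below is a minimal sketch for one day. It assumes the T/F/P half-hourly flags (true, filled, prolonged) discussed in these reviews. It also applies a 10 % completeness threshold, echoing the figure comments in this review. Whether the dataset applies such a threshold this way is not confirmed.

```python
from collections import Counter

def summarise_day(flags, max_non_observed=0.10):
    """Summarise one day of half-hourly flags: 'T' observed, 'F' gap-filled, 'P' prolonged.

    Returns the fraction of each flag and whether the day passes an
    illustrative completeness threshold (the 10 % value is an assumption).
    """
    if len(flags) != 48:
        raise ValueError("expected 48 half-hourly flags")
    counts = Counter(flags)
    frac = {k: counts.get(k, 0) / 48 for k in ("T", "F", "P")}
    passes = (1.0 - frac["T"]) <= max_non_observed
    return frac, passes

frac, ok = summarise_day(["T"] * 44 + ["F"] * 4)  # 4 filled half-hours, about 8 %
```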
Line 432: suggestion: ‘…the 6-year training scenario represents more typical conditions and weather regimes at most sites.’
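The forward and backward prolongation tests in the abstract, and the training-length scenarios mentioned here, rest on chronological splits. Below is a minimal sketch of building them from a year index. It assumes whole observed years are the unit of splitting. The manuscript's exact split is not described on this page.

```python
import numpy as np

def chronological_splits(years, n_train_years):
    """Boolean train/test masks for forward and backward prolongation tests.

    Forward: train on the first `n_train_years` observed years, test on the rest.
    Backward: train on the last `n_train_years`, test on the earlier years.
    """
    years = np.asarray(years)
    unique = np.unique(years)  # sorted distinct years
    if n_train_years >= unique.size:
        raise ValueError("need at least one held-out year")
    fwd_train = np.isin(years, unique[:n_train_years])
    bwd_train = np.isin(years, unique[-n_train_years:])
    return {
        "forward": (fwd_train, ~fwd_train),
        "backward": (bwd_train, ~bwd_train),
    }

years = np.repeat(np.arange(2010, 2018), 3)  # 8 hypothetical observed years
splits = chronological_splits(years, 6)      # a 6-year training scenario
```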
Chapters 3.3.2 and 3.3.3: this analysis and the accompanying figures seem to be redundant with very little additional information for a data paper, as the monthly values are just aggregated from the daily values which contain gap-filled data. In addition, figure 11 shows again some typical sites but with very different time periods, varying from only 2 years to 10 years. Why are these examples chosen so different if they are compared to check for seasonal variation?
Line 515: what exactly is ‘official ChinaFlux observations’? Please add reference!
Line 547: what is meant by ‘stratification’ in this context?
Lines 560ff: Especially for desert and shrubland, evaporation becomes more dominant compared to transpiration. So it is clear that vegetation-related variables explain less variability. This point might be considered here as well.
Lines 577ff: same as above, more soil is exposed, so more evaporation compared to dense canopy covers.
Figures: Readers should be able to interpret figures with the figure description. Most figures need more descriptive text.
Fig. 1b) more description needed.
Fig. 1d) length of observation periods (in years) for all sites
Figure 2: a bit overwhelming, but a good overview still. I don’t see QA/QC for EC-data (which contributes to gaps). The figures for comparison with MDS and ML methods and also the ones for performance do not add any value due to their small size. The text is not readable and legends are missing. Even if the content becomes clear for the results in the lower right corner, instead of T, F, and P for true, filled and prolonged you might use additional colour indication as T, F and P are not in the figures anyway.
Fig. 5 and 6: what is the measure for the significant bias marked by the blue boxes?
Fig. 7, a) and b): y-axis needs a label. More explanatory text in the figure description is needed, e.g. for ‘Relative density’.
Fig. 8a): from the figure description it is not clear whether the bars or the lines relate to the left or right y-axis. Please provide more information in the figure description.
Fig. 9: might be removed or moved to the appendix. Instead give more details about statistics in the text in chap. 3.3.1
Fig. 10 and 11: same as for Fig. 9, as daily and monthly values are just aggregates of half-hourly values when less than 10% of the data are missing.
Fig. 12: ‘l’ is missing in word ‘Daily’ in c) and d)
Table A1: make sure that words in the table header are not wrapped (looks ugly)
Line 669: What is meant by ‘..that this station provides only interpolated data.’? Are these the same 10 sites mentioned above with gap-filled data? Are those data treated as measured data?