the Creative Commons Attribution 4.0 License.
A flux tower site attribute dataset intended for land surface modeling
Abstract. Land surface models (LSMs) should have reliable forcing, validation, and surface attribute data as the foundation for effective model development and improvement. Eddy covariance flux tower data are considered the benchmarking data for LSMs. However, currently available flux tower datasets often require multiple aspects of processing to ensure data quality before application to LSMs. More importantly, these datasets lack site-observed attribute data, limiting their use as benchmarking data. Here, we conducted a comprehensive quality screening of the existing reprocessed flux tower dataset, including the proportion of gap-filled data, external disturbances, and energy balance closure (EBC), leading to 90 high-quality sites. For these sites, we collected vegetation, soil, topography information, and wind speed measurement height from literature, regional networks, and Biological, Ancillary, Disturbance, and Metadata (BADM) files. Then we obtained the final flux tower attribute dataset by global data product complement and plant functional types (PFTs) classification. This dataset is provided in NetCDF format complete with necessary descriptions and reference sources. Model simulations revealed substantial disparities in output between the attribute data observed at the site and the defaults of the model, underscoring the critical role of site-observed attribute data and increasing the emphasis on flux tower attribute data in the LSM community. The dataset addresses the lack of site attribute data to some extent, reduces uncertainty in LSMs data source, and aids in diagnosing parameter as well as process deficiencies. The dataset is available at https://doi.org/10.5281/zenodo.10939725 (Shi et al., 2024).
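The abstract names two of the screening criteria, the proportion of gap-filled data and energy balance closure (EBC), without giving formulas. The following is a minimal sketch of how such checks are commonly computed, not the authors' implementation. It assumes the usual closure ratio of turbulent fluxes (H + LE) to available energy (Rn - G), and a quality-flag convention in which 0 marks a measured record. The function names, the flag convention, and the example values are hypothetical.

```python
import math


def energy_balance_closure(h, le, rn, g):
    """Closure ratio sum(H + LE) / sum(Rn - G), skipping records with any NaN.

    h, le, rn, g: equal-length sequences of sensible heat, latent heat,
    net radiation and ground heat flux (all in W m-2).
    """
    turbulent = 0.0
    available = 0.0
    for h_i, le_i, rn_i, g_i in zip(h, le, rn, g):
        if any(math.isnan(v) for v in (h_i, le_i, rn_i, g_i)):
            continue  # drop incomplete records from both sums
        turbulent += h_i + le_i
        available += rn_i - g_i
    return turbulent / available if available != 0 else math.nan


def gap_filled_fraction(qc_flags, measured_flag=0):
    """Fraction of records whose flag differs from `measured_flag`.

    The flag convention (0 = measured) is an assumption for illustration.
    """
    flags = list(qc_flags)
    if not flags:
        return math.nan
    return sum(1 for f in flags if f != measured_flag) / len(flags)


# Hypothetical half-hourly values, for illustration only.
ratio = energy_balance_closure([100.0, 50.0], [200.0, 100.0],
                               [400.0, 200.0], [25.0, 25.0])
fraction = gap_filled_fraction([0, 0, 1, 2])
```

A site-year would then be kept or rejected by comparing these two numbers against thresholds chosen for the study; the thresholds used by the authors are not stated on this page.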
Status: final response (author comments only)
- RC1: 'Comment on essd-2024-77', Anonymous Referee #1, 17 May 2024
Please see attached document with the review of the manuscript.
- AC1: 'Reply on RC1', Jiahao Shi, 26 Jun 2024
- RC2: 'Comment on essd-2024-77', Anonymous Referee #2, 05 Jun 2024
Shi et al.: A flux tower site attribute dataset intended for land surface modeling
This paper describes a dataset based on flux-tower measurements obtained from network databases, which underwent additional quality control and were combined with ancillary data characterizing the sites. The dataset was created to make flux-tower measurements including site characteristics available to the land surface modelling community, enabling site-level simulations with site-specific soil and vegetation information, where available. The additional quality control reduced the number of available sites and resulted in discontinuous time series at least at some sites. The paper shows that land surface model (LSM) simulations with soil and vegetation characteristics obtained from global gridded datasets instead of site-specific data can lead to large differences in simulated pools and fluxes.
I believe that a dataset including both flux-tower observations and the site attributes required to run and evaluate LSMs is of interest to the community and useful for model development. The paper is generally well organized and written. There are, however, several sentences that are not completely clear and should be rephrased. Generally, the paper should be checked and corrected for language issues. I have mentioned some, but not all, of these in the specific comments. I suggest that the comments below be addressed before publication.
General comments:
It should be made clearer what exactly the quality control entailed and whether all variables were removed from the dataset when one of the variables was gap-filled or had lower-quality data, or whether just that particular variable was removed. It is not completely clear to me whether both the atmospheric forcing variables and the flux measurements used to evaluate LSMs have discontinuous time series in the dataset. If the forcing variables are discontinuous, the authors should make it clearer how this is handled in LSMs and how the data are still useful for LSMs.
Regarding the soil attributes that were included for the sites, I’d be interested to know why the authors do not mention soil depth. I’m aware that soil depth measurements are generally not available for the sites, but it is an important variable in many LSMs. Even if it is obtained from global gridded datasets, it could still be useful to include in this dataset. Another variable that was not included is the measurement height of air temperature. As this is required in several LSMs and is not always the same as the measurement height of wind speed, I think it would be useful to include the air temperature measurement height as well or to explain why it was not included.
Some of the Tables and Figures could be improved by organizing sites in the same order for the different variables that are shown or to show the selected variables for all the sites. For example, Table 2 and Figure 7 could be made clearer.
Specific comments (by line numbers):
- 15: Be more specific about what you mean by “external disturbances”. Aren’t all disturbances external?
- 51: It should be “at some sites”.
- 55 ff.: For site-level simulations, it isn't always the case that gridded data products are used to obtain soil textures, etc., if site-specific information is available in the literature.
- 76: Why are LAI and canopy height included in the four most important attributes, even though they aren't required as inputs for many LSMs? Soil depth, however, is not mentioned, although it can strongly impact model outputs and is required by many LSMs as well.
- 85: What are the “7 site-related articles” and why do you mention the number? It doesn’t seem like you use site-specific publications for all the sites, so what is special about these 7?
- 90: Better than what?
- 93: What exactly do you mean by "LAI complements"? Are these site measurements gap-filled with MODIS LAI?
- 96: It should be “use” instead of “using”. Otherwise, the sentence is incomplete.
- 105 f.: Why don’t the soil attributes include soil depth? That is used in many LSMs as well and can have strong impacts on soil moisture and temperatures.
- 108: What do you mean by "revised by wind speed measurement height"? Also, why only wind speed? The measurement height of air temperature is required by many models as well and isn't always the same as the wind speed measurement height.
- 109: “breakdown to” should be “broken down into”
- Table 1: Why is the MODIS LAI dataset included in the table twice?
- 123: Did you exclude those years for both the fluxes and the meteorology? Why did you not just remove the low-quality fluxes and keep the meteorology and high-quality flux data for those time periods? To evaluate the model simulations, you do not necessarily need all flux data. Only the meteorological forcings have to be complete, and they are not necessarily of low quality when some of the flux measurements are.
- 132: What do you mean by “impacted by a sizable body of water”? Was the site flooded, or did a lake or similar water body develop at the site?
- 132 f.: “we preserved non-consecutive years that met the criteria” - Does this apply to both the meteorology and the fluxes? As the meteorology is needed to force LSMs, using discontinuous years of meteorological data seems like it would not be very useful for LSMs and could cause crashes or strange behaviour in models if the meteorology suddenly shifts with jumps in time. The end of one year could be much colder/warmer or wetter/drier than the beginning of the next available year, which would likely cause the model state to be out of phase with the actual meteorological conditions. Why did you decide on this approach? Also, why not include high-quality gap-filled data, at least for the meteorological forcings? For the fluxes, which are only used to evaluate the models, it seems reasonable to keep only measured values, but that does not mean that the meteorology has to be discarded as well.
- 224: Why only the first year and not all available years at the sites? One year could be an unusual/extreme year and not representative of the usual site conditions.
- 225: Why only do such a short spin-up, if GPP is evaluated as well? Are the vegetation and soil C pools prescribed and not dynamic?
- 228 f.: What do you mean here? It seems like the sentence is incomplete. Is the next sentence supposed to be part of this sentence?
- Table 2: Why do you show the different attributes for different sites? Wouldn't it make more sense to select the same sites and the same order of sites in the table for all attributes? Then you would also only need the site column once, and it would be less confusing. Regarding soil texture: are the values averages over different depths or values for the top layer/near-surface?
- Figure 2: Don't you mean “number of years”, not “site numbers” in the caption for (b)? In (d), is this the actual number of sites or the percentage? The name Hcan is a little confusing, as you talk about sensible heat flux as H above and here H is height.
- 269: Didn’t you say that you excluded sites with only one year of data? How can the individual site observations range from 1 to 17 years then?
- Figure 3: I do not see the difference between site and default data for the PCT_PFT. Where is it? This also applies to l. 282. If you have multiple PFTs at the site, is the canopy height the maximum height, the average or an average weighted by the fractions of those PFTs present at the site? The same question also applies to the LAI.
- 285: This should be “at certain sites”.
- 292: Rephrase this sentence to make it clearer. Do you mean the file "provides", and what do you mean by "range of years for maximum LAI"?
- Table 3: Regarding the Reference height: What about the measurement height of air temperature? That is required by some models as well. It's unclear what you mean by “b Range of years with maximum LAI”. If there are multiple LAI measurements, isn't each measurement for a specific year? Otherwise, if it is the maximum LAI of a time series, you should make that clearer.
- 318: It is unclear to me what you mean by “were comparatively equilibrated”. Rephrase this to make it clearer.
- 319: “relatively significant” -> Do you mean it is “statistically significant”?
- Figure 7: Why do you show SWup and GPP at 2 sites only and don't show the LE and H there? Also, it doesn't seem to show observations for SWup at US-KS2. Why show that variable at that site, if observations were not available? Why were these specific 8 sites chosen for the figure (and not all 36 sites) and why don’t you show LE, H, GPP and SWup at all the selected sites?
- 355: I think it would be good to be more specific what exactly you mean here, as for example different land surface modelling groups pay attention to the site-specific data required to set up sites and many measurement groups collect at least some of the data, but it's not always easily accessible. I think it would be important to point out the need for more site attribute data to be included in flux datasets, etc.
- 369: Why was the model run at only 36 of the sites and how were these sites selected?
- 375 f.: Couldn't this also be related to other uncertainties such as in soil textures, soil moisture, thermal and hydraulic conductivities, LAI and GPP affecting canopy evaporation and transpiration? Why focus on the IGBP classification?
- 376 f.: What do you mean by “unit LAI variations”?
- 378: Why did you choose sites with LAI > 2 m2/m2, if the impact is larger at sites with lower LAI? As I’m not sure what you mean by “unit LAI variations”, I might be misunderstanding this though.
- 393 f.: Which site attributes did they modify and to what extent? What kind of site were they looking at? Also, this might be model specific how sensitive the model is to certain variables. Instead of “a previous study viewed”, do you mean “showed”?
- 397: “Mostly during the growing season” -> This depends. For example albedo differences due to PFT selection can have significant impacts when snow is present (depending on whether snow covers the vegetation or not, etc.).
- 402: How exactly are these low-cost? That seems to depend on whether the measurements are already done at a site or not. Especially, measurements that have to be done manually instead of automated can be labor-intensive and thus not inexpensive.
- 405: Why do you make the statement that an increasing array of surface parameters elevates the model to a heightened level of sophistication? New processes and more complexity do not necessarily improve results and can increase uncertainty, as many parameter values are not well defined or constrained.
Citation: https://doi.org/10.5194/essd-2024-77-RC2
- AC2: 'Reply on RC2', Jiahao Shi, 10 Aug 2024
- RC3: 'Comment on essd-2024-77', Anna Ukkola, 13 Jun 2024
Shi et al. improve an existing flux tower dataset developed for land modelling. These efforts are very valuable for the land modelling community and as such it was a pleasure to read this paper. I fully agree with the authors on the need to provide improved ancillary data for modelling and commend the authors’ efforts in collating data on key variables which is not a simple task. This is a valuable contribution to the field but I do have a few comments I would ask the authors to consider:
- I feel this paper is somewhat of a missed opportunity in applying the quality control process to PLUMBER2 rather than taking the PLUMBER2 framework and applying it to newer releases of flux tower data. The datasets used in PLUMBER2 are now quite out-of-date and it would have been fantastic to see an update that incorporates newer data and possibly additional sites (e.g. from ICOS and data from individual networks).
- Some of the details around data processing are not described in sufficient detail. I mention a number of specific examples below.
- I would be cautious about providing only one LAI product. In the PLUMBER2 paper we found very large differences in LAI from the MODIS and Copernicus products at some sites. A comparison to max LAI is provided in the paper but the time evolution of MODIS and Copernicus can also be very different. I would strongly encourage the authors to also consider alternative LAI products. It was also unclear how the MODIS data were processed.
Specific comments:
L13: Can you mention an example here of what you mean by “site-observed attribute data”?
L18: Please check grammar here, wording unclear
L41: would be good if you could mention some examples of “poor quality data” and “deficiency of attribute data” to make this a bit more concrete
L53: The reason for not screening flux data for gapfilling was that the requirements around this can be very study-specific. Some research questions might need high quality multi-year records whereas others might concentrate on individual events. This is just a comment but screening for flux gapfilling is challenging when creating a dataset for general use.
L55: Yes, this is often the case, but there are also many studies relying on site-specific information where this is available. Often this is done out of necessity, as you point out on L67.
L75: No argument that these are important but can we really state that they are the most important attributes?
L87: It is not clear how the Köppen-Geiger classification helps with LSM modelling?
L90: MODIS v6.1 might be better but uncertainties in remotely sensed LAI can be huge. PLUMBER2 provides two independent LAI for this reason as at some sites LAI estimates are vastly different. I’m not sure providing an estimate from a single dataset is helpful or superior. If anything, it would have been valuable to include more LAI datasets to account for uncertainty and constrain these with site observations where available
L90: Can the authors demonstrate that the data is smoother and more consistent? It is also not documented how this data was processed. Taking the raw values without additional quality control is rarely sufficient
L93: please check grammar
L95: Were these PFT estimates cross-checked against site information e.g. from past papers? Global PFT datasets can be highly uncertain at flux towers even if provided at a high spatial resolution
L99: It is a little unclear how it is helpful providing soil type estimates using a dataset already applied in LSMs. As the authors state at the start, the use of such global datasets in flux site simulations risks discrepancies in model-obs comparisons
L104: noting that elevation was provided in PLUMBER2
L109: breakdown -> broken down
Figure 1: “Data complement” is not entirely clear, do you mean supplementing site obs with global datasets?
L122: Would be good to justify why (1) was done? On L133 you say non-consecutive years were kept to maximise utility of obs data, this somewhat contradicts that principle?
L123: As mentioned earlier, is this desirable, restricting how the data can be used for individual applications?
L126: ideally the VPD screening should be done in conjunction with temperature screening as both were used to convert VPD to specific humidity. A very good point that PLUMBER2 only used temperature but not VPD gapfilling information when screening specific humidity data
L127: These sites still provide non-corrected latent heat. It is not clear whether the EBC-corrected data is “better” (see https://egusphere.copernicus.org/preprints/2024/egusphere-2023-3084/)
L140: I don’t quite follow this?
L143: please check grammar
L148: is this really true?
L150-156: All of this needs further details, I don’t follow how these steps were done
L171: It would have been valuable to use these site-observed values to constrain remotely-sensed (MODIS) LAI, was this step done? It would provide a useful guide as to how reliable the MODIS estimates are
L195: elevation was provided for each site in PLUMBER2 so how is this an advance?
L203: This was also provided in PLUMBER2, would be interesting to know if the authors identified different heights to what was reported?
L225: these are not really climate variables?
L227: Runoff is not available at flux sites so I don’t follow how it was used?
L229: grammar
L257: I don’t see superscript “e” in the table?
Figure 2: The IGBP type was provided in PLUMBER2, would be interesting to know how different the PFTs provided here are? Much of the PFT information in PLUMBER2 came from site-specific data provided on Fluxnet and regional network websites
L285: This is why it would have been useful to include alternative LAI products and select the most suitable one at each site (PLUMBER2 attempted this but this could no doubt be improved). Only relying on one dataset is arguably not an improvement given the discrepancies
L292: “provides” might be better
L355: Very much agree with this statement. Would be great for this paper to call for the provision of these data in flux data releases (such as the successors of FLUXNET2015)
L356: This is with the caveat that not all sites had site-observed values for the attributes provided?
L362: grammar
L364: please provide examples here of what you mean
L372-379: this section could be clearer
L392: what do you mean by “the full realization of differences in soil infiltration capacity”?
L399: I don’t follow this? “Nevertheless, the data sources were published works, leading to deficiencies for certain sites”
L407: “facilitating perception of the authentic feedback with diverse schemes and processes.” What does this mean?
L441: This would be a great place to call for attribute data to be routinely released as part of flux tower data collections so ancillary data could be accessed more easily and routinely
Best wishes,
Anna Ukkola
Citation: https://doi.org/10.5194/essd-2024-77-RC3
- AC3: 'Reply on RC3', Jiahao Shi, 10 Aug 2024
- RC4: 'Comment on essd-2024-77', Lingcheng Li, 09 Jul 2024
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2024-77/essd-2024-77-RC4-supplement.pdf
- AC4: 'Reply on RC4', Jiahao Shi, 10 Aug 2024
Data sets
A flux tower site attribute dataset intended for land surface modeling Jiahao Shi et al. https://doi.org/10.5281/zenodo.10939725