A flux tower dataset tailored for land model evaluation

Eddy covariance flux towers measure the exchange of water, energy and carbon between the land and atmosphere. They have become invaluable for theory development and for evaluating land models. However, flux tower data as measured (even after site post-processing) are not directly suitable for land surface modelling due to data gaps in model forcing variables, inappropriate gap-filling, inconsistent formatting and varying data quality. Here we present a quality-control and data-formatting pipeline for tower data from the FLUXNET2015, La Thuile and OzFlux syntheses and the resultant 170-site globally distributed flux tower dataset specifically designed for use in land modelling. The dataset underpins the second phase of the PLUMBER land surface model benchmarking evaluation project, an international model intercomparison project encompassing >20 land surface and biosphere models. The dataset is provided in the Assistance for Land-surface Modelling Activities (ALMA) NetCDF format and is CF-NetCDF compliant. For forcing land surface models, the dataset provides fully gap-filled meteorological data that has had periods of low data quality removed. Additional constraints required for land models, such as reference measurement heights, vegetation types and satellite-based monthly leaf area index estimates, are also included. For model evaluation, the dataset provides estimates of key water, carbon and energy variables, with the latent and sensible heat fluxes additionally corrected for energy balance closure. The dataset provides a total of 1040 site years covering the period 1992-2018, with individual sites spanning from 1 to 21 years. The dataset is available at http://dx.doi.org/10.25914/5fdb0902607e1 (Ukkola et al., 2021).


The global network of flux towers now encompasses >900 sites (https://fluxnet.org/), with the longest records spanning over three decades. With their increasing spatial and temporal coverage, flux towers have become an invaluable data source for evaluating process representation in land surface models (LSMs). LSMs within climate models are key tools for projecting future climates and also operate within operational weather and seasonal prediction models (Pitman, 2003; Dirmeyer et al., 2019). Their key role is to simulate the terrestrial carbon, water and energy cycles both in coupled climate models and uncoupled stand-alone applications. Flux towers provide simultaneous observations of the meteorological data needed to force offline LSMs as well as estimates of key ecosystem water, energy and carbon fluxes at a spatial scale against which LSMs can be evaluated. Flux towers are also one of the few data sources to provide measurements at time scales appropriate for diagnosing model process representations, providing high-frequency sub-daily (typically 30 min) observations.
Several global multi-site collections such as FLUXNET2015 (Pastorello et al., 2020) have been released that provide valuable opportunities for evaluating LSMs across multiple climates and biomes. Whilst these collections overcome many limitations of raw flux tower data, the data are not provided in a format directly usable in land surface modelling. The datasets require varying levels of gap-filling, unit conversions and data formatting to be applicable for modelling exercises, and are missing key metadata, such as measurement height and vegetation characteristics. Most importantly, not all flux tower data releases provide temporally continuous meteorological observations which are essential for forcing LSMs. FLUXNET2015 overcomes this key limitation by providing fully gap-filled meteorological observations but includes long periods of gap-filling at some sites, resulting in missing diurnal and/or seasonal cycles. Extended periods of synthesised meteorological variables are problematic in model applications, not only because they bias model estimates at concurrent time steps, but also because they bias future model predictions due to model state memory, such as soil moisture. As such, the data quality requirements for land modelling present a challenge that is not yet met by standard flux tower data releases.

Here we present a collection of 170 globally-distributed flux tower sites collated from three data releases (FLUXNET2015, La Thuile and OzFlux) that results from applying quality control and ancillary data collation focused on land surface modelling. By combining multiple data sources, we were able to maximise the number of available sites to enable model evaluation against a wider range of climate and vegetation conditions. The dataset covers the period 1992-2018 (although the majority of site records end in 2014) with individual sites spanning from 1 to 21 years, with a total of 1040 site years. The dataset provides quality-controlled, fully gap-filled meteorological variables for forcing LSMs, together with a comprehensive set of flux variables for model evaluation. The data are provided in the Assistance for Land-surface Modelling Activities (ALMA; https://www.lmd.jussieu.fr/~polcher/ALMA/) format, the international standard in land surface modelling, and are Climate and Forecast (CF) NetCDF (https://cfconventions.org/) compliant. The dataset additionally provides various metadata for the sites, including reference / measurement height (for emulating the lowest layer of the atmospheric model to which the LSM would be coupled), vegetation type (to ensure plant physiological traits are appropriate) and two different satellite-derived estimates of each site's monthly leaf area index (LAI). The dataset underpins the second phase of the Protocol for the Analysis of Land Surface Models (PALS) Land Surface Model Benchmarking Evaluation Project (PLUMBER; Best et al., 2015), which has participants from >20 land surface and biosphere modelling groups internationally. Whilst primarily designed for modelling purposes, the dataset would also be valuable for other applications requiring quality-controlled meteorological data at multiple sites. In the following sections we describe the processing steps to derive the dataset.

Datasets
We collated data for 223 flux towers from three flux tower data collections. We first obtained all available Australian sites from the OzFlux network (Isaac et al., 2017). We then obtained all Tier 1 (open data policy) globally distributed sites from FLUXNET2015 (November 2016 release; Pastorello et al., 2020), excluding sites available in OzFlux. For all FLUXNET2015 sites, data from the "FULLSET" release were used. Finally, additional sites that were not present in OzFlux or FLUXNET2015 were taken from the La Thuile Free Fair-Use release (https://fluxnet.org/data/la-thuile-dataset/). The collated dataset consisted of 29 sites from OzFlux, 132 from FLUXNET2015 and 62 from La Thuile. These sites were further screened to derive the final subset of 170 sites using the protocols detailed below.

We undertook multiple processing steps to derive the final, quality-controlled dataset. The data were first preprocessed with the FluxnetLSM R package (Ukkola et al., 2017) to convert the files to ALMA-formatted NetCDF files with consistent units and variable conventions. The data were subsequently screened using expert judgement to only retain periods of good quality meteorological data. Additional corrections were then made to the meteorological data to remove outliers and non-physical values and to gap-fill any remaining missing values. The flux variables were not screened but additional latent and sensible heat flux estimates were calculated to correct for energy balance closure. Finally, we derived two independent leaf area index time series for each site from remotely sensed data to account for uncertainties in satellite-derived LAI. A flowchart of the processing pipeline is shown in Figure 1.

Initial processing with FluxnetLSM
The three datasets come in various formats, with different units and variable naming conventions. We used the FluxnetLSM R package (Ukkola et al., 2017), which has been designed to translate flux tower data for use in land surface modelling. The package was used to process the data into ALMA-formatted CF-compliant NetCDF files with consistent variable names and units to be readily usable in land surface modelling (see Table 1 for ALMA conventions and variables included in the final dataset). In addition, FluxnetLSM was used to further gap-fill meteorological and flux variables and to include additional site metadata, such as elevation, reference and vegetation canopy heights, and vegetation type (following the International Geosphere-Biosphere Programme (IGBP) classification) in the NetCDF files. While some of the information could be obtained from Fluxnet or regional networks, we supplemented site metadata available in FluxnetLSM by extracting information from publications and site principal investigators. These metadata were collected to inform modelling choices and are included in the final NetCDF files. FluxnetLSM is fully reproducible and provides a documented framework to replace ad hoc processing methods used in many previous flux tower collections for LSMs. The version of FluxnetLSM used for processing is documented in the NetCDF file metadata.
FluxnetLSM was run separately for each parent dataset. OzFlux was first pre-processed to remove incomplete years as land surface models require whole years of data for spinning up soil water and temperature states. To achieve this, the data were first gap-filled to complete days and incomplete years were then removed using the FluxnetLSM function "preprocess_OzFlux". This step was not required for FLUXNET2015 and La Thuile as they only report whole years. FluxnetLSM was subsequently used to process each dataset using the commands provided in Supplementary Section 1.
FluxnetLSM can be used to screen the data for missing and gap-filled time steps but this option was not used. Instead, the allowed level of missing and gap-filled data was set to 100% for all datasets and variables to allow subsequent manual visual data screening (section 2.2.2). However, the gap-filling methods for meteorological variables were set differently for each dataset. FLUXNET2015 provides continuous, downscaled ERA-Interim estimates for all meteorological variables; these were used to gap-fill all missing time steps in the meteorological variables (setting met_gapfill to "ERAinterim" in FluxnetLSM). For OzFlux and La Thuile, statistical gap-filling methods provided in FluxnetLSM were used (setting met_gapfill to "statistical"). For all variables except surface air pressure and incoming longwave radiation, short data gaps (up to 4 hours) were gap-filled using linear interpolation. Longer data gaps (up to 10 days for OzFlux and 365 days for La Thuile) were gap-filled using "copyfill", which takes the mean of the corresponding time steps during other years. Surface air pressure and incoming longwave radiation were synthesised using empirical methods. Air pressure was calculated from air temperature and elevation using a barometric formula (Ukkola et al., 2017). Longwave radiation was calculated from air temperature and relative humidity using the method of Abramowitz et al. (2012). The synthesised values were then used to gap-fill missing time steps.
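The short-gap rule described above can be illustrated with a minimal Python sketch. This is not the FluxnetLSM implementation (which is written in R); the function name `fill_short_gaps`, the use of NaN for missing values and the choice of 8 time steps (4 hours of 30-minute data) are assumptions made for illustration.

```python
import math


def fill_short_gaps(values, max_gap_steps=8):
    """Linearly interpolate runs of NaN no longer than max_gap_steps.

    Assumption: 30-minute data, so 8 steps stands in for the 4-hour limit.
    Gaps touching the start or end of the series, and longer gaps, stay NaN
    (those would be handled by a longer-gap method such as copyfill).
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if not math.isnan(filled[i]):
            i += 1
            continue
        start = i
        while i < n and math.isnan(filled[i]):
            i += 1
        end = i  # index of the first valid value after the gap, or n
        gap_len = end - start
        if start == 0 or end == n or gap_len > max_gap_steps:
            continue
        left, right = filled[start - 1], filled[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap_len + 1)
            filled[k] = left + frac * (right - left)
    return filled
```

For example, `fill_short_gaps([1.0, float("nan"), float("nan"), 4.0])` fills the two missing steps on the straight line between 1.0 and 4.0, whereas a run of nine missing steps is left untouched.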
Flux variables were gap-filled using statistical methods for all datasets. As per meteorological variables, short gaps of up to 4 hours were gap-filled using linear interpolation. Longer gaps (up to 30 days for OzFlux and FLUXNET2015, and 365 days for La Thuile) were gap-filled using a linear regression of each flux variable against incoming shortwave radiation, air temperature and humidity (relative humidity or vapour pressure deficit). This approach was demonstrated to outperform a range of LSMs in a broad range of metrics in out-of-sample tests (see Abramowitz, 2012; Best et al., 2015). In the absence of air temperature or humidity data, the linear regression was constructed against shortwave radiation only. A separate linear model was created for day- and night-time data.
Further details of all gap-filling methods can be found in Ukkola et al. (2017).

We screened the original dataset of 223 sites to only retain sites and time periods with good quality meteorological forcing data. This was done to ensure models were forced with data that were largely observed, to avoid biasing the model flux estimates. We used expert judgement to manually screen sites instead of an automated process to be able to compromise between data quality and time series length. During screening, we prioritised five key meteorological variables in site selection that have the largest influence on LSM simulations: incoming shortwave radiation (SWdown), precipitation (Precip), air temperature (Tair), air humidity (Qair) and wind speed (Wind). These variables were allowed to have approximately 10% or fewer gap-filled time steps in any given year. If no years fulfilling this criterion were available, the site was excluded. For sites with heavily gap-filled or missing periods in the middle of the time series, we chose the longest continuous period with good quality meteorological data.
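The screening itself was done manually with expert judgement, so the following Python sketch is only a rough automated approximation of the stated rule of thumb. The function names, the input layout (a mapping from year to per-variable gap-filled fractions) and the hard 10% threshold are assumptions for illustration, not part of the published pipeline.

```python
KEY_VARS = ("SWdown", "Precip", "Tair", "Qair", "Wind")


def good_years(gapfilled_fraction, threshold=0.10):
    """Return sorted years in which every key variable is at or below threshold.

    gapfilled_fraction: {year: {variable: fraction of gap-filled time steps}}
    (hypothetical layout; the 10% limit was applied only approximately by hand).
    """
    return sorted(
        year
        for year, fractions in gapfilled_fraction.items()
        if all(fractions[var] <= threshold for var in KEY_VARS)
    )


def longest_consecutive_run(years):
    """Return the longest run of consecutive years from a sorted list."""
    best, current = [], []
    for year in years:
        if current and year == current[-1] + 1:
            current.append(year)
        else:
            current = [year]
        if len(current) > len(best):
            best = list(current)
    return best
```

A site for which `good_years` returns an empty list would be excluded; otherwise `longest_consecutive_run` picks the longest continuous good-quality period.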
The remaining three meteorological variables (incoming longwave radiation (LWdown), atmospheric CO2 concentration (CO2_air) and air pressure (Psurf)) were allowed to be gap-filled or missing for a site to be selected but any missing or poor-quality data were later corrected as a post-processing step (section 3.2.3). Not all sites report these variables and as such, the less strict criteria were applied to retain as many sites as possible. The flux variables were not screened to allow model evaluation at multiple time scales and specific events. The specific criteria for excluding a site or time periods are provided for each site in Table S1. After site selection, the final dataset included 23 sites from OzFlux, 102 from FLUXNET2015 and 45 from La Thuile. Table S1 provides a list of the selected sites, including the criteria for time period selection. Table S2 lists excluded sites and the reason for omitting them.

Further corrections to meteorological data
After selecting the final sites, meteorological variables were further corrected for anomalous values, step changes and missing data. These corrections mainly applied to CO2_air and LWdown due to a larger proportion of gap-filled and missing data in these variables. Anomalous or non-physical values in other variables were also corrected at individual sites.
For atmospheric CO2 concentration, we screened the data for step changes, unrealistically high concentrations and missing data. Where CO2_air was not provided for a site, we used annual concentrations from the Mauna Loa atmospheric CO2 time series (https://www.esrl.noaa.gov/gmd/ccgg/trends/mlo.html) for the period covered by

For OzFlux sites, non-physical values in the dataset were corrected. These included negative Precip, SWdown, LWdown, Wind and Qair (vapour pressure deficit and/or relative/specific humidity) values, which were capped at zero. Similarly, relative humidity values above 100% were capped at 100%. At a further 11 sites, we also corrected large step changes in CO2_air, heavily gap-filled periods in LWdown (which led to unrealistic seasonal cycles) and anomalous values in Psurf and relative humidity. Table S3 summarises the corrections made to meteorological data at each site. All corrections done during this post-processing step are documented at https://github.com/aukkola/PLUMBER2/blob/master/functions/site_exceptions.R.
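The capping of non-physical values amounts to clamping each variable to a physically plausible range. The Python sketch below illustrates this under stated assumptions: the helper name `cap_non_physical` and the use of `None` for missing values are hypothetical, and the authoritative, site-specific corrections are those in the R script linked above.

```python
def cap_non_physical(values, lower=0.0, upper=None):
    """Clamp values to [lower, upper]; None marks a missing value and is kept.

    Illustrative only: lower=0 mirrors capping negative Precip, SWdown,
    LWdown, Wind and Qair at zero; upper=100 mirrors the relative humidity cap.
    """
    capped = []
    for value in values:
        if value is None:
            capped.append(None)
            continue
        if value < lower:
            value = lower
        if upper is not None and value > upper:
            value = upper
        capped.append(value)
    return capped
```

Relative humidity in percent would be passed with `upper=100.0`, while the other variables only need the default lower bound of zero.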

Energy balance closure correction of latent and sensible heat fluxes
Latent (Qle) and sensible (Qh) heat fluxes were corrected for energy balance closure (EBC) using the Bowen ratio method (Mauder et al., 2020) to aid model evaluation. At flux tower sites, the sum of measured latent and sensible heat fluxes is commonly lower than available energy (Wohlfahrt et al., 2009), complicating comparison with models which conserve energy. The FLUXNET2015 dataset provides EBC-corrected Qle and Qh and as such, for sites derived from FLUXNET2015 these estimates were used (variables LE_CORR and H_CORR in FLUXNET2015). For La Thuile and OzFlux sites, Qle and Qh were EBC-corrected using a procedure adapted from FLUXNET2015. The EBC-corrected fluxes were obtained by multiplying Qle and Qh by an EBC correction factor (fEBC). fEBC was calculated for each time step separately as fEBC = (Rnet - G) / (Qh + Qle), where Rnet is net radiation and G is ground heat flux (all variables are in W m⁻²). Only time steps for which all four energy balance components were available were used. The fEBC time series was further filtered for data quality to only retain time steps for which observed G, and observed or good quality (qc value ≤1) Qh and Qle data were available. To remove outliers, fEBC values outside 1.5 times the interquartile range were then discarded.
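A minimal Python sketch of the per-time-step correction factor and the outlier filter follows. It is an illustration rather than the code used for the dataset: the function names and the use of `None` for missing values are assumptions, and "outside 1.5 times the interquartile range" is interpreted here as falling outside the usual fences Q1 - 1.5 IQR and Q3 + 1.5 IQR.

```python
import statistics


def ebc_factors(rnet, g, qh, qle):
    """fEBC = (Rnet - G) / (Qh + Qle) per time step; None where undefined.

    A factor is only computed when all four components are present and the
    turbulent flux sum is non-zero (the zero check is an added safeguard).
    """
    factors = []
    for rn, gh, h, le in zip(rnet, g, qh, qle):
        if None in (rn, gh, h, le) or (h + le) == 0:
            factors.append(None)
        else:
            factors.append((rn - gh) / (h + le))
    return factors


def drop_outliers(factors, k=1.5):
    """Set factors outside [Q1 - k*IQR, Q3 + k*IQR] to None (assumed fences)."""
    valid = [f for f in factors if f is not None]
    if len(valid) < 2:
        return list(factors)
    q1, _, q3 = statistics.quantiles(valid, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [f if f is not None and low <= f <= high else None for f in factors]
```

The quality-flag filter on G, Qh and Qle described above would be applied before calling `ebc_factors`, for instance by setting the affected inputs to `None`.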

The fluxes were then corrected using a two-step method. First, for each time step, a moving window of ±15 days was used to select fEBC for all time steps within the hours 22:00-02:30 and 10:00-14:30. Other times were discarded to avoid periods of large changes in ecosystem heat storage during sunrise and sunset periods which can bias the energy balance closure estimates (Pastorello et al., 2020). If at least five fEBC values were available within the moving window, the median of these values was used to correct Qle and Qh. Otherwise, the same moving window of ±15 days and hours of the day was applied to the same time step using the current, previous and next year (if available). The median of all available fEBC was then used to correct Qle and Qh. If no available fEBC values were found using this method, the fluxes for that time step were not corrected.
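The two-step selection of the correction factor can be sketched in Python as below. The record layout `(year, day_of_year, decimal_hour, fEBC)`, the function names and the simplification that windows do not wrap across year boundaries are all assumptions for illustration; the sketch is not the processing code used for the dataset.

```python
import statistics


def in_selected_hours(hour):
    """True for 22:00-02:30 and 10:00-14:30, with hour given as a decimal."""
    return hour >= 22.0 or hour <= 2.5 or 10.0 <= hour <= 14.5


def window_factor(records, year, doy, half_width=15, min_count=5):
    """Median fEBC within +/-15 days, falling back to adjacent years.

    records: iterable of (year, day_of_year, decimal_hour, fEBC or None).
    Simplification: day-of-year windows are not wrapped across 31 December.
    """
    def collect(years):
        return [
            f for (y, d, h, f) in records
            if f is not None and y in years
            and abs(d - doy) <= half_width and in_selected_hours(h)
        ]

    values = collect({year})
    if len(values) < min_count:
        # Second step: same window, using the previous, current and next year.
        values = collect({year - 1, year, year + 1})
    return statistics.median(values) if values else None


def correct_flux(flux, factor):
    """Scale a flux by fEBC; the corrected value stays missing without a factor."""
    return None if factor is None or flux is None else flux * factor
```

Both Qle and Qh at a given time step would be passed through `correct_flux` with the same factor, which preserves the measured Bowen ratio.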

We obtained two independent remotely sensed leaf area index (LAI) time series for each site to account for large uncertainties in satellite-derived LAI estimates (Zhu et al., 2016). The LAI time series can be used to force LSMs that do not include a predictive carbon cycle and require prescribed LAI as an input. The standardised LAI time series are also useful for reducing the degrees of freedom in evaluation studies by allowing the models to be driven by the same LAI estimates, and allow the minimisation of LAI-driven model errors at sites where observed and modelled LAI converge strongly. The LAI data were derived from Moderate Resolution Imaging Spectroradiometer (MODIS) and Copernicus Global Land Service products as these products provide long-term records at high (≤ 1 km) spatial resolution.

MODIS LAI
We used the MODIS product MCD15A2H, which is derived from a combination of the Terra and Aqua sensors at 500 m spatial resolution and 8-daily temporal resolution, starting in January 2000. The LAI data and associated standard deviation and QC flags were obtained using the R package MODISTools (Koen, 2020). The pixel containing the site and its surrounding pixels (in total nine pixels) were obtained for each site. Only good quality data (QC flag values 0, 2, 24, 26, 32, 34, 56 and 58) were kept and all other values were set to missing. At each time step, a weighted mean was then calculated from the nine pixels, weighting each pixel by the inverse of its error variance (1/σ², where σ is the pixel's standard deviation). The resulting 8-daily time series were then gap-filled using a cubic spline function (Forsythe et al., 1977) and any negative LAI values set to zero. To remove unrealistic short-term variability in LAI, e.g. due to cloud artefacts, that remained after the initial quality control, several steps were taken to further smooth the time series. The gap-filled time series was first smoothed using a cubic smoothing spline. A climatology (46 time steps) was then calculated from all available years. An anomaly time series was then created by removing the climatology and smoothed by taking a rolling mean over a window of ±6 time steps to further remove short-term variability. The climatology was then added to the smoothed anomaly time series and the 8-daily time series interpolated to the time resolution of the flux tower data, using the climatological values prior to MODIS commencing in January 2000.
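Two of the steps above, the inverse-variance weighted mean over the nine pixels and the climatology-plus-smoothed-anomaly reconstruction, can be sketched in Python as follows. The function names, the use of `None` for missing pixels and the truncated rolling window at the ends of the series are assumptions for illustration; the spline gap-filling and smoothing steps are omitted.

```python
def weighted_lai(lai, sd):
    """Inverse-variance (1/sd**2) weighted mean over pixels; None if no valid pixel."""
    pairs = [
        (value, sigma) for value, sigma in zip(lai, sd)
        if value is not None and sigma is not None and sigma > 0
    ]
    if not pairs:
        return None
    weights = [1.0 / sigma ** 2 for _, sigma in pairs]
    return sum(w * value for w, (value, _) in zip(weights, pairs)) / sum(weights)


def smooth_with_climatology(series, period, half_window=6):
    """Smooth anomalies about a mean seasonal cycle and add the cycle back.

    period is the number of time steps per year (46 for 8-daily data).
    Assumption: the rolling mean window is truncated at the series ends.
    """
    if len(series) < period:
        raise ValueError("series must cover at least one full period")
    n = len(series)
    clim = [sum(series[p::period]) / len(series[p::period]) for p in range(period)]
    anomaly = [series[i] - clim[i % period] for i in range(n)]
    smoothed = []
    for i in range(n):
        window = anomaly[max(0, i - half_window): i + half_window + 1]
        smoothed.append(sum(window) / len(window))
    return [clim[i % period] + smoothed[i] for i in range(n)]
```

A perfectly periodic series has zero anomalies and is returned unchanged, so the smoothing only damps departures from the mean seasonal cycle.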

We used the Copernicus Global Land Service LAI v2.0.2, which provides LAI estimates at 1 km spatial resolution and 10-daily temporal resolution for the period 1999-2017. The estimates have been derived from SPOT-VGT and PROBA-V sensors (Smets et al., 2019). The 10-daily data were first aggregated to monthly values by taking the maximum of the three 10-daily values for each month, following the maximum composite procedure, to remove low values e.g. due to cloud contamination. The data were then smoothed spatially by averaging each pixel with its surrounding pixels (with each pixel representing the mean of nine pixels). The monthly values were then extracted for each site using the pixel containing the site. If the value for the pixel containing the site was missing, the value from the nearest non-missing pixel was used. To remove non-physical short-term variability, the monthly site time series was then smoothed using a cubic smoothing spline. A monthly climatology was then calculated and an anomaly time series calculated by removing the climatology from the monthly LAI time series.

The anomaly time series was smoothed by taking a rolling mean over a window of ±6 time steps to further remove short-term variability, before adding the climatology to the smoothed anomalies. Finally, the resulting monthly time series was interpolated to the time resolution of the flux tower data, using the climatology for time periods not covered by the product.
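The monthly maximum composite and the nine-pixel spatial mean used for the Copernicus product can be sketched in Python as below. The function names, the flat list of 10-daily values (three per month), the nested-list grid and the handling of missing values and grid edges are assumptions for illustration only.

```python
def monthly_max_composite(dekads):
    """Collapse consecutive triplets of 10-daily values to their maximum.

    None marks a missing value; a month with no valid value stays None.
    Trailing values that do not form a full month are ignored.
    """
    months = []
    for i in range(0, len(dekads) - len(dekads) % 3, 3):
        valid = [v for v in dekads[i:i + 3] if v is not None]
        months.append(max(valid) if valid else None)
    return months


def spatial_mean_3x3(grid, row, col):
    """Mean of the valid values in the 3x3 neighbourhood of (row, col).

    Assumption: neighbours outside the grid and missing values are skipped.
    """
    values = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] is not None:
                values.append(grid[r][c])
    return sum(values) / len(values) if values else None
```

Taking the maximum rather than the mean within each month favours cloud-free observations, since cloud contamination tends to bias retrieved LAI low.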

LAI selection for sites
Both Copernicus and MODIS LAI are provided for each site but we selected one as the preferred LAI time series for each site, to be used as the default with LSMs that rely on prescribed LAI. Overall, we selected MODIS as the default time series due to its higher spatial resolution. Where MODIS was deemed unrealistic for the site due to its magnitude, seasonal cycle or non-physical short-term variations (judged against site data where available), Copernicus was selected instead. Table S1 summarises the selected LAI time series for each site. The preferred LAI variable is called "LAI" in the final NetCDF files and the alternative time series "LAI_alternative".

Global distribution of selected sites
The final dataset includes 170 globally-distributed sites shown in Figure 3a. The majority of the sites are located

The sites cover a wide range of biomes, ranging from grasslands and savannas to forest ecosystems (Figure 3c).

Impact of screening meteorological variables
For the selected sites, the original time series was reduced at multiple sites to exclude periods of poor-quality meteorological data. The number of years excluded at each site is shown in Figure 4. Regionally, the average number of years excluded was similar over North America (mean: 1.8, median: 1) and Europe (1.9, 1) whereas fewer years were removed over Australia (0.7, 0) (see sub-panels in Figure 3a for region definitions). The number of excluded years was also similar across the FLUXNET2015 (mean: 2.0, median: 1) and La Thuile (1.3, 1) datasets but lower for OzFlux (as per Australia). Overall, there were no systematic spatial variations in the number of years excluded.

For the selected sites, our data screening reduced the mean record length by 1.7 years (median: 1), ranging from 0 to 12 years for individual sites (Figure 4b). A total of 283 site years were removed. The majority of sites (139 out of 170) had 0-2 years removed, while only 11 sites had >5 years removed. The screening also reduced the proportion of gap-filled meteorological data from 21% to 15% on average for all meteorological variables. For the key variables, the level of gap-filled data was reduced from 10.4% to 3.6% for Tair, from 16% to 5% for Precip, from 10.1% to 7.6% for SWdown, from 7.8% to 2.6% for Qair and from 15.9% to 8.3% for Wind on average across all sites. Less strict criteria were applied to LWdown and CO2_air leading to a larger proportion of gap-filled data in the final dataset. For LWdown, the proportion of gap-filled data was reduced from 48.8% to 41.5%. For CO2_air, the level of gap-filling remained similar (31.5% in screened data and 30.4% in the original data). This was due to the additional gap-filling done at multiple sites to correct for step changes and a large proportion of missing data (7.2%) in the original dataset that was replaced with gap-filled values. The screening and post-processing of all meteorological variables also ensured that no missing values are present in the meteorological variables.

Impact of energy balance closure correction on latent and sensible heat fluxes
Flux tower observations do not commonly close the energy balance, with the sum of latent and sensible heat fluxes underestimated relative to available energy (Leuning et al., 2012; Wilson et al., 2002). This problem is particularly common in sites with heterogeneous land cover (Stoy et al., 2013) but is also driven by other factors such as unaccounted energy storage and mesoscale circulation impacts (Panin and Bernhofer, 2008; Leuning et al., 2012).
As LSMs balance all energy fluxes, latent and sensible heat fluxes were corrected for energy balance closure to aid model evaluation. In total, corrected fluxes are available for 143 sites which reported all required variables to perform the correction (Rnet, G, Qle and Qh). FLUXNET2015 already provided EBC-corrected Qle and Qh estimates for 82 sites and we additionally corrected 38 La Thuile sites and 23 OzFlux sites.
At the corrected sites, the instantaneous EBC (i.e. the ratio (Qle + Qh) / (Rnet - G)) was 0.55 on average considering all available data points (note additional filtering was applied during correction). The EBC correction on average increased Qle and Qh by 25% relative to the original estimates. At individual sites, the change in Qle and Qh relative to uncorrected data ranged from 82% lower to 88% higher. However, for the majority of sites (123 out of 143) the correction increased Qle and Qh.

The corrected variables should provide a more robust basis for evaluating model biases but rely on the assumption that the measured Bowen ratio is correct. Another limitation of the corrected fluxes is a larger proportion of missing data as the corrected fluxes are only provided for time steps for which the correction could be performed using our method detailed in section 3.2.4. As such, 9.2% of the corrected Qle is missing across all site years compared to 1.3% in the original Qle estimates. Similarly for Qh, 9.2% of corrected fluxes are missing compared to 0.6% in the original data.

Data availability
The final dataset is available at http://dx.doi.org/10.25914/5fdb0902607e1 (Ukkola et al., 2021). The data can also be obtained through https://modelevaluation.org/, including diagnostic plots of key variables for each site.

We have presented a quality-controlled flux tower dataset for 170 sites for use in land surface modelling. Whilst the dataset was developed with land surface modelling in mind, it is also suitable for other applications requiring a large collection of sites with good quality meteorological data. In our site selection, we prioritised long continuous periods of high-quality meteorological observations to derive a consistent dataset across individual sites. In doing so, shorter good quality periods were discarded for some sites (e.g. Be-Bra in Figure 2); future work might revisit these choices to retain additional data periods. FluxnetLSM provides one possible reproducible tool for automated data screening to achieve this for the FLUXNET2015, La Thuile and OzFlux releases.
The meteorological data were screened and fully gap-filled using multiple criteria. This screening should allow model simulations to be produced that are less strongly biased by high levels of gap-filling and other data quality issues that affect the original data collections. We did not quality control the flux variables used for model evaluation. This was to enable model evaluation at multiple time scales, ranging from sub-daily to interannual.
This also allows models to be evaluated against individual weather and climate events, such as heatwaves and drought. The lack of screening leads to a much higher proportion of gap-filled data in the flux variables which should be taken into account when selecting sites for individual applications.

The dataset additionally provides two alternative LAI time series for each site. These can be used as inputs to those LSMs that require LAI as an input. Alternatively, they can be used to evaluate simulated LAI in those models that predict it or to verify whether model biases arise from predictive LAI feedbacks. However, it should be noted that the remotely sensed LAI estimates are uncertain at site scales, with large differences between Copernicus and MODIS LAI at many sites. This is both because of the methodological difficulties inherent in estimating LAI from satellites and because the satellite data may be drawn from a different footprint from the one that influences the fluxes measured at the site scale (De Kauwe et al., 2011). LAI is a key model property and has a strong influence on simulated fluxes. As such, more accurate LAI estimates would be highly valuable for constraining models. In particular, where site-level LAI is measured, the inclusion of these data in future flux tower collections would help to better constrain the large-scale remote sensing LAI estimates used to drive models or evaluate model-simulated LAI. Additionally, the inclusion of detailed site properties in future collections would strongly benefit model evaluation. This includes information on vegetation composition and crop cycles, disturbance events such as fire, soil properties and irrigation. Furthermore, models ideally require parameters such as reference height and canopy height to reduce model-observation mismatches arising from model inputs. Key metadata were collected from multiple sources for this data collection but the inclusion of site characteristics in future data releases would allow for more direct access to these metadata.
Finally, whilst our dataset includes a large number of globally-distributed flux tower sites, the flux tower network includes >900 sites in total. In constructing our dataset, we used the two most common global multi-site collections (FLUXNET2015 and La Thuile), supplemented by OzFlux. Whilst many flux tower sites are not freely available, regional networks such as AmeriFlux, AsiaFlux and the European Fluxes Database provide additional open policy sites that would be valuable in expanding our dataset. The current limitation in collating flux tower sites across multiple regional networks is the different formats and standards in which they provide data. To this end, active discussions are underway with Fluxnet and AmeriFlux organisers to incorporate the data processing and formatting detailed in this paper into their automated data processing streams, reducing duplication and lag time for the ecological and modelling community. Standardisation of these datasets into a common format would strongly benefit the wider community, modelling applications and theory development and would likely lead to a greater uptake of these data.