the Creative Commons Attribution 4.0 License.
The ABCflux database: Arctic–boreal CO2 flux observations and ancillary information aggregated to monthly time steps across terrestrial ecosystems
Anna-Maria Virkkala
Susan M. Natali
Brendan M. Rogers
Jennifer D. Watts
Kathleen Savage
Sara June Connon
Marguerite Mauritz
Edward A. G. Schuur
Darcy Peter
Christina Minions
Julia Nojeim
Roisin Commane
Craig A. Emmerton
Mathias Goeckede
Manuel Helbig
David Holl
Hiroki Iwata
Hideki Kobayashi
Pasi Kolari
Efrén López-Blanco
Maija E. Marushchak
Mikhail Mastepanov
Lutz Merbold
Frans-Jan W. Parmentier
Matthias Peichl
Torsten Sachs
Oliver Sonnentag
Masahito Ueyama
Carolina Voigt
Mika Aurela
Julia Boike
Gerardo Celis
Namyi Chae
Torben R. Christensen
M. Syndonia Bret-Harte
Sigrid Dengel
Han Dolman
Colin W. Edgar
Bo Elberling
Eugenie Euskirchen
Achim Grelle
Juha Hatakka
Elyn Humphreys
Järvi Järveoja
Ayumi Kotani
Lars Kutzbach
Tuomas Laurila
Annalea Lohila
Ivan Mammarella
Yojiro Matsuura
Gesa Meyer
Mats B. Nilsson
Steven F. Oberbauer
Sang-Jong Park
Roman Petrov
Anatoly S. Prokushkin
Christopher Schulze
Vincent L. St. Louis
Eeva-Stiina Tuittila
Juha-Pekka Tuovinen
William Quinton
Andrej Varlagin
Donatella Zona
Viacheslav I. Zyryanov
Download
- Final revised paper (published on 21 Jan 2022)
- Preprint (discussion started on 28 Jul 2021)
Interactive discussion
Status: closed
-
RC1: 'Comment on essd-2021-233', Anonymous Referee #1, 17 Sep 2021
Summary
The ABCflux dataset and companion manuscript provide, to my knowledge, the largest compilation of Arctic and boreal region carbon dioxide flux data, including net ecosystem exchange and component ecosystem fluxes. This compilation therefore represents an unprecedented resource for synthesis studies aiming to understand high latitude carbon cycling and its vulnerability to rapid high latitude global changes. The manuscript summarises the data acquisition process undertaken and provides useful visualizations of the dataset that inform the reader of the main characteristics of the carbon exchanges, broken down by measurement approach, as well as dataset spatial and temporal coverage and representativeness. Overall the manuscript is well written and the dataset is comprehensive, logically structured, and carefully compiled, without any obvious errors. However, I have several minor-moderate remarks below about the manuscript and the dataset that should be addressed.
The main comment I have for this dataset and manuscript relates to pre-processing decisions, specifically the gap-filling methods for the eddy covariance data and the aggregation of the data to monthly fluxes, and the potential effect of these two decisions on uncertainties. Text describing these decisions and their effects on uncertainties is treated summarily in the manuscript, or not at all, and should therefore be strengthened or added as required. If a clear justification of these decisions cannot be provided, I think the dataset could be revised to include both a raw data file and a monthly-aggregated data file (the current version) to allow for more customization/fidelity for data users.
Comments on the Manuscript:
249 - Could you justify more why the decision was made to aggregate to monthly timesteps as opposed to providing raw data and index columns that would allow users to do their own aggregations?
Table 1 - I am not sure what is meant by “Soil respiration (or NEE)...” in the Natali et al. (2019) entry. Was only one of Rsoil or NEE ever used? Or is it more accurate to use “and” as in the entries for the other studies?
2.1.2 Flux repositories - I am confused about the processing pipeline for these tower data. This paragraph should be restructured to follow the steps in a linear sequence as much as possible. Gap-filling is normally performed after USTAR filtering, but it comes first here. What is meant by “When only daily gap-filled data were supplied”? Aren’t the data half-hourly? I also do not understand what the second-to-last sentence (line 349-350) means.
2.1.3 solicitation - what type of data were these?
Line 373 - “needed to be filled”?
2.2 - There is no information on quality screening for chamber measurements. Can it be assumed that the published values are reliable?
Fig. 5 - The caption is not accurate. There should be a letter for each panel.
Table 4 - This table is a bit lacking. Separating the fluxes from the uncertainties makes it hard to read, and the component fluxes can be computed from each other, so the table is not terribly informative.
4.1 639-666: I think something that has not been addressed completely is the fact that to compute a daily or monthly aggregated flux from a few chamber measurements one has to not only aggregate, but also upscale significantly more in the temporal domain than with EC, which likely has more temporal coverage. A chamber measurement for one half hour may agree closely with an EC measurement for the same half hour, or perhaps for some period of that day, or perhaps even for that month. However, surely the uncertainty around the upscaled chamber flux must be much larger than that of the EC aggregation, which may have a large number of temporal replicates and is an integration over a larger area? I would like to see this issue expanded upon in this section.
4.2 - 698-730: Building on my earlier comment about the details of the post-processing of EC data (in particular gap-filling choices), I wonder why the filtered (but not gap-filled) flux columns were not provided alongside the gap-filled and partitioned columns. It seems this would enable quick comparisons for various topics of concern. For instance, how does gap-filling affect monthly aggregations? How much does gap-filling affect mechanistic conclusions of modeling exercises?
4.3 Representativeness - the discussion of geographic bias is useful, as is the comment about biome coverage. However, there is not much detailed consideration of the coverage with respect to the environmental covariates measured. A representativeness analysis like that of the following could be beneficial:
Hoffman, F. M., Kumar, J., Mills, R. T., & Hargrove, W. W. (2013). Representativeness-based sampling network design for the State of Alaska. Landscape Ecology, 28(8), 1567–1586. https://doi.org/10.1007/s10980-013-9902-0
Delwiche, K. B., Knox, S. H., Malhotra, A., Fluet-Chouinard, E., McNicol, G., Feron, S., et al. (2021). FLUXNET-CH4: a global, multi-ecosystem dataset and analysis of methane seasonality from freshwater wetlands. Earth System Science Data, 13(7), 3607–3689. https://doi.org/10.5194/essd-13-3607-2021
Comments on the Dataset:
The number of observations does not match the data description. I assume this is because the observation ‘unit’ referred to in the text is not a month/site combination, but rather the flux/month/site combination? I think it would be better to report the month/site combination and describe how many of those month/sites have each component flux of interest, especially since the unique ID (first column) refers to a month/site combination.
Can you explain more the lack of data citations for 30% of the observations?
I noticed that none of the `data_maturity` values is “preliminary” or “reprocessed”. Why are these options provided? For future database expansion?
Why is measurement month named `Interval`? That is not intuitive.
Could `Measurement_frequency` be changed to provide the exact number of observations aggregated for the month? Instead, it could be named `Measurement_count`. Additionally a column for `Gap_count` could be provided and grouped with `Gap_perc` column. I think this would provide more useful information and data for dataset manipulation.
`Gap_perc`: why is there only 17% coverage for this variable? Shouldn’t it at least be the same as the next variable (Tower_QA_QC.NEE.flag)? This also raises the question of how to interpret aggregations from sparse chamber measurements (i.e. are you effectively gap-filling?). I assume the real gap-filling is only done for EC, therefore, somewhere you should make a clearer distinction between your methods and assumptions between EC and chamber aggregations.
The `Tower_QA_QC.NEE.flag` variable is confusing. It seems to involve both the amount of gap-filled data and the quality of the gap-filling. Please provide a clearer explanation of how to interpret the value between 0 and 1.
`Method_error_NEE_gC_m2`: why does this variable only have 23% coverage, when the NEE aggregations are available for 91% of the dataset? More information needs to be provided about the circumstances under which it was deemed possible to compute an error, and why. It seems to me that it may be possible to estimate an error or an uncertainty for any aggregation (chamber or EC tower).
Finally, can the authors please justify why they did not include any standardized variables extracted from geospatial products to make the dataset more ready for use? Things like MAT, MAP, and elevation could easily be filled using the WorldClim and GeoMorpo products respectively.
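To illustrate the `Measurement_count` and `Gap_count` suggestion above, the following is a minimal Python sketch and not the authors' processing code. The function name, the assumption of half-hourly time steps, and the example values are all hypothetical; only the three column names come from the comments above.

```python
import calendar
from collections import Counter
from datetime import datetime, timedelta


def monthly_gap_summary(records, steps_per_day=48):
    """Summarise measured vs. missing time steps per calendar month.

    records: iterable of (datetime, flux) pairs; flux is None where no
    valid measurement exists. steps_per_day=48 assumes half-hourly data.
    """
    measured = Counter()
    months = set()
    for timestamp, flux in records:
        key = (timestamp.year, timestamp.month)
        months.add(key)
        if flux is not None:
            measured[key] += 1

    summary = {}
    for year, month in sorted(months):
        # Expected number of time steps if the month had no gaps.
        expected = calendar.monthrange(year, month)[1] * steps_per_day
        count = measured[(year, month)]
        gaps = expected - count
        summary[(year, month)] = {
            "Measurement_count": count,
            "Gap_count": gaps,
            "Gap_perc": 100.0 * gaps / expected,
        }
    return summary


# Hypothetical example: June 2020 with every fourth half-hour missing.
start = datetime(2020, 6, 1)
example = [
    (start + timedelta(minutes=30 * i), None if i % 4 == 0 else -0.1)
    for i in range(30 * 48)
]
june = monthly_gap_summary(example)[(2020, 6)]
```

Reporting the count and the gap count together would let a user recompute `Gap_perc` and apply their own coverage threshold. For chamber data the expected number of time steps is not well defined, so a different convention would be needed there.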
Citation: https://doi.org/10.5194/essd-2021-233-RC1 -
AC1: 'Reply on RC1', Anna-Maria Virkkala, 26 Nov 2021
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2021-233/essd-2021-233-AC1-supplement.pdf
-
RC2: 'Comment on essd-2021-233', Anonymous Referee #2, 02 Oct 2021
General Comments:
This manuscript describes a new database (ABC Fluxes) of CO2 flux measurements in arctic and boreal ecosystems. Overall the manuscript is well written and clear. However, the process of downloading data needs to be clarified – as I explain below, the big green “Download Data” button on the ORNL DAAC website did not give me a complete data file. One apparently needs to scroll down to the bottom to request the entire dataset, but this is not at all apparent at first pass and could lead to users missing data.
I also question the decision to exclude studies with limited measurements during the summer (while including limited measurements in off-seasons due to data scarcity during those time periods). While I agree that data sets with limited repetition are more uncertain than data with many repetitions, ideally ABC Fluxes users could make their own judgements about whether to include this data in their work. I recognize that going back to include this data is likely unfeasible at this point, so please instead provide an idea of how many studies were excluded. Then, perhaps future versions of ABC Fluxes could include these additional studies.
Specific Comments:
Line 208: “fluxes from plants and soils to the atmosphere.”
Line 216: Since you have included the measurement scale for eddy covariance and chamber measurements, please comment on the measurement scale for snow diffusion.
Line 250: At some point in paper, please quantify (at least approximately) how much data you lose by using monthly values and excluding papers that do not report data on a monthly basis.
Lines 304 – 309: the sub-plot labels b, c, d, and e are mixed up
Line 313-314: Here you only cite 5 prior synthesis efforts, but your Table has 7 other efforts. Is this an oversight, or is there a reason you did not look at two of the synthesis papers to identify potential data?
Line 323-328: I am curious to hear more about why you decided to exclude summer measurements if there weren’t many replicates. While fewer replicates will make the uncertainty higher, eliminating these data sets entirely could throw away potentially valuable information that ABCfluxes users may want. It seems to me that, in an ideal world, ALL data of sufficient quality (if not quantity) would be included, and then database users can decide whether or not to include these small studies in their results. I am not suggesting you re-do the database now to include all these disregarded summer points, I recognize that would likely be a huge time investment, but I am curious to know approximately how many datasets were discarded.
Line 364-368: Please clarify here that in FLUXNET2015, “_QC = 0” means measured values and “_QC = 1” means good quality gap-filled values. Otherwise, your phrase “0 = extensive gap-filling, 1=low gap-filling” could be interpreted to conflict with the FLUXNET2015 QC designations and may confuse people. You could write something like “indicating percentage of measured (quality flag QC = 0 in FLUXNET2015) and good-quality gap-filled data (quality flag QC = 1 in FLUXNET2015); average from daily data; 0=extensive gap-filling, 1=low gap-filling”.
Line 397: Ah! I now see that the dataset is supposed to have 6309 rows. The first time I downloaded this dataset I went to https://doi.org/10.3334/ORNLDAAC/1934, logged in, and clicked on the big green “Download data” button near the top of the page. However, the file I get from this only has 1408 rows. I now see that if I scroll down the webpage I can request a much larger file. Why does the “download data” button not provide the full dataset? Can this be changed? If not, you may want to warn the user about this in your text.
Table 3: please clarify that by “Number of observations” you mean # of months of data.
Line 515: Please clarify that the likely reason you have less data from 2015-present is because of a reporting lag, not because eddy covariance towers are measuring less data now.
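To make the reading of the quality flag discussed under Line 364-368 concrete, here is a minimal Python sketch of one possible interpretation: the daily value is the fraction of half-hourly records flagged 0 (measured) or 1 (good-quality gap-filled), and the monthly flag is the mean of the daily values. The function names and example flags are hypothetical, and this is an assumption about how such a flag could be derived, not a description of the ABCflux or FLUXNET2015 code.

```python
from collections import defaultdict
from datetime import date


def daily_qc_fraction(half_hourly_flags):
    """Fraction of one day's records with QC flag 0 or 1."""
    good = sum(1 for flag in half_hourly_flags if flag in (0, 1))
    return good / len(half_hourly_flags)


def monthly_qc_flag(daily_fractions):
    """Average daily fractions per month; input is (date, fraction) pairs."""
    by_month = defaultdict(list)
    for day, fraction in daily_fractions:
        by_month[(day.year, day.month)].append(fraction)
    return {month: sum(v) / len(v) for month, v in by_month.items()}


# Hypothetical example: one day with 12 lower-quality gap-filled records
# (flag 2) out of 48, and one day with measured data only.
day1 = [0] * 24 + [1] * 12 + [2] * 12
day2 = [0] * 48
flags = monthly_qc_flag([
    (date(2015, 7, 1), daily_qc_fraction(day1)),
    (date(2015, 7, 2), daily_qc_fraction(day2)),
])
```

Under this reading, a monthly value near 1 means that most of the underlying data were measured or well gap-filled, and a value near 0 means that most were filled with lower confidence.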
Citation: https://doi.org/10.5194/essd-2021-233-RC2 -
AC2: 'Reply on RC2', Anna-Maria Virkkala, 26 Nov 2021
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2021-233/essd-2021-233-AC2-supplement.pdf