The ABCflux database: Arctic–boreal CO₂ flux observations and ancillary information aggregated to monthly time steps across terrestrial ecosystems

Past efforts to synthesize and quantify the magnitude and change in carbon dioxide (CO₂) fluxes in terrestrial ecosystems across the rapidly warming Arctic–boreal zone (ABZ) have provided valuable information but were limited in their geographical and temporal coverage. Furthermore, these efforts have been based on data aggregated over varying time periods, often with only minimal site ancillary data, thus limiting their potential to be used in large-scale carbon budget assessments. To bridge these gaps, we developed a standardized monthly database of Arctic–boreal CO₂ fluxes (ABCflux) that aggregates in situ measurements of terrestrial net ecosystem CO₂ exchange and its derived partitioned component fluxes: gross primary productivity and


Introduction
The Arctic-boreal zone (ABZ), comprising the northern tundra and boreal biomes, stores approximately half the global soil organic carbon pool (Hugelius et al., 2014; Tarnocai et al., 2009; Mishra et al., 2021). As indicated by this large carbon reservoir, the ABZ has acted as a carbon sink over the past millennia due to the cold climate and slow decomposition rates (Siewert et al., 2015; Hugelius et al., 2020; Gorham, 1991). However, these carbon stocks are increasingly vulnerable to climate change, which is occurring rapidly across the ABZ (Box et al., 2019). As a result, carbon is being lost from this reservoir to the atmosphere as carbon dioxide (CO₂) through increased ecosystem respiration (Reco) (Schuur et al., 2015; Parker et al., 2015; Voigt et al., 2017). The impact of increased CO₂ emissions on global warming depends on the extent to which respiratory losses are offset by gross primary productivity (GPP), the vegetation uptake of atmospheric CO₂ via photosynthesis (McGuire et al., 2016; Cahoon et al., 2016).
Carbon dioxide flux measurements provide a means to monitor the net CO₂ balance (i.e., net ecosystem exchange; NEE, a balance between GPP and Reco) across time and space (Baldocchi, 2008; Pavelka et al., 2018). There are three main techniques used to measure fluxes at the ecosystem level that represent fluxes from plants and soils to the atmosphere: eddy covariance, automated and manual chambers, and snow diffusion methods (hereafter diffusion; for a comparison of the techniques, see Table 1 in McGuire et al., 2012). The eddy covariance technique estimates NEE at the ecosystem scale (ca. 0.01 to 1 km² footprint) at high temporal resolution (i.e., half-hourly) using nondestructive and automated measurements (Pastorello et al., 2020). Automated and manual chamber techniques measure NEE at fine spatial scales (< 1 m²) and in small-statured ecosystems, common in the tundra, where the chambers can fit over the whole plant community (Järveoja et al., 2018; López-Blanco et al., 2017). The diffusion technique, also operating at fine spatial scales, can be used to measure the transport of CO₂ within a snowpack (Björkman et al., 2010b). The eddy covariance technique has been used globally for over three decades and chamber and diffusion techniques for even longer.
Historically, the number and distribution of ABZ flux sites have been rather limited compared to observations in temperate regions (Baldocchi et al., 2018). Due to these data gaps, quantifying the net annual CO₂ balance across the ABZ has posed a significant challenge (Natali et al., 2019a; McGuire et al., 2016; Virkkala et al., 2021a). However, over the past decade, the availability of ABZ flux data has increased substantially. Many, but not all, of the ABZ eddy covariance sites are a part of broader networks, such as the global FLUXNET and the regional AmeriFlux, Integrated Carbon Observation System (ICOS), and European Fluxes Database Cluster (EuroFlux), where data are standardized and openly available (Paris et al., 2012; Novick et al., 2018; Pastorello et al., 2020). These networks primarily include flux and meteorological data but do not often include other environmental descriptions such as soil carbon stocks, dominant plant species, or the disturbance history of a given site (but see, for example, Biological, Ancillary, Disturbance, and Metadata data in AmeriFlux), which are important for understanding the controls on CO₂ fluxes. Moreover, even though some ABZ annual chamber measurements are included in the global soil respiration database (SRDB) (Jian et al., 2021) and in the continuous soil respiration database (COSORE) (Bond-Lamberty et al., 2020), standardized datasets providing ABZ CO₂ flux measurements from eddy covariance, chambers, and diffusion, along with comprehensive metadata, have been nonexistent. Such an effort would create potential for a more thorough understanding of ABZ CO₂ fluxes. Therefore, compiling these flux measurements and their supporting ancillary data into one database is clearly needed to support future modeling, remote sensing, and empirical data mining efforts.
Arctic-boreal CO₂ fluxes have been previously synthesized in a handful of regional studies (Belshe et al., 2013; McGuire et al., 2012; Luyssaert et al., 2007; Baldocchi et al., 2018; Virkkala et al., 2018, 2021a; Natali et al., 2019a) (Fig. 1 and Table 1). One of the main challenges in these previous efforts, in addition to the limited geographical coverage of ABZ sites and lack of environmental descriptions, has been the variability of the synthesized seasonal measurement periods. Most of these efforts have allowed the seasonal definitions and measurement periods to vary across the sites, creating uncertainty in the inter-site comparison of flux measurements. An alternative approach to define seasonality is to focus on standard time periods such as months (Natali et al., 2019a). Although focusing on monthly fluxes may result in a small decrease in synthesizable data, because publications, particularly older ones, often provide seasonal rather than monthly flux estimates (see, e.g., Euskirchen et al., 2012; Nykänen et al., 2003; Björkman et al., 2010a; Oechel et al., 2000; Merbold et al., 2009), compiling monthly fluxes has several advantages over the seasonal fluxes. These advantages include (i) better comparability of measurements, (ii) ability to bypass problems related to defining seasons across large regions, and (iii) ease of linking these fluxes to remote sensing and models. Our goal is to build upon past synthesis efforts and compile a new database of Arctic-boreal CO₂ fluxes (ABCflux version 1) that combines eddy covariance, chamber, and diffusion data at monthly timescales with supporting environmental information to help facilitate large-scale assessments of the ABZ carbon cycle.
A.-M. Virkkala et al.: The ABCflux database
Table 1. A summary of past CO₂ flux synthesis efforts. If site numbers were not provided in the paper, this was calculated as the number of unique sets of coordinates. Note: n/a, not applicable.
This paper provides a general description of the ABCflux database by characterizing the data sources and database structure (Sect. 2), as well as describing the characteristics of the database (Sect. 3). Additionally, we describe the main strengths, limitations, and opportunities of this database (Sect. 4) and its potential utility for future studies aiming to understand terrestrial ABZ CO₂ fluxes.
Figure 1. [...] Belshe et al., 2013; Natali et al., 2019a; Virkkala et al., 2021a; and this study, ABCflux). The Arctic-boreal zone is highlighted in dark grey; countries are shown in the background. Based on the unique latitude-longitude coordinate combinations in the tundra, there were 136 tundra sites in ABCflux, 104 tundra sites in Virkkala et al. (2021a), 68 tundra sites in Natali et al. (2019a), 34 tundra sites in Belshe et al. (2013), and 66 tundra sites in McGuire et al. (2012). Observations that were included in previous studies but not in ABCflux represent fluxes aggregated over seasonal, not monthly periods.

Data and methods
ABCflux focuses on the area covered by the northern tundra and boreal biomes (> 45° N), as characterized in Dinerstein et al. (2017), Fig. 2, and compiles in situ measured terrestrial ecosystem-level CO₂ fluxes aggregated to monthly time periods (unit: g C m⁻² per month). We chose this aggregation interval because a monthly temporal frequency is a common, straightforward, and standard interval used in many synthesis and modeling studies, remote sensing products, and process model outputs (Didan, 2015; Natali et al., 2019a; Hayes et al., 2014). Furthermore, scientific papers often report monthly fluxes, facilitating accurate extraction to ABCflux. We compiled only aggregated fluxes to allow easy usage of the database and to keep the database concise and cohesive. We designed this database so that these monthly fluxes, compiled from scientific papers or data repositories or contributed by site principal investigators (PIs), can be explored from as many sites as possible and across different months, regions, and ecosystems. The database is not designed for studies exploring flux variability within a month, or how different methodological decisions (e.g., flux filtering or partitioning approaches) influence the estimated fluxes. If a potential data user requires fluxes at higher temporal frequency or is interested in studying the uncertainties related to flux processing, we suggest they utilize data from other flux repositories (see Sect. 2.1.2) or contact PIs.
Although the three flux measurement techniques included in ABCflux primarily measure NEE, chamber and eddy covariance techniques can also be used to estimate GPP (the photosynthetic flux) and Reco (comprising emissions from autotrophic and heterotrophic respiration) (Keenan and Williams, 2018), which are also included in the database. At eddy covariance sites, GPP and Reco are indirectly derived from NEE using partitioning methods that primarily use light and temperature data (Lasslop et al., 2010; Reichstein et al., 2005). At chamber sites, Reco can be measured directly with dark chambers, from which GPP can be calculated by subtracting Reco from NEE (Shaver et al., 2007). In general, these partitioned GPP and Reco fluxes have higher uncertainties than the NEE measurements since they are modeled based on additional data and various assumptions (Aubinet et al., 2012). However, GPP and Reco fluxes were included in ABCflux because these component fluxes may help to better understand and quantify the underlying processes of land-atmosphere CO₂ exchange.
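The chamber-based partitioning described above can be sketched in a few lines. This is a minimal illustration, assuming the database sign convention (CO₂ uptake negative, CO₂ release positive); the function name and the numerical values are hypothetical and are not taken from ABCflux.

```python
def gpp_from_chamber(nee: float, reco: float) -> float:
    """Return GPP = NEE - Reco (g C m-2 per month), with uptake negative."""
    return nee - reco


# Hypothetical monthly chamber values, g C m-2 per month
nee = -30.0   # net exchange measured in the light: net uptake
reco = 45.0   # dark-chamber respiration: release to the atmosphere
gpp = gpp_from_chamber(nee, reco)
print(gpp)  # -75.0
```

Under this convention a physically sensible GPP is zero or negative, which offers a quick sanity check on sign errors in extracted data.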
In addition to CO₂ fluxes, we gathered information describing the general site conditions (e.g., site name, coordinates, vegetation type, disturbance history, a categorical soil moisture variable, and soil organic carbon stocks), micrometeorological and environmental measurements (e.g., air and soil temperatures, precipitation, soil moisture, snow depth), and flux measurement technique (e.g., measurement frequency, instrumentation, gap-filling and partitioning method, number of spatial replicates for chamber measurements, flux data quality), wherever possible.

Literature search
We identified potential CO₂ flux studies and sites from prior synthesis efforts (Belshe et al., 2013; McGuire et al., 2012, 2018, 2021b; Natali et al., 2019a), including a search of citations within and of the studies included in these prior syntheses. We also conducted a literature search in Web of Science with the following search words: "carbon flux" or "carbon dioxide flux" or "NEE" or "net ecosystem exchange" and "arctic" or "tundra" or "boreal". This was done to ensure that our database included the most recent publications. We included studies that reported at least NEE, presented at monthly or finer temporal resolution, and had supporting environmental ancillary data describing the sites. We did not include fluxes reported at longer time steps (e.g., seasonal aggregations), which, based on our rough estimate, resulted in a 10 %-20 % loss of data from sites and periods that would have been new to ABCflux. These excluded data primarily included some older, non-active eddy covariance sites and seasonal chamber measurements (e.g., Nobrega and Grogan, 2008; Heliasz et al., 2011; Fox et al., 2008). However, many of these data were located in the vicinity of existing sites covered by ABCflux (e.g., Daring Lake, Abisko); thus excluding these measurements does not dramatically influence the geographical coverage of the sites. We extracted our variables of interest (Sect. 2.3) from these selected papers during 2018-2020. Data from line and bar plots were extracted using Plot Digitizer (http://plotdigitizer.sourceforge.net/, last access: 16 October 2019) and converted to our flux units (g C m⁻² per month) if needed. Data from experimental treatments were excluded; however, we included flux data from unmanipulated control plots. Monthly non-growing season fluxes from Natali et al. (2019a) were extracted from the recently published data compilation (Natali et al., 2019b).
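As an illustration of the unit conversion step, the sketch below converts a monthly mean flux reported in µmol CO₂ m⁻² s⁻¹ to g C m⁻² per month. The choice of source unit is an assumption made here for illustration (the text does not list which units the extracted papers used), and the function name is hypothetical.

```python
import calendar

MOLAR_MASS_C = 12.011  # g C per mol of CO2-carbon


def umol_co2_m2_s_to_gc_m2_month(mean_flux: float, year: int, month: int) -> float:
    """Convert a monthly mean flux (umol CO2 m-2 s-1) to g C m-2 per month."""
    days = calendar.monthrange(year, month)[1]  # number of days in the month
    seconds = days * 86400
    return mean_flux * 1e-6 * MOLAR_MASS_C * seconds


# Hypothetical example: a mean efflux of 1 umol CO2 m-2 s-1 in June 2015
print(round(umol_co2_m2_s_to_gc_m2_month(1.0, 2015, 6), 2))  # 31.13
```

Because the conversion depends on the number of days in the month, the same mean flux gives a slightly different monthly total in February than in July.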
Winter chamber or diffusion measurements in forests from Natali et al. (2019b) were included in the "ground_nee" field, which represents forest understory (not whole-ecosystem) NEE.

Flux repositories
We downloaded eddy covariance and supporting environmental data products from AmeriFlux (Novick et al., 2018), Fluxnet2015 (Pastorello et al., 2020), the EuroFlux database cluster (ICOS, Carbon Extreme, Carbo Africa, GHG Europe, Carbo Italy, INGOS) (Paris et al., 2012; Valentini, 2003), and the Station for Measuring Ecosystem-Atmosphere Relations (SMEAR) (Hari et al., 2013). Data that were filtered for USTAR (i.e., low friction velocity conditions) and gap-filled were downloaded from repositories in 2018-2020. USTAR thresholds varied among sites due to differing site-level assumptions. We downloaded only gap-filled data that met the USTAR criteria set either by the tower PI or by the repository processing pipeline. However, Fluxnet2015 provides several different methods for determining data quality based on different USTAR criteria. In this case, we used the Fluxnet2015 common USTAR threshold (CUT, i.e., all years at the site filtered with the same USTAR threshold; Pastorello et al., 2020). For observations extracted from EuroFlux, USTAR thresholds for each site were derived as described in Papale et al. (2006) and Reichstein et al. (2005), using nighttime data. From Fluxnet2015 and EuroFlux, we extracted fluxes already aggregated to monthly intervals by the data processing pipeline. For AmeriFlux and SMEAR, which did not provide these aggregations, we downloaded daily gap-filled data and summed them to monthly time steps. We did not aggregate any repository GPP, Reco, or NEE datasets that were not gap-filled. If fluxes were available for the same site and period both in Natali et al. (2019b) and flux repository extractions, the flux repository observations were kept in the database. Some repositories supplied eddy covariance data version numbers, which were added to the flux database.
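The daily-to-monthly summation can be sketched as follows. This is a minimal example under stated assumptions: daily values are in g C m⁻² per day, a missing day is represented by None, and a month is summed only when every calendar day is present. The completeness rule and the function name are assumptions of this sketch, not a description of the processing actually used.

```python
import calendar
from collections import defaultdict
from datetime import date


def monthly_sums(daily):
    """Sum daily fluxes (g C m-2 per day) to monthly totals (g C m-2 per month).

    `daily` maps datetime.date -> float, or None where a day is missing.
    Months with any missing day are skipped (an assumption of this sketch).
    """
    by_month = defaultdict(list)
    for day, value in daily.items():
        by_month[(day.year, day.month)].append(value)
    result = {}
    for (year, month), values in by_month.items():
        n_days = calendar.monthrange(year, month)[1]
        if len(values) == n_days and all(v is not None for v in values):
            result[(year, month)] = sum(values)
    return result


# Hypothetical record: complete June 2015, one missing day in July 2015
daily = {date(2015, 6, d): -1.5 for d in range(1, 31)}
daily.update({date(2015, 7, d): -2.0 for d in range(1, 32)})
daily[date(2015, 7, 10)] = None
print(monthly_sums(daily))  # {(2015, 6): -45.0}
```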

Permafrost Carbon Network data solicitation
A community call was issued in 2018 through a CO₂ flux synthesis workshop (Arctic Data Center, 2021), whereby the network of ABZ flux researchers was contacted and invited to contribute their most current unpublished eddy covariance and chamber data. This resulted in an additional 39 sites and 1372 monthly observations (see column extraction_source).

Partitioning approaches at eddy covariance flux sites
ABCflux compiles eddy covariance observations that were primarily partitioned using nighttime Reco, which is based on the assumption that during night, NEE measured at low light levels is equivalent to Reco (Reichstein et al., 2005). This nighttime partitioning approach has been the most frequently used approach to fill gaps in flux time series (Wutzler et al., 2018) due to its simplicity, strong evidence of temperature sensitivity of respiration, and direct use of Reco (i.e., nighttime NEE) flux data to estimate temperature response curves (Reichstein et al., 2005). As the nighttime approach was one of the first widely used partitioning approaches, fluxes partitioned with the approach were the only ones available in the flux repositories at some of the older sites. Daytime partitioning and other approaches started to develop more rapidly in the 2010s (Lasslop et al., 2010; Tramontana et al., 2020). Each of the partitioning approaches has uncertainties related to the ecological assumptions, input data, model parameters, and statistical approaches used to fill the gaps. PIs who submitted data to us directly gap-filled and partitioned fluxes using the approach that they determined works best at their site. Based on similar logic, fluxes extracted from papers were not always partitioned using the nighttime approach. In these cases, we trusted the expertise of PIs and authors and included fluxes partitioned using other methods. Although this created some heterogeneity in the flux processing algorithms in the database, this approach was chosen so that we could be more inclusive with the represented sites.
Thus, in summary, our goal was to compile fluxes that (1) can be easily compared with each other (i.e., have been gap-filled and partitioned in a systematic way), (2) are as accurate as possible given the site conditions and measurement setup (i.e., other approaches were accepted if this was suggested by the PI), and (3) summarize information about the processing algorithms used.

Data quality screening
We screened for poor-quality data, potential unit and sign convention issues, and inaccurate coordinates. Repository eddy covariance data were processed and quality checked using quality flags associated with monthly data supplied by the repository processing pipeline. The Fluxnet2015 and EuroFlux databases include a data quality flag for the monthly aggregated data indicating the percentage of measured (quality flag QC = 0 in FLUXNET2015) and good-quality gap-filled data (quality flag QC = 1 in FLUXNET2015; average from monthly data; 0 = extensive gap-filling, 1 = low gap-filling); for more details see the Fluxnet2015 web page (https://fluxnet.org/data/fluxnet2015-dataset/variables-quick-start-guide/, last access: 7 October 2020) and Pastorello et al. (2020). Note that this quality flag field for the aggregated data differs from the ones calculated for half-hourly data derived directly from eddy covariance tower processing programs (such as EddyPro). We removed monthly data with a quality flag of 0. Data with quality flags > 0 were left within the database for the user to decide on additional screening criteria. Note that the monthly data produced by the repository processing pipeline do not include separate gap-filled percentages or errors of model fit for NEE similar to those associated with the half-hourly data. We nevertheless included these fields in the database because PIs contributing data or scientific papers sometimes provided this information, but the fields were not used in data quality screening. Both the monthly quality flag and gap-filled percentage fields describe the amount and quality of data that needed to be gap-filled due to, for example, instrument malfunction, power shortage, extreme weather events, and periods with insufficient turbulence conditions.
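The monthly quality screening rule (remove flag 0, keep everything above 0) can be sketched as below. The record layout and the field name "qc" are hypothetical, and keeping records that carry no flag at all is an assumption of this sketch.

```python
def screen_monthly(records):
    """Drop monthly records whose quality flag equals 0.

    Records with flags > 0 are kept for the user to screen further.
    Records without a flag (None) are also kept, which is an assumption.
    """
    return [r for r in records if r["qc"] is None or r["qc"] > 0]


# Hypothetical monthly records
records = [
    {"site": "A", "qc": 0.0},   # removed: extensive gap-filling
    {"site": "B", "qc": 0.4},   # kept
    {"site": "C", "qc": None},  # kept: no flag available
]
print([r["site"] for r in screen_monthly(records)])  # ['B', 'C']
```

A user wanting stricter screening could apply a higher threshold to the same field, since the flag runs from 0 (extensive gap-filling) to 1 (low gap-filling).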
At chamber and diffusion sites, we disregarded observations with a low number of temporal replicates within a month (< 3 individual measurements in summer months) and sites with only one measurement month to ensure the temporal representativeness of the measurements. For the spring (March-May), autumn (September-November), and winter (December-February) months, one temporal replicate was accepted due to the scarcity of measurements outside the summer season (June-August); measurement frequency is included in the database. We excluded monthly summertime measurements with < 3 temporal replicates because within summer months, meteorological conditions and the phenological status of the ecosystem can vary significantly (Euskirchen et al., 2012; Schneider et al., 2012; Heiskanen et al., 2021), and a single measurement is unlikely to capture this variability. Our decision to exclude measurements that have only one measurement month was based on our goal to assess the temporal variability of fluxes. We justified the acceptance of a lower number of temporal replicates for the other seasons based on the assumption that flux variability is lower during the winter months, and at least during most of the spring and autumn months, due to the insulating effects of snow (Aurela et al., 2002; Bäckstrand et al., 2010). We estimate that excluding measurements with < 3 temporal replicates during the summer months resulted in a 10 % loss of data. In total, 98 % of the chamber observations were from published studies; we assume that the peer review process assessed the quality of published data.
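The chamber and diffusion screening criteria can be expressed as a short filter. This is a sketch only: the record fields are hypothetical, and applying the single-month rule after the replicate rule is an assumption, since the order of the two steps is not stated.

```python
from collections import Counter

SUMMER_MONTHS = {6, 7, 8}  # June-August


def screen_chamber(records):
    """Apply the temporal-replicate and single-month screening rules.

    Each record is one site month with keys: site, year, month, n_measurements.
    """
    def enough(rec):
        needed = 3 if rec["month"] in SUMMER_MONTHS else 1
        return rec["n_measurements"] >= needed

    kept = [rec for rec in records if enough(rec)]
    # Drop sites that are left with only one measurement month
    months_per_site = Counter(rec["site"] for rec in kept)
    return [rec for rec in kept if months_per_site[rec["site"]] > 1]


# Hypothetical site months
records = [
    {"site": "X", "year": 2010, "month": 6, "n_measurements": 2},  # too few in summer
    {"site": "X", "year": 2010, "month": 7, "n_measurements": 4},
    {"site": "X", "year": 2011, "month": 1, "n_measurements": 1},  # one is enough in winter
    {"site": "Y", "year": 2010, "month": 7, "n_measurements": 5},  # only month at site Y
]
print([(r["site"], r["month"]) for r in screen_chamber(records)])  # [('X', 7), ('X', 1)]
```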
We further screened for spatial coordinate accuracy by visualizing the sites on a map. If a given site was located in water or had imprecise coordinates, the site researchers were contacted for more precise coordinates. We screened for potential duplicate sites and observations that were extracted from different data sources. Duplicate NEEs extracted from papers that were also extracted from flux repositories were compared to estimate uncertainties associated with paper extractions using Plot Digitizer as a means for extracting monthly fluxes. A linear regression between paper (Plot Digitizer) and repository extraction showed that data extracted using Plot Digitizer were highly correlated with data from online databases, providing confidence in estimates extracted using Plot Digitizer (R² = 0.91, slope = 1.002, n = 192). Out of these duplicate observations, we only kept the data extracted from the repository in the database. Finally, we asked site PIs to verify that the resulting information was correct.
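A comparison of this kind can be reproduced with an ordinary least-squares fit. The sketch below is illustrative: the data values are hypothetical, and treating the repository values as the predictor (x) and the digitized values as the response (y) is an assumption, as the direction of the regression is not specified.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical duplicate monthly NEE values (g C m-2 per month)
repository = [-50.0, -20.0, 0.0, 10.0, 25.0]
digitized = [-49.0, -21.0, 1.0, 9.0, 26.0]
slope, intercept, r_squared = linear_fit(repository, digitized)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R2={r_squared:.3f}")
```

A slope near 1 with a high R² indicates that digitization introduces scatter but little systematic bias.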

Database structure and columns
The resulting ABCflux database includes 94 variables: 16 are flux measurements and associated metadata (e.g., NEE, measurement date and duration), 21 describe flux measurement methods (e.g., measurement frequency, gap-filling method), 49 describe site conditions (e.g., soil moisture, air temperature, vegetation type), and 8 describe the extraction source (e.g., primary author or site PI, citation, data maturity). A total of 61 variables are considered static and thus do not vary with repeated measurements at a site (e.g., site name, coordinates, vegetation type), while 33 variables are considered dynamic and vary monthly (e.g., soil temperature). Table 2 includes a description of each of the 94 variables, as well as the proportion of monthly observations present in each column. ABCflux is shared as a comma-separated values (csv) file with 6309 rows; however, not all the rows have data in each column (indicated by NA for character columns and −9999 for numeric columns).
We refer to all fields included in ABCflux as "observations" although we acknowledge that, for example, GPP and Reco are indirectly derived variables at eddy covariance sites and that some flux and ancillary data can also be partly gap-filled. Further, our database does not include the actual raw observations; rather it provides monthly aggregates. Positive values for NEE indicate net CO₂ loss to the atmosphere (i.e., CO₂ source), and negative numbers indicate net CO₂ uptake by the ecosystem (i.e., CO₂ sink). For consistency, GPP is presented as negative (uptake) values and Reco as positive.
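When reading the file, the two missing-value codes need to be handled before any statistics are computed. The following is a minimal sketch using only the Python standard library; the set of numeric columns shown is a small illustrative subset, and the inline sample data are hypothetical.

```python
import csv
import io

# Illustrative subset of numeric columns; the full file has many more
NUMERIC_COLUMNS = {"NEE_gC_m2", "GPP_gC_m2", "Reco_gC_m2"}


def read_abcflux(handle):
    """Read rows from a csv handle, mapping NA and -9999 to None."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for name, value in raw.items():
            if value == "NA":
                row[name] = None
            elif name in NUMERIC_COLUMNS:
                number = float(value)
                row[name] = None if number == -9999 else number
            else:
                row[name] = value
        rows.append(row)
    return rows


# Hypothetical two-line sample standing in for the real file
sample = "study_id_short,country,NEE_gC_m2,GPP_gC_m2\nsiteA,NA,-12.5,-9999\n"
rows = read_abcflux(io.StringIO(sample))
print(rows[0])
```

Leaving −9999 in place would strongly bias any mean or sum, so converting it to a true missing value at load time is the safer default.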

Database visualization
The visualizations in this paper were made with the full ABCflux database using each site month as a unique data point (from now on, these are referred to as monthly observations) and the sites listed in the "study_id_short" field. We visualized these across the vegetation types ("veg_type_short"), countries ("country"), biomes ("biome"), and measurement method ("flux_method").
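The counts behind such visualizations can be tallied with a simple grouping step. This sketch assumes rows are dictionaries keyed by the field names quoted above, with one row per site month; the example rows are hypothetical.

```python
from collections import defaultdict


def summarize(rows, group_field):
    """Return {group: (number of sites, number of monthly observations)}."""
    sites = defaultdict(set)
    n_obs = defaultdict(int)
    for row in rows:
        key = row[group_field]
        sites[key].add(row["study_id_short"])
        n_obs[key] += 1
    return {key: (len(sites[key]), n_obs[key]) for key in n_obs}


# Hypothetical site months
rows = [
    {"study_id_short": "siteA", "biome": "Tundra"},
    {"study_id_short": "siteA", "biome": "Tundra"},
    {"study_id_short": "siteB", "biome": "Boreal"},
]
print(summarize(rows, "biome"))  # {'Tundra': (1, 2), 'Boreal': (1, 1)}
```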
To understand the distribution and representativeness of monthly observations and sites across ABCflux as well as the entire ABZ, we used geospatial data to calculate the areal coverage of each vegetation type and country. Vegetation type was derived from the European Space Agency Climate Change Initiative's (ESA CCI) land cover product aggregated and resampled to 0.0083° for the boreal biome (Lamarche et al., 2013) and the raster version of the Circumpolar Arctic Vegetation Map (CAVM) for the tundra biome resampled to the same resolution as the ESA CCI product (Raynolds et al., 2019). ESA CCI layers were reclassified by grouping land cover types to the same vegetation type classes represented by ABCflux: boreal wetland and peatland (from now on, boreal wetland; classes 160, 170, 180 in ESA CCI product), deciduous broadleaf forest (60-62), evergreen needleleaf forest (70-72), deciduous needleleaf forest (80-82), mixed forest (90), and sparse and mosaic boreal vegetation (40, 100, 110, 120, 121, 122, 130, 140, 150, 151, 152, 153, 200, 201, 202). Croplands (10, 11, 12, 20, 30) and urban areas (190) were removed. We used the five main physiognomic classes from CAVM in the tundra. Glaciers and permanent water bodies included in either of these products were removed. Note that in ABCflux and for the site-level visualizations in this paper, vegetation type for each of the flux sites was derived from site-level information, not these geospatial layers. These same glacier, water, and cropland masks were applied to the country boundaries (Natural Earth Data, 2021) to calculate the terrestrial area of each country. We further used TerraClimate annual and seasonal air temperature and precipitation layers averaged over 1989-2020 to visualize the distribution of monthly observations across the Arctic-boreal climate space (Abatzoglou et al., 2018).
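The reclassification of ESA CCI land cover codes can be written as a lookup table. The sketch below follows the class groupings listed in the text; the sparse and mosaic group includes class 110, which the ESA CCI legend defines between 100 and 120. The function names are hypothetical, and any code not listed (including the removed cropland and urban classes) maps to None.

```python
GROUPS = {
    "boreal wetland": [160, 170, 180],
    "deciduous broadleaf forest": [60, 61, 62],
    "evergreen needleleaf forest": [70, 71, 72],
    "deciduous needleleaf forest": [80, 81, 82],
    "mixed forest": [90],
    "sparse and mosaic boreal vegetation": [
        40, 100, 110, 120, 121, 122, 130, 140,
        150, 151, 152, 153, 200, 201, 202,
    ],
}

# Invert the grouping into a code -> vegetation type lookup
LOOKUP = {code: name for name, codes in GROUPS.items() for code in codes}


def reclassify(code: int):
    """Return the ABCflux vegetation type for an ESA CCI code, or None if masked."""
    return LOOKUP.get(code)


print(reclassify(71))   # evergreen needleleaf forest
print(reclassify(190))  # None (urban areas are removed)
```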

General characteristics of the database
ABCflux includes 244 sites and 6309 monthly observations, out of which 136 sites and 2217 monthly observations are located in the tundra (54 % of sites and 52 % of observations from North America, 46 % and 48 % from Eurasia), while 108 sites and 4092 monthly observations are located in the boreal biome (59 % of sites and 58 % of observations from North America, 41 % and 42 % from Eurasia) (Table 3). The largest source of flux data is the flux repositories (48 % of the monthly observations), while flux data extracted from papers or contributed by site PIs amount to 30 % and 22 % of the monthly observations, respectively. The database primarily includes sites in unmanaged ecosystems, but it does contain a small number (6) of sites in managed forests.
The majority of observations in ABCflux have been measured with the eddy covariance technique (119 sites and 4957 monthly observations), whereas chambers and diffusion methods were used at 125 sites and 1352 observations (Table 3). About 46 % of the eddy covariance measurements are based on gas analyzers using closed-path technology (including enclosed analyzers), 40 % are based on open-path technology, 5 % include both, and 8 % are unknown. A total of 52 % of chamber measurements were made with automated chambers (monitoring the fluxes continuously throughout the growing season). Only 3 % of the measurements were completed using diffusion methods during the winter. Chamber and diffusion studies were primarily from the tundra and the sparsely treed boreal wetlands, but a few studies with ground surface CO₂ fluxes from forests (i.e., capturing the ground cover vegetation and not the whole ecosystem) are also included in their own fields so that they cannot be mixed up with ecosystem-scale measurements ("ground_nee", "ground_gpp", "ground_reco"). Further, a few soil CO₂ flux sites measuring fluxes primarily on unvegetated surfaces during the non-growing season are included in the database ("rsoil"). These were included in the database because ground surface or soil fluxes during the non-growing season can be of similar magnitude to the ecosystem-level fluxes when trees remain dormant (Hermle et al., 2010). Therefore, these ground or soil fluxes could potentially be used to represent ecosystem-level fluxes during some of the non-growing season months. However, we did not conduct an extensive literature search for these observations; rather, we compiled observations if they came up in our NEE search. Therefore, the data in these ground surface and soil flux columns represent only a portion of such available data across the ABZ.

Table 2 (excerpt). Forest floor flux fields.
ground_nee: Convention: −ve is uptake, +ve is loss. Chamber measurements from (primarily rather treeless) wetlands are included in the NEE_gC_m2 column.
ground_gpp: Forest floor gross primary productivity, measured with chambers (g C-CO₂ m⁻² for the entire measurement interval). Report as −ve flux. Chamber measurements from (primarily rather treeless) wetlands are included in the GPP_gC_m2 column.
ground_reco: Forest floor ecosystem respiration, measured with chambers (g C-CO₂ m⁻² for the entire measurement interval). Report as +ve flux. Chamber measurements from (primarily rather treeless) wetlands are included in the Reco_gC_m2 column.

The geographical coverage of the flux data is highly variable across the ABZ, with most of the sites and monthly observations coming from Alaska (37 % of the sites and 28 % of the monthly observations), Canada (19 % and 29 %), Finland (7 % and 15 %), and Russia (14 % and 13 %) (Fig. 3). The sites cover a broad range of vegetation types but were most frequently measured in evergreen needleleaf forests (23 % of the sites and 37 % of the monthly observations) and wetlands in the tundra or boreal zone (30 % and 27 %) (Fig. 4).
The northernmost and southernmost ecosystems had fewer sites and observations than more central ecosystems (barren tundra: 45 % of the sites and 3 % of the monthly observations, prostrate shrub: 2 % and < 1 %, deciduous broadleaf forest: 1 % and 3 %, deciduous needleleaf forest: 5 % and 4 %, mixed forest: < 1 % and < 1 %). The sites in ABCflux cover the most frequent climatic conditions across the Arctic-boreal zone relatively well; however, conditions with high precipitation and low temperatures are lacking sites (Fig. 5). ABCflux includes sites experiencing various types of disturbances, with the majority of disturbed sites encountering fires (24 sites and 901 monthly observations), thermokarst (4 sites and 113 monthly observations), or harvesting (6 sites and 258 monthly observations). However, ABCflux is dominated by sites in relatively undisturbed environments or sites lacking disturbance information (only 20 % of the sites and 30 % of the monthly observations include disturbance information).
ABCflux spans a total of 31 years, but the largest number of monthly observations originate from 2000-2015 (80 % of the data) (Fig. 6). The decrease in flux data over 2015-2020 is likely related to a reporting lag rather than to a decrease in flux sites and records. The largest number of measurements were conducted during the summer (June-August; 32 %) and the least during the winter (December-February; 18 %) (Figs. 5 and 6). Overall eddy covariance data quality was lowest, and the gap-filled data percentage highest, during the winter compared to other seasons (0.76 compared to 0.8-0.85 for overall data quality, 0 = extensive gap-filling, 1 = low gap-filling; 69 % compared to 47 % to 59 % for gap-filled data percentage).

Coverage of ancillary data
All of the observations in ABCflux include information describing the site name, location, vegetation type, NEE, measurement technique (eddy covariance/chamber/diffusion), and how the data were compiled (Table 2). Details about the measurement technique (e.g., open or closed-path eddy covariance, manual or automated chambers) are included in 93 % of sites and 93 % of monthly observations. Most of the monthly observations further include information about permafrost extent (67 % of the sites and 72 % of the monthly observations) or soil moisture state (47 % of the sites and 56 % of the monthly observations). Data describing air temperature, soil temperature, precipitation, and soil moisture are included in 71 %, 73 %, 37 %, and 35 % of monthly observations, respectively. Some ancillary variables have low data coverage, such as soil organic carbon stocks (16 % of the monthly

Coverage and distribution of flux data
There are 110 sites and 4290 monthly observations for GPP, 121 sites and 4603 monthly observations for Reco, and 212 sites and 5759 monthly observations for NEE in ABCflux. Monthly values range from −2 to −516 g C m −2 per month for GPP, from 0 to 550 g C m −2 per month for Reco, and from −376 to 95 g C m −2 per month for NEE (Table 4). NEE is typically negative during the summer (i.e., net CO 2 sink) and mostly positive during other seasons (i.e., net CO 2 source) (Fig. 7). Out of all site and year combinations, annual cumulative NEE (the sum of monthly NEE values for each year and site) can be calculated for 267 site years. An average annual NEE calculated based on the site-level averages from 1995 to 2020 is −27.9 g C m −2 yr −1 (SD 85.4) for the entire region, −35.5 g C m −2 yr −1 (SD 93.7) for the boreal biome, and −3.3 g C m −2 yr −1 (SD 44.2) for the tundra. However, these averages do not account for the spatial or temporal distribution of the observations and therefore represent coarse summaries of the database.
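The annual aggregation described above can be illustrated with a minimal sketch. It assumes hypothetical records of the form (site, year, month, NEE); these tuples and the function names are illustrative and do not reproduce the ABCflux field names. It also assumes that a site year counts as calculable only when all 12 months are present, which the text does not state explicitly.

```python
from collections import defaultdict
from statistics import mean, stdev


def annual_nee(records):
    """Sum monthly NEE (g C m-2 per month) per (site, year).

    `records` is an iterable of hypothetical (site, year, month, nee) tuples.
    Assumption: only site years with all 12 months are kept.
    """
    monthly = defaultdict(dict)
    for site, year, month, nee in records:
        monthly[(site, year)][month] = nee
    return {
        key: sum(values.values())
        for key, values in monthly.items()
        if len(values) == 12
    }


def site_level_average(annual):
    """Average annual NEE per site, then take the mean and SD across sites."""
    per_site = defaultdict(list)
    for (site, _year), total in annual.items():
        per_site[site].append(total)
    site_means = {site: mean(totals) for site, totals in per_site.items()}
    values = list(site_means.values())
    # A standard deviation needs at least two sites.
    sd = stdev(values) if len(values) > 1 else 0.0
    return site_means, mean(values), sd
```

Averaging within sites first keeps long-running sites from dominating the regional mean, but, as noted above, such a summary still ignores the spatial and temporal distribution of the observations.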

Strengths, limitations, and opportunities
ABCflux provides several opportunities for an improved understanding of the ABZ carbon cycle. It can be used to calculate both short- and longer-term monthly, seasonal, or annual flux summaries for different regions, or it can be combined with remote sensing and other gridded datasets to build monthly statistical and process-based models for CO 2 flux upscaling. ABCflux can further be utilized to study the inter- and intra-annual CO 2 flux variability resulting from climate and environmental change. The site distribution in ABCflux can also be used to evaluate the extent of the current flux network and identify under-sampled regions. From a methodological perspective, data users can compare fluxes estimated with the different measurement techniques, which can help understand the uncertainties associated with individual techniques. However, there are also some uncertainties that the data user should be aware of when using ABCflux, which we describe below.

Comparing fluxes estimated with different techniques
The ABCflux database comprises aggregated observations using eddy covariance, chamber, and diffusion methods. These methods measure CO 2 fluxes at different spatiotemporal resolutions and are based on different assumptions. The eddy covariance technique is currently the primary method to monitor long-term trends in ecosystem CO 2 fluxes (Baldocchi et al., 2018; Baldocchi, 2008), and the majority of observations in ABCflux (79 %) have been made using the technique. Transforming high-frequency eddy covariance measurements to budgets includes several processing steps that can, without harmonization and standardization of these steps (Baldocchi et al., 2001; Pastorello et al., 2020), lead to highly different budget estimates (Soloway et al., 2017). It is also important to acknowledge that the extent and size of the tower footprint differs across the sites due to differences in the height of the tower and the direction and magnitude of the wind (Chu et al., 2021). When fluxes are aggregated over longer time periods to cumulative budgets, one generally assumes the tower footprint remains relatively constant, capturing fluxes from a similar part of the ecosystem (i.e., the assumption that monthly observations within one site in ABCflux can be reliably compared with each other), but note that at shorter time periods this might not be the case (Pirk et al., 2017; Chu et al., 2021). The different gas analyzer technologies also play an important role for the fluxes estimated with the eddy covariance technique. Sites located in the most northern and remote parts of the ABZ experience a drop in irradiation during autumn and winter which limits solar power availability for eddy covariance measurements. Closed-path systems require more power to run than open-path sensors, but open-path sensors are known to have larger uncertainties. For example, open-path eddy covariance sensors have been shown to incorrectly estimate NEE due to the self-heating effect of the analyzer, which can result in systematically higher net CO 2 uptake compared to closed-path sensors (Kittler et al., 2017a); however, this pattern was not clearly observed in ABCflux when across-site comparisons were made. Furthermore, wintertime fluxes indicating CO 2 uptake can be erroneous due to the limited ability of the gas analyzer to resolve very high frequency turbulent eddies (Jentzsch et al., 2021). Recently, some types of open-path infrared gas analyzers have been found to be prone to biases in NEE that scale with sensible heat fluxes in all seasons rather than with self-heating (Wang et al., 2017; Helbig et al., 2016).
While using eddy covariance to estimate small-scale spatial variability in NEE is challenging (McGuire et al., 2012), this can be accomplished with chamber and diffusion techniques. Chamber measurements can be done in highly heterogeneous environments as long as chamber closure can be guaranteed; however, most of the chamber measurements in ABCflux have been conducted in relatively flat and homogeneous graminoid- and wetland-dominated vegetation types. Most chamber sites in ABCflux include ca. 10-20 individual plots in total from ca. 3-5 land cover types where fluxes are being measured (Virkkala et al., 2018). Chambers can also provide more direct estimates of Reco and GPP relative to eddy covariance-derived fluxes and are therefore useful for estimating the magnitude and range of those component fluxes. However, manual chamber and diffusion measurements are laborious and have limited temporal representation, particularly during the non-growing season when they often have only one monthly temporal replicate in ABCflux (McGuire et al., 2012; Fox et al., 2008). Automated chamber measurements during the non-growing season are also rare in ABCflux. Furthermore, uncertainty around gap-filled monthly chamber fluxes is presumably larger than that of the eddy covariance because of the low temporal replication of chamber measurements. Manual chamber measurements might, for example, be conducted during a limited period which does not cover the range of meteorological and phenological conditions within a month. Additional uncertainties in chamber measurements include, for example, accurate determination of chamber volume, pressure perturbations, temperature increase during the measurement, and collars disturbing the ground and causing plant root excision.
Because of these methodological differences across the eddy covariance, chamber, and diffusion techniques, comparing fluxes between the methods may result in inconsistencies (Fig. 7). It has been shown that chamber measurements can be either larger or smaller than the fluxes estimated with eddy covariance (Phillips et al., 2017). This difference can be related to the uncertainties with the eddy covariance or chamber technique as described above. The differences can also be due to the mismatch between the chamber and tower footprints (< 1 m vs. 250-3000 m radii over the measurement equipment, respectively) and the difficulty of extrapolating local chamber measurements to landscape scales (Marushchak et al., 2013; Fox et al., 2008). However, several studies have also shown good agreement across the eddy covariance and chamber measurements (Laine et al., 2006; Wang et al., 2013; Eckhardt et al., 2019; Riutta et al., 2007). Potential mismatches may also be due to a bias towards daytime measurements in manual chamber measurements (see field "diurnal_coverage"). During daytime, plants are actively photosynthesizing whereas respiration is the dominant flux at night (López-Blanco et al., 2017). Presumably because of these day vs. nighttime differences, we observed stronger sink strength in manual chamber measurements compared to other flux measurements in ABCflux, even though eddy covariance measurements have also been observed to underestimate nighttime CO 2 loss. This underestimation in nighttime eddy covariance measurements is due to suppressed turbulent exchange linked to stable atmospheric stratification and systematic biases due to horizontal advection (Aubinet et al., 2012). Despite these uncertainties, including fluxes estimated with all of these techniques into one database improves the understanding of underlying variability of landscape-scale flux estimates. Indeed, there are roughly 10 sites in ABCflux that include both eddy covariance and chamber/diffusion measurements conducted at the same time. These observations might not have identical site coordinates, but they are often very close to each other (< 500 m away from each other). Including multiple methods from the same site provides an opportunity to compare estimates from different methods over a larger number of sites.
Table 4. Mean and standard deviation of monthly observations of net ecosystem exchange (NEE), gross primary productivity (GPP), and ecosystem respiration (Reco) in g C m −2 per month. Seasons were defined based on the climatological definition (autumn: September-November; winter: December-February; spring: March-May; summer: June-August). Positive numbers for NEE indicate net CO 2 loss to the atmosphere (i.e., CO 2 source), and negative numbers indicate net CO 2 uptake by the ecosystem (i.e., CO 2 sink). For consistency, GPP is presented as negative values and Reco as positive. Some sites compute only NEE and, consequently, NEE summaries might not entirely match with GPP and Reco statistics.
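The seasonal summaries described in the Table 4 caption (mean and standard deviation of monthly fluxes per climatological season) could be computed along the following lines. This is a minimal sketch: the input format, a list of hypothetical (month, flux) pairs for one flux and one biome, is an assumption made for illustration and is not the ABCflux file layout.

```python
from collections import defaultdict
from statistics import mean, stdev

# Climatological seasons as defined in the Table 4 caption.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_summary(observations):
    """Return {season: (mean, sd, n)} for monthly fluxes in g C m-2 per month.

    `observations` is an iterable of hypothetical (month, flux) pairs.
    """
    by_season = defaultdict(list)
    for month, flux in observations:
        by_season[SEASON_BY_MONTH[month]].append(flux)
    summary = {}
    for season, values in by_season.items():
        # SD is undefined for a single observation; report NaN in that case.
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[season] = (mean(values), sd, len(values))
    return summary
```

Reporting the number of observations next to each mean makes it easier to see where a seasonal value rests on very few monthly records, as is often the case for manual chamber data outside the growing season.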

Uncertainties in eddy covariance flux partitioning
Monthly Reco and GPP fluxes derived from eddy covariance were primarily estimated using nighttime partitioning (Reichstein et al., 2005). Focusing on nighttime partitioning ensured that data from older sites using this partitioning method could be included and that most of the fluxes were standardized using one common partitioning method. However, particularly at sites at higher latitudes of the ABZ, low-light nighttime conditions are restricted to rather short periods during summer, limiting the database for assessing Reco rates and therefore increasing uncertainties associated with the nighttime partitioning (López-Blanco et al., 2020). Recent research suggests that other methods such as daytime partitioning (Lasslop et al., 2010), and even more recently artificial neural networks (ANN) (Tramontana et al., 2020), might be more accurate methods for flux partitioning by addressing the assumptions from nighttime partitioning methods (Pastorello et al., 2020; Papale et al., 2006; Reichstein et al., 2005; Keenan et al., 2019). Specifically, the assumption of a constant diel temperature sensitivity during night- and daytime might introduce error in eddy covariance-based Reco estimates extrapolated from nighttime measurements (Järveoja et al., 2020; Keenan et al., 2019). It should be noted that the ABCflux database used nighttime partitioning of fluxes extracted from repositories for consistency; however, fluxes contributed by some databases, PIs or extracted from papers may be based on other partitioning methods, as noted in the database. In a few cases, observations from the same site were based on different partitioning methods, which limits the usage of data at those sites for time-series exploration. These different gap-filling and partitioning approaches can impact the magnitude of monthly CO 2 budgets. For example, a study comparing four gap-filling methods in a boreal forest showed that the 14-year average annual NEE budget varied from 4 to 48 g C m −2 yr −1 depending on the gap-filling approach (Soloway et al., 2017). However, a comparison of multiple gap-filling and partitioning methods across sites showed that variation in annual GPP and Reco between partitioning methods was small (Desai et al., 2008), which provides confidence in estimates from partitioned GPP and Reco components from the differing methods used in this database.
Figure 7. The distribution of net ecosystem exchange (NEE; a, b), gross primary productivity (GPP; c, d), and ecosystem respiration (Reco; e, f) across the months and biomes, colored by the flux measurement technique. Positive numbers for NEE indicate net CO 2 loss to the atmosphere (i.e., CO 2 source), and negative numbers indicate net CO 2 uptake by the ecosystem (i.e., CO 2 sink). For consistency, GPP is presented as negative values and Reco as positive. The boxes correspond to the 25th and 75th percentiles. The lines denote the 1.5 IQR of the lower and higher quartile, where IQR is the inter-quartile range, or distance between the first and third quartiles. There are few chamber data from the boreal regions, as chambers capture NEE only at treeless wetlands.
Any one choice in gap-filling and partitioning introduces uncertainties, and understanding and minimizing those uncertainties remains an important research priority. However, since this database was not designed for detailed explorations of how the different gap-filling and partitioning approaches influence fluxes, we recommend that users interested in those questions access the data in flux repositories or contact site PIs. Fluxes calculated using multiple gap-filling techniques may be considered in the next versions of ABCflux. We further suggest data users remain cautious when using ABCflux data to understand mechanistic relationships between meteorological variables and fluxes, as the gap-filled and partitioned monthly fluxes already include some information about, for example, air or soil temperatures and light conditions. To completely avoid circularity in these exploratory analyses, we recommend data users download the original and non-gap-filled NEE records, or download fluxes partitioned in a way that is consistent and biologically relevant for the particular research question from flux repositories.
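To make the logic of nighttime partitioning concrete, the sketch below uses the Lloyd-Taylor temperature response that underlies the approach of Reichstein et al. (2005): Reco is modeled as a function of temperature, and GPP follows from the sign convention used in this paper (NEE = GPP + Reco, with GPP negative). It is an illustration only, not the code used to produce ABCflux or any repository product. The parameters `r_ref` and `e0` are assumed to have been fitted to nighttime NEE beforehand, the fitting step is omitted, and the function names are hypothetical.

```python
import math

T_REF = 10.0   # reference temperature (deg C)
T_0 = -46.02   # Lloyd-Taylor temperature constant (deg C)


def lloyd_taylor_reco(temp_c, r_ref, e0):
    """Ecosystem respiration at temperature `temp_c` (deg C).

    `r_ref` is respiration at T_REF and `e0` the temperature sensitivity.
    Both are assumed to come from a fit to nighttime NEE (not shown here).
    """
    return r_ref * math.exp(e0 * (1.0 / (T_REF - T_0) - 1.0 / (temp_c - T_0)))


def partition_nee(nee, temp_c, r_ref, e0):
    """Split NEE into (GPP, Reco) with GPP negative and Reco positive."""
    reco = lloyd_taylor_reco(temp_c, r_ref, e0)
    gpp = nee - reco  # follows from NEE = GPP + Reco
    return gpp, reco
```

Because the same `e0` is applied to daytime and nighttime temperatures, the sketch carries exactly the constant temperature sensitivity assumption discussed above. It also shows why partitioned fluxes already contain temperature information, which is the source of the circularity that data users are cautioned about.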

Representativeness and completeness of the data
The ABCflux database site distribution covers all vegetation types and countries within the ABZ. However, there are regional and temporal biases in the database due to the differences in accessibility for sampling certain regions (also documented in Virkkala et al., 2019, and Pallandt et al., 2021). As a result, the number of monthly observations does not always correlate with the size of the country/region or vegetation type. For example, Russia and Canada cover in total ca. 80 % of the ABZ but include only ca. 40 % of the monthly observations. While the distribution of these measurements is rather balanced between the Russian tundra and boreal biomes, Canadian observations are primarily located in the boreal biome, largely due to the high number of measurements conducted as part of the NASA Boreal Ecosystem-Atmosphere Study (Sellers et al., 1997). Deciduous needleleaf (i.e., larch) forests, the primary vegetation type in central and eastern Siberia, have the smallest amount of data relative to their area (< 5 % of monthly observations vs. > 20 % coverage of the ABZ). Additional data gaps are located in barren and prostrate-shrub tundra and sparse boreal vegetation, as well as in areas with high precipitation. Eddy covariance towers in mountainous regions are also rare (Pallandt et al., 2021) as eddy covariance towers are most often set up over homogeneous and flat terrains to avoid advection (Baldocchi, 2003; Etzold et al., 2010). Alaska and Finland cover < 10 % of the ABZ but include > 40 % of the monthly observations.
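One simple way to express this mismatch is the ratio between a region's share of monthly observations and its share of the ABZ area. The sketch below is illustrative: the function name is hypothetical, and the inputs are the approximate percentages quoted above, several of which are only bounds (e.g., < 10 % or > 40 %), so the resulting ratios should be read as rough bounds as well.

```python
def representation_ratio(obs_share_pct, area_share_pct):
    """Share of monthly observations divided by share of ABZ area.

    A value of 1 means sampling proportional to area, values below 1
    indicate under-sampling, and values above 1 indicate over-sampling.
    """
    if area_share_pct <= 0:
        raise ValueError("area share must be positive")
    return obs_share_pct / area_share_pct


# Approximate shares quoted in the text: (observations %, area %).
shares = {
    "Russia and Canada": (40, 80),
    "Alaska and Finland": (40, 10),
    "Deciduous needleleaf forest": (5, 20),
}

for region, (obs_pct, area_pct) in shares.items():
    print(f"{region}: {representation_ratio(obs_pct, area_pct):.2f}")
```

Under these rounded values, Russia and Canada together are sampled at about half of what their area would suggest, whereas Alaska and Finland are sampled at roughly four times or more.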
There are differences in environmental coverage of ABCflux depending on the measured flux, measurement year, and the measurement season. Sites with NEE observations have the largest geographical coverage, with less availability for partitioned GPP and Reco fluxes. Therefore, regional summaries of Reco and GPP do not sum up to NEE. Moreover, although the oldest records in ABCflux originate from 1989, observations from the 1990s are primarily located in a few boreal or Alaskan tundra sites. The measurement records from tundra sites are shorter than those from boreal sites over the full time span of the database, and long-term temporal changes in tundra fluxes are therefore more uncertain. Finally, the lowest number of flux data in ABCflux is during winter, which is the most challenging period for data collection in high latitudes (Kittler et al., 2017b; Jentzsch et al., 2021). Autumn and winter data included in ABCflux further cover a smaller Arctic-boreal climate space, with no data coming from extremely cold or wet conditions (Fig. 5). Fluxes are generally small during this period (Natali et al., 2019a), leading to higher relative uncertainties in flux estimation compared to other seasons. These regional and temporal biases need to be considered in future analyses to assure the robustness of our understanding of carbon fluxes across the ABZ.
Although ABCflux includes a comprehensive compilation of flux and supporting environmental and methodological information, the information is not exhaustive. We acknowledge that this database is missing some eddy covariance sites that were recently summarized in a tower survey (see preliminary results in https://cosima.nceas.ucsb.edu/carbon-flux-sites/, last access: 12 February 2020), because these data were unavailable at the time of database compilation. Moreover, the overall quality or the gap-filled percentage of the eddy covariance observations is not reported for each eddy covariance site, limiting the potential to explore the effects of data quality on fluxes across all the eddy covariance sites. Comparing soil temperature or moisture across sites has uncertainties due to differences in sensor depths, which are not always reported in the database. We hope to improve and increase the flux and supporting data in the future as new data are being collected, for example, by leveraging the ONEflux pipeline and its different outputs (Pastorello et al., 2020), as well as aggregating new measurements that are not part of any networks.
fully public but should be appropriately referenced by citing this paper and the database (see Sect. 6). We suggest that researchers planning to use this database as a core dataset for their analysis contact and collaborate with the database developers and relevant individual site contributors.

Conclusions
ABCflux provides the most comprehensive database of ABZ terrestrial ecosystem CO 2 fluxes to date. It is particularly useful for future modeling, remote sensing, and empirical studies aiming to understand CO 2 budgets and regional variability in flux magnitudes, as well as changes in fluxes through time. It can also be used to understand how different environmental conditions influence fluxes and to better understand the current extent of the flux measurement network and its representativeness across the Arctic-boreal region.
Author contributions. The ABCflux database was conceptualized and developed by a team led by SMN, BMR, JDW, MM, AMV, and EAGS, with additional comments from OS. KS and SJC compiled the data, with contributions from AMV, MM, DP, CM, and JN, and data screening by AMV and SMN. AMV drafted and coordinated the manuscript in close collaboration with SMN, BMR, JDW, KS, and MM. All authors contributed to the realization of the ABCflux database and participated in the editing of the manuscript. PIs whose data were extracted from publications are not coauthors in this paper, unless new data were provided, but their contact details can be found in the database.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Financial support. This research has been supported by the National Aeronautics and Space Administration (grant nos. NNX17AE13G, NNX15AT81A, NNH17ZDA001N, NNX15AT74A, and NNX16AF94A), the Gordon and Betty