A global CO2 flux dataset (2015–2019) inferred from OCO-2 retrievals using the Tan-Tracker inversion system

Accurate assessment of the various sources and sinks of carbon dioxide (CO2), especially terrestrial ecosystem and 15 ocean fluxes with high uncertainties, is important for understanding of the global carbon cycle, supporting the formulation of climate policies, and projecting future climate change. Satellite retrievals of the column-averaged dry air mole fractions of CO2 (XCO2) are being widely used to improve carbon flux estimation due to their broad spatial coverage. However, there is no consensus on the robust estimates of regional fluxes. In this study, we present a global and regional resolved terrestrial ecosystem carbon flux (NEE) and ocean carbon flux dataset for 2015–2019. The dataset was generated using the Tan-Tracker 20 inversion system by assimilating Observing Carbon Observatory 2 (OCO-2) column CO2 retrievals. The posterior NEE and ocean carbon fluxes were comprehensively validated by comparing posterior simulated CO2 concentrations with OCO-2 independent retrievals and Total Carbon Column Observing Network (TCCON) measurements. The validation showed that posterior carbon fluxes significantly improved the modelling of atmospheric CO2 concentrations, with global mean biases of 0.33 ppm against OCO-2 retrievals and 0.12 ppm against TCCON measurements. We described the characteristics of the 25 dataset at global, regional, and Tibetan Plateau scales in terms of the carbon budget, annual and seasonal variations, and spatial distribution. The posterior 5-year annual mean global atmospheric CO2 growth rate was 5.35 PgC yr, which was within the uncertainty of the Global Carbon Budget 2020 estimate (5.49 PgC yr). The posterior annual mean NEE and ocean carbon fluxes were –4.07 and –3.33 PgC yr, respectively. Regional fluxes were analysed based on TransCom partitioning. All 11 land regions acted as carbon sinks, except for Tropical South America, which was almost neutral. The strongest carbon sinks 30 were located in Boreal Asia, followed by Temperate Asia and North Africa. The entire Tibetan Plateau ecosystem was estimated as a carbon sink, taking up –49.52 TgC yr on average, with the strongest sink occurring in eastern alpine meadows. These results indicate that our dataset captures surface carbon fluxes well and provides insight into the global carbon cycle. The dataset can be accessed at https://doi.org/10.11888/Meteoro.tpdc.271317 (Jin et al., 2021). https://doi.org/10.5194/essd-2021-210 O pe n A cc es s Earth System Science Data D icu ssio n s Preprint. Discussion started: 1 July 2021 c © Author(s) 2021. CC BY 4.0 License.


1
Introduction 35 Anthropogenic carbon dioxide (CO2) emissions are dominant sources for increasing atmospheric CO2 concentrations. Total emissions are partitioned among the atmosphere, land, and ocean. During the past decade (2010-2019), 31 % and 23 % of emissions were absorbed by the land and ocean, respectively (Friedlingstein et al., 2020). However, there is little agreement on the magnitude and interannual variability of land and ocean fluxes at regional scales (Hauck et al., 2020;Piao et al., 2020;Eldering et al., 2017;Piao et al., 2009). Precise knowledge of the various reservoirs within the carbon cycle is a prerequisite 40 to monitor flux changes and make well-informed projections under future climate change (Zscheischler et al., 2017).

Atmospheric inversions provide important insight into the global carbon cycle through absorbing atmospheric CO2
observations. Many inversion studies have utilized high-precision in situ and flask observations to infer surface CO2 fluxes (Peters et al., 2007;Chevallier et al., 2010;Lauvaux et al., 2016). Although this is a useful way to quantify carbon fluxes at large scales, estimate uncertainty grows quickly as the scale shrinks due to the lack of observations outside of North America 45 and Europe (Crowell et al., 2019;Byrne et al., 2017;Peylin et al., 2013). To improve spatial coverage and fill observation gaps of in situ CO2 network, column-averaged CO2 mole fraction (XCO2) observations from satellites have been extensively applied to flux inversions. The Greenhouse gases Observing SATellite (GOSAT) was the first dedicated greenhouse gas satellite (Yokota et al., 2009), followed by Orbiting Carbon Observatory 2 (OCO-2) (Crisp et al., 2004). Inversions using GOSAT retrievals can significantly reduce the uncertainty of flux estimates in regions where surface CO2 observations are sparse; 50 however, no agreement is achieved on the estimates of regional fluxes (Chevallier et al., 2014;Basu et al., 2013;Takagi et al., 2011). Compared with GOSAT, OCO-2 has higher spatial resolution, broader spatial coverage, and collects more measurements per day (Eldering et al., 2017). Its higher measurement signal-to-noise ratio allows for higher-precision XCO2 retrievals, and its higher spatial sampling density facilitates validation using the ground-based Total Carbon Column Observing Network (TCCON) (Liang et al., 2017;Wunch et al., 2017). Based on these characteristics, OCO-2 may provide better 55 constraints on surface CO2 fluxes inversions (Basu et al., 2018). Chevallier et al. (2019) indicated that large-scale annual fluxes estimated from bias-corrected OCO-2 land retrievals are within the uncertainty of fluxes estimated from the surface network. Villalobos et al. (2020) performed idealized flux inversion experiments over Australia using OCO-2 retrievals and found that the integrated flux uncertainty was greatly reduced and flux inversions at the unusually fine scale yielded useful information on the carbon cycle at continental and finer scales. With the refinement of instrument error characterization, retrieval 60 algorithms, and bias correction procedures, the accuracy and precision of satellite-retrieved XCO2 have improved significantly (O'dell et al., 2018;Kiel et al., 2019). The effectiveness and potential of these constantly updated satellite retrievals for inferring surface CO2 fluxes requires further and persistent investigation.
In the global carbon cycle, the response of carbon fluxes in the Tibetan Plateau ecosystems play an important role (Nieberding et al., 2020;Li et al., 2016). The plateau's crucial role is almost entirely due to its height, with an area of 2.5 65 million km 2 at an average elevation exceeding 4000 m above sea level (Qiu, 2008). The temperature on the Tibetan Plateau has increased by 0.35 °C per decade from 1970 to 2014, at a pace that is about three times the observed rate of global warming, resulting in significant permafrost thawing and glacier retreat (Yao et al., 2019). These warming-induced environmental issues may enhance the productivity of grasslands and soil respiration, as well as substantial emissions of old CO2 (Ding et al., 2017;Du et al., 2018). However, there are many uncertainties in the Tibetan Plateau carbon cycle under climate change, which are 70 further amplified by human activities (Chen et al., 2013;Piao et al., 2019). Previous studies on Tibetan Plateau carbon fluxes mainly relied on ground-based measurements; the application of satellite data is promising to compensate for ground-based findings and reveal new information about the carbon cycle.
In this study, we used the Tan-Tracker inversion system (Tian et al., 2014a(Tian et al., , 2014b to generate a global dataset containing terrestrial ecosystem carbon fluxes and ocean carbon fluxes from 2015 to 2019 by assimilating OCO-2 XCO2 retrievals. The 75 Tan-Tracker inversion system adopts a dual-pass inversion strategy and the nonlinear least squares four-dimensional variational data assimilation (NLS-4DVar) method (Tian and Feng, 2015;Tian et al., 2018). The dual-pass strategy well distinguishes simulated CO2 errors from uncertainties in both the initial CO2 concentrations and carbon fluxes. The NLS-4DVar method comprising a fast localization module  is able to assimilate large amounts of observations within an inversion window and obtain high accuracy inversion with low computational complexity. Compared 80 with previous versions of Tan-Tracker, we improved the dual-pass strategy of Tian et al. (2014a), embedded the upgraded ensemble update scheme (Tian et al., 2020) to guarantee the system's stable operation, and updated the GEOS-Chem transport model (Nassar et al., 2010) to v12.9.3 (http://www.geos-chem.org). In the dataset, we present optimized global daily gridded terrestrial ecosystem and ocean carbon fluxes with a spatial resolution of 2° latitude × 2.5° longitude. Corresponding fluxes from other sectors are also available (e.g., fossil fuel emissions, biomass burning emissions) to infer the global carbon budget. 85 Besides, global model-simulated CO2 concentrations forced by optimized fluxes are provided. The optimized fluxes were comprehensively validated against independent OCO-2 retrievals and TCCON XCO2 observations. Data users can calculate the preferred carbon fluxes at a global or regional scale according to their needs for further analysis or comparison with other sources of flux information. This paper is organized as follows: Sect. 2 describes the methods and data employed; Sect. 3 and 4 describe the dataset 90 and its characteristics, respectively; Sect. 5 evaluates the posterior fluxes against OCO-2 retrievals and TCCON observations; Sect. 6 discusses the strengths and limitations of the method and dataset, as well as future developments; Sect. 7 introduces the data availability; the conclusion is presented in Sect. 8. Figure 1 shows the frame of the Tan-Tracker inversion system. The fluxes to be optimized are terrestrial ecosystem carbon fluxes (i.e., net ecosystem exchange, NEE) and atmosphere-ocean carbon exchange, whereas other fluxes (e.g., fossil fuel emissions and biomass burning emissions) are assumed to have no errors. This assumption is commonly adopted in global flux inversion systems (Peters et al., 2007;Nassar et al., 2011;Crowell et al., 2019;Jiang et al., 2021). The core of Tan-Tracker is https://doi.org/10.5194/essd-2021-210 a novel dual-pass inversion strategy with modifications to Tian et al. (2014aTian et al. ( , 2014b and the NLS-4DVar method (Tian and 100 Feng, 2015;Tian et al., 2018). This dual-pass strategy performs assimilations twice within an inversion window. First, the CO2 assimilation pass is performed using the prior fluxes within a short time (5 days) to obtain the optimized initial CO2 concentrations; then, the flux assimilation pass is performed using the optimized initial CO2 concentrations within a longer time (14 days) to obtain the optimized fluxes. This strategy well distinguishes the CO2 simulation deviation caused by errors from initial CO2 concentrations and carbon fluxes. When assimilation of the current inversion window is complete, the prior 105 initial CO2 concentrations before assimilation and the posterior fluxes are used to actuate the atmospheric chemistry transport model, which simulates the period from the initial moment of the current window to the initial moment of the next window to obtain the initial CO2 concentrations of the next window. Then, the previous steps are repeated, performing cyclic data assimilation.

Tan-Tracker inversion system 95
The Tan-Tracker optimization procedure is realized using the NLS-4DVar method (Tian and Feng, 2015;Tian et al., 110 2018). The optimization minimizes the following cost function: where is the state variable (i.e., the variable to be optimized) and is the prior state variable, is the prior error covariance matrix, is the observed CO2 concentrations, ℎ(•) is the observation operator, and is the observation error covariance matrix. The state variable of the CO2 assimilation pass is CO2 concentrations, whereas that of the flux assimilation 115 pass is the scaling factor, that is: where , is the posterior carbon flux, , is the posterior scaling factor, and , is the prior carbon flux, = 1,2, … , , = 1,2, … , , and I and J denote the numbers of grids in longitude and latitude, respectively. Both of fluxes and scaling factors are gridded variables. 120 The Tan-Tracker inversion system uses the global three-dimensional (3D) atmospheric chemistry transport model GEOS-Chem to simulate CO2 concentrations in the atmosphere (Suntharalingam et al., 2004;Nassar et al., 2010. The spatial resolution of GEOS-Chem is 2° latitude × 2.5° longitude, with 47 layers in the vertical direction. The model is driven by the various carbon fluxes, initial CO2 concentrations, and Modern-Era Retrospective analysis for Research and Applications 2 (MERRA-2) meteorological data provided by the Goddard Earth Observing System (GEOS) of the National Aeronautics and 125 Space Administration (NASA) Global Modeling and Assimilation Office (Gelaro et al., 2017). The observations assimilated are OCO-2 column CO2 (XCO2) retrievals (O'dell et al., 2012; O'dell et al., 2018). In this study, we optimized the global NEE and atmosphere-ocean carbon fluxes during 2015-2019 using the Tan-Tracker inversion system. The configurations of Tan-Tracker are described in Table 1.

Prior CO2 fluxes
The various carbon fluxes required to drive GEOS-Chem to simulate atmospheric CO2 concentrations included NEE, ocean carbon fluxes, fossil fuel emissions, biomass burning emissions, ship emissions, aviation emissions, biofuel burning emissions, 135 and chemical production of CO2 ( Table 2). The 3D chemical source of CO2 includes the oxidation of CO, CH4 and other carbon gases. As fossil fuel emission and biospheric inventories used to count CO2 precursor species (CO, CH4 and other carbon gases) as direct CO2 emissions at the surface, whereas real chemical production of CO2 from these species occurs at different times and locations from emission, model implementation of CO2 chemical production requires adjustments to these surface emission inventories (Nassar et al., 2010). 140 The optimized fluxes were NEE and ocean-atmosphere exchange. The prior NEE was obatined from CarbonTracker CT2019B (Jacobson et al., 2020), which provides data from January 2000 until March 2019, with a spatial resolution of 1° × 1° and temporal resolution of 3 h. CarbonTracker is a CO2 measurement and modeling system developed by the National Oceanic and Atmospheric Administration (NOAA) that tracks CO2 sources and sinks worldwide. CarbonTracker uses groundbased atmospheric CO2 observations from a host of collaborators and simulates atmospheric transport to estimate surface CO2 145 fluxes (Peters et al., 2007). The prior ocean carbon fluxes were obtained from Takahashi et al. (2009), which contains monthly data from 2000 to 2013 with a horizontal resolution of 4° × 5°. As the original ocean flux dataset does not cover the simulation period of this study, we used NOAA Marine Boundary Layer (MBL) Reference data of CO2 (https://www.esrl.noaa.gov/gmd/ccgg/mbl/index.html, last accessed: 26 April 2021) to extend the ocean flux to 2018. NOAA MBL Reference is a data product derived directly from measurements of weekly air samples collected by the Cooperative Air 150 Sampling Network; the zonally-averaged MBL reference can be used to identify large-scale trends in atmospheric greenhouse gases (Conway et al., 1994). According to the correlation of MBL CO2 concentration and ocean carbon fluxes described in Takahashi et al. (2009), we constructed the ocean fluxes for 2014-2018 using corresponding MBL data. Due to a lack and deficiency of data for 2019, prior NEE and ocean carbon fluxes for 2018 were used as 2019 prior estimates.

OCO-2 column CO2 observations
The OCO-2 Level 2 Lite (v9r) carbon dioxide column-averaged dry-air mole fraction (XCO2) products retrieved by the Atmospheric Carbon Observations from Space (ACOS) algorithm (Connor et al., 2008) were used in this study (O'dell et al., 2012; O'dell et al., 2018). The OCO-2 satellite carries high-resolution spectrometers that return high-precision measurements of reflected sunlight received within the CO2 and O2 bands in the short-wave infrared spectrum (Crisp et al., 2012). The OCO-160 2 spacecraft flies in a 705-km-altitude sun-synchronous orbit, with a 16-day (233 orbits) ground track repeat cycle. OCO-2 has a footprint of 1.29 × 2.25 km 2 at nadir mode and acquires eight cross-track footprints, creating a swath width of 10.3 km. After filtering and bias correction, the OCO-2 XCO2 retrievals (v7) agree well when aggregated around and coincident with groundbased TCCON data in nadir, glint, and target observation modes, with absolute median differences of < 0.4 ppm and rootmean-square differences of < 1.5 ppm . O'dell et al. (2018) demonstrated that OCO-2 retrieval error 165 variance with respect to TCCON was reduced by 20 % over land and 40 % over ocean between versions 7 and 8, with improved nadir and glint observation consistency over land.
Considering the quality and large amount of OCO-2 XCO2 observations, we performed quality control and observation thinning to filter out some retrievals (Han and Tian, 2019). The observation preprocessing procedure included three steps. In OCO-2 Lite product. xco2_quality_flag = 0 denotes the retrieval quality is good whereas xco2_quality_flag = 1 denotes the quality is bad. We selected only observations with xco2_quality_flag = 0. The xco2_uncertainty parameter represents the posterior uncertainty of XCO2 retrieval, and observations with xco2_uncertainty > 1.5 ppm were filtered out. In the second step, differences between OCO-2 XCO2 and corresponding model-simulated XCO2 were considered. The observation was discarded when the absolute difference exceeded 2 ppm, or the difference was beyond the range of two standard deviations 175 above or below the daily mean difference. In the third step, if the number of observations remaining after the first two filtering steps exceeded 20,000, then the observation too close to the previous one was eliminated to maintain a number of observations around 20,000. Figure 2 shows the OCO-2 observations in the first inversion window before and after quality control and thinning. After the preprocessing procedure, extremely large and small observations were excluded. Observations over the Antarctic were generally discarded due to poor quality (xco2_quality_flag = 1), perhaps because of a small solar altitude angle 180 (O'dell et al., 2018). The xco2_uncertainty parameter provided by the OCO-2 Level 2 Lite product was used to construct the observation error covariance matrix R; observation errors were assumed to be independent of each other. We used an observation operator to project the model-simulated 3D atmospheric CO2 concentration to the satellite column-averaged concentration. According to Connor et al. (2008), the observation operator is expressed as follows: ( 3)  190 where XCO 2 is the model simulated column concentration, XCO 2 is the prior column concentration, is the pressure weighting function, is the averaging kernel matrix, is the modeled CO2 profile, and is the prior profile.

Uncertainty quantification and ensemble update
As a hybrid method based on 4DVar and the ensemble Kalman filter (EnKF), the NLS-4DVar method applies an ensemble to approximate the prior error covariance matrix  as follows: 195 and is the number of prior perturbations. According to Evensen (2009), the ensemble of posterior perturbations after assimilation is calculated as follows: and Ф is a random orthogonal matrix, = ℎ( ) − ℎ( ). Then, the prior ( ) and posterior ( ) error covariance 205 matrices can be calculated using and , respectively, according to Eq. (4). and represent the prior and posterior uncertainties of scaling factors, respectively. In the flux assimilation pass, the prior perturbations of the scaling factor in the first inversion window were obtained through historical sampling, and prior perturbations in the following windows were generated through ensemble updating. Compared with previous versions (Tian et al., 2014a(Tian et al., , 2014b, the current Tan-Tracker system adopts an upgraded ensemble update method that well maintains the ensemble dispersion during cyclic 210 assimilation (Tian et al., 2020). The new ensemble update equation is as follows: where , +1 is the prior perturbations of the (r + 1)th window and , is the prior perturbations of the rth window.
where we assume that the flux uncertainties are time independent.
The uncertainty reduction (UR) rate is an important indicator of inversion system performance. Following Deng et al. 220 (2007), the UR is defined as follows: The magnitudes of the UR rates indicate the degree to which the final inverted fluxes are constrained by atmospheric CO2 observations.

Evaluation of posterior fluxes 225
Generally, it is impossible to directly verify the posterior fluxes due to the lack of flux observations. Instead, we compared the  2021)). The evaluation of posterior fluxes consists of two parts. First, the simulated CO2 concentrations were compared with unassimilated OCO-2 XCO2 retrievals, that is, independent observations. Second, the simulated CO2 concentrations were validated by comparisons with TCCON XCO2 observations. TCCON is a network of 230 ground-based Fourier Transform Spectrometers that record direct solar spectra in the near-infrared spectral region. From these spectra, accurate and precise column-averaged CO2 abundances are retrieved and reported (Wunch et al., 2011). In this study, we used GGG2014 version data from 30 TCCON sites (Wunch et al., 2015); the site locations are shown in Fig. 3.
To statistically evaluate the inversion results, we calculated four statistics: the root mean square error (RMSE), mean bias (BIAS), mean absolute error (MAE), and correlation coefficient (CORR). These four statistics represent the bias, error variance, 235 and linear correlation between the simulated and observed CO2 concentrations, and are formulated as follows: (13)

Dataset description
Here we present a global gridded dataset containing various carbon fluxes and simulated CO2 concentrations from 2015 to 2019 (inclusive). The posterior NEE and ocean carbon fluxes were generated using the Tan-Tracker inversion system inferred from OCO-2 XCO2 retrievals. Apart from these two optimized fluxes, all other emissions listed in Table 2

Global carbon budget
The annual atmospheric CO2 growth rate (AGR) is the net difference between total CO2 emissions and uptake over land and ocean (Jiang et al., 2021). Table 3  CO2 exchange based on atmospheric measurements using atmospheric transport inversion. In this study, we used version s10oc_v2020 of the JCS product. Table 3 shows that prior AGR evaluations were generally greater than those of GCB2020, except for 2018, contributing to a positive annual mean deviation of 0.43 PgC yr -1 , mainly due to prior results for 2016 and 2017, which were 0.61 and 1.14 PgC yr -1 , respectively. The positive sign of the deviation indicates that prior carbon sources were overestimated, or sinks were underestimated. After absorbing the satellite observations, the posterior AGR weakened 275 every year, with an annual mean of 5.33 PgC yr -1 , which was within the uncertainty range of the GCB2020 estimate (5.49 PgC yr -1 ). The AGRs of CT2019B were generally smaller than those of GCB2020. A comparison of the AGR of CT2019B and GCB2020 throughout 2015-2018 showed a negative deviation of 0.12 PgC yr -1 . The AGRs of JCS during 2015-2017 were quite close to the GCB2020 evaluations. The largest deviation in the JCS data occurred in 2019, with a positive deviation of 0.4 PgC yr -1 compared with GCB2020. Overall, posterior AGRs were clearly refined compared to the prior, indicating the 280 effectiveness of the flux inversions. Notably, although the annual mean result is comparatively accurate, the annual AGRs still require improvement, which imposes higher requirements on inversion systems.  sink had an annual value of -3.33 PgC yr -1 , slightly stronger than the prior value of -3.29 PgC yr -1 . The ocean formed a sink for CO2 because the gas is highly soluble in seawater, a property that is critical to the ocean's ability to absorb atmospheric CO2 (Mckinley et al., 2017). During the past two decades, sufficient evidence has been gathered to support the conclusion that the global integrated ocean carbon sink has grown as atmospheric carbon content increases (Khatiwala et al., 2013;Watson et 300 al., 2020). Posterior ocean sinks showed an increasing trend, especially from 2015 to 2018. The GCB2020 ocean flux estimates were quite stable, with an annual mean of -2.60 PgC yr -1 and standard deviation of 0.06 PgC yr -1 . Posterior ocean sinks were universally greater than the GCB2020 estimates, with the largest deviation occurring in 2018 (1.08 PgC yr -1 ). Posterior ocean fluxes were more consistent with the CT2019B estimates; both were stronger than the GCB2020 estimates, and those of CT2019B were even stronger than the posterior fluxes. Compared with the posterior and CT2019B results, JCS had smaller 305 ocean sinks and better coherence with GCB2020.

310
Combining the NEE and ocean fluxes, the total posterior annual mean sink was -7.40 PgC yr -1 , which is strengthened compared to the prior estimate of -6.82 PgC yr -1 , and contributed to a decrease in net carbon emissions that was consistent with the aforementioned AGR estimates. Notably, CO2 emissions and uptake were classified differently among different inversion systems and GCB2020, preventing NEE and ocean flux estimates from being perfectly equivalent. GCB2020 contains fossil CO2 emissions, land-use change emissions, and terrestrial and ocean CO2 sinks. CT2019B includes fossil CO2 315 emissions, biomass burning emissions, and terrestrial and ocean CO2 sinks. In this study, apart from the above mentioned sources and sinks, we additionally considered emissions from shipping, aviation, biofuel burning, and chemical production from CO oxidation (Table 2), providing a more comprehensive consideration of the global carbon cycle. Table A1 shows the annual CO2 emissions and uptake from all considered sectors. The uncertainties of carbon emissions other than NEE and ocean fluxes are unclear and therefore omitted, but these uncertainties actually exist and should be well described. Both the 320 improvements of inversion methods and accompanying emissions estimates are necessary to obtain more accurate NEE and ocean carbon fluxes. Unlike terrestrial ecosystems, the ocean acted as a carbon sink year-around. The prior annual mean ocean uptake values 335 for winter, spring, summer, and autumn were -0.99, -0.86, -0.68, and -0.77 PgC, respectively. The inversion modifications were not significant, generating posterior estimates of -1.01, -0.89, -0.69, and -0.75 PgC, respectively. The largest and smallest uptakes occurred in winter and summer, respectively. Seasonal changes in the ocean flux field are attributed to the combined effects of seasonal changes in water temperature, biological utilization of CO2, water mixing, and wind speeds (Takahashi et al., 2002). 340  Figure 6 shows the global 5-year mean monthly prior and posterior uncertainties for NEE and ocean fluxes. Monthly means 345 were calculated because uncertainties exhibit clear monthly characteristics. Figure A1 includes the daily uncertainties from prior and posterior estimates. The uncertainties were calculated from the ensemble of perturbations and prior fluxes using Eqs.

Uncertainty evaluation
(4), (10), and (11). As the prior ensemble dispersion was maintained when updated over time, the variation in uncertainties mainly depended on the prior fluxes and flux correlations between different grids. The pattern of NEE monthly uncertainties was consistent with NEE monthly variation (Fig. 5a), with the largest sinks (June, July, and August) contributing to the largest 350 uncertainties in the same months. For the ocean fluxes, uncertainties were much smaller than NEE. Because the flux initial ensemble of perturbations was generated through historical sampling, small uncertainties indicated relatively stable ocean fluxes over time. Large ocean uncertainties were concentrated in winter and summer. Large uncertainties in winter were related to the large ocean sink during that season, whereas large uncertainties in summer were caused by the high spatial correlations between gridded fluxes. 355 Figure 7 shows the daily UR rates for NEE and ocean fluxes from 2015 to 2019, which varied prominently over time.
The estimated URs depended strongly on the number of observations and specification of the observation errors, as well as the a priori error covariance matrix (Deng et al., 2014;Jiang et al., 2021). The UR values reflect the decrease of posterior uncertainties constrained by OCO-2 XCO2 retrievals. The NEE URs ranged from 0.00 % to 27.90 %. The 0.00 % URs were caused because no OCO-2 retrievals were provided in the inversion windows from April 21, 2015 to May 6, 2015 and from 360 August 10, 2017 to September 6, 2017. Ocean URs were comparable to those over land, ranging from 0.00 % to 25.13 %. Due to the monthly prior ocean fluxes and 14-day inversion window, dates within the same month and the same inversion window shared common URs. Table 4 presents the global annual and annual mean prior uncertainties as well as the UR rates of NEE and ocean sinks during 2015-2019. Temporal correlations in the error covariance matrix were neglected, resulting in discrepancies among the annual uncertainties and URs (Deng et al., 2014). The annual differences in uncertainty and UR 365 decreased greatly when integrated over a year. The annual prior uncertainty of NEE was ~0.55 PgC yr -1 , and URs ranged from 7.28 % in 2015 to 8.45% in 2018. The annual prior uncertainties associated with ocean sinks were much smaller than those of NEE, and the annual total uncertainties of different years were probably consistent with each other. URs for ocean sinks were generally higher than those of NEE, with larger interannual variability, ranging from 6.59 % in 2014 to 10.49 % in     North America, northern South America, small areas in southern South America, northern and southern Europe, western Russia, scattered areas of northwestern and southwestern China, the Malay Archipelago, and the southern coast of Australia. After assimilating OCO-2 XCO2 retrievals, sinks of central and northeastern North America were impaired whereas sinks of western, southeastern, and southern North America were strengthened. Meanwhile, sinks of the southern Amazon and eastern Brazil were weakened. In Europe, the source was weaker and sink was stronger than prior ones. A widespread weakening of sink 385 occurred in central Russia. Sinks of northeastern and eastern China as well as Australia were reinforced, sinks of Indochinese Peninsula were weakened nevertheless.

Regional fluxes
By contrast, the ocean carbon flux distribution was much simpler, with a clear characteristic that fluxes varied with latitude. The ocean flux sources mainly occurred in tropical oceans and high-latitude Southern Ocean, and the equatorial Pacific was the most prominent source area for atmospheric CO2. Sinks mainly occurred in mid-latitude regions of both 390 hemispheres and high latitude Northern Ocean. After inversion, the overall pattern of ocean fluxes was preserved. The tropical sources were generally impaired, especially the eastern equatorial Pacific. Sinks of North Pacific were strengthened whereas sinks of southern Pacific were weakened. Besides, the northern and southern sinks of Atlantic as well as southern sinks of Indian Ocean were basically weaker than prior estimates. In general, the spatial distribution of NEE was far more complicated than that of ocean carbon fluxes (Fig. 8), and the interannual variation of NEE was also stronger (Fig. A2). Therefore, the 395 regional NEE requires more detailed analysis and attribution. Table 5 lists the prior and posterior NEE for the 11 TransCom land regions. In each region, the prior NEE indicated a carbon sink. The powerful sinks occurred mainly in Asia, Africa, and North America, accounting for 45 %, 27 %, and 18 % of the total carbon sink. Among these regions, Boreal Asia had the strongest sink, followed by Temperate Asia, and the weakest sink occurred in Tropical South America. After inversion, the posterior sinks of most regions were reinforced, except in 400 Tropical South America and Boreal Asia. The European sink showed the largest increase of 0.29 PgC yr -1 . Clear sink enhancement also occurred in Temperate Asia and Northern Africa, of 0.17 and 0.10 PgC yr -1 , respectively. By contrast, Tropical South America shifted from a weak sink to a weak source. Harris et al. (2021) has pointed out that forests in the Brazilian Amazon were a net carbon source of 0.22 GtCO2e yr -1 between 2001 and 2019, and estimated that commodity-driven deforestation was the largest source of gross forest-related emissions. In Boreal Asia, the sink was weakened by 0.14 PgC yr -405 1 , but remained the strongest sink worldwide, followed by Temperate Asia and Northern Africa, accouning for 40 %, 27 %, and 17 % of the posterior global total sink. The interannual variability differed greatly for different regions. The Tropical South America had the most obvious interannual variability, followed by Temperate North America and Southern Africa (Fig. A3).
The flux uncertainties in the Northern Hemisphere were generally higher than those in the Southern Hemisphere because of the more complicated distribution of carbon sources and sinks in the Northern Hemisphere (Fig. 8)

Tibetan Plateau fluxes
The Tibetan Plateau was demonstrated to act as a carbon sink through remote sensing observations, ecosystem models, and atmospheric inversions during the 1980s and 1990s (Piao et al., 2009). Recent flux observations and field soil carbon data have 420 also confirmed the carbon sink function of the Tibetan Plateau (Kato et al., 2006;Li et al., 2016;Ding et al., 2017). The carbon estimates of the Tibetan Plateau reported in this study are consistent with those reported in previous studies. The prior annual mean NEE was -52.65 TgC yr -1 , whereas the posterior NEE was weakened to -49.52 TgC yr -1 during 2015-2019. Figure 9 shows The UR rates on the Tibetan Plateau were comparable to that observed globally and regionally.
The NEE on the Tibetan Plateau showed a clear seasonal cycle. The ecosystem acted as a carbon sink from May to 430 September and as a carbon source from January to April and October to December, with the peak monthly uptake occurring in July. The prior annual mean sink from May to September and source from January to April and October to December were estimated to be -101.65 and 49.00 TgC yr -1 , which contributed to forming a net annual sink. The corresponding posterior values were -104.40 and 54.88 TgC yr -1 , respectively. Both the posterior seasonal sink and source were amplified, whereas the increased amplitude of the source was stronger than that of the sink, resulting in a decrease in the posterior annul net sink.   have reported clear regional heterogeneity in the spatial distribution of carbon sinks on the Tibetan Plateau, with significantly stronger carbon sinks in eastern alpine meadows than in western alpine steppes (Jin et al., 2015;Ding et al., 2017). Our 440 estimates were consistent with these findings. Overall, the Tibetan Plateau ecosystem tended to take up carbon; the eastern plateau was the strongest carbon sink, whereas small parts of the southeastern and northern regions emitted CO2. The eastern distribution was dramatically modified after inversion, with impaired central sinks and amplified surrounding sinks. The central and northern sinks of the plateau were weakened, and the western and southern sinks were slightly strengthened. This spatial heterogeneity in the distribution of carbon sources and sinks on the Tibetan Plateau is related to water conditions. 445 Alpine meadows face lower drought stress than alpine steppes; therefore, the increased vegetation productivity of alpine meadows due to warming is greater than that of alpine steppes, in turn leading to more organic carbon fixation by vegetation input into the soil (Ding et al., 2017;Du et al., 2018;Hopping et al., 2018). The results of field control experiments on the Tibetan Plateau have also shown that warming promotes the net exchange of carbon fluxes in meadow ecosystems with better water conditions, whereas it inhibits carbon fixation in arid steppes (Ganjurjav et al., 2018).

Evaluation using independent OCO-2 observations 455
We evaluated model simulated XCO2 against independent OCO-2 retrievals to indirectly verify the posterior NEE and ocean carbon fluxes. Figure 11 compares prior and posterior simulated XCO2 with independent OCO-2 XCO2 observations. The simulated and observed XCO2 values were daily averages; daily statistics were calculated using all observations within a single day. The prior simulated XCO2 were generally higher than OCO-2 observations, and the deviation accumulated with forward model integration. Since 2016, RMSE, BIAS, and MAE remained above 1.5, 1.0, and 1.5 ppm, respectively. By 2018, these 460 statistics reached 2.3, 2.0, and 2.0 ppm, respectively. Table 6  simulated CO2 concentrations decreased, especially at the surface (Fig. A4); posterior RMSE, BIAS, and MAE were greatly improved, with RMSE decreasing from 2.12 to 1.27 ppm, BIAS from 1.62 to 0.33 ppm, and MAE from 1.80 to 0.95 ppm.
There were no significant changes in CORR before or after inversion. These results show that satellite observations were well absorbed by the inversion system, such that the NEE and ocean carbon fluxes were corrected effectively. 465   Table 7 lists the statistics for simulated XCO2 and TCCON observed XCO2. The prior model-simulated XCO2 were generally higher than TCCON observations except at the HeiFei site, and BIAS exceeded 2 ppm at many sites (e.g., Ny Ålesund, Garmisch, and Nicosia sites). RMSE, BIAS, and MAE between prior simulated and observed XCO2 were 1.89, 1.51, and 1. Hemisphere, due to the more complex carbon flux pattern in the Northern Hemisphere . After inversion, the global carbon sink was reinforced, contributing to a more reasonable atmospheric CO2 simulation. The posterior RMSE, BIAS, and MAE were 1.06, 0.12, and 0.81 ppm, decreasing by 44 %, 92 %, and 50 %, respectively, compared to the prior statistics, demonstrating the optimization of posterior fluxes through absorbing OCO-2 satellite observations. Figure 12 shows the time series of prior and posterior simulated XCO2 and observed XCO2 at 30 TCCON sites. The 480 global CO2 concentrations generally increased over time. In the Northern Hemisphere, CO2 concentrations showed a strong seasonal cycle. The strong source in winter caused high CO2 concentrations, whereas the strong sink in summer caused low CO2 concentrations. In the Southern Hemisphere, the seasonal cycle was much weaker than in the Northern Hemisphere, and became less clear as latitude increased. The Southern Hemisphere is mainly covered by ocean, and the seasonal variation in ocean fluxes was much less pronounced than NEE (Fig. 5), with a less complex spatial distribution than that over land (Fig. 485 8), contributing in turn to the weak seasonal cycle of CO2 concentrations in the Southern Hemisphere. Generally, posterior simulated XCO2 characterized the temporal variation characteristics of CO2 observations. At most sites, prior simulated XCO2 were larger than TCCON observations. The overestimation of winter CO2 concentrations in the Northern Hemisphere was very clear, especially in the mid-and high latitudes. After inversion, the simulated concentrations agreed better with TCCON observations (Fig. 13). The overestimation of winter CO2 concentrations 490 at mid-and high latitudes of the Northern Hemisphere was also well mitigated due to the weakening of winter carbon sources (Sec. 4.1.2). The inversion results were not ideal at the Jet Propulsion Laboratory, Caltech, and HeFei sites. At these three sites, prior simulated XCO2 were close to the TCCON observations, whereas posterior simulations were about 1 ppm smaller than the TCCON observations, suggesting that carbon sinks were overestimated or carbon sources were underestimated. This may be related to biases of OCO-2 retrievals (O'dell et al., 2018). Besides, the resolution of the Tan-Tracker inversion system is 2° 495 latitude × 2.5° longitude; therefore, it cannot achieve ideal results at every site worldwide. Improvements of inversion resolution and observation quality are necessary to infer fluxes at finer regional scales.

Discussion
The atmospheric CO2 observations assimilated in this study were OCO-2 XCO2 retrievals. Although satellite observations have better spatial coverage than ground-based observations, they remain unable to continuously monitor CO2 concentrations with global coverage, resulting in observational gaps. Spatial and temporal gaps in ground-and space-based observations can introduce artifacts into flux estimates, leading to difficulties in constraining carbon fluxes at regional scales (Basu et al., 2018;510 Byrne et al., 2017;Liu et al., 2014). Advances in satellite retrieval algorithms allow significant improvements in the consistency of space-and surface-based flux constraints (Byrne et al., 2020). The simultaneous assimilation of ground-and space-based atmospheric CO2 observations will be carried out in the Tan-Tracker inversion system, expecting more precise estimation of regional fluxes.
In this study, we optimized terrestrial ecosystem carbon fluxes and ocean carbon fluxes, while assuming no error in other 515 emission inventories. As the largest emission sector, global fossil CO2 emissions are estimated with an uncertainty at ±5 % according to GCB2020 (Friedlingstein et al., 2020). However, uncertainties of national bottom-up fossil fuel emissions vary across countries according to data collection and management practices (Andres et al., 2014;Marland, 2008;Oda et al., 2018a).
Inaccuracies in fossil CO2 emissions are added to NEE optimization based on our assumption, which may cause bias in NEE estimates. Fossil CO2 emissions are anthropogenic, whereas NEE are natural emissions or uptakes; both have different 520 emission characteristics and spatial distributions. In a future study, we plan to simultaneously optimize fossil emissions and NEE to distinguish errors from anthropogenic and natural sectors.
The treatment of various errors during inversion also requires improvement. The transport model is assumed to have no error, and all errors associated with simulated CO2 are attributed to the initial CO2 concentrations and carbon fluxes. An independent and comprehensive quantification of the systematic errors of transport models is required to distinguish CO2 525 simulation errors from different sources. To account for prior flux uncertainties, we used 36 samples to approximate the prior error covariance matrix and updated the matrix over time. To date, there is no consensus on the accurate calculation of prior uncertainties. In this study, when gridded flux uncertainties were integrated over space and time, only the spatial correlation of gridded fluxes was considered, whereas the temporal correlation was omitted. However, the gridded fluxes were correlated in time according to their diffusive and accumulative characteristics, resulting in underestimated monthly or yearly 530 uncertainties. Further efforts are required to improve confidence in the resulting posterior fluxes and corresponding uncertainties.
The global-scale flux inversion performed in this study had a spatial resolution of 2° latitude × 2.5° longitude. Inversions with this resolution are feasible and effective for large and global regions. However, the requirements for fine and accurate regional inversions are increasing, and the present resolution cannot meet the stricter standards anticipated in the future. 535 Therefore, we plan to perform finer inversions at regional scales to obtain a better understanding of regional carbon cycles.

Data availability
The dataset is available in the National Tibetan Plateau/Third Pole Environment Data Center and can be accessed at https://doi.org/10.11888/Meteoro.tpdc.271317 (Jin et al., 2021). As the satellite XCO2 retrievals, carbon fluxes, and meteorological data are persistently improving and updating, we plan to update the dataset annually in the future, aiming to support current 540 scientific research and policy making.

Conclusion
Here we present a global and regional resolved NEE and ocean carbon flux dataset during 2015-2019. The dataset was generated by the Tan-Tracker inversion system constrained by OCO-2 XCO2 retrievals. We qualitatively validated the posterior fluxes by comparing the posterior simulated CO2 concentrations with OCO-2 independent retrievals and TCCON 545 observations. After inversion, posterior RMSE, BIAS, and MAE for simulated and OCO-2 XCO2 were greatly improved, with RMSE decreasing from 2.12 to 1.27 ppm, BIAS from 1.62 to 0.33 ppm, and MAE from 1.80 to 0.95 ppm. The posterior RMSE, BIAS, and MAE for simulated and TCCON XCO2 were 1.06, 0.12, and 0.81 ppm, decreasing by 44 %, 92 %, and 50 %, respectively, compared to the prior statistics. Both validations demonstrated the optimization of posterior fluxes through absorbing OCO-2 satellite observations. We described the characteristics of the dataset at the global, regional and Tibetan 550 Plateau scales from aspects of carbon budget, annual and seasonal variations, and spatial distribution. Also, the posterior fluxes were assessed through comparisons with other flux estimates, such as GCB2020, CT2019B, and JCS data. The posterior 5year annual mean global AGR was 5.35 PgC yr -1 and within the uncertainty of the GCB2020 estimate. The posterior annual mean NEE and ocean carbon fluxes were -4.07 and -3.33 PgC yr -1 , respectively. Regional fluxes are analysed based on TransCom partition. All 11 land regions acted as carbon sinks apart from Tropical South America, which was almost neutral. 555 The strongest carbon sinks were seen in Boreal Asia, followed by Temperate Asia and North Africa. The Tibetan Plateau ecosystem was estimated as a carbon sink as a whole, absorbing -49.52 TgC yr -1 for the annual mean. The sinks in the eastern alpine meadows were much stronger than that in the western steppes, while sporadic areas in the southeastern and northern regions acted as sources. This dataset can help understand global and regional carbon distribution and its variability, support the formulation of climate policies, and make well-informed projections under future climate change. 560

Appendix A
In the Appendix, we include the table and figures to support the main text. Surface correction -1.14 -1.14 -1.14 -1.