the Creative Commons Attribution 4.0 License.
A long-term (2000–2020) global 0.05° continuous atmospheric carbon dioxide dataset (GCXCO2) combining OCO-2 observations and model simulations based on stack learning
Abstract. High-accuracy atmospheric carbon dioxide (CO2) concentration data are critical for understanding the global carbon cycle, but a high-resolution CO2 product with long-term, globally seamless coverage is still lacking. In this study, a global, continuous, 8-day XCO2 (column-averaged dry-air mole fraction of CO2) product (GCXCO2) was reconstructed at a spatial resolution of 0.05° for 2000 to 2020, based on OCO-2 satellite data. An ensemble machine learning stacking regression model, which combines light gradient boosting machine (LGBM), extreme gradient boosting (XGB), extremely randomized trees (ETR), gradient boosting regression (GBR), and random forest (RF), was used to model the relationships between XCO2 and auxiliary satellite data, model simulation data, and meteorological data. A dynamic normalization strategy was developed to handle the large temporal variation of XCO2 and to allow the prediction model to be extended to other time periods. Multiple validation methods were applied to comprehensively evaluate the spatial and temporal generalization ability of the model and the product. Ten-fold cross-validation shows an overall satisfactory result at the global scale, with R2 = 0.974 and root-mean-square error (RMSE) = 0.551 ppm (parts per million). Further spatial extension and temporal prediction experiments showed that dependable results could also be obtained for regions and time periods without valid OCO-2 observations (R2 = 0.958 and R2 = 0.886, respectively). Compared with Total Carbon Column Observing Network (TCCON) ground station observations, the GCXCO2 product performs better than the model simulation data, with better accuracy and a higher spatial resolution. Based on the GCXCO2 product, global XCO2 shows an upward trend of approximately 2.09 ppm/year between 2000 and 2020, and significant differences are found between the Northern and Southern hemispheres in different seasons.
This product may well be the first remote sensing-based global high-precision long-term XCO2 dataset, which will help advance the understanding of climate change and carbon balance. The dataset can be obtained freely at https://doi.org/10.5281/zenodo.10083102 (Guan and Sun, 2023).
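The stacking regression and the cross-validated R2/RMSE described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation and makes several assumptions: two simple stand-in base learners (ordinary least squares and k-nearest neighbours) replace LGBM, XGB, ETR, GBR, and RF so the sketch runs without third-party packages; the meta-learner is assumed to be a linear regression fitted on out-of-fold base predictions; the data are synthetic; and the dynamic normalization strategy is omitted because the abstract does not describe it. All names (`LinearRegressor`, `KNNRegressor`, `StackingRegressor`, `cross_val_predict`, `r2_rmse`) are hypothetical.

```python
import random
from statistics import fmean

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

class LinearRegressor:
    """Least squares with an unpenalized intercept and a tiny ridge term on the slopes."""
    def __init__(self, alpha=1e-3):
        self.alpha = alpha
    def fit(self, X, y):
        Z = [[1.0] + list(row) for row in X]  # prepend an intercept column
        d = len(Z[0])
        A = [[sum(z[i] * z[j] for z in Z) + (self.alpha if i == j and i > 0 else 0.0)
              for j in range(d)] for i in range(d)]
        self.w = solve(A, [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)])
        return self
    def predict(self, X):
        return [self.w[0] + sum(w * x for w, x in zip(self.w[1:], row)) for row in X]

class KNNRegressor:
    """Mean target of the k nearest training points (squared Euclidean distance)."""
    def __init__(self, k=5):
        self.k = k
    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self
    def predict(self, X):
        preds = []
        for q in X:
            nearest = sorted(self.data, key=lambda xt: sum((a - b) ** 2 for a, b in zip(q, xt[0])))
            preds.append(fmean(t for _, t in nearest[: self.k]))
        return preds

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split it into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_val_predict(make_model, X, y, k=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    pred = [0.0] * len(X)
    for test in kfold_indices(len(X), k, seed):
        held = set(test)
        train = [i for i in range(len(X)) if i not in held]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, model.predict([X[i] for i in test])):
            pred[i] = p
    return pred

class StackingRegressor:
    """Meta-learner fitted on out-of-fold base predictions; bases are then refit on all data."""
    def __init__(self, base_factories, meta_factory, k=5):
        self.base_factories, self.meta_factory, self.k = base_factories, meta_factory, k
    def fit(self, X, y):
        oof = [cross_val_predict(f, X, y, self.k) for f in self.base_factories]
        self.meta = self.meta_factory().fit([list(r) for r in zip(*oof)], y)
        self.bases = [f().fit(X, y) for f in self.base_factories]
        return self
    def predict(self, X):
        cols = [b.predict(X) for b in self.bases]
        return self.meta.predict([list(r) for r in zip(*cols)])

def r2_rmse(y, p):
    """Coefficient of determination (R2) and root-mean-square error."""
    my = fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, (ss_res / len(y)) ** 0.5

if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic predictors and an XCO2-like target in ppm; NOT real OCO-2 data.
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [400 + 3 * a + 2 * b + rng.gauss(0, 0.1) for a, b in X]
    def make_stack():
        return StackingRegressor([LinearRegressor, KNNRegressor], LinearRegressor)
    r2, rmse = r2_rmse(y, cross_val_predict(make_stack, X, y, k=10))
    print(f"10-fold CV: R2 = {r2:.3f}, RMSE = {rmse:.3f} ppm")
```

Fitting the meta-learner on out-of-fold predictions, rather than in-sample ones, keeps it from simply trusting whichever base model overfits most. In practice, scikit-learn's `StackingRegressor` with LightGBM's `LGBMRegressor` and XGBoost's `XGBRegressor` among its estimators would be the more usual route. Here R2 and RMSE are computed over pooled 10-fold out-of-fold predictions, which may differ from how the paper aggregates its folds.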
Status: closed
RC1: 'Comment on essd-2023-465', Anonymous Referee #1, 30 Dec 2023
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2023-465/essd-2023-465-RC1-supplement.pdf
RC2: 'Comment on essd-2023-465', Anonymous Referee #2, 01 Jan 2024
In this study, the authors aimed to create a method for filling gaps in satellite carbon dioxide data. Although this is valuable, I believe it falls short of the standard for publication. They filled in missing CO2 data using additional satellite information and simulations. Here are the main problems:
- The study treats satellite CO2 as observations. While satellite data can replace actual observations in some cases, they must fully match observations. Past studies found significant differences between OCO-2 measurements and ground observations.
- Before building the model, CT assimilation system simulations closely matched observations. The model's high accuracy is because CT's CO2 simulation was precise. So, developing a machine-learning model just to reduce a not-so-significant error rate doesn't make sense.
- To show the model's quality, the authors used ground-based data. CAMS and CT assimilation systems use ground data to improve predictions. The close agreement between CT simulations and observations is likely because these systems previously used ground-based data to correct the simulations.
- Similarly, satellite CO2 is used in assimilation systems for correction. The CT and CAMS simulations used as inputs therefore depend on the machine learning model's target variable (satellite CO2). This dependency raises questions about the model's reliability.
- Creating a machine learning model for satellite data using dynamic model simulations contradicts the main advantage of statistical models—low computational cost. In reality, implementing this model means getting outputs from both CT and CAMS, undermining its intended efficiency.
While there are additional aspects to consider, fundamental issues in developing the model for this study hinder further exploration. Regrettably, given these challenges, I anticipate difficulties in publishing this article in a reputable journal such as ESSD.
- RC3: 'Comment on essd-2023-465', Anonymous Referee #3, 06 Feb 2024
Data sets
Global continuous 0.05 degree atmospheric carbon dioxide dataset (GCXCO2) based OCO-2 satellite, CAMS and CarbonTracker simulation data from 2000 to 2020 Guan Xiaobin https://doi.org/10.5281/zenodo.10083102
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
667 | 210 | 52 | 929 | 54 | 49 | 53 |