A long-term (2000–2020) global 0.05° continuous atmospheric carbon dioxide dataset (GCXCO₂) combining OCO-2 observations and model simulations based on stack learning

Guan, Xiaobin; Sun, Zhihao; Chu, Dong; Xie, Guanglei; Wang, Yuchen; Shen, Huanfeng

doi:10.5194/essd-2023-465

Preprints

https://doi.org/10.5194/essd-2023-465

17 Nov 2023

Status: this discussion paper is a preprint. It has been under review for the journal Earth System Science Data (ESSD). The manuscript was not accepted for further review after discussion.

Abstract. High-accuracy atmospheric carbon dioxide (CO₂) concentration data are critical for understanding the global carbon cycle, but a high-resolution CO₂ product with long-term, globally seamless coverage is still lacking. In this study, a global continuous 8-day XCO₂ (column-averaged CO₂ dry-air mole fraction) product (GCXCO₂) was reconstructed at a spatial resolution of 0.05° from 2000 to 2020, based on OCO-2 satellite data. An ensemble machine learning stacking regression model, which combines light gradient boosting machine (LGBM), extreme gradient boosting (XGB), extremely randomized trees (ETR), gradient boosting regression (GBR), and random forest (RF), was used to model the relationships between XCO₂ data and auxiliary satellite, simulation, and meteorological data. A dynamic normalization strategy was developed to handle the large temporal variation and to allow the prediction model to be extended in time. Multiple validation methods were applied to comprehensively evaluate the spatial and temporal generalization ability of the model and product. The 10-fold cross-validation shows an overall satisfactory result at the global scale, with R² = 0.974 and root-mean-square error (RMSE) = 0.551 ppm (parts per million). Further spatial extension and temporal prediction experiments also showed that dependable results could be obtained in regions and time periods without valid OCO-2 satellite observations (R² = 0.958 and R² = 0.886, respectively). Compared with Total Carbon Column Observing Network (TCCON) ground station observations, the GCXCO₂ product performs better than the model simulation data, with better accuracy and higher spatial resolution. Based on the GCXCO₂ product, an upward annual trend of approximately 2.09 ppm/year is found for global XCO₂ between 2000 and 2020, and significant differences are found between the Northern and Southern hemispheres in different seasons. This product may well be the first remote sensing-based global high-precision long-term XCO₂ dataset, which will help advance the understanding of climate change and carbon balance. The dataset can be obtained freely at https://doi.org/10.5281/zenodo.10083102 (Guan and Sun, 2023).
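
As a rough illustration of the stacking and 10-fold cross-validation described in the abstract, the following Python sketch may help. It is not the authors' implementation. It assumes scikit-learn, uses synthetic stand-in data rather than OCO-2, CAMS, or CarbonTracker inputs, includes only three of the five base learners (ETR, GBR, RF) so that it needs no extra packages, and uses a Ridge meta-learner, which is an assumption because the abstract does not name the final estimator. The dynamic normalization strategy is not reproduced, since the abstract does not describe it.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in data (assumption): four hypothetical predictors and an
# XCO2-like target in ppm. Real inputs would be the auxiliary satellite,
# simulation, and meteorological variables mentioned in the abstract.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (
    400.0
    + 2.0 * X[:, 0]
    - 1.5 * X[:, 1]
    + 0.5 * X[:, 2] * X[:, 3]
    + rng.normal(scale=0.3, size=400)
)

# Base learners: three of the five model families named in the abstract.
base_learners = [
    ("etr", ExtraTreesRegressor(n_estimators=30, random_state=0)),
    ("gbr", GradientBoostingRegressor(n_estimators=30, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
]

# The meta-learner is fitted on out-of-fold predictions of the base learners.
# Ridge is an illustrative choice, not taken from the paper.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)

# 10-fold cross-validation: each sample is predicted by a model that was
# trained without it.
folds = KFold(n_splits=10, shuffle=True, random_state=0)
y_pred = cross_val_predict(stack, X, y, cv=folds)

r2 = r2_score(y, y_pred)
rmse = float(np.sqrt(mean_squared_error(y, y_pred)))
print(f"10-fold CV: R2 = {r2:.3f}, RMSE = {rmse:.3f} ppm")
```

Note that random K-fold splitting only measures interpolation skill among sampled points. The spatial extension and temporal prediction experiments mentioned in the abstract would instead require holding out whole regions or whole time periods.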

Received: 13 Nov 2023 – Discussion started: 17 Nov 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1: 'Comment on essd-2023-465', Anonymous Referee #1, 30 Dec 2023

The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2023-465/essd-2023-465-RC1-supplement.pdf

Citation: https://doi.org/10.5194/essd-2023-465-RC1

RC2: 'Comment on essd-2023-465', Anonymous Referee #2, 01 Jan 2024

In this study, the authors aimed to create a method for filling gaps in satellite carbon dioxide data. Although this is valuable, I believe it falls short of article publication. They filled in missing CO2 data using extra satellite info and simulations. Here are the main problems:

The study treats satellite CO2 as observations. While satellite data can replace actual observations in some cases, it must fully match observations. Past studies found significant differences between OCO-2 measurements and ground observations.

Before building the model, CT assimilation system simulations closely matched observations. The model's high accuracy is because CT's CO2 simulation was precise. So, developing a machine-learning model just to reduce a not-so-significant error rate doesn't make sense.

To show the model's quality, the authors used ground-based data. CAMS and CT assimilation systems use ground data to improve predictions. CT simulations matching observations well is likely because these systems previously used ground-based data to correct simulations.

Similarly, satellite CO2 is used in assimilation systems for correction. The CT and CAMS simulations used as inputs depend on the machine learning model's output (satellite CO2). This dependency raises questions about the model's reliability.

Creating a machine learning model for satellite data using dynamic model simulations contradicts the main advantage of statistical models: low computational cost. In reality, implementing this model means getting outputs from both CT and CAMS, undermining its intended efficiency.

While there are additional aspects to consider, fundamental issues in developing the model for this study hinder further exploration. Regrettably, given these challenges, I anticipate difficulties in publishing this article in a reputable journal such as ESSD.

Citation: https://doi.org/10.5194/essd-2023-465-RC2

RC3: 'Comment on essd-2023-465', Anonymous Referee #3, 06 Feb 2024

Effort as described fails to meet internal metrics and journal expectations. Recommend rejection.

Citation: https://doi.org/10.5194/essd-2023-465-RC3

Supplement

https://doi.org/10.5194/essd-2023-465-supplement

Data sets

Global continuous 0.05 degree atmospheric carbon dioxide dataset (GCXCO2) based OCO-2 satellite, CAMS and CarbonTracker simulation data from 2000 to 2020. Guan Xiaobin. https://doi.org/10.5281/zenodo.10083102

Viewed

Total article views: 2,139 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,499	539	101	2,139	202	130	189

Views and downloads (calculated since 17 Nov 2023)

Month	HTML	PDF	XML	Total
Nov 2023	58	18	1	77
Dec 2023	67	21	5	93
Jan 2024	83	20	2	105
Feb 2024	53	18	6	77
Mar 2024	49	29	5	83
Apr 2024	37	7	9	53
May 2024	44	14	9	67
Jun 2024	53	9	4	66
Jul 2024	35	13	5	53
Aug 2024	36	13	5	54
Sep 2024	34	13	0	47
Oct 2024	21	8	0	29
Nov 2024	32	7	0	39
Dec 2024	40	11	0	51
Jan 2025	30	12	1	43
Feb 2025	18	5	0	23
Mar 2025	36	14	1	51
Apr 2025	31	11	2	44
May 2025	31	10	2	43
Jun 2025	27	18	2	47
Jul 2025	28	20	3	51
Aug 2025	62	14	1	77
Sep 2025	259	24	1	284
Oct 2025	40	38	2	80
Nov 2025	46	28	6	80
Dec 2025	32	34	6	72
Jan 2026	41	12	8	61
Feb 2026	29	7	2	38
Mar 2026	44	27	5	76
Apr 2026	51	20	5	76
May 2026	35	33	1	69
Jun 2026	9	4	1	14
Jul 2026	8	7	1	16

Viewed (geographical distribution)

Total article views: 2,091 (including HTML, PDF, and XML), of which 2,091 have a defined geography and 0 are of unknown origin.

Latest update: 30 Jul 2026

Short summary

Although there are various XCO₂ products, they are all limited in spatial resolution or spatiotemporal coverage. In this study, the first global 0.05° XCO₂ product (GCXCO₂) spanning 21 years is generated by combining OCO-2 satellite observations and model simulations. A dynamic normalization strategy is applied to enhance the temporal extensibility of the stacking learning model, and the product is superior to the model simulations while showing characteristics similar to those of the OCO-2 observations.
