A long-term (2000–2020) global 0.05° continuous atmospheric carbon dioxide dataset (GCXCO2) combining OCO-2 observations and model simulations based on stack learning
Abstract. High-accuracy atmospheric carbon dioxide (CO2) concentration data are critical for understanding the global carbon cycle, but a high-resolution CO2 product with long-term, globally seamless coverage is still lacking. In this study, a global continuous 8-day XCO2 (column-averaged CO2 dry air mole fraction) product (GCXCO2) was reconstructed at a spatial resolution of 0.05° from 2000 to 2020, based on OCO-2 satellite data. An ensemble machine learning stacking regression model, which combines light gradient boosting machine (LGBM), extreme gradient boosting (XGB), extremely randomized trees (ETR), gradient boosting regression (GBR), and random forest (RF), was used to model the relationships between XCO2 data and auxiliary satellite, model simulation, and meteorological data. A dynamic normalization strategy was developed to handle the large temporal variation in XCO2 and to allow the prediction model to be extended in time. Multiple validation methods were applied to comprehensively evaluate the spatial and temporal generalization ability of the model and product. The 10-fold cross-validation shows satisfactory overall results at the global scale, with R² = 0.974 and root-mean-square error (RMSE) = 0.551 ppm (parts per million). Further spatial extension and temporal prediction experiments also showed that dependable results can be obtained in regions and time periods without valid OCO-2 satellite observations (R² = 0.958 and R² = 0.886, respectively). Compared against Total Carbon Column Observing Network (TCCON) ground station observations, the GCXCO2 product performs better than the model simulation data, offering both better accuracy and higher spatial resolution. Based on the GCXCO2 product, an upward trend of approximately 2.09 ppm/year is found for global XCO2 between 2000 and 2020, and significant differences are found between the Northern and Southern hemispheres in different seasons.
This product may well be the first remote sensing-based global high-precision long-term XCO2 dataset, which will help advance the understanding of climate change and carbon balance. The dataset can be obtained freely at https://doi.org/10.5281/zenodo.10083102 (Guan and Sun, 2023).