the Creative Commons Attribution 4.0 License.
A full-coverage daily XCO2 dataset in China from 2015 to 2020 based on DSC-DF-LGB
Abstract. Carbon dioxide (CO2), a major greenhouse gas, is one of the main drivers of global warming. In recent years, the atmospheric CO2 concentration over China has increased year by year. Satellite observation is the main means of obtaining atmospheric CO2 concentrations. However, current satellite sensors for measuring atmospheric CO2 have narrow observation swaths and cannot provide spatiotemporally continuous atmospheric CO2 concentrations. Therefore, this paper proposes a method for generating a daily full-coverage XCO2 dataset based on the DSC-DF-LGB (Deep Separable Convolutional Neural Network and Deep Forest concatenated with LightGBM) model to obtain the spatiotemporal distribution of atmospheric CO2 over China. The DSC-DF-LGB model was trained to learn the mapping between OCO-2 XCO2 retrievals and related variables (reanalysis XCO2, vegetation parameters, human factors, elevation, and meteorological parameters). The model was then used to generate a daily 0.1° full-coverage XCO2 dataset for China from 2015 to 2020. Cross-validation (CV) indicates that the model performs well in estimating XCO2, with an R2 of 0.9633 and an RMSE of 0.9761 ppm. Validation against independent TCCON sites indicates that the estimated XCO2 is highly consistent with the ground-based TCCON measurements, with an R2 of 0.8786 and an RMSE of 1.5452 ppm. The full-coverage, high-resolution XCO2 dataset can support research on carbon sources and sinks. The dataset is available at https://zenodo.org/doi/10.5281/zenodo.12696674 (Huang, 2024).
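The abstract summarizes model skill with R2 and RMSE. A minimal sketch of how these two metrics are commonly computed follows, assuming R2 is the coefficient of determination with the reference values (OCO-2 XCO2 for CV, TCCON XCO2 for site validation) treated as truth; some studies instead report the squared Pearson correlation, and the exact definition used in the preprint is not stated here. The function names and sample values are illustrative only.

```python
import math
from typing import Sequence


def _check_pairs(reference: Sequence[float], estimate: Sequence[float], min_len: int) -> None:
    # Metrics are computed on collocated pairs, so both inputs must line up.
    if len(reference) != len(estimate) or len(reference) < min_len:
        raise ValueError("inputs must have equal length and enough samples")


def rmse(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the inputs (ppm for XCO2)."""
    _check_pairs(reference, estimate, 1)
    sq = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(sq / len(reference))


def r_squared(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot, treating `reference` as truth."""
    _check_pairs(reference, estimate, 2)
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical collocated values in ppm, not data from the preprint.
    ref = [410.0, 411.5, 412.0, 413.5]
    est = [410.5, 411.0, 412.5, 413.0]
    print(rmse(ref, est), r_squared(ref, est))
```

Both functions expect paired samples in the same order, for example the estimated and reference XCO2 at matching locations and days.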
Status: open (until 16 Mar 2025)
RC1: 'Comment on essd-2024-371', Anonymous Referee #1, 29 Nov 2024
Using satellite observations, reanalysis data, and other auxiliary variables to produce spatiotemporally continuous XCO2 data is meaningful for carbon cycle studies. This paper employs the DSC-DF-LGB model to generate daily XCO2 data for China from 2015 to 2020 based on OCO-2 XCO2 retrievals. Both cross-validation and independent site validation indicate that the generated dataset is highly accurate. However, the novelty of this paper seems insufficient for ESSD, as a considerable amount of recent research in this area has already produced many similar datasets. So where exactly does this paper contribute? Does it achieve higher accuracy, or does the proposed method differ significantly from existing approaches? I do not recommend this paper for publication in ESSD; my major concerns are as follows:
1. What does this paper contribute compared with existing studies and datasets? Machine learning methods are increasingly used for XCO2 reconstruction, and their accuracies do not differ substantially. The daily XCO2 data produced here do not yield new insights, nor are they compared with existing datasets. From a scientific perspective, I believe the novelty of this paper is insufficient.
2. For the proposed DSC-DF-LGB method, the advantages of combining DF and LGB are not well demonstrated. Does the combination achieve higher accuracy than either model alone? Additionally, the role and meaning of DSC are unclear. It seems to be used only for feature extraction, so it is not clear why it needs to be involved in the training process.
3. The results in Section 3.2 are surprising. XCO2 increases year by year, so in principle, using machine learning to predict XCO2 for years before or after the training period will inevitably produce some over- or underestimation. However, the magnitude of the errors described in the paper seems exaggerated, especially given that the prediction period is only one year away from the training period. Also, what is the purpose of this section? Does it imply that the DSC-DF-LGB method lacks generalization capability?
4. The analysis of the results lacks depth. For example, Figure 8 would benefit from quantitative analysis rather than a simple qualitative comparison. In addition, Figure 9 presents the growth in concentration, which many studies have already reported. Does this paper offer any new analysis or findings?
5. Consistent with my concern about novelty, the authors do not clearly state the motivation for their work. The last part of the introduction, which should describe the problem or research objectives this paper addresses, is vague. I suggest reorganizing this section.
Citation: https://doi.org/10.5194/essd-2024-371-RC1
RC2: 'Comment on essd-2024-371', Anonymous Referee #2, 18 Feb 2025
Huang et al. present a methodology for generating a high-resolution, full-coverage daily XCO2 (column-averaged dry-air mole fraction of CO2) dataset for China from 2015 to 2020 using a novel DSC-DF-LGB model. The researchers combined OCO-2 satellite data with various environmental and anthropogenic variables to create the dataset and report strong validation results. The resulting dataset reveals important spatiotemporal patterns in China's atmospheric CO2 concentrations. However, for a manuscript presenting a data product, the method section lacks many necessary details. The scientific scope is unclear, and there are fundamental errors. I do not think this manuscript can be published in its current form; a major revision is necessary. My main concerns are as follows.
Method:
- As a data paper, it should clearly describe how the product is generated. The method description (Section 2.2.2) lacks crucial details, making the methodology difficult for readers to understand.
- The authors acknowledge poor temporal generalization of their method, which raises fundamental concerns about its reliability. I suggest validating the method by randomly selecting one year as a validation set while using the remaining years for training.
- The use of CAMS XCO2 as a predictive variable raises concerns about model dependency. The authors should demonstrate the relative importance of each predictive variable in reconstructing satellite XCO2.
- The manuscript lacks comparison with existing machine learning approaches for high-resolution XCO2 retrieval, making it unclear what improvements their method offers.
- The comparison between TCCON site measurements and model reconstruction is problematic. While comparing XCO2_TCCON with XCO2_OCO2 measures satellite retrieval error, and comparing XCO2_reconstructed with XCO2_OCO2 measures algorithm error, comparing XCO2_TCCON with XCO2_reconstructed lacks clear scientific justification.
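Referee #2's suggestion of holding out an entire year for validation can be sketched as a leave-one-year-out split. This is a minimal sketch assuming each training sample carries a date; `leave_one_year_out` is a hypothetical helper, not part of the authors' code, and model fitting is left to the caller. Rotating through every year, rather than picking one at random, is a slightly stronger variant of the suggestion.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_year_out(
    dates: Sequence[date],
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_year, train_indices, test_indices) for each year present.

    All samples from the held-out year form the test set, so the model is
    always scored on a year it never saw during training.
    """
    by_year: Dict[int, List[int]] = defaultdict(list)
    for i, d in enumerate(dates):
        by_year[d.year].append(i)
    for year in sorted(by_year):
        test_idx = by_year[year]
        train_idx = [i for i, d in enumerate(dates) if d.year != year]
        yield year, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical sample dates; in practice these would be OCO-2 sounding dates.
    sample_dates = [date(2015, 1, 1), date(2015, 6, 1), date(2016, 3, 1), date(2017, 7, 1)]
    for year, train, test in leave_one_year_out(sample_dates):
        print(year, train, test)
```

For each split, the model would be retrained on `train_idx` and scored against OCO-2 XCO2 on `test_idx`; a large drop in skill on held-out years would quantify the temporal generalization problem the referees raise.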
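Referee #2 also asks how much each predictor, such as CAMS XCO2, contributes to the reconstruction. One common model-agnostic option, not necessarily what the authors would use, is permutation importance: shuffle one predictor column and measure how much the error grows. Because it treats the model as a black box, it could in principle be applied to the full DSC-DF-LGB pipeline. The sketch assumes the fitted model is exposed as a plain `predict` callable over rows of features; all names here are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def _rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(
    predict: Callable[[Sequence[Row]], List[float]],
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in RMSE when each feature column is randomly permuted.

    A large increase means the model relies heavily on that feature; a value
    near zero means shuffling the feature barely affects accuracy.
    """
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = _rmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j replaced by its shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(_rmse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

Strongly correlated predictors (for example, reanalysis XCO2 and meteorology) can share importance in ways this simple procedure does not separate, so results should be read with that caveat.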
Scientific scope:
- While reading the paper, I got the impression that none of the authors is familiar with carbon cycle science in general. There are fundamental errors in how the science is presented.
- L265-266: I think carbon uptake by boreal forests plays an even more prominent role.
- L297: XCO2 is influenced by emissions (not only from fossil fuels but also from land), but these signals are also largely smoothed out by atmospheric mixing. Claiming that high XCO2 reflects increased human activity without specifying the timescale is fundamentally wrong. For example, while a single retrieval might correlate with an emission spike, annual-average XCO2 might not correlate with emissions at all. Additionally, this apparent dependency could be an artifact of using emissions as a predictive variable.
- Figure 9: The comments on the growth-rate trend are fundamentally wrong. The 2016 anomaly largely results from the strong 2015-2016 ENSO event, which led to greater CO2 outgassing from low-latitude land. A larger growth rate is therefore expected in 2016 than in the following few years. The interannual variability of the CO2 growth rate is not a good metric for emission reduction because of substantial natural variability. From a purely statistical perspective, you cannot draw this conclusion from only 5 years of data, especially when the series starts with a strong ENSO year. If you remove the first point, you don't see a decline; if you remove the first two points, you actually see an increasing growth rate. Furthermore, have you compared your trend estimate with observations from the large network of surface stations (e.g., the NOAA boundary layer average)?
I suggest the authors have a carbon cycle scientist read their draft and provide feedback on the scientific merit. For example, they should address why high-resolution XCO2 is needed, and why specifically over China.
- The authors claim they generate daily XCO2 over China. However, I do not see any validation of the daily reconstruction performance. Does this method really capture synoptic-scale variability? Does the daily reconstruction make sense? What is the importance of having XCO2 at a daily timescale?
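Referee #2's point that five annual growth rates cannot support a trend claim can be checked by refitting the trend after dropping the leading years. The sketch below assumes the growth rate is the difference between consecutive annual means (one of several conventions) and uses an ordinary least-squares slope. The annual means in the usage example are invented for illustration and are not results from the preprint or from any observation network.

```python
from typing import Dict, List, Sequence


def annual_growth_rates(annual_means: Dict[int, float]) -> Dict[int, float]:
    """Growth rate for year y as mean(y) - mean(y - 1), in ppm/yr."""
    years = sorted(annual_means)
    return {y: annual_means[y] - annual_means[y - 1] for y in years[1:] if y - 1 in annual_means}


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_after_dropping(rates: Dict[int, float], max_drop: int = 2) -> List[float]:
    """Trend in the growth rate after dropping the first k years, k = 0..max_drop."""
    years = sorted(rates)
    slopes = []
    for k in range(max_drop + 1):
        kept = years[k:]
        slopes.append(ols_slope(kept, [rates[y] for y in kept]))
    return slopes


if __name__ == "__main__":
    # Invented annual means (ppm) with a large first increment, for illustration only.
    means = {2015: 400.0, 2016: 403.0, 2017: 405.25, 2018: 407.5, 2019: 410.0, 2020: 412.5}
    print(slopes_after_dropping(annual_growth_rates(means)))
```

If the sign of the slope flips depending on whether the first one or two years are included, the apparent decline is not robust, which is the statistical objection raised above.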
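To test whether the daily product captures synoptic-scale variability rather than only the seasonal cycle and long-term trend, one option (an assumption, not a procedure described in the preprint or the reviews) is to remove a centered running mean from both the reconstructed and the TCCON daily series and correlate the residual anomalies. The sketch assumes two collocated, gap-free daily series aligned by day; real TCCON records have gaps that would need handling. The window length and function names are illustrative.

```python
import math
from typing import List, Sequence


def running_mean_anomaly(series: Sequence[float], window: int = 31) -> List[float]:
    """Subtract a centered running mean (odd window, in days) from a daily series.

    Near the ends the window is truncated to the days that are available.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of days")
    half = window // 2
    n = len(series)
    anomalies = []
    for i in range(n):
        local = series[max(0, i - half):min(n, i + half + 1)]
        anomalies.append(series[i] - sum(local) / len(local))
    return anomalies


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("a series has zero variance")
    return cov / math.sqrt(va * vb)


def synoptic_correlation(
    reconstructed: Sequence[float], tccon: Sequence[float], window: int = 31
) -> float:
    """Correlation of day-to-day anomalies after removing slow variations."""
    return pearson_r(running_mean_anomaly(reconstructed, window), running_mean_anomaly(tccon, window))
```

A high correlation of raw daily values can come entirely from the shared seasonal cycle and trend; correlating anomalies isolates the day-to-day signal that a daily product is supposed to add.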
Writing:
Some sentences read awkwardly. I suggest having the manuscript proofread by native speakers.
Citation: https://doi.org/10.5194/essd-2024-371-RC2
Data sets
Full-coverage daily 0.1° XCO2 in China, Xinfeng Huang, https://zenodo.org/doi/10.5281/zenodo.12696674
Model code and software
Full-coverage daily 0.1° XCO2 in China, Xinfeng Huang, https://zenodo.org/doi/10.5281/zenodo.12696674
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
354 | 83 | 39 | 476 | 10 | 15 |