the Creative Commons Attribution 4.0 License.
Signal-Domain Guided Deep Learning for Gap-Filling of XCO and XCH4: A Masked Spatio-Temporal Fusion of TROPOMI and GEOS-Chem (2019–2023)
Abstract. Long-term, high-resolution monitoring of carbon monoxide (CO) and methane (CH4) is essential for understanding their spatiotemporal variability and guiding climate mitigation strategies. However, satellite observations like TROPOMI are often incomplete, and existing fusion methods have limitations in accuracy and continuity. This study proposes a signal-domain fusion approach combining 3D discrete cosine transform (DCT) and singular value decomposition (SVD) to integrate TROPOMI data with GEOS-Chem simulations. A lightweight residual U-Net is employed to refine the initial reconstruction by learning the residual field using meteorological drivers and model outputs, guided by a masked loss. The method produces global 0.25° and China-specific 0.05° daily gap-free XCO and XCH4 datasets from 2019 to 2023. The fused results outperform GEOS-Chem and are comparable or superior to TROPOMI, with R² values of 0.92 for XCO and 0.85 for XCH4. Trend analysis reveals regional patterns such as XCO increases in North America and declines in Eastern China, and widespread CH4 growth. High-resolution data captures enhancements during the 2022 Chongqing wildfires, with average increases of 17.1 ppb in XCO and 24.5 ppb in XCH4, and reveals lower XCH4 increases over rice-growing areas compared to TROPOMI, with overestimation reduced by 17–26 %, and stronger XCO reductions, with satellite underestimations up to 38 %. These results highlight agricultural contributions and policy impacts. This approach effectively reconstructs missing observations and enhances the utility of satellite–model data for atmospheric research and emission assessments. The generated daily gap-free datasets are publicly available at https://doi.org/10.5281/zenodo.17936461.
Status: open (until 23 Mar 2026)
- RC1: 'Comment on essd-2025-817', Anonymous Referee #1, 06 Mar 2026
- RC2: 'Comment on essd-2025-817', Anonymous Referee #2, 08 Mar 2026
General comments:
This manuscript presents a two‑stage framework that combines signal‑domain reconstruction (DCT + SVD) with a residual U‑Net to generate gap‑free global datasets of XCO and XCH4 from TROPOMI and GEOS‑Chem simulations. The topic is relevant for ESSD, and the proposed framework is interesting because it integrates signal processing and deep learning for spatiotemporal data fusion. However, the dataset, particularly the CH4 record, covers only five years (2019–2023), which limits its value for atmospheric and climate research. Moreover, several methodological details and validation aspects require further analysis before the dataset and method can be fully assessed.
Specific comments:
- The manuscript adopts a signal‑domain reconstruction approach using 3‑D DCT followed by SVD truncation. However, the rationale for choosing this specific combination is not sufficiently explained. In particular, the manuscript states that 80% of the singular value energy is retained, but no sensitivity analysis is provided.
- The proposed method contains two key stages: (1) signal‑domain reconstruction using DCT and SVD, and (2) residual correction using a deep neural network. However, the current experiments do not clearly isolate the contribution of each stage. The authors compare several machine learning models, but an explicit ablation study would help clarify the role of each component.
- The model is trained using a masked loss that only considers locations with valid satellite observations. However, the final dataset is expected to fill large areas where satellite observations are missing due to clouds or aerosols. The method is the same as in previous studies, for example: https://doi.org/10.1016/j.atmosres.2025.108411 or https://doi.org/10.1038/s41467-023-43862-3.
- The validation against TCCON uses an averaging radius of ±2°. Since TCCON provides point‑scale measurements while the fused dataset has a grid resolution (0.25°), the spatial representativeness mismatch should be discussed in more detail.
- The manuscript mainly compares the fused results with TROPOMI and GEOS‑Chem simulations. However, other atmospheric composition datasets or assimilation products exist (e.g., CAMS atmospheric composition products or other fused greenhouse‑gas datasets). Including comparisons with at least one existing product would help better position the proposed dataset within the current landscape of atmospheric composition data products.
- In several places the manuscript suggests that the fused dataset “outperforms TROPOMI”. This statement should be phrased more carefully. Since satellite observations are direct measurements, the improvement likely reflects reduced random noise or improved spatiotemporal consistency rather than strictly higher accuracy than the observations themselves.
- Finally, can robust CO and CH4 trends be derived from the data without applying gap‑filling?
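The sensitivity analysis requested in the first comment could start from a simple rank-versus-threshold table. The following is a minimal sketch, not the authors' implementation: it assumes that "energy" means the cumulative sum of squared singular values (the manuscript's definition may differ), that the singular values are sorted in descending order, and it uses a hypothetical spectrum. The helper name `rank_for_energy` is illustrative.

```python
def rank_for_energy(singular_values, threshold):
    """Smallest rank k whose leading k singular values retain at least
    `threshold` of the total energy (assumed: sum of squared values)."""
    energies = [s * s for s in singular_values]
    total = sum(energies)
    cumulative = 0.0
    for k, energy in enumerate(energies, start=1):
        cumulative += energy
        if cumulative / total >= threshold:
            return k
    return len(energies)


# Hypothetical singular-value spectrum, sorted in descending order.
spectrum = [10.0, 6.0, 4.0, 3.0, 2.0, 1.0]

# Retained rank at the 80 % threshold used in the manuscript and at two
# stricter thresholds a sensitivity test might include.
ranks = {t: rank_for_energy(spectrum, t) for t in (0.80, 0.90, 0.95)}
print(ranks)
```

Reporting RMSE and R² against withheld observations for each retained rank would then show whether the 80 % choice materially affects the result.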
Technical corrections:
- Figure 1: missing “.”.
- Lines 87–88: Your model is also limited by the accuracy of the prior model simulations and the representativeness of the assimilated observations (GEOS‑Chem).
- Line 122: avoid redundant definition (e.g., S5P).
Citation: https://doi.org/10.5194/essd-2025-817-RC2
- RC3: 'Comment on essd-2025-817', Anonymous Referee #3, 09 Mar 2026
The manuscript presents a novel gap-free dataset of daily column-averaged dry-air mole fractions of carbon monoxide (XCO) and methane (XCH₄) for 2019–2023, derived by fusing satellite observations from TROPOMI with GEOS-Chem simulations using signal-domain reconstruction and deep learning refinement.
The topic is timely and potentially valuable. However, substantial clarification and additional information are required. Currently, there are many issues regarding dataset files, reproducibility, validation, and confidence in interpretation and language.
General (major) comments
Dataset
- I checked the linked repository and evaluated selected files. The NetCDF files lack basic CF-compliant metadata. Time is not defined in physical units, variable attributes are missing, and units are not provided. The dataset in its current form is not self-describing and, in my opinion, does not meet ESSD data standards.
- The data content description states that there are global 0.25° × 0.25° data as well as regional (China) data at 0.05° × 0.05° resolution. In the repository, I could find only the 0.25° × 0.25° data.
- The files contain monthly aggregated data, yet there is no time variable, so analyzing a large set of files would be inconvenient.
- There is no CH4 variable in the files.
- The variable named CO should be XCO.
- There are no error, uncertainty, ML_confidence, reconstruction_flag, observation_fraction, or similar variables. As a user, I would expect at least some of this information in the file. Without it, the dataset risks being interpreted as observation-based while large regions may be model-driven.
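To illustrate what "self-describing" would mean in practice, the sketch below checks a dictionary of variable attributes for the fields the comments above report as missing. Everything in it is an assumption for illustration: the attribute dictionaries are hypothetical and were not read from the repository, the helper `missing_metadata` is not part of any library, and the checks cover only a small subset of what the CF conventions describe.

```python
def missing_metadata(variables):
    """Return {variable: [problems]} for a dict of {name: attributes}."""
    problems = {}
    for name, attrs in variables.items():
        # Every variable should at least carry units and a descriptive name.
        issues = [key for key in ("units", "long_name") if not attrs.get(key)]
        # A time coordinate needs units of the form "<unit> since <date>".
        if name == "time" and " since " not in attrs.get("units", ""):
            issues.append("time units lack a reference date")
        if issues:
            problems[name] = issues
    return problems


# Hypothetical attribute sets: one bare, one with the metadata a user needs.
bare = {"time": {}, "CO": {}}
described = {
    "time": {"units": "days since 2019-01-01 00:00:00", "long_name": "time"},
    "XCO": {"units": "ppb",
            "long_name": "column-averaged dry-air mole fraction of CO"},
    "XCH4": {"units": "ppb",
             "long_name": "column-averaged dry-air mole fraction of CH4"},
    "observation_fraction": {"units": "1",
                             "long_name": "share of days with a valid retrieval"},
}

print(missing_metadata(bare))
print(missing_metadata(described))
```

A full check against the CF conventions would need a dedicated compliance checker; this sketch only shows the kind of information each variable is expected to carry.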
Manuscript
- Language. Multiple grammatical and style issues, typos ("satelite"), missing spaces before references, etc. Professional English editing required.
- The manuscript lacks an actual data product description. A section such as "Data product summary" would be useful, covering:
  - Clear dataset specifications: variables, domains, temporal coverage, file formats, QC flags,
  - A table of key validation metrics across all TCCON sites,
  - User guidance: strengths and limitations,
  - File naming conventions and Zenodo structure.
- Make it clear what is novel versus previous fusion methods (e.g. Wang et al., 2023, https://doi.org/10.5194/essd-15-3597-2023).
- Reproducibility Gaps - Methods must be sufficiently described to allow reproduction.
- Validation description is insufficient. Contradictory descriptions of TCCON selection criteria (missing rate >0.5 vs. "highest availability" stations). "Manual bias correction" of GEOS-Chem is vaguely described – was this a single global offset, site-specific, applied before or after fusion? No uncertainty quantification: R²/RMSE values given without confidence intervals or cross-validation across years/sites.
- There are a lot of very strong, overconfident conclusions/statements in the manuscript. Please tone them down or provide supporting data.
- I would be very careful about drawing any conclusions on trends calculated over such a short period; I would rather refer to them as short‑term changes. The associated uncertainty and statistical significance should be assessed. One option is to perform a regression on deseasonalized data (e.g., with the annual cycle removed) and report the confidence interval of the trend, and optionally the trend significance using the Mann–Kendall test and Sen’s slope estimator.
- Acronyms should be defined at first use (plus in the abstract) and their expansions should not be repeated later.
- Gap-free sounds great. But does it mean that, for example, the cloudy oceanic regions of the Intertropical Convergence Zone are 90% ML reconstruction?
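The trend assessment suggested above (deseasonalized data, Mann–Kendall test, Sen's slope) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended production workflow: the series is synthetic monthly data, the seasonal cycle is removed by subtracting a monthly climatology, the Mann–Kendall variance omits the tie correction, and serial autocorrelation is ignored even though it would matter for real daily or monthly XCO and XCH₄ series. No confidence interval is computed here.

```python
import math
from itertools import combinations


def deseasonalize(values, period=12):
    """Subtract the mean of each calendar position (e.g. month)."""
    means = [sum(values[p::period]) / len(values[p::period])
             for p in range(period)]
    return [v - means[i % period] for i, v in enumerate(values)]


def sens_slope(values):
    """Median of all pairwise slopes (Sen's slope estimator)."""
    slopes = sorted((values[j] - values[i]) / (j - i)
                    for i, j in combinations(range(len(values)), 2))
    mid = len(slopes) // 2
    if len(slopes) % 2:
        return slopes[mid]
    return 0.5 * (slopes[mid - 1] + slopes[mid])


def mann_kendall(values):
    """Return (S, Z, two-sided p) using the normal approximation,
    without tie or autocorrelation corrections."""
    n = len(values)
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i, j in combinations(range(n), 2))
    variance = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


# Synthetic five-year monthly series: linear increase plus annual cycle.
series = [400.0 + 0.5 * t + 10.0 * math.sin(2 * math.pi * t / 12)
          for t in range(60)]
anomalies = deseasonalize(series)
slope = sens_slope(anomalies)
s_stat, z_score, p_value = mann_kendall(anomalies)
print(slope, s_stat, p_value)
```

With only five years of data, the reported slope is better described as a short-term change, and the p-value is optimistic unless autocorrelation is accounted for.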
Specific comments
- Introduction. I would suggest a shorter general background and a clearer focus on: (a) the current state of satellite/model CO and CH₄ products, (b) identified data gaps, and (c) how your new dataset addresses them.
- line 22. "The fused results outperform GEOS-Chem and are comparable or superior to TROPOMI." Those are very strong conclusions, yet with unclear supporting data (R² values - how calculated?)
- line 63. Acronym, first use, please expand.
- lines 63-74. TROPOMI description appears later again in Sec. 2.1.1 (lines 119–126), with redundant information. I suggest keeping a concise overview in the Introduction, and moving the detailed orbit/spectral descriptions entirely to Sec. 2.1.1 to avoid repetition.
- lines 71-72. You give information that TROPOMI measures "..XCO and XCH4 at the surface". Please correct this sentence (also, the wording of the latter part of this sentence should be improved).
- lines 107-109. Add one explanatory sentence on why this is advantageous, and clarify whether mask information influences extrapolation behaviour in heavily cloudy regions.
- lines 121-123. "Sentinel-5 Precursor" and "equipped with TROPOMI" repeated twice. Please rewrite those sentences.
- Sec. 2.1.1. Specify version numbers and product IDs explicitly.
- line 134. You select QV following Kawka et al. (2021), but that work focuses on NO₂. Can it be applied to CO/CH₄, and if so, why?
- line 141-144. Repetition.
- lines 150-152. What’s the implication of the spatial resolution of GEOS-Chem on the spatial resolution of the output dataset? How was regridding done?
- line 154. Clarify whether XCO/XCH₄ are model native outputs or diagnosed from species, and how column averaging (dry-air) is computed.
- Sec. 2.1.3. Explain the site selection criteria in detail. Please clarify: whether all stations in Tab. 1 are used in global validation, which are used only for examples, and any filtering by cloud cover or data gaps beyond what is stated later in Sec. 3.1.
- Fig. 2. Check spelling (TROPOIMI). The font on the U-Net Prediction Residual part is too small.
- Lines 214–221 and 223–229. Some information is partially repeated from the previous section (e.g., QV).
- Line 228. Mention how sensitive your method is to this choice (e.g., would results change significantly at 0.5° or 0.1°?) or note that you did not explore other resolutions.
- line 248, Eq. 2. The meaning of this equation depends on which type of multiplication the authors intend (an asterisk is used).
- line 257. Those acronyms were explained before.
- line 283. Parameter ε, stated to be "in the middle of the range from 10³ to 10⁻¹", which is ambiguous. Please give explicit values used in production and a brief justification (e.g., grid search examples, sensitivity tests, or literature-based choice). Also specify whether ε is the same for XCO and XCH₄ and for global vs. China domains.
- line 285. Eq. 6 and 7 are just placed there without linking them to adjacent text.
- line 296. "80% singular value energy". Please indicate whether you tested other thresholds (e.g. 90%, 95%) and whether these materially change RMSE/R².
- line 310. "Tables S1 and S2" compare five methods using 2021 data. Please clarify: (i) whether the residual U-Net architecture and hyperparameters were selected based on these 2021 results (risk of tuning to that year); (ii) whether cross-validation across years or sites was performed to avoid overfitting to specific conditions.
- Eq.12. Masked MSE only over M = 1 pixels. Comment briefly on the risk of biased training if M = 1 is spatially clustered (e.g. cloud-free regions).
- line 345. lightweight residual U-Net architecture. Information on hyperparameters such as levels, filters, dropout rate, epochs, etc., should be added. Do you train separate networks for XCO and XCH₄ or a joint multi-output model?
- line 355. Missing details on optimizer, learning rate schedule, batch size, number of epochs, train/validation split, and whether any data augmentation was used.
- line 359. What is the value of γ? Can this truncation bias extremes?
- lines 382–383. "Only those cases in which the satellite data exhibit a missing rate exceeding 0.5 at the site are retained for comparison", while lines 417–419 mention stations with "highest availability ratios" for a different selection. Please clarify.
- line 370. How are p-values used here (for linear regression slope significance, correlation significance?). What hypothesis tests did you run?
- line 408. The statement "The average bias of the GEOS-Chem simulation data was corrected using a manual correction method…" is vague. Please provide a detailed description.
- line 411. What fraction of the TCCON dataset falls into this category?
- line 436. Please indicate explicitly which metrics this refers to (R², RMSE, μ, σ).
- Fig.8. Which % of grids are pure ML reconstruction? A map of the coverage ratio would be useful.
- lines 501–505. The methodology for trend computation (regression model, handling of autocorrelation, treatment of missing values, and significance level) is not detailed. The period is also short.
- line 518. The reported mean increases (17.1 ppb XCO, 24.5 ppb XCH₄) depend critically on the definition of "fire-affected area", the temporal window considered as fire period vs. reference, and whether the baseline is climatological or just adjacent days. Please specify.
- line 524. How is the annual growth rate computed (linear fit to daily values?), and over what exact years? Similarly, for XCO decline rates (line 536), clarify whether the % differences are relative or absolute, and provide confidence intervals or at least an idea of uncertainty.
- line 559. Add evidence.
- line 569. "excellent instrument for monitoring severe weather and pollution occurrences". More neutral language would be better.
- A limitations section is missing. Please highlight user-relevant aspects:
  - Potential biases in regions without TCCON or other ground validation.
  - Dependence on GEOS-Chem priors in persistent-cloud regions.
  - Known artefacts or discontinuities (e.g., around the June 2019 TROPOMI pixel size change).
  - ML "hallucination" risk: because the residual U-Net loss is masked, the model learns what usually happens at a location, not what happened on that day.
- line 577. "All created fusion data may be obtained upon request from the authors for academics and policymakers." Different information is given below (line 592).
- line 592-595. Data availability statement. This should be given in a separate section (which is placed below).
- A lot of references have incomplete information (e.g., lines 713, 847, 875).
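Two of the comments above (on Eq. 12 and on Fig. 8) concern how the observation mask enters training and how much of the product is pure reconstruction. The sketch below illustrates both points. It assumes that M = 1 marks pixels with a valid TROPOMI retrieval and that the loss is a plain mean of squared errors over those pixels; the manuscript's exact normalisation is not restated here. The arrays are hypothetical nested lists, and the function names are illustrative.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over pixels where mask == 1 only."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


def observation_fraction(daily_masks):
    """Per-pixel share of days with a valid observation; one minus this
    value is the share of days filled by reconstruction."""
    n_days = len(daily_masks)
    rows, cols = len(daily_masks[0]), len(daily_masks[0][0])
    return [[sum(day[r][c] for day in daily_masks) / n_days
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical 2 x 2 example: large errors at unobserved pixels
# (mask == 0) do not contribute to the loss.
pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 0.0], [0.0, 2.0]]
mask_day1 = [[1, 0], [0, 1]]
mask_day2 = [[1, 0], [1, 1]]

loss = masked_mse(pred, target, mask_day1)
coverage = observation_fraction([mask_day1, mask_day2])
print(loss, coverage)
```

The example shows why the comment on Eq. 12 matters: predictions at unobserved pixels are never penalised, so their quality has to be assessed by other means, and a per-pixel coverage map like `coverage` tells users where that applies.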
Citation: https://doi.org/10.5194/essd-2025-817-RC3
Data sets
Signal-Domain Guided Deep Learning for Gap-Filling of XCO and XCH₄: A Masked Spatio-Temporal Fusion of TROPOMI and GEOS-Chem (2019–2023) Zhiwei Li, Yuan Tian, Peize Lin, Bowen Chang, and Jingkai Xue https://doi.org/10.5281/zenodo.17936461
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 224 | 146 | 24 | 394 | 74 | 36 | 37 |
This manuscript proposes a signal-domain fusion framework that integrates 3D discrete cosine transform (DCT), singular value decomposition (SVD), and a lightweight residual U-Net to generate gap-free, high-resolution global (0.25°) and China-specific (0.05°) daily XCO and XCH4 datasets for 2019–2023. By combining TROPOMI retrievals with GEOS-Chem simulations and applying a masked loss strategy, the authors aim to reconstruct missing observations while improving spatiotemporal consistency. The reconstructed products are evaluated against TROPOMI and ground-based data, and subsequently used to analyze regional trends and event-scale signals, including the 2022 Chongqing wildfires.
However, the current manuscript requires substantial revision before it can be considered for publication. In particular, the scientific motivation, methodological justification, validation strategy, and interpretation of results need to be strengthened. I do not recommend publication unless the authors carefully address the following points in a substantially revised manuscript.
Major comments:
Specific comments: