the Creative Commons Attribution 4.0 License.
A surface ocean pCO2 product with improved representation of interannual variability using a vision transformer-based model
Abstract. The ocean plays a crucial role in regulating the global carbon cycle and mitigating climate change, with the spatial distribution and temporal variations of ocean surface partial pressure of CO2 (spCO2) directly determining the air-sea CO2 flux. However, constructing a global spCO2 data product that can resolve interannual and decadal variability remains a challenge due to the spatial sparsity and temporal discontinuity of observational data. This study presents an approach based on the Vision Transformer (ViT) model, combining high-quality observational data from the Surface Ocean CO2 Atlas (SOCAT) with results from multiple advanced global ocean biogeochemical models to reconstruct a global monthly spCO2 dataset (SJTU-AViT) at 1° resolution from 1982 to 2023. The approach employs the self-attention mechanism of the ViT model to enhance the modeling of the spatial and temporal variations of spCO2, and incorporates physical-biogeochemical constraints from the derivative of spCO2 with respect to key controlling factors as additional features. The incorporation of advanced ocean biogeochemical models during the training process allows the ViT-based model to capture more accurate spCO2 variability in data-sparse regions. Evaluations demonstrate that the new data product effectively captures spCO2 variability at both global and regional scales, showing good consistency with SOCAT observations, long-term ocean station data, and global atmospheric CO2 trends. The reconstructed spCO2 demonstrates strong capability in reproducing spCO2 anomalies during El Niño-Southern Oscillation (ENSO) events, particularly in the eastern Pacific Ocean, where it shows a correlation of 0.81 with the Niño 3.4 index and demonstrates high consistency with cruise data. Based on the SJTU-AViT dataset, the estimated global air-sea CO2 flux patterns are consistent with known regional features such as strong uptake in the Southern Ocean and outgassing in the tropical Pacific.
This study not only provides a new 42-year data product for advancing understanding of the ocean carbon cycle and global carbon budget assessments, but also introduces a new Transformer-based deep learning framework for Earth system data reconstruction. The data product is publicly accessible at https://doi.org/10.5281/zenodo.15331978 (Zhang et al., 2025) and will be updated regularly.
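As a rough illustration of how an air-sea CO2 flux is typically derived from a surface pCO2 field, the sketch below uses a generic bulk formula: flux = gas transfer velocity × solubility × (pCO2_sea − pCO2_air). It is not taken from the manuscript. The quadratic wind-speed form, the coefficient 0.251, the default solubility, and all function and parameter names are assumptions chosen for illustration only.

```python
def gas_transfer_velocity(wind_speed_m_s, schmidt_number=660.0, coeff=0.251):
    """Quadratic wind-speed parameterization in cm/hr (assumed form and coefficient)."""
    return coeff * wind_speed_m_s ** 2 * (schmidt_number / 660.0) ** -0.5


def air_sea_co2_flux(pco2_sea_uatm, pco2_air_uatm, wind_speed_m_s,
                     solubility_mol_m3_atm=30.0, schmidt_number=660.0):
    """Bulk flux in mol m^-2 yr^-1; positive values mean outgassing to the atmosphere."""
    k_cm_hr = gas_transfer_velocity(wind_speed_m_s, schmidt_number)
    k_m_yr = k_cm_hr * 0.01 * 24.0 * 365.0  # cm/hr -> m/yr
    delta_atm = (pco2_sea_uatm - pco2_air_uatm) * 1e-6  # µatm -> atm
    return k_m_yr * solubility_mol_m3_atm * delta_atm


# Hypothetical values: undersaturated surface water takes up CO2 (negative flux).
uptake = air_sea_co2_flux(pco2_sea_uatm=380.0, pco2_air_uatm=410.0, wind_speed_m_s=7.0)
```

In this form the sign of the flux is set by the air-sea pCO2 difference, while wind speed and solubility only scale its magnitude.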
Status: final response (author comments only)
RC1: 'Comment on essd-2025-286', Anonymous Referee #1, 07 Sep 2025
AC1: 'Reply on RC1', Xueying Zhang, 29 Sep 2025
Dear reviewer,
Thanks for reviewing this manuscript. We have carefully revised the manuscript based on the constructive and valuable suggestions from you and another reviewer. The manuscript has been extensively improved during this revision. Please see the attached file for details.
Best,
Xueying Zhang.
RC2: 'Comment on essd-2025-286', Anonymous Referee #2, 07 Sep 2025
Zhang et al. present a global monthly surface ocean pCO2 dataset (SJTU-AViT) and corresponding air-sea CO2 fluxes spanning 1982-2023 at 1° resolution, developed using a Vision Transformer-based deep learning model. The approach combines SOCAT observations and observational climate data with multiple ocean biogeochemical models and incorporates physical-biogeochemical constraints. The authors show that their product successfully captures the spatial and temporal variations of observed pCO2 patterns, from seasonal cycles to interannual variability. The product shows more realistic small-scale spatial variability and interannual variability than previous pCO2 products. The resolved air-sea CO2 fluxes agree with other estimates based on pCO2 observations. The paper is well written, the methodology is robust, and the line of thought is mostly clear to me. I only have minor comments regarding some of the technical details and presentation.
Main comments:
- The description of the methodology is overall complete. However, certain technical details are still missing. It is not clear how pre-training on CMIP6 models contributes to the final model. It is not clear what the fine-tuning on MOM6 really does. Are your results sensitive to the choice of CMIP6 models and the fine-tuning? How do SOCAT data fold into your refinement? For the physical-biogeochemical constraints, are you only using what is derived from MOM6, or from CMIP6 models as well? How are your results, particularly on the seasonal cycle, impacted by these physical-biogeochemical constraints? In other words, if you exclude these constraints, how is the representation of the seasonal pCO2 cycle affected?
- The uncertainty quantification might benefit from more detail. For u_map, what if there are no observations in a grid cell? How do you then quantify u_map there? Have you conducted an analysis of the spatial heterogeneity of the dominant source of uncertainty? In addition, I think it would be more appropriate to replace u_map with "algorithm uncertainty." Perhaps this can be done by generating a large ensemble of spCO2. Alternatively, this can be done by using synthetic data. You might consider subsampling one of your models at the SOCAT data locations and then applying the ML model to the subsampled model fields to generate an spCO2 map. Then you can compare the absolute differences between pCO2 from the ocean model and the ML reconstruction.
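The synthetic-data test suggested in the second main comment can be sketched roughly as follows. This is an illustrative outline only: the `reconstruct` function is a hypothetical stand-in (a simple mean fill) for the actual ML model, and the field values and coverage mask are made up.

```python
def reconstruct(sampled):
    """Hypothetical placeholder for the ML reconstruction: fill gaps with the mean of sampled cells."""
    observed = [v for v in sampled if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in sampled]


def synthetic_test(model_field, obs_mask):
    """Subsample the model 'truth' where obs_mask is True, reconstruct the full
    field, and return the mean absolute error over the unobserved cells."""
    sampled = [v if m else None for v, m in zip(model_field, obs_mask)]
    recon = reconstruct(sampled)
    errors = [abs(r - t) for r, t, m in zip(recon, model_field, obs_mask) if not m]
    return sum(errors) / len(errors)


# Made-up flattened model pCO2 field (µatm) and a SOCAT-like coverage mask.
model_field = [360.0, 370.0, 380.0, 390.0, 400.0, 410.0]
obs_mask = [True, False, True, False, False, True]
mae = synthetic_test(model_field, obs_mask)
```

Because the full model field is known, the error can be evaluated in cells that have no observations, which is where an observation-based misfit cannot be computed.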
Minor comments:
L15-16: The statement that the ocean surface partial pressure of CO2 (spCO2) directly determines the air-sea CO2 flux is not exactly correct. The flux is set by the air-sea pCO2 difference, modulated by the gas exchange velocity, which depends on surface wind speed.
Introduction: Perhaps it is also worth mentioning that previous ML-based interpolations of pCO2 overly smooth the spatial patterns and interannual variability.
L195: Is the interpolation based on inverse distance weighted average? How do you deal with the fine-resolution time (i.e., not monthly average)?
Figure 3: Systematic biases are clear at Iceland and Irminger, with SJTU-AViT underestimating the pCO2. Any clues why?
Figure 5: The negative bias would lead to an overestimation of global ocean CO2 uptake through the bulk equation. Might be worth mentioning when you talk about the flux.
Fig. 6b: Seems like the bias PDF is wider in certain years. Speculation?
L369-372: The section title is on the seasonal cycle, but the first few sentences focus on variability at all time scales. Might consider moving this to a later section. Also, the trend should be removed beforehand in calculating STD in Fig. 7.
L391-396: A presentation issue. The seasonal changes are, physically, attributed to these factors you mentioned. This is based on our understanding of the ocean carbon dynamics rather than being directly learned from ML output. The sentences read like you confirm these dominant factors from your model output. Might consider making it clear that these are not model results. Or, indeed, you could do factor contribution analysis.
Figure 9: I think what is missing here is to show whether the seasonal phases are consistent compared to SOCAT.
Figure 11: Linearly detrended spCO2?
L568-571: PDO-related SST patterns are used in your training; incorporating other indices (e.g., directly using PDO) would be double counting?
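The detrending raised in the comments on L369-372 and Figure 11 can be illustrated with a minimal sketch: remove a least-squares linear trend before computing the standard deviation, so that the long-term rise in pCO2 does not inflate the variability estimate. The series below is made up, and the function names are hypothetical.

```python
def linear_detrend(values):
    """Remove the least-squares linear trend from an evenly spaced series."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    slope_num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope_den = sum((t - t_mean) ** 2 for t in range(n))
    slope = slope_num / slope_den
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5


# Made-up annual series: a steady rise plus a small alternating anomaly (µatm).
series = [350.0 + 1.8 * t + (2.0 if t % 2 else -2.0) for t in range(10)]
raw_std = std(series)
detrended_std = std(linear_detrend(series))
```

With a trend present, `raw_std` mostly reflects the long-term increase, whereas `detrended_std` is closer to the amplitude of the variability itself.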
Citation: https://doi.org/10.5194/essd-2025-286-RC2
AC2: 'Reply on RC2', Xueying Zhang, 29 Sep 2025
Dear reviewer,
Thanks for reviewing this manuscript. We have carefully revised the manuscript based on the constructive and valuable suggestions from you and another reviewer. The manuscript has been extensively improved during this revision. Please see the attached file for details.
Best,
Xueying Zhang.
Data sets
A surface ocean pCO2 product with improved representation of interannual variability using a vision transformer-based model
Xueying Zhang, Enhui Liao, Wenfang Lu, Zelun Wu, Guansuo Wang, and Shiyu Liang
https://doi.org/10.5281/zenodo.15331978
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
1,169 | 102 | 29 | 1,300 | 44 | 25 | 36 |
This manuscript introduces a novel machine learning framework (SJTU-AViT) for reconstructing global sea surface pCO₂ at 1°×1° monthly resolution over the period 1982–2023. By incorporating physical–biogeochemical constraints as derived features, the approach enhances the quality of ocean carbon data reconstruction. The evaluation is comprehensive, covering mean states, seasonal cycles, and interannual variability, and shows strong skill in reproducing ENSO-related signals. This study makes a substantial contribution by providing a valuable new ocean carbon data product for the ocean carbon community and a useful machine learning framework in the field of ocean data reconstruction. The subject is highly relevant to the scope of Earth System Science Data. However, I have several general and specific comments and suggestions that should be addressed before the manuscript can be considered for publication.
General comments
Specific comments: