A surface ocean pCO2 product with improved representation of interannual variability using a vision transformer-based model
Abstract. The ocean plays a crucial role in regulating the global carbon cycle and mitigating climate change, with the spatial distribution and temporal variations of the surface ocean partial pressure of CO2 (spCO2) directly determining the air-sea CO2 flux. However, constructing a global spCO2 data product that resolves interannual and decadal variability remains a challenge because observational data are spatially sparse and temporally discontinuous. This study presents an approach based on the Vision Transformer (ViT) model that combines high-quality observational data from the Surface Ocean CO2 Atlas (SOCAT) with results from multiple advanced global ocean biogeochemical models to reconstruct a global monthly spCO2 dataset (SJTU-AViT) at 1° resolution from 1982 to 2023. The approach employs the self-attention mechanism of the ViT model to enhance the modeling of the spatial and temporal variations of spCO2, and incorporates physical-biogeochemical constraints, in the form of the derivatives of spCO2 with respect to key controlling factors, as additional features. Incorporating advanced ocean biogeochemical models during training allows the ViT-based model to capture spCO2 variability more accurately in data-sparse regions. Evaluations demonstrate that the new data product effectively captures spCO2 variability at both global and regional scales, showing good consistency with SOCAT observations, long-term ocean station data, and global atmospheric CO2 trends. The reconstructed spCO2 reproduces spCO2 anomalies during El Niño-Southern Oscillation (ENSO) events well, particularly in the eastern Pacific Ocean, where it shows a correlation of 0.81 with the Niño 3.4 index and high consistency with cruise data. The global air-sea CO2 flux patterns estimated from the SJTU-AViT dataset are consistent with known regional features such as strong uptake in the Southern Ocean and outgassing in the tropical Pacific.
This study not only provides a new 42-year data product for advancing understanding of the ocean carbon cycle and for global carbon budget assessments, but also introduces a new Transformer-based deep learning framework for Earth system data reconstruction. The data product is publicly accessible at https://doi.org/10.5281/zenodo.15331978 (Zhang et al., 2025) and will be updated regularly.