Articles | Volume 17, issue 12
https://doi.org/10.5194/essd-17-7169-2025
Data description paper | 15 Dec 2025

Climatological fields of Southern Ocean interior carbonate system parameters and anthropogenic CO2 reconstructed and integrated from float- and ship-based observations

Wanqin Zhong, Xin Ma, Yingxu Wu, Chenglong Li, Tianqi Shi, Wei Gong, and Di Qi
Abstract

The Southern Ocean plays a crucial role in regulating atmospheric carbon dioxide (CO2) concentrations and modulating the global oceanic carbon cycle, thereby substantially mitigating the effects of anthropogenic climate change. However, due to the region's challenging environment and sparse observational coverage, large uncertainties remain regarding the magnitude and mechanisms of carbon uptake in the Southern Ocean. In recent decades, the deployment of Argo float arrays has facilitated autonomous and continuous profiling of hydrographic and biogeochemical properties from the surface to depths of up to 6000 m, complementing traditional ship-based observations. Nevertheless, high-resolution, integrated datasets that combine ship-based and Argo-derived observations remain rare, partly due to the challenges of data harmonization, quality control, and uncertainty estimation, as well as the indirect nature of carbonate system parameter retrievals from Argo measurements. Here, we present a comprehensive, quality-controlled reconstruction of key carbonate system parameters in the Southern Ocean interior – including total alkalinity (TA), dissolved inorganic carbon (DIC), pH (total scale), nitrate (NO3), phosphate (PO4), silicate (SiO4), anthropogenic carbon (Cant), and aragonite saturation (Ωar) – by leveraging machine learning techniques and integrating all available Argo float profiles with ship-based survey data. The resulting datasets are gridded at 1°×1° horizontal resolution and 84 vertical pressure levels (0–5600 m), and are provided as distinct climatological products: the Float Grid (using all Argo float profiles) and the All-Data Grid (integrating all available Argo and ship-based observations). The Float Grid is further separated into the Non-O2-Float Grid (limited to Core Argo floats) and O2-Float Grid (limited to oxygen-measured Biogeochemical Argo floats). Each gridded product is accompanied by uncertainty estimates. 
The climatological products cover nearly the whole Southern Ocean and are based on direct measurements rather than interpolation-based mapping methods, thereby providing a more robust result. Model performance is assessed through cross-comparison of Argo and shipboard measurements. The gridded products, collectively termed SOCOML (Southern Ocean CO2 Machine Learning products, https://doi.org/10.17632/xzr59ngmpz.2, Zhong et al., 2025a; https://doi.org/10.25921/8c29-rv75, Zhong et al., 2025b), are freely available for download and are expected to support future studies of the Southern Ocean carbon cycle.

1 Introduction

The Southern Ocean (south of 30° S) plays a pivotal role in the global carbon cycle by taking up anthropogenic carbon from the atmosphere and transporting it to the ocean interior (Morrison et al., 2022), thereby modulating CO2 concentrations from past climates to the present and into the future (Hauck et al., 2023). Since industrialization, rising atmospheric CO2 concentration has been the primary driver of the strengthening ocean carbon sink, with the Southern Ocean accounting for around one-quarter of the anthropogenic carbon (Cant) uptake (Gruber et al., 2019b). Oceanic carbon uptake is fundamentally constrained by the amount of carbon in the upper ocean and by the rate at which Cant, in the form of dissolved inorganic carbon (DIC), is transported into the ocean interior (Bopp et al., 2015). The large-scale upwelling limb of the meridional overturning circulation (MOC) in the mid-latitude Southern Ocean enables the uptake of excess carbon and its subsequent transport northward into the upper ocean (Marshall and Speer, 2012; Pellichero et al., 2018) or southward to fill the global abyssal carbon reservoir (Rios et al., 2012; Pardo et al., 2017; Mahieu et al., 2020; Zhang et al., 2023). The carbon transport pathways in the mid-latitude and high-latitude Southern Ocean are interconnected via the global thermohaline circulation, and both contribute to the removal of anthropogenic carbon from the surface ocean.

The continuous uptake of Cant by the ocean leads to declines in seawater pH and calcium carbonate (CaCO3) saturation, collectively referred to as ocean acidification (OA) (Doney et al., 2009). In the Southern Ocean, substantial CO2 uptake causes buffering capacity and aragonite saturation states (Ωar) to decline faster than the global average (Orr et al., 2005; Petrou et al., 2019). Recent multidecadal studies found a reinvigoration of the carbon sink since the 2000s (Landschützer et al., 2015; Zemskova et al., 2022) and pronounced acidification, particularly in the Antarctic Zone (Bednaršek et al., 2012; Xue et al., 2018). To quantitatively assess and understand the feedback mechanisms involved in carbon uptake and storage, sustained high-quality oceanic measurements across timescales and across the entire Southern Ocean are needed. The key interior-ocean variables of the carbonate system – total alkalinity (TA), dissolved inorganic carbon (DIC), and pH – each have strengths for explaining climate change processes. For example, increasing DIC from Cant storage leads to pH reduction, while TA reflects the ocean's capacity to buffer pH changes (Orr et al., 2005). Moreover, nutrient concentrations (nitrate, phosphate, and silicate) are also relevant to oceanic biogeochemical processes (e.g., they are involved in the calculation of seawater carbonate chemistry and Cant; Gruber et al., 1996; Sharp et al., 2023). Therefore, a comprehensive dataset that combines TA, DIC, pH, and nutrients offers detailed insights into the variability of the ocean carbon sink (characterized by Cant), the progression of OA (characterized by Ωar), and its potential impacts on marine ecosystems (Doney et al., 2020; Gruber et al., 2019a; Kroeker et al., 2013; Sabine et al., 2004).

Despite its importance in the global carbon cycle, the vast and remote nature of the Southern Ocean severely limits observational coverage, especially with regard to biogeochemical variables. Two major databases compile shipboard measurements: the Surface Ocean CO2 Atlas (SOCAT, Bakker et al., 2016) provides a quality-controlled dataset of the CO2 fugacity for the global surface ocean and coastal seas, while the Global Ocean Data Analysis Project version 2 (GLODAPv2, Olsen et al., 2016) offers quality-controlled data as well as climatological products (Lauvset et al., 2016) from the surface into the ocean interior, including TA and DIC. However, the scarcity of shipboard measurements, particularly during austral winter, leads to large uncertainty in evaluating the Southern Ocean carbon sink (Friedlingstein et al., 2025; Hauck et al., 2023; Lo Monaco et al., 2005). Measurements of more difficult-to-observe variables, such as TA, DIC and pH, are particularly scarce, comprising only about half of the data available for other variables in GLODAPv2 database (Fig. 1d).

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f01

Figure 1. Spatial and temporal coverage of the measurements at 7607 stations (dots) from the GLODAPv2.2023 database (Lauvset et al., 2024) and from 4603 Argo profiling floats (lines). (a) Geographic distribution of GLODAP, Argo (CTD), and Argo (CTD+O2) observations, categorized into three types according to the maximum pressure of observation. (b, c) Number of profile measurements from shipboard (GLODAP) and float-based (Argo) observations per 1°×1° bin for the entire period since 1972. (d–f) Parameter types and the seasonal and latitudinal distribution of profiles from all three datasets (grey for GLODAP, blue for Argo with CTD only, and red for Argo with CTD and oxygen sensor). The number of GLODAP profiles in (e, f) has been multiplied by a factor of 3 for visibility.

Novel observations recently collected by profiling floats, as part of the Argo program, have revolutionized the ability to monitor the Southern Ocean since the 2000s (Riser et al., 2016; Silvano et al., 2023). These autonomous floats, including Core-Argo and Biogeochemical Argo (BGC-Argo), measure seawater properties (temperature, salinity, and pressure) and optional biogeochemical variables (oxygen, pH, and nitrate, normally for BGC-Argo) between the surface and depths of 2000 m, with Deep-Argo floats reaching depths of up to 6000 m. The rapid increase in BGC-Argo floats has significantly expanded the amount of carbonate system data and thus revealed spatial and temporal variability with depth globally or regionally (Williams et al., 2017; Wu et al., 2022; Wu and Qi, 2023). Despite transformative potential, BGC-Argo floats currently constitute less than a quarter of all Argo floats in the Southern Ocean (Fig. 1). This limited coverage highlights the need to develop robust methods for deriving carbonate system parameters from all Argo observations, which would greatly improve and supplement current observation-based CO2 datasets and support more comprehensive monitoring of ocean carbon dynamics.

Multiple efforts have focused on retrieving carbonate chemistry variables by exploiting the strong regional correlations among seawater properties, estimating these variables from combinations of more readily available ones such as temperature, salinity, and dissolved oxygen. This approach is effective because oceanographic processes influence the distributions of many seawater properties in similar ways, allowing algorithms to be trained to reproduce carbonate system parameters from co-located measurements of other seawater properties (Carter et al., 2021b). Among the primary methods, multilinear regression (MLR) and neural networks (NN) are widely used to estimate various seawater properties, including nutrients and carbonate chemistry variables. MLR models, such as LIAR (Locally Interpolated Alkalinity Regression, Carter et al., 2021a, 2016), are straightforward and interpretable but are limited to capturing linear relationships. In contrast, neural network approaches, such as the Bayesian neural network (BNN)-based method CANYON-B (Bittig et al., 2018; Sauzède et al., 2017), can model more complex, nonlinear patterns and often provide higher accuracy. Building on MLR and NN methods, the ESPER_LIR and ESPER_NN routines were recently introduced to further expand predictive capabilities. For instance, Asselot et al. (2024) applied the ESPER_NN method to reconstruct Cant from Argo data, demonstrating that the combination of Argo float observations with machine-learning approaches offers new perspectives and robust insights into the storage and transport of Cant in the interior ocean.

In this study, we leverage the ESPER_NN model, integrating the high-accuracy GLODAPv2 database with profiling measurements of fine spatiotemporal resolution, to generate a comprehensive carbonate system dataset throughout the interior Southern Ocean that extends from the surface to the deep ocean (5600 m). TA, DIC, pH (total scale), nitrate (NO3), phosphate (PO4), silicate (SiO4), and O2 (when direct measurements are unavailable) are obtained through neural networks, while Ωar is computed using CO2SYS based on reconstructed TA and DIC. Cant is estimated using the TrOCA method, as previously applied in the Southern Ocean by Zhang et al. (2023) and Metzl et al. (2024). We refer to the data products as Southern Ocean CO2 Machine Learning products (SOCOML). The rest of the paper describes the data and methodology used to produce the dataset. This is followed by an assessment of the dataset and its climatological variability. Last, we discuss the uncertainty estimation process and its potential influence.

2 Data used and reprocessing

2.1 GLODAP

The Global Ocean Data Analysis Project (version GLODAPv2.2023), a bias-corrected observational ocean biogeochemical dataset, serves as the ship-based observational data source for this study. Data from GLODAPv2.2023 collected south of 30° S are selected, including concurrent measurements of hydrographic properties, nutrients (NO3, PO4 and SiO4), and carbonate system parameters (TA, DIC, and pH), as detailed in Table 1. Before reprocessing the data, two cruises (Expocode: 316N19871123 and 318M19771204) were excluded due to noise at depth or large quality control (QC) adjustments, as reported by Carter et al. (2021b). Subsequently, the remaining 160 cruises undergo secondary QC and an adjustment check. Measurements flagged as poor quality include TA and DIC values with adjustments exceeding ±10 µmol kg−1, pH adjustments greater than ±0.015 pH units, and nutrient data with multiplicative adjustments surpassing 10 % (Carter et al., 2018; Olsen et al., 2016). The precise adjustment values are documented in the GLODAPv2 Adjustment Table, accessible at https://glodapv2.geomar.de/ (last access: 10 April 2025). Following this QC step, five cruises are excluded for TA, three cruises (1112 measurements) for DIC, one cruise (1474 measurements) for PO4, and one cruise (940 measurements) for SiO4 (see detailed exclusions in Table A1). Although no significant offsets were identified in pH measurements, one cruise (Expocode: 49HG19950414) is excluded, as noted by Carter et al. (2018). Importantly, quality control is performed independently for each variable. Subsequently, TA and DIC measurements are retained only when nutrient observations are available, following Carter et al. (2021b). The pH data in GLODAPv2 comprise a mixture of spectrophotometric- and potentiometric-derived measurements. To ensure data consistency, pH measurements are homogenized to align with pH calculated from TA and DIC, following Carter et al. (2018). Classification of pH data is conducted based on documentation available from https://cchdo.ucsd.edu/ (last access: 10 April 2025), as shown in Table A2 and Fig. A1.

Table 1. Numbers of shipboard GLODAPv2 measurements and Argo float profiles for each variable in the Southern Ocean used in this study. The assessment dataset of the GLODAPv2 data product used for model-performance comparisons contains cruises added after the GLODAPv2.2020 release – specifically, those with cruise identifiers 2107.


The ESPER_LIR and ESPER_NN models were trained using data from GLODAPv2.2020, whereas the CANYON-B model used the original GLODAPv2 release. The assessment dataset for model performance comparison is drawn from GLODAPv2.2023 and consists of cruises added subsequent to the GLODAPv2.2020 release (i.e., cruise numbers 2107). An initial comparative analysis among the CANYON-B, ESPER_NN, and ESPER_LIR models is conducted using these assessment data.

2.2 Argo data preparation and description

The Argo float data were downloaded from the Argo Global Data Assembly Centers (GDACs; ftp://ftp.ifremer.fr/ifremer/argo/dac/, last access: 23 February 2025) and processed using adapted code from the SAGEO2 toolbox. This dataset comprises three types of Argo floats (Core Argo, BGC-Argo, and Deep Argo) for reconstructing carbonate system parameters and nutrients using models. Since 2000, the Core Argo network has provided high-resolution temperature and salinity profiles with broad coverage (0–2000 dbar at 10 d intervals), forming the foundation for extensive studies of oceanographic processes. Building upon this framework, BGC-Argo extends observational capabilities by employing biogeochemical sensors to measure oxygen, pH, and nitrate. To address ongoing uncertainties regarding the deep ocean, the recent deployment of Deep Argo floats enables data collection down to 6000 dbar in targeted Southern Ocean basins, providing unprecedented insights into carbon dynamics in abyssal waters.

Rigorous quality control leads to the exclusion of three categories of problematic data: (1) floats on the Argo Program's grey list, identified for sensor drift or transmission errors; (2) floats with 10 or fewer operational cycles, due to insufficient calibration stability; and (3) aberrant profiles with incomplete measurements. Additionally, only adjusted data flagged as “Good” or “Probably Good” (QC flags 1 and 2, respectively) are included. The remaining floats are systematically classified based on the presence or absence of oxygen data. This classification yields two distinct float categories, underpinning our dual-pathway analytical approach and ensuring robust estimation across diverse observational regimes. Overall, this study includes data from 4346 Argo floats, of which 525 are equipped with an oxygen sensor and provide 73 296 profiles, while the remaining 3821 floats without an oxygen sensor provide 647 650 profiles (Table 1).
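
The screening and classification rules above can be sketched as follows. This is a minimal illustration, not the processing code used for SOCOML: the `Sample` and `FloatRecord` structures, their field names, and the helper functions are hypothetical, and a single QC flag per sample is a simplification (Argo files carry flags per variable). Only the rules themselves (grey-list exclusion, more than 10 cycles, QC flags 1 or 2, split by oxygen availability) come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

GOOD_FLAGS = {1, 2}  # "Good" and "Probably Good"
MIN_CYCLES = 11      # floats with 10 or fewer cycles are excluded


@dataclass
class Sample:
    """One adjusted measurement level (hypothetical layout)."""
    pressure: float
    temperature: float
    salinity: float
    qc_flag: int
    oxygen: Optional[float] = None


@dataclass
class FloatRecord:
    """One float and its samples (hypothetical layout)."""
    wmo_id: str
    n_cycles: int
    samples: List[Sample] = field(default_factory=list)


def keep_float(rec: FloatRecord, grey_list: Set[str]) -> bool:
    """Reject grey-listed floats and floats with too few cycles."""
    return rec.wmo_id not in grey_list and rec.n_cycles >= MIN_CYCLES


def good_samples(rec: FloatRecord) -> List[Sample]:
    """Keep only samples whose QC flag is 1 or 2."""
    return [s for s in rec.samples if s.qc_flag in GOOD_FLAGS]


def classify(rec: FloatRecord) -> str:
    """Assign a float to the oxygen or non-oxygen category."""
    has_o2 = any(s.oxygen is not None for s in good_samples(rec))
    return "o2_float" if has_o2 else "non_o2_float"
```

In this sketch a float whose only oxygen values carry a rejected QC flag falls into the non-oxygen category, which is one possible reading of the classification step.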

There are substantial spatial sampling gaps in the high-quality GLODAP data, particularly in the high-latitude Southern Ocean (Fig. 1). Furthermore, Fig. 1 reveals a pronounced seasonal bias toward the austral summer, with nearly four times as many measurements collected during this period compared to winter. In contrast, Argo floats provide extensive spatiotemporal coverage, owing to their flexible deployment and consistent 10 d sampling cycles. Although the number of Argo floats equipped with oxygen sensors has increased greatly in recent decades (Fig. 1f), the Argo observational network is still predominantly composed of Core-Argo floats without oxygen sensors, which constitute over 85 % of the dataset and achieve nearly complete spatial coverage across the Southern Ocean (Fig. 1a and c). This broad coverage offers an unprecedented foundation for reconstructing carbon system dynamics in the region. However, because of current limitations in data quality and correction methods (Maurer et al., 2021; Williams et al., 2017), nitrate and pH measurements from BGC-Argo floats are not used in this study; only temperature, salinity, and O2 are employed. Ongoing improvements in quality control and correction procedures may enable the incorporation of these measurements in future studies.

3 Methodology

3.1 Reconstruction of carbonate system parameters and nutrients

This study employs a dynamically adaptive framework to reconstruct carbonate system parameters and nutrients by integrating heterogeneous Argo float observations with high-quality GLODAP measurements. Figure 2 illustrates the overall workflow for generating gridded products in the Southern Ocean. Based on performance comparisons (see Sect. 4.1 for detail), the best-performing model is applied to reconstruct key biogeochemical tracers (oxygen, nitrate, phosphate, silicate) as well as carbonate system parameters (TA, DIC, pH). To accommodate differences in observational capabilities among Argo floats, particularly regarding the presence or absence of oxygen sensors, input observations are dynamically sorted into two reconstruction pathways:

  1. Full-parameter pathway (green in Fig. 2): This pathway utilizes all available measured variables, including hydrographic properties, and dissolved oxygen concentrations from floats equipped with oxygen sensors.

  2. Hydrography-only pathway (blue in Fig. 2): This pathway reconstructs targeted variables and oxygen concentration based solely on CTD measurements (salinity, temperature, depth) from floats lacking oxygen sensors.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f02

Figure 2. Overall workflow for generating interior carbonate system gridded products in the Southern Ocean. The top panel shows data inputs: Argo with CTD only (in blue), Argo with CTD and oxygen sensor (in green), and GLODAP (in purple). All data undergo quality control procedures. The workflow comprises three main steps: the top panel depicts model selection (CANYON-B, ESPER_LIR, and ESPER_NN); the middle panel illustrates the use of ESPER_NN to predict carbonate system parameters with or without oxygen data; the bottom panel shows the integration of Argo/GLODAP data into gridded products with derived anthropogenic CO2 and aragonite saturation data (see Sect. 3.2 for detailed calculations).

The resulting dataset includes both reconstructed variables (derived indirectly from Argo profiles) and direct high-quality ship-based observations.

3.2 Estimation of anthropogenic carbon (Cant) and aragonite saturation state (Ωar)

Typical methods for calculating Cant from DIC measurements include the ΔC* method (Gruber et al., 1996; Gruber, 1998; Sabine et al., 2004), the extended multiple linear regression (eMLR) method (Gruber et al., 2019b), and the Tracer combining Oxygen, inorganic Carbon, and total Alkalinity (TrOCA) method (Touratier et al., 2007; Touratier and Goyet, 2004), all of which have been widely applied. Although the ΔC* method has been widely used to quantify the Cant inventory, it relies on parameter customization through Optimum Multiparameter (OMP) analysis, which requires direct nutrient measurements of NO3, PO4, and SiO4 – data not available in our dataset. The eMLR method relies on repeat hydrographic measurements (Friis et al., 2005), which are not available for Argo profiles, and its output reflects the temporal change in Cant rather than its absolute concentration. For these reasons, this study employs the TrOCA method, which is relatively straightforward, extensively used in Southern Ocean studies (Metzl et al., 2024; Zhang et al., 2023), and has been demonstrated to be reliable through comparative analyses (Lo Monaco et al., 2005; Vázquez-Rodríguez et al., 2009; Mahieu et al., 2020; Zhang et al., 2023).

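$$\mathrm{TrOCA} = \mathrm{O_2} + a\left(\mathrm{DIC} - \tfrac{1}{2}\,\mathrm{TA}\right) \tag{1}$$

$$C_\mathrm{ant} = \frac{\mathrm{TrOCA} - \mathrm{TrOCA}^{0}}{a} = \frac{\mathrm{O_2} + 1.279\left(\mathrm{DIC} - \tfrac{1}{2}\,\mathrm{TA}\right) - \exp\left(7.511 - \left(1.087\times10^{-2}\right)\theta - \dfrac{7.81\times10^{5}}{\mathrm{TA}^{2}}\right)}{1.279} \tag{2}$$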

where θ is potential temperature (°C), O2 is dissolved oxygen (µmol kg−1), DIC is dissolved inorganic carbon (µmol kg−1), and TA is total alkalinity (µmol kg−1). In the full-parameter pathway, the calculation uses observed oxygen concentrations alongside model-derived TA and DIC estimates. Conversely, the hydrography-only pathway uses model-derived estimates for all three variables (oxygen, TA, and DIC). Finally, the Cant values are scaled to the reference year 2013 to account for the exponential increase of the anthropogenic CO2 burden in the climatological products (see Sect. 3.3) (Carter et al., 2021b; Tanhua et al., 2007). A detailed description of the scaling method is given in Appendix B1. It should be noted that the TrOCA approach is limited to waters below the euphotic layer (Lo Monaco et al., 2005); therefore, the Cant estimates above 100 m are excluded from this dataset.
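
As a worked illustration of Eqs. (1) and (2), the sketch below evaluates TrOCA-based Cant for a single sample. The function names are hypothetical, the scaling to the reference year 2013 is not included, and treating depths shallower than 100 m as excluded (with exactly 100 m retained) is an assumption about how the cutoff is applied.

```python
import math
from typing import Optional

A = 1.279  # TrOCA coefficient a in Eqs. (1) and (2)


def troca(o2: float, dic: float, ta: float) -> float:
    """Eq. (1): TrOCA tracer, all inputs in umol/kg."""
    return o2 + A * (dic - 0.5 * ta)


def troca0(theta: float, ta: float) -> float:
    """Pre-industrial TrOCA as a function of potential temperature (degC) and TA."""
    return math.exp(7.511 - 1.087e-2 * theta - 7.81e5 / ta**2)


def cant_troca(o2: float, dic: float, ta: float, theta: float) -> float:
    """Eq. (2): anthropogenic carbon in umol/kg (may be negative)."""
    return (troca(o2, dic, ta) - troca0(theta, ta)) / A


def cant_below_euphotic(o2: float, dic: float, ta: float, theta: float,
                        depth_m: float, min_depth_m: float = 100.0) -> Optional[float]:
    """Return Cant, or None above the assumed 100 m cutoff."""
    if depth_m < min_depth_m:
        return None
    return cant_troca(o2, dic, ta, theta)
```

Because DIC enters Eq. (2) with a coefficient of a/a = 1, a 1 µmol kg−1 change in DIC changes Cant by exactly 1 µmol kg−1, whereas the same change in O2 shifts Cant by 1/1.279 µmol kg−1.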

The aragonite saturation state (Ωar) is calculated using the CO2SYS software (v3, Sharp et al., 2023), with TA, DIC, temperature, salinity, and pressure as inputs, a combination chosen to minimize uncertainty in the results (Orr et al., 2018). The following thermodynamic parameterizations are employed: carbonic acid dissociation constants from Lueker et al. (2000), hydrogen fluoride (HF) dissociation constants from Perez and Fraga (1987), the ratio of total boron (BT) to practical salinity (Sp) from Lee et al. (2010), and bisulfate dissociation constants (KHSO4) from Dickson et al. (1990).

3.3 Construction of gridded products

Profile data for each parameter are sorted into spatial bins of 1° longitude × 1° latitude and 84 vertical levels to generate homogenized three-dimensional gridded products. Data derived from both float- and ship-based observations are integrated into this spatial framework, ensuring robust spatial and depth coverage. To maximize data density, we construct an “All-Data Grid” by merging all available reconstructions and observations. In addition, three specialized gridded products are generated: the “Float Grid”, comprising only float-based reconstructions; the “Non-O2-Float Grid”, limited to floats without oxygen measurements; and the “O2-Float Grid”, limited to BGC-Argo floats. The latter two grids facilitate sensitivity analyses of oxygen's influence on carbonate system parameter reconstructions. All these gridded datasets serve as the basis for subsequent analyses.

Figure 3 shows the vertical sampling spacing of CTD and dissolved oxygen measurements from Argo floats. Typically, floats sample at intervals of 10 m or finer from the surface down to 200 m and at intervals of 50 m or finer between 500 and 2000 m. Floats equipped with oxygen sensors sample dissolved oxygen at higher resolution. Measured CTD profiles are prioritized, but interpolated profiles are used when concurrent oxygen data are unavailable. To align with the float sampling scheme and maximize data utilization, the water column (0–5600 m) is divided into 84 vertical depth levels (highlighted in yellow in Fig. 3): 0–100 m at 5 m intervals (20 levels), 100–500 m at 25 m intervals (15 levels), 500–2000 m at 50 m intervals (30 levels), and 2000–5600 m at 200 m intervals (19 levels). The deepest level (5600 m) corresponds to the maximum float measurement depth.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f03

Figure 3. (a) Sampling spacing of all Argo floats from sea level to 5600 m. The color of the scattered points shows the frequency. The yellow line illustrates the pressure levels used in this study to match the sampling spacing and maximize the use of available profile data. (b) Histogram of the number of profiles per 1°×1° bin. (c) Histogram of initial year (in blue) and final year (in red) per 1°×1° bin.


Each Argo float profile is interpolated to these predefined vertical levels using the Piecewise Cubic Hermite Interpolating Polynomial routine (“intprofile.m” in the 2nd QC toolbox, Lauvset and Tanhua, 2015). After interpolation and prior to gridding, extreme outliers are identified and removed. For each pressure level and Longhurst Biogeographical Province (available at http://comlmaps.org/how-to/layers-and-resources/boundaries/longhurst-biogeographical-provinces/, last access: 26 May 2025), the interquartile range (IQR) is calculated for each parameter, and values exceeding 1.5×IQR are flagged as outliers (Johnson and Purkey, 2024). When an outlier is detected for any parameter at a given vertical level within a bin, all variables from that profile and level are discarded. Following Gruber et al. (1998), negative Cant estimates are preserved as negative in the averaging process. Finally, for each grid cell, all valid measurements are averaged to obtain representative values, while cells without observations are left empty. This bin-averaging approach ensures that the gridded products are entirely observation-based and preserve the genuine spatial structure of the compiled dataset.
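
The outlier screening and bin averaging described above can be sketched as follows. Several details are assumptions for illustration rather than the authors' implementation: the record layout (dictionaries with `province`, `level`, `lat`, `lon` keys), the use of Tukey fences (Q1 − 1.5×IQR, Q3 + 1.5×IQR) as the reading of "values exceeding 1.5×IQR", the quartile method, and the use of `floor` to assign 1°×1° cells.

```python
import math
import statistics
from collections import defaultdict


def iqr_fences(values, k=1.5):
    """Tukey fences (assumed rule): Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(records, variables, k=1.5):
    """Discard a whole record if any variable is an outlier within
    its (province, level) group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["province"], rec["level"])].append(rec)

    kept = []
    for group in groups.values():
        fences = {}
        for var in variables:
            vals = [rec[var] for rec in group]
            # Quartiles need at least two points; otherwise keep everything.
            fences[var] = (iqr_fences(vals, k) if len(vals) >= 2
                           else (-math.inf, math.inf))
        for rec in group:
            if all(fences[v][0] <= rec[v] <= fences[v][1] for v in variables):
                kept.append(rec)
    return kept


def bin_average(records, variable):
    """Unweighted mean per (lat cell, lon cell, level); cells without
    data simply do not appear in the result."""
    sums = defaultdict(lambda: [0.0, 0])
    for rec in records:
        key = (math.floor(rec["lat"]), math.floor(rec["lon"]), rec["level"])
        sums[key][0] += rec[variable]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Negative values (for example negative Cant) pass through `bin_average` unchanged, consistent with preserving them in the averaging step.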

3.4 Comparative analysis between ship-based observations and float-based reconstructions

To ensure consistency and reliability between float-based reconstructions (including both full-parameter and hydrography-only pathways) and ship-based observations, three comparative analyses of reconstruction-derived variables (Cant and Ωar) are implemented.

First, methodological discrepancies between variables from each pathway are assessed using GLODAPv2 data. Specifically, values of Cant and Ωar calculated directly from ship-based observations are compared with those estimated from the full-parameter pathway (Cant_ship_f, Ωar_ship_f) and the hydrography-only pathway (Cant_ship_h, Ωar_ship_h), respectively. This analysis quantifies the biases inherent in each reconstruction pathway and is detailed in Sect. 4.2. Second, within regions exhibiting spatial overlap between float-derived reconstructions and independent ship-based estimates, comparisons of float-based Cant and Ωar from each pathway are performed following an established crossover quality control procedure (results shown in Sect. 4.2). Comparisons are restricted to cases where differences are within ±0.005 kg m−3 in potential density (σθ), ±0.005 in neutral density (τ), and ±100 dbar, within the 1400–2100 dbar depth range (Bushinsky et al., 2025). This targeted analysis serves to assess potential discrepancies arising from differences between oxygen-equipped floats and floats without oxygen sensors. These two analyses are essential for evaluating potential biases introduced by differences between oxygen-equipped and CTD-only Argo floats. This is particularly important because oxygen-equipped floats comprise only approximately 11 % of total Argo float deployments in the Southern Ocean, which could lead to disproportionate representation and biases in the gridded products.

Finally, a detailed zonal analysis among the gridded products is conducted to evaluate how observational differences impact the spatial consistency and reliability of the final products (results shown in Sect. 4.4).

3.5 Uncertainty assessment

3.5.1 Uncertainty of oceanic interior carbonate system parameters

Uncertainty assessment in this study is designed to comprehensively quantify error propagation throughout the reconstruction process. The uncertainties of the reconstructed variables – including TA, DIC, pH, and nutrients – are evaluated by considering both the instrument measurement accuracy of Argo sensors and the model-based reconstruction uncertainty. Additionally, the estimates of Cant and Ωar (see Sect. 3.2) account for uncertainty propagation in the calculation (Fig. 4).

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f04

Figure 4. Error propagation that may arise during the calculation process of reconstructed variables and calculated values (Cant and Ωar).


Instrument measurement errors reflect the inherent limitations of sensors and are quantified either from the specified precision of the instruments or by comparing Argo measurements against independent, high-quality reference data (GLODAP). These measurement errors establish the baseline uncertainty that propagates through subsequent steps.

Model uncertainties are evaluated for each reconstruction method. Different models employ distinct approaches to uncertainty estimation. For example, CANYON-B model expresses neural network weights as probability distributions, providing probabilistic predictions that incorporate model weight uncertainty. In contrast, ESPER_LIR and ESPER_NN report uncertainties based on the root mean square error (RMSE) of their validation dataset and through interpolation across depth and salinity space.

Additionally, uncertainties in the calculations of Cant and Ωar are addressed via Monte Carlo simulation (following the principle described in Qi et al., 2022). Both measurement and model-derived uncertainties are propagated through the TrOCA and CO2SYS calculation steps by repeatedly sampling input variables within their respective uncertainty bounds. This generates distributions of Cant and Ωar, and the standard deviations of these distributions are taken as the estimates of propagated uncertainty.
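
A minimal sketch of this Monte Carlo propagation, restricted to the TrOCA step, is given below. It assumes independent Gaussian perturbations with the stated 1σ uncertainties, which is one common way to sample "within uncertainty bounds" and is not confirmed by the text; the function names, the number of draws, and the example values in the usage note are illustrative only.

```python
import math
import random
import statistics

A = 1.279  # TrOCA coefficient a


def cant_troca(o2, dic, ta, theta):
    """Eq. (2): TrOCA-based anthropogenic carbon (umol/kg)."""
    troca = o2 + A * (dic - 0.5 * ta)
    troca0 = math.exp(7.511 - 1.087e-2 * theta - 7.81e5 / ta**2)
    return (troca - troca0) / A


def propagate_cant(means, sigmas, n_draws=10000, seed=0):
    """Draw inputs from independent normal distributions (assumption)
    and return the mean and standard deviation of the Cant draws."""
    rng = random.Random(seed)
    names = ("o2", "dic", "ta", "theta")
    draws = []
    for _ in range(n_draws):
        sample = {name: rng.gauss(means[name], sigmas[name]) for name in names}
        draws.append(cant_troca(**sample))
    return statistics.fmean(draws), statistics.stdev(draws)
```

For example, `propagate_cant({"o2": 200.0, "dic": 2150.0, "ta": 2300.0, "theta": 5.0}, {"o2": 3.0, "dic": 6.0, "ta": 4.0, "theta": 0.01})` returns the central estimate and its propagated 1σ spread for one hypothetical sample. The same pattern would apply to Ωar by swapping the TrOCA function for a CO2SYS call.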

3.5.2 Uncertainty of gridded products

The uncertainty estimation for gridded products derived from float-based (and optionally, ship-based) observations consists of two main components: parameter profile sensitivity and spatial spread uncertainty, which together determine the total uncertainty at each grid cell.

At each pressure level, parameter profile sensitivity (σparam_prof) is assessed by iteratively perturbing the reconstructed variables according to their parameter-specific uncertainties and repeating the construction of the gridded products as described in Sect. 3.3. For float-based reconstructions, uncertainties are evaluated as detailed in Sect. 3.5.1. For ship-based observations, both systematic and random uncertainties are incorporated following recommendations from Carter et al. (2024), with each observation categorized as direct, calculated, or combined; detailed uncertainty estimates are provided in Table A3.

During the gridding process, all N parameter values (yparam,i) within each 1°×1° spatial bin and specific pressure level are combined using weighted averaging:

(3) $y_{\mathrm{grid}} = \dfrac{\sum_{i=1}^{N} w_i \, y_{\mathrm{param},i}}{\sum_{i=1}^{N} w_i}$

where ygrid is the gridded parameter value, yparam,i is the ith profile value interpolated to the pressure level, and wi is the inverse variance weight (Eq. 4), with σparam_prof being the measurement uncertainty of each observation as provided in the SOCOML profile data.

(4) $w = \dfrac{1}{\sigma_{\mathrm{param\_prof}}^{2}}$

The weighted spread of observations within each grid cell (σspread) is calculated following Bittig et al. (2018). To estimate the uncertainty of the climatological mean (σgrid), σspread is divided by the square root of the effective sample size (Neff), computed using the Kish formula.

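(5) $\sigma_{\mathrm{spread}} = \sqrt{\dfrac{\sum_{i=1}^{N} w_i \left(y_{\mathrm{param},i} - y_{\mathrm{grid}}\right)^{2}}{\sum_{i=1}^{N} w_i - \dfrac{\sum_{i=1}^{N} w_i^{2}}{\sum_{i=1}^{N} w_i}}}$

(6) $\sigma_{\mathrm{grid}} = \dfrac{\sigma_{\mathrm{spread}}}{\sqrt{N_{\mathrm{eff}}}}$

(7) $N_{\mathrm{eff}} = \dfrac{\left(\sum_{i=1}^{N} w_i\right)^{2}}{\sum_{i=1}^{N} w_i^{2}}$
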
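
The per-cell statistics described in this section (inverse-variance weighted mean, weighted spread, Kish effective sample size, and uncertainty of the mean) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name is hypothetical, the weighted spread is taken as the square root of the weighted variance, and a cell with a single observation is assigned an undefined spread.

```python
import math


def grid_cell_stats(values, sigmas):
    """Combine N profile values falling in one grid cell and pressure level.

    values : parameter values interpolated to the pressure level
    sigmas : per-observation uncertainties used for inverse-variance weights
    """
    weights = [1.0 / s**2 for s in sigmas]           # inverse-variance weights
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)

    # Weighted mean of the observations in the cell.
    y_grid = sum(w * y for w, y in zip(weights, values)) / sum_w

    # Kish effective sample size.
    n_eff = sum_w**2 / sum_w2

    if len(values) < 2:
        # Assumption: the spread is undefined for a single observation.
        return y_grid, float("nan"), n_eff, float("nan")

    # Weighted spread with the bias-corrected denominator.
    num = sum(w * (y - y_grid) ** 2 for w, y in zip(weights, values))
    sigma_spread = math.sqrt(num / (sum_w - sum_w2 / sum_w))

    # Uncertainty of the climatological mean.
    sigma_grid = sigma_spread / math.sqrt(n_eff)
    return y_grid, sigma_spread, n_eff, sigma_grid


y_grid, sigma_spread, n_eff, sigma_grid = grid_cell_stats(
    values=[1.0, 2.0, 3.0], sigmas=[1.0, 1.0, 1.0]
)
```

With equal weights the expressions reduce to the familiar unweighted case: the effective sample size equals N, the spread equals the sample standard deviation, and the uncertainty of the mean is the standard error.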
4 Results and discussion

4.1 Model performance comparison

Model performance is evaluated for three widely used approaches for estimating oceanic biogeochemical properties: ESPER_LIR, ESPER_NN, and CANYON-B. The models are trained on different versions of the GLODAPv2 dataset (CANYON-B on GLODAPv2.2016; ESPER_LIR and ESPER_NN on GLODAPv2.2020). Independent assessment data not included in model training are used for comparison (Table 1). Both the full-parameter and hydrography-only reconstruction pathways are assessed.

Under the full-parameter pathway, ESPER_NN achieves the lowest RMSE for most reconstructed variables (Table 2), including TA (4.37 µmol kg−1), DIC (6.09 µmol kg−1), PO4 (0.07 µmol kg−1), and SiO4 (2.59 µmol kg−1). ESPER_LIR performs slightly better for pH and NO3. Generally, ESPER_NN's RMSE values for TA and DIC are several percent lower than those of ESPER_LIR and CANYON-B, demonstrating superior accuracy relative to the observations.

Table 2. The statistics (r2, RMSE, and mean bias) were compared between estimated parameter values from ESPER_NN and CANYON-B using assessment data added after the original GLODAPv2.2020 release (i.e., all cruises with GLODAPv2 cruise numbers 2107, Table 1). In addition, the reconstruction performance of each model was examined for the entire water column as well as for specific depth ranges: surface (0–200 dbar), intermediate (1000–2000 dbar), and deep and abyssal (>2000 dbar).

Under the hydrography-only pathway, omission of oxygen leads to a notable increase in RMSE for DIC (8.78 µmol kg−1), particularly in deep and abyssal waters (Table 2). This highlights the critical role of oxygen measurements as predictors for deep DIC. For CANYON-B, both TA and DIC are systematically underestimated, with full-column mean biases of −0.6 and −1.7 µmol kg−1, respectively. These biases are more pronounced than those reported in an earlier evaluation by Carter et al. (2021b), who found biases of −0.4 and −0.8 µmol kg−1 using 2019–2020 GLODAPv2 data. The increasing discrepancies suggest that prediction errors in CANYON-B may be accumulating over time. Similar trends are observed for ESPER_LIR and ESPER_NN, indicating that periodic updates to model training datasets are necessary to mitigate future underestimation of DIC.

Uncertainty magnitudes vary among models, reflecting differences in both model structure and uncertainty estimation methodology. CANYON-B, which directly incorporates measurement uncertainties from input variables, produces larger uncertainties. Vertical performance analysis shows that the lowest RMSE values for most variables are found in deep and abyssal layers (comprising 24 % of full-column data), while the largest errors are found in surface waters (25 %), likely due to greater variability in surface carbonate chemistry.

Overall, ESPER_NN demonstrates the highest accuracy and lowest uncertainty under both reconstruction pathways (Table 2), supporting its selection as the primary model for reconstructing carbonate system parameters and nutrients in the Southern Ocean throughout this work. Based on the estimated RMSE of ESPER_NN, float-derived estimates are expected to fall within twice the model's estimated uncertainty range, which serves as a criterion for the quality control applied to the Argo float dataset.

4.2 Evaluation of bias between full-parameter and hydrography-only pathways

We first analyze the methodological discrepancies between the two reconstruction pathways using high-quality shipboard measurements. Biases in Cant and Ωar are quantified using ship-based observations with concurrent O2, TA, and DIC measurements (N=93 667). Parameter values derived directly from measured shipboard data are compared with those calculated from reconstructed DIC and TA for the two pathways (Fig. 5).

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f05

Figure 5. Histograms of calculation biases between the full-parameter pathway (orange; includes oxygen concentration) and the hydrography-only pathway (cyan; excludes oxygen concentration) for (a, b) anthropogenic carbon (Cant) and (c, d) aragonite saturation state (Ωar). Bias is defined as the difference between values calculated using ESPER_NN-derived variables and those calculated using GLODAP shipboard measurements. The grey background denotes the full range of bias values, with all x axes centered at zero (bias=0) for visibility. Panels (b) and (d) are the same as (a) and (c), but with a restricted x axis range of ±20 µmol kg−1 for Cant and ±0.3 for Ωar. Figure legends indicate the calculation pathway, number of data, median values, mean values ±1 SD, and 95 % confidence intervals.

Under the full-parameter pathway, Cant exhibits a slight negative bias relative to shipboard-derived values (Cant_ship_M), with a median of −0.08 µmol kg−1 and a mean of −0.16 ± 4.74 µmol kg−1 (95 % CI: [−0.19, −0.12] µmol kg−1). When oxygen is omitted, the Cant bias distribution broadens and shifts slightly positive, with a median of 0.07 µmol kg−1 and a mean of 0.02 ± 6.60 µmol kg−1 (95 % CI: [−0.02, 0.07] µmol kg−1). Ωar biases remain small in both pathways but show a similar pattern: for the full-parameter pathway, the Ωar bias is centered near zero (median = 0.0006; mean = 0.0006 ± 0.0433; 95 % CI: [0.0005, 0.0011]), whereas the hydrography-only pathway yields a slightly larger mean bias and spread (median = 0.0021; mean = 0.0021 ± 0.0618; 95 % CI: [0.0010, 0.0018]). Relative to the full-parameter pathway, the hydrography-only pathway results in a median difference (Cant_ship_H−Cant_ship_F; Ωar_ship_H−Ωar_ship_F) of +0.1 µmol kg−1 and 0.001 units, with an added methodological uncertainty of ±2.4 µmol kg−1 and ±0.02 units (Fig. A3).
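
For readers who wish to compute comparable summary statistics from their own matchups, the following sketch returns the median, mean, standard deviation, and a 95 % confidence interval of the mean for a list of biases. It is a sketch under an explicit assumption: the confidence interval is taken as the normal approximation mean ± 1.96 SD/√N, which is not stated in the text as the method used. The function names are hypothetical.

```python
import math
import statistics


def ci_from_moments(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval of the mean (assumed method)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def bias_summary(biases):
    """Median, mean, sample SD, and 95 % CI of the mean for a bias sample."""
    mean = statistics.fmean(biases)
    sd = statistics.stdev(biases)
    return {
        "n": len(biases),
        "median": statistics.median(biases),
        "mean": mean,
        "sd": sd,
        "ci95": ci_from_moments(mean, sd, len(biases)),
    }


# Applying the interval formula to the quoted full-parameter Cant moments.
low, high = ci_from_moments(mean=-0.16, sd=4.74, n=93667)
print(f"[{low:.2f}, {high:.2f}]")
```

With the quoted mean, standard deviation, and sample size for the full-parameter Cant bias, this approximation gives an interval of roughly −0.19 to −0.13 µmol kg−1, close to the reported interval. Small differences may reflect rounding of the quoted moments or a different interval method.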

Subsequently, biases in float-based estimates from the two pathways are further assessed. The comparison results, restricted to cases within the 1400–2100 dbar depth range and specific seawater property differences (see Sect. 3.4), are shown in Fig. 6a–d. Oxygen concentrations estimated by both pathways exhibit no significant systematic offset, indicating robust performance of float-based oxygen reconstructions. Biases in float-based Cant estimates exhibit a moderate positive correlation with oxygen biases, especially at higher latitudes, while Ωar biases exhibit a slight positive association with O2 biases, particularly in the mid-latitudes. The differences in float-based Cant and Ωar between the two reconstruction pathways are within ±10 µmol kg−1 and ±0.075 units, respectively. These ranges are roughly half those observed for the bias distributions between float-based and ship-based values.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f06

Figure 6. Scatter comparisons and spatial distributions of differences in O2, Cant and Ωar between the two reconstruction pathways, restricted to the 1400–2100 dbar range. (a) Scatter plot comparing float-based and ship-based oxygen measurements under the full-parameter (red) and hydrography-only (blue) pathways, and (b) spatial distribution of O2 differences. (c, d) Scatter plots illustrating correlations between differences in O2 and differences in Cant and Ωar, colored by latitude.

Overall, the two reconstruction pathways for Argo floats have bias distributions narrow enough to support reliable Southern Ocean analyses. Considering both the methodological uncertainty and random uncertainty (estimated following Sect. 3.5.1), the uncertainties of Cant in both pathways are about ±4–6 µmol kg−1, which remain acceptably low for large-scale biogeochemical reconstructions (e.g., ±6 µmol kg−1 in Pardo et al., 2014, and ±5.2 µmol kg−1 in Asselot et al., 2024).

4.3 Climatological distributions

Climatological spatial distributions of interior carbonate system parameters are obtained by averaging measured and reconstructed values from ship-based observations and Argo float-based reconstructions, together with the Cant and Ωar values calculated from them. Shipboard measurements span 1972–2020, while float-based observations span 2000–2025, with oxygen-equipped floats contributing data since 2003. Figures 7 and 8 illustrate the climatological spatial distribution of the interior carbon system parameters of the Float Grid. The spatial patterns of the gridded products are generally consistent, except for Cant and Ωar in the southwestern Atlantic (see Fig. B2). To illustrate this discrepancy, the corresponding O2-Float Grid distributions are also provided. Figure 9 presents the distributions in the abyssal layer. Because observations below 4000 m are extremely scarce, the All-Data Grid is used to provide the most comprehensive depiction of deep-ocean conditions. Section 4.4 further provides a detailed comparison among these products and shipboard estimates.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f07

Figure 7. Averaged climatological distribution of DIC (a–d) and Cant (e–h) in oceanic sectors and three layers: subsurface layer (100 to 300 m), intermediate layer (1400 to 2000 m), and deep layer (2000 to 4000 m). The climatology is based on the Float Grid derived from Argo profile data spanning 2001–2024 for DIC, while Cant is scaled to the reference year 2013. In the southwestern Atlantic, where noticeable differences of Cant in deep waters occur between different data products (see explanation in Appendix B2 and B3), the corresponding O2-Float Grid distributions for the intermediate and deep layers are additionally shown as insets in subplots (f, g). The thin black lines show, from north to south, the Subtropical Front (STF), the Subantarctic Front (SAF), the Polar Front (PF), and the Southern Antarctic Circumpolar Current Front (SACCF) (Orsi et al., 1995). Note that the color scales differ among the individual maps.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f08

Figure 8. Averaged climatological distribution of pH (a–d) and Ωar (e–h) in oceanic sectors and three layers: subsurface layer (100 to 300 m), intermediate layer (1400 to 2000 m), and deep layer (2000 to 4000 m). The climatology is based on the Float Grid derived from Argo profile data spanning 2001–2024. In the southwestern Atlantic, where noticeable differences of Cant in deep waters occur between different data products (see explanation in Appendix B2 and B3), the corresponding O2-Float Grid distribution for the intermediate layer is additionally shown as an inset in subplot (f). The thin black lines show, from north to south, the Subtropical Front (STF), the Subantarctic Front (SAF), the Polar Front (PF), and the Southern Antarctic Circumpolar Current Front (SACCF) (Orsi et al., 1995). Note that the color scales differ among the individual maps.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f09

Figure 9. Averaged climatological distribution of DIC (a), Cant (b), pH (c), and Ωar (d) in the abyssal layer (4000 to 5600 m). The climatology is based on the All-Data Grid, which combines float- and ship-based observations. Note that the color scales differ among the individual maps.

The absence of continental barriers across much of the Southern Ocean and the transport of the Antarctic Circumpolar Current (ACC) result in pronounced meridional gradients dominating the spatial patterns of interior biogeochemical properties. These meridional gradients are closely linked to the spatial distribution of the circumpolar hydrographic fronts, including the Subtropical Front (STF), the Subantarctic Front (SAF), the Polar Front (PF), and the Southern Antarctic Circumpolar Current Front (SACCF), which are indicated by black lines in the climatological distribution maps. The climatological distributions further reflect inter-basin variability driven by differences in basin geometry, bathymetry, and ocean circulation among the Pacific, Atlantic, and Indian Oceans. The averaged profile distributions in the Pacific, Atlantic, and Indian sectors of the Southern Ocean are shown in Figs. 7d, h and 8d, h.

The distributions of DIC and Cant exhibit strong spatial relationships (Fig. 7). In the subsurface layer, high DIC and low Cant concentrations are found south of the PF, the southern boundary of the ACC, due to upwelling of older, DIC-rich and Cant-poor deep waters (Marshall and Speer, 2012). Conversely, the northern portion of the Southern Ocean, north of the SAF, displays low DIC and high Cant concentrations attributed to the transport of Subantarctic Mode Water (SAMW) (Talley, 2013). As depth increases into the intermediate layer (1400–2000 m), Cant concentrations decline significantly, accompanied by increases in DIC. Cant demonstrates pronounced basin-scale variability, with notably low concentrations (0–5 µmol kg−1) in mid-to-high latitudes of the southeastern Pacific Ocean, particularly south of the SACCF between 120 and 180° W. Conversely, higher Cant concentrations (>20 µmol kg−1) are observed in the Pacific sectors and areas south of the PF in the eastern Antarctic region. Regions with elevated DIC typically show lower Cant concentrations, and vice versa. In the deep and abyssal layers (2000–5600 m, Fig. 9c), the spatial patterns of DIC and Cant remain unchanged, and their vertical profiles flatten. Both DIC and Cant exhibit relatively high concentrations in the eastern Antarctic region, where Antarctic Bottom Water (AABW) forms (Morrison et al., 2020). This enrichment is consistent with AABW-driven transport of anthropogenic carbon into the deep ocean.

The accumulated Cant uptake and increased DIC concentrations intensify OA, leading to declines in both pH and Ωar. Spatial distributions of pH and Ωar (Fig. 8) closely resemble those of DIC (Fig. 7a–d). In the subsurface layer, pH exhibits a spatial distribution pattern nearly identical to that of DIC. However, Fig. 8e shows distinctly lower Ωar values south of the STF. Both pH and Ωar values decrease markedly from the surface to approximately 1000 m, with more gradual declines below 1000 m. In the intermediate and deep layers, the Pacific sector shows the lowest pH and Ωar values, followed by the Indian and Atlantic sectors, respectively.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f10

Figure 10. Panels (a) and (c) show the climatological distributions of Cant and Ωar in the intermediate layer (1400–2000 m), respectively. Panels (b) and (d) present latitudinal distributions of Cant and Ωar averaged over four depth layers: subsurface layer (100 to 300 m), intermediate layer (1400 to 2000 m), deep layer (2000 to 4000 m), and abyssal layer (4000 to 5600 m). Black, red, blue, green and purple symbols and lines represent Non-O2-Float Grid, Float Grid, All-Data Grid, GLODAP-derived data, and TraceV1-derived data, respectively.

4.4 Assessment of differences among gridded products

The Float Grid, including the Non-O2-Float Grid and the O2-Float Grid, represents observation-based climatological products with strong capability to resolve fine-scale horizontal and vertical distributions of interior ocean carbonate system parameters. Figure 10 presents the latitudinal distributions of Cant and Ωar across the Southern Ocean for the four gridded products in this study as well as GLODAP-derived data. Additionally, we apply the TRACE method (Carter et al., 2025) to estimate Cant, generating a gridded dataset as described in Sect. 3.3, which serves as an additional comparison (Fig. 10b). TRACEv1 adopts a hybrid conceptual framework, in which surface-ocean estimates are observation-based, whereas deep-ocean fields are model-based and tuned against observations.

The four gridded products show latitudinal variations that are closely aligned with the GLODAP-derived data (green lines). Notably, in the intermediate layer, where Cant distributions vary across ocean basins (Fig. 8f), the GLODAP-derived dataset north of 45° S exhibits pronounced zonal gradients, likely due to sparse longitudinal sampling. In contrast, our products, benefiting from enhanced spatial coverage, better capture the integrated regional variations. The TRACE-derived dataset (purple lines) yields lower Cant concentrations than the TrOCA-derived values, particularly in intermediate waters. This difference may partly arise because the model-based approach is difficult to constrain accurately in regions with sparse transient tracer observations and complex vertical structures associated with Southern Ocean upwelling (as suggested by Carter et al., 2025), leading to deviations from the observation-based TrOCA estimates. In deep and abyssal layers, Cant concentrations show an increasing trend from lower latitudes toward higher latitudes (60–70° S). This pattern may be linked to the formation of AABW, which drives transport of Cant into the deep ocean (Zhang et al., 2023).

A direct comparison between the O2-Float Grid (red lines) and the Non-O2-Float Grid (black lines) elucidates differences attributable to Core Argo versus BGC Argo observations. Reconstructions of Cant and Ωar are broadly consistent across subsurface and intermediate layers for both float types. Significant discrepancies and steep gradients among gridded products are evident in all water layers south of 65° S. Figure A7 illustrates the geographical coverage of float and ship-based observations at high latitudes. In the abyssal layer, float observations are restricted to the eastern Weddell Sea near 4100 m, so they may not be representative of abyssal-layer conditions. Notably, a hotspot of high Cant and vertically confined low Ωar is identified in the southwestern Atlantic Ocean near the SAF and PF in the Non-O2-Float Grid (Fig. 10a). These noticeable differences likely arise from the limited number of shipboard training samples in this region, leading to reduced performance of the machine-learning reconstructions. For regional studies in the southwestern Atlantic, we therefore recommend using the O2-Float Grid provided in this study. The corresponding vertical profiles are presented in Appendix B3. Although float-based measurements introduce additional uncertainties, their extensive spatial and temporal coverage enables our products to offer unprecedented insight into the previously under-sampled Southern Ocean, particularly in the deep ocean. Overall, the All-Data Grid offers a comprehensive representation of the Southern Ocean interior, while the O2-Float Grid and Non-O2-Float Grid demonstrate the potential and limitations of Argo-based reconstructions for studying carbon dynamics.

4.5 Uncertainties assessment

The uncertainties of the gridded products arise from both the uncertainties in the parameter estimates and mapping (sampling) errors. Considering the non-negligible trends of accumulated Cant, the evaluation of uncertainty in this section focuses mainly on anthropogenic CO2. Both random errors and potential biases contribute to the uncertainties in the Cant estimates. In Sect. 4.2, the random errors for individual measurements were estimated to be about ±4–6 µmol kg−1 in both the full-parameter and hydrography-only pathways. Potential biases, including uncertainties in inversion-algorithm assumptions and formula parameters, are more difficult to assess quantitatively but have little effect on the climatological distribution. The mapping errors, reflecting uncertainties introduced during spatial interpolation, are also challenging to evaluate precisely.

Traditionally, distribution maps of carbonate system parameters, including Cant, were constructed using limited GLODAPv2 cruise data (sampling stations are shown in Fig. 1b), with spatial coverage extended via regression and interpolation methods (Gruber et al., 2019b; Sabine et al., 2004; Barth et al., 2014). In these earlier products, mapping errors strongly depended on the vertical and horizontal data distribution and were assumed to be less than 15 % (Sabine et al., 2004). In contrast, our gridded products leverage the statistical advantage of aggregating multiple independent observations, resulting in gridded uncertainties that are smaller than individual observation uncertainties, and mapping discrepancies reduced to below 7.5 % (Fig. A9). Although our approach may underestimate uncertainty due to potential representativity error, our dataset offers a significant improvement in both accuracy and spatial representativeness over previous gap-filling approaches, and it further extends coverage to the ocean bottom, whereas earlier analyses were largely restricted to the upper 0–3000 m (Gruber et al., 2019a). This enhancement is especially valuable for robustly assessing variability and climatological trends in the historically data-sparse Southern Ocean.

5 Data availability

The raw Argo profile measurements used in this study are publicly available from the Argo Global Data Assembly Center (GDAC) at ftp://ftp.ifremer.fr/ifremer/argo/dac/. The processed Argo profile dataset and SOCOML gridded products, including the primary product All-Data Grid and auxiliary products, are available at https://data.mendeley.com/datasets/xzr59ngmpz/2 (last access: 28 October 2025) (https://doi.org/10.17632/xzr59ngmpz.2, Zhong et al., 2025a) and from the NOAA National Centers for Environmental Information (NCEI, https://doi.org/10.25921/8c29-rv75, Zhong et al., 2025b).

6 Conclusions

As the Southern Ocean Argo array has expanded, we applied the ESPER_NN model to reconstruct eight key carbonate system parameters – TA, DIC, pH, NO3, PO4, SiO4, Cant and Ωar – from Argo profiles. These reconstructions were then gridded into a 1°×1° product with 84 pressure levels. The input variables were dynamically partitioned into full-parameter (with O2 measured) and hydrography-only (without O2 measured) pathways to leverage the extensive Argo network. To account for differing data sources, we generated four gridded products: the All-Data Grid, integrating both Argo and GLODAP data, and the Float Grid, further divided into the Non-O2-Float Grid and the O2-Float Grid. Although the All-Data Grid provides a comprehensive climatological distribution derived from multiple integrated data sources, the Float Grid demonstrates greater internal consistency. This is because discrepancies arising from differences in measurement instrumentation cannot be fully eliminated, as clearly illustrated in Fig. 7c. Consequently, the All-Data Grid is more suitable for large-scale studies, whereas investigations focusing on smaller regions should incorporate more rigorous analyses of accuracy and uncertainty.

Model comparisons and evaluations reveal increasing underestimation of DIC over time, particularly along the hydrography-only pathway, which leads to progressive underestimation of Cant. This drift in bias underscores an inherent constraint of machine learning models trained on data confined to a fixed temporal scope: they cannot extrapolate beyond the observed period to capture emerging trends. Despite this, ESPER_NN maintains robust generalization performance against assessment data. Moreover, the bias between the two pathways remains small compared to the difference between reconstructed variables and GLODAP measurements. Cant pathway biases remain within ±10 µmol kg−1, and Ωar pathway biases within ±0.075. These biases exhibit latitudinal variability correlated with oxygen bias. This supports the feasibility of using machine learning models to integrate both Core Argo and BGC Argo data, and highlights the potential for future improvements through the assimilation of nitrate and pH observations from Argo floats.

We offer all gridded products, comprising eight ocean interior carbonate system parameters along with their uncertainty estimates, to the scientific community to advance Southern Ocean carbon-cycle research and to provide new, observation-based perspectives on ocean acidification and carbon sequestration.

Appendix A: Supplemented tables and figures

Table A1. List of cruises with excluded measurements from the carbonate system internal consistency training dataset presented in this work. Numbers in brackets following recommended adjustment values denote stations removed from the dataset.

−777 = Poor data, no adjustment suggested. If one of the three carbon system parameters – DIC, TA, or pH – is calculated, it is annotated with a subscript c.

Table A2. All GLODAPv2 cruises located in the Southern Ocean that have pH values.

All TA and DIC data from GLODAP used in this study are measured (Carter et al., 2024).

Table A3. Uncertainty estimation for measurements and calculations from GLODAP.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f11

Figure A1. Scatter plot and linear fit of measured pH and the discrepancy between measured and calculated pH for cruise no. 1042 (Expocode: 33RO20131223).

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f12

Figure A2. Histograms of biases between the full-parameter pathway (orange; includes oxygen concentration) and the hydrography-only pathway (cyan; excludes oxygen concentration) for TA and DIC. Bias is defined as the difference between values calculated using ESPER_NN-derived variables and GLODAP shipboard measurements. The x axis is restricted to a range of ±20 µmol kg−1 for TA and ±50 µmol kg−1 for DIC. (a, b) Based on all GLODAP data; (c, d) based on GLODAP assessment data. Figure legends indicate the calculation pathway, number of data, median values, mean values ±1 SD, and 95 % confidence intervals.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f13

Figure A3. Intercomparison between Cant and Ωar calculations based on ESPER_NN-derived variables and direct measurements. (a, b) Scatterplots of Cant concentration calculated using ESPER_NN-derived variables and direct shipboard measurements. (c, d) Same as (a, b), but for Ωar values. The uncertainties are shown in the top-left corner of subplots (a–d).

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f14

Figure A4. Averaged climatological distribution of TA (a–d) in oceanic sectors and three layers: subsurface layer (100 to 300 m), intermediate layer (1400 to 2000 m), and deep layer (2000 to 4000 m). The climatology is based on the Float Grid derived from Argo profile data spanning 2001–2024. The thin black lines show, from north to south, the Subtropical Front (STF), the Subantarctic Front (SAF), the Polar Front (PF), and the Southern Antarctic Circumpolar Current Front (SACCF) (Orsi et al., 1995). Note that the color scales differ among the individual maps.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f15

Figure A5. Averaged climatological distribution of NO3 (a–d), PO4 (e–h), and SiO4 (i–l) in oceanic sectors (m–o) and four layers: subsurface layer (100 to 300 m), intermediate layer (1400 to 2000 m), deep layer (2000 to 4000 m), and abyssal layer (4000 to 5600 m). The climatology is based on the Float Grid derived from Argo profile data spanning 2001–2024. The thin black lines show, from north to south, the Subtropical Front (STF), the Subantarctic Front (SAF), the Polar Front (PF), and the Southern Antarctic Circumpolar Current Front (SACCF) (Orsi et al., 1995). Note that the color scales differ among the individual maps.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f16

Figure A6. Locations of the observations measured by Argo floats (a) or ships (b) south of 65° S. Blue and red symbols denote measurements in the intermediate layer (2000–4000 dbar) and the abyssal layer (4000–5600 dbar), respectively.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f17

Figure A7. The uncertainty of the All-Data Grid in the intermediate layer (1400–2000 m) for (a) TA, (b) DIC, (c) Cant, (d) NO3, (e) PO4, (f) SiO4, (g) pH, and (h) Ωar.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f18

Figure A8. The uncertainty of the Float Grid in the intermediate layer (1400–2000 m) for (a) TA, (b) DIC, (c) Cant, (d) NO3, (e) PO4, (f) SiO4, (g) pH, and (h) Ωar.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f19

Figure A9. Averaged climatological distribution of Cant (a–d) and Ωar (e–h) in oceanic sectors and four layers: subsurface layer (100 to 300 m), intermediate layer (1400 to 2000 m), deep layer (2000 to 4000 m), and abyssal layer (4000 to 5600 m). The climatology is derived from GLODAPv2.2023, while Cant is scaled to the reference year 2013. The thin black lines show, from north to south, the Subtropical Front (STF), the Subantarctic Front (SAF), the Polar Front (PF), and the Southern Antarctic Circumpolar Current Front (SACCF) (Orsi et al., 1995). Note that the color scales differ among the individual maps.

Appendix B: Supplement to the methods

B1 Scaling method

To scale the anthropogenic CO2 concentration, we follow Gruber et al. (2019b) and estimate the scaling ratio α of the change between t1 (2001) and t2 (2024) relative to the change since the preindustrial t0 (1750):

(B1) $\alpha = \dfrac{\Delta_t\, p\mathrm{CO}_2^{\mathrm{atm}}(t_2 - t_1)}{\Delta_t\, p\mathrm{CO}_2^{\mathrm{atm}}(t_0 - t_1)} \cdot \dfrac{\gamma(t_0..t_1)}{\gamma(t_1..t_2)} \cdot \dfrac{\xi(t_1..t_2)}{\xi(t_0..t_1)}$

where α depends mainly on the ratio of the change in atmospheric CO2 (pCO2atm), but is modified by the changes in the Revelle factors (γ) and changes in the air-sea disequilibrium (ξ).

Using pCO2atm = 280 ppm for t0, 371 ppm for t1, and 423 ppm for t2 (Lan et al., 2025), the ratio of the changes in pCO2atm is 0.57, with a very small uncertainty of about ±0.01 allowing for rounding. Taking the Revelle factor for 1950 for γ(t0..t1) and that for 2013 for γ(t1..t2) yields a ratio γ(t0..t1)/γ(t1..t2) of 0.90 ± 0.02 for the Southern Ocean (south of 30° S). These Revelle factors were derived using products from Gregor and Gruber (2021). Because changes in the air-sea disequilibrium are relatively small in the subtropics and high latitudes (Matsumoto and Gruber, 2005), we use 0.94 ± 0.05 for the ratio ξ(t1..t2)/ξ(t0..t1), following Gruber et al. (2019b). Combining all ratios, α is set to 0.48 ± 0.04 (0.019986 yr−1). Assuming the ocean reaches a steady state over 2001–2024, Cant is normalized following the scaling equations of Carter et al. (2021b):

(B2) $C_{\mathrm{ant}}(t_{\mathrm{ref}}) = C_{\mathrm{ant}}(t)\, e^{0.019986\,(t_{\mathrm{ref}} - t)}$

where Cant(tref) is the normalized Cant concentration at the reference year tref (set as 2013, the median Argo observation year), and Cant(t) is the estimates for year t.
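
The arithmetic behind the scaling ratio and the normalization to the reference year can be checked with a short sketch. The numerical inputs are those quoted in this appendix; the function names are hypothetical, and the pCO2atm ratio is evaluated from the magnitudes of the two changes (423 − 371 ppm over 371 − 280 ppm), which is an interpretation of Eq. (B1).

```python
import math

# Values quoted in Appendix B1.
PCO2_T0, PCO2_T1, PCO2_T2 = 280.0, 371.0, 423.0   # ppm at t0, t1, t2
REVELLE_RATIO = 0.90        # gamma(t0..t1) / gamma(t1..t2)
DISEQ_RATIO = 0.94          # xi(t1..t2) / xi(t0..t1)
RATE_PER_YEAR = 0.019986    # exponent coefficient used in Eq. (B2)


def scaling_ratio(p0=PCO2_T0, p1=PCO2_T1, p2=PCO2_T2,
                  revelle=REVELLE_RATIO, diseq=DISEQ_RATIO):
    """Scaling ratio alpha as the product of the three ratios in Eq. (B1)."""
    pco2_ratio = (p2 - p1) / (p1 - p0)   # magnitudes of the two changes
    return pco2_ratio * revelle * diseq


def normalize_cant(cant, year, ref_year=2013, rate=RATE_PER_YEAR):
    """Scale a Cant estimate for `year` to the reference year, Eq. (B2)."""
    return cant * math.exp(rate * (ref_year - year))


alpha = scaling_ratio()
cant_2013 = normalize_cant(10.0, year=2001)
print(f"alpha = {alpha:.2f}, Cant(2013) = {cant_2013:.2f} umol/kg")
```

For example, a Cant estimate of 10 µmol kg−1 from a 2001 profile is scaled up by a factor of exp(0.019986 × 12) ≈ 1.27, whereas an estimate from a year after 2013 is scaled down.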

Figure B1 shows that the climatological distribution of Cant is insensitive to uncertainty in the scaling factor, with anomalous change remaining within ±1 µmol kg−1. A smaller scaling factor (α=0.44, Fig. B1a–d) produces slightly higher Cant values, whereas a larger factor (α=0.52, Fig. B1e–h) yields lower values, consistent with different assumed rates of oceanic CO2 accumulation.

B2 Differences between reconstruction pathways for float-based products

To quantify the differences between the two reconstruction pathways of Argo floats, BGC-Argo floats equipped with O2 sensors are used to compare the hydrography-only pathway with the full-parameter pathway. For each 1°×1° grid cell, the vertically averaged difference in Cant and Ωar between the two pathways is calculated (Fig. B2). Because the full-parameter pathway incorporates measured O2 and therefore provides better-constrained results, its reconstructions of Cant and Ωar are regarded as more accurate. The comparison reveals a clear overestimation of Cant and a localized underestimation of Ωar in the southwestern Atlantic under the hydrography-only pathway. The Ωar difference primarily occurs within the intermediate layer. Similar but weaker differences are observed near the Ross Sea (overestimated Cant, underestimated Ωar) and along the Pacific coast of South America (underestimated Cant, overestimated Ωar), although their spatial patterns are much less pronounced. The pronounced differences in the southwestern Atlantic likely arise from the limited number of shipboard training samples in this region (only about 1.3 % of the quality-controlled GLODAPv2.2023 measurements in the Southern Ocean contain both TA and DIC measurements). The performance of the machine-learning reconstructions and the gridded dataset is expected to improve as additional high-quality data become available.
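
The per-cell pathway comparison can be illustrated with a small binning sketch. It is a simplified illustration rather than the authors' procedure: the function name and sample layout are hypothetical, cell edges are assumed to fall on integer degrees, and all depth samples of all profiles within a cell are averaged with equal weight.

```python
import math
from collections import defaultdict


def cell_mean_difference(samples):
    """Mean hydrography-only minus full-parameter difference per 1x1 degree cell.

    samples : iterable of (lat, lon, hydro_only_value, full_param_value),
              one tuple per depth sample of a profile.
    Returns a dict mapping the cell's south-west corner (lat, lon) to the mean.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, hydro_only, full_param in samples:
        # Assumption: cells are bounded by integer degrees.
        cell = (math.floor(lat), math.floor(lon))
        sums[cell][0] += hydro_only - full_param
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Illustrative (made-up) Cant values in umol/kg for three samples.
example = [
    (-45.3, -50.2, 52.0, 50.0),
    (-45.9, -50.8, 54.0, 50.0),
    (-44.1, -50.5, 41.0, 40.0),
]
differences = cell_mean_difference(example)
```

Positive values in the result indicate cells where the hydrography-only pathway gives higher values than the full-parameter pathway, as described for Cant in the southwestern Atlantic.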

B3 Regional profile distribution in the southwestern Atlantic

From the climatological distributions of Cant and Ωar in the O2-Float Grid and Non-O2-Float Grid, a hotspot of high Cant concentrations and low Ωar appears in the southwestern Atlantic Ocean (SW Atlantic) near the SAF and PF. Temperature and salinity profiles from Core and BGC Argo floats are generally consistent, and the profile distributions of O2 measured by BGC Argo floats agree well with those reconstructed from Core Argo data (Fig. B3), confirming that the O2 reconstruction in this region is reliable. However, the profiles of TA, DIC, and consequently Cant and Ωar exhibit noticeable differences (Figs. B4 and B5), arising from the inherent limitations of the machine-learning model that lead to distinct reconstruction pathways (detailed in Appendix B2). In the SW Atlantic, Cant in the O2-Float Grid exhibits a more stratified vertical structure rather than the deep-penetrating signal captured by the Non-O2-Float Grid. For regional studies, particularly in the southwestern Atlantic, we recommend using the O2-Float Grid provided in this study.
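For readers who wish to reproduce a comparison of this kind, the zonal-mean sections shown in Figs. B3–B5 (averaged between 60 and 30° W, from 60 to 30° S) can be sketched as follows. This is a minimal illustration, assuming a gridded field stored as a `(level, lat, lon)` array with NaN for missing cells and cell-centre coordinates in degrees east and north; the function name and array layout are hypothetical and do not describe the file structure of the published products. Because 1° cells at the same latitude have equal area, an unweighted mean along longitude is used.

```python
import numpy as np

def zonal_mean_section(field, lon, lat,
                       lon_bounds=(-60.0, -30.0), lat_bounds=(-60.0, -30.0)):
    """Average a gridded field over a longitude band to get a (level, lat) section.

    field : (n_levels, n_lat, n_lon) array, NaN where no estimate exists
    lon   : (n_lon,) cell-centre longitudes, degrees east (negative = west)
    lat   : (n_lat,) cell-centre latitudes, degrees north (negative = south)
    Returns (lat_subset, section) with section of shape (n_levels, n_lat_subset).
    """
    field = np.asarray(field, float)
    lon = np.asarray(lon, float)
    lat = np.asarray(lat, float)

    # Select the cells inside the longitude band and latitude range.
    lon_sel = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    lat_sel = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    sub = field[:, lat_sel, :][:, :, lon_sel]

    # NaN-aware mean along longitude; rows with no data stay NaN.
    valid = np.isfinite(sub)
    n = valid.sum(axis=2)
    s = np.where(valid, sub, 0.0).sum(axis=2)
    section = np.full(n.shape, np.nan)
    section[n > 0] = s[n > 0] / n[n > 0]
    return lat[lat_sel], section
```

Applying the same function to the Cant field of each grid and subtracting the two sections would give a direct view of where the two float-based products diverge with depth and latitude.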

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f20

Figure B1. Sensitivity of anomalous change in Cant distribution to the value of the scaling factor α=0.44 (a–d) and α=0.52 (e–h) in four layers: subsurface layer (100 to 300 m), intermediate layer (1400 to 2000 m), deep layer (2000 to 4000 m), and abyssal layer (4000 to 5600 m).

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f21

Figure B2. Differences between reconstruction pathways for (a) Cant (µmol kg−1) and (b) Ωar from BGC-Argo floats with O2 sensors. For each 1°×1° bin, the mean vertical difference between the hydrography-only and full-parameter pathways is calculated. Values falling within the range derived from float–shipboard matchups are shown in grey, whereas brown and green indicate positive and negative differences, respectively.

https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f22

Figure B3. Zonal-mean sections (averaged between 60 and 30° W) of temperature, salinity, and oxygen from 60 to 30° S in the southwestern Atlantic Ocean. Panels (a–c) show profiles from the O2-Float Grid, and panels (d–f) from the Non-O2-Float Grid. Note that oxygen is measured in panel (c) and reconstructed in panel (f).


https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f23

Figure B4. Zonal-mean sections (averaged between 60 and 30° W) of TA (a), DIC (b), Cant (c), and Ωar (d) from 60 to 30° S in the southwestern Atlantic Ocean. The profiles are from the O2-Float Grid.


https://essd.copernicus.org/articles/17/7169/2025/essd-17-7169-2025-f24

Figure B5. Zonal-mean sections (averaged between 60 and 30° W) of TA (a), DIC (b), Cant (c), and Ωar (d) from 60 to 30° S in the southwestern Atlantic Ocean. The profiles are from the Non-O2-Float Grid.


Author contributions

Conceptualization: DQ and YW; Data curation: WZ; Methodology: WZ, YW, and CL; Resources: MX, DQ, and WG; Writing – original draft preparation: WZ; Writing – review and editing: all authors.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Acknowledgements

This work was supported by the Ocean Negative Carbon Emissions (ONCE) Program. We thank the many contributors to the GLODAP and Argo Program datasets. Di Qi was supported by the National Youth Talent Program of China and the Special Professorship of the National Major Talent Engineering of China. The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of Wuhan University.

Financial support

This research has been supported by the Independent Research Projects of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (grant no. SML2021SP306), the National Natural Science Foundation of China (grant nos. 42576268 and 42171464), the Natural Science Foundation of Fujian Province (grant no. 2025J09045), the 2024 International Cooperation Seed Funding Project for China's Ocean Decade Actions (grant no. GHZZ3702840002024020000028), the Fundamental Research Funds for the Central Universities (grant nos. ZNJC202415 and 413000028), the National Key R&D Program of China (grant no. 2024YFC3015600), and the Science and Technology Program of Hubei Province (grant no. 2025BEB017).

Review statement

This paper was edited by Sebastiaan van de Velde and reviewed by L.-Q. Jiang and one anonymous referee.

References

Asselot, R., Carracedo, L. I., Thierry, V., Mercier, H., Bajon, R., and Pérez, F. F.: Anthropogenic carbon pathways towards the North Atlantic interior revealed by Argo-O2, neural networks and back-calculations, Nature Communications, 15, 1630, https://doi.org/10.1038/s41467-024-46074-5, 2024. 

Bakker, D. C. E., Pfeil, B., Landa, C. S., Metzl, N., O'Brien, K. M., Olsen, A., Smith, K., Cosca, C., Harasawa, S., Jones, S. D., Nakaoka, S., Nojiri, Y., Schuster, U., Steinhoff, T., Sweeney, C., Takahashi, T., Tilbrook, B., Wada, C., Wanninkhof, R., Alin, S. R., Balestrini, C. F., Barbero, L., Bates, N. R., Bianchi, A. A., Bonou, F., Boutin, J., Bozec, Y., Burger, E. F., Cai, W.-J., Castle, R. D., Chen, L., Chierici, M., Currie, K., Evans, W., Featherstone, C., Feely, R. A., Fransson, A., Goyet, C., Greenwood, N., Gregor, L., Hankin, S., Hardman-Mountford, N. J., Harlay, J., Hauck, J., Hoppema, M., Humphreys, M. P., Hunt, C. W., Huss, B., Ibánhez, J. S. P., Johannessen, T., Keeling, R., Kitidis, V., Körtzinger, A., Kozyr, A., Krasakopoulou, E., Kuwata, A., Landschützer, P., Lauvset, S. K., Lefèvre, N., Lo Monaco, C., Manke, A., Mathis, J. T., Merlivat, L., Millero, F. J., Monteiro, P. M. S., Munro, D. R., Murata, A., Newberger, T., Omar, A. M., Ono, T., Paterson, K., Pearce, D., Pierrot, D., Robbins, L. L., Saito, S., Salisbury, J., Schlitzer, R., Schneider, B., Schweitzer, R., Sieger, R., Skjelvan, I., Sullivan, K. F., Sutherland, S. C., Sutton, A. J., Tadokoro, K., Telszewski, M., Tuma, M., van Heuven, S. M. A. C., Vandemark, D., Ward, B., Watson, A. J., and Xu, S.: A multi-decade record of high-quality fCO2 data in version 3 of the Surface Ocean CO2 Atlas (SOCAT), Earth Syst. Sci. Data, 8, 383–413, https://doi.org/10.5194/essd-8-383-2016, 2016. 

Barth, A., Beckers, J.-M., Troupin, C., Alvera-Azcárate, A., and Vandenbulcke, L.: divand-1.0: n-dimensional variational data analysis for ocean observations, Geosci. Model Dev., 7, 225–241, https://doi.org/10.5194/gmd-7-225-2014, 2014. 

Bednaršek, N., Tarling, G. A., Bakker, D. C. E., Fielding, S., Jones, E. M., Venables, H. J., Ward, P., Kuzirian, A., Lézé, B., Feely, R. A., and Murphy, E. J.: Extensive dissolution of live pteropods in the Southern Ocean, Nature Geoscience, 5, 881–885, https://doi.org/10.1038/ngeo1635, 2012. 

Bittig, H. C., Steinhoff, T., Claustre, H., Fiedler, B., Williams, N. L., Sauzède, R., Körtzinger, A., and Gattuso, J.-P.: An Alternative to Static Climatologies: Robust Estimation of Open Ocean CO2 Variables and Nutrient Concentrations From T, S, and O2 Data Using Bayesian Neural Networks, Frontiers in Marine Science, 5, https://doi.org/10.3389/fmars.2018.00328, 2018. 

Bopp, L., Lévy, M., Resplandy, L., and Sallée, J. B.: Pathways of anthropogenic carbon subduction in the global ocean, Geophysical Research Letters, 42, 6416–6423, https://doi.org/10.1002/2015GL065073, 2015. 

Bushinsky, S. M., Nachod, Z., Fassbender, A. J., Tamsitt, V., Takeshita, Y., and Williams, N.: Offset Between Profiling Float and Shipboard Oxygen Observations at Depth Imparts Bias on Float pH and Derived pCO2, Global Biogeochemical Cycles, 39, e2024GB008185, https://doi.org/10.1029/2024GB008185, 2025. 

Carter, B. R., Williams, N. L., Gray, A. R., and Feely, R. A.: Locally interpolated alkalinity regression for global alkalinity estimation, Limnology and Oceanography: Methods, 14, 268–277, https://doi.org/10.1002/lom3.10087, 2016. 

Carter, B. R., Feely, R. A., Williams, N. L., Dickson, A. G., Fong, M. B., and Takeshita, Y.: Updated methods for global locally interpolated estimation of alkalinity, pH, and nitrate, Limnology and Oceanography: Methods, 16, 119–131, https://doi.org/10.1002/lom3.10232, 2018.

Carter, B. R., Bittig, H. C., Fassbender, A. J., Sharp, J. D., Takeshita, Y., Xu, Y.-Y., Álvarez, M., Wanninkhof, R., Feely, R. A., and Barbero, L.: New and updated global empirical seawater property estimation routines, Limnology and Oceanography: Methods, 19, 785–809, https://doi.org/10.1002/lom3.10461, 2021a. 

Carter, B. R., Feely, R. A., Lauvset, S. K., Olsen, A., DeVries, T., and Sonnerup, R.: Preformed Properties for Marine Organic Matter and Carbonate Mineral Cycling Quantification, Global Biogeochemical Cycles, 35, e2020GB006623, https://doi.org/10.1029/2020GB006623, 2021b. 

Carter, B. R., Sharp, J. D., García-Ibáñez, M. I., Woosley, R. J., Fong, M. B., Álvarez, M., Barbero, L., Clegg, S. L., Easley, R., Fassbender, A. J., Li, X., Schockman, K. M., and Wang, Z. A.: Random and systematic uncertainty in ship-based seawater carbonate chemistry observations, Limnology and Oceanography, 69, 2473–2488, https://doi.org/10.1002/lno.12674, 2024. 

Carter, B. R., Schwinger, J., Sonnerup, R., Fassbender, A. J., Sharp, J. D., Dias, L. M., and Sandborn, D. E.: Tracer-based Rapid Anthropogenic Carbon Estimation (TRACE), Earth Syst. Sci. Data, 17, 3073–3088, https://doi.org/10.5194/essd-17-3073-2025, 2025. 

Dickson, A. G., Wesolowski, D. J., Palmer, D. A., and Mesmer, R. E.: Dissociation constant of bisulfate ion in aqueous sodium chloride solutions to 250 °C, The Journal of Physical Chemistry, 94, 7978–7985, https://doi.org/10.1021/j100383a042, 1990.

Doney, S. C., Fabry, V. J., Feely, R. A., and Kleypas, J. A.: Ocean Acidification: The Other CO2 Problem, Annual Review of Marine Science, 1, 169–192, https://doi.org/10.1146/annurev.marine.010908.163834, 2009. 

Doney, S. C., Busch, D. S., Cooley, S. R., and Kroeker, K. J.: The Impacts of Ocean Acidification on Marine Ecosystems and Reliant Human Communities, Annual Review of Environment and Resources, 45, 83–112, https://doi.org/10.1146/annurev-environ-012320-083019, 2020. 

Friedlingstein, P., O'Sullivan, M., Jones, M. W., Andrew, R. M., Hauck, J., Landschützer, P., Le Quéré, C., Li, H., Luijkx, I. T., Olsen, A., Peters, G. P., Peters, W., Pongratz, J., Schwingshackl, C., Sitch, S., Canadell, J. G., Ciais, P., Jackson, R. B., Alin, S. R., Arneth, A., Arora, V., Bates, N. R., Becker, M., Bellouin, N., Berghoff, C. F., Bittig, H. C., Bopp, L., Cadule, P., Campbell, K., Chamberlain, M. A., Chandra, N., Chevallier, F., Chini, L. P., Colligan, T., Decayeux, J., Djeutchouang, L. M., Dou, X., Duran Rojas, C., Enyo, K., Evans, W., Fay, A. R., Feely, R. A., Ford, D. J., Foster, A., Gasser, T., Gehlen, M., Gkritzalis, T., Grassi, G., Gregor, L., Gruber, N., Gürses, Ö., Harris, I., Hefner, M., Heinke, J., Hurtt, G. C., Iida, Y., Ilyina, T., Jacobson, A. R., Jain, A. K., Jarníková, T., Jersild, A., Jiang, F., Jin, Z., Kato, E., Keeling, R. F., Klein Goldewijk, K., Knauer, J., Korsbakken, J. I., Lan, X., Lauvset, S. K., Lefèvre, N., Liu, Z., Liu, J., Ma, L., Maksyutov, S., Marland, G., Mayot, N., McGuire, P. C., Metzl, N., Monacci, N. M., Morgan, E. J., Nakaoka, S.-I., Neill, C., Niwa, Y., Nützel, T., Olivier, L., Ono, T., Palmer, P. I., Pierrot, D., Qin, Z., Resplandy, L., Roobaert, A., Rosan, T. M., Rödenbeck, C., Schwinger, J., Smallman, T. L., Smith, S. M., Sospedra-Alfonso, R., Steinhoff, T., Sun, Q., Sutton, A. J., Séférian, R., Takao, S., Tatebe, H., Tian, H., Tilbrook, B., Torres, O., Tourigny, E., Tsujino, H., Tubiello, F., van der Werf, G., Wanninkhof, R., Wang, X., Yang, D., Yang, X., Yu, Z., Yuan, W., Yue, X., Zaehle, S., Zeng, N., and Zeng, J.: Global Carbon Budget 2024, Earth Syst. Sci. Data, 17, 965–1039, https://doi.org/10.5194/essd-17-965-2025, 2025. 

Friis, K., Körtzinger, A., Pätsch, J., and Wallace, D. W. R.: On the temporal increase of anthropogenic CO2 in the subpolar North Atlantic, Deep Sea Research Part I: Oceanographic Research Papers, 52, 681–698, https://doi.org/10.1016/j.dsr.2004.11.017, 2005. 

Gregor, L. and Gruber, N.: OceanSODA-ETHZ: a global gridded data set of the surface ocean carbonate system for seasonal to decadal studies of ocean acidification, Earth Syst. Sci. Data, 13, 777–808, https://doi.org/10.5194/essd-13-777-2021, 2021. 

Gruber, N.: Anthropogenic CO2 in the Atlantic Ocean, Global Biogeochemical Cycles, 12, 165–191, https://doi.org/10.1029/97GB03658, 1998. 

Gruber, N., Sarmiento, J. L., and Stocker, T. F.: An improved method for detecting anthropogenic CO2 in the oceans, Global Biogeochemical Cycles, 10, 809–837, https://doi.org/10.1029/96GB01608, 1996. 

Gruber, N., Clement, D., Carter, B. R., Feely, R. A., van Heuven, S., Hoppema, M., Ishii, M., Key, R. M., Kozyr, A., Lauvset, S. K., Lo Monaco, C., Mathis, J. T., Murata, A., Olsen, A., Perez, F. F., Sabine, C. L., Tanhua, T., and Wanninkhof, R.: The oceanic sink for anthropogenic CO2 from 1994 to 2007, Science, 363, 1193–1199, https://doi.org/10.1126/science.aau5153, 2019a. 

Gruber, N., Landschützer, P., and Lovenduski, N. S.: The Variable Southern Ocean Carbon Sink, Annual Review of Marine Science, 11, 159–186, https://doi.org/10.1146/annurev-marine-121916-063407, 2019b. 

Hauck, J., Gregor, L., Nissen, C., Patara, L., Hague, M., Mongwe, P., Bushinsky, S., Doney, S. C., Gruber, N., Le Quéré, C., Manizza, M., Mazloff, M., Monteiro, P. M. S., and Terhaar, J.: The Southern Ocean Carbon Cycle 1985–2018: Mean, Seasonal Cycle, Trends, and Storage, Global Biogeochemical Cycles, 37, e2023GB007848, https://doi.org/10.1029/2023GB007848, 2023. 

Johnson, G. C. and Purkey, S. G.: Refined Estimates of Global Ocean Deep and Abyssal Decadal Warming Trends, Geophysical Research Letters, 51, https://doi.org/10.1029/2024gl111229, 2024. 

Kroeker, K. J., Kordas, R. L., Crim, R., Hendriks, I. E., Ramajo, L., Singh, G. S., Duarte, C. M., and Gattuso, J.-P.: Impacts of ocean acidification on marine organisms: quantifying sensitivities and interaction with warming, Global Change Biology, 19, 1884–1896, https://doi.org/10.1111/gcb.12179, 2013. 

Lan, X., Tans, P. P., and Thoning, K. W.: Trends in globally averaged CO2 determined from NOAA Global Monitoring Laboratory measurements, NOAA Global Monitoring Laboratory (GML) [data set], https://doi.org/10.15138/9N0H-ZH07, 2025. 

Landschützer, P., Gruber, N., Haumann, F. A., Rödenbeck, C., Bakker, D. C. E., van Heuven, S., Hoppema, M., Metzl, N., Sweeney, C., Takahashi, T., Tilbrook, B., and Wanninkhof, R.: The reinvigoration of the Southern Ocean carbon sink, Science, 349, 1221–1224, https://doi.org/10.1126/science.aab2620, 2015. 

Lauvset, S. K. and Tanhua, T.: A toolbox for secondary quality control on ocean chemistry and hydrographic data, Limnology and Oceanography: Methods, 13, 601–608, https://doi.org/10.1002/lom3.10050, 2015.

Lauvset, S. K., Key, R. M., Olsen, A., van Heuven, S., Velo, A., Lin, X., Schirnick, C., Kozyr, A., Tanhua, T., Hoppema, M., Jutterström, S., Steinfeldt, R., Jeansson, E., Ishii, M., Perez, F. F., Suzuki, T., and Watelet, S.: A new global interior ocean mapped climatology: the 1°×1° GLODAP version 2, Earth Syst. Sci. Data, 8, 325–340, https://doi.org/10.5194/essd-8-325-2016, 2016. 

Lauvset, S. K., Lange, N., Tanhua, T., Bittig, H. C., Olsen, A., Kozyr, A., Álvarez, M., Azetsu-Scott, K., Brown, P. J., Carter, B. R., Cotrim da Cunha, L., Hoppema, M., Humphreys, M. P., Ishii, M., Jeansson, E., Murata, A., Müller, J. D., Pérez, F. F., Schirnick, C., Steinfeldt, R., Suzuki, T., Ulfsbo, A., Velo, A., Woosley, R. J., and Key, R. M.: The annual update GLODAPv2.2023: the global interior ocean biogeochemical data product, Earth Syst. Sci. Data, 16, 2047–2072, https://doi.org/10.5194/essd-16-2047-2024, 2024. 

Lee, K., Kim, T.-W., Byrne, R. H., Millero, F. J., Feely, R. A., and Liu, Y.-M.: The universal ratio of boron to chlorinity for the North Pacific and North Atlantic oceans, Geochimica et Cosmochimica Acta, 74, 1801–1811, https://doi.org/10.1016/j.gca.2009.12.027, 2010. 

Lo Monaco, C., Goyet, C., Metzl, N., Poisson, A., and Touratier, F.: Distribution and inventory of anthropogenic CO2 in the Southern Ocean: Comparison of three data-based methods, Journal of Geophysical Research: Oceans, 110, https://doi.org/10.1029/2004JC002571, 2005. 

Lueker, T. J., Dickson, A. G., and Keeling, C. D.: Ocean pCO2 calculated from dissolved inorganic carbon, alkalinity, and equations for K1 and K2: validation based on laboratory measurements of CO2 in gas and seawater at equilibrium, Marine Chemistry, 70, 105–119, https://doi.org/10.1016/S0304-4203(00)00022-0, 2000. 

Mahieu, L., Lo Monaco, C., Metzl, N., Fin, J., and Mignon, C.: Variability and stability of anthropogenic CO2 in Antarctic Bottom Water observed in the Indian sector of the Southern Ocean, 1978–2018, Ocean Sci., 16, 1559–1576, https://doi.org/10.5194/os-16-1559-2020, 2020. 

Marshall, J. and Speer, K.: Closure of the meridional overturning circulation through Southern Ocean upwelling, Nature Geoscience, 5, 171–180, https://doi.org/10.1038/ngeo1391, 2012. 

Matsumoto, K. and Gruber, N.: How accurate is the estimation of anthropogenic carbon in the ocean? An evaluation of the ΔC* method, Global Biogeochemical Cycles, 19, https://doi.org/10.1029/2004GB002397, 2005. 

Maurer, T. L., Plant, J. N., and Johnson, K. S.: Delayed-Mode Quality Control of Oxygen, Nitrate, and pH Data on SOCCOM Biogeochemical Profiling Floats, Frontiers in Marine Science, 8, https://doi.org/10.3389/fmars.2021.683207, 2021. 

Metzl, N., Lo Monaco, C., Leseurre, C., Ridame, C., Reverdin, G., Chau, T. T. T., Chevallier, F., and Gehlen, M.: Anthropogenic CO2, air–sea CO2 fluxes, and acidification in the Southern Ocean: results from a time-series analysis at station OISO-KERFIX (51° S–68° E), Ocean Sci., 20, 725–758, https://doi.org/10.5194/os-20-725-2024, 2024. 

Morrison, A. K., Hogg, A. M., England, M. H., and Spence, P.: Warm Circumpolar Deep Water transport toward Antarctica driven by local dense water export in canyons, Science Advances, 6, eaav2516, https://doi.org/10.1126/sciadv.aav2516, 2020. 

Morrison, A. K., Waugh, D. W., Hogg, A. M., Jones, D. C., and Abernathey, R. P.: Ventilation of the Southern Ocean Pycnocline, Annual Review of Marine Science, 14, 405–430, https://doi.org/10.1146/annurev-marine-010419-011012, 2022. 

Olsen, A., Key, R. M., van Heuven, S., Lauvset, S. K., Velo, A., Lin, X., Schirnick, C., Kozyr, A., Tanhua, T., Hoppema, M., Jutterström, S., Steinfeldt, R., Jeansson, E., Ishii, M., Pérez, F. F., and Suzuki, T.: The Global Ocean Data Analysis Project version 2 (GLODAPv2) – an internally consistent data product for the world ocean, Earth Syst. Sci. Data, 8, 297–323, https://doi.org/10.5194/essd-8-297-2016, 2016. 

Orr, J. C., Fabry, V. J., Aumont, O., Bopp, L., Doney, S. C., Feely, R. A., Gnanadesikan, A., Gruber, N., Ishida, A., Joos, F., Key, R. M., Lindsay, K., Maier-Reimer, E., Matear, R., Monfray, P., Mouchet, A., Najjar, R. G., Plattner, G.-K., Rodgers, K. B., Sabine, C. L., Sarmiento, J. L., Schlitzer, R., Slater, R. D., Totterdell, I. J., Weirig, M.-F., Yamanaka, Y., and Yool, A.: Anthropogenic ocean acidification over the twenty-first century and its impact on calcifying organisms, Nature, 437, 681–686, https://doi.org/10.1038/nature04095, 2005. 

Orr, J. C., Epitalon, J.-M., Dickson, A. G., and Gattuso, J.-P.: Routine uncertainty propagation for the marine carbon dioxide system, Marine Chemistry, 207, 84–107, https://doi.org/10.1016/j.marchem.2018.10.006, 2018. 

Orsi, A. H., Whitworth, T., and Nowlin, W. D.: On the meridional extent and fronts of the Antarctic Circumpolar Current, Deep Sea Research Part I: Oceanographic Research Papers, 42, 641–673, https://doi.org/10.1016/0967-0637(95)00021-W, 1995. 

Pardo, P. C., Pérez, F. F., Khatiwala, S., and Ríos, A. F.: Anthropogenic CO2 estimates in the Southern Ocean: Storage partitioning in the different water masses, Progress in Oceanography, 120, 230–242, https://doi.org/10.1016/j.pocean.2013.09.005, 2014. 

Pardo, P. C., Tilbrook, B., Langlais, C., Trull, T. W., and Rintoul, S. R.: Carbon uptake and biogeochemical change in the Southern Ocean, south of Tasmania, Biogeosciences, 14, 5217–5237, https://doi.org/10.5194/bg-14-5217-2017, 2017. 

Pellichero, V., Sallée, J.-B., Chapman, C. C., and Downes, S. M.: The southern ocean meridional overturning in the sea-ice sector is driven by freshwater fluxes, Nature Communications, 9, 1789, https://doi.org/10.1038/s41467-018-04101-2, 2018. 

Perez, F. F. and Fraga, F.: Association constant of fluoride and hydrogen ions in seawater, Marine Chemistry, 21, 161–168, https://doi.org/10.1016/0304-4203(87)90036-3, 1987. 

Petrou, K., Baker, K. G., Nielsen, D. A., Hancock, A. M., Schulz, K. G., and Davidson, A. T.: Acidification diminishes diatom silica production in the Southern Ocean, Nature Climate Change, 9, 781–786, https://doi.org/10.1038/s41558-019-0557-y, 2019. 

Qi, D., Ouyang, Z., Chen, L., Wu, Y., Lei, R., Chen, B., Feely, R. A., Anderson, L. G., Zhong, W., Lin, H., Polukhin, A., Zhang, Y., Zhang, Y., Bi, H., Lin, X., Luo, Y., Zhuang, Y., He, J., Chen, J., and Cai, W.-J.: Climate change drives rapid decadal acidification in the Arctic Ocean from 1994 to 2020, Science, 377, 1544–1550, https://doi.org/10.1126/science.abo0383, 2022. 

Ríos, A. F., Velo, A., Pardo, P. C., Hoppema, M., and Pérez, F. F.: An update of anthropogenic CO2 storage rates in the western South Atlantic basin and the role of Antarctic Bottom Water, Journal of Marine Systems, 94, 197–203, https://doi.org/10.1016/j.jmarsys.2011.11.023, 2012. 

Riser, S. C., Freeland, H. J., Roemmich, D., Wijffels, S., Troisi, A., Belbéoch, M., Gilbert, D., Xu, J., Pouliquen, S., Thresher, A., Le Traon, P.-Y., Maze, G., Klein, B., Ravichandran, M., Grant, F., Poulain, P.-M., Suga, T., Lim, B., Sterl, A., Sutton, P., Mork, K.-A., Vélez-Belchí, P. J., Ansorge, I., King, B., Turton, J., Baringer, M., and Jayne, S. R.: Fifteen years of ocean observations with the global Argo array, Nature Climate Change, 6, 145–153, https://doi.org/10.1038/nclimate2872, 2016. 

Sabine, C. L., Feely, R. A., Gruber, N., Key, R. M., Lee, K., Bullister, J. L., Wanninkhof, R., Wong, C. S., Wallace, D. W. R., Tilbrook, B., Millero, F. J., Peng, T.-H., Kozyr, A., Ono, T., and Rios, A. F.: The Oceanic Sink for Anthropogenic CO2, Science, 305, 367–371, https://doi.org/10.1126/science.1097403, 2004. 

Sauzède, R., Bittig, H. C., Claustre, H., Pasqueron de Fommervault, O., Gattuso, J.-P., Legendre, L., and Johnson, K. S.: Estimates of Water-Column Nutrient Concentrations and Carbonate System Parameters in the Global Ocean: A Novel Approach Based on Neural Networks, Frontiers in Marine Science, 4, https://doi.org/10.3389/fmars.2017.00128, 2017. 

Sharp, J. D., Pierrot, D., Humphreys, M. P., Epitalon, J.-M., Orr, J. C., Lewis, E. R., and Wallace, D. W. R.: CO2SYSv3 for MATLAB (Version v3.2.1), Zenodo [code], https://doi.org/10.5281/zenodo.3950562, 2023. 

Silvano, A., Purkey, S., Gordon, A. L., Castagno, P., Stewart, A. L., Rintoul, S. R., Foppert, A., Gunn, K. L., Herraiz-Borreguero, L., Aoki, S., Nakayama, Y., Naveira Garabato, A. C., Spingys, C., Akhoudas, C. H., Sallée, J.-B., de Lavergne, C., Abrahamsen, E. P., Meijers, A. J. S., Meredith, M. P., Zhou, S., Tamura, T., Yamazaki, K., Ohshima, K. I., Falco, P., Budillon, G., Hattermann, T., Janout, M. A., Llanillo, P., Bowen, M. M., Darelius, E., Østerhus, S., Nicholls, K. W., Stevens, C., Fernandez, D., Cimoli, L., Jacobs, S. S., Morrison, A. K., Hogg, A. M., Haumann, F. A., Mashayek, A., Wang, Z., Kerr, R., Williams, G. D., and Lee, W. S.: Observing Antarctic Bottom Water in the Southern Ocean, Frontiers in Marine Science, 10, https://doi.org/10.3389/fmars.2023.1221701, 2023. 

Talley, L. D.: Closure of the Global Overturning Circulation Through the Indian, Pacific, and Southern Oceans: Schematics and Transports, Oceanography, 26, 80–97, https://doi.org/10.5670/oceanog.2013.07, 2013. 

Tanhua, T., Körtzinger, A., Friis, K., Waugh, D. W., and Wallace, D. W. R.: An estimate of anthropogenic CO2 inventory from decadal changes in oceanic carbon content, Proceedings of the National Academy of Sciences, 104, 3037–3042, https://doi.org/10.1073/pnas.0606574104, 2007. 

Touratier, F. and Goyet, C.: Definition, properties, and Atlantic Ocean distribution of the new tracer TrOCA, Journal of Marine Systems, 46, 169–179, https://doi.org/10.1016/j.jmarsys.2003.11.016, 2004. 

Touratier, F., Azouzi, L., and Goyet, C.: CFC-11, Δ14C and 3H tracers as a means to assess anthropogenic CO2 concentrations in the ocean, Tellus B, 59, 318–325, https://doi.org/10.1111/j.1600-0889.2006.00247.x, 2007. 

Vázquez-Rodríguez, M., Touratier, F., Lo Monaco, C., Waugh, D. W., Padin, X. A., Bellerby, R. G. J., Goyet, C., Metzl, N., Ríos, A. F., and Pérez, F. F.: Anthropogenic carbon distributions in the Atlantic Ocean: data-based estimates from the Arctic to the Antarctic, Biogeosciences, 6, 439–451, https://doi.org/10.5194/bg-6-439-2009, 2009. 

Williams, N. L., Juranek, L. W., Feely, R. A., Johnson, K. S., Sarmiento, J. L., Talley, L. D., Dickson, A. G., Gray, A. R., Wanninkhof, R., Russell, J. L., Riser, S. C., and Takeshita, Y.: Calculating surface ocean pCO2 from biogeochemical Argo floats equipped with pH: An uncertainty analysis, Global Biogeochemical Cycles, 31, 591–604, https://doi.org/10.1002/2016GB005541, 2017. 

Wu, Y. and Qi, D.: The controversial Southern Ocean air-sea CO2 flux in the era of autonomous ocean observations, Science Bulletin, 68, https://doi.org/10.1016/j.scib.2023.08.059, 2023. 

Wu, Y., Bakker, D. C. E., Achterberg, E. P., Silva, A. N., Pickup, D. D., Li, X., Hartman, S., Stappard, D., Qi, D., and Tyrrell, T.: Integrated analysis of carbon dioxide and oxygen concentrations as a quality control of ocean float data, Communications Earth & Environment, 3, 92, https://doi.org/10.1038/s43247-022-00421-w, 2022.  

Xue, L., Cai, W.-J., Takahashi, T., Gao, L., Wanninkhof, R., Wei, M., Li, K., Feng, L., and Yu, W.: Climatic modulation of surface acidification rates through summertime wind forcing in the Southern Ocean, Nature Communications, 9, 3240, https://doi.org/10.1038/s41467-018-05443-7, 2018. 

Zemskova, V. E., He, T.-L., Wan, Z., and Grisouard, N.: A deep-learning estimate of the decadal trends in the Southern Ocean carbon storage, Nature Communications, 13, 4056, https://doi.org/10.1038/s41467-022-31560-5, 2022. 

Zhang, S., Wu, Y., Cai, W.-J., Cai, W., Feely, R. A., Wang, Z., Tanhua, T., Wang, Y., Liu, C., Li, X., Yang, Q., Ding, M., Xu, Z., Kerr, R., Luo, Y., Cheng, X., Chen, L., and Qi, D.: Transport of Anthropogenic Carbon From the Antarctic Shelf to Deep Southern Ocean Triggers Acidification, Global Biogeochemical Cycles, 37, e2023GB007921, https://doi.org/10.1029/2023GB007921, 2023. 

Zhong, W., Ma, X., Wu, Y., Li, C., Shi, T., Gong, W., and Qi, D.: Southern Ocean CO2 Machine Learning products (SOCOML), V2, Mendeley Data [data set], https://doi.org/10.17632/xzr59ngmpz.2, 2025a. 

Zhong, W., Wu, Y., and Qi, D.: Southern Ocean CO2 Machine Learning data products (SOCOML) from 1972-11-20 to 2025-02-23, NOAA National Centers for Environmental Information [data set], https://doi.org/10.25921/8c29-rv75, 2025b. 

Short summary
This work addresses a critical observational gap in the Southern Ocean, one of the most important regions for carbon uptake, by integrating comprehensive Argo float observations with historical ship-based measurements. Our findings demonstrate the feasibility of using machine learning models to integrate observations, and support in-depth analyses of carbon transport and storage mechanisms. This can foster broader utilization of Argo float data in ocean carbon research.