<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="data-paper">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">ESSD</journal-id><journal-title-group>
    <journal-title>Earth System Science Data</journal-title>
    <abbrev-journal-title abbrev-type="publisher">ESSD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Earth Syst. Sci. Data</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1866-3516</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/essd-18-3125-2026</article-id><title-group><article-title>GEOXYGEN: a global long-term dissolved oxygen dataset based on biogeochemistry-aware machine learning framework and multi-source observations</article-title><alt-title>GEOXYGEN</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Wang</surname><given-names>Zhenguo</given-names></name>
          
        <ext-link>https://orcid.org/0009-0009-8663-1279</ext-link></contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff2">
          <name><surname>Fu</surname><given-names>Weiwei</given-names></name>
          <email>wwfu@fudan.edu.cn</email>
        <ext-link>https://orcid.org/0000-0003-4965-0832</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3 aff4">
          <name><surname>Xue</surname><given-names>Cunjin</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Wang</surname><given-names>Guihua</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Department of Atmospheric and Oceanic Sciences, Fudan University, Shanghai, 200438, China</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Institute of Eco-Chongming (IEC), 1050 Baozhen, Lühua Town, Chongming District, Shanghai 202151, China</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>Key Laboratory of Digital Earth Science, Aerospace Information Research Institute,  Chinese Academy of Sciences, Beijing 100094, China</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Weiwei Fu (wwfu@fudan.edu.cn)</corresp></author-notes><pub-date><day>12</day><month>May</month><year>2026</year></pub-date>
      
      <volume>18</volume>
      <issue>5</issue>
      <fpage>3125</fpage><lpage>3146</lpage>
      <history>
        <date date-type="received"><day>18</day><month>November</month><year>2025</year></date>
           <date date-type="rev-request"><day>1</day><month>December</month><year>2025</year></date>
           <date date-type="rev-recd"><day>24</day><month>April</month><year>2026</year></date>
           <date date-type="accepted"><day>25</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Zhenguo Wang et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026.html">This article is available from https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026.html</self-uri><self-uri xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026.pdf">The full text article is available as a PDF file from https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e130">Dissolved oxygen (DO) serves as an essential indicator of marine ecosystem health. However, sparse and uneven observations have limited our ability to characterize its full spatiotemporal variability, underscoring the continued need for long-term, high-resolution, and physically consistent global DO datasets. Here, we present GEOXYGEN, a global dataset of monthly DO fields at 0.5° <inline-formula><mml:math id="M1" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5° resolution spanning 1960–2024 and depths from the surface to 5500 m (Wang et al., 2026a, <ext-link xlink:href="https://doi.org/10.5281/zenodo.19703198" ext-link-type="DOI">10.5281/zenodo.19703198</ext-link>; Wang et al., 2026b, <ext-link xlink:href="https://doi.org/10.12157/IOCAS.20260223.002" ext-link-type="DOI">10.12157/IOCAS.20260223.002</ext-link>). GEOXYGEN is generated with a hierarchical modeling framework that accounts for regional and vertical heterogeneity. By combining physical and biogeochemical predictors with an adaptive feature selection strategy, GEOXYGEN demonstrated high predictive accuracy (<inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M3" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.9) in independent temporal tests. The reconstructed spatial patterns align closely with the World Ocean Atlas 2023 climatology, and in subsurface waters, GEOXYGEN demonstrates superior generalization relative to existing data-driven products. Uncertainty analysis shows that the uncertainty in nearshore and shelf regions is approximately twice that in the open ocean, while long-term deoxygenation trends remain stable even without satellite-era sea-surface predictors. 
Additionally, a ship-only analysis of the Southern Ocean indicates that early reconstructions are robust, unaffected by the inclusion of Argo observations. GEOXYGEN offers a consistent, physically informed baseline for investigating global and regional DO variability, providing an important tool for evaluating the representation of DO in climate and Earth system models.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Natural Science Foundation of Shanghai Municipality</funding-source>
<award-id>24ZR1404500</award-id>
</award-group>
<award-group id="gs2">
<funding-source>Science and Technology Commission of Shanghai Municipality</funding-source>
<award-id>25DZ3102200</award-id>
</award-group>
<award-group id="gs3">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>42476011</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e173">Ocean dissolved oxygen (DO) concentration serves as an essential indicator of marine ecosystem health and biogeochemical status (Robinson, 2019; Grégoire et al., 2023). Beyond its ecological significance, DO plays a critical role in modulating climate-relevant biogeochemical feedbacks in the global carbon cycle (Grégoire et al., 2021; Oschlies, 2021; Yamaguchi et al., 2024). Observations over recent decades reveal marked spatiotemporal variability in DO, accompanied by a clear trend toward deoxygenation (Ito et al., 2017), particularly within tropical oxygen minimum zones (OMZs) and in subsurface waters at mid to high latitudes (Bopp et al., 2013; Li et al., 2020). This loss of oxygen is projected to persist under continued global warming (Gong et al., 2021; Zhou et al., 2022), with growing consequences for marine habitats, fisheries, and ecosystem services (Breitburg et al., 2018; Kim et al., 2023; Chen et al., 2024; Humphries et al., 2024).</p>
      <p id="d2e176">Sparse and heterogeneous observational coverage hinders accurate estimation of the global oxygen inventory. A seminal study by Schmidtko et al. (2017) estimated a 2 % decline (4.8 <inline-formula><mml:math id="M4" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 2.1 Pmol) in the global ocean oxygen inventory from 1960 to 2009. Yet the accuracy of such an estimate depends heavily on the observations used. Historically, DO measurements have been sourced from ship-based campaigns compiled in global databases such as the World Ocean Database (WOD) and the Global Ocean Data Analysis Project (GLODAP), which exhibit strong spatial and temporal sampling biases (Garcia et al., 1998). The resulting unevenness in data availability and quality standards across time and space complicates robust quantification of deoxygenation rates, particularly in dynamic and vulnerable coastal and shelf systems. These limitations underscore the pressing need for a spatially continuous, long-term, and accurate global DO reconstruction.</p>
      <p id="d2e186">Multiple approaches have been developed to address these observational gaps. Earth system models (ESMs) simulate four-dimensional DO fields continuously but often suffer from systematic biases and incomplete representation of multi-scale processes (Cocco et al., 2013; Oschlies et al., 2018). Limited observational constraints further compound uncertainties in model evaluation and in projections. Traditional statistical interpolation methods can reproduce mean climatologies but frequently underestimate trends in data-sparse regions and fail to capture seasonal to interannual variability (Ito et al., 2024b; Cheng and Gouretski, 2024). In recent years, data-driven machine learning (ML) has emerged as a promising alternative, leveraging relationships between DO and physical or biogeochemical covariates to reconstruct continuous four-dimensional fields from sparse in situ measurements (Sharp et al., 2023; Garabaghi et al., 2023; Huang et al., 2023; Wang et al., 2024; Lu et al., 2024). In principle, ML can recover local variability and identify deoxygenation risk without relying on computationally expensive coupled simulations.</p>
      <p id="d2e189">Despite this potential, several methodological challenges remain. First, many existing ML reconstructions rely on a single model trained on global ocean data. However, the physical and biogeochemical controls on DO vary markedly across regions, and the relationships between DO and its environmental drivers are strongly nonstationary in space and time (Garabaghi et al., 2023). As a result, a unified global mapping can be unduly shaped by data-rich areas, so that inferred relationships in data-sparse regions are effectively constrained by remote analogs, yielding unstable long-term trends. Second, a common workflow is to first reconstruct DO at scattered profile locations and then interpolate these point estimates onto a regular grid, often using a limited set of predictors such as temperature and salinity (Sharp et al., 2023; Wang et al., 2024; Liu et al., 2025). In data-sparse regions, this two-step procedure encourages extrapolation, propagates local errors, and can generate spurious fine-scale structure that is not supported by the underlying observations, particularly near sharp DO gradients and in historically undersampled basins. Third, training models directly on raw profiles amplifies sampling biases: autonomous platforms such as Argo repeatedly sample specific regions and depth ranges, whereas historical ship-based surveys are concentrated along cruise tracks (Huang et al., 2023; Lu et al., 2024). Without explicit rebalancing or weighting, ML models place disproportionate emphasis on well-observed areas and generalize poorly elsewhere, leading to reconstructions that systematically underrepresent variability and trends in data-poor regions.</p>
      <p id="d2e193">At the dataset level, available ML-based global DO products, such as GOBAI-O2 (Sharp et al., 2023), G4D-DOC (Xue et al., 2024), and ML4O2 (Ito et al., 2024a), represent important advances, providing monthly gridded DO fields at 1° resolution over multi-year to multi-decadal periods and resolving much of the upper and intermediate ocean. However, these products are typically either limited in temporal coverage or do not extend below approximately 1000–2000 m. To our knowledge, there is currently no single observation-based product that combines pre-Argo coverage from the 1960s, full-depth global fields, and sub-degree horizontal resolution.</p>
      <p id="d2e196">To address these methodological and dataset-level limitations, we generated GEOXYGEN, a monthly global DO dataset at 0.5° <inline-formula><mml:math id="M5" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5° resolution on 176 depth levels from 1960 to 2024 (Wang et al., 2026b, <ext-link xlink:href="https://doi.org/10.12157/IOCAS.20260223.002" ext-link-type="DOI">10.12157/IOCAS.20260223.002</ext-link>). The dataset is generated by combining a global compilation of in situ DO profiles with objectively analyzed temperature–salinity fields and related sea-surface environmental variables, and by learning their relationships with DO through an integrated machine-learning reconstruction workflow, which combines regional partitioning, depth-wise modeling, adaptive feature selection, and physically informed predictors. Specifically, we implement heterogeneity-based partitioning by training separate submodels for each depth layer within each region and incorporating monthly climatological environmental state fields as prior information. Within each region–depth unit, we adaptively select predictive features from a suite of variables including temperature, salinity, oxygen saturation, physical indicators, carbonate-system parameters, and bio-optical properties, thereby ensuring physical interpretability while minimizing redundancy. To mitigate sampling bias and boundary discontinuities, we apply inverse-density weighting, decadal-block cross-validation, and cross-boundary fusion. The resulting GEOXYGEN product provides a consistent, long-term, and spatially complete representation of global DO suitable for quantifying global and regional deoxygenation, diagnosing underlying drivers, and evaluating Earth system and biogeochemical models.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Oxygen Data</title>
      <p id="d2e224">To reconstruct a long-term, global ocean DO dataset, we compiled several complementary in situ data products. These include the CLIVAR and Carbon Hydrographic Data Office (CCHDO), GLODAP, the GEOTRACES Intermediate Data Product (IDP2021), the OceanSITES mooring network, and the internally consistent Ocean Station Data (OSD), conductivity–temperature–depth (CTD), and Argo DO dataset of Gouretski et al. (2024). The OSD/CTD and Argo profiles are merged under a single automated quality control (QC) framework, and Argo DO biases are evaluated and corrected using contemporaneous reference measurements. This procedure improves consistency across platforms and reduces platform-related systematic differences that could otherwise affect later modeling.</p>
      <p id="d2e227">A rigorous dual-stage QC protocol ensured observational reliability. The primary stage involved standardizing metadata formats and units across disparate sources, retaining only observations flagged as “good” or “probably good”. Spurious terrestrial signals were omitted via land-masking, and duplicate profiles – defined by coincidence criteria of <inline-formula><mml:math id="M6" display="inline"><mml:mo>≤</mml:mo></mml:math></inline-formula> 1 km spatial distance and <inline-formula><mml:math id="M7" display="inline"><mml:mo>≤</mml:mo></mml:math></inline-formula> 24 h temporal difference – were identified across archives. In cases of redundancy, we prioritized profiles with the highest vertical sampling density.</p>
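      <p>As an illustration only, the duplicate screening described above could be sketched as follows. The text fixes the coincidence criteria (at most 1 km and at most 24 h apart) and the preference for the most densely sampled profile; the haversine distance, the greedy pairwise pass, and all names (<monospace>Profile</monospace>, <monospace>deduplicate</monospace>, <monospace>n_levels</monospace>) are assumptions introduced here and are not taken from the processing code used for GEOXYGEN.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


@dataclass
class Profile:
    """Hypothetical minimal profile record used only for this example."""
    source: str
    lat: float
    lon: float
    time: datetime
    n_levels: int  # observed levels, used as a proxy for vertical sampling density


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_duplicate(a, b, max_km=1.0, max_dt=timedelta(hours=24)):
    """Coincidence criteria from the text: <= 1 km apart and <= 24 h apart."""
    return (abs(a.time - b.time) <= max_dt
            and haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km)


def deduplicate(profiles):
    """Greedy O(n^2) pass: visit the densest profiles first and drop
    any later profile that coincides with one already kept."""
    kept = []
    for cand in sorted(profiles, key=lambda p: p.n_levels, reverse=True):
        if not any(is_duplicate(cand, k) for k in kept):
            kept.append(cand)
    return kept
```
]]></preformat>
      <p>For an archive of this size, a production implementation would presumably replace the quadratic pairwise comparison with spatial and temporal binning; the sketch only makes the selection rule explicit.</p>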
      <p id="d2e244">Observations were then mapped to 176 standard vertical levels whose spacing increases with depth (10 m intervals above 800 m, 20 m intervals between 800 and 2000 m, and 100 m intervals from 2000 m down to 5500 m). This configuration aligns with the vertical frameworks of CORA and ISAS17 (Szekely et al., 2019; Kolodziejczyk et al., 2023). For the original profiles, DO, temperature, and salinity were mapped using a shape-preserving piecewise cubic Hermite interpolator (PCHIP) strictly within the observed pressure range, precluding vertical extrapolation. Profiles from Gouretski et al. (2024) were used directly without further interpolation because they were already aligned to these standard levels.</p>
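      <p>The level spacing described above can be checked with a short sketch. It assumes that the shallowest level sits at 0 m, which the text does not state explicitly; under that assumption the three spacing bands yield exactly 176 levels. The function names are hypothetical, and the second helper only illustrates the no-extrapolation rule by treating observed pressures and standard levels as the same vertical coordinate.</p>
      <preformat><![CDATA[
```python
def standard_levels():
    """Standard levels in metres; the 0 m starting level is an assumption."""
    upper = list(range(0, 801, 10))      # 0-800 m, 10 m spacing    (81 levels)
    middle = list(range(820, 2001, 20))  # 820-2000 m, 20 m spacing  (60 levels)
    deep = list(range(2100, 5501, 100))  # 2100-5500 m, 100 m spacing (35 levels)
    return upper + middle + deep


def targets_without_extrapolation(levels, observed):
    """Keep only standard levels bracketed by the observed vertical range,
    so that interpolation never extrapolates above or below the profile."""
    shallowest, deepest = min(observed), max(observed)
    return [z for z in levels if shallowest <= z <= deepest]
```
]]></preformat>
      <p>The interpolation itself is not shown. One option would be SciPy's <monospace>PchipInterpolator</monospace> with <monospace>extrapolate=False</monospace>, although the text does not say which implementation was used.</p>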
      <p id="d2e247">To avoid filling large vertical gaps with interpolated values and to retain only levels that are locally supported by observations, we applied an “observation constraint” filter to the interpolated standard-level data. For each standard level <inline-formula><mml:math id="M8" display="inline"><mml:mi>z</mml:mi></mml:math></inline-formula>, we first used a strict window <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mfenced close=")" open="("><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula>: the level was kept if at least one observed pressure existed within <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mi>x</mml:mi><mml:mfenced close=")" open="("><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula>. For levels that failed the strict window, we then applied a relaxed window <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:mi>y</mml:mi><mml:mfenced close=")" open="("><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula>, but required at least two observed pressures within <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mi>y</mml:mi><mml:mfenced close=")" open="("><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula>. The windows vary with depth <inline-formula><mml:math id="M13" display="inline"><mml:mi>z</mml:mi></mml:math></inline-formula>, as described in Eq. (1).

            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M14" display="block"><mml:mtable rowspacing="0.2ex" class="split" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mi>x</mml:mi><mml:mfenced open="(" close=")"><mml:mi>z</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mfenced close="" open="{"><mml:mtable class="array" columnalign="left left"><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>z</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mn mathvariant="normal">50</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>z</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">800</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mn mathvariant="normal">800</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>z</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">2000</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">20</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>z</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">2000</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mi mathvariant="normal">m</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mi>y</mml:mi><mml:mfenced close=")" open="("><mml:mi>z</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mfenced close="" open="{"><mml:mtable class="array" columnalign="left left"><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>z</mml:mi><mml:mo>≤</mml:mo><mml:mn 
mathvariant="normal">50</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">15</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mn mathvariant="normal">50</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>z</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">800</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">30</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mn mathvariant="normal">800</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>z</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">2000</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">120</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>z</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">2000</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mi mathvariant="normal">m</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

          We also recorded an interpolation uncertainty term, <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">interp</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, to represent the local error scale introduced by vertical interpolation, as defined in Eq. (2). To first order, this uncertainty depends on the local vertical DO gradient and the distance to the nearest valid observed pressure. We estimated the local gradient <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mfenced open="|" close="|"><mml:mrow><mml:mo>∂</mml:mo><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow><mml:mo>/</mml:mo><mml:mo>∂</mml:mo><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula> from adjacent observed points along the profile and defined <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula> as the pressure distance between the standard level and the nearest valid observed point.

            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M18" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">interp</mml:mi></mml:msub><mml:mfenced open="(" close=")"><mml:mi>z</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mfenced open="|" close="|"><mml:mrow><mml:mover accent="true"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mfenced close=")" open="("><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>⋅</mml:mo><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mi>z</mml:mi></mml:mfenced></mml:mrow></mml:math></disp-formula>

          A second QC stage was applied to the standard levels. First, we excluded records with DO outside the 0–600 <inline-formula><mml:math id="M19" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> range. Next, to limit uncertainty from vertical interpolation, data with <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">interp</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M22" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> were removed; this threshold was chosen based on the 95th percentile of <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">interp</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> across all standard-level samples. Finally, following TEOS-10, oxygen saturation percentage (Sat %) was computed using in situ temperature and salinity data. For standard levels deeper than 200 m, records with Sat % <inline-formula><mml:math id="M25" display="inline"><mml:mo>≥</mml:mo></mml:math></inline-formula> 120 % were considered erroneous and removed.</p>
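      <p>The observation constraint of Eq. (1), the interpolation uncertainty of Eq. (2), and the second-stage thresholds can be sketched as below. Several points are assumptions of this sketch and not statements about the authors' code: the window bounds are treated as inclusive, observed pressures and standard levels share one vertical coordinate, the local gradient is a finite difference over the two observations that bracket the level, and the saturation percentage is passed in precomputed. All function names are hypothetical.</p>
      <preformat><![CDATA[
```python
def strict_window(z):
    """x(z) from Eq. (1): half-width of the strict window in metres."""
    if z <= 50:
        return 2.0
    if z <= 800:
        return 5.0
    if z <= 2000:
        return 10.0
    return 20.0


def relaxed_window(z):
    """y(z) from Eq. (1): half-width of the relaxed window in metres."""
    if z <= 50:
        return 5.0
    if z <= 800:
        return 15.0
    if z <= 2000:
        return 30.0
    return 120.0


def passes_observation_constraint(z, observed):
    """Keep level z if >= 1 observation lies within x(z), or failing that,
    if >= 2 observations lie within y(z). Bounds are assumed inclusive."""
    if any(abs(p - z) <= strict_window(z) for p in observed):
        return True
    n_relaxed = sum(1 for p in observed if abs(p - z) <= relaxed_window(z))
    return n_relaxed >= 2


def sigma_interp(z, observed, oxygen):
    """Eq. (2): |local DO gradient| times distance to the nearest observation.
    The gradient is a finite difference over the bracketing pair (assumption)."""
    pairs = sorted(zip(observed, oxygen))
    dp = min(abs(p - z) for p, _ in pairs)
    for (p0, o0), (p1, o1) in zip(pairs, pairs[1:]):
        if p0 <= z <= p1 and p1 > p0:
            return abs((o1 - o0) / (p1 - p0)) * dp
    raise ValueError("level lies outside the observed range")


def passes_second_stage(z, do, sigma, sat_percent):
    """Second QC stage: DO range, interpolation uncertainty, deep supersaturation."""
    if not 0.0 <= do <= 600.0:
        return False
    if sigma > 3.0:
        return False
    if z > 200 and sat_percent >= 120.0:
        return False
    return True
```
]]></preformat>
      <p>For example, with observations at 92 and 110 m, the 100 m level fails the strict 5 m window but passes the relaxed 15 m window because two observations fall inside it.</p>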

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e666">Global distribution and temporal coverage of DO profiles. <bold>(a)</bold> Changes in the number of profiles from major data sources during 1950–2024. <bold>(b–c)</bold> Spatial distribution of decadal-mean profile counts for 1960–1980 and 2000–2020, computed on a 1° <inline-formula><mml:math id="M26" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1° grid. The color bar indicates the decadal-mean number of profiles per grid cell (log scale). G24 denotes the oxygen compilation of Gouretski et al. (2024).</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f01.png"/>

        </fig>

      <p id="d2e688">After the two-stage QC, we assembled a high-quality archive of 930 252 DO profiles, of which 94 % are derived from the OSD, CTD, and Argo platforms. Figure 1a captures the decisive transition in the historical data record: a long-standing reliance on ship-based OSD and CTD profiles was superseded after 2000 by the rapid growth of the Argo network, which now serves as the backbone of global oxygen monitoring. Platform-specific dominance is also depth-dependent, defining the vertical representativeness of the dataset. Above 800 m, the combined share of OSD, CTD, and Argo profiles remains above 90 %. The intermediate depths (800–1800 m) mark a transitional zone where Argo's increasing share compensates for the thinning OSD/CTD coverage. Below 2000 m, where Argo penetration is limited, the deep-ocean constraint returns to the OSD and CTD platforms, which maintain a share of at least 80 %. This tiered vertical structure ensures that secondary data products remain supplementary to the core observational framework.</p>
      <p id="d2e691">The sampling geometry has shifted from linear clusters along ship tracks to a near-planar global distribution. This shift is most evident in the Southern Hemisphere, where coverage increased from extreme sparsity prior to 1980 to approximately 36 % of the global total in the 2000–2020 interval. This globalization of the data stream, driven by the Argo array, has mitigated historical hemispheric imbalances, although systematic gaps persist in marginal seas, the central Pacific, and ice-influenced Arctic regions.</p>
      <p id="d2e694">Because uneven observational density risks letting high-coverage eras artificially dominate the reconstructed signal, our methodology incorporates specific safeguards to ensure trend stability. By adopting heterogeneity-based partitioned modeling and decadal-scale transferability assessments, we reduce the risk of data-sparse regions being dominated by modern sampling patterns. These efforts are further supported by an inverse-density weighting scheme and the use of grid-representative values, which together enhance the influence of sparse historical records and maintain the physical interpretability of multi-decadal oxygen trends.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Feature Data</title>
      <p id="d2e705">Accurate reconstruction of ocean deoxygenation needs both long, dense DO observations and drivers that describe physical transport and biogeochemical processes across scales (Oschlies et al., 2018). Consequently, we leverage three-dimensional thermodynamic fields (temperature and salinity) alongside a curated suite of sea surface environmental variables (SSEVs) to establish a process-based predictor space (Table 1).</p>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e711">Details of the Feature Data.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="2cm"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="4" colname="col4" align="justify" colwidth="2cm"/>
     <oasis:colspec colnum="5" colname="col5" align="justify" colwidth="1.7cm"/>
     <oasis:colspec colnum="6" colname="col6" align="justify" colwidth="3.5cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1" align="left">Variable</oasis:entry>
         <oasis:entry colname="col2" align="left">Description</oasis:entry>
         <oasis:entry colname="col3" align="left">Spatial Resolution</oasis:entry>
         <oasis:entry colname="col4" align="left">Temporal Resolution</oasis:entry>
         <oasis:entry colname="col5" align="left">Temporal Coverage</oasis:entry>
         <oasis:entry colname="col6" align="left">Data Source</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left">Temperature (°C)</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left">Seawater temperature</oasis:entry>
         <oasis:entry rowsep="1" colname="col3" align="left">0.5° <inline-formula><mml:math id="M27" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5°; 187 standard depth levels (surface–5500 m)</oasis:entry>
         <oasis:entry rowsep="1" colname="col4" align="left">Monthly</oasis:entry>
         <oasis:entry rowsep="1" colname="col5" align="left">Jan 1960– Jun 2024</oasis:entry>
         <oasis:entry colname="col6" align="left">Szekely et al. (2025)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1" align="left">Salinity</oasis:entry>
         <oasis:entry colname="col2" align="left">Seawater salinity</oasis:entry>
         <oasis:entry colname="col3" align="left">0.5° <inline-formula><mml:math id="M28" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5°; 187 standard depth levels (surface–5500 m)</oasis:entry>
         <oasis:entry colname="col4" align="left">Monthly</oasis:entry>
         <oasis:entry colname="col5" align="left">Jan 1960– Jun 2024</oasis:entry>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left"><inline-formula><mml:math id="M29" display="inline"><mml:mi mathvariant="bold-italic">U</mml:mi></mml:math></inline-formula> (m s<sup>−1</sup>)</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left"><inline-formula><mml:math id="M31" display="inline"><mml:mi mathvariant="bold-italic">U</mml:mi></mml:math></inline-formula>-wind vector component at 10 m</oasis:entry>
         <oasis:entry colname="col3" align="left">0.25° <inline-formula><mml:math id="M32" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.25°</oasis:entry>
         <oasis:entry colname="col4" align="left">Monthly</oasis:entry>
         <oasis:entry colname="col5" align="left">Jan 1993–Jun 2024</oasis:entry>
         <oasis:entry colname="col6" align="left">Mears et al. (2022)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1" align="left"><inline-formula><mml:math id="M33" display="inline"><mml:mi mathvariant="bold-italic">V</mml:mi></mml:math></inline-formula> (m s<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col2" align="left"><inline-formula><mml:math id="M35" display="inline"><mml:mi mathvariant="bold-italic">V</mml:mi></mml:math></inline-formula>-wind vector component at 10 m</oasis:entry>
         <oasis:entry colname="col3" align="left"/>
         <oasis:entry colname="col4" align="left"/>
         <oasis:entry colname="col5" align="left"/>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1" align="left">MLD (m)</oasis:entry>
         <oasis:entry colname="col2" align="left">Ocean mixed layer depth</oasis:entry>
         <oasis:entry colname="col3" align="left">0.25° <inline-formula><mml:math id="M36" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.25°</oasis:entry>
         <oasis:entry colname="col4" align="left">Monthly</oasis:entry>
         <oasis:entry colname="col5" align="left">Jan 1993–Dec 2022</oasis:entry>
         <oasis:entry colname="col6" align="left">Guinehut et al. (2012)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left">DIC (<inline-formula><mml:math id="M37" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>)</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left">Surface ocean dissolved inorganic carbon</oasis:entry>
         <oasis:entry colname="col3" align="left">0.25° <inline-formula><mml:math id="M39" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.25°</oasis:entry>
         <oasis:entry colname="col4" align="left">Monthly</oasis:entry>
         <oasis:entry colname="col5" align="left">Jan 1985–Jun 2024</oasis:entry>
         <oasis:entry colname="col6" align="left">Chau et al. (2022, 2024)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left">pH</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left">Surface pH on total scale</oasis:entry>
         <oasis:entry colname="col3" align="left"/>
         <oasis:entry colname="col4" align="left"/>
         <oasis:entry colname="col5" align="left"/>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left"><inline-formula><mml:math id="M40" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<sub>2</sub> (<inline-formula><mml:math id="M42" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>atm)</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left">Surface aqueous partial pressure of CO<sub>2</sub></oasis:entry>
         <oasis:entry colname="col3" align="left"/>
         <oasis:entry colname="col4" align="left"/>
         <oasis:entry colname="col5" align="left"/>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left">CO<sub>2</sub> flux (mol m<sup>−2</sup> yr<sup>−1</sup>)</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left">Surface downward flux of total CO<sub>2</sub></oasis:entry>
         <oasis:entry colname="col3" align="left"/>
         <oasis:entry colname="col4" align="left"/>
         <oasis:entry colname="col5" align="left"/>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1" align="left">Alkalinity (<inline-formula><mml:math id="M48" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col2" align="left">Total alkalinity in surface seawater</oasis:entry>
         <oasis:entry colname="col3" align="left"/>
         <oasis:entry colname="col4" align="left"/>
         <oasis:entry colname="col5" align="left"/>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left">PAR (mol m<sup>−2</sup> d<sup>−1</sup>)</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left">Photosynthetically available radiation</oasis:entry>
         <oasis:entry colname="col3" align="left">4 km/9 km</oasis:entry>
         <oasis:entry colname="col4" align="left">Monthly</oasis:entry>
         <oasis:entry colname="col5" align="left">Oct 1997–Feb 2025</oasis:entry>
         <oasis:entry colname="col6" align="left">NASA Ocean Biology Processing Group (2018)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1" align="left">Chl-<inline-formula><mml:math id="M52" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> (mg m<sup>−3</sup>)</oasis:entry>
         <oasis:entry colname="col2" align="left">Mass concentration of chlorophyll in surface water</oasis:entry>
         <oasis:entry colname="col3" align="left"/>
         <oasis:entry colname="col4" align="left"/>
         <oasis:entry colname="col5" align="left"/>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1" align="left">SSH (m)</oasis:entry>
         <oasis:entry rowsep="1" colname="col2" align="left">Sea surface height above geoid</oasis:entry>
         <oasis:entry colname="col3" align="left">0.25° <inline-formula><mml:math id="M54" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.25°</oasis:entry>
         <oasis:entry colname="col4" align="left">Monthly</oasis:entry>
         <oasis:entry colname="col5" align="left">Jan 1993–Aug 2023</oasis:entry>
         <oasis:entry colname="col6" align="left">Hauser et al. (2020)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1" align="left">EKE (cm<sup>2</sup> s<sup>−2</sup>)</oasis:entry>
         <oasis:entry colname="col2" align="left">Surface averaged eddy kinetic energy</oasis:entry>
         <oasis:entry colname="col3" align="left"/>
         <oasis:entry colname="col4" align="left"/>
         <oasis:entry colname="col5" align="left"/>
         <oasis:entry colname="col6" align="left"/>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e1321">Three-dimensional temperature and salinity fields are taken from the Coriolis Ocean Dataset for Reanalysis, CORA (Szekely et al., 2025). CORA is a CMEMS objective analysis that uses the ISAS objective mapping system to merge in situ temperature and salinity observations from ships, Argo floats, and other platforms, and it applies delayed-mode quality control to support long-term stability and global consistency. From these fields, oxygen saturation (O<sub>2</sub>Sat) is derived according to the TEOS-10 standard (IOC et al., 2015). The difference between observed DO and O<sub>2</sub>Sat serves as a proxy for the integrated influence of microbial respiration and physical ventilation.</p>
      <p id="d2e1343">In addition, we assembled a suite of SSEVs spanning thermodynamic, dynamical, bio-optical, and carbon-chemistry features (Shao et al., 2024; Ma et al., 2025). Wind-vector components (<inline-formula><mml:math id="M59" display="inline"><mml:mi mathvariant="bold-italic">U</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M60" display="inline"><mml:mi mathvariant="bold-italic">V</mml:mi></mml:math></inline-formula>) are taken from NASA's Cross-Calibrated Multi-Platform (CCMP) product (Mears et al., 2022). Mixed-layer depth (MLD) is obtained from the CMEMS Multi-Observation Global Ocean 3D product (Guinehut et al., 2012). Dynamical variables include sea surface height (SSH) and eddy kinetic energy (EKE), both derived from AVISO satellite altimetry (Hauser et al., 2020). Bio-optical variables comprise photosynthetically available radiation (PAR) and chlorophyll <inline-formula><mml:math id="M61" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> (Chl-<inline-formula><mml:math id="M62" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>) from NASA Level-3/Level-4 ocean-color products (NASA Ocean Biology Processing Group, 2018). Carbon-chemistry variables include dissolved inorganic carbon (DIC), total alkalinity, pH, sea surface partial pressure of CO<sub>2</sub> (<inline-formula><mml:math id="M64" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<sub>2</sub>), and CO<sub>2</sub> flux, all obtained from the CMEMS Surface Ocean Carbon Fields product (Chau et al., 2022, 2024). All feature variables, last accessed in March 2025, were mapped onto a uniform monthly 0.5° grid to maintain spatial consistency across the reconstruction.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Method</title>
      <p id="d2e1418">The overall workflow for constructing the GEOXYGEN dataset is illustrated in Fig. 2, comprising data collection and preprocessing, heterogeneity-based partitioning, and model training with integrated physical and biogeochemical predictors. By incorporating a spatial clustering descriptor that captures the large-scale background structure, and depth-/region-adaptive feature selection, we implement a biogeochemistry-aware machine-learning workflow for gridded DO reconstruction.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e1423">Overall workflow of the GEOXYGEN dataset construction.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f02.png"/>

      </fig>

<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Gridding and Aggregation</title>
      <p id="d2e1439">To match the spatiotemporal scale of the supervised learning labels with the target reconstruction grid and to reduce sample-weight bias caused by uneven sampling in space and time, all observations are aggregated to a monthly 0.5° by 0.5° grid. Within each grid cell, observations are summarized into a single representative value, defined as the median to limit the influence of outliers and extreme events on the labels. The within-unit dispersion is also computed using the median absolute deviation (MAD), which serves as an empirical proxy for uncertainty.</p>
      <p id="d2e1442">To quantify representativeness error introduced by finite sampling and sub-grid variability within each cell-month-depth unit, within-unit dispersion is used to estimate <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">rep</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Let a unit contain <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> observations <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mo mathvariant="italic">}</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msub></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>. The representative value is defined as the robust central statistic, as shown in Eq. (3). We then compute the dispersion and define it as the representativeness error metric, <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">rep</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, as described in Eq. (4), where the factor 1.4826 makes <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">rep</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> comparable to the standard deviation under an approximate normal distribution, facilitating comparison across units.

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M72" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E3"><mml:mtd><mml:mtext>3</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi mathvariant="normal">median</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E4"><mml:mtd><mml:mtext>4</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">MAD</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">median</mml:mi><mml:mfenced open="(" close=")"><mml:mfenced close="|" open="|"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mfenced><mml:mo>,</mml:mo><mml:mspace width="1em" linebreak="nobreak"/><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">rep</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.4826</mml:mn><mml:mo>×</mml:mo><mml:mi mathvariant="normal">MAD</mml:mi></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          The gridded DO labels and all environmental variables are then collocated at this common spatiotemporal scale. To preserve historical information, all QC-passed DO data are retained even when some covariates, such as the SSEVs, are missing. The resulting sample archive provides standardized inputs for later model training.</p>
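      <p>As a minimal illustration of Eqs. (3)–(4), and not the production code, the sketch below computes the representative value and the representativeness error for a single cell-month-depth unit. The function name <monospace>representativeness</monospace> and the sample values are hypothetical; only the median, the MAD, and the factor 1.4826 come from the text.</p>
<preformat preformat-type="code">
```python
from statistics import median

MAD_TO_SIGMA = 1.4826  # scaling factor quoted in the text for Eq. (4)


def representativeness(values):
    """Return (median, sigma_rep) for the observations in one cell-month-depth unit."""
    if not values:
        raise ValueError("a unit needs at least one observation")
    centre = median(values)                           # Eq. (3)
    mad = median([abs(v - centre) for v in values])   # Eq. (4), MAD
    return centre, MAD_TO_SIGMA * mad                 # Eq. (4), sigma_rep


# Hypothetical DO values (umol/kg) in a single unit; 260.0 acts as an outlier.
unit = [201.0, 203.0, 204.0, 206.0, 260.0]
centre, sigma_rep = representativeness(unit)
print(centre, round(sigma_rep, 4))
```
</preformat>
      <p>In this made-up example the outlier leaves the median at 204.0 and the MAD at 2.0, so the label and its dispersion estimate are barely affected, which is the property that motivates the robust statistics.</p>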
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Heterogeneity-Based Partitioning</title>
      <p id="d2e1602">Our reconstruction framework utilizes a spatiotemporally stratified approach to address the shifting controlling mechanisms of ocean deoxygenation across basins and depths (Ma et al., 2025; Ito et al., 2024a). Following the basin definitions in the World Ocean Atlas 2023 (WOA23; Garcia et al., 2024), we additionally treat the Southern Ocean as a dedicated domain. Accordingly, the horizontal grid is divided into five primary modeling domains (Atlantic, Pacific, Indian, Southern, and Arctic), which are held constant across vertical layers to maintain training coherence (Fig. 3). By avoiding highly intricate province boundaries, this design reduces sensitivity of variability and trend estimates to boundary effects and makes cross-boundary continuity easier to maintain. We further mask the Mediterranean, Red Sea, and other semi-enclosed marginal seas to focus the reconstruction on open-ocean dynamics dominated by large-scale circulation and transport.</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e1607">Partitioning of the global open ocean into five basins.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f03.png"/>

        </fig>

      <p id="d2e1616">The vertical architecture uses 176 standard levels, as discussed in Sect. 2.1, similar to the CORA/ISAS17 frameworks (Szekely et al., 2019; Kolodziejczyk et al., 2023). Separate models are trained for each basin and depth layer based on the partitioned regions. At each standard depth below 3000 m, however, the framework converges into a single global model, an adaptation that accommodates the physical constraints of the seafloor and the limited data available at those depths.</p>
      <p id="d2e1620">Within each depth, the self-organizing-map-derived descriptor (SOM_DES) is used as a categorical predictor to encode the large-scale climatological background state. Specifically, we construct a four-dimensional climatological vector [SST, SSS, MLD, DO] from monthly background fields. These vectors are then used to train a self-organizing map (SOM), and each grid cell is assigned to one of 25 discrete classes. Because the SOM is trained directly on the monthly climatological fields, the clustering reflects the joint patterns of multiple environmental fields, and the DO climatology implicitly exerts a stronger influence on the resulting partition, so that the partitioning varies seasonally and evolves coherently with the seasonal DO cycle. This feature-engineering step leverages the SOM's ability to map multivariate climatological structure onto a discrete set of regimes while preserving topological relationships, thereby providing seasonal and spatial context for month-scale DO reconstruction. The background climatological fields are anchored in WOA23 (upper 1500 m) and the IAP Global Ocean Oxygen gridded product (IAP Oxygen; Cheng and Gouretski, 2024), providing baseline states that are comprehensive in both the vertical and the horizontal. Overall, SOM_DES captures the large-scale climatological structure of DO and supplies background-state information that supports month-scale DO modeling, improving diagnostic consistency across heterogeneous regimes and the physical interpretability of the global DO product.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Adaptive Regional Modeling</title>
      <p id="d2e1640">We train independent sub-models for each ocean basin partition across 176 standard depth levels, using the CatBoost gradient boosting framework to resolve the non-linear mapping between sparse DO observations and diverse environmental predictors. CatBoost's oblivious decision trees and ordered boosting reduce variance and guard against target leakage, which is important for a stable multi-decadal reconstruction. The framework suits this task for three reasons: it processes missing predictors without imputation, it supports per-sample weighting, and its regularization options (L2 leaf regularization and early stopping) limit overfitting in data-sparse regimes.</p>
      <p id="d2e1643">For the period before the satellite era (for example, from the 1960s to the 1980s), the SSEVs are unavailable, and the model handles them natively as missing values. This non-imputational strategy lets the reconstruction rely predominantly on thermodynamic variables (Temperature, Salinity, O<sub>2</sub>Sat) and spatiotemporal coordinates without introducing the systematic biases inherent in statistical filling. We tested longitude as a candidate predictor; however, it induced spurious banded (stripe-like) artifacts in data-sparse regions and was therefore excluded from the final predictor set. The CatBoost model uses 20 predictor variables as inputs: Year, month_sin, month_cos, Latitude, Temperature, Salinity, O<sub>2</sub>Sat, SOM_DES, <inline-formula><mml:math id="M76" display="inline"><mml:mi mathvariant="bold-italic">U</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M77" display="inline"><mml:mi mathvariant="bold-italic">V</mml:mi></mml:math></inline-formula>, SSH, EKE, MLD, PAR, Chl-<inline-formula><mml:math id="M78" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>, DIC, <inline-formula><mml:math id="M79" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<sub>2</sub>, pH, Alkalinity, and CO<sub>2</sub> flux. The month_sin and month_cos encodings are defined in Eqs. (5)–(6).

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M82" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E5"><mml:mtd><mml:mtext>5</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">month</mml:mi><mml:mi mathvariant="normal">_</mml:mi><mml:mi mathvariant="normal">sin</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">sin</mml:mi><mml:mfenced close=")" open="("><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mn mathvariant="normal">12</mml:mn></mml:mfrac></mml:mstyle></mml:mfenced></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E6"><mml:mtd><mml:mtext>6</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">month</mml:mi><mml:mi mathvariant="normal">_</mml:mi><mml:mi mathvariant="normal">cos</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">cos</mml:mi><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:mo>(</mml:mo><mml:mi>m</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mn mathvariant="normal">12</mml:mn></mml:mfrac></mml:mstyle></mml:mfenced><mml:mo>,</mml:mo><mml:mspace width="1em" linebreak="nobreak"/><mml:mi>m</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mo mathvariant="italic">}</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          To improve interpretability and generalization, a two-stage feature selection procedure is applied to each region–depth submodel. Only training data are used, and cross-validation follows the same decadal-scale design used for model evaluation. Tree-based models often benefit from compact feature sets because redundant predictors can reduce predictive skill (Garabaghi et al., 2023). First, we estimate permutation importance under five-fold cross-validation grouped by decade and retain an initial subset of features using an adaptive rule <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mo>max⁡</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">10</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:msqrt><mml:mi>p</mml:mi></mml:msqrt><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> is the number of candidate features; since 2√20 ≈ 8.9, the rule yields K = 10 here, and predictors with negligible contribution are discarded. Second, we perform recursive feature elimination with cross-validation (RFECV), iteratively removing the least important feature and selecting the combination that minimizes validation root mean square error (RMSE). The selected feature sets vary with depth in a physically interpretable way: surface models prioritize high-frequency biogeochemical forcing, mid-depth models emphasize water-mass transition markers, and deep-ocean models converge on conservative thermodynamic tracers.</p>
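      <p>The cyclic month encoding of Eqs. (5)–(6) and the adaptive subset-size rule can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and rounding 2√p up to an integer is our assumption, since the text does not state how a non-integer value is handled.</p>
<preformat preformat-type="code">
```python
import math


def month_encoding(m):
    """Return (month_sin, month_cos) for calendar month m in 1..12, per Eqs. (5)-(6)."""
    if m not in range(1, 13):
        raise ValueError("m must be an integer from 1 to 12")
    angle = 2.0 * math.pi * (m - 1) / 12.0
    return math.sin(angle), math.cos(angle)


def initial_subset_size(p):
    """Adaptive rule K = max(10, 2*sqrt(p)); the ceiling is an assumption of this sketch."""
    return max(10, math.ceil(2.0 * math.sqrt(p)))


jan = month_encoding(1)
dec = month_encoding(12)
# December and January are neighbours on the unit circle,
# unlike the raw month indices 12 and 1.
gap = math.dist(dec, jan)
print(jan, round(gap, 4), initial_subset_size(20))
```
</preformat>
      <p>The encoding places every pair of adjacent months the same distance apart, including December and January, so the model does not see an artificial jump at the turn of the year.</p>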
      <p id="d2e1853">To mitigate biases arising from heterogeneous spatiotemporal sampling (Fig. 1), we apply inverse-density weighting within a fixed binning scheme. The sample domain is partitioned into 5° <inline-formula><mml:math id="M85" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 5° latitude–longitude cells and, within each cell, non-overlapping 10-year time windows. Sample weights are computed from the inverse square root of the observational density within each spatiotemporal bin and normalized at each depth level, thereby reducing the influence of over-sampled regions and periods during model training.</p>
      <p id="d2e1863">Let <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mfenced close=")" open="("><mml:mi>i</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula> denote the spatiotemporal bin containing sample <inline-formula><mml:math id="M87" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, and let <inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mfenced close=")" open="("><mml:mi>i</mml:mi></mml:mfenced></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> be the number of samples in that bin. The initial per-sample weight is the inverse square root of this count, as defined in Eq. (7).

            <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M89" display="block"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>w</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:msqrt><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>b</mml:mi><mml:mfenced open="(" close=")"><mml:mi>i</mml:mi></mml:mfenced></mml:mrow></mml:msub></mml:mrow></mml:msqrt></mml:mfrac></mml:mstyle><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          To preserve the aggregate information content, we then normalize the weights to unit mean over the <inline-formula><mml:math display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> samples, as defined in Eq. (8).

            <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M90" display="block"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>w</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msub><mml:mover accent="true"><mml:mi>w</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          This strategy increases the relative influence of observations from sparse regions and earlier periods while keeping the total weight equal to the number of samples.</p>
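      <p>A minimal sketch of the weighting in Eqs. (7)–(8) is given below, assuming that bins are formed by flooring latitude and longitude to 5° cells and years to 10-year windows. The helper names and the sample coordinates are hypothetical and serve only to show the arithmetic.</p>
<preformat preformat-type="code">
```python
import math
from collections import Counter


def bin_key(lat, lon, year, cell_deg=5.0, window_years=10):
    """Index of the 5-degree, 10-year bin holding a sample (floor-based binning is assumed)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg), year // window_years)


def sample_weights(samples):
    """samples: list of (lat, lon, year). Returns weights with unit mean, Eqs. (7)-(8)."""
    keys = [bin_key(*s) for s in samples]
    counts = Counter(keys)                              # n_b(i) for every bin
    raw = [1.0 / math.sqrt(counts[k]) for k in keys]    # Eq. (7)
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]                  # Eq. (8)


# Hypothetical samples: four in one well-observed bin, one in a sparse bin.
samples = [(32.1, -64.5, 2015), (33.0, -63.2, 2016), (31.4, -61.9, 2012),
           (34.8, -60.3, 2019), (-52.0, 75.5, 1964)]
weights = sample_weights(samples)
print([round(w, 3) for w in weights])
```
</preformat>
      <p>In this example the lone sample in the sparse bin receives twice the weight of each sample in the bin with four observations, rather than four times as it would under a plain inverse-count rule, while the weights still sum to the number of samples.</p>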
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Hyperparameter optimization and validation</title>
      <p id="d2e1994">Hyperparameters are optimized independently for each basin–depth unit using Bayesian optimization (Optuna), with minimization of validation RMSE as the objective (Table 2). The search is integrated with a decadal-block five-fold cross-validation scheme to address the challenges of non-stationary ocean signals. By grouping observations into multi-year blocks, we decouple validation results from the short-term temporal dependencies that often inflate predictive skill in traditional random-split CV (Salazar et al., 2022). This separation encourages the model to learn large-scale climatic drivers rather than localized temporal artifacts. In each cross-validation fold, training stops early if validation RMSE fails to improve for 50 consecutive iterations, and the best iteration count is recorded. For the final model, the iteration count is fixed to the median of these recorded values across all folds, and the model is retrained on the complete calibration set with early stopping disabled, which limits overfitting and yields a reproducible stopping point.</p>
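      <p>The two bookkeeping steps of this scheme, keeping each decade inside a single fold and fixing the final iteration count to the median across folds, can be sketched as follows. The round-robin assignment of decades to folds, the function names, and the example numbers are assumptions of this sketch; the text does not specify how decades are allocated to the five folds.</p>
<preformat preformat-type="code">
```python
from statistics import median


def decade_folds(years, n_folds=5):
    """Give each sample a fold index so that a decade never straddles two folds."""
    decades = sorted({y // 10 * 10 for y in years})
    # Round-robin assignment of decades to folds is an assumption of this sketch.
    fold_of_decade = {d: i % n_folds for i, d in enumerate(decades)}
    return [fold_of_decade[y // 10 * 10] for y in years]


def final_iterations(best_iterations):
    """Median of the early-stopped iteration counts, used for the final refit."""
    return int(median(best_iterations))


# Hypothetical sample years and per-fold best iteration counts.
years = [1963, 1968, 1975, 1982, 1989, 1994, 2001, 2007, 2013, 2022]
print(decade_folds(years))
print(final_iterations([640, 910, 720, 1180, 805]))
```
</preformat>
      <p>Using the median rather than the maximum or the mean keeps a single fold with unusually late or early stopping from dictating the length of the final training run.</p>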

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e2000">CatBoost hyperparameters and Optuna search priors.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Hyperparameter</oasis:entry>
         <oasis:entry colname="col2">Explanation</oasis:entry>
         <oasis:entry colname="col3">Search range</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">iterations</oasis:entry>
         <oasis:entry colname="col2">Maximum boosting rounds</oasis:entry>
         <oasis:entry colname="col3">200–2500</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">learning_rate</oasis:entry>
         <oasis:entry colname="col2">Learning rate (shrinkage)</oasis:entry>
         <oasis:entry colname="col3">0.02–0.08</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">depth</oasis:entry>
         <oasis:entry colname="col2">Tree depth</oasis:entry>
         <oasis:entry colname="col3">6–10</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">l2_leaf_reg</oasis:entry>
         <oasis:entry colname="col2">L2 regularization on leaf values</oasis:entry>
         <oasis:entry colname="col3">5–20</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">bagging_temperature</oasis:entry>
         <oasis:entry colname="col2">Temperature for Bayesian bootstrap</oasis:entry>
         <oasis:entry colname="col3">0.1–0.6</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2092">For an unbiased final assessment, a strictly independent global test set is constructed via decade-stratified random sampling (1960–2024), in which one full calendar year per decade is withheld entirely from training, feature selection, and hyperparameter tuning. The resulting test years (1961, 1970, 1984, 1993, 2003, 2012, and 2020) provide a temporally representative benchmark. This withheld-year test set is reserved exclusively for final performance assessment and inter-product benchmarking; all model selection and calibration are conducted via decadal-block cross-validation within the remaining training data. Integrated predictions from regional sub-models are then evaluated against this set using RMSE, mean absolute error (MAE), and the coefficient of determination (<inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>).</p>
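      <p>For concreteness, the withheld-year split and the three evaluation metrics can be written as in the sketch below. The record layout (dictionaries with a <monospace>year</monospace> key), the function names, and the observed and predicted values are hypothetical; the test years are those listed above.</p>
<preformat preformat-type="code">
```python
import math

TEST_YEARS = {1961, 1970, 1984, 1993, 2003, 2012, 2020}  # withheld years listed in the text


def split_by_year(records):
    """records: list of dicts with a 'year' key. Returns (train, test)."""
    train = [r for r in records if r["year"] not in TEST_YEARS]
    test = [r for r in records if r["year"] in TEST_YEARS]
    return train, test


def scores(observed, predicted):
    """Return (RMSE, MAE, R2) for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return rmse, mae, 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted DO values (umol/kg).
obs = [200.0, 210.0, 220.0, 230.0]
pred = [202.0, 208.0, 221.0, 229.0]
rmse, mae, r2 = scores(obs, pred)
print(round(rmse, 4), mae, round(r2, 4))
```
</preformat>
      <p>Splitting on whole calendar years, rather than on individual samples, keeps observations from the same cruise or float deployment in a test year from also appearing in the training data.</p>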
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Variable Associations and Feature Contributions</title>
      <p id="d2e2122">The relationship between DO and its environmental drivers changes markedly from the surface to the deep ocean (Fig. 4). In the near-surface layers, thermodynamic solubility dominates, producing a robust negative correlation with temperature. As depth increases, this direct solubility control weakens, and salinity emerges as the primary integrative proxy for water-mass properties and ventilation history, in particular tracking the signatures of deep-water formation and transport. These depth-dependent patterns provide the empirical basis for resolving complex non-linear interactions within the water column (Ping et al., 2024; Cao et al., 2024).</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2127">Vertical correlations between DO and physical–biogeochemical variables. Bubble color encodes the sign and magnitude (red <inline-formula><mml:math id="M92" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> positive; blue <inline-formula><mml:math id="M93" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> negative), and bubble area scales with <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mi>r</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula>. Filled bubbles denote correlations significant at <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula> after Benjamini–Hochberg false-discovery-rate control; hollow bubbles indicate non-significant results.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f04.png"/>

        </fig>
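
      <p>The significance criterion in Fig. 4 relies on Benjamini–Hochberg adjusted values. For readers unfamiliar with the procedure, a minimal sketch of the standard step-up adjustment follows; the function name <monospace>bh_qvalues</monospace> and the example p values are hypothetical and are not taken from the analysis.</p>
      <preformat>
```python
def bh_qvalues(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q[idx] = running_min
    return q


# Made-up p-values for five correlations at one depth level.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
significant = [0.05 > q for q in bh_qvalues(p)]
```
      </preformat>
      <p>With these example inputs, the third and fourth tests have raw p values below 0.05 but adjusted values of about 0.051, so they would be drawn as hollow bubbles.</p>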

      <p id="d2e2174">Correlations between sea surface environmental variables (SSEVs) and subsurface DO represent teleconnected pathways rather than local drivers. Surface dynamical forcing and upper-ocean mixing modulate water-mass formation and reventilation, thereby constraining the DO budget at depth. Specifically, sea surface height (SSH) exhibits persistent vertical coherence across much of the water column, while wind stress provides a dynamical context for advection and upwelling (Hollitzer et al., 2024). In contrast, the Chl-<inline-formula><mml:math id="M96" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> signal remains confined to the epipelagic zone, reflecting its role as a productivity indicator. Beyond the euphotic zone, carbonate system variables, notably alkalinity, maintain high correlations with DO, delineating the remineralization background linked to deep-water aging.</p>
      <p id="d2e2185">Feature-importance diagnostics are reported to describe model dependence and do not imply causality. The adaptive feature-selection results show that the model's reliance on predictors varies strongly with depth and region. In the upper 10 m, temperature and O<sub>2</sub> saturation are consistently among the most informative predictors, whereas at intermediate depths (1000–2000 m) salinity tends to contribute more strongly in certain high-latitude regimes, including the Arctic and Southern Oceans (Fig. 5). This depth-dependent pattern motivates the use of depth-specific predictor sets to better represent distinct hydrographic contexts. In addition, several regionally relevant covariates (e.g., SSH in the Indian Ocean and DIC/alkalinity in low-oxygen environments) are retained more frequently by the selection procedure, indicating that they provide useful contextual information for prediction under specific regimes (Franco et al., 2014).</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2199">Heatmap of relative feature importance across depths and basins. Colors are on a logarithmic scale. The bar chart on the right shows each feature's mean importance computed over the basin provinces in which that feature is available.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f05.png"/>

        </fig>

      <p id="d2e2208">Although latitude serves as a dominant proxy for broad-scale gradients (Milà et al., 2024), specialized SSEVs are vital for refining local accuracy. Our regionalized architecture prioritizes these idiosyncratic dynamics, utilizing variables like SOM_DES to resolve high-frequency seasonal variations in the Southern Ocean. This adaptive approach mitigates the biases inherent in spatially stationary parameterizations, ensuring the reconstruction respects the intrinsic heterogeneity of the global oxygen cycle.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Model evaluation</title>
      <p id="d2e2219">The vertical error profiles reveal a coherent link between RMSE and the intensity of the oxycline (Fig. 6). In most basins, uncertainty is concentrated in the upper 600 m, where biological consumption and physical mixing generate high spatiotemporal variance. The Indian Ocean (IND) shows the most pronounced upper-ocean RMSE peak, likely due to its distinctive biogeochemical configuration, whereas the Arctic Ocean (ARC) shows fluctuations in RMSE at intermediate depths while its <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> remains relatively high. This vertical decoupling of error – where water-mass stability increases with depth before decreasing – is a common feature of the Atlantic, Pacific, Indian, and Southern Oceans, which supports the regional partitioning approach.</p>

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2235">Depth profiles of model performance across basin provinces.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f06.png"/>

        </fig>

      <p id="d2e2244">Except for the sparsely sampled Arctic, all ocean basins maintain high <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values (typically <inline-formula><mml:math id="M100" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.8) across the sampled water column. While absolute errors are inherently higher in the upper ocean where DO variability is maximized, the high <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values indicate that the underlying spatial and temporal patterns are accurately recovered. The analysis demonstrates that uncertainty is largely a function of the vertical DO structure, with the deep-ocean regime providing a stable and highly predictable anchor for long-term deoxygenation trends.</p>
      <p id="d2e2277">The reconstructed DO fields demonstrate high accuracy throughout the global ocean, with scatter plots confirming that estimates follow the 1 : 1 line across the entire concentration range (Fig. 7). At representative depths, the global RMSE/MAE on the independent test set is on the order of 12.8/8.0 <inline-formula><mml:math id="M102" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at 10 m, 16.8/11.4 <inline-formula><mml:math id="M104" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at 200 m, and 7.5/5.2 <inline-formula><mml:math id="M106" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at 1000 m. Most <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values are above 0.9, and the <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> for deep-layer DO reconstruction reaches 0.99. A slight overestimation in regions with relatively low values reflects the inherent smoothing characteristic of regularized decision trees, which tend to pull extreme values toward local means. However, the symmetric distribution of residuals around zero indicates that these effects are localized and do not introduce a systematic large-scale bias. This performance confirms that the modeling architecture is well-suited for resolving the non-linearities inherent in ocean oxygen dynamics.</p>

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e2365">Model performance for DO predictions across depth layers (independent test set). Top row: hexagon-binned scatterplots of predicted vs. observed values; the gray dashed line denotes the 1 : 1 reference. The color bar indicates sample counts per hexbin. Bottom row: corresponding histograms of residuals (observed <inline-formula><mml:math id="M110" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> predicted).</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f07.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Long-term dataset</title>
      <p id="d2e2389">The trained models leverage CORA ocean analysis from CMEMS to generate spatially and temporally consistent monthly dissolved oxygen (DO) reconstructions. By using gridded temperature and salinity products, the framework obviates interpolation from sparse in situ profiles and minimizes associated uncertainties. These predictors undergo delayed-mode objective mapping and rigorous quality control to ensure long-term consistency and full traceability. Monthly DO predictions are generated at 176 standard depth levels on a 0.5° <inline-formula><mml:math id="M111" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5° grid extending to 5500 m.</p>
      <p id="d2e2399">To mitigate discontinuity artifacts at basin province boundaries – described as the “step-effect” – a boundary fusion protocol is implemented within transition zones (Wagstaff and Bean, 2022). For a prediction location <inline-formula><mml:math id="M112" display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula>, the fused estimate is derived from the regional model prediction <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mfenced close=")" open="("><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula> and the minimum great-circle distance <inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to the provincial boundary, as defined in Eq. (9). With a smoothing bandwidth of <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">300</mml:mn></mml:mrow></mml:math></inline-formula> km, predictions from adjacent basins are blended smoothly, with weights that decay quadratically with distance and vanish beyond the bandwidth. This approach preserves continuity across boundaries without excessive smoothing. Far from boundaries (<inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>), only the local provincial model contributes to the estimate.

            <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M117" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>′</mml:mo></mml:msup><mml:mfenced close=")" open="("><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mo>∑</mml:mo><mml:mi>i</mml:mi></mml:msub><mml:mi>w</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∣</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:mfenced><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mfenced open="(" close=")"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mfenced></mml:mrow><mml:mrow><mml:msub><mml:mo>∑</mml:mo><mml:mi>i</mml:mi></mml:msub><mml:mi>w</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∣</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mi>w</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∣</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mfenced open="{" close=""><mml:mtable class="array" columnalign="left left"><mml:mtr><mml:mtd><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="false"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mfenced close=")" 
open="("><mml:mrow><mml:mi>S</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow><mml:mi>S</mml:mi></mml:mfrac></mml:mstyle></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

          Finally, we obtained GEOXYGEN, a global DO dataset that provides monthly fields on a 0.5° <inline-formula><mml:math id="M118" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5° latitude–longitude grid at 176 standard depth levels from 1 to 5500 m, spanning 1960–2024. The latitude grid is not strictly uniform, with higher density in the high-latitude regions. The dataset is distributed as CF-compliant NetCDF files; each monthly file (GEOXYGEN_DO_YYYYMM_0p5deg_v1.nc) contains the four-dimensional DO variable (time <inline-formula><mml:math id="M119" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> depth <inline-formula><mml:math id="M120" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> lat <inline-formula><mml:math id="M121" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> lon) and its associated coordinates. Each field is a monthly mean, with time encoded as days since 1 January 1950 00:00:00 UTC (Gregorian calendar); missing data are flagged with a large sentinel value. A separate NetCDF file provides the basin province and valid-ocean masks. The complete dataset is available at <ext-link xlink:href="https://doi.org/10.12157/IOCAS.20260223.002" ext-link-type="DOI">10.12157/IOCAS.20260223.002</ext-link> (Wang et al., 2026b).</p>
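      <p>The boundary fusion of Eq. (9) can be sketched in a few lines of Python. The sketch assumes that the distance passed for each regional model is zero for the province containing the prediction location, so that the local model keeps unit weight and is the only contributor once every neighbouring province lies farther than the bandwidth. This reading of the distance term and the names <monospace>fusion_weight</monospace> and <monospace>fuse_predictions</monospace> are assumptions of the sketch, not a description of the production code.</p>
      <preformat>
```python
def fusion_weight(d_km, s_km=300.0):
    """Quadratic taper of Eq. (9): ((S - d) / S)^2 within S, zero beyond it."""
    if d_km > s_km:
        return 0.0
    return ((s_km - d_km) / s_km) ** 2


def fuse_predictions(predictions, distances_km, s_km=300.0):
    """Weighted mean of regional predictions using the weights w(d_i | S)."""
    weights = [fusion_weight(d, s_km) for d in distances_km]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no regional model within the smoothing bandwidth")
    return sum(w * y for w, y in zip(weights, predictions)) / total


# Assumed example: the location lies inside province A (d = 0 km) and
# 150 km from province B; the DO values are made up for illustration.
fused = fuse_predictions([200.0, 180.0], [0.0, 150.0])
```
      </preformat>
      <p>In this example the weights are 1 and 0.25, so the fused value (196.0) stays close to the local prediction while being drawn slightly toward the neighbouring model.</p>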
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>Comparison with other products</title>
      <p id="d2e2656">To compare the accuracy of GEOXYGEN with that of four existing DO products (Table 3), we use the independent test set, split into two periods: an early period (1960–1980) and a recent period (2000–2020; Table 4). The intercomparison is restricted to spatiotemporally co-located samples, so that only grid–month–depth cells available in all products enter the evaluation. This filtering ensures a like-for-like comparison and minimizes biases arising from differing product masks.</p>
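      <p>The co-location filter amounts to a set intersection over sample keys. The sketch below assumes that every product has already been matched to the test observations and indexed by a common key (longitude, latitude, depth, year, month); the helper name <monospace>colocated_keys</monospace> and all numerical values are hypothetical and serve only to illustrate the filtering step.</p>
      <preformat>
```python
def colocated_keys(products):
    """Return the sample keys that are present in every product."""
    key_sets = [set(values) for values in products.values()]
    return set.intersection(*key_sets)


# Toy data: keys are (lon, lat, depth_m, year, month); values are made-up
# DO estimates and do not come from any of the products.
products = {
    "GEOXYGEN": {
        (150.5, 10.5, 10, 2003, 1): 201.0,
        (150.5, 10.5, 50, 2003, 1): 188.0,
    },
    "IAP Oxygen": {
        (150.5, 10.5, 10, 2003, 1): 203.5,
        (150.5, 10.5, 50, 2003, 1): 190.2,
    },
    "ML4O2": {
        (150.5, 10.5, 10, 2003, 1): 199.8,
    },
}
shared = colocated_keys(products)
```
      </preformat>
      <p>Only the 10 m sample is retained in this toy case, because the 50 m sample is missing from one product; the accuracy statistics would then be computed over the retained keys alone.</p>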

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e2662">Summary of our product and other DO products.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Product</oasis:entry>
         <oasis:entry colname="col2">Time coverage</oasis:entry>
         <oasis:entry colname="col3">Vertical levels</oasis:entry>
         <oasis:entry colname="col4">Temporal</oasis:entry>
         <oasis:entry colname="col5">Horizontal</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">resolution</oasis:entry>
         <oasis:entry colname="col5">resolution</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">GEOXYGEN</oasis:entry>
         <oasis:entry colname="col2">Jan 1960–Jun 2024</oasis:entry>
         <oasis:entry colname="col3">1–5500 m (176 levels)</oasis:entry>
         <oasis:entry colname="col4">Monthly</oasis:entry>
         <oasis:entry colname="col5">0.5° <inline-formula><mml:math id="M122" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5°</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">ML4O2</oasis:entry>
         <oasis:entry colname="col2">Jan 1965–Dec 2020</oasis:entry>
         <oasis:entry colname="col3">6–1000 m (20 levels)</oasis:entry>
         <oasis:entry colname="col4">Monthly</oasis:entry>
         <oasis:entry colname="col5">1° <inline-formula><mml:math id="M123" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1°</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">GOBAI-O<sub>2</sub></oasis:entry>
         <oasis:entry colname="col2">Jan 2004–Dec 2023</oasis:entry>
         <oasis:entry colname="col3">2.5–1975 m (58 levels)</oasis:entry>
         <oasis:entry colname="col4">Monthly</oasis:entry>
         <oasis:entry colname="col5">1° <inline-formula><mml:math id="M124" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1°</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G4D-DOC</oasis:entry>
         <oasis:entry colname="col2">Jan 2005–Dec 2022</oasis:entry>
         <oasis:entry colname="col3">10–1995 m (26 levels)</oasis:entry>
         <oasis:entry colname="col4">Monthly</oasis:entry>
         <oasis:entry colname="col5">1° <inline-formula><mml:math id="M125" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1°</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">IAP Oxygen</oasis:entry>
         <oasis:entry colname="col2">Jan 1960–Dec 2022</oasis:entry>
         <oasis:entry colname="col3">1–6000 m (119 levels)</oasis:entry>
         <oasis:entry colname="col4">Monthly</oasis:entry>
         <oasis:entry colname="col5">1° <inline-formula><mml:math id="M126" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1°</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="T4" specific-use="star"><label>Table 4</label><caption><p id="d2e2848">Accuracy by depth for each product relative to observations. RMSE and bias are reported in <inline-formula><mml:math id="M127" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>; <inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> is dimensionless.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right" colsep="1"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Product</oasis:entry>
         <oasis:entry colname="col2">Depth (m)</oasis:entry>
         <oasis:entry rowsep="1" namest="col3" nameend="col5" align="center" colsep="1">1960–1980 </oasis:entry>
         <oasis:entry rowsep="1" namest="col6" nameend="col8" align="center">2000–2020 </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">RMSE</oasis:entry>
         <oasis:entry colname="col4">Bias</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col6">RMSE</oasis:entry>
         <oasis:entry colname="col7">Bias</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GEOXYGEN</oasis:entry>
         <oasis:entry colname="col2">10</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">12.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">0.3</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.95</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">6.9</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.4</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">IAP Oxygen</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">13.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">0.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.94</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">7.5</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.3</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.97</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">ML4O2</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">13.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">1.5</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.94</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">8.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.97</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GOBAI-O<sub>2</sub></oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">7.5</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.4</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.97</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G4D-DOC</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">NA</oasis:entry>
         <oasis:entry colname="col4">NA</oasis:entry>
         <oasis:entry colname="col5">NA</oasis:entry>
         <oasis:entry colname="col6">7.3</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8">0.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GEOXYGEN</oasis:entry>
         <oasis:entry colname="col2">50</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">16.7</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">0.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.93</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">12</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.95</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">IAP Oxygen</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">18.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">1.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.92</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">13.8</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.9</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.94</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">ML4O2</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">17.3</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">1.8</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.93</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">13.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.6</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.94</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GOBAI-O<sub>2</sub></oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">12.9</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.95</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G4D-DOC</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">NA</oasis:entry>
         <oasis:entry colname="col4">NA</oasis:entry>
         <oasis:entry colname="col5">NA</oasis:entry>
         <oasis:entry colname="col6">13</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8">0.94</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GEOXYGEN</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">17.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">0.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.95</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">14.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.96</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">IAP Oxygen</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">18.3</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">1.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.94</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">17.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.9</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.95</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">ML4O2</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">18.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">2.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.94</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">17.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.95</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GOBAI-O<sub>2</sub></oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">15.6</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.96</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G4D-DOC</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">NA</oasis:entry>
         <oasis:entry colname="col4">NA</oasis:entry>
         <oasis:entry colname="col5">NA</oasis:entry>
         <oasis:entry colname="col6">20.6</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8">0.92</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GEOXYGEN</oasis:entry>
         <oasis:entry colname="col2">200</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">17.3</oasis:entry>
         <oasis:entry rowsep="1" colname="col4"><inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.4</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.96</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">12.8</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">IAP Oxygen</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">17.6</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">0</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.96</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">14.8</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.97</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">ML4O2</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">18.6</oasis:entry>
         <oasis:entry rowsep="1" colname="col4"><inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.95</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">16.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.6</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.96</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GOBAI-O<sub>2</sub></oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">15.5</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.9</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.97</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G4D-DOC</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">NA</oasis:entry>
         <oasis:entry colname="col4">NA</oasis:entry>
         <oasis:entry colname="col5">NA</oasis:entry>
         <oasis:entry colname="col6">18.3</oasis:entry>
         <oasis:entry colname="col7">0.8</oasis:entry>
         <oasis:entry colname="col8">0.95</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GEOXYGEN</oasis:entry>
         <oasis:entry colname="col2">500</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">15.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col4"><inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.3</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.97</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">10.9</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">IAP Oxygen</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">16.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col4"><inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.6</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.97</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">13</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">ML4O2</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">16.6</oasis:entry>
         <oasis:entry rowsep="1" colname="col4"><inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.97</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">14.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.7</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.97</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GOBAI-O<sub>2</sub></oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">13.9</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.97</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G4D-DOC</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">NA</oasis:entry>
         <oasis:entry colname="col4">NA</oasis:entry>
         <oasis:entry colname="col5">NA</oasis:entry>
         <oasis:entry colname="col6">12.7</oasis:entry>
         <oasis:entry colname="col7">0.3</oasis:entry>
         <oasis:entry colname="col8">0.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GEOXYGEN</oasis:entry>
         <oasis:entry colname="col2">1000</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">7.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">0.1</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.99</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">5.2</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">1</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">IAP Oxygen</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">7.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">0</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.99</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">6.3</oasis:entry>
         <oasis:entry rowsep="1" colname="col7">0.4</oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.99</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">ML4O2</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">8.7</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">1.3</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">0.99</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">7.6</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.99</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" colname="col1">GOBAI-O<sub>2</sub></oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry rowsep="1" colname="col3">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col5">NA</oasis:entry>
         <oasis:entry rowsep="1" colname="col6">5.7</oasis:entry>
         <oasis:entry rowsep="1" colname="col7"><inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" colname="col8">0.99</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G4D-DOC</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">NA</oasis:entry>
         <oasis:entry colname="col4">NA</oasis:entry>
         <oasis:entry colname="col5">NA</oasis:entry>
         <oasis:entry colname="col6">11.7</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.8</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8">0.98</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e2882">NA – not available.</p></table-wrap-foot></table-wrap>

      <p id="d2e4010">The comparative analysis reveals three dominant patterns. First, all products show larger errors during the early period, which is largely attributable to historical sampling. Pre-Argo observations were disproportionately concentrated in coastal and shelf cruises, whereas the modern era is dominated by autonomous Argo floats with broad open-ocean coverage. Because coastal environments have pronounced small-scale heterogeneity and high-frequency nonstationarity, they are harder to reconstruct, which raises the error floor for the 1960–1980 interval.</p>
      <p id="d2e4013">Second, within this data-sparse window (1960–1980), GEOXYGEN shows more stable predictive skill across the upper and intermediate layers (10–500 m). It achieves lower RMSE and MAE than both IAP and ML4O2 at nearly all depth levels while maintaining high <inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values. These gains suggest that the stratified, basin-specific modeling framework helps recover mesoscale spatial heterogeneity and vertical gradients even under sparse observational constraints. By leveraging high-resolution target fields, the framework improves the representation of regional variability, so that GEOXYGEN remains competitive during the pre-satellite and pre-Argo eras.</p>
      <p id="d2e4027">Third, in the data-rich modern period (2000–2020), the performance gains of GEOXYGEN are even more pronounced. At the critical 100, 200 and 500 m levels, the product has lower errors than all reference datasets, with smaller and more stable residual biases. In contrast, several existing products exhibit persistent systematic biases in this depth range, potentially compromising the representation of OMZs and steep vertical gradients. This suite of high-resolution, broad-coverage DO fields provides a reliable foundation for the integrated assessment of global deoxygenation trends and their underlying physical-biogeochemical drivers.</p>
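      <p>The error statistics discussed above can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function name <monospace>validation_metrics</monospace> is hypothetical, the bias term is included only for convenience, and the coefficient of determination is used for the R-squared score, which is an assumption because the text does not state which definition was applied.</p>
      <preformat preformat-type="code">
```python
import math


def validation_metrics(pred, obs):
    """Return RMSE, MAE, mean bias and R^2 for paired predictions and observations."""
    n = len(obs)
    resid = [p - o for p, o in zip(pred, obs)]  # prediction minus observation
    bias = sum(resid) / n
    mae = sum(abs(r) for r in resid) / n
    ss_res = sum(r * r for r in resid)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(obs) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    # Assumption: R^2 is the coefficient of determination, 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "bias": bias, "r2": r2}


# Hypothetical DO values in umol/kg, for illustration only.
metrics = validation_metrics(pred=[110.0, 190.0, 300.0], obs=[100.0, 200.0, 300.0])
print(metrics)
```
      </preformat>
      <p>In practice such statistics would be computed separately for each product, depth level, and period, so that the early and modern intervals can be compared on equal terms.</p>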
</sec>
<sec id="Ch1.S4.SS5">
  <label>4.5</label><title>Comparison with WOA23</title>
      <p id="d2e4039">To evaluate the large-scale credibility and multi-year stability of our reconstructed dissolved oxygen (DO) fields, we compare our product's annual-mean DO climatology with WOA23 and examine vertical structure using representative profiles from three major basins (Fig. 8).</p>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e4044">Climatological comparison between WOA23 and our product (GEOXYGEN). Row 1 shows the global distribution of DO averaged over 0–300 m. Colored dashed lines mark the locations of three sections: 65° E (red), 180° W (purple), and 30° W (green). Rows 2–4 show DO cross-sections along these three transects.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f08.png"/>

        </fig>

      <p id="d2e4053">In the upper ocean (0–300 m, depth-averaged), both products capture consistent basin-scale spatial patterns. The subtropical gyres exhibit relatively high DO concentrations, while the equatorial region and eastern boundary upwelling systems form distinct oxygen-deficient belts. The spatial extent and location of these structures are in close agreement between the two climatologies. GEOXYGEN reproduces these banded structures continuously, with cross-frontal gradients in transition zones closely matching those in WOA23.</p>
      <p id="d2e4057">In the vertical, our product reproduces the largest hypoxic zone in the climatology, which lies mainly at intermediate depths of 300 to 1300 m. Meridional sections show that the model reproduces the core depths and spatial extents of the major hypoxic zones in the three primary ocean basins. The close alignment of oxygen isopleths and the realistic steepness of the transition layer indicate that the reconstruction is consistent with the climatological constraints on DO.</p>
      <p id="d2e4060">Unlike the 1° grid used in most global products, GEOXYGEN uses a finer 0.5° horizontal resolution to capture more detailed features. By training on localized relationships between DO and hydrographic analyses within discrete basin provinces, the model resolves the shape of OMZs more clearly. Although reconstruction fidelity still depends on the quality of the input observations, benchmarking against 1° products indicates that the 0.5° configuration yields systematically lower errors in high-gradient regimes. The 0.5° grid is therefore a compromise resolution that recovers information from the underlying biogeochemical constraints without over-interpreting sparsely sampled intervals.</p>
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Discussion</title>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Uncertainty Analysis</title>
      <p id="d2e4080">To quantify the credibility of the GEOXYGEN product, we decompose total uncertainty into two components: observation-related uncertainty and mapping uncertainty. Observation-related uncertainty summarizes measurement error and the representativeness error introduced when observations are aggregated to the cell–month–depth scale. Mapping uncertainty describes prediction error from the machine-learning mapping between environmental predictors and DO. This decomposition separates label-side and model-side contributions. It also supports diagnosing elevated uncertainty in coastal regions and potential bias in earlier, data-sparse periods, and it improves traceability and clarity for product use.</p>
      <p id="d2e4083">Measurement error depends on the measurement technique and instrument. For Winkler titration, bottle measurements can include random errors on the order of 1 <inline-formula><mml:math id="M160" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> or smaller (Carpenter, 1965). Because observing platforms differ in sensor type, calibration strategy, and QC level, we assign a platform-specific measurement error scale <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">meas</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to each data source. Bottle-based sources (CCHDO Bottle, GLODAP, and GEOTRACES IDP) are assigned 1 <inline-formula><mml:math id="M163" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>. Because Argo data are bias corrected, Argo is assigned 1.5 <inline-formula><mml:math id="M165" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>, the same value as OSD and CTD and CCHDO CTD. OceanSITES is assigned 2.0 <inline-formula><mml:math id="M167" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>.</p>
      <p id="d2e4178">When forming supervised-learning labels at the cell–month–depth scale, observation-related uncertainty is the combined contribution of measurement error, vertical mapping (interpolation) uncertainty, and representativeness error, as defined in Eq. (10).

            <disp-formula id="Ch1.E10" content-type="numbered"><label>10</label><mml:math id="M169" display="block"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">meas</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">interp</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">rep</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt></mml:mrow></mml:math></disp-formula>

          Here, <inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">interp</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the vertical mapping (interpolation) uncertainty introduced when a profile is mapped to standard levels, and <inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">rep</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the representativeness error estimated from within-unit dispersion. Their definitions and computation are described earlier.</p>
      <p id="d2e4243">Mapping uncertainty characterizes error generated during the mapping from environmental fields to DO. It reflects the combined effects of model structure, representativeness of training samples, and nonstationarity across regions and depth levels. On the independent test set, we define the residual following Eq. (11):

            <disp-formula id="Ch1.E11" content-type="numbered"><label>11</label><mml:math id="M172" display="block"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M173" display="inline"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula> is the model prediction and <inline-formula><mml:math id="M174" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> is the observed label for the corresponding test sample. Following Ito et al. (2024a), we estimate the mapping uncertainty at each grid cell as the square root of the second central moment of the test residuals, as defined in Eq. (12).

            <disp-formula id="Ch1.E12" content-type="numbered"><label>12</label><mml:math id="M175" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">map</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mover accent="true"><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:msup><mml:mfenced close=")" open="("><mml:mover accent="true"><mml:mi>r</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:math></disp-formula>

          Here, the overbar denotes the mean over all test samples within the same grid cell, so the first term under the root is the mean squared residual and the second is the squared mean residual. This definition estimates the residual standard deviation at the grid cell scale. It separates the systematic component <inline-formula><mml:math id="M176" display="inline"><mml:mover accent="true"><mml:mi>r</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> from the random error component captured by <inline-formula><mml:math id="M177" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">map</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
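      <p>A minimal sketch of Eqs. (11) and (12) is given here, assuming the test samples arrive as flat lists tagged with a grid-cell identifier. The function name <monospace>mapping_uncertainty</monospace>, the cell labels, and the example values are hypothetical and serve only to illustrate the per-cell calculation.</p>
      <preformat preformat-type="code">
```python
import math
from collections import defaultdict


def mapping_uncertainty(cell_ids, predictions, labels):
    """Return {cell: (mean residual, sigma_map)} from test residuals per grid cell."""
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum of r, sum of r^2
    for cell, y_hat, y in zip(cell_ids, predictions, labels):
        r = y_hat - y  # Eq. (11): residual = prediction minus observed label
        acc = sums[cell]
        acc[0] += 1
        acc[1] += r
        acc[2] += r * r
    result = {}
    for cell, (n, s1, s2) in sums.items():
        mean_r = s1 / n  # systematic component of the residual
        # Eq. (12): mean of squares minus square of mean; clip tiny negative round-off.
        var_r = max(s2 / n - mean_r**2, 0.0)
        result[cell] = (mean_r, math.sqrt(var_r))
    return result


# Hypothetical test samples (umol/kg) in two grid cells labelled "A" and "B".
result = mapping_uncertainty(
    cell_ids=["A", "A", "A", "B"],
    predictions=[201.0, 199.0, 203.0, 150.0],
    labels=[200.0, 200.0, 200.0, 152.0],
)
print(result)
```
      </preformat>
      <p>A cell with a single test sample yields a spread of zero, so in practice a minimum sample count per cell would probably be needed before the estimate is meaningful.</p>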
      <p id="d2e4341">Based on the two components above, total uncertainty (<inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) is defined in Eq. (13).

            <disp-formula id="Ch1.E13" content-type="numbered"><label>13</label><mml:math id="M179" display="block"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mi mathvariant="normal">obs</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">map</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt></mml:mrow></mml:math></disp-formula></p>
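      <p>The quadrature sums in Eqs. (10) and (13) can be illustrated with a short sketch. The measurement error scales are those quoted above; the dictionary keys, the separate listing of OSD and CTD, the function names, and the example interpolation, representativeness, and mapping values are assumptions made only for illustration.</p>
      <preformat preformat-type="code">
```python
import math

# Measurement error scales (umol/kg) quoted in the text. The keys are
# illustrative labels, and listing OSD and CTD separately is an assumption.
SIGMA_MEAS = {
    "CCHDO Bottle": 1.0,
    "GLODAP": 1.0,
    "GEOTRACES IDP": 1.0,
    "Argo": 1.5,
    "OSD": 1.5,
    "CTD": 1.5,
    "CCHDO CTD": 1.5,
    "OceanSITES": 2.0,
}


def observation_uncertainty(sigma_meas, sigma_interp, sigma_rep):
    """Eq. (10): quadrature sum of the three observation-related terms."""
    return math.sqrt(sigma_meas**2 + sigma_interp**2 + sigma_rep**2)


def total_uncertainty(u_obs, sigma_map):
    """Eq. (13): quadrature sum of observation-related and mapping uncertainty."""
    return math.sqrt(u_obs**2 + sigma_map**2)


# Hypothetical cell: Argo label, 2.0 interpolation, 3.0 representativeness,
# and 4.0 mapping uncertainty (all in umol/kg).
u_obs = observation_uncertainty(SIGMA_MEAS["Argo"], sigma_interp=2.0, sigma_rep=3.0)
u_total = total_uncertainty(u_obs, sigma_map=4.0)
print(round(u_obs, 3), round(u_total, 3))
```
      </preformat>
      <p>Because the terms add in quadrature, the largest component dominates the total, which is one reason the decomposition helps identify whether the label side or the model side limits accuracy in a given region.</p>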

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e4387">The left panels show the mean uncertainty distributions across multiple standard depth layers (0–2000 m), including observational uncertainty <bold>(a)</bold>, mapping uncertainty <bold>(b)</bold>, and total uncertainty <bold>(c)</bold>. The right panels depict the nearshore spatial variability of total uncertainty <bold>(d)</bold>, its vertical (depth-dependent) structure <bold>(e)</bold>, and the decadal evolution of total uncertainty over the test years <bold>(f)</bold>.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f09.jpg"/>

        </fig>

      <p id="d2e4415">As shown in Fig. 9, observation-related uncertainty and mapping uncertainty have clearly different spatial distributions in GEOXYGEN. Observation-related uncertainty is relatively uniform across the open ocean. In contrast, mapping uncertainty has a structured regional pattern, with elevated error bands aligned with energetic, high-gradient regimes such as western boundary currents and upwelling systems. The highest uncertainty is observed in the Pacific (7.385 <inline-formula><mml:math id="M180" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>), followed by the Atlantic (6.148 <inline-formula><mml:math id="M182" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>), Arctic (4.439 <inline-formula><mml:math id="M184" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>), Indian (4.084 <inline-formula><mml:math id="M186" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>), and Southern Ocean (3.652 <inline-formula><mml:math id="M188" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>). The higher uncertainty in the Pacific and Atlantic Oceans is attributed primarily to the structural complexity and dynamic intensity of their circulation, as well as the coastal concentration of early observational data in these basins. 
The global-mean total uncertainty of 6.054 <inline-formula><mml:math id="M190" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> masks a pronounced contrast in shallow water: in nearshore and shelf regions where the bathymetric depth is shallower than <inline-formula><mml:math id="M192" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 200 m, total uncertainty rises to 12.917 <inline-formula><mml:math id="M193" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>, more than double the open-ocean baseline (5.970 <inline-formula><mml:math id="M195" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>). The bathymetry-binned diagnostic shows that uncertainty increases monotonically with decreasing bathymetric depth, indicating progressively reduced predictability toward the shallow, dynamically heterogeneous coastal ocean (Fig. 9d). This increase is driven predominantly by localized, high-frequency processes, including phytoplankton pulses, riverine influx, and tidal oscillation, which generate non-linear spatial gradients that limit the transferability of feature relationships learned from open-ocean data (Gilbert et al., 2010; Regier et al., 2023; Giomi et al., 2023; Liu et al., 2024). These localized dynamics also introduce temporal non-stationarity, which further reduces the regional transferability of learned DO–environment associations (Valera et al., 2020). For nearshore and semi-enclosed bay environments, we recommend using GEOXYGEN with caution and interpreting results at larger spatial aggregation to reduce sensitivity to local high-frequency variability.</p>
      <p id="d2e4587">The vertical structure of uncertainty reflects hydrographic stability and process coupling within the water column. As shown in Fig. 9e, total uncertainty peaks in the epipelagic layer and then decreases monotonically with depth down to 2000 m. This vertical variation is consistent with the change in model accuracy across depth layers. Relative to the epipelagic layer, the intermediate and deep layers provide stronger water-mass constraints and a more coherent oxygen field, so mapping uncertainty converges as the predictive relationship stabilizes.</p>
      <p id="d2e4590">Temporal trends in uncertainty serve as a diagnostic of the evolving global observing system. The progressive reduction in total uncertainty across the withheld test years (Fig. 9f) coincides with the expansion of the Argo float network, which transitioned ocean sensing from route-based ship surveys to a spatially distributed autonomous paradigm. This transition significantly improved the observational constraints in the Southern Hemisphere and remote ocean basins, effectively lowering the residual variance in model validation. While this decline highlights the structural evolution of sampling coverage, the resulting time series reflects the collective stability of the multi-decadal reconstruction rather than a year-specific local error estimate.</p>
      <p id="d2e4594">GEOXYGEN has higher uncertainty in coastal and shelf regions. For coastal applications, we therefore recommend using multi-grid-cell averages and focusing on monthly to seasonal or longer timescales. The native 0.5° grid should be used with caution near the coast and, wherever feasible, cross-validated against local observations or higher-resolution regional products.</p>
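      <p>One way to form the multi-grid-cell averages recommended above is an area-weighted block mean, sketched here under stated assumptions: the field is a nested list on a regular latitude-longitude grid, missing or land cells are marked with <monospace>None</monospace>, and cosine-of-latitude weights approximate cell area. The function name <monospace>block_average</monospace> and the example values are hypothetical, and the GEOXYGEN files may be organized differently.</p>
      <preformat preformat-type="code">
```python
import math


def block_average(field, lats, factor=2):
    """Area-weighted block mean of a 2-D lat-lon field given as nested lists.

    field[j][i] is the value at latitude lats[j]; None marks land or missing
    cells. Weights are cos(latitude), an approximation for regular grids.
    """
    n_lat, n_lon = len(field), len(field[0])
    coarse = []
    for j0 in range(0, n_lat - factor + 1, factor):
        row = []
        for i0 in range(0, n_lon - factor + 1, factor):
            num = den = 0.0
            for j in range(j0, j0 + factor):
                w = math.cos(math.radians(lats[j]))
                for i in range(i0, i0 + factor):
                    if field[j][i] is not None:
                        num += w * field[j][i]
                        den += w
            # A block with no valid cells stays missing.
            row.append(num / den if den else None)
        coarse.append(row)
    return coarse


# Hypothetical 2 x 2 patch of 0.5 degree cells (umol/kg) with one land cell.
lats = [0.25, 0.75]
field = [[200.0, 210.0], [220.0, None]]
print(block_average(field, lats, factor=2))
```
      </preformat>
      <p>With <monospace>factor=2</monospace> the 0.5° cells are merged into 1° blocks; a larger factor trades spatial detail for reduced sensitivity to local high-frequency variability.</p>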
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Impact of Removing Surface Predictors on Trends</title>
      <p id="d2e4605">A two-configuration sensitivity experiment was designed to test whether satellite-era sea-surface information alters the reconstructed dissolved oxygen (DO) signal. The full predictor configuration used the complete predictor suite, while the reduced predictor configuration excluded the sea-surface predictor group (<inline-formula><mml:math id="M197" display="inline"><mml:mi mathvariant="bold-italic">U</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M198" display="inline"><mml:mi mathvariant="bold-italic">V</mml:mi></mml:math></inline-formula>, SSH, EKE, MLD, PAR, Chl-<inline-formula><mml:math id="M199" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>, DIC, <inline-formula><mml:math id="M200" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<sub>2</sub>, pH, alkalinity, and CO<sub>2</sub> flux). All other parts of the reconstruction pipeline were kept the same, so differences between the two products isolate the effect of including this predictor group on the same grid and over the same period.</p>
      <p id="d2e4655">The comparison focuses on deseasonalized DO anomalies averaged over the 1–100 m depth range and further smoothed using a centered 12-month moving average. This metric targets the upper ocean, where sea-surface information is most likely to influence variability, while suppressing grid-scale noise that can obscure low-frequency signals. Predictor impact is quantified by the configuration-induced component, defined as Full <inline-formula><mml:math id="M203" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> Reduced, which isolates the incremental effect of the excluded predictor group in anomaly space.</p>
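      <p>The anomaly metric described above can be sketched as follows, assuming the inputs are already global-mean, depth-averaged (1–100 m) monthly series. The function names and the synthetic series are hypothetical. Because a 12-month window has no single central month, this sketch uses the common convention of half weights on the two end points, which is an assumption and may differ from the authors' implementation.</p>
      <preformat preformat-type="code">
```python
import math


def deseasonalize(series, months):
    """Subtract the mean of each calendar month (1..12) from a monthly series."""
    clim = {}
    for m in set(months):
        vals = [v for v, mm in zip(series, months) if mm == m]
        clim[m] = sum(vals) / len(vals)
    return [v - clim[m] for v, m in zip(series, months)]


def centered_moving_average(values, window=12):
    """Centered moving average for an even window; incomplete edges are None."""
    if window % 2:
        raise ValueError("this sketch only handles even window lengths")
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        core = sum(values[i - half + 1 : i + half])  # window - 1 full-weight points
        ends = 0.5 * (values[i - half] + values[i + half])  # two half-weight points
        out[i] = (core + ends) / window
    return out


def configuration_component(full, reduced, months):
    """Smoothed anomaly difference, Full minus Reduced."""
    full_s = centered_moving_average(deseasonalize(full, months))
    red_s = centered_moving_average(deseasonalize(reduced, months))
    return [None if f is None else f - r for f, r in zip(full_s, red_s)]


# Synthetic three-year example: seasonal cycle plus trend, constant offset between runs.
months = [(i % 12) + 1 for i in range(36)]
full = [200.0 + 5.0 * math.cos(2 * math.pi * (m - 1) / 12) + 0.1 * i
        for i, m in enumerate(months)]
reduced = [v - 0.5 for v in full]
diff = configuration_component(full, reduced, months)
```
      </preformat>
      <p>Note that a constant offset between the two configurations is removed along with each product's own monthly climatology, so the difference in anomaly space reflects changes in variability and trend rather than in the mean state.</p>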
      <p id="d2e4665">The full and reduced reconstructions exhibit strong agreement over 1960–2022 (Fig. 10), yielding a consistent depiction of upper-ocean (1–100 m) low-frequency variability: a sustained positive-anomaly regime through the 1970s–1980s followed by a marked decline after <inline-formula><mml:math id="M204" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1990 and a transition into a persistently negative-anomaly regime by <inline-formula><mml:math id="M205" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 2000 (relative to the monthly climatology). The Full–Reduced difference remains close to zero prior to the mid-1980s and stays small compared with the total anomaly amplitude thereafter, indicating that the global-mean, decadal-scale signal is only weakly sensitive to the inclusion of satellite-era surface predictors. As an external benchmark, ML4O2 reproduces comparable variability during <inline-formula><mml:math id="M206" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1965–2010 but shows more negative anomalies in the most recent decade, suggesting a stronger upper-ocean deoxygenation signal in anomaly space (relative to its monthly climatology) than GEOXYGEN over the same period; however, part of this divergence may arise from inter-product differences in baseline climatology, sampling, and reconstruction methodology. Overall, the close concordance between configurations supports the robustness of GEOXYGEN for decadal-scale assessments, with configuration-dependent deviations being minor relative to the dominant multi-decadal evolution.</p>

      <fig id="F10"><label>Figure 10</label><caption><p id="d2e4692">Depth-averaged (1–100 m) monthly dissolved-oxygen anomalies (1960–2022). The figure compares deseasonalized anomalies from the Full predictor reconstruction, the Reduced predictor reconstruction (excluding sea-surface predictors), and ML4O2, and overlays the difference between the Full and Reduced reconstructions (Full <inline-formula><mml:math id="M207" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> Reduced). Gray shading indicates 1985–1997, corresponding to the rapid expansion of satellite-derived sea-surface observations.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f10.png"/>

        </fig>

</sec>
<sec id="Ch1.S5.SS3">
  <label>5.3</label><title>Ship-Only Analysis of Long-Term Trends</title>
      <p id="d2e4716">To assess how strongly the pre-Argo reconstruction depends on modern autonomous observations, we conducted a ship-only sensitivity experiment that tests whether data from the densely sampled Argo era retrospectively influence earlier decades. Within the Southern Ocean (south of 45° S), we compared an “Argo-excluded” configuration, which uses only non-Argo historical records, against an “Argo-included” configuration over the 1960–2000 period. This region is a useful diagnostic domain because it has shifted from sparse historical sampling to comprehensive Argo coverage over the last two decades. Because both configurations use identical external physical analysis constraints, the experiment isolates the impact of Argo observations on the reconstructed climatological mean and vertical structure (Fig. 11).</p>

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e4721">Southern Ocean dissolved oxygen (DO) climatology for 1960–2000. <bold>(a)</bold> Argo-included climatological mean DO averaged over 1–100 m. <bold>(b)</bold> Same as <bold>(a)</bold>, but for the Argo-excluded configuration. <bold>(c)</bold> Difference map (Argo-included minus Argo-excluded) for the 1–100 m climatological mean DO. <bold>(d)</bold> Area-weighted mean vertical DO profiles south of 45° S from the Argo-included and Argo-excluded reconstructions; the upper <inline-formula><mml:math id="M208" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>-axis shows the corresponding profile difference (Argo-included minus Argo-excluded).</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f11.jpg"/>

        </fig>

      <p id="d2e4753">The upper-ocean (1–100 m) DO climatology shows nearly identical spatial structure in both configurations. The annular high-oxygen band and its position within the circumpolar system are essentially unchanged, implying that including Argo data does not systematically rearrange the large-scale structure. The difference map shows localized positive <inline-formula><mml:math id="M209" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>DO hotspots concentrated around the Antarctic margin, with enhanced amplitudes in several coastal and shelf sectors, whereas differences in the open-ocean circumpolar interior are barely discernible. Consistent with this pattern, the area-weighted mean vertical profiles nearly overlap at most depths (Fig. 11d), with only a subtle upper-ocean positive offset in the Argo-included configuration that remains below <inline-formula><mml:math id="M210" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 0.5 <inline-formula><mml:math id="M211" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>, indicating a modest amplitude adjustment rather than a depth-dependent reorganization. Together, these features suggest that the reconstructed vertical structure is set primarily by the physical constraints common to both configurations, while the denser modern sampling mainly fine-tunes regional magnitudes, which supports the structural integrity of the historical reconstruction.</p>
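      <p>As an illustration of the area-weighted profile comparison, the following Python sketch averages a gridded field south of 45° S using cosine-of-latitude weights, which are proportional to cell area on a regular latitude–longitude grid. The array layout (depth, latitude, longitude as nested lists), the use of <monospace>None</monospace> for missing cells, and the function names are assumptions made for this example and do not describe the GEOXYGEN implementation.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def area_weighted_profile(do, lats, lat_max=-45.0):
    """Area-weighted mean profile from do[depth][lat][lon], south of lat_max.

    Each cell is weighted by cos(latitude); None marks land or missing data.
    A layer with no valid cells yields None.
    """
    profile = []
    for layer in do:
        num = 0.0
        den = 0.0
        for lat, row in zip(lats, layer):
            if lat > lat_max:
                continue  # keep only rows at or south of the cutoff
            weight = math.cos(math.radians(lat))
            for value in row:
                if value is not None:
                    num += weight * value
                    den += weight
        profile.append(num / den if den > 0.0 else None)
    return profile


def profile_difference(included, excluded):
    """Argo-included minus Argo-excluded, layer by layer."""
    return [
        None if a is None or b is None else a - b
        for a, b in zip(included, excluded)
    ]
```
]]></preformat>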
</sec>
<sec id="Ch1.S5.SS4">
  <label>5.4</label><title>Boundary Fusion Effects</title>
      <p id="d2e4798">While basin-based modeling represents regional environmental heterogeneity, it often induces “step-effect” discontinuities at basin boundaries, resulting in unrealistic shifts in long-term trends and monthly variability at adjacent grid points. To address these boundary artifacts, we implemented a fusion method to smooth inter-basin transitions. We validated this approach by constructing boundary diagnostic samples on a 0.5° by 0.5° grid, specifically selecting adjacent points across basin interfaces.</p>
      <p id="d2e4801">The diagnostic experiment targeted the 100 m depth layer, which is characterized by high spatial complexity and hydrographic sensitivity. Using a global non-partitioned model as a continuous baseline, we quantified discontinuity magnitudes via the difference in trend (<inline-formula><mml:math id="M213" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>trend) and in standard deviation (<inline-formula><mml:math id="M214" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>SD) between adjacent cells. Comparing the fused and unfused basin-specific outputs against this reference isolates the consistency gained from fusion.</p>
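      <p>The two diagnostics can be sketched in Python for a single pair of adjacent cells. This example assumes that the trend is an ordinary least-squares slope per time step, that the standard deviation is the sample standard deviation of the monthly series, and that both differences are reported as absolute values; the text does not state these details, and the function names are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from statistics import fmean, stdev


def linear_trend(series):
    """Ordinary least-squares slope of the series per time step."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = fmean(series)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var


def step_diagnostics(cell_a, cell_b):
    """Return (delta_trend, delta_sd) for two adjacent boundary cells.

    Both values are absolute differences (an assumption of this sketch).
    """
    delta_trend = abs(linear_trend(cell_a) - linear_trend(cell_b))
    delta_sd = abs(stdev(cell_a) - stdev(cell_b))
    return delta_trend, delta_sd
```
]]></preformat>
      <p>Applying <monospace>step_diagnostics</monospace> to each cross-boundary pair in the unfused, fused, and global non-partitioned outputs would give three sets of values that can be compared pair by pair.</p>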

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e4820">Spatial distribution of the “step-effect” before and after boundary fusion at the 100 m depth layer. <bold>(a)</bold> Compares the trend differences (<inline-formula><mml:math id="M215" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>trend), and <bold>(b)</bold> compares the standard deviation differences (<inline-formula><mml:math id="M216" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>SD) across adjacent grid points at the basin boundaries.</p></caption>
          <graphic xlink:href="https://essd.copernicus.org/articles/18/3125/2026/essd-18-3125-2026-f12.png"/>

        </fig>

      <p id="d2e4850">Results reveal that the boundary fusion protocol mitigates abrupt statistical shifts, decreasing local biases in the partitioned map (Fig. 12). Improvements in <inline-formula><mml:math id="M217" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>trend and <inline-formula><mml:math id="M218" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>SD demonstrate that the smoothing operator suppresses interface artifacts. Nevertheless, small discrepancies relative to the global non-partitioned baseline persist, indicating that independently trained regional submodels may retain slightly different statistical mappings that are only partially reduced by simple fusion.</p>
      <p id="d2e4867">Boundary fusion thus serves as an effective strategy for mitigating partition-induced “step-effect” artifacts, yielding a more coherent global spatial structure. While localized gradients remain in high-contrast hydrographic regions, their reduced magnitude indicates a meaningful alleviation of boundary inconsistencies. Future enhancements could involve adaptive fusion schemes or multi-task learning to explicitly share data across regional submodels while preserving specialization.</p>
</sec>
</sec>
<sec id="Ch1.S6">
  <label>6</label><title>Data availability</title>
      <p id="d2e4879">The GEOXYGEN dataset produced in this study is available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.19703198" ext-link-type="DOI">10.5281/zenodo.19703198</ext-link> (Wang et al., 2026a) and <ext-link xlink:href="https://doi.org/10.12157/IOCAS.20260223.002" ext-link-type="DOI">10.12157/IOCAS.20260223.002</ext-link> (Wang et al., 2026b). Because uncertainty in nearshore regions is substantially higher than in the open ocean, we recommend that coastal and shelf applications rely on spatially aggregated and temporally averaged fields rather than on individual native-grid cells.</p>
</sec>
<sec id="Ch1.S7">
  <label>7</label><title>Code availability</title>
      <p id="d2e4896">The code used to train the models and generate the data product in this paper is openly available from GitHub at <uri>https://github.com/layne1202/GEOXYGEN_Code</uri> (last access: 24 February 2026) and archived on Zenodo at <ext-link xlink:href="https://doi.org/10.5281/zenodo.19852901" ext-link-type="DOI">10.5281/zenodo.19852901</ext-link> (Wang, 2026). The repository also documents the key training and evaluation settings (e.g., fixed random seeds) and provides the environment/dependency information needed to reproduce the experiments.</p>
</sec>
<sec id="Ch1.S8" sec-type="conclusions">
  <label>8</label><title>Conclusions</title>
      <p id="d2e4913">By combining multi-source physical and biogeochemical predictors with an adaptive feature-selection strategy, we constructed a hierarchical modeling framework that accounts for underlying biogeochemical controls. Using this framework, we produced GEOXYGEN – a monthly, four-dimensional global ocean dissolved oxygen (DO) product at 0.5° <inline-formula><mml:math id="M219" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5° resolution spanning 1960–2024. This product is intended to help address some of the long-standing challenges posed by sparse observations and strong spatiotemporal heterogeneity in historical DO records. Evaluated on an independent out-of-time test set composed of withheld years, the reconstruction demonstrates consistently high skill in all depth layers (<inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.9</mml:mn></mml:mrow></mml:math></inline-formula>), supporting its robustness under conservative validation protocols. The resulting high-resolution product offers higher aggregate fidelity than existing 1° products and captures the fine-scale spatial variance essential for understanding contemporary ocean change. The main strengths of this study are as follows:</p>
      <p id="d2e4938"><italic>Heterogeneity-aware partitioned modeling.</italic> By combining vertical stratification with basin provincialization, we train CatBoost regressors within each basin–depth unit. This design directly addresses the limitations of single global ML models, which struggle to represent the spatially varying physical–biogeochemical controls on DO. In combination with adaptive feature selection, inverse-density weighting, year-grouped cross-validation, and cross-boundary fusion, the framework enhances robustness in undersampled regions, minimizes temporal leakage and boundary artefacts, and yields parsimonious, interpretable submodels. The same hierarchical strategy can be readily applied to other biogeochemical tracers or observing systems that exhibit strong regional and vertical heterogeneity.</p>
      <p id="d2e4943"><italic>Adaptive multi-source feature selection.</italic> Starting from a rich set of physical, biogeochemical, and spatiotemporal predictors, we employ a two-stage feature-selection procedure within each basin–depth unit to retain only variables that add independent skill. This adaptive, region- and depth-aware integration of multi-source environmental predictors strengthens the representation of upper-ocean processes and water-mass transitions while suppressing noise and redundancy, providing physically interpretable feature sets.</p>
      <p id="d2e4948"><italic>A physically consistent, long-record, high-resolution product.</italic> GEOXYGEN delivers global monthly DO fields from 1960 to 2024 on a consistent 0.5° <inline-formula><mml:math id="M221" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5° horizontal grid, spanning depths from the surface to 5500 m. Its high performance, climate consistency, and the robustness of the reconstructed trends enable reliable estimates of deoxygenation trends and decadal variations, providing a rigorous benchmark for evaluating and constraining Earth system models.</p>
      <p id="d2e4961">Nonetheless, GEOXYGEN has limitations related to observational coverage and methodological assumptions. Uncertainty is generally higher in early decades, ice-covered high latitudes, data-sparse deep basins, and nearshore regions. Future work will focus on refining the framework to further reduce uncertainty in these challenging regions and to improve its ability to represent time-varying biogeochemical regimes. Despite these limitations, GEOXYGEN provides a long-term and internally consistent basis for assessing multi-decadal changes in ocean dissolved oxygen. The GEOXYGEN dataset and basin-partition files are available at <ext-link xlink:href="https://doi.org/10.12157/IOCAS.20260223.002" ext-link-type="DOI">10.12157/IOCAS.20260223.002</ext-link> (Wang et al., 2026b).</p>
</sec>

      
      </body>
    <back><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e4971">Z.W. and W.F. conceived the project. Z.W. collected and processed the data with contributions from C.X.; Z.W. carried out the study and generated the GEOXYGEN product with contributions from W.F.; Z.W. and W.F. wrote the manuscript with contributions from C.X. and G.W. All authors discussed the results and commented on the manuscript.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e4977">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e4983">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e4990">The OceanSITES data were collected and made freely available by the international OceanSITES project and the national programs that contribute to it.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e4995">This research has been supported by the Natural Science Foundation of Shanghai Municipality (grant no. 24ZR1404500), the Science and Technology Commission of Shanghai Municipality (STCSM; grant no. 25DZ3102200), and the National Natural Science Foundation of China (grant no. 42476011).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e5001">This paper was edited by Xingchen (Tony) Wang and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Bopp, L., Resplandy, L., Orr, J. C., Doney, S. C., Dunne, J. P., Gehlen, M., Halloran, P., Heinze, C., Ilyina, T., Séférian, R., Tjiputra, J., and Vichi, M.: Multiple stressors of ocean ecosystems in the 21st century: projections with CMIP5 models, Biogeosciences, 10, 6225–6245, <ext-link xlink:href="https://doi.org/10.5194/bg-10-6225-2013" ext-link-type="DOI">10.5194/bg-10-6225-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Breitburg, D., Levin, L. A., Oschlies, A., Gregoire, M., Chavez, F. P., Conley, D. J., Garcon, V., Gilbert, D., Gutierrez, D., Isensee, K., Jacinto, G. S., Limburg, K. E., Montes, I., Naqvi, S. W. A., Pitcher, G. C., Rabalais, N. N., Roman, M. R., Rose, K. A., Seibel, B. A., Telszewski, M., Yasuhara, M., and Zhang, J.: Declining oxygen in the global ocean and coastal waters, Science, 359, eaam7240, <ext-link xlink:href="https://doi.org/10.1126/science.aam7240" ext-link-type="DOI">10.1126/science.aam7240</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Cao, R., Wang, S., Bao, S., Li, X., Tan, J., and Shao, C.: SE-LeNet: A data reconstruction method for dissolved oxygen in tropical Pacific with deep learning, in: Proc. 2024 IEEE Int. Conf. Parallel Distrib. Process. Appl. (ISPA), IEEE, <ext-link xlink:href="https://doi.org/10.1109/ISPA63168.2024.00031" ext-link-type="DOI">10.1109/ISPA63168.2024.00031</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Carpenter, J. H.: The accuracy of the Winkler method for dissolved oxygen analysis, Limnol. Oceanogr., 10, 135–140, <ext-link xlink:href="https://doi.org/10.4319/lo.1965.10.1.0135" ext-link-type="DOI">10.4319/lo.1965.10.1.0135</ext-link>, 1965.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Chau, T. T. T., Gehlen, M., and Chevallier, F.: A seamless ensemble-based reconstruction of surface ocean <inline-formula><mml:math id="M222" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<sub>2</sub> and air–sea CO<sub>2</sub> fluxes over the global coastal and open oceans, Biogeosciences, 19, 1087–1109, <ext-link xlink:href="https://doi.org/10.5194/bg-19-1087-2022" ext-link-type="DOI">10.5194/bg-19-1087-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Chau, T.-T.-T., Gehlen, M., Metzl, N., and Chevallier, F.: CMEMS-LSCE: a global, 0.25°, monthly reconstruction of the surface ocean carbonate system, Earth Syst. Sci. Data, 16, 121–160, <ext-link xlink:href="https://doi.org/10.5194/essd-16-121-2024" ext-link-type="DOI">10.5194/essd-16-121-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Cheng, L. and Gouretski, V.: IAP Global Ocean Oxygen gridded product (1-degree), CASODC [data set], <ext-link xlink:href="https://doi.org/10.12157/IOCAS.20231214.006" ext-link-type="DOI">10.12157/IOCAS.20231214.006</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Chen, Z., Siedlecki, S., Long, M., Petrik, C. M., Stock, C. A., and Deutsch, C. A.: Skillful multiyear prediction of marine habitat shifts jointly constrained by ocean temperature and dissolved oxygen, Nat. Commun., 15, 900, <ext-link xlink:href="https://doi.org/10.1038/s41467-024-45016-5" ext-link-type="DOI">10.1038/s41467-024-45016-5</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Cocco, V., Joos, F., Steinacher, M., Frölicher, T. L., Bopp, L., Dunne, J., Gehlen, M., Heinze, C., Orr, J., Oschlies, A., Schneider, B., Segschneider, J., and Tjiputra, J.: Oxygen and indicators of stress for marine life in multi-model global warming projections, Biogeosciences, 10, 1849–1868, <ext-link xlink:href="https://doi.org/10.5194/bg-10-1849-2013" ext-link-type="DOI">10.5194/bg-10-1849-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Franco, A. C., Hernández-Ayón, J. M., Beier, E., Garçon, V., Maske, H., Paulmier, A., Färber-Lorda, J., Castro, R., and Sosa-Ávalos, R.: Air–sea CO<sub>2</sub> fluxes above the stratified oxygen minimum zone in the coastal region off Mexico, J. Geophys. Res.-Oceans, 119, 2923–2937, <ext-link xlink:href="https://doi.org/10.1002/2013JC009337" ext-link-type="DOI">10.1002/2013JC009337</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Garabaghi, F. H., Benzer, S., and Benzer, R.: Modeling dissolved oxygen concentration using machine learning techniques with dimensionality reduction approach, Environ. Monit. Assess., 195, 879, <ext-link xlink:href="https://doi.org/10.1007/s10661-023-11492-3" ext-link-type="DOI">10.1007/s10661-023-11492-3</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Garcia, H., Cruzado, A., Gordon, L., and Escanez, J.: Decadal-scale chemical variability in the subtropical North Atlantic deduced from nutrient and oxygen data, J. Geophys. Res.-Oceans, 103, 2817–2830, <ext-link xlink:href="https://doi.org/10.1029/97JC03037" ext-link-type="DOI">10.1029/97JC03037</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Garcia, H. E., Wang, Z., Bouchard, C., Cross, S. L., Paver, C. R., Reagan, J. R., Boyer, T. P., Locarnini, R. A., Mishonov, A. V., Baranova, O., Seidov, D., and Dukhovskoy, D.: World Ocean Atlas 2023, Volume 3: Dissolved Oxygen, Apparent Oxygen Utilization, and Oxygen Saturation, edited by: Mishonov, A., NOAA Atlas NESDIS 91, 109 pp., <ext-link xlink:href="https://doi.org/10.25923/rb67-ns53" ext-link-type="DOI">10.25923/rb67-ns53</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Gilbert, D., Rabalais, N. N., Díaz, R. J., and Zhang, J.: Evidence for greater oxygen decline rates in the coastal ocean than in the open ocean, Biogeosciences, 7, 2283–2296, <ext-link xlink:href="https://doi.org/10.5194/bg-7-2283-2010" ext-link-type="DOI">10.5194/bg-7-2283-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Giomi, F., Barausse, A., Steckbauer, A., Daffonchio, D., Duarte, C. M., and Fusi, M.: Oxygen dynamics in marine productive ecosystems at ecologically relevant scales, Nat. Geosci., 16, 560–566, <ext-link xlink:href="https://doi.org/10.1038/s41561-023-01217-z" ext-link-type="DOI">10.1038/s41561-023-01217-z</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Gong, H., Li, C., and Zhou, Y.: Emerging global ocean deoxygenation across the 21st century, Geophys. Res. Lett., 48, e2021GL095370, <ext-link xlink:href="https://doi.org/10.1029/2021GL095370" ext-link-type="DOI">10.1029/2021GL095370</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Gouretski, V., Cheng, L., Du, J., Xing, X., Chai, F., and Tan, Z.: A consistent ocean oxygen profile dataset with new quality control and bias assessment, Earth Syst. Sci. Data, 16, 5503–5530, <ext-link xlink:href="https://doi.org/10.5194/essd-16-5503-2024" ext-link-type="DOI">10.5194/essd-16-5503-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Gregoire, M., Garcon, V., Garcia, H., Breitburg, D., Isensee, K., Oschlies, A., Telszewski, M., Barth, A., Bittig, H. C., Carstensen, J., Carval, T., Chai, F., Chavez, F., Conley, D., Coppola, L., Crowe, S., Currie, K., Dai, M., Deflandre, B., Dewitte, B., Diaz, R., Garcia-Robledo, E., Gilbert, D., Giorgetti, A., Glud, R., Gutierrez, D., Hosoda, S., Ishii, M., Jacinto, G., Langdon, C., Lauvset, S. K., Levin, L. A., Limburg, K. E., Mehrtens, H., Montes, I., Naqvi, W., Paulmier, A., Pfeil, B., Pitcher, G., Pouliquen, S., Rabalais, N., Rabouille, C., Recape, V., Roman, M., Rose, K., Rudnick, D., Rummer, J., Schmechtig, C., Schmidtko, S., Seibel, B., Slomp, C., Sumaila, U. R., Tanhua, T., Thierry, V., Uchida, H., Wanninkhof, R., and Yasuhara, M.: A global ocean oxygen database and atlas for assessing and predicting deoxygenation and ocean health in the open and coastal ocean, Front. Mar. Sci., 8, <ext-link xlink:href="https://doi.org/10.3389/fmars.2021.724913" ext-link-type="DOI">10.3389/fmars.2021.724913</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>Grégoire, M., Oschlies, A., Canfield, D., Castro, C., Ciglenecki, I., Croot, P., Salin, K., Schneider, B., Serret, P., and Slomp, C.: Ocean Oxygen: the role of the Ocean in the oxygen we breathe and the threat of deoxygenation, European Marine Board, Ostend, Belgium, Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.7941157" ext-link-type="DOI">10.5281/zenodo.7941157</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Guinehut, S., Dhomps, A.-L., Larnicol, G., and Le Traon, P.-Y.: High resolution 3-D temperature and salinity fields derived from in situ and satellite observations, Ocean Sci., 8, 845–857, <ext-link xlink:href="https://doi.org/10.5194/os-8-845-2012" ext-link-type="DOI">10.5194/os-8-845-2012</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>Hauser, D., Tourain, C., Hermozo, L., Alraddawi, D., Aouf, L., Chapron, B., Dalphinet, A., Delaye, L., Dalila, M., and Dormy, E.: New observations from the SWIM radar on-board CFOSAT: Instrument validation and ocean wave measurement assessment, IEEE T. Geosci. Remote Sens., 59, 5–26, <ext-link xlink:href="https://doi.org/10.1109/TGRS.2020.2994372" ext-link-type="DOI">10.1109/TGRS.2020.2994372</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Hollitzer, H. A. L., Patara, L., Terhaar, J., and Oschlies, A.: Competing effects of wind and buoyancy forcing on ocean oxygen trends in recent decades, Nat. Commun., 15, <ext-link xlink:href="https://doi.org/10.1038/s41467-024-53557-y" ext-link-type="DOI">10.1038/s41467-024-53557-y</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Huang, S., Shao, J., Chen, Y., Qi, J., Wu, S., Zhang, F., He, X., and Du, Z.: Reconstruction of dissolved oxygen in the Indian Ocean from 1980 to 2019 based on machine learning techniques, Front. Mar. Sci., 10, 1291232, <ext-link xlink:href="https://doi.org/10.3389/fmars.2023.1291232" ext-link-type="DOI">10.3389/fmars.2023.1291232</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>Humphries, N. E., Fuller, D. W., Schaefer, K. M., and Sims, D. W.: Highly active fish in low oxygen environments: Vertical movements and behavioural responses of bigeye and yellowfin tunas to oxygen minimum zones in the eastern Pacific Ocean, Mar. Biol., 171, 55, <ext-link xlink:href="https://doi.org/10.1007/s00227-023-04366-2" ext-link-type="DOI">10.1007/s00227-023-04366-2</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>IOC, SCOR, and IAPSO: The International thermodynamic equation of seawater – 2010: calculation and use of thermodynamic properties [includes corrections up to 31st October 2015], Intergovernmental Oceanographic Commission Manuals and Guides, 56, UNESCO, Paris, France, 196 pp., <ext-link xlink:href="https://doi.org/10.25607/OBP-1338" ext-link-type="DOI">10.25607/OBP-1338</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Ito, T., Minobe, S., Long, M. C., and Deutsch, C.: Upper ocean O<sub>2</sub> trends: 1958–2015, Geophys. Res. Lett., 44, 4214–4223, <ext-link xlink:href="https://doi.org/10.1002/2017GL073613" ext-link-type="DOI">10.1002/2017GL073613</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Ito, T., Cervania, A., Cross, K., Ainchwar, S., and Delawalla, S.: Mapping dissolved oxygen concentrations by combining shipboard and Argo observations using machine learning algorithms, J. Geophys. Res.: Mach. Learn. Comput., 1, e2024JH000272, <ext-link xlink:href="https://doi.org/10.1029/2024JH000272" ext-link-type="DOI">10.1029/2024JH000272</ext-link>, 2024a.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>Ito, T., Garcia, H. E., Wang, Z., Minobe, S., Long, M. C., Cebrian, J., Reagan, J., Boyer, T., Paver, C., Bouchard, C., Takano, Y., Bushinsky, S., Cervania, A., and Deutsch, C. A.: Underestimation of multi-decadal global O<sub>2</sub> loss due to an optimal interpolation method, Biogeosciences, 21, 747–759, <ext-link xlink:href="https://doi.org/10.5194/bg-21-747-2024" ext-link-type="DOI">10.5194/bg-21-747-2024</ext-link>, 2024b.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>Kim, H., Franco, A. C., and Sumaila, U. R.: A selected review of impacts of ocean deoxygenation on fish and fisheries, Fishes, 8, <ext-link xlink:href="https://doi.org/10.3390/fishes8060316" ext-link-type="DOI">10.3390/fishes8060316</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>Kolodziejczyk, N., Prigent-Mazella, A., and Gaillard, F.: ISAS temperature, salinity, dissolved oxygen gridded fields, SEANOE [data set], <ext-link xlink:href="https://doi.org/10.17882/52367" ext-link-type="DOI">10.17882/52367</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>Li, C., Huang, J., Ding, L., Liu, X., Yu, H., and Huang, J.: Increasing escape of oxygen from oceans under climate change, Geophys. Res. Lett., 47, e2019GL086345, <ext-link xlink:href="https://doi.org/10.1029/2019GL086345" ext-link-type="DOI">10.1029/2019GL086345</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>Liu, G., Yu, X., Zhang, J., Wang, X., Xu, N., and Ali, S.: Reconstruction of the three-dimensional dissolved oxygen and its spatio-temporal variations in the Mediterranean Sea using machine learning, J. Environ. Sci., 157, 710–728, <ext-link xlink:href="https://doi.org/10.1016/j.jes.2025.01.010" ext-link-type="DOI">10.1016/j.jes.2025.01.010</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><mixed-citation>Liu, Q., Liu, C., Meng, Q., Su, B., Ye, H., Chen, B., Li, W., Cao, X., Nie, W., and Ma, N.: Machine learning reveals biological activities as the dominant factor in controlling deoxygenation in the South Yellow Sea, Cont. Shelf Res., 283, 105348, <ext-link xlink:href="https://doi.org/10.1016/j.csr.2024.105348" ext-link-type="DOI">10.1016/j.csr.2024.105348</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><mixed-citation>Lu, B., Zhao, Z., Han, L., Gan, X., Zhou, Y., Zhou, L., Fu, L., Wang, X., Zhou, C., and Zhang, J.: Oxygenerator: Reconstructing global ocean deoxygenation over a century with deep learning, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2405.07233" ext-link-type="DOI">10.48550/arXiv.2405.07233</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><mixed-citation>Ma, D., Zhao, F., Zhu, L., Li, X., Wei, J., Chen, X., Hou, L., Li, Y., and Liu, M.: Deep learning reveals hotspots of global oceanic oxygen changes from 2003 to 2020, Int. J. Appl. Earth Obs. Geoinf., 136, 104363, <ext-link xlink:href="https://doi.org/10.1016/j.jag.2025.104363" ext-link-type="DOI">10.1016/j.jag.2025.104363</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><mixed-citation>Mears, C., Lee, T., Ricciardulli, L., Wang, X., and Wentz, F.: Improving the Accuracy of the Cross-Calibrated Multi-Platform (CCMP) Ocean Vector Winds, Remote Sens., 14, 4230, <ext-link xlink:href="https://doi.org/10.3390/rs14174230" ext-link-type="DOI">10.3390/rs14174230</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><mixed-citation>Milà, C., Ludwig, M., Pebesma, E., Tonne, C., and Meyer, H.: Random forests with spatial proxies for environmental modelling: opportunities and pitfalls, Geosci. Model Dev., 17, 6007–6033, <ext-link xlink:href="https://doi.org/10.5194/gmd-17-6007-2024" ext-link-type="DOI">10.5194/gmd-17-6007-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><mixed-citation>NASA Ocean Biology Processing Group: Sea-viewing Wide Field-of-view Sensor (SeaWiFS) Level-2 Ocean Color Data, version R2018.8, NASA Ocean Biology Distributed Active Archive Center [data set], <ext-link xlink:href="https://doi.org/10.5067/ORBVIEW-2/SEAWIFS/L2/OC/2018" ext-link-type="DOI">10.5067/ORBVIEW-2/SEAWIFS/L2/OC/2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><mixed-citation>Oschlies, A.: A committed fourfold increase in ocean oxygen loss, Nat. Commun., 12, <ext-link xlink:href="https://doi.org/10.1038/s41467-021-22584-4" ext-link-type="DOI">10.1038/s41467-021-22584-4</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><mixed-citation>Oschlies, A., Brandt, P., Stramma, L., and Schmidtko, S.: Drivers and mechanisms of ocean deoxygenation, Nat. Geosci., 11, 467–473, <ext-link xlink:href="https://doi.org/10.1038/s41561-018-0152-2" ext-link-type="DOI">10.1038/s41561-018-0152-2</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><mixed-citation>Ping, B., Meng, Y., Su, F., Xue, C., and Li, Z.: Retrieval of subsurface dissolved oxygen from surface oceanic parameters based on machine learning, Mar. Environ. Res., 199, 106578, <ext-link xlink:href="https://doi.org/10.1016/j.marenvres.2024.106578" ext-link-type="DOI">10.1016/j.marenvres.2024.106578</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><mixed-citation>Regier, P. J., Ward, N. D., Myers-Pigg, A. N., Grate, J., Freeman, M. J., and Ghosh, R. N.: Seasonal drivers of dissolved oxygen across a tidal creek–marsh interface revealed by machine learning, Limnol. Oceanogr., 68, 2359–2374, <ext-link xlink:href="https://doi.org/10.1002/lno.12426" ext-link-type="DOI">10.1002/lno.12426</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><mixed-citation>Robinson, C.: Microbial respiration, the engine of ocean deoxygenation, Front. Mar. Sci., 5, <ext-link xlink:href="https://doi.org/10.3389/fmars.2018.00533" ext-link-type="DOI">10.3389/fmars.2018.00533</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><mixed-citation>Salazar, J. J., Garland, L., Ochoa, J., and Pyrcz, M. J.: Fair train-test split in machine learning: Mitigating spatial autocorrelation for improved prediction accuracy, J. Petrol. Sci. Eng., 209, 109885, <ext-link xlink:href="https://doi.org/10.1016/j.petrol.2021.109885" ext-link-type="DOI">10.1016/j.petrol.2021.109885</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><mixed-citation>Schmidtko, S., Stramma, L., and Visbeck, M.: Decline in global oceanic oxygen content during the past five decades, Nature, 542, 335–339, <ext-link xlink:href="https://doi.org/10.1038/nature21399" ext-link-type="DOI">10.1038/nature21399</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib46"><label>46</label><mixed-citation>Shao, J., Huang, S., Chen, Y., Qi, J., Wang, Y., Wu, S., Liu, R., and Du, Z.: Satellite-based global sea surface oxygen mapping and interpretation with spatiotemporal machine learning, Environ. Sci. Technol., 58, 498–509, <ext-link xlink:href="https://doi.org/10.1021/acs.est.3c08833" ext-link-type="DOI">10.1021/acs.est.3c08833</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><mixed-citation>Sharp, J. D., Fassbender, A. J., Carter, B. R., Johnson, G. C., Schultz, C., and Dunne, J. P.: GOBAI-O2: temporally and spatially resolved fields of ocean interior dissolved oxygen over nearly 2 decades, Earth Syst. Sci. Data, 15, 4481–4518, <ext-link xlink:href="https://doi.org/10.5194/essd-15-4481-2023" ext-link-type="DOI">10.5194/essd-15-4481-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><mixed-citation>Szekely, T., Gourrion, J., Pouliquen, S., and Reverdin, G.: The CORA 5.2 dataset for global in situ temperature and salinity measurements: data description and validation, Ocean Sci., 15, 1601–1614, <ext-link xlink:href="https://doi.org/10.5194/os-15-1601-2019" ext-link-type="DOI">10.5194/os-15-1601-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><mixed-citation>Szekely, T., Gourrion, J., Pouliquen, S., Reverdin, G., and Merceur, F.: CORA, Coriolis Ocean Dataset for Reanalysis, SEANOE [data set], <ext-link xlink:href="https://doi.org/10.17882/46219" ext-link-type="DOI">10.17882/46219</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><mixed-citation>Valera, M., Walter, R. K., Bailey, B. A., and Castillo, J. E.: Machine learning based predictions of dissolved oxygen in a small coastal embayment, J. Mar. Sci. Eng., 8, 1007, <ext-link xlink:href="https://doi.org/10.3390/jmse8121007" ext-link-type="DOI">10.3390/jmse8121007</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><mixed-citation>Wagstaff, J. and Bean, B.: remap: Regionalized models with spatially smooth predictions, R J., 14, 160–178, <ext-link xlink:href="https://doi.org/10.32614/RJ-2023-004" ext-link-type="DOI">10.32614/RJ-2023-004</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib52"><label>52</label><mixed-citation>Wang, Z.: GEOXYGEN_Code, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.19852901" ext-link-type="DOI">10.5281/zenodo.19852901</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><mixed-citation>Wang, Z., Xue, C., and Ping, B.: A reconstructing model based on time–space–depth partitioning for global ocean dissolved oxygen concentration, Remote Sens., 16, 228, <ext-link xlink:href="https://doi.org/10.3390/rs16020228" ext-link-type="DOI">10.3390/rs16020228</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><mixed-citation>Wang, Z., Fu, W., Xue, C., and Wang, G.: GEOXYGEN: A global monthly gridded dissolved oxygen product (0.5° <inline-formula><mml:math id="M228" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5°), Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.19703198" ext-link-type="DOI">10.5281/zenodo.19703198</ext-link>, 2026a.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><mixed-citation>Wang, Z., Fu, W., Xue, C., and Wang, G.: GEOXYGEN: A global monthly gridded dissolved oxygen product (0.5° <inline-formula><mml:math id="M229" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5°) [data set], <ext-link xlink:href="https://doi.org/10.12157/IOCAS.20260223.002" ext-link-type="DOI">10.12157/IOCAS.20260223.002</ext-link>, 2026b.</mixed-citation></ref>
      <ref id="bib1.bib56"><label>56</label><mixed-citation>Xue, C., Wang, Z., Yue, L., and Niu, C.: A global four-dimensional gridded dataset of ocean dissolved oxygen concentration retrieval from Argo profiles, Geosci. Data J., 11, 775–789, <ext-link xlink:href="https://doi.org/10.1002/gdj3.251" ext-link-type="DOI">10.1002/gdj3.251</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib57"><label>57</label><mixed-citation>Yamaguchi, R., Kouketsu, S., Kosugi, N., and Ishii, M.: Global upper ocean dissolved oxygen budget for constraining the biological carbon pump, Commun. Earth Environ., 5, <ext-link xlink:href="https://doi.org/10.1038/s43247-024-01886-7" ext-link-type="DOI">10.1038/s43247-024-01886-7</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib58"><label>58</label><mixed-citation>Zhou, Y., Gong, H., and Zhou, F.: Responses of Horizontally Expanding Oceanic Oxygen Minimum Zones to Climate Change Based on Observations, Geophys. Res. Lett., 49, <ext-link xlink:href="https://doi.org/10.1029/2022gl097724" ext-link-type="DOI">10.1029/2022gl097724</ext-link>, 2022.</mixed-citation></ref>

  </ref-list></back>
</article>
