GEOXYGEN: a global long-term dissolved oxygen dataset based on a biogeochemistry-aware machine learning framework and multi-source observations
Abstract. Dissolved oxygen (DO) is an essential indicator of marine ecosystem health. However, sparse and uneven observations have limited our ability to characterize its full spatiotemporal variability, underscoring the continued need for long-term, high-resolution, and physically consistent global DO datasets. Here, we present GEOXYGEN, a global dataset of monthly DO fields at 0.5° × 0.5° resolution spanning 1960–2024 and depths from the surface to 5500 m (Wang et al., 2025, https://doi.org/10.5281/zenodo.17615657). GEOXYGEN is generated with a hierarchical modeling framework that accounts for regional and vertical heterogeneity. By integrating physical and biogeochemical predictors with an adaptive feature-selection strategy, GEOXYGEN achieves high predictive accuracy across all depth layers on an independent out-of-time test (R² > 0.92). The reconstructed spatial patterns align closely with the World Ocean Atlas 2023 climatology, and in subsurface and deep waters GEOXYGEN generalizes better than existing data-driven products. A sensitivity analysis further shows that including coastal data in model training increases basin-wide uncertainty by approximately 7.5 %, indicating that current observing systems remain insufficient to reliably resolve nearshore DO dynamics. GEOXYGEN provides a consistent, physically informed baseline for analyzing global and regional variability of DO. It also offers a benchmark for evaluating and improving the representation of DO in climate and Earth system models, and it can support future studies of long-term deoxygenation trends and regional hotspots.
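For readers unfamiliar with the term, an out-of-time test holds out the most recent years entirely rather than sampling test points at random, so that the score reflects skill on a period the model has never seen. The following is a minimal sketch of that idea on synthetic data, not the GEOXYGEN pipeline: the cutoff year (2014), the single temperature predictor, the linear fit standing in for the machine learning model, and the function names `out_of_time_split` and `r_squared` are all assumptions made for illustration.

```python
import numpy as np


def out_of_time_split(years, cutoff_year):
    """Return boolean masks: train on years <= cutoff_year, test on later years."""
    years = np.asarray(years)
    train = years <= cutoff_year
    return train, ~train


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (hypothetical values, not real observations).
rng = np.random.default_rng(0)
n_samples = 2000
years = rng.integers(1960, 2025, size=n_samples)       # 1960..2024 inclusive
temperature = rng.uniform(0.0, 30.0, size=n_samples)   # single toy predictor
do_obs = 350.0 - 6.0 * temperature + rng.normal(0.0, 5.0, size=n_samples)

# Hypothetical cutoff: fit on 1960-2014, evaluate on 2015-2024.
train, test = out_of_time_split(years, cutoff_year=2014)

# A linear fit stands in for the actual model.
slope, intercept = np.polyfit(temperature[train], do_obs[train], deg=1)
do_pred = slope * temperature[test] + intercept

score = r_squared(do_obs[test], do_pred)
print(f"out-of-time R^2 = {score:.3f}")
```

Because the test years never appear in training, a high score under this split is a stricter statement than the same score under a random split, where temporally adjacent samples can leak between the two sets.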