This work is distributed under the Creative Commons Attribution 4.0 License.
GEOXYGEN: a global long-term dissolved oxygen dataset based on biogeochemistry-aware machine learning framework and multi-source observations
Abstract. Dissolved oxygen (DO) serves as an essential indicator of marine ecosystem health. However, sparse and uneven observations have limited our ability to characterize its full spatiotemporal variability, underscoring the continued need for long-term, high-resolution, and physically consistent global DO datasets. Here, we present GEOXYGEN, a global dataset of monthly DO fields at 0.5° × 0.5° resolution spanning 1960–2024 and depths from the surface to 5500 m (Wang et al., 2025, https://doi.org/10.5281/zenodo.17615657). GEOXYGEN is generated with a hierarchical modeling framework that accounts for regional and vertical heterogeneity. By integrating physical and biogeochemical predictors with an adaptive feature-selection strategy, GEOXYGEN achieves high predictive accuracy across all depth layers on an independent out-of-time test (R² > 0.92). The reconstructed spatial patterns align closely with the World Ocean Atlas 2023 climatology, and in subsurface and deep waters, GEOXYGEN demonstrates superior generalization relative to existing data-driven products. A sensitivity analysis further reveals that including coastal data in model training increases basin-wide uncertainty by approximately 7.5 %, underscoring that current observing systems remain insufficient to reliably resolve nearshore DO dynamics. GEOXYGEN provides a consistent, physically informed baseline for analyzing global and regional variability of DO. It also offers a valuable benchmark for evaluating and improving the representation of DO in climate and Earth system models and can support future studies on long-term deoxygenation trends and regional hotspots.
Status: open (until 28 Jan 2026)
- RC1: 'Comment on essd-2025-699', Anonymous Referee #1, 06 Jan 2026
- RC2: 'Comment on essd-2025-699', Anonymous Referee #2, 27 Jan 2026
Review of “GEOXYGEN: a global long-term dissolved oxygen dataset based on biogeochemistry-aware machine learning framework and multi-source observations” by Wang et al.
The manuscript presents a gridded dissolved oxygen (DO) dataset, GEOXYGEN, characterized by a high spatial resolution of 0.5° and an exceptionally dense vertical discretization of 187 layers from 1960 to 2024. However, several fundamental issues regarding data integrity, methodological rigor, and physical consistency must be addressed. The paper does not provide sufficient evidence that the reconstructed long-term oxygen changes, especially in data-sparse regions such as the Southern Ocean in early years, are not artifacts of the machine-learning methodology and predictor availability. Major revisions are therefore required before the dataset can be considered reliable for climate-scale analyses.
Major comments:
- Observational quality control and cross-source consistency
A central weakness of the manuscript is the limited and insufficiently documented quality control applied to the observational data. The authors combine multiple observational products and implicitly assume that systematic biases among different sources are negligible. This assumption is unlikely to hold. Historical dissolved oxygen measurements are well known to exhibit source-dependent biases related to measurement technique, calibration practice, and processing methodology, even when data are flagged as “good” in the public datasets.
The treatment of Argo oxygen data is particularly concerning. The manuscript does not clearly state whether any sensor bias correction is applied, nor does it assess the potential impact of known Argo oxygen sensor issues on the reconstruction. Given the dominant role of Argo in the modern observing system, this omission undermines confidence in the training target used by the machine-learning model.
- Validation strategy and independence at decadal time scales
The validation strategy relies primarily on holding out a limited number of years for testing. While this approach may be adequate for assessing short-term predictive skill, it is insufficient for evaluating decadal to multi-decadal variability, which is the central motivation for the dataset. Training and validation subsets drawn from adjacent years inevitably share the same observing system, spatial sampling patterns, and predictor relationships, leading to optimistic skill estimates. This is particularly problematic when the dataset is intended for long-term trend analysis. The manuscript does not demonstrate that reconstructed trends in the 1960s–1980s, especially in poorly observed regions, are robust and not dominated by relationships learned from the dense Argo-era data.
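The reviewer's point about era-dependent skill can be illustrated with a toy experiment: when the predictor–target relationship drifts over time, an adjacent-year holdout overstates skill relative to an era-blocked holdout. Everything below is synthetic and purely illustrative; it is not the manuscript's actual validation setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "observations" whose predictor relationship drifts between eras,
# mimicking a pre-Argo vs. Argo-era regime difference (purely illustrative).
years = rng.integers(1960, 2025, size=5000)
x = rng.normal(size=5000)
slope = np.where(years < 1990, 1.0, 1.5)      # relationship changes over time
y = slope * x + rng.normal(scale=0.1, size=5000)

def holdout_r2(train, test):
    """Fit y = a*x + b on the training mask, return R^2 on the test mask."""
    a, b = np.polyfit(x[train], y[train], 1)
    resid = y[test] - (a * x[test] + b)
    return 1.0 - resid.var() / y[test].var()

# (a) Adjacent-year holdout: train up to 2019, test on 2020-2024.
#     Train and test share the modern regime, so skill looks high.
skill_adjacent = holdout_r2(years < 2020, years >= 2020)

# (b) Era-blocked holdout: train on the data-rich modern era,
#     test on the sparsely observed 1960s-1980s.
skill_blocked = holdout_r2(years >= 1990, years < 1990)

print(f"adjacent-year R2 = {skill_adjacent:.3f}, "
      f"era-blocked R2 = {skill_blocked:.3f}")
```

The gap between the two scores is exactly the optimism the reviewer describes: skill measured on adjacent years does not certify reconstructions in poorly observed early decades.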
- Lack of uncertainty quantification
Another major limitation is the absence of a rigorous uncertainty framework. The manuscript reports standard skill metrics such as RMSE and R², but provides no quantitative estimate of uncertainty at the grid-cell level, nor for regionally integrated quantities such as basin means or OMZ volumes. For a dataset intended to support climate diagnostics and trend analysis, uncertainty estimates are essential. Users need to know how uncertainty varies spatially, temporally, and with depth, and how it grows backward in time as observations become sparse. The lack of uncertainty propagation into derived metrics, such as OMZ volume, severely limits the scientific reliability and usability of the product.
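A minimal sketch of the kind of grid-cell and basin-scale uncertainty estimate the reviewer asks for, using bootstrap ensemble spread on synthetic data. This is one common approach among several (quantile regression and model ensembles are alternatives) and is not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 300 observations of DO (umol/kg) against one predictor, and a
# bootstrap ensemble of simple fits whose spread serves as a per-cell
# uncertainty estimate (illustrative; GEOXYGEN's models are more complex).
x = rng.uniform(0, 10, size=300)
y = 250.0 - 8.0 * x + rng.normal(scale=15.0, size=300)

x_grid = np.linspace(0, 10, 21)          # "grid cells" to predict at
ensemble = []
for _ in range(200):                     # bootstrap resamples of training data
    idx = rng.integers(0, x.size, x.size)
    a, b = np.polyfit(x[idx], y[idx], 1)
    ensemble.append(a * x_grid + b)
ensemble = np.array(ensemble)

do_mean = ensemble.mean(axis=0)          # central estimate per cell
do_std = ensemble.std(axis=0)            # per-cell uncertainty (1-sigma spread)

# The same ensemble propagates uncertainty into integrated quantities,
# e.g. a basin mean (and, analogously, OMZ volume):
basin_mean_members = ensemble.mean(axis=1)
print(f"basin mean = {basin_mean_members.mean():.1f} "
      f"+/- {basin_mean_members.std():.1f} umol/kg")
```

The key property is that every derived quantity is computed per ensemble member first, so its spread inherits the grid-cell uncertainties and their spatial covariance.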
- Time-inconsistent predictors and regime consistency
The model uses a large number of predictors, many of which are derived from satellite products or reanalyses that are only available after the 1990s. The manuscript suggests that in earlier decades these predictors are effectively masked, leaving the model to rely primarily on temperature, salinity, and oxygen solubility. This raises a serious concern that the reconstruction may effectively be governed by different rules in different time periods, potentially introducing artificial regime shifts or spurious trends. The manuscript does not demonstrate that reconstructions using the reduced predictor set are consistent with those obtained using the full predictor suite.
- Interpretation of predictors and mechanistic claims
Although the manuscript describes the framework as “biogeochemistry-aware,” many predictors originate from model or reanalysis products that themselves contain biases and uncertainties, and several predictors function primarily as proxies or coordinate variables rather than physical drivers. The discussion of predictor importance risks being interpreted as mechanistic attribution, despite the fact that machine-learning importance metrics do not imply causality.
- Regional partitioning and boundary continuity
The authors acknowledge that a single global model may be inadequate and therefore adopt a regional partitioning strategy. While pragmatic, this approach introduces the risk of discontinuities at region boundaries. The manuscript does not provide sufficient quantitative evidence that boundary fusion fully resolves these issues, particularly for variability and trends. An explicit evaluation of continuity across regional boundaries is needed to demonstrate that the partitioning does not introduce artificial spatial artifacts.
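A simple diagnostic of the kind requested would compare the jump across a province boundary to the typical within-province gradient. The sketch below uses entirely synthetic fields and an arbitrary meridional boundary; it only illustrates the idea.

```python
import numpy as np

def boundary_discontinuity(field, boundary_col):
    """Mean absolute jump across a meridional province boundary, normalized
    by the typical within-field gradient (illustrative diagnostic only)."""
    jump = np.abs(field[:, boundary_col] - field[:, boundary_col - 1])
    interior = np.abs(np.diff(field, axis=1))
    return jump.mean() / interior.mean()

rng = np.random.default_rng(2)
smooth = np.cumsum(rng.normal(size=(40, 80)), axis=1)   # seamless field
stitched = smooth.copy()
stitched[:, 40:] += 5.0                                 # offset east of boundary

r_smooth = boundary_discontinuity(smooth, 40)
r_stitched = boundary_discontinuity(stitched, 40)
print(f"seamless ratio = {r_smooth:.2f}, stitched ratio = {r_stitched:.2f}")
```

A ratio near 1 indicates the boundary is statistically indistinguishable from the interior; a large ratio flags an artificial seam. Applying such a metric to means, variability, and trends separately would address the reviewer's concern quantitatively.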
Specific comments:
- Lines 100-105: The manuscript does not clearly describe how duplicates are defined and removed. In practice, the same ship-based profile often appears in multiple archives with small differences in time, position, or depth sampling, and the criteria used to identify such duplicates must be explicitly defined. A vague reference to duplicate removal is not adequate. Furthermore, the proposed fallback strategy, local gridded outlier detection, cannot substitute for a systematic assessment of inter-source biases.
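As an illustration of what explicit duplicate criteria could look like, the following sketch flags near-coincident profiles using hypothetical time and distance tolerances. The thresholds and the function itself are assumptions for illustration, not the authors' actual rules.

```python
import numpy as np

def is_duplicate(p1, p2, dt_hours=2.0, dx_km=5.0):
    """Flag two profiles as likely duplicates if they are close in time and
    position. Thresholds are illustrative, not the manuscript's criteria."""
    t1, lat1, lon1 = p1
    t2, lat2, lon2 = p2
    if abs(t1 - t2) > dt_hours:
        return False
    # Approximate great-circle distance (small-angle approximation, km).
    dlat = np.radians(lat2 - lat1)
    dlon = np.radians(lon2 - lon1) * np.cos(np.radians(0.5 * (lat1 + lat2)))
    dist_km = 6371.0 * np.hypot(dlat, dlon)
    return dist_km <= dx_km

# The same cast archived twice with slightly different metadata:
a = (1200.0, 10.000, -140.000)   # (hours since epoch, lat, lon)
b = (1200.5, 10.003, -140.002)
c = (1500.0, 12.000, -135.000)   # a genuinely distinct profile

print(is_duplicate(a, b), is_duplicate(a, c))
```

Stating criteria at this level of precision (tolerances in time, position, and depth sampling) is what the comment asks the authors to document.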
- The decision to exclude regions shallower than 200 m is supported by RMSE-based sensitivity analysis and is reasonable from a modeling perspective. However, this exclusion removes many of the regions where oxygen variability is most societally relevant, including upwelling shelves and seasonally hypoxic coastal systems.
- Section 4.1, in particular, does not deliver substantive new insight into oxygen dynamics, but instead reiterates known associations between oxygen, temperature, and stratification. This section should either be clearly reframed as a diagnostic assessment of model behavior or substantially strengthened with process-based analyses. As written, it overreaches relative to what the method can support.
Citation: https://doi.org/10.5194/essd-2025-699-RC2
Data sets
GEOXYGEN: a global long-term dissolved oxygen dataset (V1.0) Z. Wang et al. https://doi.org/10.5281/zenodo.17615657
Model code and software
GEOXYGEN-code Z. Wang https://github.com/layne1202/GEOXYGEN-code
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 345 | 192 | 20 | 557 | 19 | 18 |
The authors created a 4-D global ocean dissolved oxygen atlas at 0.5° × 0.5° resolution using multiple data sources and machine-learning approaches. The ensemble machine-learning submodel framework used to derive this data product is interesting, and the new ocean dissolved oxygen product shows a slight improvement over previous products. This dataset has the potential to be used by the oceanography community to assess oxygen evolution under a changing climate. My recommendation is minor revision.
Specific comments:
Figure 1a: I was confused by this figure. OSD, CTD, and Argo are different measurement approaches for DO concentration data, whereas CCHDO, GLODAP, GEOTRACES, etc. are different sources of DO data. It seems they should not be mixed in this figure. It is unclear whether the authors intend to show changes in sampling approaches over time or changes in data sources.
Sections 2 and 3: It would be helpful to add a flow chart showing the data-cleaning and model-training pipeline.
Lines 108-109: I would be more cautious about this outlier-detection approach, especially in dynamic regions such as the ENTP, where local DO concentrations can change substantially within a 10-day window.
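One way to make this concern concrete is a median/MAD outlier screen, a common robust choice for pooled space-time windows. The sketch below (not the manuscript's actual procedure) shows why the same threshold that cleanly isolates a bad value in a quiet cell flags nothing in a dynamic cell, where large spread is real variability.

```python
import numpy as np

def robust_outlier_mask(values, z_thresh=4.0):
    """Median/MAD-based outlier flag for DO values pooled from one grid cell
    and time window (illustrative; not the manuscript's exact procedure)."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    if mad == 0.0:
        return np.zeros(values.shape, dtype=bool)
    z = 0.6745 * (values - med) / mad    # scaled to sigma for normal data
    return np.abs(z) > z_thresh

# In a quiet open-ocean cell, a bad value stands out clearly:
quiet = [210, 212, 209, 211, 208, 350]
# In a dynamic OMZ-margin cell, the same spread may be real variability,
# so the identical test flags nothing:
dynamic = [20, 60, 110, 160, 210, 350]

print(robust_outlier_mask(quiet), robust_outlier_mask(dynamic))
```

The flip side is the real risk: in a window that mixes water masses, the inflated MAD can also mask genuinely erroneous values, which is why region-dependent windows or thresholds may be needed.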
Figure 3 and related text: The partitioning of the global ocean into provinces is central to the machine-learning method used in the manuscript, as separate submodels are trained for each province. However, this partitioning is not clearly justified. Some questions: 1) the partitioning does not distinguish OMZs from other regions; for example, the ETSP OMZ is included within the whole South Pacific province; 2) the authors state that the province partitioning is based on Fay and McKinley (2014), yet the equatorial biomes identified in Fay and McKinley (2014) are not included; 3) this global dissolved oxygen product is 4-D, but the province partitioning is the same for every year and does not account for temporal shifts, which is also an important point in Fay and McKinley (2014); and 4) how sensitive is the machine-learning approach to the province selection?
Line 218: Why is longitude not included as a predictor while latitude is? Note that the coordinate information (longitude and latitude) may also need to be transformed, as time is, to represent true geographical distances.
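The transformation the reviewer alludes to is commonly a cyclic (sin/cos) encoding of longitude, which keeps points on either side of the dateline close in feature space. This is a standard illustrative sketch, not the manuscript's encoding.

```python
import numpy as np

def encode_position(lat_deg, lon_deg):
    """Encode (lat, lon) so that model distances reflect geography:
    longitude is cyclic (179.5E and 179.5W are neighbors), so it is mapped
    to (sin, cos) components; latitude is kept as-is.
    Illustrative only -- not the encoding used in the manuscript."""
    lon = np.radians(lon_deg)
    return np.column_stack([np.asarray(lat_deg, dtype=float),
                            np.sin(lon), np.cos(lon)])

# Raw longitude puts 179.5 and -179.5 numerically 359 degrees apart;
# the cyclic encoding makes them near-identical:
f = encode_position([0.0, 0.0], [179.5, -179.5])
print(np.linalg.norm(f[0] - f[1]))   # small distance across the dateline
```

With raw longitude as a feature, a tree or kernel model treats the dateline as a hard edge; the cyclic encoding removes that artifact.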
Figure 8: the red dashed lines in the figures are misleading; they should not cross land.