the Creative Commons Attribution 4.0 License.
Reconstructing Global Monthly Ocean Dissolved Oxygen (1960–2023) to Nearly 6000 m Depth Using Bayesian Ensemble Machine Learning
Abstract. Oceanic oxygen levels, crucial for marine ecosystems and biogeochemical cycles, have declined significantly over the past few decades, driven by climate change and posing severe environmental risks. However, historical dissolved oxygen (DO) measurements, especially below 2000 m, remain sparse, limiting comprehensive annual and seasonal analyses. Here we introduce the BEM-DOR framework, a Bayesian-optimized ensemble of six machine-learning models (Random Forest, XGBoost, LightGBM, CatBoost, Extremely Randomized Trees and Histogram-based Gradient Boosting) fused via dynamic weighting, to reconstruct global monthly DO distributions at 1°×1° resolution from the surface to 5902 m depth over 1960–2023. Validation against an independent dataset demonstrates that BEM-DOR outperforms existing products. Our dataset captures depth-dependent deoxygenation, with the most pronounced decline occurring between 150 and 200 m, and reveals dramatically accelerated oxygen loss in the Arctic Ocean and North Atlantic over the past decade. We quantify uncertainties from measurement errors, gridding processes, and model algorithms, providing the first long-term, high-resolution, uncertainty-quantified DO product from ocean surface to nearly 6000 m depth. The extension of DO data into the bathypelagic zone in this work is a significant contribution to deep ocean oxygen dynamics and global biogeochemical cycles.
Status: open (until 29 Jul 2025)
AC1: 'Comment on essd-2025-273', Mingyu Han, 24 Jun 2025
Response to editor’s comment
We thank the editor for the constructive feedback and the opportunity to clarify how our study differs from, and advances beyond, the earlier Indian Ocean dissolved-oxygen reconstruction (Huang et al., 2023).
Table R1 Comparison between Huang et al. (2023) and this study
Aspect | Huang et al. (2023) | This study |
---|---|---|
Region | Indian Ocean only | Global ocean (all basins) |
Temporal coverage | 1980–2019 | 1960–2023 |
Spatial resolution | 0.5° × 0.5° | 1° × 1° |
Vertical range | Surface to 5,395 m | Surface to 5,902 m |
Vertical resolution | 50 depth levels | 75 depth levels |
Methodology | Extremely Randomized Trees (ERT) | Bayesian-optimized ensemble of six tree-based learners (RF, XGBoost, LightGBM, CatBoost, ERT, Histogram-based GB) |
Validation | Train/validation split | 8-fold temporal cross-validation + comparison with independent products |
Huang et al. (2023) built a four-dimensional monthly data set for 1980–2019 by training machine-learning algorithms on in-situ oxygen profiles, ocean reanalysis fields, and spatiotemporal variables. They compared several algorithms and ultimately selected the Extremely Randomized Trees (ERT) model as the best performer. They reported basin-wide deoxygenation of −141.5 ± 15.1 Tmol per decade, with a partial slowdown after 2000 and pronounced oxygen minimum zone (OMZ) expansion in the Arabian Sea, Bay of Bengal, and Equatorial Indian Ocean.
Our work is not a repetition of the previous research. It represents a significant advancement in spatial domain, temporal coverage, and methodology (Table R1). Below, we summarize the key innovations and contributions of our global dissolved oxygen reconstruction.
- Broader research scope and extended coverage.
Our study reconstructs monthly global dissolved oxygen concentrations from 1960 to 2023 on a 1° × 1° grid, from the surface to nearly 6000 m depth. In contrast to earlier regionally focused studies, we provide a truly global product that includes all ocean basins. To address this wide range of oceanographic conditions, we developed a dynamic weighting framework that adapts to regional variability and helps fill critical data gaps in under-sampled areas.
By providing a global product, our study enables evaluation of Earth System Model simulations of marine biogeochemistry (Bopp et al., 2013), tracking of OMZ expansion (Schmidtko et al., 2017), and spatial conservation planning in regions prone to hypoxia (Breitburg et al., 2018). The full data product is openly available to maximize its value to the broader research community.
- Innovative and rigorous methodology.
While the previous Indian Ocean study applied the Extremely Randomized Trees (ERT) method, we introduce the Bayesian Ensemble Machine-learning Dissolved Oxygen Reconstruction (BEM-DOR) framework, a rigorously optimized multi-model ensemble that integrates Random Forest (Breiman, 2001), XGBoost (Chen and Guestrin, 2016), LightGBM (Ke et al., 2017), CatBoost (Prokhorenkova et al., 2018), ERT (Geurts et al., 2006) and Histogram-based Gradient Boosting (Guryanov, 2019; Friedman, 2001). Each learner is calibrated with Bayesian hyper-parameter optimization, and their predictions are ensembled through a data-adaptive soft-weighting scheme that reflects both global cross-validation skill and local prediction error. This framework uniquely leverages the complementary strengths of individual algorithms to achieve high-fidelity dissolved oxygen reconstructions across both open-ocean regimes and complex coastal environments. Thus, it overcomes key limitations of single-model approaches that struggle with such heterogeneous conditions.
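The soft-weighting idea described above can be illustrated with a minimal sketch. The response does not give the weighting formula, so everything here is an assumption made for illustration: the softmax form, the blending parameter `alpha`, the `temperature` parameter, the function names, and all numbers are hypothetical and are not taken from BEM-DOR.

```python
import math


def soft_weights(global_rmse, local_error, alpha=0.5, temperature=1.0):
    """Return softmax weights; a lower blended error gives a higher weight.

    alpha and temperature are hypothetical tuning knobs, not values from the study.
    """
    # Blend each model's global cross-validation error with its local error.
    blended = [alpha * g + (1.0 - alpha) * l
               for g, l in zip(global_rmse, local_error)]
    # Negate so that small errors map to large scores, then apply a stable softmax.
    scores = [-b / temperature for b in blended]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(predictions, weights):
    """Weighted average of the individual model predictions for one grid cell."""
    return sum(p * w for p, w in zip(predictions, weights))


# Illustrative numbers only (umol/kg); they are not results from the paper.
models = ["RF", "XGBoost", "LightGBM", "CatBoost", "ERT", "HistGB"]
global_rmse = [12.0, 11.0, 11.5, 11.2, 12.5, 11.8]
local_error = [9.0, 14.0, 10.0, 8.0, 15.0, 11.0]
predictions = [201.0, 196.0, 199.0, 203.0, 194.0, 198.0]

weights = soft_weights(global_rmse, local_error, alpha=0.5, temperature=2.0)
estimate = ensemble_predict(predictions, weights)
```

In this sketch the weights are recomputed per location, so a model that performs well globally but poorly in a given region is down-weighted there. Because the weights are non-negative and sum to one, the ensemble estimate always lies between the smallest and largest individual prediction.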
We demonstrate the performance of our model through eight-fold temporal cross-validation with withheld observation profiles and by evaluating against independent products, including GOBAI-O2 (Sharp et al., 2022) and the Ito et al. (2024) reconstruction. The adaptive weighting reduces systematic biases that arise when a single algorithm is applied globally, providing more reliable and representative dissolved oxygen concentrations. This approach yields higher accuracy and robustness across diverse ocean environments, a dimension not explored in the earlier regional work.
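One possible way to set up an eight-fold temporal cross-validation is sketched below. The response does not state how the folds are defined, so the use of contiguous blocks of years is an assumption, and the helper names `temporal_folds` and `split_profiles` are hypothetical.

```python
def temporal_folds(start_year, end_year, n_folds):
    """Partition the years into n_folds contiguous blocks of near-equal size."""
    years = list(range(start_year, end_year + 1))
    base, extra = divmod(len(years), n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # The first `extra` folds take one additional year when sizes are uneven.
        size = base + (1 if k < extra else 0)
        folds.append(years[start:start + size])
        start += size
    return folds


def split_profiles(profile_years, held_out_years):
    """Return (train, test) index lists; test holds profiles from the held-out years."""
    held = set(held_out_years)
    train = [i for i, y in enumerate(profile_years) if y not in held]
    test = [i for i, y in enumerate(profile_years) if y in held]
    return train, test


# 1960-2023 spans 64 years, so eight folds hold eight years each.
folds = temporal_folds(1960, 2023, 8)

# Hypothetical observation years for five profiles.
profile_years = [1961, 1975, 1990, 2005, 2022]
train_idx, test_idx = split_profiles(profile_years, folds[0])
```

Holding out whole blocks of years, rather than random profiles, keeps observations from the same period out of both the training and test sets, which gives a stricter test of how well the model reconstructs periods it has not seen.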
References
Bopp L, Resplandy L, Orr J C, et al. Multiple stressors of ocean ecosystems in the 21st century: projections with CMIP5 models. Biogeosciences, 2013, 10(10): 6225-6245.
Breiman L. Random forests. Machine learning, 2001, 45: 5-32.
Breitburg D, Levin L A, Oschlies A, et al. Declining oxygen in the global ocean and coastal waters. Science, 2018, 359(6371): eaam7240.
Chen T, Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794.
Friedman J H. Greedy function approximation: a gradient boosting machine. Annals of statistics, 2001: 1189-1232.
Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Machine learning, 2006, 63: 3-42.
Guryanov A. Histogram-based algorithm for building gradient boosting ensembles of piecewise linear decision trees. Analysis of Images, Social Networks and Texts: 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers 8. Springer International Publishing, 2019: 39-50.
Huang S, Shao J, Chen Y, et al. Reconstruction of dissolved oxygen in the Indian Ocean from 1980 to 2019 based on machine learning techniques. Frontiers in Marine Science, 2023, 10: 1291232.
Ito T, Cervania A, Cross K, et al. Mapping dissolved oxygen concentrations by combining shipboard and Argo observations using machine learning algorithms. Journal of Geophysical Research: Machine Learning and Computation, 2024, 1(3): e2024JH000272.
Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 2017, 30.
Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased boosting with categorical features. Advances in neural information processing systems, 2018, 31.
Schmidtko S, Stramma L, Visbeck M. Decline in global oceanic oxygen content during the past five decades. Nature, 2017, 542(7641): 335-339.
Sharp J D, Fassbender A J, Carter B R, et al. GOBAI-O2: temporally and spatially resolved fields of ocean interior dissolved oxygen over nearly two decades. Earth System Science Data Discussions, 2022, 2022: 1-46.
Citation: https://doi.org/10.5194/essd-2025-273-AC1
Data sets
Global Monthly Dissolved Oxygen Reconstruction via Bayesian Ensemble Machine Learning Mingyu Han and Yuntao Zhou https://doi.org/10.5281/zenodo.15361818