the Creative Commons Attribution 4.0 License.
Global Monthly Ocean Dissolved Oxygen (1960–2023) Reconstructed to 5,902 m with BLENDR, a Bayesian-Optimized Ensemble Learning Framework
Abstract. Oceanic oxygen levels, crucial for marine ecosystems and biogeochemical cycles, have declined significantly over the past few decades due to climate change, posing severe environmental risks. However, historical dissolved oxygen (DO) measurements, especially below 2,000 m, remain sparse, limiting comprehensive annual and seasonal analyses. Here, we introduce the BLENDR framework (Bayesian-optimized Learning and ENsemble modeling for Data Reconstruction), a Bayesian-optimized ensemble of six machine-learning models (Random Forest, XGBoost, LightGBM, CatBoost, Extremely Randomized Trees and Histogram-based Gradient Boosting) fused via dynamic weighting, to reconstruct global monthly DO distributions at a 1° × 1° resolution from the surface to 5,902 m from 1960 to 2023. Validation against an independent dataset demonstrated that BLENDR achieves lower errors than any individual model. Our dataset captures depth-dependent deoxygenation, with the most pronounced decline occurring between 150 and 200 m, and reveals severely accelerated oxygen loss in the Arctic Ocean and North Atlantic over the past decade. This work provides the first long-term, global monthly DO product from the ocean surface to 5,902 m. The bathypelagic DO data provided in this work is a significant contribution to deep ocean oxygen dynamics and global biogeochemical cycles.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-781', Anonymous Referee #1, 10 Mar 2026
- AC1: 'Reply on RC1', Mingyu Han, 27 Mar 2026
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-781/essd-2025-781-AC1-supplement.pdf
- RC2: 'Comment on essd-2025-781', Anonymous Referee #2, 30 Mar 2026
Major Comments
- Novelty and Data Sparsity in the Deep Ocean: The authors claim that while previous studies have focused on specific regions, temporal/spatial resolutions, or time spans, it remains challenging to address all aspects simultaneously. However, the core methodological approach appears highly derivative of recent work (e.g., Ito et al., 2024), with the primary novelty being the extension down to 5,902 m (to my knowledge, Ito et al., 2024 is a monthly product spanning a time period similar to the one proposed here). The authors explicitly state that historical DO measurements below 2,000 m remain sparse. Given this sparsity, extending the reconstruction to ~6,000 m requires rigorous justification. How can the authors be confident that the reconstruction below 2,000 m is reliable, given that only a few training data are available to calibrate the model at those depths? The manuscript must include a quantitative analysis (e.g., density plots or a table) of the number of available profiles per region below 2,000 m to show that the machine learning algorithms are actually learning from sufficient data rather than blindly extrapolating from upper-ocean trends.
- Choice of Reanalysis Data: The reconstruction relies on temperature, salinity, and velocity fields from the Ocean Reanalysis System 5 (ORAS5). The authors need to justify the selection of ORAS5 over other standard products like EN4 (used by Ito et al. 2024). Furthermore, because reanalysis products also suffer from high uncertainty in the deep ocean, the authors should discuss how the inherent uncertainties in ORAS5 deep-ocean variables propagate into the BLENDR DO reconstruction.
- Validation Robustness and Spatial Autocorrelation: To construct an independent validation subset from GLODAPv2, the authors applied a spatiotemporal filter to remove overlapping profiles within ±1° in latitude/longitude and the same calendar month. While this removes exact duplicates, it is an insufficient safeguard against data leakage. Oceanographic variables exhibit strong spatial autocorrelation; a ±1° radius is too narrow to ensure true independence. A more rigorous approach (e.g., removing profiles based on correlation scores or employing spatial block cross-validation) should be implemented.
- Lack of Independent Baseline Comparisons: Validating solely against isolated GLODAPv2 profiles is inadequate for a global reconstruction product. To demonstrate true efficacy, the reconstructed climatologies, seasonality, and deoxygenation patterns and rates must be compared against established gridded climatologies (e.g., World Ocean Atlas, Ito et al.).
- Benchmarking the BLENDR Framework: The authors justify their use of tree-based ensembles over neural networks by stating that preliminary NN trials did not yield consistent skill increases and were harder to tune. This justification is quite broad, particularly since several studies (e.g., by Ito and Sharp, and by Ouala et al., 2026) have successfully used neural networks for the exact same problem. The community can benefit from different methodological developments such as the ones used here, but they need clear justification. For instance, the reconstructed maps based on the BLENDR framework should be compared to those from a neural network model (with an architecture similar to the ones in the cited papers).
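The spatial block cross-validation suggested in the comment on validation robustness can be sketched as follows. This is a minimal illustration, not the authors' or the referee's procedure: the function names, the 10° block size, and the five folds are all assumptions chosen for the example. Each profile is assigned to a latitude/longitude block, and whole blocks are then distributed across folds so that no block contributes to both training and validation.

```python
import math
from collections import defaultdict

def block_id(lat, lon, block_deg=10.0):
    """Index of the lat/lon block containing a position (block_deg is an assumed size)."""
    return (math.floor((lat + 90.0) / block_deg),
            math.floor((lon % 360.0) / block_deg))

def spatial_block_folds(positions, n_folds=5, block_deg=10.0):
    """Return one fold index per profile; every block lies entirely in one fold."""
    blocks = defaultdict(list)
    for i, (lat, lon) in enumerate(positions):
        blocks[block_id(lat, lon, block_deg)].append(i)

    folds = [0] * len(positions)
    sizes = [0] * n_folds
    # Largest blocks first, each assigned to the fold that currently
    # holds the fewest profiles, to keep fold sizes roughly balanced.
    for key in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        target = sizes.index(min(sizes))
        for i in blocks[key]:
            folds[i] = target
        sizes[target] += len(blocks[key])
    return folds

# Hypothetical profile positions (lat, lon); the first two share a block.
positions = [(10.2, 20.1), (10.9, 20.8), (-45.0, 170.0), (60.0, -30.0)]
folds = spatial_block_folds(positions, n_folds=2)
```

Profiles on either side of a block boundary can still be correlated, so a sketch like this would likely need a buffer zone or a block size informed by the decorrelation length scale of the variable.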
I hope the authors will take my comments constructively; I value their work and the time they invested in preparing the manuscript. I may have misunderstood some points; if something was unclear to me as a reviewer, it may also be unclear to readers.
Citation: https://doi.org/10.5194/essd-2025-781-RC2
- AC2: 'Reply on RC2', Mingyu Han, 31 Mar 2026
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-781/essd-2025-781-AC2-supplement.pdf
Data sets
Global Monthly Dissolved Oxygen Reconstruction via Bayesian Ensemble Machine Learning Mingyu Han and Yuntao Zhou https://doi.org/10.5281/zenodo.17548659
General Comments
In this manuscript, Han et al. present a global, monthly ocean dissolved oxygen (DO) dataset from 1960 to 2023, extending down to 5,902 m depth, using a Bayesian-optimized ensemble learning framework named BLENDR. Reconstructing DO in the bathypelagic zone is an ambitious and valuable contribution to the oceanographic community.
However, I have several major concerns regarding the methodology and the validation of the results. The mathematical formulation of the dynamic weighting strategy contains physical inconsistencies, and the manuscript lacks essential spatial representativeness proofs for the validation set. Furthermore, for a dataset claiming to reconstruct deep-ocean oxygen, it lacks a comparison with existing deep-ocean products. Therefore, major revisions are required to address these methodological flaws and to fully validate the reliability of the reconstructed fields before this manuscript can be considered for publication.
Main Issues
1. Lack of Intercomparison with Existing Data Products
The manuscript currently lacks a robust comparison with other widely used ocean DO data products. To establish the reliability of this new product, it is essential to contextualize its performance against existing datasets. I strongly recommend adding a comprehensive data intercomparison section to validate the BLENDR outputs. The authors should refer to and compare their results within the context of recent multi-product coordinated intercomparisons, such as the one presented by Ito et al. (2025). This will significantly enhance the credibility of your product.
Reference: Ito, T., Garcia, H. E., Wang, Z., et al. (2025). Assessing the observational uncertainties of dissolved oxygen climatology and seasonal cycle through a coordinated intercomparison project. Global Biogeochemical Cycles, 39(11), e2025GB008751.
2. Spatial-Temporal Representativeness of the Validation Set
The authors state that for each profile in GLODAPv2, they searched the CTD and OSD records for matches within ±1° and the same month, excluding those that matched. This filtered the dataset down to 8,020 profiles. However, the manuscript lacks a spatial-temporal distribution map of this filtered validation set. It is critical to prove the coverage and representativeness of these remaining 8,020 profiles. Without a distribution map showing the cruise tracks or sampling locations, readers cannot determine whether the validation set represents a global oceanic assessment or if it is merely biased toward a few localized, data-rich sub-regions. Please provide maps and temporal histograms of the validation set.
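The filtering step described above, and the kind of coverage summary requested, might look like the sketch below. It is an illustration under stated assumptions: the dictionary field names, the helper names, and the reading of "same month" as the same year and month are hypothetical and are not taken from the manuscript.

```python
from collections import Counter

def overlaps(p, q, tol_deg=1.0):
    """True if two profiles fall within +/- tol_deg and the same year and month.

    Treating 'same month' as same year AND month is an assumption.
    """
    dlat = abs(p["lat"] - q["lat"])
    dlon = abs(p["lon"] - q["lon"]) % 360.0
    dlon = min(dlon, 360.0 - dlon)  # shortest separation across the dateline
    same_month = (p["year"], p["month"]) == (q["year"], q["month"])
    return same_month and dlat <= tol_deg and dlon <= tol_deg

def independent_subset(candidates, training):
    """Keep candidate profiles that match no training profile."""
    return [p for p in candidates if not any(overlaps(p, q) for q in training)]

def coverage_by_decade(profiles):
    """Number of profiles per decade, as input for a temporal histogram."""
    return Counter(10 * (p["year"] // 10) for p in profiles)

# Made-up records standing in for training (CTD/OSD) and GLODAPv2 profiles.
training = [{"lat": 10.0, "lon": 179.5, "year": 2000, "month": 6}]
candidates = [
    {"lat": 10.5, "lon": -179.8, "year": 2000, "month": 6},  # overlaps across dateline
    {"lat": 10.5, "lon": -179.8, "year": 2000, "month": 7},  # different month
    {"lat": 40.0, "lon": 0.0, "year": 1995, "month": 1},     # far away
]
kept = independent_subset(candidates, training)
decades = coverage_by_decade(kept)
```

The same counting idea, applied to latitude/longitude bins instead of decades, would give the data behind a spatial distribution map.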
3. Potential Spatial Discontinuity in the Weight Allocation Strategy
While the dynamic weighting strategy is conceptually interesting, its current mathematical formulation may benefit from further justification. The transition mechanism between dynamic and static weights could potentially lead to spatial discontinuities. For instance, suppose grid cell A contains an observation, and the adjacent grid cell B does not. In cell A, the dynamic weight might heavily favor a specific model that perfectly fits the local observation; however, in cell B, the weight instantaneously reverts to the global average static weights (w_i) of the 6 models. This abrupt transition ("hard switch") between observed and unobserved regions might produce artificial gradients or step-changes at the boundaries, which may not fully align with the continuous nature of oceanographic variables. I recommend the authors discuss this potential limitation to ensure physical continuity.
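A toy numerical example may make the concern concrete. The blending rule below is a plain weighted average and the numbers are invented; the actual BLENDR weighting formula is not reproduced here, so this only illustrates how switching weight sets between adjacent cells can create a step even when the six model predictions are identical in both cells.

```python
def blend(predictions, weights):
    """Weighted average of model predictions (weights are normalized here)."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total

# Identical, made-up predictions of the six models in cells A and B (umol/kg).
preds = [200.0, 204.0, 198.0, 210.0, 195.0, 205.0]

# Cell B has no observation: assumed global static weights (equal here).
static_w = [1.0 / 6.0] * 6

# Cell A has an observation: assumed dynamic weights favoring model 4.
dynamic_w = [0.05, 0.05, 0.05, 0.75, 0.05, 0.05]

value_a = blend(preds, dynamic_w)
value_b = blend(preds, static_w)
step = value_a - value_b  # artificial jump between neighboring cells
```

With these made-up inputs the two neighboring cells differ by several umol/kg purely because of the weight switch, which is the kind of artificial gradient the comment refers to.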
4. Insufficient Assessment of Deep-Ocean Accuracy
A major selling point of this dataset is its extension to 5,902 m depth. However, the manuscript lacks a rigorous, depth-specific accuracy assessment for the deep ocean. Validating the entire water column collectively obscures potential biases in the bathypelagic zone. To substantiate the claims regarding deep-ocean reconstruction, I suggest conducting a direct comparison of your deep-ocean results with the DIVA-based dataset by Roach and Bindoff (2023). This comparison will help verify if your machine-learning ensemble correctly captures the subtle deep-water mass structures compared to variational analysis methods.
Reference: Roach, C. J., & Bindoff, N. L. (2023). Developing a New Oxygen Atlas of the World’s Oceans Using Data Interpolating Variational Analysis. Journal of Atmospheric and Oceanic Technology.
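A depth-specific assessment of the kind requested in item 4 could be organized as in the sketch below, which computes RMSE separately for each depth layer instead of for the whole water column. The layer boundaries, function names, and sample values are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed layer boundaries in metres; any other layering could be used.
DEPTH_EDGES = [0, 200, 1000, 2000, 4000, 6000]

def layer_of(depth):
    """Return the (upper, lower) bounds of the layer containing depth, or None."""
    for lo, hi in zip(DEPTH_EDGES[:-1], DEPTH_EDGES[1:]):
        if lo <= depth < hi:
            return (lo, hi)
    return None

def rmse_by_layer(samples):
    """samples: iterable of (depth_m, observed, reconstructed).

    Returns {layer: (rmse, n_samples)} so sparse layers remain visible.
    """
    squared = defaultdict(list)
    for depth, obs, rec in samples:
        layer = layer_of(depth)
        if layer is not None:
            squared[layer].append((rec - obs) ** 2)
    return {layer: (math.sqrt(sum(v) / len(v)), len(v))
            for layer, v in sorted(squared.items())}

# Made-up validation pairs: (depth in m, observed DO, reconstructed DO).
samples = [(50, 250.0, 253.0), (150, 240.0, 236.0),
           (2500, 180.0, 186.0), (5000, 190.0, 182.0)]
result = rmse_by_layer(samples)
```

Reporting the sample count next to each RMSE shows at a glance how few validation points support the bathypelagic layers.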
Other comments:
Lines 34-35: The word "biogeochemical" is used multiple times within a single or adjacent sentence. Please rephrase to avoid redundancy and improve the flow of the text.
Line 87: Typographical error. There appears to be an extraneous space after the degree symbol (°). Please remove the space for proper formatting.
Lines 125-126: Typo in the dataset name. The text reads "...use this filtered GLODA v2...". Please correct this misspelling to "GLODAPv2".
Line 306 (Equation 11): The equation for is written as . Without parentheses, this mathematically implies that is subtracted after the summation of is complete. Please add parentheses to ensure mathematical correctness: .
Lines 456-457: The sentence states, "However, the expansion rates increased again beyond 1,600 m because of expansion in the NP". Using the word "expansion" twice in such close proximity makes the sentence clunky.
Lines 624-625 (Data Availability): While the authors provided URL links for the Argo and WOD databases, the reference link or DOI for the GLODAPv2 dataset is missing. Please add the official link or DOI for GLODAPv2 to maintain consistency and adhere to ESSD's data accessibility standards.