This work is distributed under the Creative Commons Attribution 4.0 License.
Reconstruction of δ13CDIC in the Atlantic Ocean: A Probabilistic Machine Learning Approach for Filling Historical Data Gaps
Abstract. The stable carbon isotope composition of marine dissolved inorganic carbon (DIC), δ13CDIC, is a valuable tracer for oceanic carbon cycling. However, its observational coverage remains much sparser than that of DIC and other physical or biogeochemical variables, limiting its full potential. Here, we reconstruct δ13CDIC in the Atlantic Ocean using a probabilistic machine learning framework, Gaussian Process Regression (GPR). We compiled data from 51 historical cruises, including a high-resolution 2023 A16N section, and applied secondary quality control via crossover analysis, retaining 37 cruises for model training, validation, and testing. The trained GPR model achieved an average bias of −0.007 ± 0.082 ‰ and an overall uncertainty of 0.11 ‰, arising from measurement (0.07 ‰), mapping (0.08 ‰), and negligible input-variable (3.77 × 10−14 ‰) errors. Using the GLODAPv2.2023 Atlantic dataset as predictors, the reconstruction expanded the number of acceptable δ13CDIC samples by a factor of 7.65, from 8,941 to 68,435 across the Atlantic basins. The resulting dataset markedly improves the spatial resolution in longitude, latitude, and depth, and provides enhanced temporal continuity over the past four decades. Compared to the sparse original measurements, the reconstruction reduces spatial discontinuities and reveals finer vertical structures consistent with other high-resolution biogeochemical observations. This reconstructed δ13CDIC dataset provides new opportunities to resolve regional carbon cycle dynamics, validate Earth system models, refine estimates of oceanic carbon uptake, and extend climate reanalysis records. The data are publicly accessible at the data repository Zenodo under the following DOI: https://doi.org/10.5281/zenodo.16907402 (Gao et al., 2025).
Status: open (until 26 Nov 2025)
- RC1: 'Comment on essd-2025-517', Anonymous Referee #1, 21 Oct 2025
- RC2: 'Comment on essd-2025-517', Patrick Rafter, 22 Oct 2025
The manuscript “Reconstruction of δ13CDIC in the Atlantic Ocean…” as reviewed by Patrick Rafter
First, I’d like to thank the other (anonymous) reviewer for their careful and useful review of this manuscript. If I were the author of this manuscript, I would greatly appreciate the many meaningful and well-informed comments. I don’t fully agree with all their suggestions, but it is undeniably a high-quality review.
For example, I think that, for the most part, this study needs less additional work than the other reviewer suggests. The suggestion to implement the ML method in a model environment would be a very interesting and valuable addition to this work, but I predict the authors’ response will be “outside the scope of the current study”. It sounds to me like a huge amount of new work, but I may be incorrect in this (or perhaps it is only a huge amount of work for *me* and not for someone else; it almost surely is). Note that I do not have the experience in this space to comment on whether this model environment application is “now common practice”, but I will say that this would have been a novel (to me), interesting, and seemingly robust application of the methods developed here. I would also like to note that if this manuscript / dataset were to follow the reviewer’s advice, it would boost my score for the “significance” and “data quality” categories into the ‘Excellent’ category. As of now, I have scored these as ‘good’.
I also think the motivation is appropriate for this specific study and that the decadal trends in the Kernel Density Estimates (see Fig. 8) are an interesting outcome from this study (as it exists now).
Where I agree with the anonymous reviewer is that I think the new “reconstructed” dataset could be (I think): (1) expanded spatially using the GLODAP gridded product and (2) that this would be a very useful addition to our community. I am assuming these are “minor revisions” as the ML model is already built and I assume the application to the gridded product will be straightforward (and worth the time for the community to use!). I would also urge the authors to consider the other options listed by the anonymous reviewer to expand the ML methods temporally, although I am unfamiliar with the reviewer’s specific suggestions and cannot comment on the time requirements for such new applications.
Likewise, the other reviewer makes strong comments about the dataset itself. I agree that adding the reconstructed dataset as its own column (with -999 for other basins) to the existing GLODAP data would be very useful for the community. Even better would be for the community to have a gridded product!
Below I have listed notes I made on the manuscript as I read through it.
Line by line notes
27: need to define delta notation
79+: I don’t see a need to shorten “Section” here
100: I like the previous paragraph
132: what exactly does “exhibit high internal consistency” mean? Are there statistics to support this statement?
139: Is GPR an acronym? Perhaps not relevant, but I wanted to know
161: Repeated text
Fig. 2: I like the figure, but as the other reviewer noted, it would be better to use completely independent cruise datasets for the validation as well as the “independent” tests
192: I wonder if other Earth scientists would be as surprised to learn of Mean Absolute Error and Mean Bias Error. I think they might and it might therefore be useful to use a sentence or two describing why these additional metrics are useful to the study
202: Propagated error?
212: perturbed not perturbs
230: I’m unsure where the 10-fold cross-validation comes from
249: This text is also somewhat a repetition of earlier text
259: larger?
272: Incredibly / unbelievably low input variable uncertainty (Uinputs). I wonder if this is a propagation of the input variable uncertainties or an error has been made along the way.
295: Maybe this is not important, but lower case “n” is typically used to describe the sample size
302: Is it expected that there would be a model smoothing tendency?
397: Is there an expectation that the model output would closely align with the observed data? Wasn’t the 2023 data used to predict the “reconstructed data”? I’m not diminishing the work—I honestly think this is an expected outcome of using machine learning.
485: quality-controlled (?)

Citation: https://doi.org/10.5194/essd-2025-517-RC2
Data sets
Reconstruction of δ13CDIC in the Atlantic Ocean: A Probabilistic Machine Learning Approach for Filling Historical Data Gaps Hui Gao, Zelun Wu, Zhentao Sun, Diana Cai, Meibing Jin, Wei-Jun Cai https://doi.org/10.5281/zenodo.16907402
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 1,049 | 36 | 16 | 1,101 | 16 | 16 |
This is an interesting and generally well-written study addressing a worthy topic. The paper has good fundamentals and should be able to be made into a solid contribution to the scientific literature. However, I believe it requires iteration, and likely additional analysis, before it will be suitable for publication in this journal.
I have three areas of criticism and one note of caution. The note of caution is just that I’m skeptical of the uinput calculation, see the line by line comments below.
My first criticism is that the validation was not handled as well as it should have been. See line by line comments below for an easy-to-implement and necessary improvement for the validation section. Separately, a suggestion that would further reinforce the validity of the method would be to implement the method in a model environment. This is now common practice for validation of machine learning refits of sparse observations, and is likely necessary for a first attempt with carbon isotopes, particularly one with such unusually sparse observations. There are numerous model simulations available that have explicitly simulated carbon isotopes (e.g., https://doi.org/10.5194/gmd-17-1709-2024 though there are many others). It should be workable to obtain one or more such sets of outputs, subsample the distribution(s) across both time and space, apply random and cruise-wide systematic perturbations to the extracted output to represent measurement uncertainties, fit an ML model to the output, reconstruct the full distribution, and then evaluate the strengths and weaknesses of the full 4D reconstruction. This reveals critical information that is not provided by a reconstruction of a sparse data product with uneven and imperfect measurements of an unknown true distribution.
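The perfect-model ("pseudo-proxy") validation described above can be sketched in miniature. Everything here is an illustrative assumption rather than the authors' actual setup: a 1-D synthetic "truth" field stands in for 4-D model output, and the noise levels, cruise-offset magnitude, and kernel choice are placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical "true" 1-D field standing in for a model-simulated
# d13C_DIC distribution (in practice this would be 4-D model output).
x_full = np.linspace(0, 10, 500)[:, None]
truth = np.sin(x_full).ravel() + 0.3 * np.cos(3 * x_full).ravel()

# Subsample sparsely, mimicking cruise coverage, then perturb with
# random "measurement" noise plus a cruise-wide systematic offset.
idx = rng.choice(len(x_full), size=40, replace=False)
x_obs = x_full[idx]
y_obs = truth[idx] + rng.normal(0, 0.07, size=40) + 0.05  # 0.05 = offset

# Fit a GPR to the pseudo-observations and reconstruct the full field.
gpr = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01),
                               normalize_y=True)
gpr.fit(x_obs, y_obs)
y_hat, y_std = gpr.predict(x_full, return_std=True)

# Because the truth is known everywhere, the full-field error is exact,
# which is precisely what a real sparse-data reconstruction cannot give.
rmse = float(np.sqrt(np.mean((y_hat - truth) ** 2)))
```

The key payoff is the last line: the reconstruction can be scored against the complete known distribution, not just against held-out sparse samples.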
The second criticism is that the paper is not very well motivated at present. The authors state repeatedly that the upsampled distribution can be used for many new analyses, but the new product still has almost all of the limitations of the previous product: it is still sparse and uneven in space and time, just less so, and it now has the added complications from layers of machine learning smoothing. While I admit that the new data product is smoother spatially and less biased temporally, I don’t see that the authors have fully solved any problem with their current presentation. To that point, the authors mostly suggest ways that this might now be used, but do not go so far as to demonstrate any such analysis that would be quantitatively improved with the new product. I would like to see either more concrete examples of new analyses shown (not just listed), or, as such an example, a reorientation of the work toward estimating the full Atlantic distribution of the isotopes across space and time. For a spatially complete record they might apply the ML model to the GLODAPv2 gridded product. For a spatially and temporally complete product they might consider either using a time-varying TS product and/or GOBAI-O2 (with estimates of the other predictors from other such ML refits in the literature as necessary). In both cases, there would be some meaningful errors in the predictors, but, at least currently, the authors are suggesting that their estimates are completely insensitive to any plausible error in the predictors, so that may or may not be a concern (I suspect it will be after the uinput is re-evaluated).
Finally, the presentation of the dataset is a bit confusing (I only checked the .mat, but I'm assuming this applies to all files at Zenodo). The file contains essentially all of the fields from GLODAPv2 with their adjusted DI13C, which is called adjusted_C13, capitalizing "C" contrary to the GLODAP convention. If the goal is to make the file supplemental to and interoperable with GLODAPv2, then it would be better to release a file that has the full >10^6 rows, but only contains c13 data and has -999 except for the appropriate Atlantic subset. This way, someone could load GLODAPv2 and then load this file and have them both available and ready to access in identical formats. They could also easily sub in data from, for example, other basins where this data product is missing observations but the GLODAPv2 product has them. This will also remind users to cite both products, rather than just grabbing all of the data from this new product and incorrectly attributing, for example, aou and cfcs to a data product that is only updating C13 and repackaging everything else. Finally, I think the Zenodo link would benefit from more descriptive text or a readme explaining what subset of data is presented, which fields are the new fields, how they are labeled, and how to make the data interoperable with, for example, measurements of DI13C in other ocean basins.
A minor criticism is that the paper is repetitive in places, restating key claims throughout the manuscript.
To reiterate, I generally feel this paper can become a worthwhile contribution and should not be rejected unless these elements cannot be addressed. The text above is focused on constructive criticism, but the fundamentals of the paper remain strong.
Line by line comments:
42: lacked
94: this assertion needs further quantification in the North Atlantic, where there are routinely measurable decadal increases in Canth
97: along A16N, no “the” is needed
123: which standard depths?
125: how are adjustments proposed precisely?
125: how are adjustments validated precisely?
133: please explain this metric. How is consistency at 10^-5 level when the measurement uncertainty is orders of magnitude larger?
150: typically in oceanography, the k fold cross validation is separated by cruise rather than by randomly selecting measurements. This is because cruises are synoptic records of the state of the ocean, and having many other measurements at similar times and locations and measured by the same instruments and the same operators, as are provided by other measurements along a cruise, provides an overly-rosy set of validation statistics. It is therefore important to only use other cruises to construct the validation models for measurements along any given cruise. This validation exercise needs to be redone to follow this practice, or re-written to better convey that this practice was already adopted (if it was).
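For reference, the cruise-wise splitting described here is straightforward with scikit-learn's GroupKFold; the data and regressor below are synthetic stand-ins (not the authors' pipeline), with cruise membership as the grouping label:

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical data: 6 cruises, each contributing a block of 30 samples.
cruise_id = np.repeat(np.arange(6), 30)     # group label per sample
X = rng.normal(size=(180, 3))               # stand-in predictors
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(0, 0.05, 180)

# GroupKFold keeps all samples from a cruise in the same fold, so the
# model is always validated on cruises it has never seen during training.
scores = []
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y,
                                                        groups=cruise_id):
    # No cruise appears on both sides of the split.
    assert set(cruise_id[train_idx]).isdisjoint(cruise_id[test_idx])
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
```

Swapping this in for a random-sample k-fold is typically a one-line change wherever `KFold` or `train_test_split` is currently used.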
215: following this procedure, I would expect the uinput to be larger than it was found to be. To be clear, I’m not surprised that it is small, but I am surprised that it is more than 10 orders of magnitude smaller than other sources of error. Surely a temperature input error of 20,000,000 degrees C would be expected to yield a bad estimate, yet this does not currently appear to be the case by that estimate of uinput. Does that suggest that the model is mostly a fit to the coordinate predictors that are assumed to have no uncertainty? If so, would it make sense to include some uncertainty in these predictors, given that CTD rosettes are not always directly below the ship and the ships don’t always stay exactly on station for a profile? Please also check that the uncertainty reported in the abstract isn’t the MBE of the Monte Carlo analysis. If unchanged, please explain this counter-intuitive finding.
234: repeating comments from line 150
245: what is normalized sample density?
375: This is hinting at an application, but is not itself an application. We’ve only learned about KDEs here, and not about the ocean.
Figure 8b: the darkness of the borders on the mean values make this plot hard to parse. Consider lightening the width of those black lines, somewhat.
8c: consider changing axis limits from 0 to 3, even if this cuts off a miniscule portion of the sample distribution
395: couldn’t you now further parse this information by holding every predictor except xCO2 constant and varying that to estimate the change in the delta that would be expected had all physical and biogeochemical processes been held constant for a decade?
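A minimal sketch of that idea, holding every predictor but xCO2 at a reference state and sweeping xCO2 alone; the linear stand-in model and its coefficients are purely illustrative, not the authors' fitted GPR:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Stand-in trained model with predictors [T, S, xCO2]; the -0.3 xCO2
# coefficient is an illustrative Suess-effect-like sensitivity.
X = rng.normal(size=(200, 3))
y = X @ np.array([0.1, -0.05, -0.3]) + rng.normal(0, 0.02, 200)
model = LinearRegression().fit(X, y)

# Hold T and S at a reference state and sweep only xCO2, isolating the
# delta change attributable to rising CO2 with all else held constant.
grid = np.zeros((5, 3))
grid[:, 2] = np.linspace(0.0, 2.0, 5)       # standardized xCO2 sweep
delta_prediction = model.predict(grid)

# The end-to-end change across the sweep is the isolated xCO2 signal.
sensitivity = float(delta_prediction[-1] - delta_prediction[0])
```

The same sweep applied to the actual GPR would give the decadal delta change expected had all physical and biogeochemical predictors been frozen.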
448: This is a seriously dense sentence. Please break it into two or more sentences and revise them both to employ plain language (limiting jargon and buzzwords) wherever possible.
451: I don’t think a good predictor of local flux is going to lead to a good prediction of local inventory. Consider deleting this sentence.