Reconstruction of δ¹³C_DIC in the Atlantic Ocean: A Probabilistic Machine Learning Approach for Filling Historical Data Gaps

Gao, Hui; Wu, Zelun; Sun, Zhentao; Cai, Diana; Jin, Meibing; Cai, Wei-Jun

doi:10.5194/essd-2025-517

Preprints

https://doi.org/10.5194/essd-2025-517


01 Sep 2025


Status: a revised version of this preprint is currently under review for the journal ESSD.

Reconstruction of δ¹³C_DIC in the Atlantic Ocean: A Probabilistic Machine Learning Approach for Filling Historical Data Gaps

Hui Gao, Zelun Wu, Zhentao Sun, Diana Cai, Meibing Jin, and Wei-Jun Cai

Abstract. Stable carbon isotope composition of marine dissolved inorganic carbon (DIC), δ¹³C_DIC, is a valuable tracer for oceanic carbon cycling. However, its observational coverage remains much sparser than that of DIC and other physical or biogeochemical variables, limiting its full potential. Here, we reconstruct δ¹³C_DIC in the Atlantic Ocean using a probabilistic machine learning framework, Gaussian Process Regression (GPR). We compiled data from 51 historical cruises, including a high-resolution 2023 A16N section, and applied secondary quality control via crossover analysis, retaining 37 cruises for model training, validation, and testing. The trained GPR model achieved an average bias of −0.007 ± 0.082 ‰ and an overall uncertainty of 0.11 ‰, arising from measurement (0.07 ‰), mapping (0.08 ‰), and negligible input-variable (3.77 × 10⁻¹⁴ ‰) errors. Using the GLODAPv2.2023 Atlantic dataset as predictors, the reconstruction expanded the number of acceptable δ¹³C_DIC samples by a factor of 7.65, from 8,941 to 68,435 across the Atlantic basins. The resulting dataset markedly improves the spatial resolution in longitude, latitude, and depth, and provides enhanced temporal continuity over the past four decades. Compared to the sparse original measurements, the reconstruction reduces spatial discontinuities and reveals finer vertical structures consistent with other high-resolution biogeochemical observations. This reconstructed δ¹³C_DIC dataset provides new opportunities to resolve regional carbon cycle dynamics, validate Earth system models, refine estimates of oceanic carbon uptake, and extend climate reanalysis records. The data are publicly accessible at the data repository Zenodo under the following DOI: https://doi.org/10.5281/zenodo.16907402 (Gao et al., 2025).
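
As a quick arithmetic check on the figures quoted in the abstract, the sketch below assumes that the three error components are independent and combine in quadrature (root-sum-square). The abstract does not state the combination rule, so that rule is an assumption, and the variable names are illustrative.

```python
import math

# Component uncertainties quoted in the abstract, in per mil.
u_measurement = 0.07
u_mapping = 0.08
u_input = 3.77e-14

# Assumption: independent components combined in quadrature (root-sum-square).
u_total = math.sqrt(u_measurement**2 + u_mapping**2 + u_input**2)

# Sample counts quoted in the abstract.
n_observed = 8941
n_reconstructed = 68435
expansion_factor = n_reconstructed / n_observed

print(f"combined uncertainty: {u_total:.2f} per mil")
print(f"expansion factor: {expansion_factor:.2f}")
```

Under that assumption, the combined value rounds to the 0.11 ‰ reported, and the input-variable term has no visible effect on the total.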

Received: 21 Aug 2025 – Discussion started: 01 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: final response (author comments only)

RC1: 'Comment on essd-2025-517', Anonymous Referee #1, 21 Oct 2025

This is an interesting and generally well-written study addressing a worthy topic. The paper has good fundamentals and should be able to be made into a solid contribution to the scientific literature. However, I believe it requires iteration, and likely additional analysis, before it will be suitable for publication in this journal.
I have three areas of criticism and one note of caution. The note of caution is simply that I’m skeptical of the u_input calculation; see the line-by-line comments below.
My first criticism is that the validation was not handled as well as it should have been. See the line-by-line comments below for an easy-to-implement and necessary improvement to the validation section. Separately, a suggestion that would further reinforce the validity of the method would be to implement the method in a model environment. This is now common practice for validating machine learning refits of sparse observations, and is likely necessary for a first attempt with carbon isotopes, particularly one with such unusually sparse observations. There are numerous model simulations available that have explicitly simulated carbon isotopes (e.g., https://doi.org/10.5194/gmd-17-1709-2024, though there are many others). It should be workable to obtain one or more such sets of outputs, subsample the distribution(s) across both time and space, apply random and cruise-wide systematic perturbations to the extracted output to represent measurement uncertainties, fit an ML model to the output, reconstruct the full distribution, and then evaluate the strengths and weaknesses of the full 4D reconstruction. This reveals critical information that is not provided by a reconstruction of a sparse data product with uneven and imperfect measurements of an unknown true distribution.
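
The model-environment test described above can be sketched in a few lines. Everything here is hypothetical: `true_field` stands in for simulated isotope output, the cruise tracks are invented, the 0.05 ‰ cruise-wide offset is an assumed value, and inverse-distance weighting is used as a simple stand-in for the reconstruction model (it is not GPR). Only the 0.07 ‰ random error echoes the measurement uncertainty quoted in the abstract.

```python
import math
import random

random.seed(0)

def true_field(lat, depth, year):
    """Hypothetical 'model truth' standing in for simulated d13C-DIC (per mil)."""
    return 1.5 - 0.0004 * depth + 0.3 * math.sin(math.radians(lat)) - 0.02 * (year - 1980)

# 1) Subsample the truth along a few hypothetical cruises (sparse in space and time).
cruises = {"cruise_A": (1990, range(-40, 41, 10)),
           "cruise_B": (2005, range(-30, 61, 15)),
           "cruise_C": (2020, range(0, 61, 20))}
depths = [0, 500, 1000, 2000, 4000]

samples = []
for name, (year, lats) in cruises.items():
    cruise_offset = random.gauss(0.0, 0.05)       # cruise-wide systematic error (assumed)
    for lat in lats:
        for depth in depths:
            noise = random.gauss(0.0, 0.07)       # random measurement error
            value = true_field(lat, depth, year) + cruise_offset + noise
            samples.append(((lat, depth, year), value))

# 2) Stand-in reconstruction: inverse-distance weighting in scaled coordinates.
def reconstruct(lat, depth, year, power=2.0):
    num = den = 0.0
    for (s_lat, s_depth, s_year), value in samples:
        d2 = (((lat - s_lat) / 10) ** 2 + ((depth - s_depth) / 500) ** 2
              + ((year - s_year) / 10) ** 2)
        if d2 == 0.0:
            return value                          # exact hit on a sample location
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den

# 3) Evaluate against the known truth on the full space-time grid.
errors = []
for year in range(1985, 2025, 5):
    for lat in range(-40, 61, 5):
        for depth in depths:
            errors.append(reconstruct(lat, depth, year) - true_field(lat, depth, year))

bias = sum(errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(f"bias = {bias:+.3f} per mil, RMSE = {rmse:.3f} per mil over {len(errors)} grid points")
```

Because the truth is known everywhere, the errors can be mapped in space and time, which is the information a reconstruction of real sparse observations cannot provide.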
The second criticism is that the paper is not very well motivated at present. The authors state repeatedly that the upsampled distribution can be used for many new analyses, but the new product still has almost all of the limitations of the previous product: it is still sparse and uneven in space and time, just less so, and it now has the added complications from layers of machine learning smoothing. While I admit that the new data product is smoother spatially and less biased temporally, I don’t see that the authors have fully solved any problem with their current presentation. To that point, the authors mostly suggest ways that this might now be used, but do not go so far as to demonstrate any such analysis that would be quantitatively improved with the new product. I would like to see either more concrete examples of new analyses shown (not just listed), or, as such an example, a reorientation of the work toward estimating the full Atlantic distribution of the isotopes across space and time. For a spatially complete record they might apply the ML model to the GLODAPv2 gridded product. For a spatially and temporally complete product they might consider using a time-varying TS product and/or GOBAI-O2 (with estimates of the other predictors from other such ML refits in the literature as necessary). In both cases, there would be some meaningful errors in the predictors, but, at least currently, the authors are suggesting that their estimates are completely insensitive to any plausible error in the predictors, so that may or may not be a concern (I suspect it will be after the u_input is re-evaluated).
Finally, the presentation of the dataset is a bit confusing (I only checked the .mat, but I'm assuming this applies to all files at Zenodo). The file contains essentially all of the fields from GLODAPv2 with their adjusted DI13C, which is called adjusted_C13, capitalizing "C" contrary to the GLODAP convention. If the goal is to make the file supplemental to and interoperable with GLODAPv2, then it would be better to release a file that has the full >10^6 rows, but only contains c13 data and has -999 except for the appropriate Atlantic subset. This way, someone could load GLODAPv2 and then load this file and have them both available and ready to access in identical formats. They could also easily sub in data from, for example, other basins where this data product is missing observations but the GLODAPv2 product has them. This will also remind users to cite both products, rather than just grabbing all of the data from this new product and incorrectly attributing, for example, aou and cfcs to a data product that is only updating C13 and repackaging everything else. Finally, I think the Zenodo link would benefit from more descriptive text or a readme explaining what subset of data is presented, which fields are the new fields, how they are labeled, and how to make the data interoperable with, for example, measurements of DI13C in other ocean basins.
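
The file layout suggested above can be illustrated with a small sketch. The rows, the `row_id` key, and all values are hypothetical; only the lowercase `c13` name and the -999 fill value follow the conventions mentioned in the comment.

```python
FILL_VALUE = -999.0

# Hypothetical stand-in for the full GLODAPv2 table: one dict per row, in product order.
glodap_rows = [
    {"row_id": 0, "basin": "Atlantic", "c13": -999.0},
    {"row_id": 1, "basin": "Pacific", "c13": 0.85},
    {"row_id": 2, "basin": "Atlantic", "c13": 1.10},
    {"row_id": 3, "basin": "Indian", "c13": -999.0},
]

# Hypothetical reconstructed values keyed by the same row identifier (Atlantic only).
reconstructed = {0: 1.32, 2: 1.08}

# Full-length supplemental column: reconstruction where available, fill value elsewhere.
c13_reconstructed = [reconstructed.get(row["row_id"], FILL_VALUE) for row in glodap_rows]

# A user can then merge: prefer the reconstruction, fall back to the measured value.
merged = [rec if rec != FILL_VALUE else row["c13"]
          for rec, row in zip(c13_reconstructed, glodap_rows)]

print(c13_reconstructed)
print(merged)
```

Keeping the supplemental column the same length and order as the parent product means no join logic is needed, and the parent product still has to be loaded (and cited) for every other field.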

A minor criticism is that the paper is repetitive in places, repeatedly restating key claims throughout the manuscript.

To reiterate, I generally feel this paper can become a worthwhile contribution and should not be rejected unless these elements cannot be addressed. The text above is focused on constructive criticism, but the fundamentals of the paper remain strong.

Line by line comments:

42: lacked

94: this assertion needs further quantification in the North Atlantic, where there are routinely measurable decadal increases in Canth

97: along A16N, no “the” is needed

123: which standard depths?

125: how are adjustments proposed precisely?

125: how are adjustments validated precisely?

133: please explain this metric. How is consistency at 10^-5 level when the measurement uncertainty is orders of magnitude larger?

150: typically in oceanography, the k-fold cross-validation is separated by cruise rather than by randomly selecting measurements. This is because cruises are synoptic records of the state of the ocean. Having many other measurements at similar times and locations, measured by the same instruments and the same operators, as are provided by other measurements along a cruise, yields an overly rosy set of validation statistics. It is therefore important to only use other cruises to construct the validation models for measurements along any given cruise. This validation exercise needs to be redone to follow this practice, or rewritten to better convey that this practice was already adopted (if it was).

215: following this procedure, I would expect u_input to be larger than it was found to be. To be clear, I’m not surprised that it is small, but I am surprised that it is more than 10 orders of magnitude smaller than other sources of error. Surely a temperature input error of 20,000,000 degrees C would be expected to yield a bad estimate, yet this does not currently appear to be the case by that estimate of u_input. Does that suggest that the model is mostly a fit to the coordinate predictors that are assumed to have no uncertainty? If so, would it make sense to include some uncertainty in these predictors, given that CTD rosettes are not always directly below the ship and the ships don’t always stay exactly on station for a profile? Please also check that the uncertainty reported in the abstract isn’t the MBE of the Monte Carlo analysis. If unchanged, please explain this counterintuitive finding.

234: repeating comments from line 150

245: what is normalized sample density?

375: This is hinting at an application, but is not itself an application. We’ve only learned about KDEs here, and not about the ocean.

Figure 8b: the darkness of the borders on the mean values makes this plot hard to parse. Consider thinning those black lines somewhat.

8c: consider changing axis limits from 0 to 3, even if this cuts off a minuscule portion of the sample distribution

395: couldn’t you now further parse this information by holding every predictor except xCO2 constant and varying that to estimate the change in the delta that would be expected had all physical and biogeochemical processes been held constant for a decade?

448: This is a seriously dense sentence. Please break it into two or more sentences and revise them both to employ plain language (limiting jargon and buzzwords) wherever possible.

451: I don’t think a good predictor of local flux is going to lead to a good prediction of local inventory. Consider deleting this sentence.
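
The concern raised in the comment on line 215, that a Monte Carlo input-error estimate may have been summarized by its mean bias rather than its spread, can be illustrated with a minimal sketch. The linear `toy_model`, its coefficients, and the assumed 1-sigma input errors are all hypothetical and are not taken from the manuscript.

```python
import math
import random
import statistics

random.seed(1)

def toy_model(temperature, salinity):
    """Hypothetical linear stand-in for the trained model (not the authors' GPR)."""
    return 1.0 + 0.03 * temperature - 0.02 * (salinity - 35.0)

# Assumed 1-sigma input errors (illustrative values only).
SIGMA_T = 0.005   # degrees C
SIGMA_S = 0.005   # salinity units

temperature, salinity = 10.0, 35.2
baseline = toy_model(temperature, salinity)

differences = []
for _ in range(10_000):
    perturbed = toy_model(temperature + random.gauss(0.0, SIGMA_T),
                          salinity + random.gauss(0.0, SIGMA_S))
    differences.append(perturbed - baseline)

mean_bias = statistics.fmean(differences)   # near zero for symmetric perturbations
spread = statistics.stdev(differences)      # the quantity that reflects input sensitivity

# First-order analytical propagation for comparison.
expected = math.sqrt((0.03 * SIGMA_T) ** 2 + (0.02 * SIGMA_S) ** 2)

print(f"mean bias = {mean_bias:.2e}, spread = {spread:.2e}, analytic = {expected:.2e}")
```

In this toy case the mean of the perturbed-minus-baseline differences shrinks toward zero as the number of draws grows, while the standard deviation converges to the propagated uncertainty, so reporting the former would understate the input contribution.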

Citation: https://doi.org/10.5194/essd-2025-517-RC1
- AC3: 'Reply on RC1', Hui Gao, 21 Dec 2025
  
  The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-517/essd-2025-517-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/essd-2025-517-AC3
  - AC4: 'Reply on AC3', Hui Gao, 21 Dec 2025
    
    Note on a minor correction and supplementary explanation to our response regarding Comment 395
    
    We wish to clarify a minor typographical error in our previous response and supplement an explanation for the smaller δ¹³C_DIC shifts in older water masses.
    
    First, the correction: In the section describing Figure R2’s consistency with main manuscript results, we incorrectly referenced “main manuscript Figures 8h and 8d” – the correct references are main manuscript Figures 9h and 9d (reconstructed 2023 δ¹³C_DIC and observed 2023 δ¹³C_DIC, respectively). This typo does not alter the core conclusion of our response.
    
    Second, we supplement the explanation for smaller δ¹³C_DIC shifts in older water masses: These minor variations are mostly within the uncertainties, though they may also reflect hydrological changes over the decade.
    
    We apologize for the earlier typo and kindly ask the reviewers to refer to this attached response as the definitive version.
    
    Citation: https://doi.org/10.5194/essd-2025-517-AC4
RC2: 'Comment on essd-2025-517', Patrick Rafter, 22 Oct 2025

The manuscript “Reconstruction of δ¹³C_DIC in the Atlantic Ocean…” as reviewed by Patrick Rafter
First, I’d like to thank the other (anonymous) reviewer for their careful and useful review of this manuscript. If I were the author of this manuscript, I would greatly appreciate the many meaningful and well-informed comments. I don’t fully agree with all their suggestions, but it is undeniably a high-quality review.
For example, I think that, for the most part, this study needs less additional work than the other reviewer suggests. The suggestion to implement the ML method in a model environment would be a very interesting and valuable addition to this work, but I predict the authors’ response will be “outside the scope of the current study”. It sounds to me like a huge amount of new work, but I may be incorrect in this (or it may just be a huge amount of work for *me* and not someone else (it almost surely is)). Note that I do not have the experience in this space to comment on whether this model environment application is “now common practice”, but I will say that this would be a novel (to me), interesting, and seemingly robust application of the methods developed here. I would also like to note that if this manuscript / dataset were to follow the reviewer’s advice, it would boost my score for the “significance” and “data quality” categories into and above the ‘Excellent’ category. As of now, I have scored these as ‘good’.
I also think the motivation is appropriate for this specific study and that the decadal trends in the Kernel Density Estimates (see Fig. 8) are an interesting outcome from this study (as it exists now).
Where I agree with the anonymous reviewer is that I think the new “reconstructed” dataset could be expanded spatially using the GLODAP gridded product, and that this would be a very useful addition for our community. I am assuming these are “minor revisions”, as the ML model is already built and I assume the application to the gridded product will be straightforward (and worth the time for the community to use!). I would also urge the authors to consider the other options listed by the anonymous reviewer to expand the ML methods temporally, although I am unfamiliar with the reviewer’s specific suggestions and cannot comment on the time requirements for such new applications.
Likewise, the other reviewer makes strong comments about the dataset itself. I agree that adding the reconstructed dataset as its own column (with -999 for other basins) to the existing GLODAP data would be very useful for the community. Even better would be for the community to have a gridded product!
Below I have listed notes I made on the manuscript as I read through it.

Line by line notes

27: need to define delta notation

79+: I don’t see a need to shorten “Section” here

100: I like the previous paragraph

132: what exactly does “exhibit high internal consistency” mean? Are there statistics to support this statement?

139: Is GPR an acronym? Perhaps not relevant, but I wanted to know

161: Repeated text

Fig. 2: I like the figure, but as the other reviewer noted, it would be better to use completely independent cruise datasets for the validation as well as the “independent” tests

192: I wonder if other Earth scientists would be as surprised to learn of Mean Absolute Error and Mean Bias Error. I think they might and it might therefore be useful to use a sentence or two describing why these additional metrics are useful to the study

202: Propagated error?

212: perturbed not perturbs

230: I’m unsure where the 10-fold cross-validation comes from

249: This text is also somewhat a repetition of earlier text

259: larger?

272: Incredibly / unbelievably low input variable uncertainty (Uinputs). I wonder if this is a propagation of the input variable uncertainties or an error has been made along the way.

295: Maybe this is not important, but lower case “n” is typically used to describe the sample size

302: Is it expected that there would be a model smoothing tendency?

397: Is there an expectation that the model output would closely align with the observed data? Wasn’t the 2023 data used to predict the “reconstructed data”? I’m not diminishing the work—I honestly think this is an expected outcome of using machine learning.

485: quality-controlled (?)
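
To make the note on line 192 concrete, the following sketch contrasts Mean Bias Error (MBE), Mean Absolute Error (MAE), and RMSE on a small set of made-up values. The numbers are hypothetical and chosen only so that the three metrics differ in an instructive way.

```python
import math

# Hypothetical observed and reconstructed values (per mil), for illustration only.
observed      = [1.00, 1.20, 0.80, 1.50, 0.90]
reconstructed = [1.10, 1.10, 0.90, 1.40, 1.00]

errors = [r - o for r, o in zip(reconstructed, observed)]
n = len(errors)

mbe = sum(errors) / n                              # signed: reveals a systematic offset
mae = sum(abs(e) for e in errors) / n              # typical error magnitude
rmse = math.sqrt(sum(e * e for e in errors) / n)   # weights large errors more heavily

print(f"MBE = {mbe:+.3f}, MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Here positive and negative errors largely cancel in the MBE while the MAE and RMSE stay at the size of the individual errors, which is why a small bias alone does not indicate a precise reconstruction.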

Citation: https://doi.org/10.5194/essd-2025-517-RC2
- AC1: 'Reply on RC2', Hui Gao, 21 Dec 2025
  
  The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-517/essd-2025-517-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/essd-2025-517-AC1
RC3: 'Comment on essd-2025-517', Bin Lu, 15 Nov 2025

General comments:

This manuscript presents a valuable contribution by not only reconstructing δ¹³C_DIC fields in the Atlantic Ocean but also by compiling and providing a high-quality observational dataset that can serve as a fundamental resource for future studies. The authors conduct thorough quality control (QC) and crossover adjustments, and the methodology is overall well organized. The work demonstrates the potential of probabilistic machine learning—particularly Gaussian Process Regression (GPR)—for filling historical data gaps and quantifying uncertainty. These strengths make the dataset a meaningful addition to the ESSD data collection.
However, as a reviewer from an AI background, I am particularly sensitive to the modeling and evaluation methodology. I find that some aspects of the test strategy and dataset partitioning could lead to overestimated performance metrics. In addition, although the study substantially expands the number of δ¹³C_DIC samples (by 7.65 times compared with previous compilations), the reconstructed data remain spatially discontinuous and unevenly distributed. With appropriate revisions to clarify model evaluation, ensure test dataset independence, and temper claims about spatial continuity, the paper will be suitable for publication.
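
One way to enforce the dataset independence asked for here is to assign whole cruises, rather than individual samples, to cross-validation folds. The sketch below is a minimal pure-Python version; `cruise_wise_folds` is a hypothetical helper, and the sample-to-cruise labels are invented (two expocodes named in this review are reused purely as labels).

```python
from collections import defaultdict

def cruise_wise_folds(cruise_ids, n_folds):
    """Yield (train_idx, test_idx) so that no cruise appears on both sides."""
    by_cruise = defaultdict(list)
    for idx, cruise in enumerate(cruise_ids):
        by_cruise[cruise].append(idx)
    # Assign whole cruises to folds, largest first, to keep fold sizes balanced.
    folds = [[] for _ in range(n_folds)]
    for cruise in sorted(by_cruise, key=lambda c: (-len(by_cruise[c]), c)):
        smallest = min(folds, key=len)
        smallest.extend(by_cruise[cruise])
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx

# Hypothetical labels: one cruise identifier per sample.
cruise_ids = (["33MW19930704"] * 4 + ["33RO20050111"] * 3
              + ["cruise_X"] * 2 + ["cruise_Y"] * 3)

for train_idx, test_idx in cruise_wise_folds(cruise_ids, n_folds=3):
    train_cruises = {cruise_ids[i] for i in train_idx}
    test_cruises = {cruise_ids[i] for i in test_idx}
    assert train_cruises.isdisjoint(test_cruises)   # no cruise leaks across the split
    print(sorted(test_cruises), len(test_idx))
```

A stricter variant would pass a repeat-section label (for example, one label for all occupations of the same line) instead of the cruise identifier, so that different years of the same transect never straddle the split.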
Specific comments:
L140–143: Please elaborate more clearly on the specific advantages of the Gaussian Process Regression (GPR) model in this context. In addition, it would strengthen the methodology section if you could provide a quantitative or qualitative comparison with other commonly used machine learning methods such as XGBoost and Random Forest, which are often applied to similar regression problems.
L219–221: The statement that “u_map may alternatively be estimated as the RMSE between reconstructed and observed δ¹³C_DIC on the training dataset” raises some concern. Because the model is already optimized to fit the training data, such an estimate cannot reliably represent its true mapping uncertainty or generalization ability across the Atlantic. Using the training RMSE in this way likely underestimates u_map, leading to an unrealistically low total uncertainty.
L234–236 (Sect. 3.1) and L249–252 (Sect. 3.2): The two test cruises selected (33MW19930704 and 33RO20050111) have been repeatedly sampled across multiple years, and some data from these cruise lines or neighboring years may have been used in training or validation. Moreover, many of the 2023 observations were collected along the same A16 section, meaning the model likely already learned features specific to this transect. Using these data as the test set could therefore inflate the evaluation metrics. Please clarify how you ensured true independence between training and test datasets.
L286–287 and Fig. 5(a): The paper states that 5,997 samples were used for evaluation, which represent the intersection between reconstructed and observed values from 8,941 acceptable δ¹³C_DIC samples in GLODAPv2.2023. I am concerned about potential overlap between these evaluation samples from GLODAP and the training data. Please clarify how independence was maintained and whether any duplicate or overlapping data points were excluded.
L288–289: When calculating the coefficient of determination (R²) between observed and reconstructed values, you might consider using anomaly-based R² (i.e., R² computed from anomalies relative to a local mean or climatology) rather than raw values. This approach could reduce the influence of large-scale offsets and provide a more realistic assessment of the model’s ability to reproduce spatial–temporal variations.
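
The anomaly-based R² suggested in the last comment can be sketched as follows. The data are hypothetical: two depth layers with very different mean values, and a "reconstruction" that captures only the layer means. Using the observed group mean as the local reference is an assumption; a climatology could be substituted.

```python
from collections import defaultdict

def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def remove_group_means(values, groups, reference):
    """Subtract, per group, the mean of `reference` (here an observed local mean)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ref, g in zip(reference, groups):
        sums[g] += ref
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical data: surface values near 1.5 per mil, deep values near 0.5 per mil.
groups    = ["surface"] * 4 + ["deep"] * 4
observed  = [1.40, 1.50, 1.60, 1.50, 0.40, 0.50, 0.60, 0.50]
predicted = [1.50, 1.50, 1.50, 1.50, 0.50, 0.50, 0.50, 0.50]  # layer means only

raw_r2 = r_squared(observed, predicted)

# Remove the same observed layer mean from both series before scoring.
obs_anom = remove_group_means(observed, groups, reference=observed)
pred_anom = remove_group_means(predicted, groups, reference=observed)
anomaly_r2 = r_squared(obs_anom, pred_anom)

print(f"raw R^2 = {raw_r2:.3f}, anomaly R^2 = {anomaly_r2:.3f}")
```

In this toy case the raw R² is high because the surface-to-deep contrast dominates the variance, whereas the anomaly R² is close to zero because none of the within-layer variability is reproduced.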

Citation: https://doi.org/10.5194/essd-2025-517-RC3
- AC2: 'Reply on RC3', Hui Gao, 21 Dec 2025
  
  The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-517/essd-2025-517-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/essd-2025-517-AC2


Data sets

Reconstruction of δ¹³C_DIC in the Atlantic Ocean: A Probabilistic Machine Learning Approach for Filling Historical Data Gaps. Hui Gao, Zelun Wu, Zhentao Sun, Diana Cai, Meibing Jin, and Wei-Jun Cai. https://doi.org/10.5281/zenodo.16907402


Viewed

Total article views: 1,362 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,220	103	39	1,362	39	50


Views and downloads (calculated since 01 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	936	16	7	959
Oct 2025	137	26	12	175
Nov 2025	75	22	8	105
Dec 2025	72	39	12	123



Latest update: 27 Dec 2025

Short summary

Observations of stable carbon isotopes in dissolved inorganic carbon are sparse, limiting their potential in carbon cycle studies. We compiled 51 cruises and used a machine learning method trained on 37 cruises that passed secondary quality control to reconstruct isotope values in the Atlantic. The reconstruction expands usable samples from 8,941 to 68,435, reducing noise, filling gaps, preserving decadal trends, and strengthening studies of carbon variability and model validation.

