the Creative Commons Attribution 4.0 License.
Barium in seawater: Dissolved distribution, relationship to silicon, and barite saturation state determined using machine learning
Adam Subhas
Heather Kim
Ann Dunlea
Laura Whitmore
Alan Shiller
Melissa Gilbert
William Leavitt
Tristan Horner
Abstract. Barium is widely used as a proxy for dissolved nutrients and particulate organic carbon fluxes in seawater. However, these proxy applications are limited by insufficient knowledge of the dissolved distribution of Ba ([Ba]). For example, there is significant spatial variability in the Ba–Si relationship, and ocean chemistry may influence sedimentary Ba preservation. To help address these issues, we developed 4,095 models for predicting [Ba] using Gaussian Process Regression Machine Learning. These models were trained to predict [Ba] from standard oceanographic observations using GEOTRACES data from the Arctic, Atlantic, Pacific, and Southern Oceans. Trained models were then validated by comparing predictions against withheld [Ba] data from the Indian Ocean. We find that a model using depth, T, S, [O2], [PO4], and [NO3] as predictors can accurately predict [Ba] in the Indian Ocean with a mean absolute percentage deviation of 6.3 %. We use this model to simulate [Ba] on a global basis using these same six predictors in the World Ocean Atlas. The resulting [Ba] distribution constrains the total Ba budget of the ocean to 122±8 × 10^12 mol and clarifies the global relationship between dissolved Ba and Si. We also calculate the saturation state of seawater with respect to barite, revealing that the ocean below 1,000 m is, on average, at or near saturation. We describe a number of possible applications for our model output, ranging from use in biogeochemical models to paleoproxy calibration. Our approach could be extended to other trace elements with relatively minor adjustments and demonstrates the utility of machine learning to accurately simulate the distributions of tracers in the sea.
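The count of 4,095 models is consistent with fitting one model for every non-empty subset of twelve candidate predictors (2^12 − 1 = 4,095). That reading is an inference from the number itself. Only eleven predictors are named in the abstract and the referee comments, so the twelfth name below is a placeholder. The following is a minimal sketch of that enumeration and of the mean absolute percentage deviation used to score predictions. The function names are hypothetical and it is not the authors' code.

```python
from itertools import combinations

# Eleven names appear in the abstract and referee comments; "feature_12" is a
# placeholder because the twelfth predictor is not named in this text.
FEATURES = ["lat", "lon", "depth", "T", "S", "O2", "PO4", "NO3", "Si",
            "MLD", "Chl_a", "feature_12"]


def feature_subsets(features):
    """Yield every non-empty combination of candidate predictors."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def mapd(observed, predicted):
    """Mean absolute percentage deviation between two sequences, in percent."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# One candidate model per non-empty subset of the twelve features.
n_models = sum(1 for _ in feature_subsets(FEATURES))
```

Each subset would then be used to train one regression model, and `mapd` would be evaluated on the withheld data to rank the models.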
Öykü Mete et al.
Status: open (until 27 Apr 2023)
RC1: 'Comment on essd-2023-67', Anonymous Referee #1, 14 Mar 2023
The manuscript “Barium in seawater: Dissolved distribution, relationship to silicon, and barite saturation state determined using machine learning” by Mete et al. develops a Gaussian Process Regression machine learning (ML) approach to predict dissolved Ba ([Ba]) in the ocean. This study is significant for understanding the marine Ba cycle because it provides a global picture of the vertical and spatial distribution of [Ba] and Ωbarite and suggests factors that are intimately linked to [Ba]. It is exciting that the ML-derived Ba profiles are in excellent agreement with in situ data. The manuscript is well-reasoned and well-written. I enjoyed reading it and am happy to recommend it for publication if the following concerns can be addressed. I hope the authors find my comments constructive and that they help make the manuscript more impactful.
Sect. 2 and 3.1: The ML model split the observed datasets into two partitions: the data from the Arctic, Atlantic, Pacific, and Southern Oceans were used for model training, whereas the data from the Indian Ocean were reserved for model testing. Yes, as indicated by the authors, the location-based training-testing separation is to minimize overfitting. However, we also need to be careful that the training data happen to perfectly cover the minimum and maximum [Ba] (according to Figs. 4A-7A), so [Ba] in the Indian Ocean is very well predicted. I would like to know whether the ML model also works well when testing data fall outside training data. That said, it is necessary to include the randomly assigned training-testing separation results for comparison in the appendix.
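The comparison requested here can be sketched as follows, assuming each observation is a record with a basin label and a [Ba] value. The record layout, function names, and numbers are hypothetical and serve only to contrast a location-based split with a random one and to check whether test values fall outside the training range.

```python
import random


def split_by_basin(records, test_basins):
    """Location-based split: withhold every record from the named basins."""
    train = [r for r in records if r["basin"] not in test_basins]
    test = [r for r in records if r["basin"] in test_basins]
    return train, test


def split_randomly(records, test_fraction, seed=0):
    """Random split: withhold a fixed fraction regardless of location."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fraction_outside_training_range(train, test, key="ba"):
    """Share of test records whose value lies outside the training min/max."""
    lo = min(r[key] for r in train)
    hi = max(r[key] for r in train)
    outside = sum(1 for r in test if not lo <= r[key] <= hi)
    return outside / len(test)


# Made-up values for illustration only (not GEOTRACES data).
records = [
    {"basin": "Atlantic", "ba": 40.0},
    {"basin": "Pacific", "ba": 150.0},
    {"basin": "Southern", "ba": 100.0},
    {"basin": "Arctic", "ba": 60.0},
    {"basin": "Indian", "ba": 45.0},
    {"basin": "Indian", "ba": 120.0},
]
train, test = split_by_basin(records, {"Indian"})
outside = fraction_outside_training_range(train, test)
```

In this toy case both Indian Ocean values sit inside the training range, which is the situation the referee describes. A test set with values beyond that range would probe extrapolation instead.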
For the paper to benefit the community, additional discussion about the implications of existing interpretations that rely on Ba* would be of great interest. Unlike [Ba] or Ωbarite, the scientific significance of Ba* is not clear in the current version of the manuscript. Specifically, what do the positive and negative Ba* mean? Does the global Ba* heterogeneity in Figs. 4-7 reveal oceanographic and biogeochemical processes affecting the dissolved Ba-Si relationship? I believe the relationship to silicon is one of the main targets of this study which requires in-depth discussion.
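As a point of reference for the sign question, the sketch below assumes that Ba* is the residual of observed [Ba] from a linear Ba–Si fit, as in earlier work. The text here does not give the manuscript's exact definition or coefficients, so the slope and intercept are fitted from made-up numbers and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ba_star(ba, si, slope, intercept):
    """Residual of observed [Ba] from the value predicted by the Ba-Si line."""
    return [b - (slope * s + intercept) for b, s in zip(ba, si)]


# Made-up, perfectly linear values used only to define the reference line.
si_ref = [0.0, 50.0, 100.0]
ba_ref = [40.0, 70.0, 100.0]
slope, intercept = fit_line(si_ref, ba_ref)

# A sample holding more Ba than the line predicts for its Si gives a positive value.
excess = ba_star([80.0], [50.0], slope, intercept)[0]
```

Under this assumed definition, a positive Ba* means more dissolved Ba than the Ba–Si line predicts for that [Si], and a negative Ba* means less.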
Sect. 5.1: When the authors identified the optimal predictor model, they eliminated features that offer the least improvement to ML model performance. Why are only MLD and chlorophyll a eliminated, but not salinity, given that it improves the model by a similarly small amount (i.e., -3%)? Including salinity tends not to change the MAD much due to its high p-value. The authors need to justify it further.
Minor comments:
L81: The solubility product Ksp is a constant at a given temperature and pH. Thus, Ksp values at different depths are different. The text needs to clarify how [SO4] and Ksp are assigned.
L266-267: [Si]in situ and [Ba]in situ from the WOA, [Ba]predicted from ML model output?
Fig.3B and L502-513: The authors attribute the deviation between observed and ML-modeled [Ba] from SS259 in the deep Bay of Bengal to the uncertainty of in situ [Ba] measurements. Could this deviation result from the factors eliminated from Model #3336? This possibility needs to be discussed at least.
Citation: https://doi.org/10.5194/essd-2023-67-RC1
RC2: 'Comment on essd-2023-67', Christophe Monnin, 21 Mar 2023
L. 348 et seq. The criterion for equilibrium cannot be Ω = 1.0000000…, which never happens. Instead, a range of Ω values must be defined that reflects the uncertainties in the input data (Ba and SO4 concentrations) and in the thermodynamic model (barite solubility product and activity coefficient). Taking Ω values between 0.9 and 1.1 is already a very demanding criterion, which we retained in our paper (Monnin et al., 1999). The discussion in this paragraph should be corrected accordingly. For example, Ω = 0.97 can be considered a sign of barite equilibrium. See Monnin et al., 1999 for a discussion. I see that the authors discuss this point in section 5.2.3.
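The equilibrium window described above can be written as a small classifier. This is a sketch with hypothetical function names. The default tolerance of 0.1 reproduces the 0.9 to 1.1 range from the comment, and the saturation state is taken as the ratio of an ion activity product to the solubility product, both assumed to be supplied by the caller.

```python
def saturation_state(ion_activity_product, ksp):
    """Omega: ion activity product divided by the solubility product."""
    return ion_activity_product / ksp


def classify_omega(omega, tolerance=0.1):
    """Label a saturation state using an equilibrium window of 1 +/- tolerance."""
    if omega < 1.0 - tolerance:
        return "undersaturated"
    if omega > 1.0 + tolerance:
        return "supersaturated"
    return "equilibrium"


# The referee's example: 0.97 lies inside the 0.9-1.1 window.
label = classify_omega(0.97)
```

A wider or narrower window can be passed through `tolerance` if the uncertainties in [Ba], [SO4], or the thermodynamic model warrant it.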
Also the statement in the abstract that "the ocean below 1,000 m is, on average, at or near saturation" is a simplification of what has been previously depicted by Monnin et al. (and by Rushdi et al.). For example we wrote that in the Pacific Ocean "There is a return to undersaturation of the water column at depths of about 3500 m in the Pacific and of about 2500 m in the Southern Ocean. The reverse is found for GEOSECS station 446 in the Gulf of Bengal for which the highest Ba concentrations can be found at depth: surface waters are undersaturated and equilibrium is reached below 2000 m". This simplifying statement by the authors deteriorates the conclusions that they have obtained with their powerful and elaborate approach.
The discussion averaging barite saturation for the global ocean (L. 579-580:" the ocean below 1,000 m exhibits a mean Ω barite ≥0.92, which implies that much of the deep ocean is close to saturation with respect to BaSO4") tends to hide the fact that specific regions of the global ocean do not fit in this picture and should be given close attention (e.g. the role of hydrothermal activity above ridges, or, as discussed by the authors, the continental input through river discharge).
This being said, the paper is quite well organized, the presentation of the model and of the results quite concise.
Although my opinion is that the discussion repeats in part what has been concluded from what the authors call mechanistic models and as such should be simplified, the paper presents a very good account of the Ba problem in the ocean and a way to address it (by what could be called "brute force"…). It could be published as it is. The fact that the model can be adapted for other tracers with a minimal effort is quite encouraging.
L. 67: missing figure number
Typo in the vertical axis of Fig. 4, 5 and 6D: replace the lower 1.75 by 0.75.
L.338. Sentence construction inadequate.
Citation: https://doi.org/10.5194/essd-2023-67-RC2
RC3: 'Comment on essd-2023-67', Frank Pavia, 28 Mar 2023
I really enjoyed reading this paper by Mete et al. The manuscript was well-written, well-organized, extremely clear, and the work generates a product that should be used by chemical oceanographers and paleoceanographers alike. I have a few comments, and only the first is substantial.
The decision to only use Indian Ocean data as the validation dataset is definitely curious, and I think not justified well enough in the text. The authors cite Rafter et al. 2019 as their source for doing location-based separation of training and test data to avoid overfitting. Rafter et al., however, don't isolate a single basin for this: they use whole ship transects as their withheld data, and these transects span multiple basins, hemispheres, and latitudes. Testing a globally-trained dataset on a regionally-confined subset of data doesn't, at least to a reader not well-versed in these sorts of choices, inspire the maximum amount of confidence in the results of the global output of the model. Perhaps the authors could more completely explain this choice to bulwark against this criticism.
Figures 4-7: The 0.75 value for barite saturation is labeled as 1.75 in all these figures.
Line 342: Change Look to Looking
Figure 8: I think it would be helpful to have a key/legend for the basins in A and C, similar to the key for depths in B and D. It is not easy to match the text color of the basins to the colors of the histograms in panels A and C.
Lines 507-513: Did all or many of the best models tend to produce the same systematic mismatch between predicted and measured Ba for the Singh et al. 2013 data? It would be helpful to know, to make sure it isn't a quirk of this specific model.
Section 5.3. I really enjoyed reading this section and am looking forward to the community's use of this data product.
Citation: https://doi.org/10.5194/essd-2023-67-RC3
RC4: 'Comment on essd-2023-67', Anonymous Referee #4, 28 Mar 2023
Summary: This study uses a machine learning approach to reconstruct global Ba concentrations in the ocean, and uses the model output to calculate Ba* and barite saturation state in the global ocean. In general this is a solid study that provides model output that will be useful to other researchers, and the methodology is sound, with one exception that I detail below. I think that with minor revisions the study should be acceptable for publication.
Specific Comments:
- Line 89: I disagree that mechanistic modeling should be called the “gold standard”. A model is useful if one can learn something from it, period. Some mechanistic models are useful, some statistical models are useful.
- Line 104: The entire process and methodology of this study seems to owe a large intellectual debt to ML-based trace metal modeling studies of Roshan et al. These pioneering studies should be acknowledged here, e.g. Roshan et al. (2018), Roshan et al. (2020)
- Line 196: Explain what you mean by “non-parametric” and “kernel-based”
- Line 196: What is the specific MATLAB function, and what options did you specify?
- Line 199: Explain the meaning of “basis” and “kernel-function” parameters
- Line 310: The p-values seem to be meaningless. Not sure they add any value here.
- Figure 8: Are these values volume-normalized? If not, they would skew toward surface values where grid boxes are smaller.
- Section 5.1: It makes sense to remove models with lat and lon as predictors. After that, I disagree with all of the choices presented in this section, which ultimately lead to the choice of 1 model out of a possible 1,687 — talk about overfitting!
- Eliminating models with Chl-a and MLD predictors: I will accept eliminating Chl-a, since including it degraded the median model. But just because including MLD only improved the average model by 3% is not a good reason to remove it as a predictor. You have a small sample size in the validation set, and MLD may encode key information for particular environments that are under-represented in the validation set. If it improves the model on average, it is reasonable to keep it.
- Eliminating models with Si eliminates the strongest predictor, which seems foolish. There is no reason to eliminate Si just because it appears in the definition of Ba*, which is not even in the target data. If you want the model to predict Ba* in addition to Ba, you could add that to the target when you train the models, but that is still no reason to remove Si from the predictor data (if it were, Si wouldn’t even be in the list of features that you consider for this model).
- The reason given for eliminating models with <=4 features is not valid. The analysis shows that *on average* the models with 5-8 predictors performed best (Figure 3). But that doesn’t mean that there are not models with <5 predictors that could perform just as well and be just as probable (in fact there clearly are, as shown in Figure 3). It is arbitrary to eliminate these models.
- In general, there is simply no good reason to choose 1 model as the “optimal” model. In fact the great benefit of the model testing that the authors have done is that it affords an ensemble of models from which to choose, many of them being equally or approximately equally probable. If one wants to “weight” the models, one could do so by defining a probability function (MAD or something similar would do) and assigning a probability to each of the models. This would be better than simply choosing one single model (equivalent to assigning that model a probability of 1 and all the other models a probability of 0).
- Line 414: Figure 3 doesn’t show sea surface Ba.
- Line 426: Or maybe the model is just wrong in those regions. Do any other of the possible models (e.g., not model #3336) show elevated Ba at those locations?
- Line 430: Sure, it’s reasonable. It’s just unreasonable to say that there are no other possibilities.
- Line 551: It would be better to base such uncertainties on an ensemble of most-probable models (rather than a single model)
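The ensemble weighting suggested in the comments above could look like the sketch below. The exponential form of the weight and the `scale` parameter are assumptions chosen for illustration, since the comment only says that MAD or something similar would do. The function names and numbers are hypothetical.

```python
import math


def mad_weights(mads, scale):
    """Turn validation MADs into normalized weights; lower MAD gets more weight."""
    raw = [math.exp(-m / scale) for m in mads]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_mean_and_spread(predictions, weights):
    """Weighted mean and weighted standard deviation across model predictions."""
    mean = sum(w * p for w, p in zip(weights, predictions))
    variance = sum(w * (p - mean) ** 2 for w, p in zip(weights, predictions))
    return mean, math.sqrt(variance)


# Two hypothetical models with equal validation MAD predicting [Ba] at one point.
weights = mad_weights([6.0, 6.0], scale=2.0)
mean, spread = ensemble_mean_and_spread([100.0, 110.0], weights)
```

Selecting a single model corresponds to the limiting case where one weight is 1 and all others are 0. The weighted spread gives one possible basis for the ensemble-derived uncertainty mentioned for Line 551.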
References:
- Roshan, S., DeVries, T., Wu, J., & Chen, G. (2018). The internal cycling of zinc in the ocean. Global biogeochemical cycles, 32(12), 1833-1849.
- Roshan, S., DeVries, T., & Wu, J. (2020). Constraining the global ocean Cu cycle with a data‐assimilated diagnostic model. Global Biogeochemical Cycles, 34(11), e2020GB006741.
Citation: https://doi.org/10.5194/essd-2023-67-RC4
Data sets
Distribution of dissolved barium in seawater determined using machine learning T. J. Horner and O. Z. Mete https://www.bco-dmo.org/dataset/885506