A global monthly 3D field of seawater pH over 3 decades: a machine learning approach

Zhong, Guorong; Li, Xuegang; Song, Jinming; Qu, Baoxiao; Wang, Fan; Wang, Yanjun; Zhang, Bin; Cheng, Lijing; Ma, Jun; Yuan, Huamao; Duan, Liqin; Li, Ning; Wang, Qidong; Xing, Jianwei; Dai, Jiajia

doi:https://doi.org/10.5194/essd-17-719-2025

Articles | Volume 17, issue 2

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data description paper | 24 Feb 2025

Download

Final revised paper (published on 24 Feb 2025)
Supplement to the final revised paper
Preprint (discussion started on 15 May 2024)
Supplement to the preprint

Interactive discussion

Status: closed

RC1:
'Comment on essd-2024-151', Anonymous Referee #1, 11 Jun 2024

The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2024-151/essd-2024-151-RC1-supplement.pdf

Citation: https://doi.org/10.5194/essd-2024-151-RC1
- AC1: 'Reply on RC1', Guorong Zhong, 04 Aug 2024
  
  The reply on RC1 is listed in the attached PDF file.
  
  Citation: https://doi.org/10.5194/essd-2024-151-AC1
RC2:
'Comment on essd-2024-151', Anonymous Referee #2, 30 Jun 2024

Please find the Reviewer's comments in the attached document.

Citation: https://doi.org/10.5194/essd-2024-151-RC2
- AC2: 'Reply on RC2', Guorong Zhong, 04 Aug 2024
  
  The reply on RC2 is listed in the attached PDF file.
  
  Citation: https://doi.org/10.5194/essd-2024-151-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Guorong Zhong on behalf of the Authors (07 Aug 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (03 Sep 2024) by Frédéric Gazeau

RR by Anonymous Referee #2 (17 Sep 2024)

RR by Anonymous Referee #1 (20 Sep 2024)

Suggestions for revision or reasons for rejection

Review of: “A global monthly 3D-field of seawater pH over 3 decades: a machine
learning approach” by G. Zhong et al.; submitted to Earth System Science Data
First of all, I would like to thank the authors for the careful and thoughtful responses to my comments
and suggestions. I believe their revisions have improved the manuscript, and the new details make it
clearer to read. However, there are still some issues that deserve to be addressed before the manuscript
can be published.
Specific points to raise:
- To avoid confusion regarding temperature, I suggest adding the following clarification: "Here, we
present a monthly four-dimensional 1°×1° gridded product of global seawater pH at total scale and in-situ temperature (without standardization to 25°C)."
This will ensure that readers understand the product focuses on pH measurements, while avoiding any
implication that it is also a temperature product. I recommend adding this clarification at line 60 and
elsewhere when necessary.
- The use of sin(Lat) as a predictor is questionable since latitude is not circular.
Response: This normalization method was inspired by previous research, such as Denvil-Sommer,
A., et al. (2019), where latitude and longitude were also normalized to radians using sine and cosine
transformations. Also, we have corrected the description names in Table 1 to "Sine of (latitude ·
π/180°)", "Sine of (longitude · π/180°)", and "Cosine of (longitude · π/180°)". As we used the "sind"
and "cosd" functions in MATLAB (sind(latitude) equals sin(latitude · π/180°)), the original description
was misleading and has been corrected.
The use of sin(Lat) as a predictor remains questionable, as latitude is not a circular (or periodic)
variable. The response mentions that this normalization method was inspired by previous research,
such as Denvil-Sommer et al. (2019), and the correction to the description in Table 1 has been made.
However, I still have concerns for the following reasons:
• Sine and cosine functions are typically applied to periodic variables, such as longitude or day of
the year, where values "wrap around." Latitude, on the other hand, is not periodic—lat = -90° is
not equivalent to lat = 90°.
• Furthermore, the expression sin(lat · π/180°) seems inappropriate, as radians conversion should
account for the full range of latitude values. If anything, it would be sin(lat · π/90°), but even
this is not ideal for latitude.
Given these issues, I maintain that latitude is not a suitable candidate for inclusion via sine and cosine
transformations.
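The numerical behaviour of the transformations discussed in this point can be checked directly. The following is a minimal Python sketch, not the authors' code: the `sind` and `cosd` helpers are illustrative stand-ins for the MATLAB functions named in the response, and the sample latitudes are arbitrary.

```python
import math

def sind(deg):
    """Sine of an angle given in degrees (stand-in for MATLAB's sind)."""
    return math.sin(math.radians(deg))

def cosd(deg):
    """Cosine of an angle given in degrees (stand-in for MATLAB's cosd)."""
    return math.cos(math.radians(deg))

lats = [-90, -45, 0, 45, 90]

# sin(lat * pi/180): the form described in the corrected Table 1.
sin_lat_180 = [round(sind(lat), 6) for lat in lats]

# sin(lat * pi/90): the alternative scaling mentioned in the comment.
sin_lat_90 = [round(math.sin(lat * math.pi / 90), 6) for lat in lats]

# Longitude encoded as a (sin, cos) pair: -180 and 180 land on the same point.
lon_west = (round(sind(-180), 6), round(cosd(-180), 6))
lon_east = (round(sind(180), 6), round(cosd(180), 6))

print(sin_lat_180)
print(sin_lat_90)
print(lon_west == lon_east)
```

Over the range -90° to 90°, sin(lat · π/180) increases monotonically, so the two poles receive distinct values (-1 and 1) and no wrap-around is introduced. With the π/90 scaling, both poles and the equator map to 0. The (sin, cos) pair for longitude makes -180° and 180° coincide, which is the periodic behaviour the comment associates with longitude.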
- Clarify how depth is used as a predictor and whether it corresponds to the depth of retrieval of the
output or if the FFNN estimates X values for X depth levels.
Response: Thanks for the suggestion. Depth was used in the same way as latitude or time-related
variables in Table 1. The sample depths of GLODAP measurements were input into FFNNs during the
training process, and the depths of 41 depth layers defined as target output layers were input into
FFNNs during the interpolation process to generate a product covering 0-2000m. The description has
been added in the 2.1 section as the following: "Temporal and spatial sample information, including
latitude, longitude, depth and sample time, was also used as supplementary variables. Latitude and
longitude were normalized to radians using sine and cosine transformations, to present connected
sample position information. The spatial sample position and time information of GLODAP
measurements were input in the training of FFNNs, and the spatial position and time of defined 1° and
monthly product grids were input into FFNNs during the interpolation process to output a gridded
product."
Thank you for the explanation regarding how depth is used as a predictor in the FFNN models.
However, I am still unclear as to why pressure (pres) is not systematically included as a predictor in
every FFNN (as seen in Table 2). Given that depth-related information is a critical factor, especially in
oceanographic models, it seems logical that pres would be consistently used alongside depth.
Additionally, this raises the broader question of why other key spatial-temporal predictors, such as
longitude, latitude, and time, are not always systematically included as inputs in the FFNNs. It's unclear
why time, in particular, is only integrated in some models and not others, given its fundamental
importance in understanding temporal variability in the data.
I suggest providing a clearer rationale for the selective use of these variables and ensuring that key
predictors are consistently applied across all FFNNs, or explaining why their inclusion is sometimes
omitted.
- Adding a column to Table 1 to indicate which process each variable is associated with would be
informative.
Response: Thanks for the suggestion. The related processes have been added in Table 1 as the
following:
Thank you for incorporating the suggestion to add the related processes to Table 1. However, I believe
further clarification is still needed for some variables. Specifically, variables like PAR, KD, RRS, and
Ta/b may also be associated with the biological production of organic matter, as they are crucial in this
context. Ensuring that these variables are linked to biological processes in the table will provide a more
complete understanding of their roles. I don’t think that ‘Supplementary for lacking interannual
variability of other variables, or potential correlation with unclear process affecting pH’ is relevant for
these variables.
- Line 191: The paragraph is unclear. The statement, “Therefore, the uncertainty of our pH product was
directly estimated from the FFNN pH predicting errors, instead of synthesizing the inherent uncertainty
of each used predictor product,” needs further clarification. How was this done?
Response: As described in equation (2), the uncertainty was estimated from the local pH value and the pH
prediction error in the corresponding province. For the uncertainty in a given grid cell, we first convert the pH
prediction error in the corresponding province into a difference in [H+], by converting the predicted and
GLODAP-measured pH values to [H+] and then calculating the RMSE. Subsequently, the RMSE of [H+]
was converted to a pH uncertainty based on the local pH value:
$\sigma = -\log_{10}\left(10^{-\mathrm{pH}_0} - \mathrm{RMSE}_{[\mathrm{H}^+]}\right) - \mathrm{pH}_0$
where RMSE[H+] was the RMSE of [H+] converted from the FFNN pH prediction error in each
vertical layer and in each biogeochemical province, and pH0 was the local predicted pH value in the grid cell
for which the uncertainty was estimated. Because the inherent uncertainty of some predictor products was
unavailable, estimating the uncertainty from the inherent uncertainties of the predictor products was not feasible.
Thank you for the detailed response, but the explanation is still unclear regarding how the method
described provides local uncertainties. Specifically, the statement “the uncertainty of our pH product
was directly estimated from the FFNN pH predicting errors, instead of synthesizing the inherent
uncertainty of each used predictor product” remains ambiguous.
While equation (2) describes converting pH prediction errors into RMSE for [H+] and then back to pH
uncertainty, it’s still not clear how this process provides uncertainty estimates at a local (grid-specific)
level. Could you provide more detailed clarification on how this method operates for each grid and how
the local predicted pH value (pH₀) factors into the uncertainty calculation? Additionally, a more
intuitive explanation of why inherent uncertainty from each predictor was not feasible would help
clarify this point for readers.
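To make the role of the local pH value concrete, here is a minimal Python sketch of the conversion as described in the response: pH prediction errors are turned into an RMSE in [H+], which is then converted back to a pH uncertainty at a given local pH. It assumes equation (2) reads σ = −log10(10^(−pH0) − RMSE[H+]) − pH0. The function names and all numbers are hypothetical and are not taken from the manuscript.

```python
import math

def rmse_h(predicted_ph, measured_ph):
    """RMSE of [H+] computed from paired predicted and measured pH values."""
    squared = [(10 ** -p - 10 ** -m) ** 2
               for p, m in zip(predicted_ph, measured_ph)]
    return math.sqrt(sum(squared) / len(squared))

def ph_uncertainty(ph0, rmse):
    """Assumed form of equation (2): -log10(10**-ph0 - rmse) - ph0."""
    return -math.log10(10 ** -ph0 - rmse) - ph0

# Hypothetical validation pairs for one province and one depth layer.
predicted = [8.05, 7.98, 8.10, 7.90]
measured = [8.03, 8.00, 8.08, 7.93]
rmse = rmse_h(predicted, measured)

# The same province-level RMSE applied at two different local pH values.
sigma_high_ph = ph_uncertainty(8.10, rmse)
sigma_low_ph = ph_uncertainty(7.70, rmse)
print(rmse, sigma_high_ph, sigma_low_ph)
```

Under this reading, a single RMSE[H+] per province and layer yields a different pH uncertainty in each grid cell, because a fixed [H+] error is a larger fraction of [H+] where the local pH is higher. That dependence on pH0 would be the only source of grid-level variation within a province and layer.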
- Moreover, it would be interesting to add comparison against qualified pH data from BGC-Argo
dataset.
Response: Thanks for the suggestion. Comparison of global scale trend has been added in Table 4. The
BGC ARGO pH data qualified by IMOS has been added in the validation section. Different from the
validation results based on the GLODAP dataset, the RMSE between FFNN pH and BGC ARGO pH
data is higher in the deep ocean. Only the bias between FFNN pH and BGC ARGO pH data tends to
increase with depth in most basins. In contrast, greater biases between FFNN pH and GLODAP pH
occur mainly in the surface layer. Especially in the Southern Ocean, the bias between FFNN pH and
GLODAP pH is nearly zero below 1000 m, notably lower than biases between FFNN pH and BGC
ARGO pH data ranging from 0.053 to 0.076. This may be primarily attributed to the discrepancies
between GLODAP dataset and the BGC ARGO dataset in the deep ocean, as our product was based on
GLODAP dataset and small biases with GLODAP pH were observed in the deep ocean.
Thank you for incorporating the comparison with the BGC-Argo pH data into the validation section.
However, there are several important points that still need to be addressed:
• BGC-Argo should be written with proper formatting (i.e., "BGC-Argo," not all caps for "Argo").
• It would be valuable to include a reference to the BGC-Argo dataset, such as Claustre et al.
(2020) https://www.annualreviews.org/content/journals/10.1146/annurev-marine-010419-010956.
• As it is mentioned that only data qualified by IMOS were used, does this imply that the
validation was limited to the Southern Ocean and data from CSIRO? If so, the validation is not
truly global. For a more comprehensive validation, I suggest using data from all DACs (Data
Assembly Centers), accessible via the GDACs (Global Data Assembly Centers). There are two
GDACs available: US GDAC and France Coriolis GDAC (https://argo.ucsd.edu/data/data-from-gdacs/).
• A geographical map of BGC-Argo pH profiles should be included to visualize where the
validation was performed.
• Specific details regarding the data used in the validation should be provided. Only delayed-mode pH-adjusted data with QC (Quality Control) 1 applied should be used for a robust
comparison.
• Additionally, BGC-Argo should be acknowledged in the acknowledgments, following the
guidelines here: https://argo.ucsd.edu/data/acknowledging-argo/.
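
The selection criteria listed above (delayed-mode, adjusted pH, QC flag 1) and a per-basin summary could be sketched as follows. This is a minimal Python illustration under stated assumptions: the record layout, the field names (`data_mode`, `ph_adjusted_qc`, `ph_adjusted`, `product_ph`, `basin`) and all values are hypothetical and do not correspond to the actual Argo file format or to the authors' workflow.

```python
from collections import defaultdict

def select_validation_samples(samples):
    """Keep only delayed-mode samples whose adjusted pH carries QC flag 1."""
    return [s for s in samples
            if s["data_mode"] == "D" and s["ph_adjusted_qc"] == 1]

def bias_by_basin(samples):
    """Return {basin: (mean of product pH minus float pH, sample count)}."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["basin"]].append(s["product_ph"] - s["ph_adjusted"])
    return {basin: (sum(d) / len(d), len(d)) for basin, d in groups.items()}

# Hypothetical matched samples (float pH versus gridded product pH).
samples = [
    {"basin": "Southern Ocean", "data_mode": "D", "ph_adjusted_qc": 1,
     "ph_adjusted": 7.90, "product_ph": 7.96},
    {"basin": "Southern Ocean", "data_mode": "D", "ph_adjusted_qc": 1,
     "ph_adjusted": 7.92, "product_ph": 7.98},
    {"basin": "Southern Ocean", "data_mode": "R", "ph_adjusted_qc": 1,
     "ph_adjusted": 7.80, "product_ph": 7.95},  # real-time mode: dropped
    {"basin": "North Pacific", "data_mode": "D", "ph_adjusted_qc": 3,
     "ph_adjusted": 7.60, "product_ph": 7.70},  # QC flag 3: dropped
    {"basin": "North Pacific", "data_mode": "D", "ph_adjusted_qc": 1,
     "ph_adjusted": 7.65, "product_ph": 7.67},
]

selected = select_validation_samples(samples)
summary = bias_by_basin(selected)
for basin, (bias, count) in summary.items():
    print(basin, round(bias, 3), count)
```

Reporting the sample count next to each bias shows how many matched points support each regional figure.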

ED: Reconsider after major revisions (07 Oct 2024) by Frédéric Gazeau

AR by Guorong Zhong on behalf of the Authors (14 Nov 2024) Author's response Manuscript

EF by Anna Glados (15 Nov 2024) Author's tracked changes

ED: Referee Nomination & Report Request started (18 Nov 2024) by Frédéric Gazeau

RR by Anonymous Referee #1 (27 Nov 2024)

Suggestions for revision or reasons for rejection

Review of: “A global monthly 3D-field of seawater pH over 3 decades: a machine
learning approach” by G. Zhong et al.; submitted to Earth System Science Data

First of all, I would like to thank the authors for the careful and thoughtful responses to my comments and suggestions. I believe their revisions have significantly improved the manuscript, and the new details make it clearer to understand.

There are two minor comments that I would like the authors to address before publication.

Specific points to raise:

1. Depth as a predictor:
I apologize for the confusion caused by my earlier comments, where I used “pressure” instead of “depth.” I fully understand and agree with the authors’ rationale for excluding pressure due to its high correlation with depth. However, my concern pertains to the use of depth as an input predictor, which is not applied consistently across all bioregions. From the authors' first response, I understand that depth is used as an input predictor to estimate pH at specific levels (e.g., one of the 41 defined depth levels). However, I remain unclear how pH at different depths is estimated in certain bioregions where depth is not included as an input (particularly in the mixed layer).
For example, in the Subpolar North Atlantic bioregion, pH in the mixed layer is estimated using predictors such as Phosphate, DO, Nmon, DIC, Sal, and Bathy, but depth is not explicitly included. None of these environmental predictors can fully substitute for depth. This issue also applies to the Equatorial Atlantic and Subtropical South Atlantic in the mixed layer. By contrast, for intermediate layers, this concern does not arise as depth is consistently included.
The paragraph the authors added regarding longitude, latitude, and time being replaceable by other environmental variables is very useful and improves clarity in the text. However, this point does not address the specific issue of how pH can be accurately retrieved for different depths when depth is not used as an input predictor.

2. Validation using BGC-Argo data:
Considering the spatial distribution of BGC-Argo data, which is concentrated mainly in the Southern Ocean, I think it would be valuable to include the number of points (or profiles) used to compute the biases presented in Table 5. This information would help clarify the representativeness of the validation results.

ED: Publish subject to minor revisions (review by editor) (06 Dec 2024) by Frédéric Gazeau

AR by Guorong Zhong on behalf of the Authors (12 Dec 2024) Author's response Author's tracked changes Manuscript

ED: Publish as is (26 Dec 2024) by Frédéric Gazeau

AR by Guorong Zhong on behalf of the Authors (31 Dec 2024)

Short summary

The continuous uptake of atmospheric CO₂ by the ocean leads to decreasing seawater pH, which is an ongoing threat to the marine ecosystem. This pH change has been globally documented in the surface ocean, but information is limited below the surface. Here, we present a monthly 1° gridded product of global seawater pH based on a machine learning method and real pH observations. The pH product covers the years from 1992 to 2020 and depths from 0 to 2000 m.