the Creative Commons Attribution 4.0 License.
Synthesis of data products for ocean carbonate chemistry
Abstract. As the largest active carbon reservoir on Earth, the ocean is a cornerstone of the global carbon cycle, playing a pivotal role in modulating ocean health and regulating climate. Understanding these crucial roles requires access to a broad array of data products documenting the changing chemistry of the global ocean as a vast and interconnected system. This review article provides a comprehensive overview of 60 existing ocean carbonate chemistry data products, encompassing compilations of cruise datasets, derived gap-filled data products, model simulations, and compilations thereof. It is intended to help researchers identify and access data products that best align with their research objectives, thereby advancing our understanding of the ocean's evolving carbonate chemistry.
Competing interests: One of the co-authors, Anton Velo (Instituto de Investigacions Mariñas, IIM – CSIC, Vigo, Spain), is a member of the editorial board of Earth System Science Data.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
CC1: 'Comment on essd-2025-255', Kunal Chakraborty, 21 May 2025

AC1: 'Reply on CC1', L.-Q. Jiang, 27 May 2025
Dear Kunal Chakraborty,

Thank you for your kind words about the article and for highlighting the data products that should be included. We'll be sure to incorporate them in the next version of the paper during the revision process.

Liqing

Citation: https://doi.org/10.5194/essd-2025-255-AC1

CC2: 'Reply on AC1', Kunal Chakraborty, 15 Jun 2025
Dear Dr. Liqing Jiang,

I'm glad to hear that you found the listed data products useful for inclusion in the manuscript. Thank you very much for agreeing to include them during the revision process.

Best regards,
Kunal Chakraborty

Citation: https://doi.org/10.5194/essd-2025-255-CC2
 
RC1: 'Comment on essd-2025-255', Anonymous Referee #1, 21 Jul 2025

General Comments
The review ‘Synthesis of Data Products for Ocean Carbonate Chemistry’ presents an extensive and thorough summary of multiple oceanic carbonate system data products. This work includes an overview of cruise data compilations, time-series data synthesis products, gridded and derived data products, multi-product analyses, and model-based data synthesis products for ocean carbonate chemistry. Key information on data availability and access links, with associated references, is reported concisely and clearly. In addition, it is explained that overlaps, such as the use of SOCAT and/or GLODAPv2 in the majority of data products, allow for additional quality control and for the testing and intercomparison of the different approaches used to generate the respective products. This work is a very useful tool and a valuable contribution that will considerably benefit the scientific community in several disciplines. The preprint manuscript is well written, with key information regarding each data product reported in a series of tables. I recommend publication following minor revisions, as detailed below.

Specific Comments
 Check for repeat definitions of certain acronyms, e.g. DIC, TA, as the subsequently repeated definitions can be removed to make the text more concise. Make sure to use the acronyms in the remainder of the text, for consistency and instead of writing out in full each time.
 Check the consistency of certain variables, specifically the saturation states, as Ωarg, Ωarag and ΩAr styles are used, for example. If the different notations used in the manuscript text are due to the specific notation of that variable in the data product, then perhaps retain consistent acronyms where the variables are the same, e.g. dissolved inorganic carbon (DIC), and re-define variables that use a different acronym/notation per dataset, e.g. aragonite saturation state (Ωarag).
 Check consistency in defining the pH scale used, e.g. pH on the total scale or pH on the total hydrogen ion scale.

Technical Corrections
 Line 111: is ‘… 1690 to 1730 Gt of Carbon …’ a global average? What is defined as the surface ocean (depth)?
 Line 113 regarding ‘…the oceans' buffer capacity…’ add details to further explain this concept on first usage; buffer against?
 Line 120 replace ‘… parameters… ‘ with ‘variables’ for correctness and consistency as used on Lines 347, 348, 385 for example; this is the case for the use of this word in other places in the text
 Line 129 is there a word missing at the end of the statement ‘… weakened seawater buffer capacity by biologically induced CO2…’?
 Line 184 replace ‘… parameters… ‘ with ‘variables’; this is the case for the use of this word in other places in the text
 Line 221 acronyms ‘… dissolved inorganic carbon (DIC), total alkalinity (TA)…’ are already defined earlier in the text
 Line 243 replace ‘… parameters… ‘ with ‘variables’; this is the case for the use of this word in other places in the text
 Line 266 replace ‘… parameters… ‘ with ‘variables’; this is the case for the use of this word in other places in the text
 Line 292 replace ‘… parameters… ‘ with ‘variables’; this is the case for the use of this word in other places in the text, which won’t be indicated for each further occurrence beyond page 10 to limit the repetition
 Line 349 acronyms ‘… dissolved inorganic carbon (DIC), total alkalinity (TA)…’ are already defined earlier in the text
 Line 414 replace ‘…alkalinity…’ with ‘… TA…’, assuming it is total alkalinity or otherwise please specify
 Line 488 is the statement ‘… surface-ocean carbonate conditions …’ referring to carbonate ion concentrations or carbonate system variables, please clarify
 Line 690-691 check font and type setting
 Line 717-718 check that all previously defined acronyms are used for the variables listed in full
 Line 735 has acronym ‘… OA …’ been defined?
 Line 841 has ‘… [H+] …’ been defined/explained in full?

Citation: https://doi.org/10.5194/essd-2025-255-RC1

AC2: 'Reply on RC1', L.-Q. Jiang, 07 Sep 2025
Apologies for the delayed response, and many thanks for your kind words about this manuscript as well as your excellent suggestions. I’ll work with my co-authors to implement them soon.

Citation: https://doi.org/10.5194/essd-2025-255-AC2
 
RC2: 'Comment on essd-2025-255', Meg Yoder, 03 Oct 2025
This paper encapsulates a wide array of carbonate system chemistry data, including bottle samples, sensor data, gridded products, interpolations, and model outputs of varying complexity. This work will act as a useful synthesis and currently reaches the goal stated in the abstract to “help researchers identify and access data products that best align with their research objectives”. The authors effectively divide a wide variety of data products into cohesive groups and make clear to the reader how, at their core, all of these products stem from direct measurements. On the whole, I believe this paper will shortly be ready for publication and prove to be a useful scientific resource, pending minor revisions to improve clarity and utility for the reader.

General document comments

1. There is inconsistency across the document with respect to acronyms. A uniform system should be applied throughout the document. Understandably, with 35 pages of text it may make sense to redefine acronyms more than once, as a reader may be looking at select sections. Especially in a research community where acronyms are used very often, acronyms could be included in this document to educate the reader, but consideration should be given to the purpose of abbreviating throughout the document. A selection of examples highlighting inconsistent acronym use:
- The full name and acronym are used (line 443 “European Station for Time Series in the Ocean (ESTOC)”).
- Only full names are used, even when the acronym was defined earlier or is commonly used (line 956 “Global Carbon Budget”, line 952 “Sixth Coupled Model Intercomparison Project”).
- The full name is used and acronym is not defined, then acronym used later in the text for No. 55 “Global Ocean Biogeochemical Models”...”GOBMs”.
- Acronyms are used as titles for most, but not all, datasets.
- The acronym is defined far earlier in the text (“LDEO” defined on line 173, used without definition on line 573.)
- In some cases, the acronym is used first and full name included in parentheses (line 1060 “CARINA (CARbon dioxide IN the Atlantic Ocean)” but in most cases full name is used and acronym is defined in parentheses (line 1065-1066 “Global Ocean Ship-based Hydrographic Investigations Program (GO-SHIP”).
- Acronyms are defined but then not used later in the text (line 817 “World Ocean Atlas 2013 (WOA13)”).
2. Descriptions with repetitive information

This project requires significant synthesis of many datasets and, as a result, is a long document. One way you might consider abbreviating the text would be to eliminate sentences in the description of each data product that restate the utility of the measurements, which you’ve already explained to the reader (very convincingly!) in the introduction. For example, dataset 16 states that “These synergistic measurements have contributed to global ocean carbon observation networks (e.g., the newly released SOCATv22), which have improved our ability to characterize natural and anthropogenic drivers of ocean carbon uptake and acidification.” One might revise this to simply say “the measurements are included in global ocean carbon observation networks (e.g., the newly released SOCATv22)”. Another example is dataset 20, which states “[a]ltogether, SPOTS’ pilot increased the readiness of biogeochemical time-series (Lange et al., 2023) and facilitates a variety of applications that benefit from the collective value of biogeochemical time-series observations”. This is likely true, but similar statements could be made for each dataset. Uniformity across the descriptions should be assessed. If I had to guess, a scientist from each project wrote a summary of their dataset and is understandably accustomed to explaining the utility of the data along with the description of the data itself.

3. Interpretation within description

Similarly, a few of the datasets include a sentence on interpretation. For example, dataset 13 states “The findings underscore significant variability in the seasonality and interannual trends of surface carbonate chemistry across different regions and reef zones.” While this is likely true, it stands out when reading the descriptions because most do not include this type of interpretation or description of data patterns, and such interpretation is outside the aims and scope of ESSD.

4. Connecting datasets

Some datasets have different versions or very closely related products but have been listed as subsets (26a and 26b), while others have separate entries (44 and 47, 58 and 59). The methods could clarify why this choice was made, or a uniform decision about closely related products could be made.

5. Submission of future datasets

It’s a great service to the research community that you intend to update this collection of data in the future. In order to get submissions from scientists who weren’t involved in the first iteration, it may be worth mentioning this in the abstract, so that those who have data to offer know they can add theirs even if they don’t read further into the paper.

Line edits

- Line 112: Reconsider the word choice “belies”; the sentence should be simplified for clarity.
- Line 158: Are cruise datasets ordered in any particular way?
- Line 160: Same as above for time-series data.
- Line 260: Reconsider the word choice “aegis”; the sentence should be simplified for clarity.
- Line 468: Dataset 19: Were DIC and TA samples truly collected hourly for 6 years at this site? I am skeptical (but highly impressed if so!). Whether these values are calculated from pH/pCO2 sensor data should be described more fully in the text.
- Line 524 reads “Self-Organizing Map-Feed-Forward-Network (SOM-FFN)” but has a misplaced hyphen and should read “Self-Organizing-Map Feed-Forward-Network (SOM-FFN)”.
- Line 719: It seems some descriptions of the reconstructed or estimated pCO2 data include uncertainty and misfit values (such as No. 38), but most do not. I would either leave this out here or consider whether uncertainties should be added to all datasets (probably not).
- Line 752: No. 41 refers to a resolution of 3 nautical miles in the text but uses degrees in the data table; these are probably worth aligning.
- Lines 887-888: It doesn’t seem necessary to list individual depth levels, as levels aren’t listed for other products.
- Line 902: Table 4. Clarity and consistency between in-text descriptions and tables would be helpful for the reader, as would consistency in where specific information is listed. I have identified a number of examples, but each of the tables and datasets should be checked. For example, No. 42 and 43 list temporal resolution as “adjusted to 2002” and “adjusted to 2000”, respectively, then state that they are climatologies in the highlights column of the table. In contrast, No. 44 includes both pieces of information in the temporal resolution (“monthly climatology, centered around 2010/2011”). Please clarify whether being centered around and referenced to a specific year are the same and, if so, align the language, as well as where in the text and tables the information is included.
- Line 856: No. 49 appears to have a typo in the text or title. Please clarify the years covered: the description says 1994, 2004, and 2014, the title says 1994 to 2007, and the table says 1994 and 2007.

Citation: https://doi.org/10.5194/essd-2025-255-RC2
Data sets
Surface Ocean CO2 Atlas Database Version 2024 (SOCATv2024) (NCEI Accession 0293257) Dorothee C. E. Bakker et al. https://doi.org/10.25921/9wpn-th28
Global Ocean Data Analysis Project version 2.2023 (GLODAPv2.2023) (NCEI Accession 0283442) Siv K. Lauvset et al. https://doi.org/10.25921/zyrq-ht66
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 2,174 | 234 | 35 | 2,443 | 51 | 60 |
The review article ‘Synthesis of Data Products for Ocean Carbonate Chemistry’ offers a thorough and valuable summary of existing ocean carbonate data products, which will greatly benefit the scientific community. However, it overlooks two machine learning-based products that focus on improving surface pCO2 estimates in the Indian Ocean region.
The first is an ML-based climatological pCO2 data product recently developed and published in the journal Scientific Data for the Bay of Bengal region (https://www.nature.com/articles/s41597-024-03236-w) (Joshi et al., 2024). This data product integrates publicly available open-ocean observations with data from the Indian Exclusive Economic Zone. Given that the Bay of Bengal is a unique basin with very limited publicly accessible pCO2 observations, this high-resolution (~0.083°) climatological pCO2 data product represents a significant advancement in our understanding of pCO2 dynamics in the region. Therefore, it may be appropriate to include this product in Section 3.1.3 (i.e., Gridded and derived data products) of the manuscript to enhance its visibility and encourage its use within the scientific community.
The second is a hybrid data product that corrects long-term (1980–2019), high-resolution (~0.083° or 1/12°) modeled surface pCO2 for the Indian Ocean region (as a part of RECCAPv2) using cruise-based observations and an XGB algorithm. This product, available at https://www.nature.com/articles/s41597-025-04914-z (Ghoshal et al., 2025), falls under Section 3.1.6 (i.e., Model-based and hybrid data products and analysis) of this manuscript.
In this study, a machine learning (ML) approach is employed to correct biases in surface pCO2 simulations generated by the INCOIS-BIO-ROMS model (pCO2model) over the period 1980–2019. The ML model is trained using the differences between observed (pCO2obs) and modeled pCO2 to estimate the spatio-temporal deviations (pCO2obs − pCO2model). These interannually and climatologically varying deviations are then added back to the original model output, resulting in two improved data products: pCIBR_Int and pCIBR_Clim.
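The residual (bias) correction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: NumPy ordinary least squares stands in for the XGBoost regressor, a single hypothetical predictor (SST) replaces the real feature set, and all values are synthetic. The principle is the same: learn the mismatch pCO2obs − pCO2model at collocated points, then add the predicted mismatch back onto the model output. The sketch does not distinguish the interannual and climatological variants (pCIBR_Int, pCIBR_Clim).

```python
import numpy as np


def fit_residual_correction(X_train, pco2_obs, pco2_model_train):
    """Fit the mismatch (obs - model) as a linear function of predictors.

    Ordinary least squares is a stand-in (assumption) for the XGBoost
    regressor used in the product described in the comment.
    """
    residual = pco2_obs - pco2_model_train
    # Append a column of ones so the fit includes an intercept term.
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, residual, rcond=None)
    return coef


def apply_residual_correction(X, pco2_model, coef):
    """Add the predicted deviation back onto the raw model output."""
    A = np.column_stack([X, np.ones(len(X))])
    return pco2_model + A @ coef


# Synthetic example (all numbers hypothetical).
rng = np.random.default_rng(0)
n = 500
sst = rng.uniform(20.0, 30.0, n)                  # hypothetical predictor, degC
pco2_obs = 350.0 + 8.0 * (sst - 25.0)             # "observed" pCO2, uatm
# Hypothetical model output with an SST-dependent bias plus random noise.
pco2_model = pco2_obs + 15.0 - 2.0 * (sst - 25.0) + rng.normal(0.0, 3.0, n)

X = sst.reshape(-1, 1)
coef = fit_residual_correction(X, pco2_obs, pco2_model)
pco2_corrected = apply_residual_correction(X, pco2_model, coef)

rmse_before = np.sqrt(np.mean((pco2_model - pco2_obs) ** 2))
rmse_after = np.sqrt(np.mean((pco2_corrected - pco2_obs) ** 2))
print(f"RMSE before: {rmse_before:.1f} uatm, after: {rmse_after:.1f} uatm")
```

Note that this toy example scores the correction in-sample; the product described in the comment was instead evaluated against independent datasets, which is the more meaningful test.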
Evaluation against independent datasets, including moored observations (BOBOA), the gridded SOCAT product, and other ML-based pCO2 products (such as CMEMS-LSCE-FFNN and OceanSODA), demonstrates a significant improvement of approximately 40% ± 3.31% in RMSE compared to the original model. These corrected pCO2 products are expected to improve the accuracy of air–sea CO₂ flux estimates across the Indian Ocean from 1980 to 2019, helping to better identify key source and sink regions and enhancing our understanding of the Indian Ocean’s contribution to the global carbon budget.
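The comment reports the gain as a percentage improvement in RMSE but does not spell out the formula. The sketch below assumes the common definition, the relative reduction of RMSE against observations, 100 × (RMSE_model − RMSE_corrected) / RMSE_model. The three-point arrays are made-up values chosen to give a 40% reduction, not data from the product.

```python
import numpy as np


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def rmse_improvement_pct(model, corrected, obs):
    """Percentage reduction in RMSE of a corrected product vs. the raw model."""
    r_model = rmse(model, obs)
    r_corrected = rmse(corrected, obs)
    return 100.0 * (r_model - r_corrected) / r_model


# Hypothetical values (uatm): the model is 10 uatm high, the corrected field 6 uatm high.
obs = [350.0, 360.0, 370.0]
model = [360.0, 370.0, 380.0]
corrected = [356.0, 366.0, 376.0]
print(rmse_improvement_pct(model, corrected, obs))  # 40.0
```

Applying the function separately to each independent reference dataset would yield one percentage per dataset, from which a mean and spread can be summarized.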
Further, in Section 3.1.6 (i.e., Model-based and hybrid data products and analysis), you may also consider including the model-based dataset and analysis of ocean acidification in the Indian Ocean from 1980 to 2019, as presented by Chakraborty et al. (2024). The paper is available at: https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2024GB008139. This study provides a comprehensive assessment of ocean acidification trends across the Indian Ocean and its sub-regions, utilizing outputs from a numerical model, an offline biogeochemical (BGC) model, and two machine learning-based products. Overall, the research consolidates the current state of knowledge on Indian Ocean acidification by integrating available field observations, reconstructed datasets, and model simulations.
References:
Joshi, A. P., Ghoshal, P. K., Chakraborty, K., & Sarma, V. V. S. S. (2024). Sea-surface pCO2 maps for the Bay of Bengal based on advanced machine learning algorithms. Scientific Data, 11(1), 384.
Ghoshal, P. K., Joshi, A. P., & Chakraborty, K. (2025). An improved long-term high-resolution surface pCO2 data product for the Indian Ocean using machine learning. Scientific Data, 12(1), 577.
Chakraborty, K., Joshi, A. P., Ghoshal, P. K., Baduru, B., Valsala, V., Sarma, V. V. S. S., Metzl, N., Gehlen, M., Chevallier, F., & Lo Monaco, C. (2024). Indian Ocean acidification and its driving mechanisms over the last four decades (1980–2019). Global Biogeochemical Cycles, 38(9), e2024GB008139. https://doi.org/10.1029/2024GB008139.