the Creative Commons Attribution 4.0 License.
CARIMED (CARbon, tracers, and ancillary data In the MEDiterranean Sea): A ship-based data synthesis product – overview and quality control procedures
Abstract. The Mediterranean Sea (MedSea) is highly sensitive to climate-driven changes in temperature, oxygen, and pH, among other variables. To better assess these long-term trends, we developed CARIMED (CARbon, tracers, and ancillary data In the MEDiterranean Sea), the first comprehensive, harmonised data synthesis product for the MedSea. CARIMED integrates hydrographic, inorganic carbon, transient tracer, and ancillary measurements from 46 research cruises spanning 1976 to 2018, and contains observations of the entire water column across all MedSea sub-basins. A substantial component of the data was retrieved from fragmented or locally archived historical records, thus consolidating previously inaccessible measurements. Following global synthesis approaches, CARIMED applies a quality-controlled, bias-adjusted framework. A key adaptation was the secondary quality control (2QC) procedure, specifically tailored to the MedSea's unique hydrography, which uses sub-basin divisions and supplementary checks (including statistical consistency assessments) to resolve complex, often contradictory, inter-cruise offsets. This rigorous process minimised systematic biases, yielding a dataset with improved consistency, and it highlights the urgent need for adapted standard operating procedures and reference materials that address the biogeochemical particularities of the MedSea. CARIMED delivers two complementary, freely available products: the aggregated original cruise data product (https://doi.org/10.20350/digitalCSIC/17785; García-Ibáñez et al., 2025) and the final bias-adjusted data synthesis product (https://doi.org/10.25921/cp5b-zq67; Álvarez et al., 2025). This essential resource establishes a new benchmark for assessing long-term biogeochemical trends, validating regional ocean models, and supporting climate-change mitigation and adaptation strategies in this rapidly changing semi-enclosed basin.
Competing interests: Antón Velo is an editor of ESSD.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-759', Anonymous Referee #1, 19 Jan 2026
- RC2: 'Comment on essd-2025-759', Anonymous Referee #2, 24 Jan 2026
Álvarez et al. present a highly valuable data compilation product focused on the Mediterranean Sea. To my knowledge, this is the first discrete, bottle-based, full-water-column data product for this region, representing a major advance in regional data synthesis efforts. The data product is of high quality, and the manuscript merits timely publication. Below are some suggested revisions that I hope the authors will consider:
Major comments:
1. While the authors’ efforts to conduct the crossover analysis are commendable, I have concerns regarding the underlying basis for this approach. In Lines 421–426, the basins in which the 2QC was conducted are listed. Please provide figures, either in the main text or in the Supplementary Material, demonstrating that seasonal and interannual natural variability in these basins is sufficiently small over time periods of decades to justify such adjustments. Without this evidence, there is a risk that real natural variability could be inadvertently removed.
2. The international research community has moved away from the outdated WHP format and has adopted a new standard (https://doi.org/10.3389/fmars.2021.705638). The WHP format was originally shaped by technical constraints of its time, resulting in non-intuitive abbreviations such as ALKALI and TCARBN. Notably, such abbreviations are not used consistently by the authors themselves in the manuscript, where terms such as DIC and TA are used instead. Another example is SAMPNO (Niskin Bottle #), which can be easily confused with Sample_ID, the unique identifier for a data row.
3. The provided NetCDF file does not appear to follow a specific template and contains numerous empty-string attributes, which impedes readability. The CMIP community provides well-established examples of standardized NetCDF file structures, and I recommend following their templates. Please pay close attention to variable data types. For example, flags, station identifiers, cast numbers, and similar fields can be stored as integers (int32 or int64) rather than doubles to save disk space. In addition, please adopt a single convention for missing values (either NaN or −999, but not both); I recommend using −999. Finally, where possible, please keep the same variable ordering as in the Excel file to improve consistency.
4. When the same variable is measured using multiple methods (e.g., CTDSAL vs. Salinity, CTDOXY vs. Oxygen), both values should be reported. Although a merged column will serve the needs of most users, providing both measurements allows additional quality control by users who wish to perform independent assessments.
5. Checking the internal consistency between calculated and measured pH, as well as between calculated and measured carbonate ion content, is a powerful approach for quality control of ocean carbon system variables. If this has not already been done, please consider incorporating this step into the QC procedure.
6. pH was reported at 25 °C. Consider adding an additional column with pH values adjusted to in situ conditions.
7. SECT_ID values in the NetCDF file are empty (all entries are set to −999); they should instead be stored as strings.
8. Time values are not stored correctly in either the Excel or the NetCDF file: all values are zero, appearing as “0000” in Excel and 0 in the NetCDF file.
9. The date information is currently stored as numerical values in a non-standard format. I recommend splitting the date into three separate columns (Year, Month, and Day). Alternatively, if a single column is preferred, the date should follow the ISO-8601 format (YYYY-MM-DD), which is widely supported by standard software libraries.
10. All the variables should be rounded to an appropriate number of decimal places. Otherwise, the current presentation may give a misleading impression of the measurement uncertainty.
11. Consider creating two separate folders: one containing the unadjusted values and another containing the bias-adjusted values. That will facilitate data use, as many programs can capture the columns automatically.
12. Consider adding a table to the supplementary information showing all the QC flag changes during the QC process.
13. Consider adopting the concept of a Sample ID, as it uniquely identifies each data row. This is particularly important when merging measurements from multiple sources into a single file, as done in this study. See the paper mentioned above for additional details on how to derive Sample IDs.
14. Access to individual cruise data files could be improved by adding, in the abstract, a link to the CARIMED page hosted on NCEI’s server: https://www.ncei.noaa.gov/access/ocean-carbon-acidification-data-system/oceans/CARIMED/. This would substantially facilitate access to the underlying cruise-level datasets. In addition, please consider including the dataset DOIs in Table 1. If the table becomes overly wide, some columns could be removed (e.g., Year, which is already contained in the date field).
15. Strictly speaking, the use of the term “concentration” is incorrect when the variables are reported using per mass based units. According to the IUPAC Gold Book, the term “content” should be used when referring to amounts expressed per unit mass of seawater, whereas “concentration” refers to the amount of solute per unit volume of solution.
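Regarding major comment 1, the requested evidence could start from a simple per-basin summary: the mean of a variable in a deep, low-variability layer for each cruise, ordered in time, plus the spread of those means across cruises. The Python sketch below is a hypothetical illustration, not the authors' procedure. The record layout `(cruise_id, year, depth_m, value)`, the function name, and the depth threshold are all assumptions. Note that the spread mixes natural variability and residual measurement biases, so on its own it cannot separate the two.

```python
import statistics

def intercruise_spread(samples, min_depth_m):
    """Per-cruise mean of a variable below min_depth_m, and the standard
    deviation of those means across cruises.

    samples: iterable of (cruise_id, year, depth_m, value) tuples (hypothetical layout).
    """
    by_cruise = {}
    for cruise, year, depth, value in samples:
        if depth >= min_depth_m:  # keep only the deep, low-variability layer
            by_cruise.setdefault((year, cruise), []).append(value)
    # Sorting by (year, cruise) yields a time-ordered series suitable for plotting
    means = {key: statistics.mean(vals) for key, vals in sorted(by_cruise.items())}
    values = list(means.values())
    spread = statistics.stdev(values) if len(values) > 1 else float("nan")
    return means, spread

# Example with made-up salinity values for one sub-basin
samples = [("C1", 1995, 1500, 38.60), ("C1", 1995, 2000, 38.62),
           ("C2", 2010, 1800, 38.65), ("C2", 2010, 100, 39.00)]
means, spread = intercruise_spread(samples, min_depth_m=1000)
print(means, spread)
```

Plotting the per-cruise means against year, one panel per sub-basin, would give the kind of figure the comment asks for.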
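For major comment 3, the following is a minimal sketch of harmonising missing values to the single convention −999 and storing QC flags as integers. It uses only the Python standard library, and the function names are hypothetical. When the cleaned columns are written to NetCDF (e.g., with the netCDF4 or xarray packages), the flag variables would then be declared with a 32-bit integer type and −999 as their fill value.

```python
import math

FILL_VALUE = -999  # single missing-value convention recommended in comment 3

def _is_missing(x):
    """True for None or a floating-point NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def harmonise_missing(values):
    """Replace NaN/None by -999.0 so that only one missing-value marker remains."""
    return [float(FILL_VALUE) if _is_missing(v) else float(v) for v in values]

def flags_to_int(flags):
    """Store QC flags as integers instead of doubles; missing flags become -999."""
    return [FILL_VALUE if _is_missing(f) else int(f) for f in flags]

# Example: a column that mixes NaN and -999 as missing markers
oxygen = [245.3, float("nan"), -999.0, 231.8]
oxygen_flags = [2.0, 9.0, 9.0, 2.0]
print(harmonise_missing(oxygen))    # [245.3, -999.0, -999.0, 231.8]
print(flags_to_int(oxygen_flags))   # [2, 9, 9, 2]
```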
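For major comment 5, suppose pH has already been calculated from DIC and TA with carbonate-system software (e.g., CO2SYS or PyCO2SYS), at the same temperature and on the same pH scale as the measurement. The internal-consistency check then reduces to statistics on the residuals. The sketch below makes that assumption. The function name and the 3-standard-deviation outlier threshold are illustrative choices, not an established criterion.

```python
import statistics

def internal_consistency(measured, calculated, n_sd=3.0):
    """Compare measured and calculated values of the same quantity.

    Returns the mean and standard deviation of the residuals
    (measured - calculated) and the indices of samples whose residual
    lies more than n_sd standard deviations from the mean residual.
    """
    residuals = [m - c for m, c in zip(measured, calculated)]
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    outliers = [i for i, r in enumerate(residuals) if abs(r - mean) > n_sd * sd]
    return mean, sd, outliers

# Made-up example: 20 samples with small residuals and one clear outlier
measured = [8.0 + 0.001 * (i % 3) for i in range(20)]
measured[5] += 0.1
calculated = [8.0] * 20
mean, sd, outliers = internal_consistency(measured, calculated)
print(outliers)  # [5]
```

The same function applies unchanged to measured versus calculated carbonate ion content.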
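For major comments 8 and 9, here is a sketch of splitting a numeric date into Year, Month, and Day columns and emitting ISO-8601 strings. It assumes dates are encoded as YYYYMMDD integers and times as HHMM integers. The actual encoding in the files may differ, and the function names are hypothetical. An unknown time is passed as `None` (and would be stored as a missing value) rather than as 0000, because 0000 cannot be distinguished from midnight.

```python
from datetime import date, time

def split_date(yyyymmdd):
    """Split a numeric YYYYMMDD date (e.g. 20180415) into (Year, Month, Day)."""
    d = int(yyyymmdd)
    year, month, day = d // 10000, (d // 100) % 100, d % 100
    date(year, month, day)  # raises ValueError for an impossible date
    return year, month, day

def to_iso8601(yyyymmdd, hhmm=None):
    """Return 'YYYY-MM-DD', or 'YYYY-MM-DDThh:mm' when a time is known."""
    d = date(*split_date(yyyymmdd))
    if hhmm is None:
        return d.isoformat()
    hh, mm = divmod(int(hhmm), 100)
    return f"{d.isoformat()}T{time(hh, mm).isoformat(timespec='minutes')}"

print(split_date(20180415))        # (2018, 4, 15)
print(to_iso8601(19760803, 630))   # 1976-08-03T06:30
```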
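For major comment 10, rounding can be driven by a per-variable table of decimal places. In the sketch below, the variable names follow the WHP labels quoted in this review. The decimal places in `DECIMALS` are hypothetical placeholders and would need to match each variable's measurement precision. `Decimal` is used so that rounding acts on the decimal representation of each value rather than on its binary floating-point approximation.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical decimal places per variable (to be set from measurement precision)
DECIMALS = {"CTDSAL": 4, "CTDOXY": 1, "TCARBN": 1, "ALKALI": 1}

def round_value(name, value, fill=-999.0):
    """Round a value to the number of decimals configured for its variable;
    the missing-value marker is returned unchanged."""
    if value == fill:
        return value
    quantum = Decimal(1).scaleb(-DECIMALS[name])  # e.g. 4 -> Decimal('1E-4')
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN))

print(round_value("CTDSAL", 38.512345678))  # 38.5123
print(round_value("TCARBN", 2103.456789))   # 2103.5
```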
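For major comment 12, the supplementary table of QC flag changes could be generated by comparing the flags before and after the QC process. The sketch assumes that both flag sets are available as nested dictionaries keyed first by a unique sample key and then by variable name. That layout and the example keys are hypothetical.

```python
def flag_changes(original, final):
    """List (sample_key, variable, old_flag, new_flag) for every flag that changed.

    original, final: dicts mapping sample_key -> {variable: flag}.
    """
    changes = []
    for key in sorted(original):
        for variable, old in sorted(original[key].items()):
            new = final.get(key, {}).get(variable)
            if new is not None and new != old:
                changes.append((key, variable, old, new))
    return changes

original = {"S1": {"TCARBN": 2, "ALKALI": 2}, "S2": {"TCARBN": 2}}
final = {"S1": {"TCARBN": 2, "ALKALI": 3}, "S2": {"TCARBN": 4}}
for row in flag_changes(original, final):
    print(row)
```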
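For major comment 13, one simple way to obtain a unique row identifier is a composite key built from cruise, station, cast, and bottle, which is then checked for duplicates after merging. This scheme is hypothetical and for illustration only. It is not the derivation described in the paper cited in major comment 2, which should be followed for the actual product. The EXPOCODE-like string `HYPO20180415` is a made-up placeholder.

```python
from collections import Counter

def make_sample_id(expocode, station, cast, bottle):
    """Hypothetical composite key EXPOCODE_STATION_CAST_BOTTLE."""
    return f"{expocode}_{int(station)}_{int(cast)}_{int(bottle)}"

def build_ids(rows):
    """Return the IDs of all rows and the sorted list of IDs that occur more than once."""
    ids = [make_sample_id(*row) for row in rows]
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return ids, duplicates

rows = [("HYPO20180415", 1, 1, 12), ("HYPO20180415", 1, 1, 12),
        ("HYPO20180415", 2, 1, 1)]
ids, duplicates = build_ids(rows)
print(duplicates)  # ['HYPO20180415_1_1_12']
```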
Minor comments:
1. Table 1: Please align the ordering with the table served through NCEI. Either ordering is fine, but the two tables should be consistent with each other.
2. Page 8, Line 171: Cruise ID is not equivalent to Section ID. The latter is a leg or transect that is frequently visited for research purposes, e.g., P16. The former refers to a specific cruise visiting that leg, e.g., P16_2025.
3. Page 9, line 190. Please specify the equations or software used for this unit conversion and provide the appropriate citations. When calculating density for the conversion, please also indicate which temperature was used. For oxygen measurements, it is essential to use the temperature at which the Winkler samples were fixed, rather than the room or water-bath temperature that may be used for other variables (e.g., DIC).
4. Page 22, Table 3: Hour (HHMM). Shouldn’t “Hour” be “Time” here?
5. In the Excel file, Temperature has no units, only scale.
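Regarding minor comment 3, converting µmol L⁻¹ to µmol kg⁻¹ means dividing by the seawater density at the relevant temperature. The sketch below uses the UNESCO (1981) EOS-80 one-atmosphere equation of state as one possible choice. TEOS-10 (e.g., via the GSW toolbox) is the current standard and gives slightly different densities. The temperature argument is assumed to be on ITS-90 and is converted to IPTS-68 for the EOS-80 polynomial. For Winkler oxygen it should be the temperature at which the samples were fixed, as noted in the comment. The function names are hypothetical.

```python
def density_eos80_1atm(salinity, temp_its90):
    """Seawater density (kg m-3) at atmospheric pressure from the UNESCO (1981)
    EOS-80 one-atmosphere equation.

    salinity: practical salinity; temp_its90: temperature in degC (ITS-90).
    """
    t = 1.00024 * temp_its90  # EOS-80 coefficients expect IPTS-68 temperature
    s = salinity
    rho_w = (999.842594 + 6.793952e-2 * t - 9.095290e-3 * t**2
             + 1.001685e-4 * t**3 - 1.120083e-6 * t**4 + 6.536332e-9 * t**5)
    a = (8.24493e-1 - 4.0899e-3 * t + 7.6438e-5 * t**2
         - 8.2467e-7 * t**3 + 5.3875e-9 * t**4)
    b = -5.72466e-3 + 1.0227e-4 * t - 1.6546e-6 * t**2
    c = 4.8314e-4
    return rho_w + a * s + b * s**1.5 + c * s**2

def umol_per_l_to_umol_per_kg(value, salinity, temp_its90):
    """Convert an amount per litre of seawater to an amount per kilogram."""
    return value / (density_eos80_1atm(salinity, temp_its90) / 1000.0)

# Made-up Winkler oxygen value of 220 umol/L, fixed at 21.5 degC, salinity 38.7
print(umol_per_l_to_umol_per_kg(220.0, 38.7, 21.5))
```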
Citation: https://doi.org/10.5194/essd-2025-759-RC2
- RC3: 'Comment on essd-2025-759', Anonymous Referee #3, 31 Jan 2026
The current manuscript describes CARIMED and the workflow and general approach behind it. CARIMED comes as two datasets that collect research cruise data on inorganic carbon and/or transient tracers in the Mediterranean Sea from 1976 to 2018, reformatted to a common format widely used in the field (WHP) and made internally consistent. The procedures underlying this effort of data harmonization, quality control, and bias identification and adjustment (1QC, 2QC) are described in detail and follow state-of-the-art techniques, e.g., those developed in GLODAPv2. Where necessary, they were expanded and adapted to regional MedSea peculiarities, e.g., by focussing on specific water layers of low variability. The outcome is a benchmark, go-to dataset on MedSea carbon and ventilation dynamics, which will help redefine our understanding of interior Mediterranean carbon dynamics and beyond (e.g., as a reference for adjusting sensor-based observations through improved regressions such as CANYON-MED). The manuscript is well written, and its line of thought and arguments are easy to follow.
Behind this data set description paper stands an enormous amount of work tracing, finding, and recovering historical data, as well as harmonizing data across four decades. The authors are to be commended for this work and for their comprehensive description (incl. historical background, e.g., section 4.3, which is extremely important to allow informed use of the data).
Questions and suggestions:
This work focusses on carbon and transient-tracer cruises and data, with nutrients as ancillary parameters. While it mentions regionally focussed data products for nutrients in the western MedSea, it would be useful to get an impression of how many more cruises (beyond the current 46) would be available if the data collection relaxed its carbon/transient-tracer requirement and included all MedSea cruises with high-quality nutrient data. I guess this can only be done as an order-of-magnitude estimate for the entire Mediterranean at this point, but it may help to illustrate the potential and to motivate the same or other researchers to target such an expanded, more nutrient-focussed FAIR MedSea dataset next.
I feel that a few key figures have been placed in the supplemental material. While one can easily get overwhelmed by the number of figures that come out of such an analysis, I suggest the authors go through the supplemental material, check which figures (visually) support key aspects of the work and approach, and move some of them to the main manuscript. Views may differ on which of them are "key", but I would, e.g., consider a version of Fig. S8 on the intra-cruise homogeneity across depth/time to be one such key figure, as it gives readers of the main manuscript an idea of the reliability of the data products across depth/time.
For the main manuscript, Fig. S8 should be re-worked by (1) using a colormap with a number of discrete rather than continuous color shading, which typically helps to better make out the data magnitude, and (2) using a less intense color for NaN (e.g., light grey instead of black) to focus the eye on the data rather than on the gaps. (The same holds for Figure 3.)
Aspect to improve:
- Fig. 1 / early section 2: In most of the text you refer to two synthesis products, whereas Fig. 1 refers to three complementary outputs of your work. Please harmonise this (i.e., state clearly whether you created two or three data outputs).
Minor items:
- l.151: check sentence
- l.268: check sentence
- l.425: check sentence
- l.637-640: this paragraph can be removed, because its content is repeated here for at least the fourth time.
- l.676: LINK
Citation: https://doi.org/10.5194/essd-2025-759-RC3
Data sets
CARbon, tracers, and ancillary data In the MEDiterranean Sea (CARIMED) (NCEI Accession 0309255). Marta Álvarez et al. https://doi.org/10.25921/cp5b-zq67
CARIMED (CARbon, tracers, and ancillary data In the MEDiterranean Sea) aggregated original cruise data product [dataset] Maribel I. García-Ibáñez et al. https://doi.org/10.20350/digitalCSIC/17785
This is a very useful study presenting aggregated data for the MedSea. The data are harmonized and quality controlled using state-of-the-art methods, which strongly increases their availability and usefulness for future studies. The data set and product include data that were almost lost and have now been preserved for the future.
Some general comments:
I would be interested to know how many data points (and what percentage) were discarded for each variable. This would give an idea of the quality of the data entering the data product, which is important information for assessing the reliability of the data product.
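To illustrate the statistic requested above, the sketch counts, for each variable, how many measured values ended up flagged as bad. It assumes WOCE bottle flags (2 acceptable, 3 questionable, 4 bad, 6 mean of replicates, 9 not sampled) and treats flags 2, 3, 4, and 6 as measured. Whether questionable values should also count as discarded is a choice for the authors. The function name is hypothetical.

```python
from collections import Counter

MEASURED_FLAGS = {2, 3, 4, 6}  # WOCE: acceptable, questionable, bad, mean of replicates
BAD_FLAG = 4

def discarded_summary(flags_by_variable):
    """For each variable return (n_measured, n_discarded, percent_discarded)."""
    summary = {}
    for variable, flags in flags_by_variable.items():
        counts = Counter(flags)
        measured = sum(counts[f] for f in MEASURED_FLAGS)
        discarded = counts[BAD_FLAG]
        percent = 100.0 * discarded / measured if measured else 0.0
        summary[variable] = (measured, discarded, percent)
    return summary

print(discarded_summary({"TCARBN": [2, 2, 4, 9, 3], "ALKALI": [9, 9]}))
```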
The use of NUT as an abbreviation for nutrients throughout the manuscript is not meaningful and strongly reduces readability.
Eastern and western are only capitalised when they are part of a proper name. It is thus the eastern Med basin and western Med basin, or eastern Med Sea and western Med Sea. I identified some instances below, but please correct them throughout the manuscript.
I suggest not using RM as an abbreviation, since it reduces readability and is not needed to reduce the number of words or the length of the manuscript.
Some minor comments and corrections in the following:
L61-63 Please provide some more information on these abbreviations. At present the reader is simply presented with a list of letters without knowing what they stand for.
L66-67 GLODAP is defined twice.
L72 … efforts include … (add -s, delete comma)
L72 nutrients (add -s)
L74 and further … (add and)
L97 Why ‘carbon-relevant variables’ within parentheses?
L99 parameters or variables?
L111 and L118 This is the same information; one of these can be deleted.
L113 Figure 1: Please explain NCEI-OCADS and IEO-CSIC here
L194-195 “Nitrite and nitrate were summed to obtain total oxidised nitrogen (NO2- + NO3-), and listed as nitrate” I do not understand this. If nitrate occurs in the data set, why would you take nitrite, add it to nitrate and call the sum nitrate? This is not logical at all, because you change existing (and probably correct) nitrate values. If there is some logic behind this, please explain in the text.
L202 Why are the units not mentioned, as this is done for all other variables?
L220 lower instead of reduced
L224-225 Please provide some examples of such other issues
L258 research vessels instead of RV’s
L262 north, eastern
L305, 306 helium, tritium (no capitals)
L318 eastern and western (no capitals)
L329-332 METEOR_51_2 was also included in GLODAPv2. Were the adjustments applied taken from that project, or were new adjustments developed? If the second case, how do these adjustments compare?
L363 characterises (with s as you are using British spelling)
L378-379 … GLODAP updates, integrating ship-based biogeochemical observations (Olsen et al., 2016; Lauvset et al., 2024), have established a benchmark for 2QC procedures. (insert commas after updates and after 2024 for better readability)
L398-399 “MedSea biogeochemical variables are distinct in terms of their cycles and drivers (Álvarez et al., 2023) and challenging in terms of analytical SOPs and RMs” How can biogeochemical variables be challenging in terms of SOPs and RMs? SOPs were produced specifically for such measurements. If you just want to state that the measurements are difficult, keep it simple and rephrase this sentence.
L400 In contrast instead of Conversely
L405 western (no capital)
L406 use instead of propose (you have used it, right? Not just proposed it)
L422-425 It would be helpful to show a map with the locations of the basins and the basin boundaries; readers currently have no information on these. The geographical names of seas that occur in the manuscript should also be shown there. Likewise, the information in the first paragraph of §5.2 cannot be appreciated without a chart.
L436 The 250 km radius for cross-overs was taken from the GLODAP effort, right? Because of the elevated variability in the Med Sea, the criteria for the vertical range were modified, which I agree with. However, I would also expect the horizontal criteria for cross-overs to be modified, because the elevated variability also occurs on horizontal spatial scales. Please justify why you did not do so, and show with examples how cross-overs obtained with a 250 km radius differ from those obtained with smaller radii.
L457 sub-basins (+s)
L472-474 “the CARIMED synthesis comprises historical and recent cruises collected using evolving methodologies, SOPs, and quality assurance procedures. Many early cruises were conducted before the availability of RMs for biogeochemical variables” This is obviously your argument for less stringent adjustment limits. However, exactly the same holds for GLODAP, which also combines recent and historical cruises, so your argument is not valid.
L513-515 “All corrections were rigorously inspected to ensure they did not remove true temporal trends or natural variability before being applied.” Which trends did you detect and use? Please explain and possibly make a list of such trends, including references
L521-524 Please omit “units” after the numbers, as practical salinity does not have a unit.
L537-538 “sometimes exceeding the adjustment limit by an order of magnitude” Is it still valid to apply adjustments of this magnitude to DIC and TA? I do not think so. If the data are so far off the scale, there are probably also other problems with the measurements or with data processing. Strong additional arguments are needed to justify including such data in the data product.
L538-540 Findings of an intercomparison are mentioned. Were results of that study already included in the judgement on the CARIMED data? If not, this should be rephrased and made clearer. As it is now, this does not add any more confidence to the data.
Some figures in the Supplementary Material (e.g., S6 and S7) are of lower quality and hard to read.
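The L436 comment asks how crossover results would change with a smaller search radius. A first step is simply to count which station pairs between two cruises fall within each candidate radius. The sketch below uses a spherical-Earth haversine distance and a hypothetical station layout `(station_id, lat, lon)`. It covers only station selection, not the interpolation and profile comparison of a full crossover analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def crossover_pairs(stations_a, stations_b, radius_km):
    """Station pairs from two cruises that lie within radius_km of each other."""
    return [(sa, sb)
            for sa, lat_a, lon_a in stations_a
            for sb, lat_b, lon_b in stations_b
            if haversine_km(lat_a, lon_a, lat_b, lon_b) <= radius_km]

# Made-up stations for two cruises
cruise_a = [("a1", 35.0, 18.0)]
cruise_b = [("b1", 36.0, 18.0), ("b2", 38.0, 18.0)]
for radius in (250, 150, 100):
    print(radius, len(crossover_pairs(cruise_a, cruise_b, radius)))
```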
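The L537-538 comment can be made operational by sending any offset that exceeds the adjustment limit by a large factor to manual review instead of applying it automatically. In the sketch below, the limit value and the factor of 10 are hypothetical parameters, not values taken from CARIMED or GLODAP.

```python
def classify_offset(offset, limit, review_factor=10.0):
    """Classify an additive crossover offset against a minimum adjustment limit.

    Returns 'no adjustment' (|offset| < limit), 'review'
    (|offset| >= review_factor * limit), or 'adjust' otherwise.
    """
    size = abs(offset)
    if size < limit:
        return "no adjustment"
    if size >= review_factor * limit:
        return "review"
    return "adjust"

# Hypothetical DIC limit of 4 umol/kg
for offset in (2.0, -6.0, 45.0):
    print(offset, classify_offset(offset, limit=4.0))
```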