CARINA TCO 2 data in the Atlantic Ocean

Water column data of carbon and carbon-relevant hydrographic parameters from 188 cruises in the Arctic, Atlantic and Southern Ocean have been retrieved and merged in a new data base: the CARINA (CARbon IN the Atlantic) Project. These data have gone through rigorous quality control (QC) procedures to assure the highest possible quality and consistency. Secondary quality control, which involved objective study of data in order to quantify systematic differences in the reported values, was performed for the pertinent parameters in the CARINA data base. Systematic biases in the data have been corrected in the data products. The products are three merged data files with measured, adjusted and interpolated data of all cruises for each of the three CARINA regions (Arctic, Atlantic and Southern Ocean). Ninety-eight cruises were conducted in the "Atlantic", defined as the region south of the Greenland-Iceland-Scotland Ridge and north of about 30° S. Here we report the details of the secondary QC which was done on the total dissolved inorganic carbon (TCO 2 ) data and the adjustments that were applied to yield the final data product in the Atlantic. Procedures of quality control, including crossover analysis between stations and inversion analysis of all crossover data, are briefly described. Adjustments were applied to TCO 2 measurements for 17 of the cruises in the Atlantic Ocean region. With these adjustments, the CARINA data base is consistent both internally and with GLODAP data, an oceanographic data set based on the WOCE Hydrographic Program in the 1990s, and is now suitable for accurate assessments of, for example, regional oceanic carbon inventories, uptake rates and model validation.


Introduction
CARINA is a database containing inorganic carbon, alkalinity and relevant associated data such as temperature, salinity, inorganic nutrients and oxygen from hydrographic cruises in the Arctic, Atlantic and Southern Oceans. The project started as an informal, unfunded project in Delmenhorst, Germany, in 1999 during the workshop on "CO 2 in the North Atlantic", with the main goal of creating a uniformly formatted database of carbon and carbon-relevant variables in the ocean to be used for accurate assessments of oceanic carbon inventories and uptake rates. The collection of data and the quality control of the data have been a main focus of the CARINA project. During the project, both primary and secondary QC of the data have been performed. Primary QC is the process whereby the quality of the data is assessed to be reasonable, based on general knowledge of the data and known trends in the Atlantic. Secondary QC assesses the quality of the data based on more advanced knowledge of parameters affecting the data and usually requires further analysis. This report describes the consistency analysis of total dissolved inorganic carbon (TCO 2 ) for the Atlantic Ocean part of the CARINA database. A more comprehensive description of the complete CARINA data base can be found in Key et al. (2009) as well as the other papers in this special issue.
The CARINA database consists of two parts. The first part is the individual cruise files containing all the original data as reported by the measurement teams including, in many cases, the quality flags originally assigned to the data. These files are in the World Ocean Circulation Experiment (WOCE) Hydrographic Program Office (WHPO) exchange format, where the first lines consist of the condensed metadata. There are no calculated or interpolated values in the individual cruise files, and no adjustments have been applied to any of these values.
The second part of CARINA is three merged data sets, one for each of the Atlantic Ocean (NA), Arctic Mediterranean Seas (AMS), and Southern Ocean (SO) regions. These files constitute the whole CARINA data set, which has been modified from the original data set in the following ways. It includes interpolated values for nutrients, oxygen, and salinity when those data were missing and the criteria for interpolation described in Key et al. (2009) were met. It also includes calculated carbon parameters when possible (e.g. if pH was missing but TCO 2 and Total Alkalinity (TALK) were measured, pH was calculated). Calculations were made using the Matlab version of the CO2SYS Program (van Heuven et al., 2009), using the sulfuric acid constant of Dickson (1990), the hydrofluoric acid constant of Dickson and Riley (1979) and the carbonate constants of Mehrbach et al. (1973) as refitted by Dickson and Millero (1987). Calculated and interpolated values have been given the quality flag "0" to distinguish them from measured data. Finally, most parameters in the merged data files have been adjusted according to the corrections described in Sect. 4. The parameters that were considered for adjustment are salinity, TCO 2 , TALK, pH, O 2 , nutrients and CFCs.
Other parameters, such as 14 C, 13 C and SF 6 , which were present in the individual cruise files, have not been included in the secondary QC procedures and are included in the merged data files as is.

Data Provenance and Structure
The CARINA database includes data and metadata from 188 oceanographic cruises/campaigns, of which 5 entries consist of multiple cruises. The Atlantic Ocean subset of the CARINA data set (CARINA-ATL) consists of 98 cruises/entries, of which one is a time series, and two are collections of multiple cruises over several years within the framework of a common project. Five of these cruises are in common with the Southern Ocean (SO) region, and five are in common with the Arctic Mediterranean Seas (AMS) region. These overlapping cruises ensure consistency between the three regions of the CARINA data set. Additionally, six reference cruises are included in the secondary QC for CARINA-ATL to ensure consistency between CARINA and historical data bases, i.e. GLODAP.

The Atlantic Ocean region of CARINA is loosely defined as the area between the Greenland-Iceland-Scotland Ridge and 30° S, but several cruises overlap with the surrounding regions, thus extending the spatial area covered. Figure 1 shows the position of all hydrographic stations in CARINA-ATL. Figure 2 shows the geographical distribution of the TCO 2 measurements, whereas Fig. 3 represents the distribution of the TCO 2 measurements over the years. As can be seen from Figs. 1 and 2, most of the data are from the Subpolar North Atlantic. Large data gaps exist for the Tropical and South-Eastern part of the Atlantic Ocean. Figure 3 shows that although the CARINA-ATL database spans almost three decades from 1978 to 2006, the majority (71%) of the TCO 2 measurements were made from the mid 1990s to the mid 2000s. Overall, TCO 2 was measured at 57% of the stations occupied on the cruises, about the same as TALK, compared to about 80% for salinity, oxygen and nutrients. Of note is that chlorofluorocarbon (CFC) data are particularly abundant for some regions.
The individual cruises/campaigns are uniquely identified by a string of 12 characters called an EXPOCODE. The first two characters are a 2-digit number identifying the country of the research vessel. They are followed by a two-character platform code uniquely assigned by the National Oceanographic Data Center (NODC, see www.nodc.noaa.gov). The final eight characters denote the date of departure from port in the format YYYYMMDD. For instance, the EXPOCODE 06MT20040311 refers to a cruise conducted on the German (06) ship Meteor (MT) which departed on 11 March 2004.
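The EXPOCODE layout described above can be illustrated with a short parser. This is only a sketch: the names `Expocode` and `parse_expocode` are hypothetical and are not part of any CARINA software, and the check that the country code is numeric is an assumption drawn from the description of a "2-digit number".

```python
from datetime import date, datetime
from typing import NamedTuple


class Expocode(NamedTuple):
    country: str     # 2-digit country code, e.g. "06"
    platform: str    # 2-character NODC platform code, e.g. "MT"
    departure: date  # date of departure from port


def parse_expocode(code: str) -> Expocode:
    """Split a 12-character EXPOCODE into country, platform and date."""
    if len(code) != 12:
        raise ValueError(f"expected 12 characters, got {len(code)}: {code!r}")
    country, platform, stamp = code[:2], code[2:4], code[4:]
    if not country.isdigit():
        raise ValueError(f"country code must be two digits: {country!r}")
    # strptime raises ValueError for an impossible date such as 20041341
    departure = datetime.strptime(stamp, "%Y%m%d").date()
    return Expocode(country, platform, departure)


cruise = parse_expocode("06MT20040311")
print(cruise.country, cruise.platform, cruise.departure.isoformat())
```

For the example from the text, the three fields are the country code 06, the platform code MT and the departure date 11 March 2004.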

The expocodes of the cruises used in CARINA-ATL are listed in Table 1. The table shows that a large number of nations and research ships were involved in the collection of the data over the years. Table 2 in Tanhua et al. (2009b) provides a listing of ships and nations involved in the cruises. Several of these cruises were part of multi-cruise and multi-year, nationally and internationally funded projects. Table 1 contains the values of the adjustments which were agreed upon by the participants of the North Atlantic group of the CARINA project and which were applied to the original data to obtain the merged data product. This report presents the motivation for the TCO 2 adjustments.

The TCO 2 measurements are a key parameter in the CARINA effort, so we provide a short description of analysis methodologies. Prior to the mid-1980s, TCO 2 was determined by potentiometric titration with acid as part of alkalinity titrations. The TCO 2 was determined from the amount of acid needed to go from the first to the second inflection point in the titration curve (Bradshaw et al., 1981). While the method is precise, the accuracy is poor (±20 µmol kg −1 ). The second approach was to acidify a small aliquot of sample and measure the evolved CO 2 by a gas chromatograph (Takahashi, 1983) or infrared analyzer. This approach was limited by the accuracy with which the small sample (∼0.5 ml) could be dispensed. Adaptations of these methods are currently employed in underway and autonomous systems (Wang et al., 2007).
The accuracy of TCO 2 measurements greatly improved at the start of the WOCE Hydrographic Program (WHP) because of several major developments. An integrative analysis method based on coulometry was perfected, and these analytical systems were commercially produced (UIC, Inc.). An accurate inlet and dispensing system was developed, called a Single Operator Multi-parameter Metabolic Analyzer (SOMMA), with a high degree of automation facilitating a relatively rapid throughput of one sample every 15 to 20 min (Johnson et al., 1993). These systems were provided to all investigators of the CO 2 program of WHP funded by the US Department of Energy (DOE). This meant rapid and uniform adoption of this technology. As part of the DOE effort, a handbook of best practices was developed providing guidance on proper sampling, analysis, and data reduction techniques for inorganic carbon system analyses (DOE, 1994). Finally, the accuracy of the measurements was greatly improved by certified reference materials (CRMs) (Dickson et al., 2003) that were provided to all investigators making measurements on the WHP cruises free of charge or at nominal cost.
With adoption of the protocols and use of the new instrumentation, the accuracy of the measurements increased 5- to 10-fold, and reported accuracies of 1-2 µmol kg −1 are now common.

The CARINA data set for TCO 2 has a large number of cruises that benefited from these improvements. When the information is available, Table 1 indicates which cruises had analyses that were referenced to CRM values and which cruises used a coulometer, listed as Coul, for analyses, or a SOMMA for extraction of CO 2 from the sample. In the latter case, a coulometer was always used for analyses. Of the 26 cruises which are known to have used CRMs and have enough data, only 5 needed adjustments to reach consistency with the other cruises.

Computational analysis approach
The main goal of the CARINA project was to gather all available hydrographic cruise data for the Atlantic, Arctic and Southern Ocean and, using secondary quality control (QC) procedures, to determine a set of corrections, or adjustments, per parameter. These adjustments are applied to the cruises to generate a self-consistent data set.
A first level of QC (primary QC) was applied as part of collating all cruises into the Atlantic, Arctic and Southern Ocean data set. This involved correcting for obvious reporting errors and outliers (Tanhua et al., 2009). The second level of QC (secondary QC), which involves determining the adjustments to make the TCO 2 values consistent for the data set, was highly automated using custom-designed software and is described in detail in Tanhua et al. (2009). The basic criterion for deciding whether TCO 2 values need to be adjusted is a comparison of stations of different cruises where they overlap or cross each other in space. This is called the crossover analysis. An inverse least squares routine was applied to all Atlantic crossover data and the deviation of the data from the least squares solution was determined. To assign adjustments, it was assumed a priori that the cruises would be biased with a constant offset. That is, the methods do not lend themselves to determining trends in biases with, for instance, depth or time along a cruise track.
Only data which were collected below ∼1500 m and in the same oceanographic region (Atlantic) were compared to each other in order to minimize the effect of natural variability in the studied parameter. In the crossover approach, two cruises are compared if they have at least 3 stations with enough data below 1500 m within a radius of 222 km. For each crossover identified, an offset and its standard deviation were calculated. Thus each cruise had a set of offsets where it "crossed over" other cruises.
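The crossover criteria quoted above (at least 3 stations per cruise, samples below 1500 m, a 222 km radius) can be sketched as follows. This is a simplified illustration and not the CARINA software, which is described in Tanhua et al. (2009). Several choices here are assumptions: stations are paired when their great-circle distance is within 222 km, each station is reduced to the mean of its deep samples, and the station dictionaries and function names are hypothetical.

```python
import math
from statistics import mean, stdev

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 222.0  # search radius quoted in the text
MIN_DEPTH_M = 1500.0     # only deeper samples are compared
MIN_STATIONS = 3         # stations required from each cruise


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def deep_mean(station):
    """Mean TCO2 of the samples deeper than MIN_DEPTH_M, or None if none exist."""
    deep = [value for depth, value in station["samples"] if depth > MIN_DEPTH_M]
    return mean(deep) if deep else None


def crossover_offset(cruise_a, cruise_b):
    """Return (offset, std) of cruise_a minus cruise_b, or None if no crossover."""
    diffs, used_a, used_b = [], set(), set()
    for ia, sa in enumerate(cruise_a):
        mean_a = deep_mean(sa)
        if mean_a is None:
            continue
        for ib, sb in enumerate(cruise_b):
            mean_b = deep_mean(sb)
            if mean_b is None:
                continue
            if haversine_km(sa["lat"], sa["lon"], sb["lat"], sb["lon"]) <= MAX_DISTANCE_KM:
                diffs.append(mean_a - mean_b)
                used_a.add(ia)
                used_b.add(ib)
    if len(used_a) < MIN_STATIONS or len(used_b) < MIN_STATIONS:
        return None  # too few deep stations within range on one side
    return mean(diffs), stdev(diffs)


def station(lat, lon, deep_value):
    """Build a made-up station: one shallow sample and two deep samples."""
    return {"lat": lat, "lon": lon,
            "samples": [(500.0, 2150.0), (2000.0, deep_value), (3000.0, deep_value)]}


# Made-up cruises: A reads 6 µmol/kg higher than B at depth.
cruise_a = [station(40.0 + 0.5 * k, -30.0, 2196.0) for k in range(3)]
cruise_b = [station(40.0 + 0.5 * k, -30.5, 2190.0) for k in range(3)]
print(crossover_offset(cruise_a, cruise_b))
```

A positive offset here means that the first cruise reads higher than the second. The sign convention is a choice of this sketch, not something the text specifies.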
Since each cruise can only have one potential correction applied to each parameter, a least-square method of determining the appropriate correction by matrix inversion (Wunsch, 1996) was applied to our data sets as described in Johnson et al. (2001). Of the three least-square methods described in Johnson et al. (2001), only two were used here: the Weighted Least-Square (WLSQ), where weights are assigned to cruises in the inversion process, and the Weighted Dampened Least-Square (WDLSQ), in which, in addition to the weights assigned in the WLSQ method, limits are also set for the corrections calculated by the procedure. More details about the methods used are provided in Tanhua et al. (2009). A set of 29 cruises was selected as core cruises because of the expected higher quality of their data, due to the use of CRMs and SOMMAs, as well as their geographical coverage. These core cruises were assigned a higher weight than the others in the inversion procedure to ensure that the final CARINA-ATL data set would be consistent with the data of highest quality.
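To make the idea of the inversion concrete, the sketch below solves a small damped, weighted least-squares problem in plain Python. It is not the implementation of Johnson et al. (2001) or of the CARINA software. Three modelling choices are assumptions made for illustration: each crossover is weighted by 1/sigma² of its offset, the "limits" on corrections are represented by a quadratic damping term per cruise, and a core cruise is modelled by a large damping value that holds its correction near zero. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def damped_wlsq(n_cruises, crossovers, damping):
    """Additive corrections c that minimise
        sum_k w_k * (d_k + c_i - c_j)**2 + sum_i damping[i] * c_i**2
    where crossover k = (i, j, d_k, sigma_k), d_k is cruise i minus
    cruise j, and w_k = 1 / sigma_k**2 (an assumed weighting)."""
    a = [[0.0] * n_cruises for _ in range(n_cruises)]
    b = [0.0] * n_cruises
    for i, j, offset, sigma in crossovers:
        w = 1.0 / sigma ** 2
        # Normal equations obtained by differentiating the objective
        a[i][i] += w
        a[j][j] += w
        a[i][j] -= w
        a[j][i] -= w
        b[i] -= w * offset
        b[j] += w * offset
    for k in range(n_cruises):
        a[k][k] += damping[k]  # damping also makes the system non-singular
    return solve_linear(a, b)


# Made-up example: cruise 0 is a "core" cruise (strong damping);
# cruise 1 reads about 6 µmol/kg high, cruise 2 about 0.5 high.
crossovers = [(1, 0, 6.0, 1.0), (2, 0, 0.5, 1.0), (1, 2, 5.5, 2.0)]
corrections = damped_wlsq(3, crossovers, damping=[100.0, 0.01, 0.01])
print([round(c, 2) for c in corrections])
```

In this toy setup the correction for cruise 1 comes out close to −6 and the core cruise stays essentially uncorrected, which is the behaviour the text describes qualitatively for cruises with a consistent offset.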
An additional 6 cruises, designated as reference cruises, were taken from the Global Ocean Data Analysis Project (GLODAP) data set (Key et al., 2004), a similar project based on the WHP of the 1990s. These 6 reference cruises were incorporated in the CARINA-ATL database as core cruises to ensure consistency with GLODAP but were removed from the final data product. Core cruises are indicated in column 8 of Table 1.
The inversion procedure yields a set of corrections which, if applied to the individual cruises, would be called adjustments and would minimize the offsets of the crossovers. In the case of TCO 2 , additive, rather than multiplicative, corrections were calculated. Since offsets between data sets would most likely be due to offsets in calibration standards, a constant offset was deemed more appropriate. Each correction was thoroughly reviewed by the participants of the project, taking into account the quality of the crossovers, the quality of the data and other factors such as possible temporal or geographical variability, to determine whether a correction was reasonable or not. It was agreed that, in general, corrections smaller than 4 µmol kg −1 would be within the uncertainty of these approaches and therefore not significant. In these cases, no correction was applied and the correction is listed as 0 in Table 1. Corrections greater than 4 µmol kg −1 which were deemed reasonable were rounded to the nearest integer.
When it could not be determined whether the offset was real or not, the adjustment was assigned a value of −888 in Table 1. The accepted corrections, referred to as adjustments, were then applied to the data set and a new inversion was performed. The corrections generated by the "adjusted" data set were reviewed again before a final set of adjustments was assigned.
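The numerical part of these rules can be written down compactly. The sketch below is illustrative only: in the project every value was reviewed by the group, so no function reproduces the actual decisions. The function name, the `conclusive` argument and the treatment of a correction of exactly 4 µmol kg −1 (here rounded rather than set to 0) are assumptions.

```python
import math

THRESHOLD = 4.0        # µmol/kg, below which a correction is not significant
NOT_DETERMINED = -888  # offset could not be judged real or not
NO_DATA = -999         # no usable deep data


def assign_adjustment(correction, conclusive=True):
    """Map a suggested correction (µmol/kg) to a Table 1 style value.

    correction -- suggested value, or None if no deep data were available
    conclusive -- False when reviewers could not tell if the offset is real
    """
    if correction is None:
        return NO_DATA
    if not conclusive:
        return NOT_DETERMINED
    if abs(correction) < THRESHOLD:
        return 0
    # Round half away from zero; Python's round() would round 8.5 to 8
    magnitude = math.floor(abs(correction) + 0.5)
    return int(math.copysign(magnitude, correction))


for suggested in (-6.23, 3.9, 8.5):
    print(suggested, assign_adjustment(suggested))
```

The explicit rounding step is a deliberate choice of this sketch: the text says "rounded to the nearest integer" without saying how ties are handled.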

Results
All results and analyses made by the group for the crossovers and inversions, including figures for each individual crossover, can be found on the CARINA website at http://cdiac.ornl.gov/oceans/CARINA/Carina_inv.html. Table 1 lists the adjustments and their respective standard deviation (see Tanhua et al., 2009) based on both methods. Figure 4 is a plot of these values as a function of cruise number. As can be seen in the figure, both the WLSQ and WDLSQ methods produced very similar results. In most cases, the adjustments significantly lowered the differences between the cruises. Figure 5 shows the values of the offsets from the crossover analysis before and after applying the adjustments. As expected, the vast majority of the offsets were reduced, indicating that the new data set is more self-consistent. The few offsets which became larger after adjustment are most likely related to the different weight assigned to some cruises. The columns labeled WSLQ(adj) and WDSLQ(adj) in Table 1 show that the inversion procedure, when applied to the 'adjusted' data set, suggests lower corrections.
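The before-and-after comparison of crossover offsets can be sketched in a few lines. The cruise labels and numbers below are made up, and the helper names are hypothetical. The sketch assumes that each offset is defined as the first cruise minus the second, that adjustments are additive, and that the sentinel values −888 and −999 mean that no adjustment is applied.

```python
from math import sqrt

SENTINELS = (-888, -999)  # "not determined" and "no data"


def effective(adjustment):
    """Numerical adjustment actually applied; sentinels count as zero."""
    return 0.0 if adjustment in SENTINELS else float(adjustment)


def adjusted_offsets(crossovers, adjustments):
    """Recompute offsets (cruise a minus cruise b) after additive adjustments."""
    result = []
    for a, b, offset in crossovers:
        shift = effective(adjustments.get(a, 0)) - effective(adjustments.get(b, 0))
        result.append((a, b, offset + shift))
    return result


def rms(crossovers):
    """Root-mean-square of the offsets, a simple measure of consistency."""
    return sqrt(sum(o * o for _, _, o in crossovers) / len(crossovers))


# Made-up example: cruise "A" reads high against both "B" and "C".
before = [("A", "B", 6.0), ("A", "C", 5.0), ("B", "C", -1.0)]
after = adjusted_offsets(before, {"A": -6, "C": -888})
print(after)
print(round(rms(before), 2), round(rms(after), 2))
```

A drop in the root-mean-square offset after adjustment corresponds to the overall reduction of offsets that the text attributes to Fig. 5.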

Out of the collection of cruises considered in CARINA-ATL, 20 did not have any deep TCO 2 data and were assigned an adjustment value of −999; 16 had data which, for some reason, did not allow us to assign a meaningful adjustment value and were given an adjustment value of −888; 44, whose suggested correction was less than 4 µmol kg −1 , excluding the reference cruises, were not adjusted and their adjustment value is therefore 0. The TCO 2 values for 17 cruises showed consistent offsets with the different approaches and were assigned a non-zero adjustment.
The values of the adjustments, whether 0 or not, were vetted by the group taking several factors into consideration. Some adjustments were quite obvious, as when a particular cruise showed a similar offset with all the cruises with which it crossed. In these cases, the results of the inversions agreed with the crossover results from not only the core cruises but the other ones as well. Other adjustments were not as simple, and the results from the inversions, although taken as a starting point, were either accepted, rejected or modified based on all information available. The results are presented below and are categorized according to the type of analysis they required.

The sections below explain the values of the adjustments listed in the table, as well as the adjustment flags. The adjustment flags usually refer to the quality of the data used to make the adjustment. A value of 2 means that the quality is good and a value of 3 means that the quality is questionable, resulting in the data not being included in the merged data product. A flag of −999 indicates that there were insufficient data to make a meaningful suggestion. A plot of all the offset values for each cruise is a diagnostic tool that was very useful and widely used in the determination of the corrections. Typical examples of such plots are shown in Fig. 6, each illustrating one of the different situations the cruise data could present, which are discussed below. Cruises which only contained calculated TCO 2 data were not adjusted in the final data product and therefore do not have an adjustment value reported in Table 1.
Cruises with no TCO 2 data were assigned a value of −999. In most cases, TCO 2 was not measured on these cruises. These cruises were removed from Table 1 but can be found on the CARINA data repository web site (http://carina.ifm-geomar.de). In other cases, they were shallow cruises and had no data below 1500 m, which were the only data considered in the analysis, or they were either short cruises or in a remote geographical area and therefore did not have any stations which could be considered as a crossover.
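How an adjustment value and its flag might be used when building a merged file can be sketched as below. This is an interpretation, not the CARINA procedure. It assumes that a flag of 3 excludes the cruise's TCO 2 data from the merged product, that the sentinel values −888 and −999 leave the data unchanged, and that any other value is added to every measurement. The function name is hypothetical.

```python
NOT_DETERMINED = -888  # no adjustment could be suggested
NO_DATA = -999         # no usable deep data for the analysis


def adjust_cruise(values, adjustment, flag):
    """Return the TCO2 values for the merged product, or None if excluded.

    values     -- measured TCO2 values for one cruise (µmol/kg)
    adjustment -- Table 1 style adjustment value
    flag       -- adjustment flag (2 good, 3 questionable, -999 insufficient data)
    """
    if flag == 3:
        return None  # questionable quality: left out of the merged product
    if adjustment in (NOT_DETERMINED, NO_DATA):
        return list(values)  # assumed: data kept as measured
    return [v + adjustment for v in values]  # additive correction


print(adjust_cruise([2190.0, 2200.0], -9, 2))
print(adjust_cruise([2190.0, 2200.0], NOT_DETERMINED, 2))
print(adjust_cruise([2190.0, 2200.0], NOT_DETERMINED, 3))
```

The sentinel check matters in practice: adding −888 or −999 to a measurement as if it were a correction would silently corrupt the data.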

Cruises with insufficient data or insufficient quality data to suggest an adjustment (−888)
Cruises with insufficient data or insufficient quality data could not be adjusted and were assigned a value of −888 in the adjustment table. Since a successful crossover analysis between two cruises requires each cruise to have at least 3 stations meeting the requirements of a crossover (Tanhua et al., 2009), cruises with sparse data fell into that category. When data were insufficient, it was not possible to assign an adjustment flag, so its value was set to −999. Two cruises had enough data but their quality was deemed too low to suggest an adjustment. These cruises were assigned a value of −888 for the adjustment, with the adjustment flag set to 3 to indicate the reason.

Cruises in regions too variable to suggest an adjustment (−888)
Cruises in regions too variable to suggest an adjustment were not given any adjustment, and a value of −888 was used in the table to indicate that. The adjustment flags have been given a value of 2 to indicate that the precision of the data is good but the geographical region of the cruise is naturally variable and the offsets observed between the cruises could be real. Therefore, the values generated by the WLSQ and WDLSQ methods are not applied and no adjustment is assigned (see Fig. 6b). There are three cruises in the CARINA-ATL data set for which this is the case: 58JH19920712 and 58JH19940723 (cruises #130 & 135) both occurred in the highly variable Greenland-Iceland-Scotland Ridge area, whereas 06MT19970107 was located in the Mediterranean outflow region.

Cruises which clearly show no offset with other cruises (0)
These cruises usually occurred in stable regions and produced high quality TCO 2 data which compared well with other cruises in the same area, including the core cruises (see Fig. 6c). All offsets from the crossover analysis were below the 4 µmol kg −1 limit and therefore no adjustment was warranted.

Cruises which clearly show an offset with other cruises
Crossover analysis for these cruises showed a consistent offset, indicating that the TCO 2 data in question were clearly either too high or too low (see Fig. 6d). The inversion results confirmed that assessment. From the offsets determined by the different methods, an adjustment was proposed and agreed upon by the group. For these cases, the adjustments were rounded to the nearest integer.

Cruises with a different adjustment than the one suggested by inversions
Cruises with a different adjustment than the one suggested by inversions usually occurred in regions where some variability is expected. As a result, some crossovers showed fairly large offsets which skewed the inversion results. In general, the average offset with the core cruises was a good indicator as to whether the adjustment was reasonable or not. An example of this is 18HU19920527 (cruise #37), which took place in the variable Labrador Sea. In most cases, though, the crossovers with the core cruises simply helped decide whether the inversion result was going to be rounded up or down.
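The comparison between the average of all crossovers and the average over core cruises only can be sketched as follows. The cruise labels and offsets are invented for illustration and do not correspond to any entry in Table 1, and `mean_offsets` is a hypothetical helper.

```python
from statistics import mean


def mean_offsets(crossovers, core_cruises):
    """Return (mean over all crossovers, mean over core-cruise crossovers).

    crossovers   -- list of (other_cruise, offset) pairs for one cruise
    core_cruises -- set of cruise identifiers treated as core cruises
    The second value is None when the cruise crosses no core cruise.
    """
    all_mean = mean(offset for _, offset in crossovers)
    core = [offset for other, offset in crossovers if other in core_cruises]
    return all_mean, (mean(core) if core else None)


# Made-up offsets: one crossover in a variable region (-14) skews the
# overall mean, while the two core-cruise crossovers agree with each other.
crossovers = [("core1", -5.0), ("core2", -6.0), ("x", -14.0), ("y", -7.0)]
print(mean_offsets(crossovers, {"core1", "core2"}))
```

When the two averages differ noticeably, as in this invented case, the text indicates that the core-cruise average was generally the more trusted guide.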

06MT19960613, Cruise #14
Most of the crossovers indicated that the data needed an upward adjustment of ∼5 µmol kg −1 but the inversion suggested an adjustment of ∼9 µmol kg −1 . The accepted adjustment is slightly lower than what the inversion suggested because the average of all crossovers and the crossovers with the core cruises agreed with each other and were both lower than the inversion result. The original data for that cruise also compared favorably with OACES-1993 data. The offsets with the core cruises therefore largely determined the value of the recommended adjustment.

The average of all crossovers agrees with the crossovers with the core cruises. The inversion calculation showed the TCO 2 measurements to be too high by about 5 µmol kg −1 , but the offsets with the core cruises were consistently around 9 µmol kg −1 . Most crossovers thus showed a higher offset than the adjustment suggested by the inversion. Based on these results, the adjustment value was taken closer to the average of the crossovers, and an adjustment of −9 was assigned.

33LK19960415, Cruise #84
The estimate of an adjustment was made difficult by the large scatter of the data. However, the few crossover results were all consistent with each other, so that an adjustment slightly lower than the inversion result was warranted.

35LU19950909, Cruise #95
There were few crossovers to base a decision on but they were consistent with each other. The accepted adjustment is higher than the value suggested by the inversion because of the crossover results with core cruises.

35TH19990712, Cruise #106
All crossovers agreed that the TCO 2 data were too low. The final adjustment is slightly higher than the one suggested by the inversion, based on the crossovers with core cruises.

64PE20000926, Cruise #152
All crossovers agreed that the TCO 2 data were too low by about 8.5 to 9 µmol kg −1 . Crossovers with just core cruises confirmed the value.

64TR19900701, Cruise #155
The TCO 2 data had good precision and consistently showed an offset of about 7 µmol kg −1 with all other cruises. Comparison with the core cruises confirmed this.

64TR19910408, Cruise #157
Despite the variability of the crossovers' region, offsets consistently show that the TCO 2 data were low. The value for the average offset with the core cruises (−6.23 µmol kg −1 ) is in agreement with the inversion result, which justifies the chosen adjustment value.

74DI19900612, Cruise #170
Although the data used in the crossover analysis were not abundant, they were consistently high by about 7 µmol kg −1 . Crossover analysis with core cruises showed a slightly higher offset (∼8 µmol kg −1 ) but not as high as the inversion suggested. In view of the sparsity of the data and the crossovers' results, the lower range of the offsets was cautiously retained for the adjustment value.

29HE19920714 (reference cruise)
The inversion suggested a high correction of ∼−6 µmol kg −1 but the offsets with the core cruises were consistently low. In this case, the high correction value was driven by two crossovers involving the same stations from this cruise. This suggested that the stations involved could be bad, and they were therefore disregarded.

Table 1. List of the cruises considered in CARINA-ATL and the associated information regarding their TCO 2 data. A question mark refers to information unavailable at the time this article was published. The last columns give the results of the different inversions performed as described in the text and the final adjustments which were applied to the data to produce the final self-consistent CARINA-ATL data set.

(... Table 1). An explanation of the difference between the two methods is provided in the text.

Fig. 5. Plot of the offsets in TCO 2 for all crossovers before adjustments (black symbols) in ascending order from left to right. The red symbols are the offsets after adjustments were made.

Fig. 6. Example of plots of all offsets for one cruise and their standard deviation as a vertical bar versus cruise number. The standard deviation is the deviation of the mean of all points below 1500 m used for the particular crossover comparison. Each plot is representative of a type of crossover analysis result described in Sect. 4: (a) data are of insufficient quality, based on the large standard deviation, to estimate an adjustment, (b) the geographical region where the cruise took place is known to experience too much variability to suggest an adjustment, (c) the crossover offset was well below the 4 µmol kg −1 cutoff and no adjustment is recommended and (d) the crossover analysis showed a large and consistent offset with other cruises and an adjustment was applied.