Coastal Ocean Data Analysis Product in North America (CODAP-NA) - An internally consistent data product for discrete inorganic carbon, oxygen, and nutrients on the North American ocean margins

Internally consistent, quality-controlled data products play an important role in promoting regional to global research efforts to understand societal vulnerabilities to ocean acidification (OA). However, no such data products currently exist for the coastal ocean, where most of the OA-susceptible commercial and recreational fisheries and aquaculture industries are located. In this collaborative effort, we compiled, quality controlled (QC), and synthesized two decades of discrete measurements of inorganic carbon system parameters, oxygen, and nutrient chemistry data from the North American continental shelves to generate a data product called the Coastal Ocean Data Analysis Product for North America (CODAP-NA). The current data product contains few deep-water (>1500 m) sampling locations. As a result, cross-over analyses, which rely on comparisons between measurements made on different cruises in the stable deep ocean, could not form the basis for cruise-to-cruise adjustments. For this reason, care was taken in selecting the data sets included in this initial release of CODAP-NA, and only data sets from laboratories with known quality assurance practices were included. New consistency checks and outlier detection methods were used to QC the data. Future releases of CODAP-NA will use this core data product as the basis for cruise-to-cruise comparisons. During the QC process, we worked closely with the investigators who collected and measured these data. This version (v2021) of CODAP-NA comprises 3,391 oceanographic profiles from 61 research cruises covering all continental shelves of North America, from Alaska to Mexico in the west and from Canada to the Caribbean in the east.
Data for 14 variables (temperature; salinity; dissolved oxygen concentration; dissolved inorganic carbon concentration; total alkalinity; pH on the Total Scale; carbonate ion concentration; fugacity of carbon dioxide; and concentrations of silicate, phosphate, nitrate, nitrite, nitrate plus nitrite, and ammonium) have been subjected to extensive QC. CODAP-NA is available as a merged data product (Excel, CSV, MATLAB, and NetCDF; doi:10.25921/531n-c230, https://www.ncei.noaa.gov/data/oceans/ncei/ocads/metadata/0219960.html).


Introduction
Anthropogenic ocean acidification (OA) refers to the process by which the ocean's uptake of excess anthropogenic atmospheric carbon dioxide (CO2) reduces seawater pH and calcium carbonate mineral saturation states (Feely et al., 2004; Orr et al., 2005; Jiang et al., 2019; IPCC, 2011). OA makes it more difficult for marine calcifiers to build shells and skeletal structures and endangers coral reefs and other marine ecosystems (Gattuso and Hanson, 2011).
Coastal ecosystems account for most of the economic activity associated with commercial and recreational fisheries and aquaculture, supporting about 90% of the global fisheries yield and 80% of known species of marine fish (Cicin-Sain et al., 2002). Studies have shown that OA has the potential to significantly affect both the fisheries and aquaculture industries and to change the way people make their living, run their communities, and live their lives in coastal regions around the world (Cooley and Doney, 2009; Barton et al., 2012, 2015). The Global Ocean Data Analysis Project (GLODAPv2) offers an internally consistent data product for discrete, sampling-based open-ocean carbonate chemistry, nutrient chemistry, isotope, and transient tracer data (Olsen et al., 2016, 2020), which has enabled a wide range of new research products related to OA and its temporal trends in the global ocean (e.g., Jiang et al., 2015a; Gruber et al., 2019; Lauvset et al., 2020). While there are several coastal surface-water partial pressure of CO2 (pCO2) data products and climatologies (e.g., Bakker et al., 2016; Laruelle et al., 2017; Roobaert et al., 2019; Takahashi et al., 2020), internally consistent data products for water-column carbonate and nutrient chemistry data in the coastal ocean do not currently exist. Such products would contribute significantly to our understanding of the current status of OA and its temporal trends, and would help guide OA mitigation and adaptation efforts in coastal oceans.
The impact of OA on North American ocean margins is expected to vary significantly from region to region, with distinct regional drivers amplifying or mitigating overall coastal acidification. Anthropogenic CO2 invasion has been identified as the primary driver of open-ocean acidification on decadal time scales, but coastal ocean acidification is influenced by many other physical, biological, and anthropogenic processes that can oppose or amplify the effect of anthropogenic CO2 uptake. The continental West Coast (WC) and East Coast (EC) border two very different ocean basins (Pacific vs. Atlantic), with different amounts of net organic matter remineralization in the deeper waters flowing along the path of the global thermohaline circulation (Broecker, 1991; Feely et al., 2008; Jiang et al., 2010; Wanninkhof et al., 2015). In the surface ocean, latitudinal variations in sea surface temperature (SST) and in the ratio of dissolved inorganic carbon (DIC) to total alkalinity (TA) result in significantly different pH and calcium carbonate mineral saturation states between the Alaska Coast and the Gulf of Mexico (Jiang et al., 2019; Cai et al., 2020). Upwelling can bring deep waters with corrosive OA chemistry (resulting from large respiratory CO2 loads) to the surface, while onshore surface flow can bring less corrosive open-ocean waters to the coastline (Hales et al., 2005; Feely et al., 2008, 2016). Riverine input of low-pH water has been found to intensify OA shoreward of the shelf break on the EC (Hunt et al., 2011; Xue et al., 2016). However, riverine water composition also varies significantly, and the Mississippi River is a source of high-TA water to the Gulf of Mexico (Stets et al., 2014; Gomez et al., 2020).
Eutrophication (enhancement of biological production of organic matter through the addition of nutrients) raises pH and calcium carbonate mineral saturation states in surface waters of the coastal ocean, and can lead to subsurface hypoxia (via subsequent respiration of that organic matter), which is associated with low pH and low calcium carbonate mineral saturation states (Borges and Gypens, 2010; Cai et al., 2011; Laurent et al., 2017; Feely et al., 2016, 2018). The lack of OA synthesis efforts on North American ocean margins hampers our understanding of the geographic pattern and relative regional rates of OA progression along these coastlines (Cai et al., 2020).
Carbonate data in the coastal ocean are often collected by multiple laboratories using different methods and instruments. Many of these data sets have never been shared with any major data center, nor have they gone through rigorous quality control (QC) and inter-comparison analyses. The lack of observations in intermediate and deep water (water depth >1500 m) makes it challenging to adjust the data based on the constancy of parameters in deep water (i.e., cross-over analyses), as is done for the open ocean (Lauvset and Tanhua, 2015). All of these factors contribute to the lack of internally consistent data products for these important coastal environments. In this study, we compiled and QCed discrete, sampling-based data for inorganic carbon, oxygen, nutrient chemistry, and hydrographic parameters collected across the North American continental shelves. We serve both the internally consistent, climate-quality data product and the QCed original cruise data through the NOAA National Centers for Environmental Information (NCEI). This effort will promote future OA research, modeling, and data synthesis in critically important coastal regions, helping to advance the OA adaptation, mitigation, and planning efforts of North American coastal communities. While this study only partially addresses the limitations associated with the lack of deep and intermediate data, it produces a data product that can serve as the basis for addressing these limitations and incorporating additional coastal cruises going forward. We hope this release will be considered analogous to GLODAPv2 (Olsen et al., 2016), in the sense that the new data sets added in the subsequent GLODAPv2.2019 and GLODAPv2.2020 updates (Olsen et al., 2019, 2020) were brought into internal consistency with the fully quality-controlled data in the original GLODAPv2 product.

From a geopolitical perspective, the term "continental shelf" is defined as the region between the coastline (excluding estuaries) and a distance of 200 nautical miles (~370 km) offshore. While this definition is less mechanistic than one based on a change in bathymetric gradient or on a hydrographic property such as chlorophyll or salinity, it is regionally and seasonally invariant and captures the full extent of coastal influences. This version of the data product focuses on the continental shelves of the North American (NA) coasts (Figure 1), including:
- Alaska Coast (AC): the large marine ecosystems (LMEs) of the Gulf of Alaska, East Bering Sea, Northern Bering-Chukchi Seas, and Beaufort Sea (see Sherman et al., 2009 for more information on the LMEs).
- West Coast (WC): the LMEs of the California Current and Gulf of California.
- East Coast (EC): the LMEs of the Northeast U.S. and Southeast U.S. continental shelf regions.

Parameters / variables
For the current version of CODAP-NA, inorganic carbon system parameters, oxygen, nutrients, and related hydrographic parameters were included (Table 1). CTDPRES, CTDTEMP, CTDSAL, and CTDOXY were commonly measured with pressure, temperature, conductivity, and oxygen sensors, respectively, mounted on a CTD rosette. On some cruises with surface samples collected from flow-through systems, temperature and salinity were also provided in the columns reserved for CTDTEMP and CTDSAL, respectively. Water samples were collected and measured onboard, or later in a shore-based laboratory, for discrete salinity, discrete dissolved oxygen concentration (DO), dissolved inorganic carbon concentration (DIC), total alkalinity (TALK), pH, carbonate ion concentration ([CO3²⁻]), fugacity of carbon dioxide (fCO2), and concentrations of silicic acid, phosphate, nitrate, nitrite, nitrate plus nitrite, and ammonium. For discrete pH on the Total Scale, [CO3²⁻], and fCO2, both measured and calculated values are presented. Saturation states of aragonite (Ωarag) and calcite (Ωcalc) could only be calculated. The carbonate system calculations were conducted using the MATLAB version 3.01 (Sharp et al., 2020) of the CO2SYS program (Lewis and Wallace, 1998), with the dissociation constants for carbonic acid of Lueker et al. (2000), bisulfate (HSO4⁻) of Dickson (1990), and hydrofluoric acid (HF) of Perez and Fraga (1987), and with the total borate equations of Lee et al. (2010).
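As an illustration of what the CO2SYS calculations involve, the minimal Python sketch below evaluates the Lueker et al. (2000) expressions for pK1 and pK2 of carbonic acid (Total scale, surface pressure) and uses them to split DIC into CO2*, bicarbonate, and carbonate ion at a known pH. It is a simplified stand-in, not CO2SYS: it assumes pH and DIC as the input pair, ignores pressure corrections, and omits the borate, bisulfate, fluoride, phosphate, and silicate terms that CO2SYS includes when solving from DIC and TALK. The function names are hypothetical.

```python
from __future__ import annotations

import math


def lueker_pk1_pk2(temp_c: float, sal: float) -> tuple[float, float]:
    """pK1 and pK2 of carbonic acid (Total scale, surface pressure), Lueker et al. (2000)."""
    tk = temp_c + 273.15  # temperature in kelvin
    pk1 = (3633.86 / tk - 61.2172 + 9.67770 * math.log(tk)
           - 0.011555 * sal + 0.0001152 * sal ** 2)
    pk2 = (471.78 / tk + 25.9290 - 3.16967 * math.log(tk)
           - 0.01781 * sal + 0.0001122 * sal ** 2)
    return pk1, pk2


def speciate_dic(dic: float, ph_total: float, temp_c: float, sal: float) -> dict[str, float]:
    """Split DIC into CO2*, HCO3- and CO3 2- at a given Total-scale pH.

    The species are returned in the same unit as `dic` (e.g., umol/kg).
    """
    pk1, pk2 = lueker_pk1_pk2(temp_c, sal)
    k1, k2 = 10.0 ** -pk1, 10.0 ** -pk2
    h = 10.0 ** -ph_total
    denom = h * h + k1 * h + k1 * k2  # sum of the three ionization fractions' numerators
    return {
        "CO2": dic * h * h / denom,
        "HCO3": dic * k1 * h / denom,
        "CO3": dic * k1 * k2 / denom,
    }
```

For example, `speciate_dic(2000.0, 8.05, 15.0, 33.0)` returns the three species in the same unit as the DIC input. Solving from DIC and TALK instead requires iterating on the full alkalinity equation, which is why a dedicated package such as CO2SYS is used in practice.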

CODAP-NA focuses on chemical oceanographic data (inorganic carbon system parameters, oxygen, and nutrients) collected from discrete sampling-based observations. These include discrete samples taken from shipboard flow-through systems, not solely water collected in sampling rosette bottles. Carbon parameters recorded from continuous underway measurements by inline analytical instruments were excluded, as they had been QCed and included in the Surface Ocean CO2 Atlas (SOCAT) (Bakker et al., 2016). The same applies to carbon parameters from time-series moorings. Data from large open estuaries (e.g., Salish Sea, Chesapeake Bay, Bay of Fundy) were excluded from this first round of analysis, but these are among the data that may benefit from secondary QC against CODAP-NA. When a cruise spans ocean margins and also contains a subset of measurements within estuaries, the estuarine data from that cruise are retained in this data product.

We started with the highest-quality coastal data sets to define a protocol for consistent QC and inter-comparison, which will subsequently be applied to other compiled coastal data sets. As a first step, only climate-quality discrete measurements (core data sets) with known quality and metadata from the Atlantic Oceanographic and Meteorological Laboratory, Pacific Marine Environmental Laboratory, University of South Florida, University of Miami, University of Alaska Fairbanks, University of New Hampshire, and University of Delaware were included (Table 2). These data sets will serve as a reference for QCing future data sets.

Quality control of cruise data sets often involves two steps: primary QC and secondary QC (Tanhua et al., 2010). Both should follow an initial, sometimes called "0-level", QC, which is performed on individual measurements based on instrument readings and observations recorded during the analyses. Primary QC is the process of identifying outliers and obvious errors within an individual cruise data set using measurement metadata or approaches such as property-to-property plots (Figure 2). It should largely be done by the investigators responsible for the measurements. In addition, it is critical to apply a uniform additional layer of primary QC to all cruises within a data product, using common tools and common thresholds, to help identify any issues missed by the data producers. These issues are communicated back to the investigators so that they can be reviewed and, if necessary, addressed. This additional layer of primary QC is often performed by the data product synthesis community. Secondary QC is a process in which data from one cruise are objectively compared against data from another cruise or from a previously synthesized data set in order to quantify systematic differences in the reported values.

The secondary QC process often entails cross-over analysis (Lauvset and Tanhua, 2015) and, increasingly, regional multiple linear regression (MLR) and inversions (Olsen et al., 2019). Because coastal cruises have few cross-over stations at depths where parameters are unlikely to be influenced by temporal variations (sampling depth >1500 m; Olsen et al., 2020), secondary QC was not conducted for this version of CODAP-NA, and no cruise-wide offsets or multiplicative adjustments were applied. Instead, the QC relied on (a) stringent criteria for the selection of data sources and (b) an enhanced primary QC procedure with rigorous consistency checks. This version of CODAP-NA only accepted data from laboratories directly involved in the CODAP effort, with a track record of producing high-quality data and following best practices, which makes secondary QC less essential. Other very high-quality coastal cruise data sets likely exist that are not yet included in this version of CODAP-NA.

Figure 2. Diagram showing the major steps of the quality control (QC) process. Note that uncertainty is separated into outliers (scatter) and systematic offsets (all data from a cruise share a bias). [CO3²⁻] is carbonate ion concentration and fCO2 is fugacity of carbon dioxide. Refer to Table 1 for the remaining abbreviations.

We worked directly with the data providers, who knew their data best, to conduct these primary QC procedures, in order to leverage all of the resources related to a measurement: details of the methods, instrumentation, and reference standards, access to the raw data, and the analysts' recollection of the measurements. As part of the QC process, comparisons were made between many combinations of measured values. For a subset of properties, inter-consistency calculations and algorithm estimates based on other measurements allowed additional checks. The five major steps of the QC procedure used for CODAP-NA are described below (Figure 2). A new suite of QC tools is under development to allow these many comparisons and calculations to be performed quickly and efficiently; these tools will be made publicly available, together with a separate paper describing their rationale, development, and use (Jiang et al., in prep.). A prototype version was used for CODAP-NA, although many software packages would, in principle, allow the comparisons and plots we use.
Step One was to ensure that all of the cruise data files were ingested into NCEI's archives and documented with a rich metadata record (Jiang et al., 2015b). Maintaining a cruise data table that allows future users of the data product to access the original data files is an important component of any synthesis effort. For this study, a table with key metadata is available at https://www.ncei.noaa.gov/access/ocean-acidification-data-stewardship-oads/synthesis/NAcruises.html. The table lists the following fields: a sequential number for each cruise data set (NO), expedition code (EXPOCODE), a flag indicating the quality of the cruise (Cruise_flag; see Table 3), cruise identifier (Cruise_ID), Start_date, End_date, measured parameters, and links to NCEI's archive.
Table 3. Cruise flags used for this product.

Flag value | Meaning
A | These were dedicated OA cruises executed following the Best Practices for global ocean work outlined in Hood et al. (2011) and other documents found on the GO-SHIP website*. Colloquially, these are referred to as GO-SHIP quality. Traceable standards and certified reference materials were used, and deep stations (>2500 m) were sampled, allowing near-constant deep-water concentrations to be used as anchor points. A third inorganic carbon system parameter, such as pH or carbonate ion concentration, was often measured, allowing consistency checks.
B | These were dedicated OA cruises with onboard inorganic carbon measurements performed according to Best Practices (Dickson et al., 2007), with many other parameters measured to the highest accuracy through the use of standards and certified reference materials. However, not all other parameters were necessarily analyzed to the highest standards (e.g., nutrient samples frozen for shoreside analysis; oxygen and nutrient samples not taken from most Niskin bottles; CTD oxygen trace not normalized to Winkler oxygen values; insufficient metadata). There were often too few deep stations to compare the data with open-ocean data.
C | These were opportunistic cruises on which OA parameters were measured in the water column. They include standard hydrographic, carbon, and OA parameters (T, S, O2, nutrients, TALK, DIC, pH). Many parameters, including carbon and OA parameters, were measured shoreside, and CTD oxygen data were not adjusted to Winkler oxygen values. Generally, no dedicated OA personnel were onboard.
D | Underway samples only. These cruises had no CTD casts and only had samples taken from the seawater supply line, often with a limited set of other hydrographic parameters. T and S were obtained from thermosalinographs with limited or no salinity check samples.
* https://www.go-ship.org/HydroMan.html

Step Two was to load the measurement values from the original cruise data files into MATLAB and conduct the necessary calculations (Figure 2). All missing values were replaced with "-999" during this process. Variables without a QC flag in the original cruise data file were assigned an initial flag of 2 (good value; Table 4). Variables that were clearly out of range (e.g., a DIC value < 0) were automatically assigned a QC flag of 4 (bad value). The QC flags for all "-999" or otherwise missing values were set to 9 (missing value). All bottle measurement flags with a corresponding Niskin_flag of 3 or 4 were replaced with the corresponding Niskin_flag. For example, if a discrete salinity measurement has a Salinity_flag of 2 but the corresponding Niskin_flag (the QC flag of the Niskin bottle from which the sample was drawn) is 3, the Salinity_flag is updated from 2 to 3.
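The Step Two flag rules can be sketched as a single function. This Python sketch assumes WOCE-style integer flags, treats None, NaN, and -999 as missing, and takes a hypothetical per-variable `valid_range` for the plausibility test (e.g., DIC must be non-negative); the function and parameter names are illustrative only. Following the text, a Niskin_flag of 3 or 4 replaces the measurement flag, while missing values keep flag 9.

```python
import math

MISSING = -999.0


def is_missing(value):
    """True for None, NaN, or the -999 missing-value sentinel."""
    return value is None or value == MISSING or (
        isinstance(value, float) and math.isnan(value))


def step_two_flag(value, original_flag, niskin_flag, valid_range):
    """Assign an initial QC flag to one bottle measurement (WOCE-style flags)."""
    if is_missing(value):
        return 9                                   # missing value
    flag = 2 if original_flag is None else original_flag  # unflagged -> good
    low, high = valid_range                        # hypothetical plausibility limits
    if not (low <= value <= high):
        flag = 4                                   # clearly out of range -> bad
    if niskin_flag in (3, 4):
        flag = niskin_flag                         # questionable/bad bottle overrides
    return flag
```

For instance, the salinity example in the text corresponds to `step_two_flag(33.1, 2, 3, (0.0, 45.0))`, which yields 3.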
Some surface samples from a few coastal cruises were collected from flow-through systems onboard research vessels instead of from Niskin bottles on sampling rosettes. In such cases, the temperature and salinity values were stored under the CTDTEMP and CTDSAL columns, respectively, even though they were not measured by sensors mounted on a CTD rosette.
Similarly, their sampling depths were extracted from the metadata as the depth of the water inlet and stored under CTDPRES (Table 1). When the water inlet depth was not available, the sampling pressure was set to 5 dbar.
The "Observation_type" column in the CODAP-NA data product file indicates whether a sample is from a "Flow-through" system or a "Niskin" bottle.

(d) conservative temperature, absolute salinity, and sigma-theta; (e) recommended_Oxygen; (f) apparent oxygen utilization (AOU); (g) recommended_Nitrate_and_Nitrite; (h) pH, carbonate ion, and fCO2 calculated at in-situ conditions with CO2SYS from DIC and TALK, along with temperature, salinity, pressure, and nutrients; and (i) in-situ pH, carbonate ion, and fCO2 converted from their respective values at measurement conditions. If Sample_IDs were not already available, they were calculated from STATION_ID (station identification number), CAST_NO (cast number), and NISKIN_ID (Niskin identification) using Equation (1):

$$\text{Sample\_ID} = \text{Station\_ID} \times 10000 + \text{Cast\_number} \times 100 + \text{Niskin\_ID} \qquad (1)$$

For example, the Niskin bottle with a Niskin_ID of 3 on the 2nd cast at station 15 has a Sample_ID of 150203. When Sample_IDs could not be calculated (e.g., when Station_ID is non-numerical), they were assigned sequentially as 1, 2, 3, … from the first to the last row of the original cruise data file.
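Equation (1) and its sequential fallback can be expressed as follows. In this minimal Python sketch the helper names are hypothetical, and applying the 1, 2, 3, … fallback to the whole file whenever any row has a non-numeric identifier is an assumption about how the rule was applied.

```python
def sample_id(station_id, cast_number, niskin_id):
    """Equation (1); returns None when any identifier is not an integer."""
    try:
        return int(station_id) * 10000 + int(cast_number) * 100 + int(niskin_id)
    except (TypeError, ValueError):
        return None


def assign_sample_ids(rows):
    """rows: list of (station, cast, niskin) tuples in file order.

    Falls back to 1..N (first row to last row) if any Sample_ID cannot be computed.
    """
    ids = [sample_id(*row) for row in rows]
    if any(i is None for i in ids):
        return list(range(1, len(rows) + 1))
    return ids
```

Note that Equation (1) yields unique identifiers only when Cast_number and Niskin_ID are both below 100; otherwise adjacent fields overlap.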
Sampling depth (Depth) and pressure (CTDPRES) were calculated from one another, where applicable, using the functions "gsw_z_from_p" and "gsw_p_from_z", respectively, of the International Thermodynamic Equation of Seawater 2010 (TEOS-10; IOC et al., 2010). The calculated Depth values were used to replace the original Depth values.
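CODAP-NA used the TEOS-10 functions gsw_z_from_p and gsw_p_from_z for this conversion. As a dependency-free illustration of the same pressure-depth relationship, the Python sketch below substitutes the simpler Saunders (1981) approximation, z = (1 − c1)p − c2p², with c1 = (5.92 + 5.25 sin²φ) × 10⁻³ and c2 = 2.21 × 10⁻⁶. It is not the TEOS-10 algorithm and differs from it slightly; here depth is positive downward (TEOS-10 height z is negative below the surface), and the function names are hypothetical.

```python
import math

C2 = 2.21e-6  # Saunders (1981) quadratic coefficient


def _c1(lat_deg: float) -> float:
    """Latitude-dependent linear coefficient of the Saunders (1981) formula."""
    return (5.92 + 5.25 * math.sin(math.radians(lat_deg)) ** 2) * 1e-3


def depth_from_pressure(p_dbar: float, lat_deg: float) -> float:
    """Approximate depth (m, positive down) from sea pressure (dbar)."""
    return (1.0 - _c1(lat_deg)) * p_dbar - C2 * p_dbar ** 2


def pressure_from_depth(z_m: float, lat_deg: float) -> float:
    """Inverse of depth_from_pressure: smaller root of C2*p^2 - (1-c1)*p + z = 0."""
    a = 1.0 - _c1(lat_deg)
    return (a - math.sqrt(a * a - 4.0 * C2 * z_m)) / (2.0 * C2)
```

For example, the 5 dbar default assigned to flow-through samples without inlet-depth metadata corresponds to a depth just under 5 m.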
The "recommended_salinity_PSS78" column was created by merging the discrete salinity and CTDSAL columns. Discrete measurements were preferred, provided their QC flags were 2 or 6. If these were not available, CTDSAL values with QC flags of 2 or 6 were chosen. In the absence of both, discrete salinity measurements with QC flags other than 2 or 6 were chosen, and, lastly, CTDSAL values with other QC flags. The same principles were applied to merge the oxygen data: the merged discrete oxygen and CTDOXY data were stored in the column named "recommended_Oxygen" (Table 1).
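The four-tier priority used to build recommended_salinity_PSS78 (and, by the same principles, recommended_Oxygen) can be sketched as below. This Python version is illustrative: it assumes -999 marks a missing value and returns the source of the chosen value so that each choice can be audited. The function name is hypothetical.

```python
GOOD_FLAGS = (2, 6)  # good value, average of duplicates


def recommended_value(discrete, discrete_flag, sensor, sensor_flag, missing=-999.0):
    """Merge a discrete and a CTD sensor value; returns (value, flag, source)."""
    tiers = [
        (discrete, discrete_flag, "discrete", True),   # 1. discrete, flag 2 or 6
        (sensor, sensor_flag, "sensor", True),         # 2. sensor, flag 2 or 6
        (discrete, discrete_flag, "discrete", False),  # 3. discrete, other flags
        (sensor, sensor_flag, "sensor", False),        # 4. sensor, other flags
    ]
    for value, flag, source, needs_good_flag in tiers:
        if value is None or value == missing:
            continue
        if needs_good_flag == (flag in GOOD_FLAGS):
            return value, flag, source
    return missing, 9, None  # nothing available
```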
Conservative temperature (Θ) is proportional to potential enthalpy and is recommended as a replacement for potential temperature (θ), as it more accurately represents the heat content of seawater (IOC et al., 2010). Absolute Salinity (SA) is the mass fraction of salt in seawater (unit: g/kg), based on the conductivity ratio plus a regional correction term, as opposed to Practical Salinity (SP; Practical Salinity Scale 1978, or PSS-78; unitless), which is based solely on the conductivity ratio (Le Menn et al., 2018). Conservative temperature, Absolute Salinity, and sigma-theta were calculated using the functions "gsw_CT_from_t", "gsw_SA_from_SP", and "gsw_sigma0", respectively, from TEOS-10 (IOC et al., 2010). Apparent oxygen utilization (AOU), the difference between oxygen solubility and the measured oxygen concentration, was calculated from Absolute Salinity, conservative temperature, latitude, longitude, CTDPRES, and the recommended_Oxygen variable using the function "gsw_O2sol" as described in TEOS-10 (IOC et al., 2010). Oxygen solubility is estimated with the combined equation of Garcia and Gordon (1992).
To measure nitrate, it is first reduced to nitrite, and this new nitrite is then measured together with the nitrite originally present in seawater (Hydes and Hill, 1985). The concentration of nitrite in seawater is usually much lower than that of nitrate. When nitrite is not reported, it is often because its concentration is too low to be detectable. carbon parameter. When it was not available, DIC was used. If neither was available, TALK derived from salinity with the locally interpolated alkalinity regression (LIARv2) method was used for the adjustment from measurement to in-situ conditions (Carter et al., 2018). Carbonate_insitu_calculated, pH_TS_insitu_calculated, fCO2_insitu_calculated, aragonite saturation state, calcite saturation state, and Revelle_Factor were calculated from DIC and TALK, along with in-situ temperature, salinity, pressure, silicate, and phosphate, using the same dissociation constants as above (Table 1). When either silicate or phosphate data were unavailable, their mean values during the cruise were used for the calculation. Samples with salinity below 15 were excluded from this calculation because of the potentially large uncertainties.
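Two of the safeguards described above, substituting cruise-mean silicate and phosphate when those data are unavailable and skipping samples with salinity below 15, are sketched below in Python. The sketch assumes the cruise mean is computed from the non-missing values of the same cruise and applied sample by sample; the helper names are hypothetical, and the carbonate calculation itself is not shown.

```python
from statistics import mean

MISSING = -999.0


def fill_with_cruise_mean(values):
    """Replace missing nutrient values with the mean of the cruise's non-missing values."""
    present = [v for v in values if v is not None and v != MISSING]
    if not present:
        return list(values)  # nothing to average; leave unchanged
    cruise_mean = mean(present)
    return [cruise_mean if (v is None or v == MISSING) else v for v in values]


def eligible_for_insitu_calc(salinity, min_salinity=15.0):
    """Samples with salinity below 15 are excluded from the carbonate calculations."""
    return salinity is not None and salinity != MISSING and salinity >= min_salinity
```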

Step Three was to identify outliers, which were determined by visual inspection. Two types of outlier identification were used in this effort: (a) broad-scale outlier identification, by visually examining plots of each variable against sampling depth and other property-to-property plots, and (b) fine-scale outlier identification, based on consistency checks. Here, consistency checks refer both to "internal consistency checks", i.e., comparisons of a measurement with its calculated value (e.g., spectrophotometrically measured pH vs. pH calculated from other carbon parameters using CO2SYS), and to validation checks, i.e., comparisons of a measurement made with one method against the same measurement made with a different method (e.g., oxygen measured by Winkler titration vs. a sensor; in this case, however, the sensor oxygen profile is frequently adjusted to the Winkler titration values, so the measurements are not truly independent). For the broad-scale outlier identification, we plotted all variables against depth (or sigma-theta when only surface values were available), as well as these plots (Figure 2).
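In CODAP-NA, outliers were identified by visual inspection. As an automated aid for the fine-scale consistency checks, one could screen the residual between a measured value and its calculated counterpart (e.g., measured vs. calculated pH). The Python sketch below flags residuals that lie far from the cruise median using a median-absolute-deviation rule; both the rule and the default k = 3 are assumptions, not the CODAP-NA criteria, and flagged points would still need the visual and investigator review described above.

```python
from statistics import median


def consistency_outliers(measured, calculated, k=3.0):
    """Indices whose measured-minus-calculated residual departs from the cruise
    median residual by more than k scaled median absolute deviations."""
    resid = [m - c for m, c in zip(measured, calculated)]
    med = median(resid)
    # 1.4826 scales the MAD to a standard-deviation equivalent for normal data
    mad = 1.4826 * median([abs(r - med) for r in resid])
    if mad == 0:
        return []
    return [i for i, r in enumerate(resid) if abs(r - med) > k * mad]
```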

In addition, values of dissolved oxygen, DIC, TALK, silicate, phosphate, and nitrate were also estimated with existing algorithms (e.g., Carter et al., 2018). These estimates were then compared against the measured CTDOXY, Oxygen, DIC, TALK, Silicate, Phosphate, and Nitrate values, respectively, to help assess whether cruise-to-cruise biases exist (Figure 2). Because these algorithms are intended primarily for open-ocean estimation, they were used in the coastal environment only to call attention to measurements requiring additional QC, and never to directly assign flags.
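Comparing measured values with algorithm estimates cruise by cruise reduces to a mean offset per cruise. The Python sketch below computes that offset and lists cruises whose offset exceeds a review threshold; consistent with the text, the result is only a prompt for further QC, never a flag. The threshold and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cruise_offsets(records):
    """records: iterable of (cruise_id, measured, estimated).

    Returns {cruise_id: mean(measured - estimated)} as a screening statistic.
    """
    diffs = defaultdict(list)
    for cruise, measured, estimated in records:
        diffs[cruise].append(measured - estimated)
    return {cruise: mean(d) for cruise, d in diffs.items()}


def cruises_to_review(offsets, threshold):
    """Cruises whose |mean offset| exceeds a (hypothetical) threshold: review, don't flag."""
    return sorted(c for c, off in offsets.items() if abs(off) > threshold)
```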

For all of the aforementioned plots, our tools allow us to step through each profile individually, with all data from the same cruise plotted in the background. Similarly, we can step through each cruise individually, with all data from all cruises plotted in the background. These approaches allow us to detect systematic offsets.
Step Four was to append all of the individual cruise data files into one data product file containing all of the variables listed in Table 1. All rows with a Niskin_flag of 4 (Table 4) were removed. Data values with QC flags other than 2 (good), 3 (questionable), or 6 (average of duplicate measurements) were replaced with "-999", and their QC flags were changed to 9. For surface samples collected from flow-through systems, Cast_numbers and Niskin_IDs were all set to "-999", and Niskin_flags were all set to 9. The contents of Observation_type were standardized to either "Niskin" or "Flow-through". The merged data product file was further QCed by plotting all of the non-missing values for each variable. These plots were examined further, with a focus on outliers lying more than 2.5 standard deviations from the mean.
Table 4 (excerpt): 6 = Average of duplicates; 9 = Missing value.
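The Step Four clean-up rules and the 2.5-standard-deviation screen can be sketched as follows. In this Python sketch each row is a hypothetical dictionary with Niskin_flag, value, and flag keys; the use of the population standard deviation and of the product-wide mean as the reference is an assumption. Values returned by the screen are candidates for inspection, not automatically flagged.

```python
from statistics import mean, pstdev

MISSING = -999.0
KEEP_FLAGS = {2, 3, 6}  # good, questionable, average of duplicates


def clean_for_product(rows):
    """Drop rows with Niskin_flag 4; set values whose flag is not 2, 3, or 6 to -999/9."""
    cleaned = []
    for row in rows:
        if row["Niskin_flag"] == 4:
            continue                      # bad bottle: remove the whole row
        row = dict(row)                   # copy so the input is not modified
        if row["flag"] not in KEEP_FLAGS:
            row["value"], row["flag"] = MISSING, 9
        cleaned.append(row)
    return cleaned


def beyond_n_sd(values, n_sd=2.5):
    """Non-missing values more than n_sd population SDs from their mean."""
    present = [v for v in values if v != MISSING]
    if len(present) < 2:
        return []
    mu, sd = mean(present), pstdev(present)
    if sd == 0:
        return []
    return [v for v in present if abs(v - mu) > n_sd * sd]
```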

Data products
The data product is available in Excel, CSV, MATLAB, and NetCDF formats at NOAA/NCEI with a DOI of 10.25921/531n-c230 and NCEI Accession Number 0219960 (Jiang et al., 2021). All parameters in Table 1 are presented, along with their Cruise_flags (Table 3) and primary QC flags (Table 4). The chosen primary QC flag convention is the same as that of the GLODAPv2 project (Olsen et al., 2020). Note the difference between the WOCE primary QC flags (e.g., 2, 3, 4, 9) and the secondary QC flags used by GLODAPv2 (either 0 or 1). The current version (v2021) of CODAP-NA contains 3,391 discrete chemical oceanographic profiles and a total of 28,206 data points, collected on 61 cruises in the ocean margins of North America between December 6, 2003 and November 22, 2018. Each profile has on average eight sampling depth levels (median of seven). The total count of data points for each parameter and their minimum, maximum, and mean values are listed in Table 5.

Refer to Table 1 for full parameter names and units.

Of the 3,391 profiles, 2,869 have both DIC and TALK measurements, so the full list of carbonate system parameters (pH, fCO2, [CO3²⁻], aragonite saturation state, calcite saturation state, and Revelle Factor) can be calculated (Figure 3). In addition, there are 1,501 profiles with discrete pH measurements from a spectrophotometric method (Byrne and Breland, 1989; Clayton and Byrne, 1993; Dickson, 1993), 412 profiles with discrete carbonate ion measurements (Byrne and Yao, 2008; Sharp and Byrne, 2019), and 278 profiles with discrete fCO2 measurements (Wanninkhof and Thoning, 1993). Oxygen and nutrient measurements also have good coverage (Figure 3).

One major difference between CODAP-NA and GLODAPv2 is the shallower sampling depths of the former (Figure 4). About 80% of the 3,391 profiles have a maximum sampling depth of <250 m, and 30% have a maximum sampling depth of <25 m, many of them surface-only measurements. Only 195 profiles (<6% of the 3,391 profiles) have at least one sampling depth below 1500 m, which has commonly been used as the threshold for subsurface cross-over analyses (Figure 4).

Another distinctive feature of coastal oceans is the large magnitude of their seasonal variation. For many parameters, seasonal variation, along with diel and tidal variations, often eclipses long-term variation. Understanding seasonal variation and de-seasonalizing the observational data are therefore often critical steps in deciphering long-term change. Like most data products, this version of CODAP-NA is biased toward summer and fall, with 676, 1,554, 1,059, and 102 profiles in spring, summer, fall, and winter, respectively (Figure 5). All coasts have good summer data coverage, but the only area with meaningful winter data coverage is the northeastern U.S. coast (Figure 5, Table 6).
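The depth-coverage statistics above can be reproduced from per-profile sampling depths. This minimal Python sketch assumes a hypothetical mapping from profile identifiers to lists of sampling depths in metres; the function name is illustrative.

```python
def depth_coverage(profiles):
    """profiles: dict mapping profile_id -> list of sampling depths (m)."""
    n = len(profiles)
    max_depths = [max(depths) for depths in profiles.values()]
    return {
        "frac_max_lt_250m": sum(d < 250 for d in max_depths) / n,
        "frac_max_lt_25m": sum(d < 25 for d in max_depths) / n,
        # profiles with at least one sample below the 1500 m cross-over threshold
        "n_below_1500m": sum(d > 1500 for d in max_depths),
    }
```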

To demonstrate the large seasonal amplitude (defined here as the difference between the maximum and minimum values of a variable over an annual cycle) in the study area, an analysis was conducted to group surface stations (with at least one sampling depth <25 m) that are within 1

To present a rough estimate of the measurement uncertainties of these variables, a similar approach was used to group deep-water stations with a maximum sampling depth of >1500 m. Because of the scarcity of deep-water stations, a radius of 10 km and a depth difference of 200 m were used to find comparison pairs. This analysis is limited to the few cruises with deep-water sampling (~5% of the data), so the uncertainty estimates hold only for these "reference" cruises, mostly with a Cruise_flag of A (Table 3), and do not apply to the rest of the cruises. By this metric, the DIC and TA uncertainties (0.1% and 0.2%, respectively) are about the same as those previously reported by the GLODAPv2 group (Figure 7, Table 7). Some variables, such as nitrite and ammonium, show uncertainties as large as ~70% by this metric, primarily because of their low average values at depth. The average CTDTEMP uncertainty of 0.06 °C is considerably larger than the 0.01 °C previously reported for GLODAPv2 (Olsen et al., 2020). The measurement uncertainties could be overestimated, because this analysis includes natural gradients arising from the large radius and depth differences, as well as any temporal changes within the 1- to 10-year (average 6-year) period. For aragonite and calcite saturation states, uncertainty comes primarily from the use of an empirical equation to approximate the real-world apparent solubility product (Ksp'). Although Table 7 shows 3%, the real uncertainty of aragonite and calcite saturation states is likely >5% (Mucci, 1983; Jiang et al., 2015a; Orr et al., 2018).
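The deep-water pairing used for the uncertainty estimate can be sketched as follows. This Python sketch pairs samples deeper than 1500 m from different cruises that lie within 10 km horizontally (great-circle distance on a spherical Earth) and 200 m vertically, and reports each pair's absolute difference relative to the pair mean, in percent. Restricting pairs to different cruises and using this relative difference as the metric are assumptions about the exact method; the names are hypothetical.

```python
import math
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def deep_pair_differences(samples, max_km=10.0, max_dz=200.0, min_depth=1500.0):
    """samples: (cruise, lat, lon, depth_m, value) tuples.

    Returns relative differences (%) for cross-cruise pairs of deep samples.
    """
    deep = [s for s in samples if s[3] > min_depth]
    diffs = []
    for (c1, la1, lo1, z1, v1), (c2, la2, lo2, z2, v2) in combinations(deep, 2):
        if c1 == c2 or abs(z1 - z2) > max_dz:
            continue
        if haversine_km(la1, lo1, la2, lo2) > max_km:
            continue
        pair_mean = (v1 + v2) / 2.0
        if pair_mean == 0:
            continue  # relative difference undefined
        diffs.append(100.0 * abs(v1 - v2) / abs(pair_mean))
    return diffs
```

Because the metric divides by the pair mean, variables with low concentrations at depth, such as nitrite and ammonium, produce large relative differences, consistent with the ~70% figures noted above.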
Best practices for oceanic carbonate system calculations have been recommending the dissociation constants of Lueker et al. (2000) (Dickson et al., than expected with respect to calcium carbonate mineral (CaCO3) saturation states (Sulpis et al., 2020). This applies to many stations on the Alaska coast. In brackish water (salinity < 20), the relative uncertainty in carbonate ion concentration is larger than in open-ocean water (Dickson et al., 2007; Orr et al., 2018). In addition, because CO2SYS derives calcium concentration from salinity (Riley and Tongudai, 1967; Millero, 1995) rather than from direct measurements, the calculated saturation states could carry uncertainties of up to 12% in certain very low-salinity regions (Beckwith et al., 2019; Dillon et al., 2020).
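The saturation state is Ω = [Ca²⁺][CO₃²⁻] / Ksp'. Because Ω is linear in calcium, any relative error in a salinity-derived calcium estimate passes one-to-one into Ω. The sketch below illustrates this; the calcium-to-salinity relationship is the Riley and Tongudai (1967) ratio as we understand it to be coded in CO2SYS (treat the constants as an assumption to check against the CO2SYS source), Ksp' is left as a caller-supplied input rather than computed, and the function names are hypothetical.

```python
def calcium_from_salinity(salinity):
    """Total calcium (mol/kg-SW) estimated from salinity, assuming Ca is
    conservative: Ca = 0.02128 / 40.087 * (S / 1.80655)."""
    return 0.02128 / 40.087 * (salinity / 1.80655)


def saturation_state(carbonate, salinity, ksp, calcium=None):
    """Omega = [Ca2+][CO3 2-] / Ksp'.

    carbonate: carbonate ion concentration (mol/kg-SW)
    ksp: stoichiometric solubility product (mol^2/kg^2), supplied by the
         caller (e.g. from the Mucci, 1983, formulation)
    calcium: measured calcium (mol/kg-SW); if None, it is estimated from salinity
    """
    ca = calcium if calcium is not None else calcium_from_salinity(salinity)
    return ca * carbonate / ksp


if __name__ == "__main__":
    # Placeholder numbers for illustration only: a low-salinity sample whose
    # measured calcium is 12% above the salinity-based estimate.
    s, co3, ksp = 10.0, 40e-6, 1.0e-6
    ca_est = calcium_from_salinity(s)
    omega_est = saturation_state(co3, s, ksp)
    omega_meas = saturation_state(co3, s, ksp, calcium=1.12 * ca_est)
    print(f"relative change in Omega: {omega_meas / omega_est - 1:.1%}")
```

The placeholder Ksp' and carbonate values cancel in the ratio, so the printed change equals the assumed calcium bias regardless of their magnitude.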

Figure 8. Comparison plots of dissolved oxygen measured with sensors mounted on the CTD (CTDOXY) and dissolved oxygen measured by Winkler titration.

Note that the above uncertainty analyses are based on deep-water stations only, and these data are usually collected on cruises with a Cruise_flag of A or B (Table 3). The uncertainties of data points from cruises with Cruise_flags of C and D are expected to be much larger. Internal consistency checks of measured versus calculated values, and validation checks of values measured using different methods, show that differences increase quickly towards the surface (Figures 8–11). Some surface conditions in the coastal ocean. We contend that these Winkler and CTD values are likely "good" data from the measurement point of view, so for such instances the QC flags are kept as "2", despite their poor internal consistency.
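One way to quantify how such differences grow towards the surface is to bin paired values by depth and take the median absolute difference in each bin, for example Winkler oxygen against CTDOXY, or measured against calculated pH. This is a hypothetical diagnostic sketch with illustrative bin edges; consistent with the text, it is not a flagging rule, since the QC flags of such values were kept at "2".

```python
from collections import defaultdict
from statistics import median


def median_abs_diff_by_depth(depths, measured, reference, edges):
    """Median |measured - reference| within each depth bin [edges[i], edges[i+1]).

    depths, measured, reference: equal-length sequences, one entry per sample
    edges: ascending bin boundaries in metres
    Returns {(lower, upper): median absolute difference} for non-empty bins.
    """
    bins = defaultdict(list)
    for z, m, r in zip(depths, measured, reference):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= z < hi:
                bins[(lo, hi)].append(abs(m - r))
                break  # each sample belongs to one bin only
    return {k: median(v) for k, v in sorted(bins.items())}
```

Medians are used rather than means so that a few extreme near-surface pairs do not dominate a bin.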

The Coastal Ocean Data Analysis Product for North America (CODAP-NA) is available as a merged data product in Excel, CSV, MATLAB, and NetCDF formats [doi:10.25921/531n-c230, NCEI Accession: 0219960] and can be accessed at https://www.ncei.noaa.gov/data/oceans/ncei/ocads/metadata/0219960.html (Jiang et al., 2021). An Excel spreadsheet listing all QC-related changes is also included in the data package. With the data providers' consent, the original cruise data files have also been updated; they are summarized in a table at https://www.ncei.noaa.gov/access/ocean-acidification-data-stewardship-oads/synthesis/NAcruises.html.
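For users of the CSV version, a typical first step is to keep only values flagged "2" (good) from cruises with a Cruise_flag of A or B. The sketch below uses only the Python standard library and assumes hypothetical column names (`Cruise_flag`, a variable column such as `DIC`, and its flag column `DIC_flag`); the actual headers, units rows, and missing-value conventions should be taken from the product documentation.

```python
import csv
import io


def select_good(rows, variable, flag_column=None, cruise_flags=("A", "B")):
    """Return values of `variable` whose QC flag is 2 ("good") and whose
    cruise carries one of `cruise_flags`.

    rows: an iterable of dicts, e.g. from csv.DictReader
    flag_column: defaults to the hypothetical name f"{variable}_flag"
    """
    flag_column = flag_column or f"{variable}_flag"
    values = []
    for row in rows:
        if row.get("Cruise_flag") not in cruise_flags:
            continue
        try:
            flag = int(float(row[flag_column]))
            value = float(row[variable])
        except (KeyError, ValueError):
            continue  # missing column or non-numeric entry
        if flag == 2:
            values.append(value)
    return values


if __name__ == "__main__":
    # Tiny in-memory stand-in for the product CSV (made-up rows, not real data).
    sample = "Cruise_flag,DIC,DIC_flag\nA,2100.5,2\nB,2050.0,3\nC,2000.0,2\nA,1990.0,2\n"
    print(select_good(csv.DictReader(io.StringIO(sample)), "DIC"))
```

With the downloaded file, the same function would be called on `csv.DictReader(f)` for a file object `f` opened with `newline=""`; the file name is left to the user.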

In this study, we quality controlled (QC) and synthesized two decades of discrete measurements of inorganic carbon system parameters, oxygen, and nutrient chemistry data from North America's coastal oceans, relying on consistency checks performed in direct collaboration with the data providers who originally collected and measured the samples. The generated data product is called

as an additional measurable parameter of the seawater CO2 system (Byrne and Yao, 2008;Sharp and Byrne, 2019).
Uncertainty analyses suggest that cross-over adjustments could be applied in future coastal data QC. We recommend that all major future coastal cruises collect deep-water samples (>1500 m) when feasible, ideally at agreed-upon reference stations for QC purposes.
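A cross-over comparison at a shared reference station could, in its simplest form, interpolate two cruises' deep profiles onto common depths, take the mean ratio, and apply a multiplicative correction only when the offset exceeds a chosen threshold. The sketch below is a simplified, hypothetical illustration of that idea; the function names and the 0.2% default threshold are assumptions, and the cross-over procedures used for open-ocean products are more elaborate.

```python
from bisect import bisect_left
from statistics import mean


def interp(z, depths, values):
    """Linearly interpolate a profile (depths strictly ascending) at depth z.
    Returns None outside the sampled depth range."""
    if z < depths[0] or z > depths[-1]:
        return None
    i = bisect_left(depths, z)
    if depths[i] == z:
        return values[i]
    z0, z1 = depths[i - 1], depths[i]
    return values[i - 1] + (values[i] - values[i - 1]) * (z - z0) / (z1 - z0)


def crossover_factor(cruise, reference, grid, threshold=0.002):
    """Multiplicative adjustment bringing `cruise` onto `reference`.

    cruise, reference: (depths, values) tuples of deep samples taken at the
                       same reference station, depths ascending
    grid: comparison depths (m)
    threshold: hypothetical minimum relative offset worth correcting
    """
    ratios = []
    for z in grid:
        a, b = interp(z, *cruise), interp(z, *reference)
        if a is not None and b is not None:
            ratios.append(a / b)
    if not ratios:
        raise ValueError("profiles do not overlap on the comparison grid")
    offset = mean(ratios) - 1.0
    return 1.0 if abs(offset) < threshold else 1.0 / (1.0 + offset)
```

Restricting the comparison to deep samples is what makes the approach meaningful: below about 1500 m, natural variability is small enough that a persistent offset more likely reflects measurement differences than real change.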