the Creative Commons Attribution 4.0 License.
A high-resolution synthesis dataset for multistressor analyses along the US West Coast
Meghan Zulian
Sara L. Hamilton
Tessa M. Hill
Manuel Delgado
Carina R. Fish
Brian Gaylord
Kristy J. Kroeker
Hannah M. Palmer
Aurora M. Ricart
Eric Sanford
Ana K. Spalding
Melissa Ward
Guadalupe Carrasco
Meredith Elliott
Genece V. Grisby
Evan Harris
Jaime Jahncke
Catherine N. Rocheleau
Sebastian Westerink
Maddie I. Wilmot
Download
- Final revised paper (published on 11 Jan 2024)
- Supplement to the final revised paper
- Preprint (discussion started on 31 May 2023)
Interactive discussion
Status: closed
RC1: 'Comment on essd-2023-205', Anonymous Referee #1, 13 Jul 2023
This manuscript is a well-written description of a data compilation/synthesis effort for the California Current System (CCS). The main value added over existing data products is: 1) the inclusion of nearshore data sets that are missing in many of the larger-scale data compilation/QC products, and 2) the inclusion of data sets that explicitly address CCS temperature and O2 data along with the carbonate chemistry data that are the focus of the larger-scale data products I’ll discuss further below. I think these two things make this data compilation a valuable contribution that will advance the state of coastal multi-stressor work in the CCS after the paper’s weaker points are adequately addressed. However, I am a bit challenged by several aspects of the paper and data set in their current form.
Major concerns:
For starters, modelers and other scientists doing ocean acidification-related coastal analyses would likely still be best advised to use the existing Surface Ocean CO2 Atlas (SOCAT.info, see Dorothee Bakker et al. 2016 ref in ESSD) and Coastal Ocean Data Analysis Product for North America (CODAP-NA, see Li-Qing Jiang et al. 2021 ref in ESSD), which have compiled all global surface ocean CO2 data and coastal carbonate system data for North America, respectively. In my view, both of these projects provided a more rigorous secondary QC of the data, or at least a more detailed description of the secondary QC processes, while also providing the other benefits of the MOCHA data effort (consistent formatting and data treatment, etc.). This makes me wonder who the envisioned target end user group is for the MOCHA data products beyond scientists. To be clear, I can certainly see broad utility for the data, as amassed here, for the added nearshore data and T and O2 data. But if the end users are not scientists who are comfortable with programming, I do think it might be useful to provide some additional data products that would be easy to create and archive at NCEI and much more accessible to less technically savvy end users. I say this because I was unable to open the nearly 3 GB data file in Excel on my computer. I was able to open it in R, but it was still very slow and cumbersome to use there. SOCAT and CODAP-NA provide a useful model for one way I’d imagine you might subset this data product: 1) one product including observations that reflect the surface conditions (where I would likely use 10-25 m as the depth cutoff, rather than 50 m as they used for their TA-S analysis), and 2) another smaller subset of observations that were depth-resolved and included a broader range of parameters (nutrients, chl, etc.).
This would eliminate the need for millions of empty/“NA” entries, which I presume slow down operations with the data set in R (and make it inoperable in Excel [in my experience]). Further, the authors provided this massive data compilation, and the code they used to create the greatly reduced data summary, around which much of the discussion was written. I would suggest also directly providing this summary data product (along with the surface and full water column data subsets) via the NCEI webpage where the original data set is logged. I do also think that the authors need to discuss this data product in relation to the SOCAT and CODAP-NA data products, how they are related, and the pros/cons of each. This comment got long, so to recap, what I’d like to see addressed here are: 1) providing alternate data products to facilitate accessibility for various end users, including possibly splitting out surface and depth-resolved data product subsets and providing the data summary directly; 2) putting this data product into the context of existing high-quality data products like SOCAT and CODAP-NA.
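As a rough illustration of the subsetting idea above, the sketch below splits rows at a depth cutoff and then drops any column that is empty throughout a subset, which is what would remove most of the “NA” padding. It is only a sketch under stated assumptions: the column names (`depth_m`, `t_C`, `do_umol_kg`, `nitrate`) are hypothetical and not the MOCHA headers, the 25 m cutoff is simply the upper end of the range suggested above, and splitting purely by depth is a simplification of the two products described.

```python
# Hypothetical column names; the real MOCHA headers may differ.
SURFACE_MAX_DEPTH_M = 25.0  # upper end of the 10-25 m cutoff suggested in the review


def is_missing(value):
    """Treat None, empty strings, and the literal 'NA' as missing."""
    return value is None or str(value).strip() in ("", "NA")


def drop_empty_columns(rows):
    """Remove columns that are missing in every row of a subset."""
    if not rows:
        return []
    keep = [col for col in rows[0] if any(not is_missing(r[col]) for r in rows)]
    return [{col: r[col] for col in keep} for r in rows]


def split_by_depth(rows, depth_col="depth_m", cutoff=SURFACE_MAX_DEPTH_M):
    """Split rows into a surface subset and a deeper subset.

    Rows without a usable depth are left out of both subsets.
    """
    surface, deeper = [], []
    for row in rows:
        if is_missing(row.get(depth_col)):
            continue
        target = surface if float(row[depth_col]) <= cutoff else deeper
        target.append(row)
    return drop_empty_columns(surface), drop_empty_columns(deeper)


# Made-up example rows, for illustration only.
rows = [
    {"depth_m": "2", "t_C": "12.1", "do_umol_kg": "250", "nitrate": "NA"},
    {"depth_m": "10", "t_C": "11.8", "do_umol_kg": "NA", "nitrate": "NA"},
    {"depth_m": "80", "t_C": "9.0", "do_umol_kg": "120", "nitrate": "22.5"},
]
surface, deeper = split_by_depth(rows)
```

In this toy input the `nitrate` column disappears from the surface subset because no surface row carries a value for it, while the deeper subset keeps it.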
Second, as a person whose livelihood comes from producing data sets such as those included here, it was disappointing to not see reference to data providers and original data set DOIs (for those that have them) in Table 1 in the main paper. Yes, this information is in the very similar metadata file at NCEI, but in my experience, no one reads the metadata, so the major amount of work data providers do is not going to be appreciated/cited/acknowledged. It is important for the on-going funding and ability to do observations that data producers are able to find and report papers that rely on their data for subsequent publications. On a related note, I completely agree with the authors that there is a significant need for not just continued observational coverage, but expanded observational coverage, particularly for the carbonate system, in a future world with accelerating rates of change, marine carbon dioxide removal, etc., etc. To that end, it is critical that data creators get fair acknowledgement of their products. Consequently, it would be ideal to see all of the data sets appear as citations in the main article of this paper. I also encourage the authors to consider including a “fair data use statement” in their data availability section regarding the data product.
They can see the SOCAT statement here: https://socat.info/wp-content/uploads/2023/06/2023_SOCATv2023_Data_Use_Statement.pdf
And the GLODAP statement here (GLODAP is the open ocean data product that CODAP-NA was modeled after):
https://glodap.info/index.php/fair-data-use-statement/
To recap, 1) it would be nice to see better inclusion of main data provider information and DOIs in Table 1, along with citations for all data sets in the main manuscript if possible; 2) I encourage the authors to consider adding a “fair data use statement” that would encourage end users to cite both the MOCHA paper/data set AND authors of any major subset of the data used for follow on publications and information products, as appropriate, to help support the long-term stability of observational programs.
The TA-S analysis was not fully described or discussed. It was unclear why they would have used the upper 50 m rather than a smaller part of the upper water column, as other authors have done. Maybe they determined this experimentally, but how they arrived at this decision should at least be described. However, if one is expecting to discern the influence of freshwater, this should likely be a shallower depth range. Further, there were some really strange results (e.g., offshore of the SF Bay mouth) that were not discussed adequately. Also, these results were not placed in the context of other publications by Andrea Fassbender (and references therein) or Kitack Lee. Along these lines, and because of the importance of salinity data to the carbonate system (as reflected by the TA-S work discussed just above), I will note that mention of salinity felt a bit inconsistent throughout. For example, in one place it was noted that DO and pH data were not included if they did not have accompanying temperature data, but it left me wondering: what about salinity data (lines 135-136)? Also, on line 298, T and DO are mentioned as having the widest coverage; presumably also S? I ask because low S events associated with flooding may also be associated with coastal multistressor events (e.g., potential importance in kicking off HAB events).
Finally, the “detailed metadata” file at NCEI referred to in the text is, I think, the one actually called “SubmissionForm_carbon_v1_428.xlsx” and is NOT the one called “MOCHA_dataset_metadata_table.csv”. This is confusing and should be clarified by either adding the name of the actual file intended to be referenced to the main text (probably in parentheses) or by asking NCEI to rename the file to “detailed metadata…” (or whatever the final name used in the manuscript is).
Less major concerns:
--On a positive note: I do like the simple data QC flagging routine they used. If a major portion of end users are non-technical, this will greatly facilitate the uptake and correct use of this data product. That said, another benefit of directly providing the data summary product, beyond its vastly smaller file size is that it only includes the “reliable” data. So non-technical users should definitely be steered toward that sub-product.
--It would be nice to use the recommended/best practices column headers recommended by Jiang et al. 2022.
--Along similar lines, Jiang et al. 2022 recommend using different carbonate system coefficients and would be worth a look for future use. I do not believe there would be a noticeable difference in your results, so am not necessarily suggesting you re-do anything here, because you don’t submit or show the calculated parameters.
--Jiang et al 2022 also point out that units of µmol/kg refer to “substance content” rather than “concentrations,” which are in µmol/L units. This should be corrected in the “Submission form_carbon_v1_428.xlsx” at NCEI and in the text as well.
--In Table 1, is #68 a gridded data set? I got that impression, and if so, I’d argue it’s not appropriate to include here. The language should be clarified around this.
--It would be useful to state more decisively in the early text that the data were limited to within US border. It’s alluded to a few times, but because I happen to know that some of the data sets span the Canadian and/or Mexican border, as does the CCS, I didn’t initially catch it. Easy enough to justify.
--I don’t believe they mentioned which pH scale they used in the text, although it is in the “Submission form” file. Please add to the text, and for any original files that used a different pH scale, whether/how they converted to the same scale.
--In Table 1, ship names should be italicized and 2s in CO2 or O2 should be subscripted.
--Finally, as noted previously, I completely agree with the authors about the importance of the coastal multi-stressor observations, and particularly carbonate system observations, needing to be sustained or expanded rather than contracted, but there was an incorrect statement in the conclusions section regarding the NOAA West Coast Ocean Acidification (WCOA) cruises. Unfortunately it’s also mislabeled on the NCEI WCOA web page here: https://www.ncei.noaa.gov/access/ocean-carbon-acidification-data-system/oceans/Coastal/WCOA.html
Specifically the 2017 cruise was not a WCOA cruise. Rather it was a collaborative effort led by NOAA HABs scientists and an added cruise-of-opportunity for OA sampling. I think they also sampled OR on that cruise, but I haven’t looked at the data for a long time so the authors should double check this (the title said PNW, so I assume Oregon was included). However, NOAA did have another full US West Coast OA cruise in 2021. It was delayed from 2020 due to COVID. Thus, please edit that sentence to not state that WCOA cruises have contracted.
Minor concerns:
--The DOI in the abstract doesn’t go to the data set.
--Figure 1—why are the a and b panels smaller than c? It lends some confusion when all could be the same size and fit nicely across the page.
--I don’t think “carbonate system” needs to be hyphenated. I am familiar with how this works with adjectives vs nouns, but it is not used consistently throughout the manuscript in any case. Also, there was at least one place where one might hyphenate dissolved oxygen where carbonate system was hyphenated (e.g., line 119).
--L. 123—specifies data collected *before* 2020 but there are at least two places in the dataset metadata table that say either 2020 or 2021.
--dataset ID 2 in Table 1 and in the Excel metadata table—There’s a space before the text that makes a gap appear in the excel file.
--dataset 5 in Excel file—Greeley is misspelled
--Table 1 dataset 25—should be to 2020 not “present”
--dataset 41—didn’t this also include Oregon? (It says Pacific NW)
--dataset 52—La Push is two words
--dataset 68—again, the words “gridded” and “monthly climatologies” make me think this data set may not be right for inclusion in MOCHA
--Lines 224-226—It might be useful to differentiate between the # of samples dropped as questionable data vs. those dropped due to daily averaging, because this sentence gives the impression that there were more 3s than there were.
--line 232—I’m not sure that “high-quality” was defined anywhere. Uncertainties definitely were not adequately spelled out across the data sets, and I’m almost certain the uncertainties would have varied across the 71 distinct data sets used. This information doesn’t seem to be in either the submission form or the metadata file on NCEI.
--I encountered some confusion between “handheld” sensor measurements vs. those collected “by hand” (hand collected—line 155)—maybe making the latter not use “hand” would prevent others’ confusion when thinking back to what earlier categories of observations and instruments were.
--L. 162—TA is not “extrapolated” from S measurements—please reword
--Throughout—the word “data” always gets a “plural” verb tense
--Table 2: missing value in reliability column for calculated pH
--Lines 296-297—Please indicate on the figure where Pt Arena, CA, and central OR are for readers’ convenience. It could just be asterisks along the axes or similar.
--Line 312—Should say July through September (it’s correct in the figure caption, but the caption doesn’t include May, which it should).
--I liked the discussion of the co-occurrence of stressful DO and pH conditions—I have been looking at similar occurrence statistics myself. And I agree with the conclusion about this pointing to a need for expanded CO2 system observations. It may be useful in this discussion to give DO results in alternate units also (mg/L and mL/L) for our colleagues and end users who use different units.
--Figures 5 and 6 (and elsewhere)—again, should be DO content rather than concentration
--Line 382 and Table 3 caption—the p values do not agree.
--Table 3—again, the offshore relationship with the r squared of 0 seems to require further explanation than given. Specifically, while I would buy that the effect of urban runoff could be strong outside SF Bay, none of the #s in the offshore box make any sense—they are all SO different from all other boxes, including the nearshore SF Bay one, that it makes me wonder if there was an error in the analysis or a typo.
--Lines 418-421—Really seems like the authors are not aware of the wealth of surface CO2 data in SOCAT. This is one of the places where SOCAT might be drawn into the discussion.
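For the suggestion above to also report DO in mg/L and mL/L, a minimal conversion sketch follows. It assumes the DO values are stored as µmol/kg and uses a single placeholder seawater density of 1.025 kg/L, which is an assumption for illustration. In practice density would be computed from temperature, salinity, and pressure. The function names are hypothetical.

```python
O2_MOLAR_MASS_G_PER_MOL = 31.998   # molar mass of O2
UMOL_PER_ML_O2 = 44.661            # commonly used factor (molar volume of about 22.39 L/mol)
ASSUMED_DENSITY_KG_PER_L = 1.025   # placeholder; real density depends on T, S, and pressure


def do_umol_kg_to_mg_l(do_umol_kg, density_kg_per_l=ASSUMED_DENSITY_KG_PER_L):
    """Convert dissolved oxygen from µmol/kg to mg/L."""
    umol_per_l = do_umol_kg * density_kg_per_l
    # µmol -> mol is 1e-6 and g -> mg is 1e3, hence the net division by 1000.
    return umol_per_l * O2_MOLAR_MASS_G_PER_MOL / 1000.0


def do_umol_kg_to_ml_l(do_umol_kg, density_kg_per_l=ASSUMED_DENSITY_KG_PER_L):
    """Convert dissolved oxygen from µmol/kg to mL/L."""
    umol_per_l = do_umol_kg * density_kg_per_l
    return umol_per_l / UMOL_PER_ML_O2


# Example: express one made-up DO value in both alternate units.
example_umol_kg = 100.0
example_mg_l = do_umol_kg_to_mg_l(example_umol_kg)
example_ml_l = do_umol_kg_to_ml_l(example_umol_kg)
```

Keeping the density as an explicit argument makes the distinction between content (per kg) and concentration (per L) visible in the code rather than hiding it in a fixed factor.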
Citation: https://doi.org/10.5194/essd-2023-205-RC1
CC1: 'Reply on RC1', Esther Kennedy, 13 Jul 2023
Dear reviewer,
Thank you very much for your thorough and insightful review! I am leaving for remote field work tomorrow, but I look forward to engaging substantively with your suggestions when I return on July 24th. Please excuse the delay,
- Esther Kennedy
Citation: https://doi.org/10.5194/essd-2023-205-CC1
RC2: 'Reply on CC1', Anonymous Referee #1, 13 Jul 2023
Terrific. I look forward to seeing your revisions. As I said, I think the paper will be a very nice contribution once the issues I raised have been addressed. I wanted to add that I did not see in the paper any mention of how you handled the discrete pH data in terms of the measurement vs. in situ temperature. Typically they are measured at 25 C and reported at that temperature. If you converted the temp to in situ for those in the MOCHA data product (which would in itself be a service to many end users), please add to the text how you did this. If these conversions were not made, it would be critical to make this clear in the text. In the "Submission form" file, it does say "In-situ pH on the total scale" so I am assuming the temp conversion was made. Thanks... Enjoy the field work! Hope it goes well.
Citation: https://doi.org/10.5194/essd-2023-205-RC2
- AC1: 'Reply on RC1', Esther Kennedy, 08 Sep 2023
RC3: 'Comment on essd-2023-205', Anonymous Referee #2, 18 Jul 2023
This paper documents the development of a large dataset of observations of dissolved oxygen, pH (and other carbonate chemistry parameters), and temperature in addition to a few other low priority ad hoc variables (e.g., nutrients). The dataset will be very useful to the broader scientific community and the paper is generally well-written. I inspected the data posted to the public repository and it is in excellent shape. I think the paper is ready to be accepted after some minor comments listed below.
Major comments
The only thing approaching a major comment is that I got confused about the number of observations in the data. At one point in the methods, it sounds like the aggregation of the data into daily averages reduced the dataset from 12.7 million rows to 1.2 million rows, but then in the results it sounds like there are 12.7 million rows and the aggregation wasn’t done. Please be very careful about this and report accurately how many observations are in the final (data available to user) dataset and propagate throughout.
The metadata table (MOCHA_dataset_metadata_table.csv) is easy to understand and seems largely complete though dataset 2 does not have a name. The other fields missing data make sense.
I examined a subset of the data (“47_to_49_pre_2015.csv”) and it is in great shape. The column names are all super intuitive and the values in the columns are all correctly formatted. All of the data that you would expect to be complete is complete. Wonderful.
Minor note for future submissions: please use continuous line numbering (not 5-line intervals). Do everything you can to make the reviewers’ job easy – this will keep them happy!
Minor comments
Abstract
24 – Stressful or favorable, plus what’s stressful for one organism might not be stressful for another
31 – could you work the focus on hypoxia and ocean acidification risk a little earlier in abstract?
32 – stats on the time span of observations should get mentioned
Introduction
43 – could shorten “effluent from coastal settlements and agriculture” to “coastal runoff”
43 – it’s not clear to me the mechanism for “diverse and highly productive ecological communities” to drive local deviations from global patterns
52 – “e.g.” is missing a comma after it (like in line 40); ensure comma is added throughout
61 – Free et al. (2023) (https://doi.org/10.1111/faf.12753) provides an update to Cavole et al. 2016 paper and is explicitly about this region
71 – Can you make it clear that conditions have gotten shallower without using the word “shoaled”? It might not be familiar to everyone.
89 – What regions do they apply to?
97 - “…for the CCS and is newly archived and available at…”
102 – Hoping to see that the unincorporated sources of info get mentioned later
107 – no need to capitalize MPAs
108 – The stats on number of observations, sources, and time span should get mentioned in last paragraph of intro
Methods
119 – how was the literature search conducted?
138 – suggest adding (1), (2), (3), and (4) here to orient reader
153 - suggest adding (1), (2), (3), and (4) here to orient reader
158 – this should at least be a supplemental table in this paper; it’s annoying to have to go look elsewhere for info on the dataset documented in this paper
191 – What does “as normal” mean here?
199 – suggest adding (1), (2), (3) here to orient reader
205 – Can you give examples in the supplement? Reads as vague now
210 – Examples drawing from this would be useful
224 – “i.e.” should be followed by comma – correct throughout
Results and Discussion
244 – Again, time range would be helpful.
244 – Isn’t 12.7 million incorrect? Didn’t you reduce down to 1.2 million by aggregating to daily level as stated in Line 226. I’m skeptical of all the sample sizes reported here b/c of this.
273 – “malfunction, 2)”
273 – I think either means between two options
Conclusion
474 – No need to capitalize MPAs
Tables and Figures
Figure 1. The figure would be more useful if it showed the density of points along a raster grid (potentially hexagonal) so that the reader understands data density spatially. The panels should all be the same size, 1 row, 3 columns would be an improvement. The density could be the number of points within a cell or the number of unique year-months in a cell. I leave it to the authors.
Figure 2. Y-axis is a proportion, not a percentage. Align the word choice with what is shown. Spell out acronyms in caption.
Figure 3. It would be nice if the panels were labeled with the parameter so the reader doesn’t even have to read the caption. The width of the latitude should be stated. Eyeballing the figure, data look to be most common between 2015-2020 and not 2010-2015 as the authors state. Bar plots of annual totals would be a good way to examine the temporal bias alone.
Figure 6. The caption is confusing about what the points are. Are these all observations within 50 km of shore in the top 50 m? State what it shows. Currently, it’s written like a results section. Define the lines but exclude all of this results interpretation.
Figure 7 caption also includes lots of results interpretation.
Figure 4. Y-axis should read “Percent of observations.”
Table 1. Define acronyms in parameters column in caption. Consider making this a supplemental figure given its size.
Table 2. Add comma to 3rd column. Eliminate 2nd decimal spot in fourth column. Spell out parameter acronyms in caption.
Table 3. This would be more compelling as a multi-panel figure of scatter plots with regression fits. Caption is mostly results interpretation.
Citation: https://doi.org/10.5194/essd-2023-205-RC3
- AC2: 'Reply on RC3', Esther Kennedy, 08 Sep 2023
RC4: 'Comment on essd-2023-205', Anonymous Referee #3, 27 Jul 2023
Kennedy et al. collate, quality control, and synthesize temperature, salinity, and biogeochemical data in the nearshore region of the U.S. portion of the California Current Ecosystem. This data product does show promise for addressing temporal and spatial variability and multistressor dynamics within this region; however, the associated manuscript does not provide enough information for a potential data user to fully understand the appropriate applications for the data product or how the data are manipulated. It also does not provide fair credit for the contributions of the original data providers and funders.
Major comments:
The authors claim the science applications of this data product are broad, including characterizing seasonal variability and spatial variability along the U.S. West Coast. However, the results illustrating variability only focus on the portion of the data sets within 50 km of the coast and < 25 m depth. Either the results need to be expanded to include analysis of the entire data product, or the data product should be restricted to the shallow, nearshore environment and the title and introduction should reflect that the product is focused on the nearshore. Given the data set itself is not interoperable or compatible with other products that include offshore biogeochemical data (e.g. gridded NetCDF files of the Surface Ocean CO2 Atlas or Biogeochemical Argo), it would be difficult for a user to combine and utilize them for assessment of biogeochemistry spanning offshore to nearshore.
Given upwelling- and respiration-driven low pH and low oxygen conditions manifest first in bottom waters, the way these conditions are explained in section 3.5 as within 50 m of the surface is confusing. It would be more intuitive to assess these conditions in the entire nearshore water column based on a bathymetric definition of nearshore, rather than defining nearshore as 50 km from the coast. "Surface" and "near-surface" are used interchangeably, both defined in different parts of the manuscript as < 25 m. "Nearshore", "surface", and "near-surface" should all be defined early on in the results and used consistently throughout.
The description (and potentially the application) of the secondary data quality control is inadequate. First, the original non-QC’d OOI data sets (section 2.3) need to be QC’d using recommended best practices specifically developed for OOI biogeochemical data sets (doi.org/10.25607/OBP-1865). Second, the description of the QC for the remaining data sets (section 2.4) sounds qualitative as written, as if the QC’er simply looked at property-property plots and flagged data points that looked bad. What the authors consider an “outlier” needs to be defined. Were outliers identified as a certain number of standard deviations of the linear (or some non-linear) relationship between parameters? What were the criteria for identifying “suspicious observations” (line 214)? Data QC routines need to be well documented and applied consistently throughout the data product using statistical analyses and thresholds to characterize quality.
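To make concrete what a statistically defined outlier criterion of the kind asked about above could look like, here is a minimal sketch. It fits an ordinary least-squares line to a property-property pair and flags points whose residual exceeds `k` residual standard deviations. This is an illustration only and not the authors' QC routine; the threshold `k = 3` and the function names are assumptions, and the example values are synthetic.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def flag_regression_outliers(x, y, k=3.0):
    """Return one boolean per point: True if |residual| > k * residual SD."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    dof = len(x) - 2  # two fitted parameters
    resid_sd = math.sqrt(sum(r * r for r in residuals) / dof)
    if resid_sd == 0.0:
        return [False] * len(x)
    return [abs(r) > k * resid_sd for r in residuals]


# Synthetic example: points on a line, with one value shifted far off it.
x = list(range(21))
y = [2.0 * xi + 1.0 for xi in x]
y[10] += 50.0
flags = flag_regression_outliers(x, y)
flagged_indices = [i for i, is_outlier in enumerate(flags) if is_outlier]
```

A single pass like this is sensitive to the outliers themselves, since they inflate the residual standard deviation; a robust scale estimate or an iterative refit would be a reasonable alternative. Whatever the choice, stating it as an explicit threshold makes the QC reproducible.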
It is also a best practice to state the constants used in carbonate chemistry calculations. In addition to Dickson et al. 2007, the authors should refer to more recent best practices for the use of constants in a broader range of temperature and salinity: doi.org/10.1016/j.marchem.2018.10.006; doi.org/10.1016/j.gca.2021.02.008; doi.org/10.1016/j.marchem.2014.07.004.
Lastly, the data products I am most familiar with all have a substantial acknowledgements section including funders of the observations, a long list of citations, and many coauthors because they include the major data providers in the data product development. At minimum, Kennedy et al. should include all the data citations in the list of references. That requires referring to the metadata for each of the original data sets and including a data citation in the references if the data provider requests one be cited. I see citations provided in a table within NCEI Accession 0277984, but that is not trackable by the data providers. Those data citations are critical metrics that funders use to make decisions about what observational programs to support.
Minor comments:
Line 44: Given this data product excludes seawater pH values derived using glass electrodes (for good reasons) they should consider referencing here the many other papers discussing coastal biogeochemical variability and change and not papers that utilize glass electrode data for estuarine and coastal pH monitoring. Many of the providers of the original data sets have published papers on this topic that could be cited instead.
Line 133: From my review of the original data sets, the product likely includes sensors using a membrane-based spectrophotometric method (the SAMI-CO2 as cited) and an equilibration-based method paired with an infrared gas analyzer (the MAPCO2). The phrase “autonomous equilibrium-based spectrophotometric pCO2 sensors” is a mix of the two.
Line 138: What is meant by “devices”? Are these sensors integrated into a CTD-rosette equipment package?
Table 1: Entry 52 title references the Cha Ba buoy, but the product includes only the cruise data for validating the buoy and not the buoy data?
Lines 198-200: If the data product is going to propose and use a new set of flags, this section should explain why the authors chose to deviate from community-developed and widely-used standardized flagging schemes.
Section 2.5: Since nearshore biogeochemistry is heavily influenced by sub-daily processes, how do the authors account for potential bias in daily means when data are missing or flagged bad for a portion of the day?
Top of page 21: First continued table entry looks incomplete.
Line 329: Figure 5 illustrates the ability of the data product to capture a monthly climatology. Seasonal variability could be interpreted as capturing all seasons over the entire time range of the data sets.
Lines 335-337: Could differences in data density between those time periods be impacting this result?
Table 3 and associated discussion: I was surprised to not see a comparison to, or at minimum a mention of, previously-published TA-S relationships for these regions.
Line 419: “Saildrone” is a company. These types of oceanographic platforms are commonly called Uncrewed Surface Vehicles (USVs).
Line 418: This product should be named and cited.
Line 423: The mention of considering deeper water here is confusing because there is no mention of a desire to assess bottom water earlier in the manuscript. The analysis is focused on varying definitions of surface water.
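Regarding the Section 2.5 question above about bias in daily means, one possible way to expose the issue is to report how much of each day the mean actually covers. The sketch below is a hypothetical illustration, not the procedure used for the data product: it computes a daily mean from timestamped values, counts the distinct hours with valid data, and marks days that fall below an arbitrary `min_hours` threshold.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(records, min_hours=18):
    """Summarize (timestamp, value) pairs by calendar day.

    A value of None stands for a missing or flagged observation.
    Returns {date: (mean, hours_covered, well_sampled)}.
    The min_hours threshold is an arbitrary placeholder.
    """
    by_day = defaultdict(list)
    for timestamp, value in records:
        if value is not None:
            by_day[timestamp.date()].append((timestamp.hour, value))

    summary = {}
    for day, observations in sorted(by_day.items()):
        hours_covered = len({hour for hour, _ in observations})
        mean = sum(value for _, value in observations) / len(observations)
        summary[day] = (mean, hours_covered, hours_covered >= min_hours)
    return summary


# Made-up example: a fully sampled day and a day with only three afternoon hours.
records = [(datetime(2020, 1, 1, h), 8.0) for h in range(24)]
records += [(datetime(2020, 1, 2, h), 7.8) for h in (12, 13, 14)]
records.append((datetime(2020, 1, 2, 15), None))  # flagged value, ignored
summary = daily_means(records)
```

Carrying the coverage count alongside each daily mean lets a user decide whether a partially sampled day is acceptable for their analysis instead of having that decision made implicitly.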
Citation: https://doi.org/10.5194/essd-2023-205-RC4
- AC3: 'Reply on RC4', Esther Kennedy, 08 Sep 2023