Reprocessing of eXpendable BathyThermograph (XBT) profiles from the Ligurian and Tyrrhenian seas over the time period 1999–2019 with a full metadata upgrade
Simona Simoncelli
Franco Reseghetti
Claudia Fratianni
Lijing Cheng
Giancarlo Raiteri
Download
- Final revised paper (published on 03 Dec 2024)
- Preprint (discussion started on 03 Jan 2024)
Interactive discussion
Status: closed
RC1: 'Comment on essd-2023-525', Rebecca Cowley, 15 Feb 2024
General comments:
The manuscript describes the re-processing of historical XBT data collected in the Mediterranean Sea from 1999-2019. The reprocessing involved not only the creation of automated quality control procedures, the addition of a test canister 'calibration' offset, and a new linear interpolation method, but also the addition of complete metadata information. The addition of accurate metadata is vital for end users to be able to correct data as new research becomes available.
The presentation of data via the ERDDAP server system makes it accessible to all, although there are some improvements to the data format that need to be made (see comments below). The text itself is well written, but could do with some grammar checks and re-wording to make quite complex sentences simpler. I have given some specific comments on this, but it doesn't cover all of them. Overall, the structure of the manuscript is very good and quite thorough.
The re-processing of such a valuable historical dataset is critical for accurate estimates of ocean heat content and improved accuracies of data assimilation into models. I recommend the publication of this paper after some changes to the text and the data files.
Manuscript specific comments:
A big improvement to the dataset and manuscript would be the inclusion of uncertainty values with the data. Uncertainties are mentioned, but there is no development of or inclusion of the uncertainty data. Some brief discussion of derivation of the uncertainties should also be included.
Line 139-145 discusses measurement accuracy and uncertainties. A comment here about using manufacturer accuracies as an estimate of uncertainties would be beneficial, rather than using both terms interchangeably.
For further corrections to the dataset, did you consider implementing the launch height correction from Bringas and Goni, 2015? You have all the launch heights from the vessels and by adding the offset to the depths, you will get quite a different result in your comparisons. Bringas and Goni's results are robust and it will be a first for these corrections to be implemented in your dataset. Some of the vessels have 25 m launch heights, which equates to a 4 m depth offset, a considerable impact on the thermocline depths. You could make the correction and include it in the data file to allow the user to remove it if required, in the same way as the calibration value has been done.
You also could implement the fall rate corrections from Cheng et al 2014. Although this could be complicated if the launch height corrections are also implemented, I'm not aware of any investigations into how these two corrections would interact with each other.
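To illustrate the kind of additive launch-height correction described above, here is a minimal Python sketch; it assumes the ~4 m offset quoted for a 25 m launch height scales roughly linearly, and all variable names are hypothetical (the actual correction should follow Bringas and Goni, 2015):

```python
import numpy as np

def launch_height_offset(launch_height_m, offset_per_25m=4.0):
    """Approximate depth offset implied by the launch height.

    Assumes the ~4 m offset quoted for a 25 m launch height scales
    linearly; replace with the Bringas and Goni (2015) formulation
    for production use.
    """
    return offset_per_25m * launch_height_m / 25.0

# Hypothetical per-profile application: keep the offset in the file so a
# user can subtract it again, as is done for the calibration value.
depth = np.array([1.5, 3.0, 4.6])           # original depths (m)
offset = launch_height_offset(25.0)          # ~4 m for a 25 m launch height
depth_corrected = depth + offset
```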
Line 171-175: Does this mean that the test canister results were applied to the strip chart recording system at the time of data collection as an offset? Or were they simply used as a check, as is currently the case? I'm not sure that the practice has been disregarded, certainly not in Australian data collection. I can't speak for other countries, but I wonder if it is more accurate to say that it is not included as an offset correction in routine data collection. I don't know if it is necessary to do this correction at all. My understanding is that the test canister allows you to check the system is earthed correctly; any failure in earthing will result in poor data collection and the test canister will show this. It also shows any faults in the launcher cable, launcher gun, etc. Are there any references to suggest that the test canister offset should be applied (eg Reseghetti et al 2018, others)?
Line 200: Great that you've got some numbers for evaluation of disposal of these materials at sea. Instead of including it in the Summary/Conclusions section only, it would be good to have a few sentences in the results section perhaps about the total amounts for this dataset. Or even in table 1, which I think only has values on a per probe basis. The table headings need to indicate it is per-probe, not total.
Line 246: What is the 'evident "noise"' referring to? Is it electrical interference (high frequency noise) perhaps? Perhaps it was an earthing fault? Did your AQC pick up this noise in a particular test? Perhaps a quick sentence suggesting a source for the noise or what it looked like? Or remove this reference.
Line 307: Change 'indexes' to 'flags' and 'those repeated during data taking' to 'duplicates'. Or is it 'replicates'? If duplicates (ie, the same data copied), these should definitely be removed. If replicates (ie, two different profiles taken close together in time), these should not be removed. I also think removing data that fails in less than 50m is not the right thing to do. The data should be flagged where it is bad, but if it contains some good data, it should be kept.
Line 322-326: Is there any attempt to correct positions or do you lose these data because they are 'on land'? Since you have re-processed from the start, you should be able to correct positions for at least some of the profiles. There is no mention of a date/time check (or ship speed check) - was this investigated?
Table 2 and Surface Check: The exit values of 49-52 are not very descriptive compared to the other tests. I would suggest updating them to include 'Surface check' or similar so it doesn't get confused with the SDN flag codes and is clearer for the user of the data. It will mean updating the data file variable attributes too.
Line 515-516 and Figure 7. The intervals described in the text are not clear in the figures. The bias figures perhaps need some lines showing these intervals? And I think that the figure needs 'Dec-May' and 'Jun-Nov' instead of 'mixed' and 'stratified' to be consistent with the text. Line 518 should also refer to the month period rather than 'mixed'. The brackets in the x-axis label are not clear.
Line 647-649: the depth in figure 12c is 500 to 800 m and shows a negative average bias, not consistent with the positive bias in the deeper part of figure 11a (which is 800-2000m and solely due to T5 probes). I don't think the steps have any impact on the average bias, but do cause the positive spikes in the figure 12c T difference plot.
Editing corrections/grammar suggestions:
line 24: insert SDN acronym after the Seadatanet name.
Line 25: The link does not work when clicked, seems to be a different link to what is written.
Line 29: June to November perhaps, as written later in the text?
Line 34: Is there a full name for the ERDDAP acronym? If so, include it here.
Line 44: 'to re-analyze' change to 'reanalysis of'
Line 68: WOD link does not work, doesn't match the text.
Line 66-70: this sentence is missing something grammatically, please review.
Line 85: Add full words for DAQ acronym (Data Acquisition System)
Line 88: is there a reference for latest documented QC procedures?
Line 99: link fails when clicked.
Line 119: Add "Negative Temperature Coefficient" for NTC acronym
Line 124: '1960's' is stated as 1990's in appendix A, Line 776. Please check, probably it's since probes were built, so 1960's is likely correct.
Line 126: change 'clockwise in' to 'clockwise from' and 'counterclockwise in' to 'counterclockwise from'
Line 127: suggest rewording to 'decouples the XBT vertical motion from the translational ...'
Line 131: suggest removing 'phenomenological', not required here.
Line 135: replace 'thus' with 'then'
Line 144: 'and slightly' doesn't make sense in this sentence.
Line 146-150: This sentence is too long and confusing, and I'm not sure what the purpose of including it is. I also don't understand 'in order to have a practically unchanged measurand' in this sentence. Please review.
Line 150: Include 'standard deviation' with SD acronym
Line 152: I disagree that it is a 'few tens of meters', equivalent to more than 20 meters. The reference states only a few meters to reach stable velocity.
Line 159: replace 'depends on specific FRE with actual' with 'has'
Line 160: remove 'reading'
Line 164-166: This sentence is a little confusing, please review.
Line 170: 'is binding for subsequent optimal use' could be better written
Line 171-175: the grammar is quite awkward and could be better written.
Line 182: Are you using accuracy and uncertainty terms interchangeably here?
Line 184: suggest changing 'thanks to a' to 'with a'
Line 185: remove 'of'
Line 189: remove 'of'
Line 193: Why would ship speed affect duration of the data acquisition? The probes are designed to take the ship speed out, as you have stated earlier.
Line 198: Include full name for ZAMAK
Line 266: remove 'only'
Line 280: change 'masses' to 'mass'
Line 290: add 'Delayed Mode' for DM acronym
Line 289 & 292: Link fails when clicked
Line 306: change 'eventually remove it' to 'remove it if required'. Also change 'profiles' to 'data' and 'eliminated' to 'deleted'.
Line 310: remove 'implemented'
Line 316: Suggest changing to 'Automated Quality Control overview' or 'Automatic Quality Control procedure'
Line 319: Change 'flag' to 'exit value' to be consistent with Table 2.
Line 325-326, 329: suggest using the same terminology as is in table 2, rather than 'GOOD' and 'BAD'.
Line 355: Change 'It' to 'The Gross range check'
Line 343-345: suggest changing to "The XBT measurements close to the sea surface are usually considered unreliable due to the time taken to reach terminal velocity (Bringas and Goni, 2015) and due to the time taken for the probe to reach thermal equilibrium (need a reference here) and are thus excluded from further analysis (e.g. Bailey et al., 1994; Cowley and Krummel, 2022)."
Line 345-347: Suggest changing to: 'We here implement a surface test that flags data and retains all original measurements.'
Line 347: remove 'proposed' as you have implemented it.
Line 348: need a reference for the 'first value currently considered acceptable'
Line 390: should be figure 4a
Line 467: suggest removing 'k-th' as it's confusing. Or re-write?
Line 489-490: suggest changing 'crossing the water column and measuring' to 'deployed' or 'recording'
Line 491: suggest removing '("hot" or "cold" probe or possible troubles during the acquisition)' as it is unnecessary and leads to questions about what is meant by hot and cold probes.
Line 545: Suggest 'The QC algorithms applied to the dataset are not capable of catching all erroneous values.'
Line 549: remove 'deeply'. Change 'by visual check' to 'using visual checks'. Change 'In specific' to 'Specifically'.
Line 550: Change 'tuned by' to 'using'
Line 551: change 'minimize to flag as BAD data the GOOD ones.' to 'minimize flagging of GOOD data as BAD.'
Line 553: 'from a visual'
Line 555: change to 'flagging of GOOD data as BAD, as shown....'
Line 558: remove 'instead'
Line 559: change to 'true positive spikes (a) and false positive spikes (b)'
Line 561: 'features that the automatic'
Line 563: remove 'happened or'. Change 'The indispensable premise is the' to 'The decision is based on the'
Line 575: suggest removing 'non-zero' as if there is wind it is of course non-zero
Line 578: remove 'also', already used earlier in the sentence.
Line 582: remove '"cleanliness" of the'
Figure 8 & 9 titles have underscores in them which have turned text into subscripts. They are also a bit cryptic for the reader, perhaps useful in analysis but could be improved for the manuscript.
Figure 8 caption for (b): should 'true' be 'false', as in the text?
Figure 9 caption: change '(a) true spikes; (b) false spike' to '(a) true positive spikes; (b) false positive spike'
Line 599-608: too many words in quotations. Suggest removing all the quotation marks.
Line 599: change to '...identify the external influences that cause high frequency noise in the T profile...'
Line 606 - 607: change to: 'In some cases, the automated QC BAD attribution was changed to GOOD after the comparison with adjacent profiles that present similar characteristics.'
Line 612: hyperlink is not the same as the text
Figure 10: I suggest these axes be re-shaped to portrait mode and increase the scale for temperature to avoid viewing the 0.01 resolution steps in the data. The way they are set out at the moment makes it very difficult for the reader to see the features that are talked about. The main focus is the noise, not the inversions.
Line 631: 'profiles without correction' and 'non-corrected profiles' are the same thing. One should be 'corrected'.
Line 635: change 'quite constant' to 'consistent'
Figure 11: What is the black line in the plots? Only two of the colours are referenced, but there are three.
Line 645: is 'relative' differences referring to the dt/dz within a profile?
Line 669: 'The adoption of a Gaussian filter...' is this referring to the SDN dataset?
Line 736: Rebecca Cowley is not a Dr.
Line 758-759: remove 'because it is an essential component to get good quality XBT measurements.'
Line 822: Is 'URN' defined anywhere? If not, please define it.
Line 870 & 879: Remove 'Moreover'
Line 884-886: change to: 'Ship speed, wind speed, and probe mass (available since 2018) have been added to this metadata section, when available.' Remove the rest.
Data file comments:
Thank you for including raw data and the temperature calibration values that can be subtracted. This makes the data file versatile for the user.
- I suggest replacing the 'TEMPE01' name in the variables with 'TEMPERATURE' to make it easier to read.
- Is the TEMPET01_TEST_QC variable additive? I suspect so, since there are values of 581 in the variable, which means that more than one test has failed at a given depth. If so, please review when to use the 'flag_values' and 'flag_masks' attributes: I think it needs to be 'flag_masks' if it is additive, and the attribute should hold bit values that can be decoded unambiguously into the individual 'flag_meanings' associated with each bit (see the sketch after these data file comments). https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#flags
- There are many global attributes that need to be made into variables. I downloaded a netcdf file via the ERDDAP server and it created one netcdf file with many profiles dimensioned by 'row'. That means that all of the global attributes that would apply to ONE profile no longer apply and need to be made into variables, for example: fall_rate_equation_Coeff_1, fall_rate_equation_Coeff_2, probe_type information, launch_height information, serial numbers, platform codes and so on. Similarly, if I look at an ascii line dump of the data, all of the attribute names contained in the 'global attributes' section of the netcdf files are missing. Please review these global attributes carefully. The text in the appendix will also need updating if these items are moved to variables.
- Attributes for the variables: please check these. Some have incorrect attributes (eg, DEPTH_*_QC variables have a 'standard_name' attribute of 'depth' where it should be '* status_flag'). The 'TEMPE01' variables are missing a standard_name attribute ('sea_water_temperature').
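As a concrete illustration of the flag_masks suggestion above, here is a minimal sketch (netCDF4/Python), not the authors' code: the file name, bit assignment, and test names are placeholders that would have to match Table 2 of the manuscript.

```python
import numpy as np
from netCDF4 import Dataset

# Illustrative only: bit assignment and test names must match Table 2.
with Dataset("REP_XBT_profile.nc", "a") as nc:
    qc = nc.variables["TEMPET01_TEST_QC"]
    # One bit per AQC test, so additive values (e.g. 581) decode uniquely.
    qc.flag_masks = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
                             dtype="i2")
    qc.flag_meanings = ("surface_check gross_range_check spike_check "
                        "gradient_check constant_value_check bottom_check "
                        "stability_check climatology_check visual_check "
                        "interpolation_check")
    # Drop the ambiguous additive 'flag_values' attribute if present.
    if "flag_values" in qc.ncattrs():
        qc.delncattr("flag_values")
```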
Citation: https://doi.org/10.5194/essd-2023-525-RC1
AC1: 'Reply on RC1', Simona Simoncelli, 14 Jun 2024
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2023-525/essd-2023-525-AC1-supplement.pdf
RC2: 'Comment on essd-2023-525', Anonymous Referee #2, 28 Feb 2024
GENERAL COMMENTS:
The manuscript presents a reprocessed version of previously published XBT data. The previous version(s) did not have the best available calibrations and/or quality control applied, and this new version apparently does. As such, I find the effort worthwhile - it is always good to have a version of a dataset that can be considered "final". The manuscript could be made stronger by:
- Adding an uncertainty estimate against an independent data source (Argo?) that validates the new data to be in better agreement with such reference than the previous version;
- Adding a use case that shows what can be done with the new version that could not already be done with the old one(s). E.g. can we detect temperature trends now with better confidence (or after fewer years) than before?
These two items should make the points that yes, the new data is better, and yes, it was actually worth the effort. The present manuscript describes the methods in sufficient detail, but does not make these points.
The clarity of the manuscript could be improved by a copy/line editor authorized to make more than just minor language editing. Can the journal provide such services, for a fee if need be?
There are substantial problems with the dataset and the metadata that comes with it. None of these problems are unusual or difficult to correct, but they do need correction. For a manuscript that lays claim to high-quality metadata, the present state of the underlying dataset is not acceptable. Comments below list my findings in detail.
SPECIFIC COMMENTS:
Manuscript:
Ll. 25-34: I recommend taking the URLs out of the text and putting them in footnotes. The one that breaks on the end-of-line cannot be used "as is", but requires hand-editing - that too should be corrected.
I had to read these sentences twice to understand which dataset was the original, and which was the new one that was being described in this article, and why there were three links instead of two. I recommend clarifying by e.g.:
- assigning names "ORIGINAL" and "REPROCESSED" to these, and using these names throughout the manuscript
- removing one of the two links to the reprocessed data (keep the doi one)
L. 30: Bias and RMS difference against what - between the old and the new versions? Is there any evidence that the new dataset is better than the old one, i.e. that bias and RMS against the truth is now smaller?
L. 5: Define acronym ENEA.
L. 28: Define acronym SDN.
L. 85: Define acronym DAQ.
L. 198: Define acronym ZAMAK.
L. 207: Define acronym CSIRO properly.
L. 211: Define acronym CNR-ISMAR
There are inconsistencies between the numbers of profiles reported in table 1 versus what is in the actual dataset. Please correct or clarify:
- Summing up the second-to-last column of table 1, I expect 3917 profiles in the SeaDataNet repository. Clicking on the link in the abstract led me to a data download that ultimately gave me 3662 individual files. Is there a reason these numbers do not match?
- Summing up the last column of table 1, I expect 3757 profiles. The downloaded REP dataset seems to contain 3754.
Ll. 146-150: Edit this sentence/paragraph for better English language
Ll. 140-145: These quoted uncertainties are consistent with each other. State so.
Ll. 146-150: Are these uncertainty estimates for XBT data using the dataset presented here, or are these different data? Has such a comparison been made for the data presented here?
L. 311: What does the word "imported" mean? Imported where?
L. 312: What does the word "collection" mean? If there is a specific meaning that only ODV users can understand, please explain.
Dataset:
The dataset comprises ~3800 ocean temperature profiles (i.e. observations of temperature as a function of depth). This is not a particularly large dataset, and it is therefore reasonable that a user would like to download everything at once. From a quick back-of-the-envelope calculation, I assume that the size of the entire dataset (incl. metadata) should be a few hundred megabytes. However, when I tried to download the entire dataset with the settings below, the server failed with either error message 500 or 502. I assume it ran out of memory when I requested:
- http://oceano.bo.ingv.it/erddap/tabledap/REP_XBT_1999_2019.html
- requesting every variable
- requesting full time period (1999-2019)
- requesting either file type .ncCF or .ncCFMA
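(For reference, a minimal sketch of a tabledap subset request that stays small; the variable selection and time window are illustrative, not the full-download settings listed above.)

```python
import pandas as pd

# Illustrative ERDDAP tabledap CSV request for a small subset of the dataset.
base = "http://oceano.bo.ingv.it/erddap/tabledap/REP_XBT_1999_2019.csv"
variables = "time,latitude,longitude,depth,TEMPET01"
constraints = "&time%3E=2019-07-01T00:00:00Z&time%3C=2019-12-31T23:59:59Z"

# ERDDAP's second CSV row holds the units, hence skiprows=[1].
df = pd.read_csv(f"{base}?{variables}{constraints}", skiprows=[1])
print(df.head())
```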
I then downloaded subsets (final ~6 months) of data, and these files do not make prudent use of memory (100-300 MB for 34 profiles). In particular, data types were unnecessarily large (e.g. floating-point variables when smaller integers would suffice), and there were many cases where information was redundant (e.g. ship names repeated for every data point, rather than once per profile). I recommend making changes that will reduce the total file size to less than ~500 MB, such that a user can "get everything at once". I recommend the following changes to save space (but please use your own good judgment - not all of this might work as intended; see also the sketch after this list):
- Convert to 8-bit integers (instead of 16): DEPTH_FLAGS_QC, POSITION_SEADATANET_QC, TEMPET01_FLAGS_QC, TIME_SEADATANET_QC. You only ever have values 0-9, why make space for 32000?
- Convert to 32-bit floats (instead of 64): depth, DEPTH_INT, TEMPET01, TEMPET01_INT. Single-precision is sufficient for temperature (~millionth of a degree) and depth (~0.1 mm).
- Convert to 16-bit integer (and apply corrections below) to TEMPET01_TEST_QC
- The following variables should have the same dimensionality as latitude and longitude (i.e. one per profile; should not be repeated for every data point):
- POSITION_SEADATANET_QC, SDN_BOT_DEPTH, SDN_CRUISE, SDN_EDMO_CODE, TIME_SEADATANET_QC, cruise_id, institution, institution_edmo_code, pi_name, platform_code, platform_name, platform_type, source, wmo_platform_code, url_metadata
- Consider eliminating the following:
- DEPTH_INT_SEADATANET_QC and TEMPET01_INT_SEADATANET_QC (the ...INT variable doesn't really need a QC flag, assuming that only "good" input data were used for the interpolation)
- DEPTH_TEST_QC (you already have DEPTH_FLAGS_QC, one is enough)
- area should be a single global attribute, not a 17*Nobs array (!!!)
- The profile_id variable should be replaced with a 16-bit integer in ragged array representation (instead of 136 bits of redundant text)
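A hedged sketch of the type conversions suggested in this list (xarray/Python); the file names are assumptions, and encodings and attributes would need checking against the real files:

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset("REP_XBT_subset.nc")  # hypothetical subset file

int8_vars = ["DEPTH_FLAGS_QC", "POSITION_SEADATANET_QC",
             "TEMPET01_FLAGS_QC", "TIME_SEADATANET_QC"]
float32_vars = ["depth", "DEPTH_INT", "TEMPET01", "TEMPET01_INT"]

for name in int8_vars:
    if name in ds:            # QC flags only ever take values 0-9
        ds[name] = ds[name].astype(np.int8)
for name in float32_vars:
    if name in ds:            # single precision suffices for T and depth
        ds[name] = ds[name].astype(np.float32)

ds.to_netcdf("REP_XBT_subset_small.nc")
```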
The naming of the variables in the dataset can be improved:
- Some names are capitalized, others are not. Can you make them all the same, or have a logic for which ones are capitalized (e.g. the lat/lon/time coordinates plus depth and temperature)?
- TEMPET01 seems to be the primary scientific variable, but its name is not human-readable. Can this be changed to "TEMPERATURE"?
- Some variables seem to copy input data from SeaDataNet. If they are just duplicates (I have not checked if they are), is it really necessary to include these here? If we want to include them, can they at least be named consistently (at present, some start with "SDN_..." while others have "...SEADATANET..." somewhere in the middle)?
There is some poor wording in the metadata of the primary temperature data, which ought to be improved. This is about the use of the words, "raw" and "calibrated". In my understanding, TEMPET01 is calibrated data at the original resolution in space/time, and TEMPET01_INT is data with the same calibration but interpolated onto a consistent depth grid. There are no "raw" data in these files. How about:
- TEMPET01:long_name = 'Calibrated seawater temperature at original vertical resolution'
- TEMPET01_INT:long_name = 'Calibrated seawater temperature interpolated on standard depth levels'
- The 'comment' attributes under CALIB and TEMPET01 should reflect this wording (i.e. get rid of "raw").
- Also, the equations shown in the 'comment' attributes should use the actual variable names.
In the CF conventions, the attribute "standard_name" is the preferred mechanism by which a user (human or computer) finds out which physical quantity is in a variable. Therefore, I recommend that all variables for which such name definitions exist should use one. In the present version, this is not done consistently. In particular:
- All temperature variables should have a standard_name attribute set to, "sea_water_temperature"
- The variables DEPTH_FLAGS_QC, DEPTH_INT_SEADATANET_QC, DEPTH_TEST_QC have a wrong standard_name. Should be corrected as per next item.
- All QC variables should have one of two options for standard name:
- either simply, "quality_flag"
- or the standard_name of the corresponding data variable, followed by " status_flag", as in, "sea_water_temperature status_flag" or, "depth status_flag"
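A minimal sketch of these standard_name corrections (netCDF4/Python); the file name is assumed, and the variable names follow the review and may not match the files exactly:

```python
from netCDF4 import Dataset

with Dataset("REP_XBT_profile.nc", "a") as nc:   # hypothetical file name
    for name in ("TEMPET01", "TEMPET01_INT"):
        if name in nc.variables:
            nc.variables[name].standard_name = "sea_water_temperature"
    for name in ("TEMPET01_FLAGS_QC", "TEMPET01_TEST_QC"):
        if name in nc.variables:
            nc.variables[name].standard_name = "sea_water_temperature status_flag"
    for name in ("DEPTH_FLAGS_QC", "DEPTH_TEST_QC", "DEPTH_INT_SEADATANET_QC"):
        if name in nc.variables:
            nc.variables[name].standard_name = "depth status_flag"
```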
I was expecting the depth and DEPTH_INT variables to have different lengths (and likewise for the matching temperature data and QC flags). You could save some disk space by not zero-padding the (shorter) interpolated ones.
TECHNICAL CORRECTIONS:
Manuscript:
L. 35: Switch order of "Interoperable", "Accessible"
L. 151: This is not "hard to describe". Bringas and Goni did it, didn't they? Were their findings used, e.g. by correcting the fall rate equation to account for the initial velocity estimated from drop height? If so, where is this documented in the metadata? Or were their reported depth errors used in some sort of error estimate?
L. 788: Change "point" to "profile"
Dataset:
Something is wrong with the TEMPET01_TEST_QC variable. I assume it should encode, bitwise, the various QC tests. Assuming that there are <=16 tests, the variable type should then be a 16-bit integer (not a 64-bit float). The meaning of each bit should be explained in an attribute "flag_masks", not "flag_meanings", see http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#flags (section 3.5) for the difference between the two, and the values need to be re-computed (maybe I misunderstood, but the present values make no sense to me). In addition, it is unclear how these correspond to the values listed in table 2 of the manuscript, and in my data version, the content of flag_masks and flag_meanings have different lengths (15 vs. 13 entries).
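To show how a bitwise encoding resolves the ambiguity, a short illustrative sketch (the bit order and test names are placeholders, not the manuscript's actual assignment):

```python
import numpy as np

flag_masks = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256, 512], dtype=np.int16)
flag_meanings = ["surface", "gross_range", "spike", "gradient",
                 "constant_value", "bottom", "stability", "climatology",
                 "visual", "interpolation"]

def failed_tests(value):
    """Return the names of all tests encoded in a packed QC value."""
    return [name for mask, name in zip(flag_masks, flag_meanings) if value & mask]

print(failed_tests(581))  # -> ['surface', 'spike', 'stability', 'interpolation']
```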
The present files have the 'cf_role' attribute assigned to the time variable. Since you actually have a variable "profile_id", I think this variable should have the cf_role attribute instead of time, or am I missing the logic here?
There are metadata entries that are presently global attributes, but they should be variables (or attributes) specific to each profile. These are factually incorrect at present:
bathymetric_information, IMO_number, last_good_depth_according_to_operator, last_latitude_observation, last_longitude_observation, launching_height, max_acquisition_depth, max_recorded_depth, probe_manufacturer, probe_serial_number, recorder_types, ship_speed, fall_rate_equation_Coeff_1, fall_rate_equation_Coeff_2
I am unsure if this also applies to the following global attributes (please check): ices_platform_code, id, source_platform_category_code, sourceUrl, wmo_inst_type
I recommend changing the dataset title (on the website and inside the files) exactly as follows (remove "of", correct "Tyrrhenian", and capitalize "Seas"):
Reprocessed XBT dataset in the Ligurian and Tyrrhenian Seas (1999-2019)
In the global attribute 'summary', correct spelling of "Expendible" to "Expendable"
Reg. the fall rate coefficients:
- They are presently given in global attributes, as if one set of coefficients applied to all probes. These change between probes; they have to be profile-specific.
- The units are spelled wrong in the global attributes:
- There needs to be a space between "m" and "s" (else, it is milliseconds)
- The exponents behind "s" need to be negative
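For clarity, the fall rate equation has the form z(t) = a*t - b*t^2, with a in m s-1 and b in m s-2. A small sketch using the commonly cited Hanawa et al. (1995) coefficients as placeholders (the profile-specific coefficients stored in the files are what should actually be used):

```python
def xbt_depth(t_seconds, a=6.691, b=0.00225):
    """Depth (m) from elapsed time using z(t) = a*t - b*t**2.

    a [m s-1] and b [m s-2] are placeholders (Hanawa et al., 1995);
    use the profile-specific coefficients from the data files.
    """
    return a * t_seconds - b * t_seconds ** 2

print(xbt_depth(10.0))  # ~66.7 m after 10 s of descent
```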
The "coordinates" attributes are used incorrectly:- DEPTH_INT is a coordinate and should not have such an attribute (I think, but correct me if I am wrong)
- TEMPET01_INT (and the other _INT variables except DEPTH_INT) should not list "depth" in the coordinates attribute, but rather "DEPTH_INT". This is important, because it defines the vertical position of the data - the way it is right now, you are telling the user that TEMPET01_INT data are coming from the wrong depth!
Citation: https://doi.org/10.5194/essd-2023-525-RC2
AC2: 'Reply on RC2', Simona Simoncelli, 14 Jun 2024
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2023-525/essd-2023-525-AC2-supplement.pdf