This work is distributed under the Creative Commons Attribution 4.0 License.
Climate-Biogeochemistry Interactions in the Tropical Ocean: Data collection and legacy
Abstract. From 2008 through 2019, a comprehensive research project, SFB 754, Climate – Biogeochemistry Interactions in the Tropical Ocean, was funded by the German Research Foundation to investigate the climate-biogeochemistry interactions in the tropical ocean with a particular emphasis on the processes determining the oxygen distribution. During three four-year funding phases, a consortium of more than 150 scientists conducted or participated in 34 major research cruises and collected a wealth of physical, biological, chemical, and meteorological data. A common data policy agreed upon at the initiation of the project provided the basis for the open publication of all data. Here we provide an inventory of this unique data set and briefly summarize the various data acquisition and processing methods used.
Withdrawal notice
This preprint has been withdrawn.
Preprint (4773 KB), Supplement (421 KB)
Interactive discussion
Status: closed
-
RC1: 'Comment on essd-2020-308', Giuseppe M.R. Manzella, 19 Apr 2021
General comment
The authors have provided a list of campaigns conducted as part of a project conducted from 2008 to 2019. The data collected during these campaigns are certainly useful for understanding many aspects related to climate-biogeochemical interactions. The data is accessible and available to every user, according to the FAIR concepts that animate the scientific community today.
The description of the sampling and quality control methodologies is well done. The presentation is therefore complete and satisfying.
Specific Comment
The authors assert that the final publication of the data in PANGAEA was made after 3 years to allow the members of the work team to analyse the data. In addition, some data sets have been published in specialized databases. Campaign information can be found in other metadatabases. For example, the ATA_IFMGEOMAR campaign can also be found in
https://campagnes.flotteoceanographique.fr/campaign
BRANDT Peter (2008) GEOMAR4 cruise, RV L'Atalante, https://doi.org/10.17600/8010030
Is submitting a list of datasets within ESSD's publication policy? In an article devoted to data, some of which are certainly published in articles, I would expect answers to questions such as:
- What was the original purpose of the data collection?
- What were the applied quality assurance / quality control protocols?
- How were the data originally used?
- Have the data been reused?
- How has the data been assessed for fitness of use?
The first two questions have excellent answers in the article, but the others are absent. Furthermore, in an ambitious project such as the one presented by the authors, I would have expected indications on the analysis of multidisciplinary data. The article seems to me essentially a technical report.
Citation: https://doi.org/10.5194/essd-2020-308-RC1
AC1: 'Reply on RC1', Gerd Krahmann, 30 Apr 2021
Dear Giuseppe Manzella,
Thank you very much for reviewing our manuscript!
Under 'Specific Comment' you raise five questions.
What was the original purpose of the data collection?
What were the applied quality assurance / quality control protocols?
How were the data originally used?
Have the data been reused?
How has the data been assessed for fitness of use?
The first two questions you deem well answered, but for the following three you find no appropriate answer in the text.
We somehow have the impression that the three questions are not fully appropriate to the submitted manuscript category. The manuscript was submitted in the category 'Review articles'. See
https://www.earth-system-science-data.net/about/manuscript_types.html
According to the ESSD descriptions such a review article may
'describe recombinations of existing (historic) datasets'.
That is where we locate our manuscript.
We will try to answer the questions here and state why they are not answered in the manuscript. We have added two sentences to make the intent and scope of the manuscript clearer.
How were the data originally used?
We have modified the end of the introduction to make clear how the data was originally used:
To date 793 peer-reviewed scientific papers, theses and other publications have been generated by the members of the SFB 754 (enter 'sfb754' into the search field at https://oceanrep.geomar.de). In many of these publications observational data sets are fully described, assessed, used and originally published. The aim of this article is to summarize and list these published observational data sets collected by the SFB 754 together in a clearly structured way for easier access and find-ability.
Have the data been reused?
The more than 1000 underlying datasets have been reused many times.
The question seems to be more suitable to the ESSD manuscript type 'Data description paper' where one would like to avoid duplicate descriptions of the same data. In the manuscript we summarize and organize a large number of existing data sets that most often have been used and described before. For all methods subsections and datasets we give the references in which they are fully described and used.
How has the data been assessed for fitness of use?
This question also seems to be directed to smaller data sets when they are first published. We are describing already published and in most cases used and described data sets. For our manuscript we ourselves have not assessed any data for fitness of use but rely on the assessments that were made when the single data sets were first published in scientific papers and in data centers. We will add a sentence to the beginning of section 4 that will make this clearer:
The resulting datasets have been described, assessed, used and published in a large number of publications. Here we briefly summarize the methods used and refer to the relevant publications in which the methods have been described in detail.
Citation: https://doi.org/10.5194/essd-2020-308-AC1
-
RC2: 'Comment on essd-2020-308', Anonymous Referee #2, 26 Apr 2021
General comment
This manuscript gives an overview of, and describes, the scientific activities and resulting datasets of the SFB754. The report contains the metadata of all cruises conducted within this project, as well as details on the methods used and the treatment of the data.
This overview is a good entry point into the vast array of data that were collected within 2008-2019 in the frame of this project and contains important information for data re-use.
Specific Comments
Page 13, line 95: Link to https://www.sfb754.de/sfb754-osis is misleading, as it shows everything in OSIS and one has to go to “context” – SFB754 to get the real subset, is there no direct link to SFB754 in OSIS? If OSIS is able to store the data I am missing the explanation why the data was additionally published in other data centres.
Page 13, line 111 and following (=Table 2):
Link https://doi.org/10.1594/PANGAEA.926545 does not work; the dataset is still in review (https://doi.pangaea.de/10.1594/PANGAEA.926545), and it is the only one which starts with “SFB 754” (with a space) instead of SFB754, please correct. Also, in this collection quite a few datasets appear as “unpublished” and are access restricted. I would like to have more transparency on how many datasets are actually published/accessible and why some are not (even those that are 5 years old, see e.g. https://doi.pangaea.de/10.1594/PANGAEA.861224).
A few collections could maybe be combined or more clearly ordered. I am not sure, e.g., why the collection https://doi.pangaea.de/10.1594/PANGAEA.926794 contains those two entries; they don't seem to contain similar data. Also, one of the entries is itself a collection (https://doi.pangaea.de/10.1594/PANGAEA.903023), which makes the whole thing really confusing. Maybe reduce the number of collections and make it a little bit more general (e.g. only one collection for all datasets of the BIGO lander).
Page 26, line 398: watercontent should be “water content”
Page 27, line 425 and page 31, line 540: “analyzed” – in all other cases you write “analysed”
Page 40, line 749: “rhizone” probably is rhizome?
Page 41: In my opinion, a chapter (after chapter 4) on the most important results (with links to the respective publications) and an outlook is missing. The following questions are unanswered:
- What are the most prominent results of this project, which datasets have been combined to obtain these results?
- Did the project reach its goal, did it surpass it, or is there anything left unclear?
- How can the data best be used, can it be combined?
Reference list: in the citation of the datasets you always include “Dataset.”. As far as I can see this is not part of the original citation, is this intentionally done to separate them from the other references?
Citation: https://doi.org/10.5194/essd-2020-308-RC2
RC3: 'Comment on essd-2020-308', Anonymous Referee #3, 29 Apr 2021
This manuscript describes a catalog of datasets obtained during a long-term research project. At this stage, I do not recommend publishing it, for the following reasons:
The information about the long-term SFB project is not sufficient. The reader does not get a feeling for why such a large project was done and what connects the different disciplines. Dissolved oxygen is mentioned and some textbook knowledge of the oxygen distribution and its drivers is given. This is not enough to take the reader by the hand and show what great science has been done in the project.
The manuscript reads like a technical report.
The list of datasets is not complete, as the authors write. Therefore, this report comes too early. Some of the data appears in data centers other than Pangaea, which is also mentioned in section 5. However, that summary does not say which data is concerned. The authors did not explain why some of the data were submitted elsewhere. Couldn't they have mirrored the data in Pangaea to really have all data from the SFB together?
The information about the data is found in section 4. However, in this section mainly the descriptions of the methods used are presented. This is not the same as a description of the data. What I expected to find here is a real description of the data and what has been done with that data: What kind of quality control was applied, how much data had to be discarded, which methods for data management were used, etc. A mere methods description is not useful in an ESSD paper.
Even when it is useful to have a list with all the cruises and datasets of this large project, this information can easily be provided in a technical report.
Some minor comments:
Figure 2 and 3: Please add some geographical names to the figures: countries, cities and in particular those names occurring in the text (e.g. ports)
L63-65 “The three 4-year long phases allowed for the development and adaptation of the observational and experimental program. Questions arising from the data already collected were incorporated into new sub-projects for the subsequent project phases.” It would be nice to see some examples of this. As it is here, it is very abstract.
L88-89 “One of the first steps after the inception of the SFB 754 was the development and implementation of a common data policy (https://oceanrep.geomar.de/47369).” This is a sentence I expect in a technical report.
L91-92 “This data policy and its strict application is one of the reasons for the success of the SFB 754 with 421 peer reviewed publications at the time of writing.” ditto
L97-98 “In the final step the data was published and made freely available at the World Data Center PANGAEA (https://www.pangaea.de) or at other more specific data centers.” Why wasn’t it part of the data management plan to have all data together in one data center? This is not conducive to the FAIR data principles, in particular findability.
L99-100 “the rules of the data policy were quite generic.” I do not quite understand that. Rules should not be different for different data fields. Maybe the meaning behind “rules” as used here is not clear. Please modify the sentence to make clear what exactly you mean.
L102 “initial versions” of the data I suppose
L109-110 “Some of the data sets have been published elsewhere on more specialized databases. These are explicitly mentioned in the text below.” It would be better to have all data sets here in this table. It would be easy in the table to discern the data sets not in Pangaea. Thus all data sets would appear together in one overview.
Citation: https://doi.org/10.5194/essd-2020-308-RC3
RC4: 'Comment on essd-2020-308', Anonymous Referee #4, 02 May 2021
General comments:
This manuscript provides an “inventory” (line 27) to the data legacy of a major research program (“SFB 754”). In fact, it states (line 58) that “The aim of this article is to describe and list the published observational data sets collected by the SFB 754 for easy access and find-ability.” Does it really describe a “recombination of existing (historical) datasets”? What, beyond providing a compact listing and cross-reference, is the added value of this inventory?
One could easily justify it as a highly valuable companion report, supporting the overall final report of SFB 754. It ties together and identifies some 1,000 datasets from 34 cruises performed over more than 10 years, from a wide range of (oceanographic) disciplines (lines 51–53) and links them with their supporting documentation (e.g., cruise reports, best practice documents, etc.). The manuscript provides some interesting indications about the program’s data policies and practices (including the existence of data processing pipelines from raw to published data, lines 96-97, which appear, however, only accessible to SFB consortium members, lines 93-94).
According to the manuscript (line 92), 421 publications were already generated from that research. Due to their intimate knowledge of instrumentation and methods, the authors and reviewers of those publications were certainly convinced that the underlying data were indeed fit for their 421 purposes.
But as ESSD is about supporting the reuse of data in other contexts than that of their creators’ research, the most important criterion on the datasets and/or the ESSD articles is to provide clear (and, as far as possible, easily interpreted) information which helps to evaluate or assess the datasets’ fitness for the purposes of a third party (aka, its quality).
(ESSD review criteria ask: “Are error estimates and sources of errors given (and discussed in the article)?”)
Does this manuscript support these aims and does it meet these criteria of ESSD? I think not, as none of the subsections on specific parameters provides direct information on quality:
Each of the subsections, on individual observed parameters, describes the measurement methods explicitly – some very briefly (4.1.2, 4.2.4), some in much more detail (most subsections of 4.3 and 4.5). Many, but not all, subsections (on physical oceanography) then refer to individual sections of the “GO-SHIP best practices”. Many subsections refer to individual research articles (as well). While we can only hope that the GO-SHIP website will survive until it is needed by a user of these datasets, at least some of the articles referenced are behind paywalls: e.g., 4.1.3, line 176, Hahn et al., €37.40; 4.1.6, line 221, Foltz et al., $8–$49. As the Hahn article, in particular, is referred to in the context of measurement errors (line 179), this defies the dedication of ESSD to provide such information in Open Access (above the inconvenience of scrutinizing yet another article for the information which should have been provided explicitly here or in PANGAEA tables).
ESSD review criteria further ask reviewers to: “Consider article and data set: are there any inconsistencies within these, implausible assertions or data, …” Specific comments, below, show that one will encounter – right from the beginning – problems doing this kind of consideration for this manuscript and its 1,000 (heterogeneous) datasets, and it appears completely beyond the capacity of one or even four reviewers to comply with this, even on a random but representative sample from the 1,000. This would put any such manuscript outside the clearly delimited aims of ESSD – unless it provided an algorithmic way and an execution environment to perform these tasks (however that might work – but it appears to this reviewer as one possible benchmark of FAIRness: that data and metadata are machine-readable and interoperable).
As a last general comment, ESSD review criteria ask: “Is there any potential of the data being useful in the future?”. The thoroughness of the description of methods (and, presumably, their consistent execution), and the broad interdisciplinary application within the SFB research program hint at further applicability. Authors mention this in Conclusions, but do not provide examples beyond the (immediate) thematic realm of SFB 754. (Would it be possible and useful to compare these with similar (climate-BGC) data from non-tropical areas? Is the collection as a whole significant to track effects of climate change, and if so: when might – selective - repetition be advisable, thus setting a lower time limit of necessary preservation of this data collection and its interoperability?)
In summary, I do not see how this “review article” provides for the “recombination” of existing data, nor do I think that it meets the general aims of ESSD to help readers assess the usefulness of data for their purposes. It is up to the editors, if they wish, to establish a new manuscript type or extend the scope of review articles to include the description of data inventories. In the absence of such a decision, I do not suggest accepting this manuscript for ESSD, useful as it may be in another context.
Specific comment #1:
While the manuscript may be the best an inventory can do to describe the maze of individual datasets and their listings, ordered by parameter observed, cruise where such observation was performed, and of pertinent references to ancillary information (such as cruise reports, best practices), it makes it truly hard for someone, say, interested in ocean currents, 25 years from now:
- Section 4.1.2 goes so far as to name the instruments used, but why does it not tell us that ADCP-specific datasets at PANGAEA actually do carry a “+/-” column in their data table (it does for cruises AT08-04, https://doi.org/10.1594/PANGAEA.811565, and M80/1, https://doi.org/10.1594/PANGAEA.811718 – but also for all others?). (The ADCP example demonstrates how important this information is, as in the above cases the median of the relative error is 60 and 100%, respectively, and just 20 and 10% of all values, respectively, claim less than 30% relative error. A “naïve” future user – e.g., from another discipline, or from a time when there might be a much more accurate instrument – would not suspect as much.)
- Instead, section 4.1.2 sends us on a journey through references:
- First, at line 157, to Krahmann and Mertens 2021b, which is a PANGAEA reference, https://doi.org/10.1594/PANGAEA.926065, listing, however, all datasets from all “CTD data and additional sensors used on the CTDO system”. Actually, clicking on list item 13 (on the Atalante cruise), one is directed to a dataset which does not include any LADCP data!
- Second, also at line 157, it directs us to table 2 (of this manuscript), which indeed has a row 4.1.2 with a PANGAEA reference to all SFB LADCP data. Why don’t the authors provide this reference here, directly?
- Fun fact: PANGAEA names “[+/-]” as the unit of the absolute error, which of course should be [cm/s].
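Such error statistics are easy to reproduce once a dataset's velocity and “+/-” columns are at hand. A minimal sketch (the arrays below are synthetic stand-ins for illustration, not actual PANGAEA data, which would be read from the downloaded tab-delimited table):

```python
import numpy as np

# Synthetic stand-ins for an ADCP table's velocity and "+/-" columns.
rng = np.random.default_rng(0)
velocity = rng.normal(0.0, 10.0, size=1000)   # cm/s, synthetic
abs_error = np.full(1000, 3.0)                # cm/s, synthetic "+/-" column

# Relative error in percent, plus the two summary statistics
# quoted in the comment above (median, share below 30%).
rel_error = np.abs(abs_error / velocity) * 100.0
median_rel = np.median(rel_error)
frac_below_30 = np.mean(rel_error < 30.0)
print(f"median relative error: {median_rel:.0f}%, "
      f"share of values below 30%: {frac_below_30:.0%}")
```

With real data, only the two input columns change; the summary lines stay the same.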
Specific comment #2:
Immediately on going into the data reference, I encountered another irritant: while the main chapter 4 directs us, in its first paragraph (line 122), to “cruise reports where additional information about the data collected and methods used can be found”, there appears to be an inconsistency about error margins between the first of these reports (Atalante cruise AT08-04) and the associated first CTDO dataset: at the end of p. 10, the cruise report, http://dx.doi.org/10.3289/ifm-geomar_rep_19_2008, appears to claim an error (“rms difference”) of 0.052 ml/l in the oxygen data, while at PANGAEA, the CTDO data for this cruise, https://doi.org/10.5194/bg-10-5079-2013, appears to claim “nc_uncertainty_o = 1.000000” in the metadata “comment” item, which one might, tentatively, amend with the unit for oxygen from the “parameter(s)” table, namely µmol/kg.
As these claims appear to be inconsistent (https://ocean.ices.dk/tools/unitconversion.aspx), one is forced to think: Should I take the PANGAEA uncertainty value as the most authoritative, since the somewhat substantial description of data processing and calibration in this article seems to refer to it – and disregard the (preliminary?) estimates from the cruise report? Should I do this in general, that is: for all cruises and parameters? Will I then find nc_uncertainty_xy values or error columns in all datasets?
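The apparent inconsistency can be checked numerically. A minimal sketch, assuming the standard conversion factor of 44.661 µmol O2 per ml O2 (from the ideal-gas molar volume) and a nominal seawater density of 1025 kg/m³; both values are illustrative assumptions, not taken from the cruise report or the PANGAEA metadata:

```python
# Convert an oxygen value (or uncertainty) from ml/l to umol/kg,
# under the assumptions stated in the lead-in.
ML_TO_UMOL = 44.661   # umol O2 per ml O2 (ideal-gas molar volume, assumed)
RHO_SW = 1.025        # kg/l, nominal seawater density (assumed)

def o2_mll_to_umolkg(o2_ml_per_l):
    """ml/l -> umol/kg: scale by molar amount, divide by density."""
    return o2_ml_per_l * ML_TO_UMOL / RHO_SW

cruise_report_rms = o2_mll_to_umolkg(0.052)  # 0.052 ml/l from the cruise report
pangaea_uncertainty = 1.0                    # umol/kg from the PANGAEA metadata
print(f"cruise report: {cruise_report_rms:.2f} umol/kg, "
      f"PANGAEA: {pangaea_uncertainty:.2f} umol/kg")
```

Under these assumptions the cruise report's 0.052 ml/l corresponds to roughly 2.3 µmol/kg, more than twice the 1 µmol/kg stated at PANGAEA, which is the inconsistency at issue.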
Citation: https://doi.org/10.5194/essd-2020-308-RC4
-
EC1: 'Comment on essd-2020-308', David Carlson, 04 May 2021
Authors as well as followers of this discussion will know that we received careful thoughtful comments from four very-experienced reviewers. In my charge to those reviewers, I asked each reviewer to assess specific questions: 1) Have authors organized their list in a useful manner?; 2) Does this represent a good summary of their SFB activities?; 3) Will the catalog prove useful to oceanographers looking at marine oxygen?; and 4) Does the product belong in ESSD?
Reviewers have responded with an overwhelmingly negative assessment. These reviewers either insist on major revisions (two of four) or call for outright rejection (remaining two). In no case does any of us diminish the long decade of important accomplishments under SFB funding to GEOMAR. We do however find that a post-hoc catalog does not add value or certify quality as ESSD and its readers expect.
I recommend that lead author Gerd Krahmann consult with co-authors. They should not attempt to prepare rebuttals to all evaluations nor should they make any further effort on substantial revision.
Citation: https://doi.org/10.5194/essd-2020-308-EC1
Interactive discussion
Status: closed
-
RC1: 'Comment on essd-2020-308', Giuseppe M.R. Manzella, 19 Apr 2021
General comment
The authors have provided a list of campaigns conducted as part of a project conducted from 2008 to 2019. The data collected during these campaigns are certainly useful for understanding many aspects related to climate-biogeochemical interactions. The data is accessible and available to every user, according to the FAIR concepts that animate the scientific community today.
The description of the sampling and quality control methodologies is well done. The presentation is therefore complete and satisfying.
Specific Comment
The authors assert that the final publication of the data in PANGEA was made after 3 years to allow the analysis of the data to the members of the work team. In addition, some data sets have been published in specialized databases. Campaign information can be found in other metadatabases. For example, the ATA_IFMGEOMAR campaign can also be found in
https://campagnes.flotteoceanographique.fr/campaign
BRANDT Peter (2008) GEOMAR4 cruise, RV L'Atalante, https://doi.org/10.17600/8010030
Is submitting a list of datasets within ESSD's publication policy? In an article devoted to data, some of which are certainly published in articles, I would expect answers to questions such as:
- What was the original purpose of the data collection?
- What were the applied quality assurance / quality control protocols?
- How were the data originally used?
- Have the data been reused?
- How has the data been assessed for fitness of use?
The first two questions have excellent answers in the article, but the others are absent. Furthermore, in an ambition project such as the one presented by the authors, I would have expected indications on the analysis of multidisciplinary data. The article seems to me essentially a technical report.
Citation: https://doi.org/10.5194/essd-2020-308-RC1 -
AC1: 'Reply on RC1', Gerd Krahmann, 30 Apr 2021
Dear Guiseppe Manzella,
thank you very much for reviewing our manuscript!
Under 'Specific Comment' you raise five questions.
What was the original purpose of the data collection?
What were the applied quality assurance / quality control protocols?
How were the data originally used?
Have the data been reused?
How has the data been assessed for fitness of use?The first two questions you deem well answered, but for the following three you find no appropriate answer in the text.
We somehow have the impression that the three questions are not fully appropriate to the submitted manuscript category. The manuscript was submitted in the category 'Review articles'. See
https://www.earth-system-science-data.net/about/manuscript_types.htmlAccording to the ESSD descriptions such a review article may
'describe recombinations of existing (historic) datasets' .
That is where we locate our manuscript.We will try to answer the questions here and state why they are not answered in the manuscript. We have added two sentences to make the intent and scope of the manuscript clearer.
How were the data originally used?
We have modified the end of the introduction to make clear how the data was originally used:
To date 793 peer-reviewed scientific papers, theses and other publications have been generated by the members of the SFB 754 (enter ‘sfb754‘ into the search field at https://oceanrep.geomar.de). In many of these publications observational data sets are fully described, assessed, used and originally published. The aim of this article is to summarize and list these published observational data sets collected by the SFB 754 all together in a clear structured way for easier access and find-ability.
Have the data been reused?
The more than 1000 underlying datasets have been reused many times.
The question seems to be more suitable to the ESSD manuscript type 'Data description paper' where one would like to avoid duplicate descriptions of the same data. In the manuscript we summarize and organize a large number of existing data sets that most often have been used and described before. For all methods subsections and datasets we give the references in which they are fully described and used.How has the data been assessed for fitness of use?
This question also seems to be directed to smaller data sets when they are first published. We are describing already published and in most cases used and described data sets. For our manuscript we ourselves have not assessed any data for fitness of use but rely on the assessments that were made when the single data sets were first published in scientific papers and in data centers. We will add a sentence to the beginning of section 4 that will make this clearer:
The resulting datasets have been described, assessed, used and published in a large number of publications. Here we briefly summarize the methods used and refer to the relevant publications in which the methods have been described in detail.
Citation: https://doi.org/10.5194/essd-2020-308-AC1
-
RC2: 'Comment on essd-2020-308', Anonymous Referee #2, 26 Apr 2021
General comment
This manuscript gives an overview of, and describes the scientific activities and resulting datasets of the SFB754. The report contains the metadata of all cruise conducted within this project, as well as details on the methods used and the treatment of the data.
This overview is a good entry point into the vast array of data that were collected within 2008-2019 in the frame of this project and contains important information for data re-use.
Specific Comments
Page 13, line 95: Link to https://www.sfb754.de/sfb754-osis is misleading, as it shows everything in OSIS and one has to go to “context” – SFB754 to get the real subset, is there no direct link to SFB754 in OSIS? If OSIS is able to store the data I am missing the explanation why the data was additionally published in other data centres.
Page 13, line 111 and following (=Table 2):
Link https://doi.org/10.1594/PANGAEA.926545 does not work, the dataset is still in review (https://doi.pangaea.de/10.1594/PANGAEA.926545) and it is the only one which starts with “SFB 754” (with empty space) instead of SFB754, please correct. Also in this collection quite a few datasets appear as “unpublished” and are access restricted. I would like to have more transparency on how many datasets are actually published/accessible and why some are not. (even those that are 5 years old, see e.g https://doi.pangaea.de/10.1594/PANGAEA.861224 )
A few collections could maybe be combined or more clearly ordered, I am not sure e.g why the collection https://doi.pangaea.de/10.1594/PANGAEA.926794 contains those two entries, they don`t seem to contain similar data. Also, one of the entries in itself is a collection (https://doi.pangaea.de/10.1594/PANGAEA.903023) which makes the whole thing really confusing. Maybe reduce the number of collections and make it a little bit more general (e.g. only one collection for all datasets of the BIGO lander).
Page 26, line 398: watercontent should be “water content”
Page 27, line 425 and page 31, line 540: “analyzed” – in all other cases you write “analysed”
Page 40, line 749: “rhizone” probably is rhizome?
Page 41: In my opinion, a chapter (after chapter 4) on the most important results (with links to the respective publications) and an outlook is missing. The following questions are unanswered:
- What are the most prominent results of this project, which datasets have been combined to obtain these results?
- Did the project reach it`s goal, did it surpass it or is there anything left unclear?
- How can the data best be used, can it be combined?
Reference list: in the citation of the datasets you always include “Dataset.”. As far as I can see this is not part of the original citation, is this intentionally done to separate them from the other references?
Citation: https://doi.org/10.5194/essd-2020-308-RC2 -
RC3: 'Comment on essd-2020-308', Anonymous Referee #3, 29 Apr 2021
This manuscript describes a catalog of datasets obtained during a long-time research project. At this stage, I do not recommend to publish it for the following reasons:
The information about the long-time SFB project is not sufficient. The reader does not get a feeling why such a large project was done and what connects the different disciplines. Dissolved oxygen is mentioned and some textbook knowledge of oxygen distribution and their drivers are given. This is not enough to take the reader by the hand and show what great science has been done in the project.
The manuscript reads like a technical report.
The list of datasets is not complete, as the authors write. Therefore, this report comes too early. Some of the data appears in other data centers than Pangaea, which is also mentioned section 5. However, that summary does not say which data is concerned. The authors did not explain why some of the data were submitted elsewhere. Couldn’t they have mirrored the data in Pangaea to really have all data from the SFB together?
The information about the data is found in section 4. However, in this section mainly the descriptions of the methods used are presented. This is not the same as a description of the data. What I expected to find here is a real description of the data and what has been done with that data: What kind of quality control was applied, how much data had to be discarded, which methods for data management were used, etc. A mere methods description is not useful in an ESSD paper.
Even though it is useful to have a list of all the cruises and datasets of this large project, this information can easily be provided in a technical report.
Some minor comments:
Figures 2 and 3: Please add some geographical names to the figures: countries, cities, and in particular those names occurring in the text (e.g., ports).
L63-65 “The three 4-year long phases allowed for the development and adaptation of the observational and experimental program. Questions arising from the data already collected were incorporated into new sub-projects for the subsequent project phases.” It would be nice to see some examples of this. As it is here, it is very abstract.
L88-89 “One of the first steps after the inception of the SFB 754 was the development and implementation of a common data policy (https://oceanrep.geomar.de/47369).” This is a sentence I expect in a technical report.
L91-92 “This data policy and its strict application is one of the reasons for the success of the SFB 754 with 421 peer reviewed publications at the time of writing.” ditto
L97-98 “In the final step the data was published and made freely available at the World Data Center PANGAEA (https://www.pangaea.de) or at other more specific data centers.” Why wasn’t it part of the data management plan to have all data together in one data center? This is not conducive to the FAIR data principles, in particular findability.
L99-100 “the rules of the data policy were quite generic.” I do not quite understand that. Rules should not be different for different data fields. Maybe the meaning behind “rules” as used here is not clear. Please modify the sentence to make clear what exactly you mean.
L102 “initial versions” of the data I suppose
L109-110 “Some of the data sets have been published elsewhere on more specialized databases. These are explicitly mentioned in the text below.” It would be better to have all data sets here in this table. It would be easy in the table to discern the data sets not in PANGAEA. Then all data sets would appear together in one overview.
Citation: https://doi.org/10.5194/essd-2020-308-RC3 -
RC4: 'Comment on essd-2020-308', Anonymous Referee #4, 02 May 2021
General comments:
This manuscript provides an “inventory” (line 27) to the data legacy of a major research program (“SFB 754”). In fact, it states (line 58) that “The aim of this article is to describe and list the published observational data sets collected by the SFB 754 for easy access and find-ability.” Does it really describe a “recombination of existing (historical) datasets”? What, beyond providing a compact listing and cross-reference, is the added value of this inventory?
One could easily justify it as a highly valuable companion report, supporting the overall final report of SFB 754. It ties together and identifies some 1,000 datasets from 34 cruises performed over more than 10 years, from a wide range of (oceanographic) disciplines (lines 51–53), and links them with their supporting documentation (e.g., cruise reports, best practice documents, etc.). The manuscript provides some interesting indications about the program’s data policies and practices (including the existence of data processing pipelines from raw to published data, lines 96-97, which appear, however, to be accessible only to SFB consortium members, lines 93-94).
According to the manuscript (line 92), 421 publications were already generated from that research. Due to their intimate knowledge of instrumentation and methods, the authors and reviewers of those publications were certainly convinced that the underlying data were indeed fit for their 421 purposes.
But as ESSD is about supporting the reuse of data in contexts other than that of their creators’ research, the most important criterion for the datasets and/or the ESSD articles is to provide clear (and, as far as possible, easily interpreted) information which helps to evaluate or assess the datasets’ fitness for the purposes of a third party (aka, their quality).
(ESSD review criteria ask: “Are error estimates and sources of errors given (and discussed in the article)?”) Does this manuscript support these aims and does it meet these criteria of ESSD? I think not, as none of the subsections on specific parameters provides direct information on quality:
Each of the subsections on individual observed parameters describes the measurement methods explicitly – some very briefly (4.1.2, 4.2.4), some in much more detail (most subsections of 4.3 and 4.5). Many, but not all, subsections (on physical oceanography) then refer to individual sections of the “GO-SHIP best practices”. Many subsections refer to individual research articles (as well). While we can only hope that the GO-SHIP website will survive until it is needed by a user of these datasets, at least some of the articles referenced are behind paywalls: e.g., 4.1.3, line 176, Hahn et al., €37.40; 4.1.6, line 221, Foltz et al., $8-$49. As the Hahn article, in particular, is referred to in the context of measurement errors (line 179), this defies the dedication of ESSD to providing such information in Open Access (above the inconvenience of scrutinizing yet another article for information which should have been provided explicitly here or in PANGAEA tables).
ESSD review criteria further ask reviewers to “consider article and data set: are there any inconsistencies within these, implausible assertions or data, …”. The specific comments below show that one will encounter – right from the beginning – problems doing this kind of consideration for this manuscript and its 1,000 (heterogeneous) datasets, and it appears completely beyond the capacity of one or even four reviewers to comply with this, even on a random but representative sample from the 1,000. This would put any such manuscript outside the clearly delimited aims of ESSD – unless it provided an algorithmic way and an execution environment to perform these tasks (however that might work – but it appears to this reviewer as one possible benchmark of FAIRness: that data and metadata are machine-readable and interoperable).
As a last general comment, ESSD review criteria ask: “Is there any potential of the data being useful in the future?”. The thoroughness of the description of methods (and, presumably, their consistent execution), and the broad interdisciplinary application within the SFB research program hint at further applicability. Authors mention this in Conclusions, but do not provide examples beyond the (immediate) thematic realm of SFB 754. (Would it be possible and useful to compare these with similar (climate-BGC) data from non-tropical areas? Is the collection as a whole significant to track effects of climate change, and if so: when might – selective - repetition be advisable, thus setting a lower time limit of necessary preservation of this data collection and its interoperability?)
In summary, I do not see how this “review article” provides for the “recombination” of existing data, nor do I think that it meets the general aim of ESSD to help readers assess the usefulness of data for their purposes. It is up to the editors whether they wish to establish a new manuscript type or extend the scope of review articles to include the description of data inventories. In the absence of such a decision, I do not suggest accepting this manuscript for ESSD, useful as it may be in another context.
Specific comment #1:
While the manuscript may be the best an inventory can do to describe the maze of individual datasets and their listings, ordered by parameter observed and cruise where such observation was performed, and of pertinent references to ancillary information (such as cruise reports, best practices), it makes things truly hard for someone interested in, say, ocean currents, 25 years from now:
- Section 4.1.2 goes so far as to name the instruments used, but why does it not tell us that ADCP-specific datasets at PANGAEA actually do carry a “+/-“-column in their data table (it does for cruises AT08-04, https://doi.org/10.1594/PANGAEA.811565, and M80/1, https://doi.org/10.1594/PANGAEA.811718, – but also for all others?).
(The ADCP example demonstrates how important this information is, as in the above cases the median of the relative error is 60 and 100 %, respectively, and just 20 and 10 % of all values, respectively, claim less than 30 % relative error. A “naïve” future user - e.g., from another discipline, or from a time when there might be a much more accurate instrument - would not suspect as much.)
- Instead, section 4.1.2 sends us on a journey through references:
- First, at line 157, to Krahmann and Mertens 2021b, which is a PANGAEA reference, https://doi.org/10.1594/PANGAEA.926065, listing, however, all datasets from all “CTD data and additional sensors used on the CTDO system“. Actually, clicking on list item 13 (on the Atalante cruise), one is directed to a dataset which does not include any LADCP data!
- Second, also line 157, it directs us to table 2 (of this manuscript), which indeed has a row 4.1.2 with a PANGAEA reference to all SFB LADCP data. Why don’t authors provide this reference here, directly?
- Fun fact: PANGAEA names “[+/-]“ as the unit of the absolute error, which of course should be [cm/s].
Specific comment #2:
Immediately on going into the data references, I encountered another irritant: while the main chapter 4 directs us, in its first paragraph (line 122), to “cruise reports where additional information about the data collected and methods used can be found”, there appears to be an inconsistency about error margins between the first of these reports (Atalante cruise AT08-04) and the associated first CTDO dataset: at the end of p. 10, the cruise report, http://dx.doi.org/10.3289/ifm-geomar_rep_19_2008, appears to claim an error (“rms difference”) of 0.052 ml/l in the oxygen data, while at PANGAEA, the CTDO data for this cruise, https://doi.org/10.5194/bg-10-5079-2013, appear to claim a “nc_uncertainty_o = 1.000000” in the metadata “comment” item, which one might, tentatively, amend with the unit for oxygen from the “parameter(s)” table, namely µmol/kg.
As these claims appear to be inconsistent (https://ocean.ices.dk/tools/unitconversion.aspx), one is forced to ask: Should I take the PANGAEA uncertainty value as the most authoritative, since the somewhat substantial description of data processing and calibration in this article seems to refer to it, and disregard the (preliminary?) estimates from the cruise report? Should I do this in general, that is, for all cruises and parameters? Will I then find nc_uncertainty_xy values or error columns in all datasets?
Citation: https://doi.org/10.5194/essd-2020-308-RC4 -
-
EC1: 'Comment on essd-2020-308', David Carlson, 04 May 2021
Authors as well as followers of this discussion will know that we received careful thoughtful comments from four very-experienced reviewers. In my charge to those reviewers, I asked each reviewer to assess specific questions: 1) Have authors organized their list in a useful manner?; 2) Does this represent a good summary of their SFB activities?; 3) Will the catalog prove useful to oceanographers looking at marine oxygen?; and 4) Does the product belong in ESSD?
Reviewers have responded with an overwhelmingly negative assessment. These reviewers either insist on major revisions (two of four) or call for outright rejection (remaining two). In no case does any of us diminish the long decade of important accomplishments under SFB funding to GEOMAR. We do however find that a post-hoc catalog does not add value or certify quality as ESSD and its readers expect.
I recommend that lead author Gerd Krahmann consult with co-authors. They should not attempt to prepare rebuttals to all evaluations nor should they make any further effort on substantial revision.
Citation: https://doi.org/10.5194/essd-2020-308-EC1
Data sets
SFB754 data collections Krahmann, G., Arévalo-Martínez, D. L., Dale, A. W., Dengler, M., Engel, A., Glock, N., Grasse, P., Hahn, J., Hauss, H., Hopwood, M., Kiko, R., Loginova, A., Löscher, C. R., Maßmig, M., Roy, A.-S., Salvatteci, R., Sommer, S., Tanhua, T., and Mehrtens, H. https://www.pangaea.de/?q=@sfb754