This work is distributed under the Creative Commons Attribution 4.0 License.
Best Practices for Data Management in Marine Science: Lessons from the Nansen Legacy Project
Abstract. Large, multidisciplinary projects that collect vast amounts of data are becoming increasingly common in academia. Efficiently managing data across and beyond such projects necessitates a shift from fragmented efforts to coordinated, collaborative approaches. This article presents the data management strategies employed in the Nansen Legacy project (Wassmann, 2022), a multidisciplinary Norwegian research initiative involving over 300 researchers and 20 expeditions into and around the northern Barents Sea. To enhance consistency in data collection, sampling protocols were developed and implemented across different teams and expeditions. A searchable metadata catalogue was established, providing an overview of all collected data within weeks of each expedition. The project also mandated a policy for immediate data sharing among members and publishing of data in accordance with the FAIR guiding principles where feasible. We detail how these strategies were implemented and discuss the successes and challenges, offering insights and lessons learned to guide future projects in similar endeavours.
Status: open (until 17 May 2025)
RC1: 'Comment on essd-2025-56', Anonymous Referee #1, 05 May 2025
This reviewer thoroughly enjoyed reading this manuscript and found the presented data management approach and strategies to be commendable and significantly valuable, outlining the technical and cultural aspects considered for the creation, management, and publication of FAIRer data resulting from large, coordinated oceanographic research efforts. Telling this story, which includes successes and lessons learned, can serve as a model for future efforts, further facilitating a culture change toward a more FAIR and open oceanographic research community.
The manuscript is also well organized and easy to read. This reviewer has general and minor specific comments that may help improve the readability of the text as it relates to the underlying FAIR/Best Practices theme being conveyed. These are outlined below:
General:
- The manuscript frames the strategies and implementations in the context of addressing the FAIR principles. The authors mention how FAIR is often referenced in abstract or incomplete ways in section 5.1 (lines 214-216); however, the text itself does not relate each strategy or implementation to the specific principles in a holistic or comprehensive manner.
While this reviewer agrees with all statements that predicate the principles on machine readability or on improving reuse as the ultimate goal, only superficial references were made as to how the project’s implementations satisfied or related to each principle or sub-principle. To justify or substantiate these statements, it would be helpful for the authors either to organize the text along the principles, or to link the statements as they occur in the text to the explicit principle or sub-principles they address and explain how. This is just a suggestion, as it is realized this could require substantial revision.
- This reviewer cautions against the use of ‘FAIR-compliance’. The FAIR principles are just that, principles. As such, there is a variety of strategies one can employ to improve the alignment of data with that end goal. But implying that data can meet some metric to be FAIR or unFAIR is incorrect. Rather, data can be _more_ FAIR along a continuum, or relative to other data. The term ‘FAIR-compliance’ implies a pass/fail measurement against some metric.
This relates especially to section 5.3.1, where lines 325-331 imply that some data centers do not comply with the mentioned metadata standards, such as ISO 19115 or GCMD DIF, thus reducing their catalogs’ findability. While it is agreed that the selection process the Nansen Legacy project employed was robust and laudable, the standards referenced are not the only ones available, and if a center employs, for example, schema.org or EML, it may still achieve findability. The challenge with FAIR is that implementations can be community- or center-specific; thus, commonly used formats may still need crosswalks between them to improve ‘center interoperability’.
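To illustrate the point that non-ISO catalogues can still support discovery, the sketch below shows a minimal schema.org Dataset record in JSON-LD. The record is entirely hypothetical (title, description, and bounding box are invented for illustration); it only indicates how the core discovery metadata that aggregators and search engines harvest can be expressed outside ISO 19115 or GCMD DIF:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example CTD profiles, northern Barents Sea (hypothetical)",
  "description": "Illustrative record only; not a real Nansen Legacy dataset.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "spatialCoverage": {
    "@type": "Place",
    "geo": {
      "@type": "GeoShape",
      "box": "76.0 20.0 82.0 35.0"
    }
  }
}
```

A crosswalk between such a record and an ISO 19115 document is largely a field-by-field mapping (name/title, description/abstract, spatialCoverage/extent), which is why center interoperability is achievable even across different metadata standards.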
- In reference to the conclusion drawn by the Nansen Legacy project that a two-pronged blend of technology and a supportive cultural environment is needed for success, combined with the description of a committed leadership team and a dedicated data management committee, it is curious that no findings or recommendations were provided on the cultural side of turning a research group into a more data-management-savvy one. Likewise, it would be helpful to understand (even via a brief description) the governance development process for the data management effort. It is truly impressive, and its early evolution is worth understanding as an example for future projects and programs.
Specific:
- Section 4.3 beginning on line 179 describes lessons learned from the project’s data storage and sharing strategies, citing the challenges of governing data sharing across a large collaborative multi-institutional project, but offers no recommendations for future improvements. The international GEOTRACES program also has a data management committee that coordinates data sharing and the branding of GEOTRACES data is based upon project requirements being met. Researchers who were invested in their work being labeled as GEOTRACES strived to meet the requirements, which include the sharing of datasets for inclusion into the branded GEOTRACES Intermediate Data Product (IDP). Perhaps the visibility and gravitas of one’s research output holding the project brand could assist in the data sharing culture shift. I wonder if the Nansen Legacy project considered such branding strategies.
- Section 5.2, line 221, in the statement “Nansen Legacy data should be published in FAIR-compliant data formats such as NetCDF files compliant with the Climate and Forecast conventions (Eaton et al., 2024) or Darwin Core Archives (Darwin Core Community, 2010) whenever possible (The Nansen Legacy, 2024).” it should be noted that the NetCDF format is not necessarily FAIR in and of itself, but can be when combined with semantically enabled terms such as those of CF. This sentence is a bit leading; this reviewer suggests removing the second instance of ‘compliant’ after ‘NetCDF files’: “... such as NetCDF files when accompanied by the Climate and…”
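The distinction can be made concrete with a minimal CDL sketch (the text notation produced by `ncdump`). The NetCDF container itself carries no semantics; it is the CF attributes, particularly `standard_name`, `units`, and the global `Conventions` attribute, that make the variable machine-interpretable. The variable names and values below are illustrative, not taken from the project’s data:

```
netcdf example {
dimensions:
    depth = 10 ;
variables:
    float depth(depth) ;
        depth:standard_name = "depth" ;
        depth:units = "m" ;
        depth:positive = "down" ;
    float temperature(depth) ;
        temperature:standard_name = "sea_water_temperature" ;
        temperature:units = "degree_C" ;

// global attributes:
    :Conventions = "CF-1.8" ;
}
```

Without the CF attributes, a reader knows only that the file contains a float array named `temperature`; with them, software can resolve the variable against the CF standard name table and interpret it without human intervention.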
- Line 224 “... which aims to make all data relevant to Svalbard available…” should be edited to “which aims to make all data relevant to Svalbard discoverable…” since the sentence is referring to data ‘findability’, unless the authors are also claiming accessibility via SIOS as well, in which case it can be edited to do so.
- In section 5.3.3 Granularity (lines 391-392), the statement regarding avoidance of mixing feature types and vertical dimensions could use some clarity. For example, can surface observations at depth=0 not be combined with vertical profile observations? This seems perfectly acceptable. Likewise, there exist many time series datasets that consist of vertical profiles aggregated over time that are still easily usable in tabular or other formats. Although the thought is agreed with, the examples do not seem to convey or substantiate the thought.
Citation: https://doi.org/10.5194/essd-2025-56-RC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
180 | 22 | 9 | 211 | 11 | 9