Best Practices for Data Management in Marine Science: Lessons from the Nansen Legacy Project

Marsden, Luke Harry; Godøy, Øystein; Gabrielsen, Tove Margrethe; Ellingsen, Pål Gunnar; Reigstad, Marit; Marquardt, Miriam; Morvik, Arnfinn; Sagen, Helge; Tronstad, Stein; Ferrighi, Lara

doi:https://doi.org/10.5194/essd-2025-56

Preprints

10 Mar 2025

Status: this preprint is currently under review for the journal ESSD.

Abstract. Large, multidisciplinary projects that collect vast amounts of data are becoming increasingly common in academia. Efficiently managing data across and beyond such projects necessitates a shift from fragmented efforts to coordinated, collaborative approaches. This article presents the data management strategies employed in the Nansen Legacy project (Wassmann, 2022), a multidisciplinary Norwegian research initiative involving over 300 researchers and 20 expeditions into and around the northern Barents Sea. To enhance consistency in data collection, sampling protocols were developed and implemented across different teams and expeditions. A searchable metadata catalogue was established, providing an overview of all collected data within weeks of each expedition. The project also mandated a policy for immediate data sharing among members and publishing of data in accordance with the FAIR guiding principles where feasible. We detail how these strategies were implemented and discuss the successes and challenges, offering insights and lessons learned to guide future projects in similar endeavours.

Received: 29 Jan 2025 – Discussion started: 10 Mar 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1: 'Comment on essd-2025-56', Anonymous Referee #1, 05 May 2025

This reviewer thoroughly enjoyed reading this manuscript and found the presented data management approach and strategies to be commendable and highly valuable. The manuscript outlines the technical and cultural aspects considered for the creation, management and publication of FAIRer data resulting from large, coordinated, oceanographic research efforts. Telling this story, which includes successes and lessons learned, can serve as a model for future efforts, further facilitating culture change toward a more FAIR and open oceanographic research community.
The manuscript is also well organized and easy to read. This reviewer has general and minor specific comments that may help improve the readability of the text as it relates to the underlying FAIR/Best Practices theme being conveyed. These are outlined below:

General:
The manuscript frames the strategies and implementations in the context of addressing the FAIR principles. In section 5.1 (lines 214-216), the authors mention how FAIR is often referenced in abstract or incomplete ways; however, the text itself does not relate each strategy or implementation to the specific principles in a holistic or comprehensive manner.

While this reviewer agrees with all statements that predicate the principles on machine readability or improving reuse as an ultimate goal, only superficial references were made as to how the project’s implementations satisfied or related to each principle or sub-principle. To justify or substantiate these statements, it would be helpful for the authors either to organize the text along the principles or to link the statements, as they occur in the text, to the explicit principles or sub-principles they address, explaining how. This is just a suggestion, as it is realized that this could require substantial revision.
This reviewer cautions against the use of ‘fair-compliance’. The FAIR principles are just that, principles. As such, there are a variety of strategies to employ that improve the alignment or achievement of data toward their end goal. But implying data can meet some metric to be FAIR or unFAIR is incorrect. Rather, data can be _more_ FAIR along a continuum, or relative to other data. The term ‘fair-compliance’ implies a pass/fail measurement against some metric.

This relates especially to section 5.3.1, where lines 325-331 imply that some data centers do not comply with the mentioned metadata standards, such as ISO 19115 or GCMD DIF, thus reducing their catalog’s findability. While it is agreed that the selection process the Nansen Legacy project employed was robust and laudable, the standards referenced are not the only ones available, and a center that employs, for example, schema.org or EML may still achieve findability. The challenge with FAIR is that multiple implementations can be community- or center-specific; thus, commonly used formats may still need crosswalks between them to improve ‘center interoperability’.
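The crosswalk idea mentioned above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the flat field names and the two example records are hypothetical simplifications, not the actual (nested and much richer) schema.org or EML structures, and `CROSSWALK` and `to_common` are names invented for this sketch.

```python
# Simplified, illustrative field names only; real schema.org and EML records
# are nested and far richer than these flat dictionaries (assumption).
CROSSWALK = {
    "schema.org": {"name": "title", "description": "abstract", "keywords": "keywords"},
    "eml": {"title": "title", "abstract": "abstract", "keywordSet": "keywords"},
}


def to_common(record: dict, source: str) -> dict:
    """Map a flat record from `source` onto a shared set of field names."""
    mapping = CROSSWALK[source]
    # Fields without a known mapping are dropped rather than guessed at.
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Hypothetical records describing the same dataset in two conventions.
schema_org_record = {
    "name": "CTD profiles",
    "description": "Example abstract",
    "license": "CC-BY-4.0",  # no mapping defined above, so it is dropped
}
eml_record = {"title": "CTD profiles", "abstract": "Example abstract"}

common_a = to_common(schema_org_record, "schema.org")
common_b = to_common(eml_record, "eml")
```

Once both records are expressed with the same field names, a catalogue can index them together, which is the sense in which a crosswalk may improve interoperability between centers that use different standards.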

In reference to the conclusion drawn by the Nansen Legacy project that a two-pronged blend of technology and a supportive cultural environment is needed for success, in combination with the description of a committed leadership team and a dedicated data management committee, it is curious that no findings or recommendations on the cultural side of changing a research group into a more data-management-savvy one were provided. Likewise, it would be helpful to understand (with even a brief description) how the governance of the data management effort was developed. It is truly impressive, and its early evolution is worth understanding so that it can serve as an example for future projects and programs.

Specific:
Section 4.3 beginning on line 179 describes lessons learned from the project’s data storage and sharing strategies, citing the challenges of governing data sharing across a large collaborative multi-institutional project, but offers no recommendations for future improvements. The international GEOTRACES program also has a data management committee that coordinates data sharing and the branding of GEOTRACES data is based upon project requirements being met. Researchers who were invested in their work being labeled as GEOTRACES strived to meet the requirements, which include the sharing of datasets for inclusion into the branded GEOTRACES Intermediate Data Product (IDP). Perhaps the visibility and gravitas of one’s research output holding the project brand could assist in the data sharing culture shift. I wonder if the Nansen Legacy project considered such branding strategies.

Section 5.2, line 221: in the statement “Nansen Legacy data should be published in FAIR-compliant data formats such as NetCDF files compliant with the Climate and Forecast conventions (Eaton et al., 2024) or Darwin Core Archives (Darwin Core Community, 2010) whenever possible (The Nansen Legacy, 2024).” it should be noted that the NetCDF format is not necessarily FAIR in and of itself, but can be when semantically enabled terms such as CF are used in combination. This sentence is a bit leading, and this reviewer would suggest removing the second instance of ‘compliant’ after NetCDF files: “... such as NetCDF files when accompanied by the Climate and…”

Line 224 “... which aims to make all data relevant to Svalbard available…” should be edited to “which aims to make all data relevant to Svalbard discoverable…” since the sentence is referring to data ‘findability’, unless the authors are also claiming accessibility via SIOS as well, in which case it can be edited to do so.

In section 5.3.3 Granularity (lines 391-392), the statement regarding avoidance of mixing feature types and vertical dimensions could use some clarity. For example, can surface observations at depth=0 not be combined with vertical profile observations? This seems perfectly acceptable. Likewise, there exist many time series datasets that consist of vertical profiles aggregated over time that are still easily usable in tabular or other formats. Although the thought is agreed with, the examples do not seem to convey or substantiate the thought.
Citation: https://doi.org/10.5194/essd-2025-56-RC1

RC2: 'Comment on essd-2025-56', Anonymous Referee #2, 30 Jun 2025

This manuscript is well written and provides a clear and generally well described overview of the methodologies and lessons learned from the data acquisition and management processes implemented as part of the Nansen Legacy Project. It also includes some well-considered and important recommendations for future research programmes.

The information provided is potentially very useful to those seeking to initiate a similar activity, which is demonstrated by the fact that the data management framework adopted by the Nansen Legacy Project has already been shown to be transferable to other similar activities within other organisations.

The approach to data management adopted by the Nansen Legacy project highlights the potential impact of fostering and implementing good data stewardship practices, which not only encourages cultural change among researchers but also ensures that data is open and accessible for wider reuse.

The manuscript could benefit from being made more concise in some places. There is also some minor duplication of information between sections that might be removed to improve the readability of this paper.

Specific comments and suggested edits for improving the manuscript are given below:

Suggested language (phrasing / grammar) edits:

Line 7: Suggested edit for clarity: “ The project also implemented a policy that mandates immediate data sharing….”

Line 18: Suggested edit to improve phrasing / readability “………..to be compared, synthesised and reused…..”

Line 44: Is there a reason that the first letter of “Publishing” is capitalized? This is not consistent with the form used elsewhere in the manuscript.

Line 62: The sentence “Detailed protocols were developed in collaboration and published, coordinated by a senior engineer who worked closely with each researcher and group” is not well constructed and therefore difficult to read. Consider rephrasing to improve the readability and clarity.

Line 119: Construction of the sentence is rather awkward. Suggested edit for clarity: “Scientists included contact details for the principal investigator, estimated timeline for publication and details of any relevant embargo period for each dataset …….”

Line 121: Suggested edit to correct grammar: “These tables were included in the project’s data management plan……”

Line 167: Suggested edit to improve sentence construction and clarity: “However, permission from the relevant principal investigator was required before any use of the data.”

Line 173: Suggested edit to improve sentence construction: “Scientists were also encouraged to share their own Nansen Legacy data via the project area.”

Line 180: Suggested edit to correct grammar: “….data were instead often shared between project members via other methods.”

Line 183: Suggested edit to correct grammar: “…..unfamiliarity with using the NIRD platform for many project members……”

Line 186: Suggested edit to improve sentence construction: “However, governing this sensitive topic at scale within the project was deemed to be challenging and impractical and was instead managed on a case-by-case basis when data access was requested by a project member.”

Line 325: Suggested edit to improve phrasing / readability: “It is not practical for each data access portal to provide custom workflows to harvest metadata from each individual data centre.”

Line 351: It is not clear what is meant by the word “opening” in this statement. Do you mean supporting open access and visualisation?

Technical remarks:

Line 99: Although the Darwin Core and CF Standard Names are implemented “where possible” it is unclear if a standard vocabulary has been implemented alongside these standardised terms to ensure consistency of terminology across all metadata records. It would also be useful if the metadata standard that has been implemented is clearly identified.

Lines 204-219: The purpose of including these selected project references is unclear. It does not enhance the paper and is superfluous in this context.

A number of references are made to FAIR compliance. It should be noted that the FAIR Principles are intended to be a set of guidelines that provide the framework for making data findable, accessible, interoperable and reusable. Line 239 states that data should “meet the FAIR Principles”, suggesting that data is either compliant or it is not, which is not the case: it is not a binary assessment. Data that is FAIR is aligned with these guiding principles to some degree, and this alignment can be both assessed and improved. There are several FAIR assessment tools available that allow the “FAIRness” of a dataset to be determined and also indicate how this might be improved.
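The point that FAIRness is graded rather than pass/fail can be illustrated with a small sketch. Everything here is an assumption made for illustration: the four checks, the metadata keys, the example DOI and the function name `fairness_report` are hypothetical and do not reproduce the indicators of any actual FAIR assessment tool.

```python
# Hypothetical checks for illustration only; real FAIR assessment tools
# define their own, far more detailed indicators (assumption).
CHECKS = {
    "has_persistent_identifier": lambda m: bool(m.get("doi")),
    "has_licence": lambda m: bool(m.get("licence")),
    "has_keywords": lambda m: bool(m.get("keywords")),
    "has_standard_format": lambda m: m.get("format") in {"NetCDF-CF", "Darwin Core Archive"},
}


def fairness_report(metadata: dict) -> tuple[float, list[str]]:
    """Return the fraction of checks passed and the names of the failed checks."""
    failed = [name for name, check in CHECKS.items() if not check(metadata)]
    score = 1 - len(failed) / len(CHECKS)
    return score, failed


# A made-up metadata record with an identifier and a format, but no licence
# or keywords; the DOI is a placeholder, not a real dataset.
score, to_improve = fairness_report({"doi": "10.1234/example", "format": "NetCDF-CF"})
```

The result is a score between 0 and 1 together with a list of failed checks, so the same report both places a dataset on a continuum and indicates what could be improved, rather than returning a compliant/non-compliant verdict.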

Line 234 suggests that there is a lack of tools to support delivery of FAIR data. However, there are several initiatives currently providing tools that support the adoption of the FAIR principles and delivery of data that is findable, accessible, interoperable and reusable. For example, the GO FAIR initiative has developed the FAIR Implementation Profile (FIP), a methodology that allows research communities to express their specific choices and practices for making data and metadata FAIR, which might be useful in this context.

Citation: https://doi.org/10.5194/essd-2025-56-RC2

Viewed

Total article views: 489 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
404	62	23	489	37	57

Views and downloads (calculated since 10 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	111	13	5	129
Apr 2025	55	5	3	63
May 2025	67	13	5	85
Jun 2025	60	8	1	69
Jul 2025	64	13	7	84
Aug 2025	47	10	2	59

Viewed (geographical distribution)

Total article views: 467 (including HTML, PDF, and XML) Thereof 467 with geography defined and 0 with unknown origin.

Latest update: 24 Aug 2025

Short summary

This article presents the data management strategies of the Nansen Legacy project, developed to handle data from 300+ researchers and 20 expeditions in the northern Barents Sea. Data collection protocols were documented and followed for consistency, and a searchable data overview was available soon after each cruise. The project required early data sharing and publishing in line with FAIR principles where possible. This article details these strategies to guide future projects.

