A long-term (1965-2015) ecological marine database from the LTER-Italy site Northern Adriatic Sea: plankton and oceanographic observations

In this paper we describe a 50-year (1965-2015) ecological database containing data collected in the Northern Adriatic Sea (NAS), one of the 25 research parent sites belonging to the Italian Long Term Ecological Research Network (LTER-Italy, http://www.lteritalia.it). LTER-Italy is a formal member of the international (https://www.ilter.network) and European (http://www.lter-europe.net/) LTER networks. The NAS is undergoing a process, led by different research institutions and projects, for the establishment of a marine ecological observatory, building on the existing facilities, infrastructures, and long-term ecological data. As part of this process, the implementation of the Open Access and Open Science principles has started, by creating an open research lifecycle that involves sharing ideas and results (scientific papers), data (raw and processed), metadata, methods, and software. The present data paper is framed within this wider context. The database is composed of observations on abiotic parameters, phyto- and zooplankton abundances, collected during 299 cruises in we


We describe in this paper a 50-year ecological database containing data on plankton communities and related

• Data treatment: some data are essentially raw, e.g., data registered by CTD are reported in the database as they are delivered by the instrument; other data need some elaboration to obtain specific parameter values (e.g., nutrients, chlorophyll-a, plankton abundance);
• Methodologies and units of measurement: e.g., changes of methodologies due to the introduction of CTD measurements; change of the unit of measure of salinity, which passed from g l⁻¹ to a dimensionless parameter.
• Data format: data collected between 1965 and 1990 were registered only in paper archives, while those from 1990 onwards were registered in spreadsheets.

In particular, methodological protocols and associated documentation changed through time. Several sensors are described and

In Figure 3a, the geographical coverage of the entire database is shown. Red dots represent the real observation points, while

In the following years, when GPS allowed better precision of the sampling position, researchers often continued referring to the nodes of the grid for the station names, and they adopted a nomenclature coherent with that of the original grid also for new sampling stations. For example, the new sampling point located eastward of the "09/2E" station is named "10/2E", since it is located at the same longitude (2E) but at a different latitude from the "09/2E" station (Figure 3b). In Figure 3c, a 3D view of the entire database is shown.

Due to transcription errors that occurred during the oldest cruises, some data were misplaced, falling on land or outside the NAS.

A Python script (available under the GNU GPL v.3 license here: https://github.com/CNR-ISMAR/econaos/tree/master) has been written to correct this kind of error. The same script also implements a routine to homogenize different names of the same sampling station (e.g., station "020D" could also appear as station "02-0D", "02/0D" or "020D_07/07/1968"). We selected the name reported on the original station network grid (Figure 2) and we created from these stations a vector layer (black crosses in Figure 3). Finally, since some stations changed their name through time, in order to maintain coherence with the same sampling point, we assigned them the most recently used name.
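The name homogenization and the position check described above can be illustrated with a minimal sketch. It is not the script published in the EcoNAOS repository: the function names, the list of grid names and the bounding box values are all hypothetical, chosen only to reproduce the examples given in the text. A simple bounding box can only catch positions that fall outside the study area; detecting points on land would additionally require a coastline polygon.

```python
import re

# Trailing "_DD/MM/YYYY" suffix, as in the label "020D_07/07/1968".
_DATE_SUFFIX = re.compile(r"_\d{2}/\d{2}/\d{4}$")


def station_key(raw_name):
    """Reduce a station label to an upper-case key without separators."""
    name = _DATE_SUFFIX.sub("", raw_name.strip())
    return re.sub(r"[-/\s]", "", name).upper()


def build_lookup(grid_names):
    """Map each separator-free key to the name used on the station grid."""
    return {station_key(name): name for name in grid_names}


def harmonize(raw_name, lookup):
    """Return the grid name for a label, or None if it is not on the grid."""
    return lookup.get(station_key(raw_name))


# Hypothetical bounding box in decimal degrees (an assumption, not a value
# taken from the paper): lon_min, lat_min, lon_max, lat_max.
NAS_BBOX = (12.0, 44.0, 14.0, 46.0)


def outside_bbox(lon, lat, bbox=NAS_BBOX):
    """Flag a position that falls outside the assumed study-area box."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max)


# Hypothetical subset of grid names, used only for illustration.
lookup = build_lookup(["020D", "09/2E", "10/2E"])
for label in ["02-0D", "02/0D", "020D_07/07/1968"]:
    print(label, "->", harmonize(label, lookup))
```

Labels that return None are not on the grid and would need manual inspection rather than automatic renaming.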

Instruments and sensors changed over the 50-year period, due to technological and scientific progress. Furthermore, instruments are also subject to degradation and need to be replaced. It is essential to preserve the information about these instrument changes and upgrades, to track the reliability of the measurements.

In order to appropriately document data and guarantee the consistency of data within the database, we collected most ancillary

Plankton data are particularly sensitive to the skill of the operators, in particular during the microscope analyses of the samples. The change of operators, which necessarily occurred over 50 years, could hamper the comparison of data across time. To deal with this issue, internal education and recurring calibration of taxonomic competence were carefully considered, with training periods and intercalibration phases.
Phytoplankton was collected and analyzed with the same method (Utermöhl, 1958) across the years. In the database we report the total phytoplankton abundances and the following main groups: diatoms, dinoflagellates (naked and armoured cells), coccolithophorids and "others", which include the sum of cells belonging to cryptophyceans, chrysophyceans, prymnesiophyceans (except coccolithophorids), prasinophyceans and chlorophyceans, whose sizes lie between 4 and 20 μm and which often remain undetermined. Mesozooplankton was always identified under a stereo-microscope and expressed as the total number of organisms per cubic meter. Compared to phytoplankton, the mesozooplankton data are much more fragmented over time: they cover a 28-year period, from 1987 to 2015, for a total of 372 observations.

Around 89% of the observations in the database refer to the years 1999-2015, and the remaining 11% cover the previous 33 years (see Figure 5a for details). This is mainly due to the adoption of CTD probes since 1999 for measuring abiotic parameters

The database presents a heterogeneous number of observations for each parameter, mainly due to: (i) parameter priority for

Following the OGC Sensor Web Enablement (SWE) web service, each instrument or procedure has to be filled out as a

Observations can be uploaded using the graphical interface or, for skilled users, by sending XML directly to the SOS (Sensor Observation Service) web service. For the upload from the interface, data have to be formatted in a table with datetime and parameter value (Figure 9). Since the speed of the process largely depends on the browser used to upload data, most of the data have been uploaded through a Python script, by formatting specific .xml files that contain information about the sensor's ID, sampling station, and date and time, following the SWE standard. In both cases, the data upload begins with the selection of the sensor from which we want to upload data and then with the selection of the sampling station from the map, if already available, or with the creation of a new one.
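As an illustration of the two-column table mentioned above (datetime and parameter value), the following minimal sketch writes a list of observations as comma-separated text. The exact layout expected by the interface (Figure 9) is not reproduced here, so the function name, the column headers, the CSV format and the ISO 8601 timestamps are all assumptions, and the sample values are invented for the example.

```python
import csv
import io
from datetime import datetime


def format_upload_table(records, parameter="temperature"):
    """Return CSV text with a datetime column and one parameter column.

    `records` is an iterable of (datetime, value) pairs. The header names
    and the ISO 8601 timestamp format are assumptions made for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["datetime", parameter])
    # Sort chronologically so the table reads as a time series.
    for when, value in sorted(records, key=lambda record: record[0]):
        writer.writerow([when.isoformat(), value])
    return buffer.getvalue()


# Invented sample values, for illustration only.
sample = [
    (datetime(1999, 7, 2, 10, 30), 24.1),
    (datetime(1999, 7, 1, 9, 0), 23.5),
]
print(format_upload_table(sample))
```

Keeping one parameter per table mirrors the sensor-by-sensor upload described in the text, where the sensor is selected first and the station afterwards.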
IDentifier) to the uploaded dataset. Thanks to an agreement between the eLTER Research Infrastructure and the EUDAT data are immediately fully available for download and reuse upon citation, without embargo rules or any further limitations.

The 50-year database of plankton and abiotic parameters in the NAS may contribute to an in-depth comprehension of plankton dynamics required not only to manage aquatic resources but also to predict and tackle future environmental changes. Long-term site-based studies on plankton may provide an invaluable opportunity to assess common or contrasting patterns of variability, to understand how those patterns change at different scales and to hypothesize about causes and consequences.

Wide availability of data on long-term variations of the planktonic system allows large-scale studies that go beyond local use, representing a source of information for cross-system analysis and allowing comparison among ecosystems as well as new approaches in data analysis and in the development of water quality indicators.

However, these potential uses appear constrained by issues that are intrinsic to long-term series and that are related to the

In EcoNAOS we involved, since its start, both LTER and data management researchers in a joint partnership. In particular, the elaboration of the 50-year datasets has been worked out by a small group of plankton ecologists and data management experts, with the aim of sharing and harmonizing the different experiences, needs and points of view. This participatory process is recognized to be crucial in helping to overcome cultural differences, barriers and fragmentation that might represent an