https://doi.org/10.5194/essd-2026-215
02 Apr 2026
Status: this preprint is currently under review for the journal ESSD.

PlanktoShare: A large (50k+) and FAIR learning set for the Plankton Imager (Pi-10) for the Greater North Sea and NE Atlantic, based on a new flexible classification protocol

Lodewijk van Walraven, James Scott, Sophie Pitois, Joseph Ribeiro, Hayden Close, James Pettigrew, Cecilia M. Liszka, Elaine Fileman, Jeroen Hoekendijk, Pieter Hovenkamp, Robbert Jak, Joost van Dalen, and Dick van Oevelen

Abstract. The use of imaging techniques for the study of particles and plankton is a rapidly advancing field in marine sciences. The data these tools produce require automated classification solutions that are trained on learning sets of manually labelled images. In this study we present PlanktoShare, a comprehensive database of more than 50,000 manually labelled images captured by the vessel-mounted Plankton Imager (Pi-10) in the 200–2,000 μm size range, including phytoplankton, holoplankton, meroplankton and various gelatinous taxa. The Pi-10 images particles continuously in a flow-through mode and can operate alongside research operations and during transits, making it a popular choice for plankton monitoring. PlanktoShare provides a robust, open resource for training classifiers. A key challenge in developing such classifiers commonly arises when merging learning sets from different sources, because images are often organized in a folder-like structure with incompatible or inconsistent nomenclature. To address this, we propose a database approach which separates the taxonomic information from descriptive attributes. Each image is assigned to one of the classes ‘Organism’ (whole organism), ‘Taxo_particle’ (particle with taxonomic information, such as exuvia) and ‘Non_taxo_particle’ (particle without taxonomic information, such as marine snow aggregates). Taxonomic information is standardised using the AphiaID system from the World Register of Marine Species (WoRMS), while additional descriptive information (e.g. Life_stage) is stored as attributes. This database approach ensures full interoperability across learning sets from different research groups, allowing rapid expansion of geographical coverage and improved classification performance. Finally, we provide open-source code that lets users apply pre-trained classifiers, and we outline future directions for collaborative plankton imaging.
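
The separation of taxonomic information from descriptive attributes described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' schema or code: the names `LabelRecord`, `merge_learning_sets` and `count_by_taxon`, the `image_id` field, the `Particle_type` attribute and the AphiaID value 900001 are hypothetical placeholders chosen for illustration. The rule that a ‘Non_taxo_particle’ carries no AphiaID, while the other two classes require one, is likewise an assumption inferred from the class descriptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# The three top-level classes named in the abstract.
ALLOWED_CLASSES = ("Organism", "Taxo_particle", "Non_taxo_particle")


@dataclass
class LabelRecord:
    """One labelled image (hypothetical layout, not the published schema)."""
    image_id: str                      # hypothetical unique image identifier
    image_class: str                   # one of ALLOWED_CLASSES
    aphia_id: Optional[int] = None     # WoRMS AphiaID, kept apart from attributes
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. Life_stage

    def __post_init__(self) -> None:
        if self.image_class not in ALLOWED_CLASSES:
            raise ValueError(f"unknown class: {self.image_class!r}")
        has_taxon = self.aphia_id is not None
        # Assumption: only Non_taxo_particle lacks taxonomic information.
        if self.image_class == "Non_taxo_particle" and has_taxon:
            raise ValueError("Non_taxo_particle must not carry an AphiaID")
        if self.image_class != "Non_taxo_particle" and not has_taxon:
            raise ValueError(f"{self.image_class} requires an AphiaID")


def merge_learning_sets(*sets: Iterable[LabelRecord]) -> List[LabelRecord]:
    """Concatenate learning sets, rejecting duplicate image identifiers."""
    merged: Dict[str, LabelRecord] = {}
    for records in sets:
        for rec in records:
            if rec.image_id in merged:
                raise ValueError(f"duplicate image_id: {rec.image_id}")
            merged[rec.image_id] = rec
    return list(merged.values())


def count_by_taxon(
    records: Iterable[LabelRecord],
) -> Dict[Tuple[str, Optional[int]], int]:
    """Count images per (class, AphiaID) pair, ignoring descriptive attributes."""
    counts: Dict[Tuple[str, Optional[int]], int] = defaultdict(int)
    for rec in records:
        counts[(rec.image_class, rec.aphia_id)] += 1
    return dict(counts)


# Two toy learning sets from different groups; 900001 is a placeholder AphiaID.
set_a = [
    LabelRecord("a_001", "Organism", 900001, {"Life_stage": "nauplius"}),
    LabelRecord("a_002", "Non_taxo_particle"),
]
set_b = [
    LabelRecord("b_001", "Organism", 900001, {"Life_stage": "adult"}),
    LabelRecord("b_002", "Taxo_particle", 900001, {"Particle_type": "exuvia"}),
]

merged = merge_learning_sets(set_a, set_b)
counts = count_by_taxon(merged)
```

Because both groups key their labels on the same identifier rather than on folder names, the two ‘Organism’ records fall into one group even though their life stages differ; the life stage remains available as an attribute for anyone who wants a finer split.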

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: open (until 09 May 2026)

Data sets

PlanktoShare: A large (50k+) and FAIR learning set for the Plankton Imager (Pi-10) for the Greater North Sea and NE Atlantic, based on a new flexible classification protocol. Lodewijk van Walraven et al. https://doi.org/10.5281/zenodo.19119233

Model code and software

Plankton Imager Classifier. Joost van Dalen. https://github.com/geoJoost/planktoshare

Latest update: 02 Apr 2026
Short summary
PlanktoShare offers a large collection of carefully labelled plankton images collected using the Pi-10 Plankton Imager in the North-East Atlantic. It was created to overcome inconsistent naming across datasets and to support reliable classification. By standardizing taxonomic details and storing extra traits separately, the database enables sharing and combining learning sets. This helps expand global monitoring efforts and strengthens future plankton imaging research.