PlanktoShare: A large (50k+) and FAIR learning set for the Plankton Imager (Pi-10) for the Greater North Sea and NE Atlantic, based on a new flexible classification protocol
Abstract. The use of imaging techniques for the study of particles and plankton is a rapidly advancing field in marine sciences. The data these tools produce require automated classification solutions that are trained on learning sets of manually labelled images. In this study we present PlanktoShare, a comprehensive database of more than 50,000 manually labelled images captured by the vessel-mounted Plankton Imager (Pi-10) in the 200–2,000 μm size range, including phytoplankton, holoplankton, meroplankton and various gelatinous taxa. The Pi-10 images particles continuously in flow-through mode and can operate alongside research operations and during transits, making it a popular choice for plankton monitoring. PlanktoShare provides a robust, open resource for training classifiers. A key challenge in developing such classifiers arises when merging learning sets from different sources, because images are often organised in folder-like structures with incompatible or inconsistent nomenclature. To address this, we propose a database approach that separates taxonomic information from descriptive attributes. Each image is assigned to one of three classes: ‘Organism’ (whole organism), ‘Taxo_particle’ (particle with taxonomic information, such as exuvia) or ‘Non_taxo_particle’ (particle without taxonomic information, such as marine snow aggregates). Taxonomic information is standardised using the AphiaID system of the World Register of Marine Species (WoRMS), while additional descriptive information (e.g. Life_stage) is stored as attributes. This database approach ensures full interoperability across learning sets from different research groups, allowing rapid expansion of geographical coverage and improved classification performance. Finally, we provide open-source code that lets users apply pre-trained classifiers, and we outline future directions for collaborative plankton imaging.
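As an illustration of the database approach described in the abstract, the following minimal Python sketch shows how a label record might keep the class, the AphiaID and the descriptive attributes in separate fields, so that learning sets from different groups can be merged without relying on folder names. It is not the PlanktoShare schema: the names `ImageLabel`, `merge_learning_sets` and `count_by_taxon`, the `Description` attribute, the rule that ‘Non_taxo_particle’ records carry no AphiaID, and the AphiaID values (placeholders, not real WoRMS identifiers) are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# The three classes named in the abstract.
IMAGE_CLASSES = {"Organism", "Taxo_particle", "Non_taxo_particle"}


@dataclass
class ImageLabel:
    """One labelled image (hypothetical record layout)."""
    image_id: str
    image_class: str
    aphia_id: Optional[int] = None  # taxonomic information only
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. Life_stage

    def __post_init__(self) -> None:
        if self.image_class not in IMAGE_CLASSES:
            raise ValueError(f"unknown class: {self.image_class}")
        has_taxon = self.aphia_id is not None
        # Assumption: only classes with taxonomic information carry an AphiaID.
        if self.image_class == "Non_taxo_particle" and has_taxon:
            raise ValueError("Non_taxo_particle must not carry an AphiaID")
        if self.image_class != "Non_taxo_particle" and not has_taxon:
            raise ValueError(f"{self.image_class} requires an AphiaID")


def merge_learning_sets(*learning_sets: Iterable[ImageLabel]) -> List[ImageLabel]:
    """Concatenate learning sets, rejecting duplicate image identifiers."""
    merged: Dict[str, ImageLabel] = {}
    for labels in learning_sets:
        for label in labels:
            if label.image_id in merged:
                raise ValueError(f"duplicate image_id: {label.image_id}")
            merged[label.image_id] = label
    return list(merged.values())


def count_by_taxon(labels: Iterable[ImageLabel]) -> Counter:
    """Count images per AphiaID, ignoring records without taxonomy."""
    return Counter(l.aphia_id for l in labels if l.aphia_id is not None)


# Two small learning sets; 900001 and 900002 are placeholder AphiaIDs.
group_a = [
    ImageLabel("a_001", "Organism", 900001, {"Life_stage": "nauplius"}),
    ImageLabel("a_002", "Non_taxo_particle",
               attributes={"Description": "marine snow aggregate"}),
]
group_b = [
    ImageLabel("b_001", "Taxo_particle", 900001, {"Description": "exuvia"}),
    ImageLabel("b_002", "Organism", 900002, {"Life_stage": "adult"}),
]

merged = merge_learning_sets(group_a, group_b)
print(count_by_taxon(merged))
```

Because the taxon is carried by a shared identifier rather than a folder name, the two groups' records for the same taxon fall into one count after merging, whatever each group called its folders.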