Articles | Volume 18, issue 2
https://doi.org/10.5194/essd-18-945-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Benchmark of plankton images classification: emphasizing features extraction over classifier complexity
Download
- Final revised paper (published on 05 Feb 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 06 Jun 2025)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on essd-2025-309', Kaisa Kraft, 03 Sep 2025
- RC2: 'Comment on essd-2025-309', Jeffrey Ellen, 08 Sep 2025
- AC1: 'Comment on essd-2025-309', Thelma Panaïotis, 07 Nov 2025
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Thelma Panaïotis on behalf of the Authors (07 Nov 2025)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (13 Nov 2025) by Sebastiaan van de Velde
RR by Kaisa Kraft (27 Nov 2025)
ED: Publish subject to minor revisions (review by editor) (30 Nov 2025) by Sebastiaan van de Velde
AR by Thelma Panaïotis on behalf of the Authors (15 Dec 2025)
Author's response
Author's tracked changes
Manuscript
ED: Publish subject to technical corrections (12 Jan 2026) by Sebastiaan van de Velde
AR by Thelma Panaïotis on behalf of the Authors (14 Jan 2026)
Manuscript
There is a huge number of studies on the topic of plankton classification. The authors of the manuscript aim to provide a baseline dataset against which different methods can be compared, enabling better comparison of results across the numerous studies. The aims of the study are well grounded and give clear justification for the work, and the authors present an interesting case in a well-designed manuscript. There are two minor points for improvement. First, the authors emphasize and acknowledge only large datasets; in my opinion, smaller efforts to provide public datasets should also be given some acknowledgement in the introduction. Second, the presentation of alternative methods is incomplete: there is, for example, no mention of open-set classification methods or of autoencoders, which have been demonstrated as rather promising for various tasks.
Specific comments with numbers referring to lines:
53-55: Some references could be added to demonstrate the imbalance.
57-58: True, but also: if a class has a very distinguishable morphology, the number of training images required for that class to perform well is much smaller. See e.g. Kraft et al. 2022, https://doi.org/10.3389/fmars.2022.867695
83-84: Plankton traits are mentioned here, but the topic of traits has not been touched on earlier in the introduction and would require a description there first.
120-122: Two recent review articles are covered, but a third, even more recent review of plankton classification methods is missing: Eerola et al. 2024, https://doi.org/10.1007/s10462-024-10745-y, in particular Fig. 3.
135-136: This is especially true for studies that concentrate on machine learning methods; studies that focus more on the ecological, taxonomical, or operational implications of the results publish class-specific metrics more often. How did you compile the list in Table S1? The 10 most cited entries are based on Irisson et al. 2022, right? Is the rest the total of citations in your manuscript (sorry, I was too lazy to count them)? If so, are you sure they are representative of the entire literature on the topic? You are not covering all recent publications on plankton recognition with CNNs, and a number of studies have been published since Irisson et al. 2022, so I would not draw this type of conclusion without trying to ensure that all recent publications are covered.
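As an aside, class-specific metrics are cheap to produce with standard tooling, so reporting them should not be a burden. A minimal sketch, assuming scikit-learn and placeholder label vectors rather than the authors' actual pipeline:

```python
# Minimal sketch: per-class precision, recall, and F1 with scikit-learn.
# The label vectors are placeholders, not the authors' data.
from sklearn.metrics import classification_report

y_true = ["copepod", "copepod", "detritus", "diatom", "detritus", "diatom"]
y_pred = ["copepod", "detritus", "detritus", "diatom", "detritus", "copepod"]

# Prints per-class precision/recall/F1 plus macro and weighted averages.
print(classification_report(y_true, y_pred, zero_division=0))
```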
297-299: Why is this? It is counterintuitive, so in addition to references, could you also explain the reason?
335: Change the term 'pure' into something else, e.g., how free of false positives the predicted plankton classes are.
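In other words, the property being described is the per-class precision:

```latex
% Precision for class k: the fraction of predictions of class k that are
% correct, i.e. how free of false positives that predicted class is.
P_k = \frac{TP_k}{TP_k + FP_k}
```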
Table 4: It would be good to add the class-specific n for each class, as it will often (though not exclusively) affect or explain the class-specific performance; the same applies to the tables for the other datasets. What does 'Plankton' mean here? Do the colors mean something? Please add more information to the table caption. (Yes, after scrolling down, there is also 'Non plankton'.) Could the word 'classes' be added after those, i.e., 'Plankton classes' and 'Non plankton classes'? I did not find the corresponding tables for the other datasets; where are they?
Header 3.3: Please change this so that it does not reveal the results, e.g., to 'Model performance on small classes'.
Figure 2: Why did you choose to show accuracy rather than the F1 score in the first panel (the same comment applies to the subsequent figures)? What is the 'Random classifier'? It is mentioned in the paragraph starting at line 350, but it requires a better explanation.
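To illustrate why accuracy is uninformative here, a minimal sketch of what a prior-following random baseline looks like on imbalanced data, using scikit-learn's DummyClassifier; the 90/10 class split is illustrative, not taken from the manuscript:

```python
# Minimal sketch: on imbalanced data, a random baseline that follows the
# class prior already scores high accuracy, while macro F1 exposes it.
# The 90/10 imbalance is illustrative, not from the manuscript.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
y = rng.choice(["detritus", "copepod"], size=10_000, p=[0.9, 0.1])
X = np.zeros((y.size, 1))  # features are irrelevant to a dummy classifier

clf = DummyClassifier(strategy="stratified", random_state=0).fit(X, y)
y_pred = clf.predict(X)

print("accuracy:", accuracy_score(y, y_pred))             # ~0.82
print("macro F1:", f1_score(y, y_pred, average="macro"))  # ~0.50
```

If the manuscript's random classifier is defined differently, that definition should be spelled out in the text.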
375-385: Wouldn't it be more important to find a harmonic mean between precision and recall than to emphasize precision and the detection of rare classes over recall?
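That is, report the F1 score, which is the harmonic mean of the two and is high only when precision and recall are both high:

```latex
% F1 score: harmonic mean of precision P and recall R.
F_1 = \frac{2PR}{P + R}
```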
Header 3.4: I would rephrase this as well, e.g., to 'Model performance of a small CNN in plankton image classification'.
Header 3.5: Similarly, e.g., 'Importance of features and classifier'.
420-421: You mean recall and precision? You did not show F1 in the figures you are referring to.
421-423: This is in line with the results of Kraft et al. 2022, where there was almost no confusion between different taxonomical groups.
467-469: A lightweight CNN has also previously been shown to reach very good performance in classifying plankton, e.g., Kraft et al. 2022.
470-475: Yes, I partly agree with this; however, the MATLAB code available for IFCB data processing and classification is still easier for new groups to adopt, as it does not actually require much programming knowledge. That is why so many groups working with the IFCB still actively use the MATLAB-based RF implementation: https://github.com/hsosik/ifcb-analysis
510-511: This comment comes a bit out of the blue and without context. If you want to keep this information, I suggest rephrasing it and tying it better to the surrounding content.
545-547: Do you have statistics or a figure to support this? If it is stated like this, the supporting results should be added as supplementary material; otherwise, the phrase should be removed.
548-550: The concept of dataset shift is indeed a problem. However, the nature of plankton makes it very hard to have a representative distribution when classifying real datasets. Training data are ideally constructed from multiple occasions and seasons, covering several years. Still, the samples to be classified come from a specific time point, e.g., a spring sample, and will not follow the training distribution, as the class composition and the share of very heterogeneous images vary. So an interesting question is: how much does the class distribution actually matter in the case of plankton?
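One way to probe this quantitatively: under pure label shift (the image distribution within each class unchanged), overall accuracy is simply the prior-weighted mean of the per-class recalls, so the effect of a seasonal change in composition can be estimated directly. A minimal sketch with made-up recalls and priors:

```python
# Minimal sketch: under pure label shift (p(image | class) unchanged),
# overall accuracy = sum_k prior_k * recall_k, so a seasonal change in
# class composition translates directly into a change in accuracy.
# The recalls and priors below are made up for illustration.
import numpy as np

recall = np.array([0.95, 0.60, 0.40])        # per-class recalls on a test set
prior_train = np.array([0.70, 0.20, 0.10])   # composition of the training data
prior_spring = np.array([0.30, 0.60, 0.10])  # hypothetical spring-bloom sample

print("expected accuracy, training-like mix:", recall @ prior_train)   # 0.825
print("expected accuracy, spring-like mix:  ", recall @ prior_spring)  # 0.685
```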
555-561: Open-set classification methods are also a very promising approach for classifying plankton. Plankton data include many difficult-to-classify images that normally end up in very heterogeneous classes with few common features to describe them, and these classes often show very poor performance. Still, those classes often contain phytoplankton, at least for instruments triggered by chlorophyll a, i.e., they are of interest, but their contents are difficult or impossible to identify taxonomically (e.g., a small flagellate).
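Even the simplest open-set baseline, rejecting low-confidence predictions as 'unknown' by thresholding the maximum softmax probability, would be worth mentioning. A minimal sketch with illustrative probabilities and an arbitrary threshold:

```python
# Minimal sketch: the simplest open-set baseline, rejecting low-confidence
# predictions as "unknown" by thresholding the max softmax probability.
# The probabilities and the threshold are illustrative only.
import numpy as np

probs = np.array([
    [0.90, 0.05, 0.05],  # confident -> keep the argmax label
    [0.40, 0.35, 0.25],  # ambiguous -> reject as unknown
])
classes = np.array(["copepod", "diatom", "detritus"])

threshold = 0.6
pred = np.where(probs.max(axis=1) >= threshold,
                classes[probs.argmax(axis=1)],
                "unknown")
print(pred)  # ['copepod' 'unknown']
```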
570-572: I do not see this as a new and interesting finding, but rather as an already well-known fact. For the same reason, I did not see the point of showing accuracy in the figures: it is already known to be a poor metric.