The Plankton Lifeform Extraction Tool: a digital tool to increase the discoverability and usability of plankton time-series data

Ostle, Clare; Paxman, Kevin; Graves, Carolyn A.; Arnold, Mathew; Artigas, Luis Felipe; Atkinson, Angus; Aubert, Anaïs; Baptie, Malcolm; Bear, Beth; Bedford, Jacob; Best, Michael; Bresnan, Eileen; Brittain, Rachel; Broughton, Derek; Budria, Alexandre; Cook, Kathryn; Devlin, Michelle; Graham, George; Halliday, Nick; Hélaouët, Pierre; Johansen, Marie; Johns, David G.; Lear, Dan; Machairopoulou, Margarita; McKinney, April; Mellor, Adam; Milligan, Alex; Pitois, Sophie; Rombouts, Isabelle; Scherer, Cordula; Tett, Paul; Widdicombe, Claire; McQuatters-Gollop, Abigail

https://doi.org/10.5194/essd-13-5617-2021

Earth System Science Data, Volume 13, issue 12

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data description paper | 06 Dec 2021

Final revised paper (published on 06 Dec 2021)
Preprint (discussion started on 21 Jul 2021)

Interactive discussion

Status: closed

RC1:
'What an incredible solution! (Comment on essd-2021-171)', Todd O'Brien, 12 Aug 2021

Dear Authors,

Wow! What an excellent tool/dataset and a super clever solution to the challenge of comparing disparate time series programs and their data! By aggregating species into a cluster of "lifeform" groupings, you solve methodological intercomparison challenges while also creating a product that is more readily understandable to ecosystem- and policy-level users. You have been very careful and thorough in your design, and I especially appreciate the confidence ratings that you applied to the different lifeform categories. The PLET trait look-up table by itself is super valuable and useful, and being able to apply it directly to time series via the BASSH/PLET tool is even better. Overall, everything about your manuscript was excellent, and it was a pleasure to read. You provided a really well-written paper and methodology, and I had only a few small questions after reading through it.

First Question: You mention that each time series data set is preserved via a DOI. Is this DOI/data the "original raw unaggregated data" (e.g. species counts by month by year before being translated into lifeform categories), or is it the data after it has been aggregated into lifeform categories (with individual species data likely removed from that DOI)? I ask because the reason IGMETS (and ICES WGZE/WGPME) only worked with totals (total copepods, total diatoms) was/is that some time series holders were hesitant to share their full raw species data. If you are sharing only the aggregated data, that would soothe most contributors (by not releasing the full raw data itself) and greatly encourage more participation. That is an excellent solution to this ongoing challenge, and I think you should talk to ICES WGZE and ICES WGPME about getting more data sets into your tool.

Second Question: I really like Figures 5, 6, 7, and 8. Is it possible to get the BASSH/PLET tool to generate those automatically? (If it already does, I could not figure out how.) Or perhaps you could pre-generate them, at least for the fixed-site time series? This is very useful summary information about the time series, with or without the interactive tool component.

Third Question: The PLET Trait Look-Up Table is probably the most important part of the entire database/tool "ecosystem". How often do you hope to maintain and expand that table? On a related note, while you say you are marine-focused, adding Baltic Sea species would greatly expand your area of coverage in Europe. Surely HELCOM has most of the trait information you need to make this expansion in the look-up table?

Fourth Question: While you say in the manuscript that you can't really compare different time series, you actually did so in Figures 5, 6, 7, and 8 by using the Z-score. If you add these graphics to PLET somewhere (Question 2 above), multi-site comparisons or overviews should also be possible. Maybe not "live" (via the tool), but perhaps as pre-generated products elsewhere on the BASSH/PLET web page?
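The standardisation the reviewer refers to can be sketched as follows. This is a minimal illustration, assuming each series is a plain list of annual lifeform abundances; the function name `z_scores` is hypothetical, and the manuscript's exact preprocessing (for example any transform applied before standardising) is not reproduced here. Expressing each series relative to its own mean and standard deviation removes differences in scale between sampling methods, which is what makes the shapes of different time series comparable.

```python
from statistics import mean, pstdev

def z_scores(series):
    """Standardise a series to zero mean and unit population standard deviation.

    Assumes the series is not constant (pstdev would then be 0).
    """
    mu = mean(series)
    sd = pstdev(series)
    return [(x - mu) / sd for x in series]

# Two hypothetical annual abundance series on very different scales
site_a = [10.0, 20.0, 30.0]
site_b = [1000.0, 2000.0, 3000.0]
print([round(z, 3) for z in z_scores(site_a)])  # [-1.225, 0.0, 1.225]
print([round(z, 3) for z in z_scores(site_b)])  # [-1.225, 0.0, 1.225]
```

Both series give the same standardised values even though the raw abundances differ by a factor of 100. A constant series would raise `ZeroDivisionError`, so real data would need a guard for that case.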

I do not have anything negative to say, but I do have two suggestions:

Suggestion #1: For me, the BASSH/PLET tool usually times out on the CPR data unless I subset the geographic region and/or time period. (This is not a problem with the single-site time series, as they are much smaller.) Are you using raw, full-geographic-resolution CPR data (i.e. the original silk locations)? For performance, you may want to aggregate those into geographic averaging boxes, perhaps 0.5° × 0.5° or 1° × 1°. This would greatly reduce the number of data points the tool has to process on the fly.
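As a rough illustration of the binning suggested above, the sketch below averages sample values within fixed latitude/longitude boxes. It is not PLET's or the CPR Survey's implementation: the record layout `(lat, lon, value)` and the function name `grid_average` are hypothetical, and a plain arithmetic mean per box is only one possible aggregation.

```python
import math
from collections import defaultdict

def grid_average(samples, cell_deg=1.0):
    """Average sample values within cell_deg x cell_deg boxes.

    samples: iterable of (lat, lon, value) tuples (hypothetical layout).
    Returns a dict mapping the south-west corner (lat, lon) of each box
    to the mean value of the samples falling inside it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in samples:
        # floor (not int) keeps negative coordinates in the correct box
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[key] += value
        counts[key] += 1
    return {
        (i * cell_deg, j * cell_deg): sums[(i, j)] / counts[(i, j)]
        for (i, j) in sums
    }

samples = [
    (50.2, -4.1, 10.0),
    (50.7, -4.9, 30.0),  # same 1-degree box as the first sample
    (51.3, -4.5, 5.0),
]
print(grid_average(samples, cell_deg=1.0))
```

Here the first two samples share a 1° box and are replaced by their mean, so the output is `{(50.0, -5.0): 20.0, (51.0, -5.0): 5.0}`. Using `math.floor` rather than `int` matters for negative coordinates: a longitude of -4.1 belongs to the box starting at -5°, not -4°.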

Suggestion #2: Table A1 is super long (29 pages in the review PDF). Since I am guessing that listing will change fairly regularly, why not make it an online file and give only a one-page example of its content in the manuscript? I find the 29 pages distracting when I am trying to get to Table A2.

I am really excited to see where this will go! Please reach out to ICES WGZE/WGPME to expand the coverage of this tool!

Todd O'Brien

Citation: https://doi.org/10.5194/essd-2021-171-RC1
- AC1: 'Reply on RC1', Clare Ostle, 02 Sep 2021
  
  The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2021-171/essd-2021-171-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/essd-2021-171-AC1
RC2:
'Comment on essd-2021-171', Aleksandra Lewandowska, 20 Sep 2021
This is an important and much needed tool with a great potential for further development. The manuscript is very well written, all functions of the database are clearly described and justified. I can only hope that this tool finds its way to the stakeholders and the policy makers in Europe.

I especially admire the functional groups look-up table and the confidence rating. It is easy to expand and continuously improve as more data are added. The spatial coverage of the database can be rapidly expanded, especially if SMHI extends access to its time series from the Baltic Sea. If this happens, it would make sense to expand the lifeforms table to include filamentous cyanobacteria, to track their blooms in the Baltic Sea. Such information would be highly relevant to policy makers in the Baltic Sea region and some other coastal areas in Europe.

I also appreciate that data from different sources are not aggregated. This gives a lot of freedom for the users, who can apply and develop their own statistical techniques to make generalisations.

Figures 5-8 are wonderful examples of how to use PLET and what kind of information can be extracted. I do not expect the database developers to offer such a visualisation tool, but this content of the manuscript is a great source of inspiration for users.

Figures 2-3 on the sampling effort are extremely important from the point of view of statistical diagnostics. It would be great if such figures could be included in the metafile description on the website, so that users can easily see where the potential gaps in each dataset are and what the limitations are.

Although the manuscript and the database are impressive, below are my suggestions for some improvement.

Regarding the manuscript:

It might be a good idea to highlight the advantage of PLET over satellite-derived information in the introduction. There is a short sentence in the discussion about the limitations of bulk indices, such as total chlorophyll a concentration, but I think it would be good to have it earlier in the text.

There is no mention of current developments in plankton trait databases, such as the nutrient utilisation traits database (Edwards et al. 2015 - Ecological Archives), the Baltic Sea phytoplankton traits database (Klais et al. 2017 - Functional Ecology), and the French phytoplankton traits database (Treyture et al. 2020 - Scientific Data). It would be good to place PLET in this context. In the future, it may also be worth adding links to such trait databases where they exist for individual datasets, e.g. in the metafile description. This would be especially valued in cases where taxonomic lists are made available.

Please add some brief information on how PLET deals with synonyms and updates in plankton taxonomy. Is an automatic check applied (e.g. with WoRMS or AlgaeBase), or does it need to be done manually by data providers? In general, is a systematic data quality check performed upon submission of a time series? How often should such a quality check be performed? I wonder how consistent data quality will be ensured across the PLET database if the data quality check is the responsibility of data providers. I am sure this is not a problem at the moment, but how will it be guaranteed in the future as the tool expands?

Please add a link to the SMHI portal in chapter 3.1.8, as you did for the other time series (https://www.smhi.se/en/services/open-data/national-archive-for-oceanographic-data/download-data-1.153150).

Regarding the PLET:

The website performance needs significant optimisation. I believe the problem is not in PLET but rather in the host server, but this should be fixed before the tool expands. I tried different browsers and different computers, but the problem persists: the service website jams easily, even when I try to limit my search and download data in small pieces. If this causes problems now, it will grow in the future.

The short description of sampling methodologies (chapter 3.1) is excellent and could be added to the metafile together with the information on sampling effort (see my comment to Fig 2-3). This would make the service more user friendly.

As the tool is meant for biodiversity assessment, it might be good to add some basic information on changes in taxonomic resolution. For example, species accumulation curves for each time series could reveal significant changes in resolution, which can affect the interpretation of the outcomes. As many diversity indices, including the most popular, species richness, are sensitive to changes in sample size, this is important information on data quality. Depending on the visualisation, such curves could be annotated with information on changes in methodology or instrumentation that correspond to any observed inconsistencies.
Citation: https://doi.org/10.5194/essd-2021-171-RC2
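The species accumulation curve suggested in RC2 can be sketched simply when samples are kept in chronological order. This is a minimal illustration, not part of PLET: the function name `accumulation_curve` and the taxon lists are hypothetical. Classic accumulation curves often average over random sample orderings; keeping the time order instead is what lets a sudden step, for example after a switch to finer taxonomic resolution, stand out.

```python
from typing import Iterable, List, Set

def accumulation_curve(samples: Iterable[Set[str]]) -> List[int]:
    """Cumulative number of distinct taxa after each sample.

    `samples` must be in chronological order; each element is the set of
    taxon names recorded in one sample.
    """
    seen: Set[str] = set()
    curve: List[int] = []
    for taxa in samples:
        seen |= taxa  # add any taxa not recorded in earlier samples
        curve.append(len(seen))
    return curve

# Illustrative data only: the fourth sample records the same genera at species level
samples = [
    {"Acartia", "Calanus"},
    {"Acartia", "Oithona"},
    {"Calanus", "Oithona"},
    {"Acartia clausi", "Acartia tonsa", "Calanus finmarchicus", "Calanus helgolandicus"},
]
print(accumulation_curve(samples))  # [2, 3, 3, 7]
```

The curve levels off at three taxa and then jumps to seven, not because new organisms appeared but because identification became finer; this is the kind of step that could be annotated with a methodology change.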
- AC2: 'Reply on RC2', Clare Ostle, 06 Oct 2021
  
  The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2021-171/essd-2021-171-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/essd-2021-171-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Clare Ostle on behalf of the Authors (06 Oct 2021): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (08 Oct 2021) by David Carlson

RR by Todd O'Brien (19 Oct 2021)

ED: Publish as is (25 Oct 2021) by David Carlson

AR by Clare Ostle on behalf of the Authors (26 Oct 2021)

Short summary

Plankton form the base of the marine food web and are sensitive indicators of environmental change. The Plankton Lifeform Extraction Tool brings together disparate plankton datasets into a central database from which it extracts abundance time series of plankton functional groups, called lifeforms, according to shared biological traits. This tool has been designed to make complex plankton datasets accessible and meaningful for policy, public interest, and scientific discovery.