the Creative Commons Attribution 4.0 License.
TraitCH: a multi-taxa functional trait dataset for Switzerland and Europe
Abstract. Functional traits of species are increasingly used in ecological research, providing key insights into organism-environment interactions, ecosystem functions, and responses to environmental changes. In recent years, substantial initiatives have generated major open-access datasets of species' functional traits. However, these resources typically concentrate on a handful of well-studied biological groups (such as plants, birds, and fishes) and less on specific biogeographic regions, limiting their applicability in regional biodiversity assessments and conservation planning. Here, we present TraitCH, a comprehensive dataset of functional traits spanning over 71,874 species (≥ 1 functional trait) across 17 major taxonomic groups: Apocrita (2,278), Arachnida (3,728), Coleoptera (8,565), Ephemeroptera/Plecoptera/Trichoptera (1,349), Lepidoptera (3,757), Odonata (234), Orthoptera (1,283), Bryobiotina (2,285), Fungi (12,469), Lichen (2,435), Mollusca (7,493), Pisces (838), Amphibia (151), Aves (1,356), Mammalia (522), Reptilia (298), and Tracheophyta (22,833). Compiled from 43 published and unpublished sources, TraitCH provides a robust representation of total species richness and composition for Switzerland and Europe. For each species, we compiled its taxonomic hierarchy, existing synonymy, geographic origin, conservation status, micro- and macro-habitat types, global range size and available ecological trait values. TraitCH consists of 17 trait tables (one per major taxonomic group), each available in two formats: (1) original and (2) completed versions with missing trait values imputed using a tree-based modelling method. TraitCH was also embedded within a comprehensive checklist of European species from the same groups (~210,000 taxa), encompassing authoritative Swiss and European checklists, with the exception of Fungi and Lichen, for which only Swiss checklists were available.
TraitCH is available on Zenodo: https://doi.org/10.5281/zenodo.15063844 (Chauvier et al., 2025).
Status: closed
- RC1: 'Comment on essd-2025-754', Stef Bokhorst, 06 Mar 2026
- RC2: 'Comment on essd-2025-754', Anonymous Referee #2, 10 Apr 2026
April 10, 2026
This manuscript describes the compilation of the TraitCH dataset which combines the traits datasets and databases for species primarily found in Switzerland and extending to other species in Europe. The manuscript documents how datasets were subset to fit the study area, how taxonomy was clarified, how trait records were curated, and how missing values were imputed. Evaluations on data coverage taxonomically, regionally, and between traits were then reported.
I wish to highlight a few points for clarification on the motivation and introduction of the work, on the analysis and presentation of the data, and on the quality of the data.
Motivation and introduction
- I suggest that the authors explicitly declare why this dataset was created. Was it created to support a regional ecology project, perhaps? Regarding the statement in line 25 that taxa-specific trait resources concentrate on biological groups and “less on specific biogeographic regions limiting their applicability in regional biodiversity assessments and conservation planning”, I think that trait databases being agnostic to biogeographic regions is actually their strength. I do not agree that the regional specificity of a trait data resource can be used as a metric for how it can be applied. I would argue that the application of traits in biodiversity assessments and conservation planning should actually motivate compilations towards global coverage, rather than region-specific coverage as the statement in line 25 seems to imply. Measuring traits and compiling trait data are expensive and time-consuming and are often driven by taxa-specific community efforts, and the focus on specific taxa is needed to ensure the quality of the trait datasets. Presenting this as a limitation, for instance when raw trait databases did not fit the needs of another independent study, is not ideal given the work it took to build the trait datasets from which data were extracted.
- I suggest that the authors use caution when narrating the comparison of the taxonomic groups and their respective trait compilation efforts. For example, line 59: “biased toward “charismatic” or well-studied taxa” and line 70: “neglected taxa such as…”. The fact that this study had trait databases to draw from means that the taxonomic group was not “neglected”. Spider traits are an excellent example of a group with rich trait data. I think that the amount of trait data and the size of the trait databases cannot simply be attributed to these taxa being “charismatic” or “well-studied”. Yes, there are a lot of plant and fish trait data and these domains have popular databases, but there are many historical factors that could influence that. Note that the reference in line 61 (Troudet et al. 2017) pertains to bias in species occurrence data in GBIF; it is not about traits.
Analysis and presentation of the data
- I suggest that the authors clarify in the text and figures the basis on which “trait completeness” and “trait coverage” were calculated.
- Moreover, I strongly encourage the authors to rethink how the number of traits per taxonomic group is quantified, and consequently how data completeness is calculated. It seems that traits are counted according to the number of columns in the data table. Taking the Amphibian trait table as an example, the number of traits reported in Table 1 is 59. However, in the traits_metadata file, there are 34 columns of trait names. Additionally, the Diet_* composition trait is disaggregated into 7 rows/columns, but this is a binary matrix of a single trait. Thus, for amphibians (and if habitat and environmental condition characteristics are to be considered as traits), there are 28 traits to be counted, not 59.
- What is considered a unique trait in this manuscript influences how data completeness is calculated. A single trait with multiple categories disaggregated into a binary matrix with multiple columns (e.g., diet composition) would have a disproportionate influence on the accounting of data completeness compared to individual traits with a single column. This may bias perception towards making the dataset seem more complete.
- Regarding trait imputation, I recommend that the authors specify whether the imputation was based on the compiled Switzerland/European trait databases or on the global trait datasets. If the imputation was based on the regional dataset only, why not use all the trait records for a taxonomic group? The tree-based imputation might be more accurate if all the trait data for a group were used in the statistics. Even when data are sparse, non-region-specific and global trait datasets are important to “capture evolutionary relationships and shared adaptive history” as described in line 188.
- I encourage the authors to discuss how the names of traits and trait categories were harmonized and report which traits in the dataset are similarly named across taxonomic groups and which are not. It would be useful to evaluate this comparison in the discussion.
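The counting concern raised above (34 columns, of which the 7 Diet_* columns encode a single trait, giving 28 traits) can be illustrated with a minimal sketch. The column names, the toy rows, and the helper functions below are hypothetical and are not taken from the TraitCH files; the sketch only assumes that one-hot columns share a common prefix such as `Diet_`.

```python
def trait_key(column, one_hot_prefixes):
    """Map a column to its trait: one-hot columns collapse onto their prefix."""
    for prefix in one_hot_prefixes:
        if column.startswith(prefix):
            return prefix
    return column


def count_traits(columns, one_hot_prefixes):
    """Number of distinct traits after collapsing one-hot column groups."""
    return len({trait_key(c, one_hot_prefixes) for c in columns})


def column_completeness(rows, columns):
    """Share of non-missing cells, counting every column separately."""
    filled = sum(row.get(c) is not None for row in rows for c in columns)
    return filled / (len(rows) * len(columns))


def trait_completeness(rows, columns, one_hot_prefixes):
    """Share of non-missing (species, trait) pairs; a one-hot group counts once."""
    groups = {}
    for c in columns:
        groups.setdefault(trait_key(c, one_hot_prefixes), []).append(c)
    filled = sum(
        any(row.get(c) is not None for c in cols)
        for row in rows
        for cols in groups.values()
    )
    return filled / (len(rows) * len(groups))


# Hypothetical layout mirroring the amphibian example: 27 single-column
# traits plus 7 Diet_* columns, i.e. 34 columns in total.
amphibian_columns = [f"trait_{i:02d}" for i in range(27)] + [
    f"Diet_{i}" for i in range(7)
]
n_traits = count_traits(amphibian_columns, ["Diet_"])

# Toy table: body size is missing for both species, diet is known for both.
toy_columns = ["Body_size", "Diet_plants", "Diet_insects"]
toy_rows = [
    {"Body_size": None, "Diet_plants": 1, "Diet_insects": 0},
    {"Body_size": None, "Diet_plants": 0, "Diet_insects": 1},
]
by_column = column_completeness(toy_rows, toy_columns)
by_trait = trait_completeness(toy_rows, toy_columns, ["Diet_"])
```

In the toy table, the column-wise figure is higher than the trait-wise one because the two diet columns are counted as two filled traits per species, which is the inflation effect described in the comment.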
Quality of the data
- I suggest that the authors justify why environmental or spatial characteristics such as “Habitat”, “Ecological Indicator Value”, “Proportion of species range map covered…”, and average species range temperature/precipitation/etc. are considered traits in this work. The definitions of functional traits given in the introduction do not necessarily encompass environmental characteristics associated with species occurrence as organismal traits.
- I suggest that the motivation for using static trait records per species be explained in the methods section. The discussion section (lines 254-259) states that intraspecific variability is relevant. The trait datasets used in this study may already have contained the information needed to capture intraspecific variability, but these records were aggregated during data processing for TraitCH, effectively removing the intraspecific variability that line 259 then recommends for future research.
- I recommend that the authors evaluate and report in the Discussion section how this dataset adheres to FAIR data principles.
- Specifically regarding interoperability, I recommend that the authors report which controlled vocabularies and trait definitions were used in harmonizing the dataset.
- Specifically regarding reusability, I suggest that the authors modify how data provenance is preserved. Looking at the files in the folder outputs/raw_traits/, the references or sources of the information are not visible. Broadly listing the data sources in Table 1 is not sufficient. For this dataset to be FAIR, the data tables should contain the data source information and if there were aggregated trait records, list which sources were aggregated.
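One way to keep provenance inside the data tables, as requested above, is to carry a source field through every aggregation step. The sketch below is only an illustration under stated assumptions: the record layout, the species name, and the trait values are hypothetical, and averaging is used merely as a stand-in for whatever aggregation rule is applied. The source labels "EAA" and "FaunaEuropeae" are borrowed from an example given elsewhere in this discussion.

```python
def merge_records(records):
    """Merge duplicate species records while keeping every contributing source.

    `records` is a list of dicts with the hypothetical keys
    'species', 'value' and 'source'.
    """
    merged = {}
    for rec in records:
        entry = merged.setdefault(rec["species"], {"values": [], "sources": set()})
        entry["values"].append(rec["value"])
        entry["sources"].add(rec["source"])
    return {
        species: {
            "value": sum(entry["values"]) / len(entry["values"]),
            "n_records": len(entry["values"]),
            # Comma-separated, sorted list of all sources that were aggregated.
            "sources": ",".join(sorted(entry["sources"])),
        }
        for species, entry in merged.items()
    }


# Hypothetical input: the same species reported by two sources.
records = [
    {"species": "Species a", "value": 10.0, "source": "FaunaEuropeae"},
    {"species": "Species a", "value": 14.0, "source": "EAA"},
    {"species": "Species b", "value": 3.0, "source": "EAA"},
]
merged = merge_records(records)
```

With a `sources` and an `n_records` column in the output table, a user can see which values are aggregates and trace each one back to its origin without consulting a separate summary table.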
Citation: https://doi.org/10.5194/essd-2025-754-RC2
- CC1: 'Comment on essd-2025-754', Markus K. Meier, 23 Apr 2026
Dear Yohann & al.
I detected some issues in the bryophytes of x_TraitCH_2.0 which, in my opinion, make the trait lists much less useful than they could be.
1. taxon names compilation and phylogenetic information:
- In addition to the two cited trait sources for bryophytes, you added names from some unknown sources called speed2zero or FaunaEuropeae. During the process of removing name duplicates, the sources were not aggregated (e.g. like --> "EAA,FaunaEuropeae").
- Although you have detected synonyms using the GBIF status, synonyms are not removed from the list (and accepted GBIF names were not added as additional rows with GBIF names in "species")
- the information in columns originEUR (mostly 'No') and originCH (some 'NA' although swissRedListCategory is assigned) is incomplete or lost.
- There are some inconsistencies between "species" and "scientificName", with a wrong rank in scientificName, e.g. "Syntrichia ruralis var. ruraliformis, Syntrichia ruralis subsp. ruraliformis (Besch.) Düll", "Scapania aequiloba aggr.,Scapania aequiloba (Schwägr.) Dumort." – with unknown implications
some implications:
- it is impossible to retrieve the number or a list of CH or EUR species or taxa according to either GBIF, infoSpecies or any other authoritative source, precluding e.g. an accurate overview of distributions of specific traits and also resulting in overestimates of the number of species covered (line 29: "Bryobiotina (2,285)", while "distinct GBIF_accepted" returns 1945 taxa including many subspecies and varieties, and van Zuijlen & al. list "1816 European bryophyte species").
- aggregates: trait data for taxa in the rank of aggregate are lost, although probably present in the source (data for Hypnum cupressiforme and Hypnum cupressiforme aggr. are identical except for "swissRedListCategory")
- using different sources, names, and compilation processes, families are sometimes different for homotypic synonyms (duplicates) e.g. GBIF_accepted "Flexitrichum gracile (Mitt.) Ignatov & Fedosov", and worse, even order is inconsistent among similar or identical taxa (see e.g. species like "Hypnum cupressiforme%"), thus blurring phylogenetic information and making imputed data less relevant
- because synonyms were not removed, imputed traits are different even for homotypic synonyms, and therefore query results depend on the input taxonomy (e.g. imputed RangeSize_km2: Hypnum cupressiforme var. subjulaceum: 436113.807601576, Hypnum cupressiforme subsp. subjulaceum: 124471.444101485).
- all these problems cancel out the advantage of having multiple entry points (synonyms) in the "species" column for queries with different input taxonomies.
2. compilation of other raw data (technical details)
- omission of aggregate data and dependencies. Although GBIF does not handle aggregate taxa, one of the data sources does (the information of which species belongs to which aggregate group is included in the Swiss Red List, "UV-2309-DFI_RLMoose_Anhang.xlsx"). This information, in addition to other phylogenetic data (genus, family, order, phylum) could be used to improve imputed traits, and of course, the data of aggregate taxa (e.g. [...]sb_substr) are useful anyway, for raw use and for imputation.
- poor resolution of habitat_xy. The sheet metadata.Delarze_habitat contains far fewer assignments to habitat categories than the very similar metadata.FloraIndicativa_habitat, leading to many NA values and imputed 0. Also, the raw data (Delarze_habitat) are omitted, although (probably) similar raw data ([...]sb_substr) are included.
- the traits [...]sb_substr use "individual counts": it would probably be more useful to use relative values to express substrate affinity. A dataqualitysb_substr trait would also be useful, i.e. the proportion of individual counts assigned to any substrate class (which is probably often <0.5 or much less; or vice versa: the proportion of specimens of a given taxon with no substrate class assigned at all) and/or n (how many specimens of the taxon were evaluated, i.e. the sum of all [...]sb_substr)
3. imputation of traits
The algorithms were apparently not instructed to consider any interdependencies between traits (or relevant taxa), and were unable to identify them themselves or only recognised them in a very basic form.
Regardless of how useful rather randomly imputed traits are for answering any specific questions, there are some traits whose imputation is particularly questionable. E.g.
- although in many cases Veg_propagules = 0, the size (Vp_max_size_µm) of the "absent propagules" is imputed for such taxa
- [...]sb_substr is available for CH-taxa only, but has been imputed for e.g. Macaronesian genera with no close relatives in CH, giving essentially random results, probably based mainly on the overall sb_substr distribution
- habitat_xy has many NA in raw data (see above), and is imputed mostly to 0, making it useless for any practical questions
- some other traits are based on very few values, e.g. Seta_max_len_mm (assessed only for a few taxa in the Azores), resulting in meaningless imputed values for the other 2000 names.
- life strategy is by definition based on other traits like spore size, but the missing data were probably imputed independently (data not checked).
- many other interdependencies are probably difficult to detect for machines using simple models without a list of restrictions or rules, but are obvious to humans: e.g. the min and max values for any trait of a subspecies/variety have to lie within the min-max range of the corresponding species (or aggregate). The imputed data do not conform to such simple rules.
implications:
- it is difficult for potential users to be aware of such shortcomings – even if "pred_error" and "data_completeness" values are given in missRanger_evaluations/S_EVAL_s2z_raw_traits_bryophytes.txt
- queries for imputed traits for a subset of taxa may largely return random data (i.e. maybe correct, maybe not), thus leading to inappropriate conclusions (e.g. "only 0 % species from sandy habitats") or hiding real patterns in the vast amount of random data (e.g. Mating_type across different families, or ecosystems).
Options to improve the situation somewhat (without improving the imputation algorithms) could be:
- remove traits with high prediction errors from S_MEAN_s2z_raw_traits_bryophytes.txt
- remove European taxa of non-Swiss origin, because all their [...]sb_substr data (and maybe others) are imputed from a Swiss subset only, probably giving a high error rate (if checked against other data sources)
- identify and harmonize synonyms, etc.
4. Finally, a small technical detail.
In 2.3. trait compilation (lines 123-124), you cite Hofmann et al. 2011 as one of two data sources. This short article, of which I am a co-author, describes a website where data are available; however, it does not contain any data itself. The data source should be cited more precisely, for example as follows: "Swissbryophytes (Nationales Daten- und Informationszentrum der Schweizer Moose), data shared on dd-mm-yyyy / xy@swissbryophytes" or "data collected from www.swissbryophytes.ch dd-mm-yyyy / authors copyright granted dd-mm-yyyy / xy@swissbryophytes" (whichever applies).
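Two of the consistency rules described in this comment (identical imputed values for homotypic synonyms, and no propagule size where vegetative propagules are absent) lend themselves to simple automated checks. The sketch below is a hedged illustration, not the authors' workflow: the column names and the two RangeSize_km2 values are those quoted in the comment, while the accepted-name placeholder, the "Hypothetical taxon" row and its values, and both helper functions are assumptions made for the example.

```python
def synonym_conflicts(rows, trait, key="GBIF_accepted", tol=1e-6):
    """Return accepted taxa whose synonym rows carry different values for `trait`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    conflicts = {}
    for accepted, members in groups.items():
        values = [m[trait] for m in members if m.get(trait) is not None]
        if values and max(values) - min(values) > tol:
            conflicts[accepted] = {m["species"]: m.get(trait) for m in members}
    return conflicts


def propagule_violations(rows):
    """Names with a propagule size although vegetative propagules are absent."""
    return [
        r["species"]
        for r in rows
        if r.get("Veg_propagules") == 0 and r.get("Vp_max_size_µm") is not None
    ]


# Placeholder only: the actual accepted name is not given in the comment.
ACCEPTED = "<accepted name of the two synonyms>"

rows = [
    {"species": "Hypnum cupressiforme var. subjulaceum",
     "GBIF_accepted": ACCEPTED, "RangeSize_km2": 436113.807601576},
    {"species": "Hypnum cupressiforme subsp. subjulaceum",
     "GBIF_accepted": ACCEPTED, "RangeSize_km2": 124471.444101485},
    # Entirely hypothetical row illustrating the propagule rule.
    {"species": "Hypothetical taxon", "GBIF_accepted": "Hypothetical taxon",
     "RangeSize_km2": None, "Veg_propagules": 0, "Vp_max_size_µm": 120.0},
]

conflicts = synonym_conflicts(rows, "RangeSize_km2")
violations = propagule_violations(rows)
```

Checks of this kind could be run on the imputed tables before release; any flagged row would indicate either an unresolved synonym or an imputed value that contradicts another trait.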
Citation: https://doi.org/10.5194/essd-2025-754-CC1
- RC3: 'Comment on essd-2025-754', Anonymous Referee #3, 24 Apr 2026
General comments
This manuscript presents a compilation of trait datasets for multiple taxonomic groups in Switzerland and Europe. It combines published data with unpublished sources and could therefore become a valuable new reference for trait information. It represents a great amount of work and is very valuable for the whole community. However, I found several aspects of the manuscript a bit unclear or open to improvement (see details below). Perhaps my main concern is that the authors did not follow any standard, ontology or recommended workflow, which would have made their work even more valuable, impactful and reproducible. Some of this could still be done, e.g. by using Darwin Core (https://dwc.tdwg.org/list/) as much as possible in the datasets.
Specific comments
1- Abstract - I found the abstract a bit unclear because the checklist is presented towards the end (L.35) so it is difficult to understand which species are present in the dataset. E.g. L.31 “robust representation of total species richness..” is not understandable without knowing that the authors started from checklists. Perhaps explaining better the process from checklists to trait retrieval would be more useful to understand the datasets.
2- I think that the geographical coverage should be described more precisely. It is for instance very important to describe what “Europe” is, what is the source of the checklists, how they were compiled and in which context and for which countries. E.g. do all countries have checklists for all taxa?
Fig.2 is useful to understand geographic coverage but given the lack of range information for many species in TraitCH (according to the legend), it might be misleading. Perhaps it would be worth adding the total % of species represented in the map for each group, in addition to the raw numbers given in the legend. In addition, I am not sure if it is fair to call this “trait coverage” given that it does not account for the NAs in the data (if I understood correctly).
3- I generally do not understand why the dataset is called TraitCH and not TraitEU. I understand that the authors probably started from CH and extended their dataset and it is maybe a detail, but it does not describe the dataset properly and makes things a bit confusing. For instance, I wondered why habitat was given only for Swiss species (and by the way, I am not sure that many readers would be familiar with the concept of “habitat guilds”).
4- I think that a bit more discussion would be useful to understand the completeness of the dataset. For instance, when looking at Fig.1 I was surprised by the completeness and number of traits for bryophytes, which is comparable to that of birds. Why do Ascomycota have so many traits? Which traits are they? Why more than vascular plants, which have so many traits in other databases such as TRY? This should be understandable from the text without having to dig into the datasets. Also, what is “unknown” in Fig.1? How can a trait be attributed to an unknown group?
5- In the case of multiple values coming from different datasets for a given trait and species, I did not fully understand the procedure: in L.154 the authors indicate that “precedence is given to values of datasets covering more traits”. But earlier in the text it is indicated that, in the case of duplicated names, trait values are aggregated by mean or mode. Why not take the mean in both cases? There should be a better justification for these choices. I think the mean/mode (potentially with associated deviation) is a better solution because it includes more information.
6- L.160-165: I did not find this paragraph very clear. I also did not understand why the step of grouping names with GBIF names was not done before merging TraitCH and the European checklists, which would have avoided the iterative process. But again, I found the paragraph not very clear so I probably missed something.
7- Native / non native (L.199): It would be useful to have a definition here and to know more about the source of data (taken from range map or other databases?). Can a species be non-native to CH but native of Europe? Is the column originCH really providing more information than originEUR? This becomes clearer only after reading the metadata but the rationale could be explained in the text.
8- Standardisation and integration – today several ontologies and vocabularies provide definitions and guidelines for biological data and are easily accessible online (e.g. Darwin Core https://dwc.tdwg.org/list/). I would strongly recommend to follow them.
9- Future directions – it would be important for all readers to know whether the authors plan to continue populating the datasets in future or not, and also how other authors could contribute to TraitCH without creating new parallel datasets. I understand these are static versions, but a general idea of how the authors see the future of the datasets would be very important to have.
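The contrast drawn in comment 5 between taking the value from the dataset covering the most traits and aggregating by mean or mode with an associated deviation can be made concrete with a small sketch. Everything below is illustrative: the function names, the trait values, and the per-source trait counts are hypothetical and are not taken from the manuscript or its code.

```python
from collections import Counter
from statistics import mean, stdev


def by_precedence(records):
    """Pick the value from the source covering the most traits.

    `records` is a list of (n_traits_in_source, value) tuples.
    """
    return max(records, key=lambda r: r[0])[1]


def by_aggregation(values):
    """Mean and sample SD for numeric values, mode for categorical values."""
    values = [v for v in values if v is not None]
    if not values:
        return None
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        sd = stdev(values) if len(values) > 1 else None
        return {"value": mean(values), "sd": sd, "n": len(values)}
    # Categorical: most frequent category (first seen wins on ties).
    return {"value": Counter(values).most_common(1)[0][0], "sd": None, "n": len(values)}


# Hypothetical body-length values (same unit) for one species from three
# sources that cover 20, 5 and 8 traits respectively.
records = [(20, 10.0), (5, 13.0), (8, 16.0)]

precedence_value = by_precedence(records)
aggregated = by_aggregation([value for _, value in records])
habitat = by_aggregation(["forest", "forest", "grassland"])
```

Here the precedence rule returns a single value and discards the other two, whereas the aggregated record keeps the spread (`sd`) and the number of contributing sources (`n`), which is the additional information the comment refers to.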
Technical corrections
L.45: the last part of the sentence is a bit odd and seems to belong to a different sentence.
L.60: “plants” or “vascular plants”
L.65: “but see for”, please expand, as this is not really informative
L.72: “IS embedded”
Please add a reference for the Noun project
L.97: taxonomic information is not always about co-adaptation. Perhaps simplify to “allowed extra taxonomic information”.
L.142: please shortly explain how the information about imputed/non imputed data in previous datasets was obtained
L.151: taxonomic groupS
Fig3: please add names near to icons, as in Fig.4
L.239: “taxonomic” rather than “phylogenetic”
Citation: https://doi.org/10.5194/essd-2025-754-RC3
RC1: 'Comment on essd-2025-754', Stef Bokhorst, 06 Mar 2026
This data paper compiled trait data across different taxonomic groups for Switzerland. Such trait databases can be very useful for modelling work and as such are valuable. It is less clear how this work addressed the main arguments raised at the end of the introduction (lines 60-70). Species names and synonyms are dealt with but it is unclear whether raw trait data was checked, standardized and retrieved from the hard to obtain grey literature.
In addition, I wonder how for instance, taxa morphology is comparable across these taxonomic groups? Is there any point in comparing the morphology of a lichen with that of a fish? The added value of this regional dataset would lie with comparable trait values, but these are currently limited to distribution patterns.
It would be helpful if the ‘significant advancement’ (line 243) of TraitCH were explained in greater detail, including how it would outperform, for instance, TRY. In other words, what questions can TraitCH address that cannot be addressed with TRY?
Line 45 the ‘while increasingly collected in a standardized manner’ doesn’t link logically to the preceding part of the sentence.
Line 51 please explain abbreviation “TRY” and any others in the ms.
Line 69 unclear if and how this study addressed the issues mentioned above. Did this work address standardizing trait definitions (line 64)? Did this work trawl through the difficult to reach and grey literature (lines 64-65)? Based on the information provided for fungi this work simply used data provided by Zanne et al (a general paper on fungi traits) and Gross et al (a records database) – how does this resolve issues of standardization and grey literature data?
Lines 119-120 I think it could be very valuable if a table is included to explain which variables are included for each trait category. “morphology, life-history, ecological behaviour, environmental niche and habitat of each species” is useful information but can be interpreted in various ways and is not always directly comparable between taxa.
What is the ‘Noun’ project?
Lines 145-150: Was trait aggregation manually checked? Species names/synonyms can be misspelled or otherwise mistakenly labelled, and simply averaging trait values can result in values that are incorrect for both species. Did you check that trait value units were consistent between studies?
Citation: https://doi.org/10.5194/essd-2025-754-RC1 -
RC2: 'Comment on essd-2025-754', Anonymous Referee #2, 10 Apr 2026
April 10, 2026
This manuscript describes the compilation of the TraitCH dataset which combines the traits datasets and databases for species primarily found in Switzerland and extending to other species in Europe. The manuscript documents how datasets were subset to fit the study area, how taxonomy was clarified, how trait records were curated, and how missing values were imputed. Evaluations on data coverage taxonomically, regionally, and between traits were then reported.
I wish to highlight a few points for clarification on the motivation and introduction of the work, on the analysis and presentation of the data, and on the quality of the data.
Motivation and introduction
- I suggest that the authors explicitly declare why this dataset was created. Was this dataset created to support a regional ecology project perhaps? Regarding the statement in line 25 that taxa-specific trait resources concentrate on biological groups and “less on specific biogeographic regions limiting their applicability in regional biodiversity assessments and conservation planning”, I think that trait databases being agnostic to biogeographic regions is actually their strength. I do not agree that the regional specificity of trait data resources can be used as a metric for how it can be applied. I would argue that the application of traits in biodiversity assessments and conservation planning should actually motivate compilations towards a global coverage, rather than region-specific as what the statement in line 25 seems to imply. Measuring traits and compiling trait data are expensive and time-consuming work and are often driven by taxa-specific community-driven efforts and the focus on specific taxa is needed to ensure the quality of the trait datasets. Presenting this as a limitation, for instances, when raw trait databases did not fit what is needed for another independent study, is not ideal recognizing the work it took have trait datasets to extract data from.
- I suggest that the authors use caution when narrating the comparison of the taxonomic groups and their respective trait compilation efforts. For example, line 59: “biased toward “charismatic” or well-studied taxa” and line 70: “neglected taxa such as…”. The fact that this study had trait databases to draw from means that the taxonomic group was not “neglected”. The spider traits is an excellent example of a group with have rich trait data. I think that the amount of trait data and size of the trait databases could not be simply generalized to these taxa being “charismatic” or “well-studied”. Yes, there are a lot of plant and fish trait data and these domains have popular databases, but there are many historical factors that could influence that. Note that the reference in line 61 (Troudet et al. 2017) pertains to bias in species occurrence data in GBIF, it is not about traits.
Analysis and presentation of the data
- I suggest that the authors clarify in the text and figures what the basis of the “trait completeness” and “trait coverage” were calculated from.
- Moreover, I strongly encourage the authors to rethink how the number of traits per taxonomic group is quantified, and consequently how the evaluation of data completeness is calculated. It seems like traits are counted according to the number of columns of the data table. Looking at the Amphibian trait table as an example. The number of traits reported in table 1 is 59. However, in the traits_metadata file, there are 34 columns of trait names. Additionally, the Diet_* composition trait is disaggregated to 7 rows/columns but this is a binary matrix of a single trait. Thus, for amphibians (and if habitat and environmental conditions characteristics are to be considered as traits), there are 28 traits to be counted, not 59.
- What are considered as unique traits in this manuscript influences how data completeness is calculated. Such that a single trait name with multiple categories disaggregated as a binary matrix with multiple columns (e.g., diet composition) would have disproportionate influence in the accounting of data completeness compared to individual traits with a single column. This may bias the perception towards making the dataset seem more complete.
- Regarding trait imputation. I recommend that the authors specify if the imputation was based on the compiled Switzerland/European trait databases or from the global trait datasets. If the imputation was based on the regional dataset only, why not use the all the trait records for a taxonomic group? The tree-based imputation might be more accurate if all the trait data for a group were used in the statistics. Even when data is sparse, non-region specific and global trait datasets are important to “capture evolutionary relationships and shared adaptive history” as described I line 188.
- I encourage the authors to discuss how the names of traits and trait categories were harmonized and report which traits in the dataset are similarly named across taxonomic groups and which are not. It would be useful to evaluate this comparison in the discussion.
Quality of the data
- I suggest that the authors justify why environmental or spatial traits such as “Habitat”, “Ecological Indicator Value”, “Proportion of species range map covered…”, average species range temperature/precipitation/etc are considered traits in this work. The definitions for functional traits in the introduction does not necessarily encompass environmental characteristics associated with species occurrence as organismal traits.
- I suggest that the motivation for using static trait records per species be explained in the methods section. The discussion section (lines 254-259) states that intraspecific variability is relevant and the trait datasets used in this study could have contained the information to capture intraspecific variability already but these were aggregated in the data processing for TraitCH, effectively removing the intraspecific variability that is then being recommended back into future research in line 259.
- I recommend that the authors evaluate and report in the Discussion section how this dataset adheres to FAIR data principles.
- Specifically regarding interoperability, I recommend that the authors report which controlled vocabularies and trait definitions were used in harmonizing the dataset.
- Specifically regarding reusability, I suggest that the authors modify how data provenance is preserved. Looking at the files in the folder outputs/raw_traits/, the references or sources of the information are not visible. Broadly listing the data sources in Table 1 is not sufficient. For this dataset to be FAIR, the data tables should contain the data source information and if there were aggregated trait records, list which sources were aggregated.
Citation: https://doi.org/10.5194/essd-2025-754-RC2 -
CC1: 'Comment on essd-2025-754', Markus K. Meier, 23 Apr 2026
Dear Yohann & al.
I detected some issues in bryophytes of x_TraitCH_2.0 which, in my opinion, makes the trait lists much less useful as they could be.
1. taxon names compilation and phylogentic information:
- Additionally to the two cited trait sources for bryophytes, you added names from some unknown sources called speed2zero or FaunaEuropeae. During the process of removing name duplicates, the sources were not aggregated (e.g. like --> "EAA,FaunaEuropeae").
- Although you have detected synonyms using the GBIF status, synonyms are not removed from the list (and accepted GBIF names were not added as additional rows with GBIF names in "species")
- the information in columns originEUR (mostly 'No') and originCH (some 'NA' although swissRedListCategory is assigned) is incomplete or lost.
- There are some inconsistencies between "species" and "scientificName", with a wrong rank in scientificName, e.g. "Syntrichia ruralis var. ruraliformis, Syntrichia ruralis subsp. ruraliformis (Besch.) Düll", "Scapania aequiloba aggr.,Scapania aequiloba (Schwägr.) Dumort." – with unknown implications
some implications:
- it is impossible to retrieve the number or a list of CH or EUR species or taxa according to either GBIF, infoSpecies or any other authoritative source, precluding e.g. an accurate overview of distributions of specific traits and also resulting in overestimates of the number of species covered (line 29: "Bryobiotina (2,285)", while "distinct GBIF_accepted" returns 1945 taxa including many subspecies and varieties, and van Zuijlen & al. list "1816 European bryophyte species").
- aggregates: trait data for taxa in the rank of aggregate are lost, although probably present in the source (data for Hypnum cupressiforme and Hypnum cupressiforme aggr. are identical except for "swissRedListCategory")
- using different sources, names, and compilation processes, families are sometimes different for homotypic synonyms (duplicates) e.g. GBIF_accepted "Flexitrichum gracile (Mitt.) Ignatov & Fedosov", and worse, even order is inconsistent among similar or identical taxa (see e.g. species like "Hypnum cupressiforme%"), thus blurring phylogenetic information and making imputed data less relevant
- because synonyms were not removed, imputed traits are different even for homotypic synonyms, and therefore query results depend on the input taxonomy (e.g. (imputed) RangeSize_km2: Hypnum cupressiforme var. subjulaceum: 436113.807601576, Hypnum cupressiforme subsp. subjulaceum: 124471.444101485).
- all these problems cancel out the advantage of having multiple entry points (synonyms) in the "species" column for queries with different input taxonomies.
2. compilation of other raw data (technical details)
- omission of aggregate data and dependencies. Although GBIF does not handle aggregate taxa, one of the data sources does (the information of which species belongs to which aggregate group is included in the Swiss Red List, "UV-2309-DFI_RLMoose_Anhang.xlsx"). This information, in addition to other phylogenetic data (genus, family, order, phylum) could be used to improve imputed traits, and of course, the data of aggregate taxa (e.g. [...]sb_substr) are useful anyway, for raw use and for imputation.
- poor resolution of habitat_xy. The sheet metadata.Delarze_habitat contains far fewer assignments to habitat categories than the very similar metadata.FloraIndicativa_habitat, leading to many NA values and imputed 0. Also, the raw data (Delarze_habitat) are omitted, although (probably) similar raw data ([...]sb_substr) are included.
- the traits [...]sb_substr use "indivudual counts": it would probably be more useful to use relative values to express substrate affinity. A dataqualitysb_substr trait would also be useful, i.e. the proportion of individual counts assigned to any substrate class (which probably is often <0.5 or much less; or vice versa: the proportion of specimens (of a given taxon) with no substrate class assigned at all) and/or n (how many specimens of the taxon were evaluated, i.e. the sum of all [...]sb_substr)
3. imputation of traits
The algorithms were apparently not instructed to consider any interdependencies between traits (or relevant taxa), and were unable to identify them themselves or only recognised them in a very basic form.
Regardless of how useful rather randomly imputed traits are for answering any specific questions, there are some traits whose imputation is particularly questionable. E.g.
- although in many cases Veg_propagules = 0, the size (Vp_max_size_µm) of the "absent propagules" is imputed for such taxa
- [...]sb_substr is available for CH taxa only, but has been imputed for e.g. Macaronesian genera with no close relatives in CH, giving essentially random results, probably driven mainly by the overall sb_substr distribution
- habitat_xy has many NA in raw data (see above), and is imputed mostly to 0, making it useless for any practical questions
- some other traits are based on very few values, e.g. Seta_max_len_mm (assessed only for a few taxa in the Azores), resulting in meaningless imputed values for the other 2000 names.
- life strategy is by definition based on other traits like spore size, but the missing data were probably imputed independently (data not checked).
- many other interdependencies are probably difficult to detect for machines using simple models without a list of restrictions or rules, but are obvious for humans: e.g. the min/max values for any trait of a subspecies/variety have to be within the min/max range of the corresponding species (or aggregate). The imputed data do not conform to such simple rules.
implications:
- it is difficult for potential users to be aware of such shortcomings – even if "pred_error" and "data_completeness" values are given in missRanger_evaluations/S_EVAL_s2z_raw_traits_bryophytes.txt
- queries for imputed traits for a subset of taxa may largely return random data (i.e. maybe correct, maybe not), thus leading to inappropriate conclusions (e.g. "only 0 % species from sandy habitats") or hiding real patterns in the vast amount of random data (e.g. Mating_type across different families or ecosystems).
Options to improve the situation somewhat (without improving the imputation algorithms) could be:
- remove traits with high prediction errors from S_MEAN_s2z_raw_traits_bryophytes.txt
- remove European taxa of non-Swiss origin, because all their [...]sb_substr data (and maybe other traits) are imputed from a Swiss subset only, probably giving a high error rate (if checked against other data sources)
- identify and harmonize synonyms, etc.
4. Finally, a small technical detail.
In 2.3. trait compilation (lines 123-124), you cite Hofmann et al. 2011 as one of two data sources. This short article, of which I am a co-author, describes a website where data are available; however, it does not contain any data itself. The data source should be cited more precisely, for example as follows: "Swissbryophytes (Nationales Daten- und Informationszentrum der Schweizer Moose), data shared on dd-mm-yyyy / xy@swissbryophytes" or "data collected from www.swissbryophytes.ch dd-mm-yyyy / authors copyright granted dd-mm-yyyy / xy@swissbryophytes" (whichever applies).
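Two of the suggestions in sections 2 and 3 above can be illustrated with a short Python sketch: converting substrate counts to relative affinities with a simple data-quality value, and flagging imputed rows that break obvious rules. This is only a sketch under stated assumptions. The function names, the substrate class labels, and the assumption that each specimen is assigned to at most one substrate class are hypothetical; only the column names `Veg_propagules` and `Vp_max_size_µm` come from the dataset as cited above.

```python
def substrate_affinity(counts, n_specimens):
    """Turn raw substrate counts into relative affinities.

    counts: dict mapping substrate class -> number of specimens (hypothetical labels)
    n_specimens: total specimens evaluated for the taxon, including those
                 without any substrate class (assumed to be known)
    Returns (affinity per class, share of specimens with a class, n_specimens).
    """
    assigned = sum(counts.values())
    if assigned == 0:
        affinity = {k: None for k in counts}
    else:
        affinity = {k: c / assigned for k, c in counts.items()}
    quality = assigned / n_specimens if n_specimens else None
    return affinity, quality, n_specimens


def flag_propagule_conflict(row):
    """Flag a propagule size given for a taxon without vegetative propagules."""
    return row.get("Veg_propagules") == 0 and row.get("Vp_max_size_µm") is not None


def within_parent_range(child_min, child_max, parent_min, parent_max):
    """Check that a subspecies/variety range lies inside the species range."""
    return parent_min <= child_min and child_max <= parent_max


# Made-up example values, for illustration only.
affinity, quality, n = substrate_affinity({"rock": 3, "soil": 1}, n_specimens=10)
print(affinity, quality, n)
print(flag_propagule_conflict({"Veg_propagules": 0, "Vp_max_size_µm": 120.0}))
print(within_parent_range(0.5, 5.0, 1.0, 6.0))
```

Rows failing such checks could be set back to NA or excluded before distribution, rather than left for users to discover.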
Citation: https://doi.org/10.5194/essd-2025-754-CC1
RC3: 'Comment on essd-2025-754', Anonymous Referee #3, 24 Apr 2026
General comments
This manuscript presents a compilation of trait datasets for multiple taxonomic groups in Switzerland and Europe. It combines published data with unpublished sources and could therefore become a valuable new reference for trait information. It represents a great amount of work and is very valuable for the whole community. However, I found that several aspects of the manuscript are a bit unclear or could be improved (see details below). Perhaps my main concern is that the authors did not follow any standard, ontology or recommended workflow, which would have made their work even more valuable, impactful and reproducible. Some of this could still be done, e.g. by using Darwin Core (https://dwc.tdwg.org/list/) as much as possible in the datasets.
Specific comments
1- Abstract - I found the abstract a bit unclear because the checklist is presented towards the end (L.35) so it is difficult to understand which species are present in the dataset. E.g. L.31 “robust representation of total species richness..” is not understandable without knowing that the authors started from checklists. Perhaps explaining better the process from checklists to trait retrieval would be more useful to understand the datasets.
2- I think that the geographical coverage should be described more precisely. It is for instance very important to describe what “Europe” is, what is the source of the checklists, how they were compiled and in which context and for which countries. E.g. do all countries have checklists for all taxa?
Fig.2 is useful to understand geographic coverage but given the lack of range information for many species in TraitCH (according to the legend), it might be misleading. Perhaps it would be worth adding the total % of species represented in the map for each group, in addition to the raw numbers given in the legend. In addition, I am not sure if it is fair to call this “trait coverage” given that it does not account for the NAs in the data (if I understood correctly).
3- I generally do not understand why the dataset is called TraitCH and not TraitEU. I understand that the authors probably started from CH and extended their dataset and it is maybe a detail, but it does not describe the dataset properly and makes things a bit confusing. For instance, I wondered why habitat was given only for Swiss species (and by the way, I am not sure that many readers would be familiar with the concept of “habitat guilds”).
4- I think that a bit more discussion would be useful to understand the completeness of the dataset. For instance, when looking at Fig. 1 I was surprised by the completeness and number of traits for bryophytes, which is comparable to that of birds. Why do Ascomycota have so many traits? Which traits are they? Why more than vascular plants, which have so many traits in other databases such as TRY? This should be understandable from the text without having to dig into the datasets. Also, what is “unknown” in Fig. 1? How can a trait be attributed to an unknown group?
5- In the case of multiple values coming from different datasets for a given trait and species, I did not fully understand the procedure: in L.154 the authors indicate that “precedence is given to values of datasets covering more traits”. But earlier in the text it is indicated that in case of duplicated names, trait values are aggregated by mean or mode. Why not take the mean in both cases? There should be a better justification for these choices. I think the mean/mode (potentially with associated deviation) is a better solution because it includes more information.
6- L.160-165: I did not find this paragraph very clear. I also did not understand why the step of grouping names with GBIF names was not done before merging TraitCH and the European checklists, which would have avoided the iterative process. But again, I found the paragraph not very clear so I probably missed something.
7- Native / non-native (L.199): It would be useful to have a definition here and to know more about the source of the data (taken from range maps or other databases?). Can a species be non-native to CH but native to Europe? Is the column originCH really providing more information than originEUR? This becomes clearer only after reading the metadata, but the rationale could be explained in the text.
8- Standardisation and integration – today several ontologies and vocabularies provide definitions and guidelines for biological data and are easily accessible online (e.g. Darwin Core https://dwc.tdwg.org/list/). I would strongly recommend following them.
9- Future directions – it would be important for all readers to know whether the authors plan to continue populating the datasets in future or not, and also how other authors could contribute to TraitCH without creating new parallel datasets. I understand these are static versions, but a general idea of how the authors see the future of the datasets would be very important to have.
Technical corrections
L.45: the last part of the sentence is a bit odd and seems to belong to a different sentence.
L.60: “plants” or “vascular plants”
L.65: “but see for”, please expand, as this is not really informative
L.72: “IS embedded”
Please add a reference for the Noun project
L.97: taxonomic information is not always about co-adaptation. Perhaps simplify to “allowed extra taxonomic information”.
L.142: please briefly explain how the information about imputed/non-imputed data in previous datasets was obtained
L.151: taxonomic groupS
Fig3: please add names near to icons, as in Fig.4
L.239: “taxonomic” rather than “phylogenetic”
Citation: https://doi.org/10.5194/essd-2025-754-RC3
Model code and software
TraitCH Yohann Chauvier-Mendes https://github.com/8Ginette8/TraitCH
This data paper compiled trait data across different taxonomic groups for Switzerland. Such trait databases can be very useful for modelling work and as such are valuable. It is less clear how this work addressed the main arguments raised at the end of the introduction (lines 60-70). Species names and synonyms are dealt with but it is unclear whether raw trait data was checked, standardized and retrieved from the hard to obtain grey literature.
In addition, I wonder how for instance, taxa morphology is comparable across these taxonomic groups? Is there any point in comparing the morphology of a lichen with that of a fish? The added value of this regional dataset would lie with comparable trait values, but these are currently limited to distribution patterns.
It would be helpful if the ‘significant advancement’ (line 243) of TraitCH were explained in greater detail, including how it improves on, for instance, TRY. In other words, what questions can TraitCH address that are not possible with TRY?
Line 45 the ‘while increasingly collected in a standardized manner’ doesn’t link logically to the preceding part of the sentence.
Line 51 please explain abbreviation “TRY” and any others in the ms.
Line 69 unclear if and how this study addressed the issues mentioned above. Did this work address standardizing trait definitions (line 64)? Did this work trawl through the difficult to reach and grey literature (lines 64-65)? Based on the information provided for fungi this work simply used data provided by Zanne et al (a general paper on fungi traits) and Gross et al (a records database) – how does this resolve issues of standardization and grey literature data?
Lines 119-120 I think it could be very valuable if a table is included to explain which variables are included for each trait category. “morphology, life-history, ecological behaviour, environmental niche and habitat of each species” is useful information but can be interpreted in various ways and is not always directly comparable between taxa.
What is the ‘Noun’ project?
Lines 145-150: Was trait aggregation manually checked? Species names/synonyms can be misspelled or otherwise mistakenly labelled, and simply averaging trait values can result in values that are incorrect for both species. Did you check for consistent trait value units between studies?