Table 3.
Criteria relevant for specimen-based taxonomic data repositories.
| Priority | Criterion | Explanation |
|---|---|---|
| 1 | Specimen-based data structure | As alpha-taxonomy is centered on specimens, the repository structure must allow for the identification of data from specimen numbers. Both submission and retrieval/search must include a specimen identifier option. |
| 2 | Sustainability: certainty of perpetual data storage | The naming of organisms is based on the principle of historical priority, and in taxonomy, publications and data do not lose importance over time. The long-term availability of taxonomic data is therefore a sine qua non for repositories. This includes, but is not limited to, long-term funding (preferably permanent), adequate data backups, and, if possible, the existence of mirrors and contingency strategies. |
| 3 | Adherence to the FAIR principles | The FAIR principles (findable, accessible, interoperable, and reusable) partly overlap with the more specific conditions listed in this table; still, overall adherence to the FAIR principles constitutes an important criterion, measurable by “FAIR Metrics” (Wilkinson et al. 2018). |
| 4 | Free of charge for data submitters and open access for data users | Many taxonomists do not have access to institutional funds, and many taxonomic journals do not cover repository fees. To be successful in capturing an increasing proportion of taxonomy-related data, a repository must not charge data submission fees. |
| 5 | User-friendly low-complexity workflow for data submission | Time-consuming submission procedures are a strong deterrent when trying to convince the large community of taxonomists (including amateurs) of the value of making their data available. Furthermore, given the enormous differences among collections in defining and labeling specimens, data-deficient historical specimens, and nonstandardized collections across the world, the number of mandatory data fields for submission should be minimal (specimen identifier, species name, geographic location). |
| 6 | Submission and storage of data packages from multicollection sets of specimens | Taxonomists typically revise a group of organisms by examining specimens from collections held by multiple institutions, often from different countries and continents. Repositories should allow for coherent data packages containing such multicollection data rather than institutional or national repositories restricting data to those from their collection or country. |
| 7 | Data submission portal with options for taxonomic (specimen-based) data | Even if a repository allows for specimen identifiers, the submission tools are often not optimized for taxonomy-related data. Ideally, a repository should allow bulk submissions of many kinds of data (e.g., DNA sequences, images), linked to specimen identifiers by a separate metadata table. |
| 8 | Machine-accessible for automated data retrieval | Given the prospect of machine-learning tools for species delimitation and species identification, the information in a repository should be automatically retrievable and readable through the web. |
| 9 | Link to taxonomic databases for species identifiers, synonymies, etc. | The assignment of species names to taxonomic data is secondary because these names are bound to change over time. Yet, to facilitate their retrieval, data should be associated as much as possible with accepted and valid genus and species names. Through dynamic links to taxonomic databases, entries can be assigned to species names even if originally entered under different synonyms, declensions, or combinations. |
| 10 | Compliance with taxonomic data standards | While flexibility and enforcing only a minimal number of metadata fields per data item are preferable, repositories for taxonomic data should ideally be structured in agreement with international taxonomy standards: metadata field names should agree with Darwin Core or ABCD terminology, and specimen identifiers should allow for CETAF standards. |
| 11 | Manual search options tailored to the needs of taxonomists | To reflect the variety of taxonomic questions, advanced, semantic, and fuzzy searches are desirable. |
| 12 | Data searches possible through other portals | Repositories should be favored for taxonomic data if they are linked to overarching data portals which can be used to search multiple repositories at once. |
| 13 | No limitation to number of data files | Since data packages for taxonomic monographs may contain data on hundreds or thousands of specimens, a repository should not enforce an a priori limit on the number of data items per submission. |
| 14 | Wide use and acceptance by the community | Reinventing the wheel should be avoided and repositories widely accepted and used by the community should be preferred, i.e., repositories (i) where many data have already been submitted by (ii) a large number of different submitters, and (iii) which are listed as standard and recommended by journals and publishers (e.g., Springer Nature and PLoS lists). |
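
Criteria 1, 5, 7, and 10 together imply a simple data structure: a separate metadata table that links each submitted file to a specimen identifier and carries only a few mandatory fields. The following is a minimal sketch of such a table and its validation, not a description of any existing repository. It assumes that the three mandatory fields map to the Darwin Core terms `catalogNumber`, `scientificName`, and `locality`; the `dataFile` column, the function names, and all sample values are hypothetical and introduced for illustration only.

```python
import csv
import io

# Assumption: the three minimal fields of criterion 5 are mapped to these
# Darwin Core terms. A real repository may choose other terms.
REQUIRED_FIELDS = ("catalogNumber", "scientificName", "locality")

# Invented sample metadata table covering specimens from two collections
# (criterion 6). "dataFile" is a hypothetical column, not a Darwin Core term.
SAMPLE = """catalogNumber,scientificName,locality,dataFile
COLL-A 0001,Examplea prima,Site 1,coi_0001.fasta
COLL-A 0001,Examplea prima,Site 1,dorsal_0001.jpg
COLL-B 0042,,Site 2,coi_0042.fasta
"""


def load_metadata(text):
    """Parse a CSV metadata table.

    Returns (rows, problems): rows that contain all mandatory fields, and a
    list of (line_number, missing_fields) for rows that do not.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows, problems = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            problems.append((line_no, missing))
        else:
            rows.append(row)
    return rows, problems


def index_by_specimen(rows):
    """Group data file names by specimen identifier (criteria 1 and 7)."""
    index = {}
    for row in rows:
        specimen = row["catalogNumber"].strip()
        data_file = (row.get("dataFile") or "").strip()
        index.setdefault(specimen, []).append(data_file)
    return index


rows, problems = load_metadata(SAMPLE)
index = index_by_specimen(rows)
```

In this sketch, the row lacking a species name is reported in `problems` rather than silently accepted, and `index` allows all files belonging to one specimen to be retrieved from its identifier alone.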