Table 2.
Properties of different kinds of specimen-based taxonomic data
a. Raw vs. encoded taxonomic data. | ||
---|---|---|
These two categories differ by the quantitative and qualitative nature of the information they convey, and consequently by their ease and cost of storage. | ||
Description | Raw taxonomic data: One of the multiple facets characterizing a specimen, as captured by a sensor (e.g., camera, sound recorder, scanner, DNA sequencer). Allows one to represent the different properties of a virtual specimen (e.g., coloration, shape, size, structure, texture, chemical composition, bioacoustic properties). | Encoded taxonomic data: Data already interpreted. |
Strengths | Free of interpretation. Containing much information. Different forms of encoded data can be extracted, including by future methods that do not yet exist. | Although specific file types may exist, it is always possible (and easy) to translate them into a suite of alphanumeric characters compatible with a universal format (such as .csv). Small files, minimal storage cost. |
Weaknesses | Cannot be directly used as input for analyses (they first need to be interpreted and encoded, either by a human or by artificial intelligence). Different and specific storage formats. Large files and high storage cost. | Information restricted to a minimal level. Subjectivity: alternative interpretations (or coding errors) are possible. |
Examples | Photographs, sound and video recordings, microCT scans, chromatograms depicting DNA sequences. | Quantitative measurements, qualitative traits encoded numerically or described in natural language, nucleotide or amino acid sequences, morphometric landmarks. |
b. Taxonomic data of unique vs. multiple specimens. | ||
These two categories of data differ in the way they are (i) submitted to repositories, (ii) searched for (a particular specimen nested in a multiple-specimen data set has to be detectable using basic search options), and (iii) presented and downloaded on the repository interface. Ideally, for multiple-specimen data sets, it should be possible to download either the data measured for a particular specimen only, or the whole data set. | ||
Description | Unique specimen data: Data or set of data concerning a single specimen. Most often it consists of raw taxonomic data (see above). | Multiple-specimen data sets: Set of data concerning particular trait(s) measured for several specimens. Most often it consists of encoded data. |
Strengths | Specific (individual) searches are easy. | Submission of a large data set at once. |
Weaknesses | Case-by-case treatment unrealistic where specimen numbers are large. | Stringent search and data extraction might be compromised by an inadequate data archiving process. |
Examples | Picture(s) of a specimen, complete mitogenome sequence of a given individual. | Tabular data (.csv), DNA alignment (.fasta, .nex). |
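
As an illustration of the download options described in part b, the following minimal Python sketch shows how the records of one specimen could be extracted from a multiple-specimen tabular data set, while the whole data set remains exportable. The column names (`specimen_id`, `trait`, `value`, `unit`), the specimen identifiers, and the measurements are hypothetical and serve only as an example; they do not reflect the schema of any existing repository.

```python
import csv
import io

# Hypothetical multiple-specimen data set in "long" form:
# one row per (specimen, trait) measurement. All values are invented.
DATASET_CSV = """specimen_id,trait,value,unit
SPEC-0001,snout_vent_length,52.3,mm
SPEC-0001,head_width,8.1,mm
SPEC-0002,snout_vent_length,49.8,mm
SPEC-0002,head_width,7.6,mm
"""


def load_records(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_for_specimen(records, specimen_id):
    """Return only the rows measured on the given specimen."""
    return [row for row in records if row["specimen_id"] == specimen_id]


def to_csv(records):
    """Serialize rows back to CSV text (empty string if there are no rows)."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


all_records = load_records(DATASET_CSV)
one_specimen = records_for_specimen(all_records, "SPEC-0001")

# Either subset can be offered for download.
whole_dataset_export = to_csv(all_records)
single_specimen_export = to_csv(one_specimen)
```

Storing one row per specimen and trait, with an explicit specimen identifier in every row, is one possible way to keep an individual specimen detectable through a basic search even when it is nested in a larger data set.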