. 2020 Apr 16;69(6):1231–1253. doi: 10.1093/sysbio/syaa026

Table 2.

Properties of different kinds of specimen-based taxonomic data

a. Raw vs. encoded taxonomic data.
These two categories differ in the quantitative and qualitative nature of the information they convey and, consequently, in their ease and cost of storage.
Description
  Raw taxonomic data: Any one of the many facets of a specimen, as captured by a sensor (e.g., camera, sound recorder, scanner, DNA sequencer). Together, such data allow one to represent the different properties of a virtual specimen (e.g., coloration, shape, size, structure, texture, chemical composition, bioacoustic properties).
  Encoded taxonomic data: Data that have already been interpreted.
Strengths
  Raw taxonomic data: Free of interpretation. Information-rich. Different forms of encoded data can be extracted from them, including by future methods that do not yet exist.
  Encoded taxonomic data: Although specific file types may exist, it is always possible (and easy) to translate them into a string of alphanumeric characters compatible with a universal format (such as .csv). Small files, minimal storage cost.
Weaknesses
  Raw taxonomic data: Cannot be used directly as input for analyses (they must first be interpreted and encoded, either by a human or by artificial intelligence). Different and specific storage formats. Large files, high storage cost.
  Encoded taxonomic data: Information content reduced to a minimum. Subjectivity: alternative interpretations (or coding errors) are possible.
Examples
  Raw taxonomic data: Photographs, sound and video recordings, microCT scans, chromatograms depicting DNA sequences.
  Encoded taxonomic data: Quantitative measurements, qualitative traits encoded numerically or described in natural language, nucleotide or amino acid sequences, morphometric landmarks.
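The point that encoded data can always be translated into a universal plain-text format such as .csv can be illustrated with a minimal Python sketch. The specimen identifiers and trait names below are hypothetical, chosen only for illustration; they are not a schema proposed by the authors or required by any repository. The sketch uses only the standard-library `csv` module and checks that the text round-trips back to the same records.

```python
import csv
import io

# Hypothetical encoded records: identifiers and trait names are illustrative
# assumptions, not fields prescribed by the source table.
records = [
    {"specimen_id": "SPEC-001", "snout_vent_length_mm": 42.5, "dorsal_pattern": "striped"},
    {"specimen_id": "SPEC-002", "snout_vent_length_mm": 39.0, "dorsal_pattern": "uniform"},
]


def to_csv(rows):
    """Serialize encoded trait records as plain-text CSV (header + one line per record)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


text = to_csv(records)
print(text)
print(len(text.encode("utf-8")), "bytes")  # a few dozen bytes per specimen
print([row["specimen_id"] for row in from_csv(text)])
```

Because every value becomes a short string of characters, the file stays small regardless of how the traits were originally scored, which is the storage advantage the table attributes to encoded data.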
b. Taxonomic data of unique vs. multiple specimens.
These two categories of data differ in the way they are (i) submitted to repositories, (ii) searched for (a particular specimen nested in a multiple-specimen data set has to be detectable using basic search options), and (iii) presented and downloaded on the repository interface. Ideally, for multiple-specimen data sets, it should be possible to download either the data measured for a particular specimen only or the whole data set.
Description
  Unique specimen data: Data or set of data concerning a single specimen. Most often consists of raw taxonomic data (see above).
  Multiple-specimen data sets: Set of data concerning particular trait(s) measured for several specimens. Most often consists of encoded data.
Strengths
  Unique specimen data: Specific (individual) searches are easy.
  Multiple-specimen data sets: A large data set can be submitted at once.
Weaknesses
  Unique specimen data: Case-by-case treatment is unrealistic where specimen numbers are large.
  Multiple-specimen data sets: Stringent searches and data extraction may be compromised by an inadequate data-archiving process.
Examples
  Unique specimen data: Picture(s) of a specimen, complete mitogenome sequence of a given individual.
  Multiple-specimen data sets: Tabular data (.csv), DNA alignment (.fasta, .nex).
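The recommendation that a multiple-specimen data set should let users retrieve either one specimen's data or the whole set can be sketched in Python. This is not the interface of any actual repository: the data set contents, specimen identifiers, and helper names (`rows_for_specimen`, `parse_fasta`) are hypothetical, and only the tabular (.csv) and FASTA cases are handled (NEXUS files would need a dedicated parser).

```python
import csv
import io

# Hypothetical multiple-specimen tabular data set (one row per specimen).
DATASET_CSV = """specimen_id,species,tail_length_mm
SPEC-001,Species A,31.2
SPEC-002,Species A,29.8
SPEC-003,Species B,35.1
"""

# Hypothetical DNA alignment in FASTA format; headers hold specimen identifiers.
ALIGNMENT_FASTA = """>SPEC-001
ACGT-ACGTA
>SPEC-002
ACGTTACGTA
>SPEC-003
ACCT-ACGTA
"""


def load_rows(csv_text):
    """Read the whole tabular data set into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rows_for_specimen(rows, specimen_id):
    """Return only the rows recorded for one specimen."""
    return [row for row in rows if row["specimen_id"] == specimen_id]


def parse_fasta(fasta_text):
    """Map each FASTA identifier (first word after '>') to its full sequence."""
    sequences = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            sequences[current] = ""
        elif current is not None:
            # Sequences may be wrapped over several lines; concatenate them.
            sequences[current] += line
    return sequences


rows = load_rows(DATASET_CSV)
print(len(rows))                            # whole data set
print(rows_for_specimen(rows, "SPEC-002"))  # one specimen only
print(parse_fasta(ALIGNMENT_FASTA)["SPEC-003"])
```

Keying every record by a stable specimen identifier is what makes the single-specimen lookup possible; if the archiving process drops or mangles that identifier, the stringent searches mentioned in the table become unreliable.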