Table 1.
Glossary of commonly used microbiome terms
Term | Definition |
---|---|
16S rRNA gene | Gene encoding the RNA component of the 30S subunit of a prokaryotic ribosome; ubiquitous in bacteria and archaea |
Alpha diversity | The variance within a sample, used to evaluate the number of different species (usually represented by the number of ASVs) in each sample |
Amplicon | The fragment of DNA produced when a primer set is used for PCR amplification |
ASV | Amplicon Sequence Variant: individual sequence variants differing by as little as one nucleotide with no fixed dissimilarity threshold |
Barcoding | Unique DNA sequences attached to broad-range primers before amplification. These unique barcodes allow different samples to be pooled and sequenced together in the same run and later separated during analysis (see demultiplexing) |
Beta diversity | The variance between samples, usually expressed as a distance matrix |
Demultiplexing | Separation of sequencing reads from a sequenced pooled library by unique barcodes and assignment to the corresponding samples |
Evenness | Balance of the features (ASVs, species, etc.) within a sample |
Extraction Controls | Blank or non-DNA samples (such as an empty sponge) added to a study to assess background laboratory contamination (see also library controls and NTC) |
Feature Table | Also known as a count table (or an OTU table when OTUs are used). A matrix containing the number of sequences counted for each feature (most commonly ASV or OTU) per sample |
GUI | Graphical User Interface: Computer program that allows users to “point-and-click” as opposed to typing commands at the command line |
HPC | High-performance computing cluster: A more powerful computing system than a local machine; many universities have a shared HPC for computationally intensive jobs |
Library Controls | Controls included with PCR libraries to assess primer performance and contamination (see NTC) |
Library pooling | Combines barcoded DNA during library preparation to make one pooled sample of DNA for sequencing. Individual identity is maintained through barcoding |
Long-read | DNA fragments generated that are 5 kb or longer, most commonly on a PacBio or Nanopore sequencer |
Metadata | Data that describe the biological data collected, providing the surrounding information needed as context for analysis and interpretation |
Metagenome | Refers to all the genomes represented in a biological mixture |
Mock Community | A bacterial mixture (internally generated or commercially available) with known proportions of bacteria, used to assess sequencing quality and act as a positive control |
NTC | No-template controls: Controls included with PCR libraries to assess primer performance and contamination (see Library controls) |
Normalization | Transformation of raw read numbers to account for uneven read numbers across samples; usually the ASV counts are multiplied by a value or proportion |
OTU | Operational Taxonomic Unit: clusters of sequencing reads that differ by less than a fixed dissimilarity threshold (usually 3%); see also ASV |
Paired-end sequencing | A DNA fragment is sequenced from both ends (each read is usually 100 to 300 bp long) |
Phylogenetic trees | Trees representing the evolutionary relationships between sequences in the sample; can be constructed de novo from only the sequences in a dataset or by comparison with a reference tree |
Pipeline | A collection of tools, programs, and other code that are run in succession to produce results (common pipelines include QIIME2, Mothur, and RCP) |
Rarefying | Randomly subsampling the sequences (ASV or OTU counts) within a sample without replacement to a preselected depth |
Raw reads | Number of reads generated from each sample; due to sequencing inefficiency, this number will not be the same across samples and thus normalization is needed |
Relative abundance | Percentage of a total population attributed to one taxon (such as a phylum or species) in relation to other features in the community |
Richness | Number of different species within a sample, regardless of how they are distributed |
Sample pooling | Combination of raw sample material (such as equal amounts of rumen fluid) or DNA; not to be confused with library pooling, because here no individual identity is maintained |
Short read | DNA fragments generated that range in length from 75 to 300 bp, most commonly on an Illumina sequencer |
Shotgun metagenomics | All DNA within a mixed microbial environment is fragmented and sequenced. Differs from the 16S amplicon approach in that it does not amplify one target but sequences any piece of the genome |
Single-end sequencing | A DNA fragment is sequenced from only one end (reads are usually ~75 to 100 bp long) |
Taxonomy | Represents the identification and classification of each microorganism, represented by an ASV, present in the community; this is distinct from phylogeny, which represents evolutionary relatedness of the ASVs |
V1 to V9 | Hypervariable regions of the 16S rRNA gene that are studied |
V4 | A common hypervariable region for 16S studies, also the target for the Earth Microbiome Project |
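
Several of the quantitative terms in Table 1 (feature table, relative abundance, richness, evenness, rarefying, and beta diversity) can be illustrated with a minimal Python sketch. The feature table, sample names, and counts below are hypothetical and exist only for illustration. Shannon's index with Pielou's evenness and the Bray-Curtis dissimilarity are used here as common choices, not as the metrics prescribed by the glossary. The sketch uses only the standard library and is not a substitute for an established pipeline.

```python
import math
import random

# Hypothetical feature table: one dict of ASV counts per sample.
feature_table = {
    "sample_A": {"ASV1": 50, "ASV2": 30, "ASV3": 20, "ASV4": 0},
    "sample_B": {"ASV1": 10, "ASV2": 10, "ASV3": 10, "ASV4": 10},
}

def relative_abundance(counts):
    """Proportion of the sample's reads assigned to each feature."""
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}

def richness(counts):
    """Number of features observed (count > 0) in the sample."""
    return sum(1 for n in counts.values() if n > 0)

def shannon(counts):
    """Shannon index (natural log), one common alpha diversity metric."""
    props = [p for p in relative_abundance(counts).values() if p > 0]
    return -sum(p * math.log(p) for p in props)

def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(richness)."""
    observed = richness(counts)
    return shannon(counts) / math.log(observed) if observed > 1 else 0.0

def rarefy(counts, depth, seed=0):
    """Randomly subsample reads without replacement to a fixed depth."""
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {feature: 0 for feature in counts}
    for feature in picked:
        rarefied[feature] += 1
    return rarefied

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (0 = identical)."""
    features = set(a) | set(b)
    difference = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    total = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return difference / total

# Rarefy both samples to the same depth before comparing them.
depth = 40
rarefied = {name: rarefy(c, depth) for name, c in feature_table.items()}

for name, counts in rarefied.items():
    print(name, richness(counts), round(pielou_evenness(counts), 3))

# A full beta diversity analysis would fill a distance matrix with one
# such value per pair of samples.
print(round(bray_curtis(rarefied["sample_A"], rarefied["sample_B"]), 3))
```

In this sketch, rarefying acts as the normalization step: both samples are reduced to the same number of reads before richness, evenness, and between-sample dissimilarity are computed.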