2022 Feb 1;100(2):skab346. doi: 10.1093/jas/skab346

Table 1.

Glossary of commonly used microbiome terms

| Term | Definition |
| --- | --- |
| 16S rRNA gene | Gene encoding the RNA component of the 30S subunit of a prokaryotic ribosome; ubiquitous to bacteria and archaea |
| Alpha diversity | The variance within a sample, used to evaluate the number of different species (usually represented by the number of ASVs) in each sample |
| Amplicon | The fragment of DNA resulting from a primer set after amplification using PCR |
| ASV | Amplicon Sequence Variant: individual sequence variants differing by as little as one nucleotide, with no fixed dissimilarity threshold |
| Barcoding | Unique DNA sequences attached to broad-range primers before amplification. These unique barcodes allow different samples to be pooled and sequenced together in the same run and later separated during analysis (see Demultiplexing) |
| Beta diversity | The variance between samples, usually expressed as a distance matrix |
| Demultiplexing | Separation of sequencing reads from a sequenced pooled library by unique barcodes and assignment to the corresponding samples |
| Evenness | Balance of the features (ASVs, species, etc.) within a sample |
| Extraction Controls | Blank or non-DNA samples (such as an empty sponge) added to a study to assess background laboratory contamination (see also Library Controls and NTC) |
| Feature Table | Also known as a count table (or an OTU table, when using OTUs). Table that contains the number of sequences counted for each feature (most commonly ASV or OTU) per sample, arranged as a matrix |
| GUI | Graphical User Interface: computer program that allows users to “point-and-click” as opposed to using the command line |
| HPC | High-performance computing cluster: more powerful computer than a local system; many universities have a shared HPC for computationally intensive jobs |
| Library Controls | Controls included with PCR libraries to assess primer performance and contamination (see NTC) |
| Library pooling | Combines barcoded DNA during library preparation to make one pooled sample of DNA for sequencing. Individual identity is maintained through barcoding |
| Long-read | DNA fragments generated that are 5 kb or longer, most commonly on a PacBio or Nanopore sequencer |
| Metadata | Data describing the biological data collected, that is, the information surrounding the data that provides context for analysis and interpretation |
| Metagenome | Refers to all the genomes represented in a biological mixture |
| Mock Community | A bacterial mixture (internally generated or commercially available) with known proportions of bacteria, used to assess sequencing quality and act as a positive control |
| NTC | No-template controls: controls included with PCR libraries to assess primer performance and contamination (see Library Controls) |
| Normalization | Transformation of raw read numbers to account for uneven read numbers across samples; usually in this method, the ASV numbers are multiplied by a value or proportion |
| OTU | Operational Taxonomic Unit: clusters of sequencing reads that differ by less than a fixed dissimilarity threshold (usually 3%); see also ASV |
| Paired-end sequencing | A DNA fragment is sequenced from both ends (usually 100- to 300-bp long) |
| Phylogenetic trees | Tree representative of the evolutionary relationship between sequences in the sample; can be constructed de novo from only the sequences in a dataset or compared with a reference tree |
| Pipeline | A collection of tools, programs, and other codes that are run in succession to produce results (common pipelines include QIIME2, Mothur, and RCP) |
| Rarefying | Randomly subsampling ASVs or OTUs within a sample without replacement to a preselected depth |
| Raw reads | Number of reads generated from each sample; due to sequencing inefficiency, this number will not be the same across samples and thus normalization is needed |
| Relative abundance | Percentage of a total population attributed to one taxon, such as a phylum or species, in relation to other features in the community |
| Richness | Number of different species within a sample, regardless of how they are distributed |
| Sample pooling | Combination of raw sample material (such as equal amounts of rumen fluid) or DNA (not to be confused with library pooling; here, no individual identity is maintained) |
| Short read | DNA fragments generated that range in length from 75 to 300 bp, most commonly on an Illumina sequencer |
| Shotgun metagenomics | All DNA within a mixed microbial environment is fragmented and sequenced. Differs from the 16S amplicon approach in that it does not amplify one target but sequences any piece of the genome |
| Single-end sequencing | A fragment is sequenced only from one end to the other (usually ~75- to 100-bp long) |
| Taxonomy | Represents the identification and classification of each microorganism, represented by an ASV, present in the community; this is distinct from phylogeny, which represents evolutionary relatedness of the ASVs |
| V1 to V9 | Hypervariable regions studied on the 16S rRNA gene |
| V4 | A common hypervariable region for 16S studies, also the target for the Earth Microbiome Project |
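
Several of the quantitative terms in the glossary (feature table, relative abundance, richness, evenness, rarefying, and beta diversity) can be made concrete with a small worked example. The sketch below is illustrative only: the feature table, sample names, and function names are hypothetical, and two specific metrics are assumed because the glossary does not name any. Evenness is computed as Pielou's J (Shannon index divided by the log of richness), and the between-sample distance is Bray-Curtis dissimilarity. Established pipelines offer many alternatives to both.

```python
import math
import random

# Hypothetical feature table: one entry per sample, mapping ASV -> read count.
feature_table = {
    "sample_A": {"ASV1": 50, "ASV2": 30, "ASV3": 20, "ASV4": 0},
    "sample_B": {"ASV1": 10, "ASV2": 10, "ASV3": 10, "ASV4": 10},
}

def relative_abundance(counts):
    """Proportion of the sample's total reads assigned to each feature."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def richness(counts):
    """Number of features observed at least once in the sample."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over the observed features."""
    props = [p for p in relative_abundance(counts).values() if p > 0]
    return -sum(p * math.log(p) for p in props)

def pielou_evenness(counts):
    """Pielou's J = H / ln(richness); 1.0 means perfectly even (assumed metric)."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

def rarefy(counts, depth, seed=0):
    """Randomly subsample reads without replacement to a preselected depth."""
    pool = [f for f, c in counts.items() for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    picked = random.Random(seed).sample(pool, depth)
    return {f: picked.count(f) for f in counts}

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (assumed distance)."""
    features = set(a) | set(b)
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

if __name__ == "__main__":
    # Rarefy the deeper sample to the depth of the shallower one before comparing.
    depth = min(sum(c.values()) for c in feature_table.values())
    rarefied = {s: rarefy(c, depth) for s, c in feature_table.items()}
    for name, counts in rarefied.items():
        print(name, richness(counts), round(pielou_evenness(counts), 3))
    print("Bray-Curtis:", round(bray_curtis(rarefied["sample_A"], rarefied["sample_B"]), 3))
```

Here the two samples have different raw read totals (100 and 40), so the usage section rarefies both to the smaller depth before computing the within-sample (alpha) and between-sample (beta) measures. With more than two samples, applying `bray_curtis` to every pair would yield the distance matrix mentioned under beta diversity.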