2022 Feb 1;100(2):skab346. doi: 10.1093/jas/skab346

Table 1.

Glossary of commonly used microbiome terms

Term	Definition
16S rRNA gene	Gene encoding the RNA component of the 30S subunit of a prokaryotic ribosome; ubiquitous to bacteria and archaea
Alpha diversity	The variance within a sample, used to evaluate the number of different species (usually represented by the number of ASVs) in each sample
Amplicon	The fragment of DNA resulting from a primer set after amplification using PCR
ASV	Amplicon Sequence Variant: individual sequence variants differing by as little as one nucleotide with no fixed dissimilarity threshold
Barcoding	Unique DNA sequences attached to broad range primers before amplification. These unique barcodes allow different samples to be pooled and sequenced together in the same run and later separated during analysis (see demultiplexing)
Beta diversity	The variance between samples, usually expressed as a distance matrix
Demultiplexing	Separation of sequencing reads from a sequenced pooled library by unique barcodes and assignment to the corresponding samples
Evenness	Balance of the features (ASVs, species, etc.) within a sample
Extraction Controls	Blank or non-DNA samples (such as an empty sponge) added to a study to assess background laboratory contamination (see also library controls and NTC)
Feature Table	Also known as a count table (or an OTU table when the features are OTUs). A matrix containing the number of sequences counted for each feature (most commonly an ASV or OTU) per sample
GUI	Graphical User Interface: Computer program that allows users to “point-and-click” as opposed to typing instructions at the command line
HPC	High-performance computing cluster: A more powerful computer than a local system; many universities have a shared HPC for computationally intensive jobs
Library Controls	Controls included with PCR libraries to assess primer performance and contamination (see NTC)
Library pooling	Combines barcoded DNA during library preparation to make one pooled sample of DNA for sequencing. Individual identity is maintained through barcoding
Long-read	DNA fragments generated that are 5 kb or longer, most commonly on a PacBio or Nanopore sequencer
Metadata	Data describing the biological samples collected and the information surrounding them, providing context for analysis and interpretation
Metagenome	Refers to all the genomes represented in a biological mixture
Mock Community	A bacterial mixture (internally generated or commercially available) with known proportions of bacteria, used to assess sequencing quality and act as a positive control
NTC	No-template controls: Controls included with PCR libraries to assess primer performance and contamination (see Library Controls)
Normalization	Transformation of raw read numbers to account for uneven read numbers across samples; usually, the ASV counts are multiplied by a value or proportion
OTU	Operational Taxonomic Unit: clusters of sequencing reads that differ by less than a fixed dissimilarity threshold (usually 3%); see also ASV
Paired-end sequencing	A DNA fragment is sequenced from both ends (usually 100- to 300-bp long)
Phylogenetic trees	Trees representing the evolutionary relationships between sequences in a sample; they can be constructed de novo from only the sequences in a dataset or by comparison with a reference tree
Pipeline	A collection of tools, programs, and other codes that are run in succession to produce results (common pipelines include QIIME2, Mothur, and RCP)
Rarefying	Randomly subsampling ASVs or OTUs within a sample without replacement to a preselected depth
Raw reads	Number of reads generated from each sample; due to sequencing inefficiency, this number will not be the same across samples and thus normalization is needed
Relative abundance	Percentage of a total population attributed to one taxon such as phyla or species in relation to other features in the community
Richness	Number of different species within a sample, regardless of how they are distributed
Sample pooling	Combination of raw sample material (such as equal amounts of rumen fluid) or DNA. Not to be confused with library pooling: here, no individual identity is maintained
Short read	DNA fragments generated that range in length from 75 to 300 bp, most commonly on an Illumina sequencer
Shotgun metagenomics	All DNA within a mixed microbial environment is fragmented and sequenced. Differs from the 16S amplicon approach in that it does not amplify a single target but sequences any part of the genomes present
Single-end sequencing	A fragment is sequenced only from one end to the other (usually ~75- to 100-bp long)
Taxonomy	Represents the identification and classification of each microorganism, represented by an ASV, present in the community; this is distinct from phylogeny, which represents evolutionary relatedness of the ASVs
V1 to V9	Hypervariable regions studied on the 16S rRNA gene
V4	A common hypervariable region for 16S studies, also the target for the Earth Microbiome Project
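
Several of the terms above (feature table, relative abundance, richness, evenness, and rarefying) describe simple operations on count data, and a small sketch may make them more concrete. The example below is illustrative only: the feature table, the sample and ASV names, and the function names are all hypothetical, and the Shannon index with Pielou's evenness is an assumed choice of metric that the glossary itself does not name. Real analyses would normally rely on an established pipeline rather than hand-written code.

```python
import math
import random

# Hypothetical feature table: counts of each feature (ASV) per sample.
feature_table = {
    "sample_A": {"ASV1": 50, "ASV2": 30, "ASV3": 20},
    "sample_B": {"ASV1": 90, "ASV2": 10, "ASV3": 0},
}


def relative_abundance(counts):
    """Proportion of the sample's total reads attributed to each feature."""
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}


def richness(counts):
    """Number of different features observed (count > 0) in the sample."""
    return sum(1 for count in counts.values() if count > 0)


def shannon_index(counts):
    """Shannon diversity (natural log); an assumed metric, not from the glossary."""
    proportions = [p for p in relative_abundance(counts).values() if p > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(richness); 1.0 is perfectly even."""
    observed = richness(counts)
    if observed < 2:
        return 0.0
    return shannon_index(counts) / math.log(observed)


def rarefy(counts, depth, seed=0):
    """Randomly subsample reads without replacement to a preselected depth."""
    # Expand the counts into one entry per read, then draw `depth` reads.
    pool = [feature for feature, count in counts.items() for _ in range(count)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {feature: 0 for feature in counts}
    for feature in drawn:
        rarefied[feature] += 1
    return rarefied


if __name__ == "__main__":
    for name, counts in feature_table.items():
        print(name, "richness:", richness(counts))
        print(name, "evenness:", round(pielou_evenness(counts), 3))
        print(name, "rarefied to 40 reads:", rarefy(counts, 40))
```

In this toy table, sample_A spreads its reads across three ASVs while sample_B is dominated by one, so sample_A has both higher richness and higher evenness. Rarefying both samples to the same depth gives them equal total read counts, which is one way of handling the uneven raw read numbers described under Normalization.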