mBio. 2015 Dec 8;6(6):e01888-15. doi: 10.1128/mBio.01888-15

TABLE 1.

Glossary of terms used in DNA sequence analysis

| Term | Abbreviation | Definition |
| --- | --- | --- |
| 16S rRNA gene | | A slowly evolving gene in bacteria whose sequence is used to define taxa. It is targeted for sequencing in microbiome analysis, where the goal is enumeration of the taxa present in a community. |
| Alignment | | The process of comparing the sequence of a single sequencing read or a contig/whole genome following assembly to a reference genome. The goal is often to identify the organism from which a sequencing read came or to identify variants within the sequence. |
| Assembly | | Reconstructing a genome, in whole or in part, from the fragment sequences produced by WGS (or mWGS). |
| Contig | | A contiguous stretch of sequence produced when a series of overlapping sequence reads are merged to produce a single longer sequence. |
| Dideoxynucleotide sequencing | | A “classical” method of DNA sequencing that preceded NGS and is frequently called Sanger sequencing. |
| Metagenomics | | Analyzing a mixture of microbial genomes, a metagenome, without separating the genomes or culturing the organisms. |
| Metagenomic whole-genome shotgun sequencing | mWGS | The application of WGS to a metagenomics sample. DNA is extracted from the sample, producing a mixture of genomes, which are then subjected to WGS en masse. |
| Microbiome | | A community of microbes comprising bacteria, viruses, and fungi and other eukaryotic microbes. Often the target of metagenomic analyses. |
| Next-generation sequencing | NGS | A collection of DNA sequencing methods that each use different biochemical approaches and instruments to produce data in vastly larger amounts, at greatly lower cost, in shorter time, and with less manual intervention than previous methods. |
| Reference genome | | A genome sequence of a particular organism that can be used as a standard, e.g., for alignment or comparison of other genomes. |
| Read | | The basic element produced by DNA sequencing. Sequencing of a DNA fragment produces a series of bases called a sequencing read. |
| Sanger sequencing | | A “classical” method of DNA sequencing that preceded NGS and was used almost exclusively from the 1970s until the advent of NGS. Compared to NGS, it produced fewer data, was more expensive, and required more manual work. |
| Single nucleotide polymorphism | SNP | A difference of a single base compared to a reference genome. These can be substitutions of one base for another or insertion/deletion of a base (indel). |
| Variant | | Any difference in a DNA sequence compared to a reference sequence. This can be a single-base difference (SNP) or insertions, deletions, inversions, or translocations of larger stretches of sequence (structural variants). |
| Whole-genome shotgun sequencing | WGS | Randomly fragmenting an entire genome and obtaining DNA sequence from the fragments to produce a collection of random DNA sequences. This can be applied to a single bacterium or to a mixture (metagenomic; see mWGS). These data can be used to identify variants following alignment of genes by comparison to sequence databases or to compare genome structures following assembly. |
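
Several of the terms above (read, contig, assembly, alignment, reference genome, variant) describe steps of one workflow, which the toy sketch below strings together on a 24-base sequence. It is an illustration under loud assumptions, not how production tools work: all function names and the example sequences are hypothetical, reads are cut at a fixed step rather than at random as in real WGS, assembly is a naive greedy suffix-prefix merge, alignment is ungapped, and only single-base substitutions are reported (indels and structural variants are ignored).

```python
def shotgun_reads(genome, read_len, step):
    """Cut overlapping reads from a genome at a fixed step.
    Assumption: real WGS fragments at random; fixed tiling keeps the toy reproducible."""
    starts = list(range(0, len(genome) - read_len + 1, step))
    if starts[-1] != len(genome) - read_len:
        starts.append(len(genome) - read_len)  # make sure the last bases are covered
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads, min_len=3):
    """Greedy assembly: repeatedly merge the pair with the largest overlap into a contig."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


def align(read, reference):
    """Ungapped alignment: (offset, mismatches) of the best placement on the reference."""
    best = None
    for off in range(len(reference) - len(read) + 1):
        window = reference[off:off + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if best is None or mismatches < best[1]:
            best = (off, mismatches)
    return best


def call_substitutions(sample, reference):
    """Single-base substitutions as (0-based position, reference base, sample base).
    Assumes both sequences have equal length and are already aligned end to end."""
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# Hypothetical sample genome and a reference that differs from it at one base.
SAMPLE = "ATGGCGTACCTTAGGCAATCGGAT"
REFERENCE = SAMPLE[:16] + "T" + SAMPLE[17:]

reads = shotgun_reads(SAMPLE, read_len=10, step=5)
contigs = assemble(reads)
variants = call_substitutions(contigs[0], REFERENCE)
```

In this toy case the four reads merge back into a single contig equal to the sample sequence, and comparing that contig to the reference reports one substitution. Real data adds sequencing errors, repeats, and uneven coverage, which is why dedicated assemblers, aligners, and variant callers are used in practice.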