TABLE 1.
Glossary of terms used in DNA sequence analysis
| Term | Abbreviation | Definition |
|---|---|---|
| 16S rRNA gene | | A slowly evolving gene in bacteria whose sequence is used for definition of taxa. It is a gene that is targeted for sequencing in microbiome analysis, where the goal is enumeration of the taxa present in a community. |
| Alignment | | Comparing the sequence of a single sequencing read, or of a contig or whole genome produced by assembly, to a reference genome. The goal is often to identify the organism from which a sequencing read came or to identify variants within the sequence. |
| Assembly | | Reconstructing a genome, in whole or in part, from the fragment sequences produced by WGS (or mWGS). |
| Contig | | A contiguous stretch of sequence produced when a series of overlapping sequence reads are merged to produce a single longer sequence. |
| Dideoxynucleotide sequencing | | A “classical” method of DNA sequencing that preceded NGS and is frequently called Sanger sequencing. |
| Metagenomics | | Analyzing a mixture of microbial genomes, a metagenome, without separating the genomes or culturing the organisms. |
| Metagenomic whole-genome shotgun sequencing | mWGS | The application of WGS to a metagenomics sample. DNA is extracted from the sample, producing a mixture of genomes, which are then subjected to WGS en masse. |
| Microbiome | | A community of microbes comprising bacteria, viruses, fungi, and other eukaryotic microbes. Often the target of metagenomic analyses. |
| Next-generation sequencing | NGS | A collection of DNA sequencing methods that each use different biochemical approaches and instruments to produce data in vastly larger amounts, at greatly lower cost, in shorter time, and with less manual intervention than previous methods. |
| Reference genome | | A genome sequence of a particular organism that can be used as a standard, e.g., for alignment or comparison of other genomes. |
| Read | | The basic element produced by DNA sequencing. Sequencing of a DNA fragment produces a series of bases called a sequencing read. |
| Sanger sequencing | | A “classical” method of DNA sequencing that preceded NGS and was used almost exclusively from the 1970s until the advent of NGS. Compared to NGS, it produced fewer data, was more expensive, and required more manual work. |
| Single nucleotide polymorphism | SNP | A difference of a single base compared to a reference genome. These can be substitutions of one base for another or insertion/deletion of a base (indel). |
| Variant | | Any difference in a DNA sequence compared to a reference sequence. This can be a single-base difference (SNP) or insertions, deletions, inversions, or translocations of larger stretches of sequence (structural variants). |
| Whole-genome shotgun sequencing | WGS | Randomly fragmenting an entire genome and obtaining DNA sequence from the fragments to produce a collection of random DNA sequences. This can be applied to a single bacterium or to a mixture (metagenomic; see mWGS). These data can be used to identify variants following alignment of genes by comparison to sequence databases or to compare genome structures following assembly. |
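
Several of the terms above (read, contig, reference genome, SNP) can be illustrated with a toy example. The sketch below is only an illustration under strong simplifying assumptions: the reads and reference are invented, the reads are error-free and given in order, and overlaps are exact. The helper names `merge_reads` and `substitutions` are hypothetical and do not correspond to any particular assembler or variant caller.

```python
from typing import List, Optional, Tuple


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two reads if a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases exists. The longest overlap is preferred.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


def substitutions(reference: str, sequence: str, offset: int = 0) -> List[Tuple[int, str, str]]:
    """List (position, reference base, observed base) for single-base mismatches.

    Assumes `sequence` is already aligned to `reference` starting at
    `offset`, with no insertions or deletions.
    """
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(sequence)
        if reference[offset + i] != base
    ]


# Three invented, overlapping reads from a shotgun-style fragmentation.
reads = ["ACGTTGCA", "TGCAGGTC", "GGTCATTA"]

# Greedily extend a contig by merging each read onto its right end.
contig = reads[0]
for read in reads[1:]:
    merged = merge_reads(contig, read)
    if merged is not None:
        contig = merged

# An invented reference genome that differs from the contig at one base.
reference = "ACGTTGCAGGACATTA"
variants = substitutions(reference, contig)

print(contig)    # the assembled contig
print(variants)  # single-base substitutions relative to the reference
```

Here the three reads overlap by four bases each and merge into one 16-base contig, and comparing that contig to the reference reports a single substitution. Real assembly and alignment tools must also handle sequencing errors, reverse-complement reads, repeats, and indels, none of which this sketch attempts.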