2020 Aug 13;12:71. doi: 10.1186/s13073-020-00765-y

Box 1.

Terminology for microbial community strain analysis

Strikingly, there is no universal definition of what constitutes a microbial strain (or, for that matter, species) [2, 3]. Many factors contribute to this difficulty, including the rapidity of microbial evolution, the plasticity of many microbial genomes, the prevalence of mobile elements and lateral transfers, the difficulty in differentiating between many microbial taxa or clades by non-molecular methods, and the overall natural history of microbiology and microbial systematics. This ambiguity has led to a field in which different microbial strains of the same species can differ by as much as 5% in nucleotide identity, or in 30% or more of their gene content [4]. As such, even apparently benign, phenotypically similar microbial strain variants can differ genomically more than most eukaryotic species, and most related terminology is either context-dependent or defined operationally:

Species: microbial species have been variously defined based on (1) whole-genome or pangenome nucleotide or amino acid phylogenetic identity thresholds; (2) gross microbial physiology, morphology, or phenotype; (3) phenotypes induced by a microbe on its host or environment (e.g., human pathogens); and (4) the host or environment of a microbe, e.g., a specific geographical or biochemical origin [5]. The more than 100-year history of microbial systematics must thus be constantly reconciled with new and emerging molecular and phenotypic information, leading to operational definitions of microbial species in roughly two categories: “clades defined as species at some previous point” versus “clades that meet specific quantitative phylogenetic criteria” [6]. These two definitions can be considered roughly equivalent if phylogeny (genotype) is considered to be a trait (i.e., a phenotype) by which isolates or community members can be classified into self-similar groups.

Species group or complex: a group of taxonomically defined species that are not well-differentiated based on genomic or other criteria [7]. These typically arise in microbial systematics due to multiple independent identifications of what later prove to be (essentially) the same organism. Conversely, individual taxonomically defined microbial species can later prove to represent implicit complexes, if they, e.g., are not initially differentiated by physiology but are later found to be molecularly distinct.

Subspecies clade: in communities, an operationally defined group of related organisms or radius of phylogenetic divergence smaller than, and contained within, a parent species [8]. This allows microbial genotypes within communities to be manipulated independently of their potential systematics, since, e.g., some taxonomically defined species may unintentionally capture widely divergent genotypes (and are thus better described using multiple subspecies clades), while others may prove to be closely related or near-identical (and are thus better described as a single species complex). Historically, subspecies have also referred to phenotypically distinct groups within a species [5], which may or may not be monophyletic.

Isolate: a presumed clonal strain grown, assayed, and manipulated (presumably) axenically (i.e., in monoculture), typically in vitro, after a process such as streaking and/or colony picking [9]. As per canonical references such as Bergey’s Manual [10], when not defined genomically, isolates have been commonly differentiated based on phenotypes such as morphology; medium specificity; serologic, phage, or bacteriocin sensitivity; biochemical reactions; pathogenicity; or other microbial physiology.

Strain: historically, this has meant a microbial isolate, although that definition is not well-suited to microbial community studies. In community studies, the term is used variously to refer to a specific microbial genome or collection of clonally identical cells (i.e., a genotype); one or more colonies (believed to be) derived from the same progenitor cell; or most often, in practice, a collection of cells or genomes within a relatively small range of phylogenetic variation (i.e., a very narrow subspecies clade).
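
To make the operational definitions above concrete, the following is a minimal Python sketch of how genomes might be compared and grouped by a quantitative divergence radius. It is an illustration under stated assumptions, not the method of any cited work: the function names (`nucleotide_divergence`, `gene_content_difference`, `group_into_clades`), the toy sequences, the gene sets, and the radius values are all hypothetical, and the sequences are assumed to be pre-aligned and of equal length. Grouping is done by simple single-linkage clustering, which is only one of several reasonable choices.

```python
from itertools import combinations


def nucleotide_divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned sequences.

    Assumes equal-length, already aligned sequences; columns containing a
    gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) positions")
    return mismatches / compared


def gene_content_difference(genes_a: set, genes_b: set) -> float:
    """Jaccard distance: share of the combined gene set not common to both."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return len(genes_a ^ genes_b) / len(union)


def group_into_clades(sequences: dict, radius: float) -> list:
    """Single-linkage grouping of genomes whose divergence is <= radius.

    A small radius yields narrow, strain-like groups; a larger radius
    yields broader subspecies- or species-like groups.
    """
    parent = {name: name for name in sequences}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if nucleotide_divergence(sequences[a], sequences[b]) <= radius:
            parent[find(a)] = find(b)

    clades = {}
    for name in sequences:
        clades.setdefault(find(name), set()).add(name)
    return sorted(clades.values(), key=sorted)


# Toy, hypothetical data purely for illustration.
toy_genomes = {
    "g1": "ACGTACGTACGTACGTACGT",
    "g2": "ACGTACGTACGTACGTACGA",
    "g3": "TTTTACGTACGTACGTTTTT",
}
print(group_into_clades(toy_genomes, radius=0.06))
print(gene_content_difference({"geneA", "geneB", "geneC"}, {"geneA", "geneB", "geneD"}))
```

Because the radius is a free parameter, the same set of genomes can be reported as one clade or several depending on the value chosen, which mirrors the context-dependence of "strain" and "subspecies clade" described in this box.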