Abstract
Structural variation in bacterial genomes is an important evolutionary driver. Genomic rearrangements, such as inversions, duplications, and insertions, can regulate gene expression and promote niche adaptation. Importantly, many of these variations are reversible and preprogrammed to generate heterogeneity. While many tools have been developed to detect structural variation in eukaryotic genomes, variation in bacterial genomes and metagenomes remains understudied. However, recent advances in genome sequencing technology and the development of new bioinformatic pipelines hold promise for advancing our understanding of microbial genomics.
Next generation sequencing and other high throughput technologies have dramatically advanced the field of microbiology over the last two decades. Whereas bacterial genomes were once thought to be fairly static over timescales of days to weeks, high throughput sequencing has revealed that they are, in fact, highly plastic. Bacteria can adapt rapidly to environmental fluctuations by modulating genomic content through genomic rearrangements, some of which are reversible. The functional consequences of this type of genomic adaptation have been elegantly demonstrated in many bacterial pathogens, particularly in genomic regions involved in host-microbe and microbe-microbe interactions. However, many questions remain about exactly where, when, and how frequently these variations occur.
In the first part of this review, we highlight key examples of structural variation (SV) in bacteria and describe how these genomic alterations affect phenotype. We then discuss the challenges of discovering new variants using next generation sequencing technology. Finally, we highlight how long read sequencing can identify novel genomic variation in both isolates and complex communities.
Mechanisms of SV in bacterial genomes
In bacteria, SV is typically defined as a rearrangement that occurs in a region of the genome greater than 50 base pairs in size. While the different types of SV can be defined in many different ways, we will break down the most common into three separate, broad categories: inversion, duplication, and insertion.
Inversion
An inversion occurs when a segment of DNA is excised and recombined back in the opposite orientation. The inverted region is generally flanked by highly homologous sequences or shared recognition sites, and inversions are primarily driven either by specific enzymes or by prophage activation. In many cases, these inversions are reversible and appear to be part of a preprogrammed scheme to generate heterogeneity, alter gene expression, or adapt to specific niches. Two types of inversion are widely observed in bacteria: site specific recombination and large chromosomal inversions.
Site specific recombination is a targeted, preprogrammed, and reversible inversion of genomic segments, which range in size from hundreds to thousands of base pairs. The process is driven by recombinases that recognize two flanking sequences that are reverse complements of one another, often referred to as inverted repeats, which are typically 10 to 30 base pairs long and may contain mismatches [1–5]. In bacteria, site specific recombination is mediated by tyrosine or small serine-type recombinases, named for the catalytically active amino acid that carries out the nucleophilic attack on the DNA backbone [4,6,7]. Enzyme complex activity can be uni- or bidirectional and may be influenced by additional small proteins or accessory factors [2]. Depending on the prevalence of the recognition sites, recombinases can act on a single locus or at multiple loci.
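To make the geometry of an inversion concrete, the following minimal Python sketch scans a sequence for candidate inverted repeat pairs and then flips the segment between them. It assumes perfect matches of a fixed length `k`, a toy input sequence, and hypothetical gap limits (`min_gap`, `max_gap`); real recombination sites may contain mismatches, so this illustrates the idea rather than serving as a detection tool.

```python
# Minimal sketch: find perfect inverted repeats (IRs) and invert the region
# between them. Assumes exact matches of fixed length k; real recombinase
# sites (roughly 10-30 bp) may contain mismatches.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, k: int, min_gap: int, max_gap: int):
    """Return (left_start, right_start) pairs where the k-mer at right_start
    is the reverse complement of the k-mer at left_start and the two are
    separated by min_gap..max_gap bp of intervening sequence."""
    hits = []
    for i in range(len(seq) - k + 1):
        target = revcomp(seq[i:i + k])
        first = i + k + min_gap
        last = min(i + k + max_gap, len(seq) - k)
        for j in range(first, last + 1):
            if seq[j:j + k] == target:
                hits.append((i, j))
    return hits


def invert_between(seq: str, left_start: int, right_start: int, k: int) -> str:
    """Reverse-complement the span from the left IR through the right IR.
    Because the IRs are reverse complements of each other, they are
    preserved and only the intervening segment changes orientation."""
    end = right_start + k
    return seq[:left_start] + revcomp(seq[left_start:end]) + seq[end:]


# Toy example: IR "GACTC" ... "GAGTC" flanking a 6 bp segment.
seq = "TT" + "GACTC" + "AACCGG" + "GAGTC" + "TT"
pairs = find_inverted_repeats(seq, k=5, min_gap=3, max_gap=20)
flipped = invert_between(seq, 2, 13, k=5)
print(pairs)
print(flipped)  # TTGACTCCCGGTTGAGTCTT
```

Applying `invert_between` a second time restores the original sequence, mirroring the reversibility of these switches; in a promoter ON/OFF switch, the flipped middle segment would be the one carrying the promoter.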
Site specific recombination is widespread and found across many bacterial phyla [8,9]. Most well characterized inversions flip regulatory regions or promoter sequences, forming an ON/OFF switch: when the promoter is oriented toward the downstream locus, transcription occurs, but when it is flipped to face the opposite direction, transcription of that locus stops. Site specific recombination can also occur within and across coding regions, resulting in domain switching [10–13]. When this occurs, enzymes can change specificity by swapping modular specificity-determining domains (Fig. 1A). Many well studied examples of site specific recombination lie in regions important for host-microbe and microbe-microbe interactions [1]. For example, the phase variable capsular polysaccharides of Bacteroides both modulate the immune response and protect against phage predation (Round et al. 2011; Porter et al. 2020).
Large chromosomal inversions span regions greater than 10,000 base pairs. Like regions subject to site specific recombination, they often have homologous sequences at their ends; however, unlike site specific recombination, they are hypothesized to be driven by prophages or to result from faulty recombination [14–17]. Intriguingly, these large chromosomal rearrangements can facilitate niche adaptation and, in some instances, may be reversible [18]. They are also fairly common among clinical isolates [19].
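Large inversions are often recognized by comparing the order and orientation of genes (or aligned blocks) between a reference and an isolate genome. The Python sketch below assumes each genome has already been reduced to an ordered list of signed gene identifiers, a hypothetical and simplified input; the gene IDs and 3 kb gene lengths are toy values. It recognizes only a single inverted block and is not a general rearrangement solver.

```python
# Minimal sketch: detect a single inversion by comparing signed gene orders.
# Gene IDs, orientations, and lengths below are hypothetical toy values; a
# negative ID means the gene lies on the opposite strand relative to the
# reference.

LARGE_INVERSION_BP = 10_000  # size threshold used in the text


def find_single_inversion(reference, isolate):
    """Return (i, j) if isolate[i:j+1] equals reference[i:j+1] reversed with
    signs flipped, i.e. a single inverted block; otherwise return None."""
    if len(reference) != len(isolate):
        return None
    diffs = [n for n, (r, q) in enumerate(zip(reference, isolate)) if r != q]
    if not diffs:
        return None  # identical gene order
    i, j = diffs[0], diffs[-1]
    inverted = [-g for g in reversed(reference[i:j + 1])]
    return (i, j) if isolate[i:j + 1] == inverted else None


def inversion_span_bp(reference, block, gene_lengths):
    """Approximate the inverted span as the summed length of its genes
    (intergenic spacers are ignored in this toy model)."""
    i, j = block
    return sum(gene_lengths[abs(g)] for g in reference[i:j + 1])


reference = list(range(1, 11))                  # genes 1..10, all on + strand
isolate = [1, 2, -6, -5, -4, -3, 7, 8, 9, 10]   # genes 3..6 inverted
gene_lengths = {g: 3_000 for g in reference}    # hypothetical 3 kb genes

block = find_single_inversion(reference, isolate)
span = inversion_span_bp(reference, block, gene_lengths)
print(block, span, span > LARGE_INVERSION_BP)  # (2, 5) 12000 True
```

Genomes carrying several overlapping rearrangements need genome-scale alignment and careful breakpoint analysis; this sketch only shows the signature of the simplest case.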
Figure 1: SV reversibly alters restriction modification and methylation enzymes.

Restriction modification systems can both methylate and cleave DNA. DNA cleavage is one of the main mechanisms of bacteriophage defense, whereas DNA methylation patterns are important for both self DNA recognition and transcriptional regulation. The restriction modification locus contains hsdR (restriction), hsdM (modification), and hsdS (specificity) genes. The hsdS gene encodes two target recognition domains (TRDs) that determine the specificity of the enzyme complex, and it can be reversibly modified to alter that specificity. (A) In loci that vary by site specific recombination, multiple copies of hsdS are present, but generally only one copy (hsdSA) is transcribed [13]. The additional copies (hsdSB, hsdSC) function as alternative alleles. Site specific recombinases flip DNA sequences between inverted repeats (dashed lines) and shuffle TRDs between the transcribed and non-transcribed hsdS genes [66]. Invertible systems have been observed in many bacteria, including Streptococcus pneumoniae [12] and Bacteroides fragilis [67]. (B) In other organisms, hsdS specificity varies by slipped-strand mispairing. In many hsdS genes, simple nucleotide repeats separate the TRDs [32]. Increases or decreases in the number of repeats can cause premature truncation, and truncated HsdS proteins form dimers with altered specificity. A well characterized example of slipped-strand mispairing is found in Neisseria gonorrhoeae [68].
Duplication
Genomic duplications occur when regions of the genome are copied. Duplication is found across all domains of life and can account for a large fraction of an organism’s genome [20]. Duplications that change the number of copies of a gene are often referred to as copy number variation.
Evolutionarily, gene duplication can have several outcomes. If a higher level of protein production is advantageous, identical gene copies are maintained. For example, many bacteria harbor multiple copies of ribosomal RNA operons, which likely enables them to ramp up protein production quickly when rapid adaptation to a changing environment is needed [21]. Duplication followed by sequence divergence can also result in functional specialization. Alternatively, if multiple gene copies are detrimental, pseudogenization or gene loss can occur.
In contrast to genomic duplications that are preserved over the long term, short term adaptive amplification has also been described. Often referred to as multicopy duplication or gene accordions, rapid copy generation has been observed transiently in response to stress [22–25]. The rapid accumulation of identical gene copies increases the amount of protein produced. Adaptive amplification can confer antibiotic resistance, promote niche adaptation, and alter host-microbe interactions [26,27]. The exact mechanisms that drive this phenomenon in vivo are not known, but it is postulated to be driven by homologous recombination after the initial duplication event [25].
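Amplifications such as gene accordions are commonly inferred from elevated sequencing depth. The Python sketch below shows the arithmetic under simplifying assumptions: coverage is otherwise uniform, the median window depth represents single-copy sequence, and the window depths and 1.8-copy threshold are hypothetical. In a heterogeneous population, a non-integer estimate may reflect a mixture of cells with different copy numbers rather than a single genotype.

```python
# Minimal sketch: estimate copy number from read depth, assuming coverage is
# otherwise uniform (no GC or replication-origin bias) and that the median
# window depth reflects a single-copy region.
from statistics import median


def estimate_copy_number(window_depths, baseline=None):
    """Return depth / baseline for each window (baseline defaults to the median)."""
    if baseline is None:
        baseline = median(window_depths)
    return [d / baseline for d in window_depths]


def amplified_windows(window_depths, min_copies=1.8):
    """Indices of windows whose estimated copy number is at least min_copies.
    The 1.8 default is a hypothetical threshold that tolerates some noise."""
    copies = estimate_copy_number(window_depths)
    return [i for i, c in enumerate(copies) if c >= min_copies]


# Hypothetical mean depths in consecutive windows; windows 4-5 look amplified.
depths = [30, 32, 29, 31, 90, 95, 30, 28]
print([round(c, 2) for c in estimate_copy_number(depths)])
print(amplified_windows(depths))  # [4, 5]
```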
Finally, slipped-strand mispairing, which can occur in regions of repetitive DNA sequences, can result in expansion or deletion of repeated sequences of DNA. These repetitive elements can influence downstream gene expression by altering regulatory elements, causing frameshifts, or generating alternative start sites [28–32] (Fig. 1B).
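Slipped-strand mispairing changes the number of copies of a short repeat unit. Within a coding sequence, whether such a change shifts the reading frame depends on whether the net length change is a multiple of three. The Python sketch below illustrates that arithmetic; the repeat unit and sequence are hypothetical, and in promoter regions a repeat change would instead alter spacing rather than the frame.

```python
# Minimal sketch: count a tandem repeat and decide whether a change in repeat
# number preserves the reading frame. The repeat unit and sequence below are
# hypothetical; the frame logic applies only to repeats inside coding regions.


def count_tandem_repeats(seq: str, unit: str, start: int) -> int:
    """Count consecutive copies of `unit` beginning at position `start`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    copies = 0
    while seq.startswith(unit, start + copies * len(unit)):
        copies += 1
    return copies


def frame_effect(unit: str, copies_before: int, copies_after: int) -> str:
    """Classify a repeat-number change as unchanged, in-frame, or frameshift."""
    delta_bp = (copies_after - copies_before) * len(unit)
    if delta_bp == 0:
        return "unchanged"
    # Python's % returns a non-negative result, so contractions work too.
    return "in-frame" if delta_bp % 3 == 0 else "frameshift"


seq = "ATG" + "CAAT" * 4 + "GG"
n = count_tandem_repeats(seq, "CAAT", start=3)
print(n)                               # 4
print(frame_effect("CAAT", n, n + 1))  # frameshift (4 bp gained)
print(frame_effect("AGC", 5, 4))       # in-frame (3 bp lost)
```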
Insertion
Insertion is the addition of a DNA sequence into a region of the genome where it was not previously present. Many different types of insertion can occur in bacteria. Some involve DNA taken up from outside the cell via conjugation or natural competence [33], while others involve the duplication or movement of genetic elements from one position in the genome to another. Insertion can be carried out by homologous or non-homologous recombination. The consequences of insertion vary depending on the insertion site and the elements carried within the inserted sequence.
Mobile genetic elements are a main driving force of insertional events. Common types of these elements include insertion sequences (which encode transposases), integrons, prophages, and transposons. Mobile genetic elements can excise themselves and move in a site-specific or non-specific manner throughout the host genome [34–36]. Additionally, many mobile genetic elements carry cargo genes that confer antibiotic resistance, toxin production, or enhanced metabolic capabilities [35]. The downstream consequences of insertion largely depend on where in the genome the mobile element inserts. If insertion is within a coding region, that gene may become inactivated. Alternatively, many mobile elements contain regulatory elements or promoters, which may upregulate adjacent loci [37,38].
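Because the consequences of an insertion depend on where it lands, a first-pass annotation step is often to place each insertion site relative to nearby genes. The Python sketch below does this for a single position; the `Gene` record, gene names, coordinates, and the 200 bp upstream window are hypothetical, and whether an element actually drives expression depends on its own sequence.

```python
# Minimal sketch: classify where an insertion falls relative to annotated
# genes. Gene names, coordinates, and the 200 bp "upstream" window are
# hypothetical; real promoter effects depend on the element's sequence.
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_insertion(pos, genes, upstream_window=200):
    """Return (category, gene_name) for an insertion at position `pos`."""
    for g in genes:
        if g.start <= pos <= g.end:
            return ("coding", g.name)  # may disrupt (inactivate) the gene
    for g in genes:
        # "Upstream" depends on strand: before start on +, after end on -.
        if g.strand == "+" and g.start - upstream_window <= pos < g.start:
            return ("upstream", g.name)  # may alter expression
        if g.strand == "-" and g.end < pos <= g.end + upstream_window:
            return ("upstream", g.name)
    return ("intergenic", None)


genes = [Gene("geneA", 100, 900, "+"), Gene("geneB", 1500, 2400, "-")]
for site in (500, 50, 2500, 1200):
    print(site, classify_insertion(site, genes))
```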
An additional type of insertional event is translocation, in which genetic material is transferred from one part of the chromosome to another. Like inversions, translocations are balanced and do not result in a net loss or gain of genomic content. These rearrangements may, however, be biased to occur in specific locations of the genome (Darling et al. 2008). Phages are a common source of both insertion and translocation events in many bacterial genomes [39]. Depending on the insertion site and the region translocated, transcription and expression patterns may be altered both inside and outside the translocated region (Nagy-Staron et al. 2021).
Challenges of detecting SV in bacterial genomes
Despite the demonstrated importance of SV, these phenomena remain understudied in bacteria and especially in the context of microbiomes. In particular, the prevalence of SV across the domain Bacteria and its resulting effects on phenotype and evolution are not well understood.
A recent attempt to systematically identify invertible sequences in bacteria revealed their presence in species spanning 10 bacterial phyla, whereas only a handful of examples were known previously [8]. A general search for deletions and duplications in bacteria from the human microbiome revealed thousands of events across 56 bacterial species, with at least one event in every species analyzed [40]. An analysis of SVs in gut microbiome samples from 1,437 individuals revealed strong associations between bacterial SVs and host bile acid composition [41]. Despite these recent studies demonstrating the prevalence and potential importance of these events, SVs have received relatively little attention compared with single nucleotide polymorphisms [42].
The neglect of SV is, in part, due to the technical difficulty of identifying SVs from short read sequencing data. Short reads and sequencing library insert sizes are much shorter than many structural variants, which limits the accuracy and the range of methods available for identifying these variants from short read data. In particular, de novo assembly from short reads often fails to assemble through the repetitive and low complexity regions that frequently flank SVs, which can make structural variant identification ambiguous (for example, making it difficult to distinguish an insertion from a tandem duplication [43]).
Detecting SV via short read sequencing
Despite these challenges, many effective short read-based approaches for SV detection exist, although most are targeted towards eukaryotic genomes. We highlight a subset of these approaches in this review (Table 1); many more have been developed [49,62]. SV detection methods generally fall into two categories: (i) mapping based approaches and (ii) assembly based approaches [43]. Assembly based approaches build either an assembly graph or a de novo assembly for a given sample and compare it with a reference genome, identifying SVs as differences between the assemblies. Cortex [44] and SGVar [45] are successful examples of assembly based approaches, although both were developed and tested on eukaryotic isolate sequencing samples. Mapping based approaches align raw sequencing reads to a reference assembly and identify potential SVs as regions with unexpected mapping patterns. For example, LUMPY [46] and Manta [47] analyze read depth, paired end read discordance, and split reads to identify potential SVs. MGEfinder, which is optimized for bacterial genomes, looks for clipped read alignments to identify possible insertions [48]. Most individual approaches specialize in, or perform better at, detecting specific types of SVs, so combining a suite of approaches to broaden the range of detectable SVs is common; however, the particular combination should be chosen carefully [49].
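Paired end read discordance, one of the signals mentioned above, can be illustrated with a single read pair. The Python sketch below assumes a forward-reverse (FR) paired-end library with a known insert size mean and standard deviation, and it ignores wraparound on circular chromosomes. The `Read` type, labels, and 3 SD cutoff are hypothetical, and this is a simplified reading of pair geometry, not how LUMPY, Manta, or any other tool is implemented.

```python
# Simplified sketch of read-pair signals used by mapping-based SV callers,
# assuming a forward-reverse (FR) paired-end library. This is illustrative
# logic only, not the implementation of LUMPY, Manta, or any other tool;
# real callers cluster many pairs and combine them with split-read and
# read-depth evidence.
from typing import NamedTuple


class Read(NamedTuple):
    contig: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_pair(r1, r2, mean_insert, sd_insert, n_sd=3.0):
    """Return the SV signal suggested by one aligned read pair."""
    if r1.contig != r2.contig:
        return "inter_contig"            # e.g. translocation or plasmid junction
    left, right = sorted((r1, r2), key=lambda r: r.start)
    if left.strand == right.strand:
        return "inversion"               # both reads on the same strand
    if left.strand == "-" and right.strand == "+":
        return "tandem_duplication"      # outward-facing (RF) pair
    insert = right.end - left.start      # fragment length implied by mapping
    if insert > mean_insert + n_sd * sd_insert:
        return "deletion"                # reads map farther apart than expected
    if insert < mean_insert - n_sd * sd_insert:
        return "insertion"               # reads map closer together than expected
    return "concordant"


# Hypothetical library with a 400 bp mean insert and 50 bp standard deviation.
print(classify_pair(Read("chr", 1000, 1150, "+"), Read("chr", 3850, 4000, "-"), 400, 50))
print(classify_pair(Read("chr", 1000, 1150, "+"), Read("chr", 1300, 1450, "+"), 400, 50))
```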
Table 1:
Selected computational SV detection methods described in this review. For more methods, see [49] and [62].
| Name | Publication | Approach | Types of SV | Long-read compatible | Benchmarked in bacteria | Benchmarked in shotgun metagenomes |
|---|---|---|---|---|---|---|
| Cortex | [44] | Assembly | Insertions, deletions | | | |
| SGVar | [45] | Assembly | Insertions, deletions | | | |
| LUMPY | [46] | Mapping | Insertions, deletions, duplications, inversions, translocations | | | |
| Manta | [47] | Mapping | Insertions, deletions, duplications, inversions, translocations | | | |
| MGEfinder | [48] | Mapping | Insertions | | yes | |
| PhaseFinder | [8] | Mapping | Inversions | | yes | yes |
| SGVFinder | [40] | Mapping | Duplications | | yes | yes |
| breseq | [69] | Mapping | Insertions, deletions | | yes | |
| cuteSV | [61] | Mapping | Insertions, deletions, duplications, inversions, translocations | yes | | |
| Sniffles | [50] | Mapping | Insertions, deletions, duplications, inversions, translocations | yes | | |
| combiSV | [62] | Multi-method | Insertions, deletions, duplications, inversions, translocations | yes | | |
Detecting SV in metagenomic sequencing data
Shotgun metagenomic sequencing is an indispensable tool in bacterial genomics, and metagenomes offer a unique opportunity to observe the full range of SVs present in natural bacterial populations. However, certain characteristics of metagenomes, relative to isolate genomes, require methods that are either developed with metagenomes in mind or benchmarked appropriately. A metagenomic sample often contains a mixed population of closely related organisms from the same species or strain, so a given SV may be present in only part of the population. In addition, reads mis-mapped from related species may inflate false positive rates. A subset of SV detection methods have been designed for metagenomic data, including PhaseFinder for identifying inversions [8] and SGVFinder for deletions and duplications [40]. Methods that consider low frequency alleles, such as Sniffles [50], may also be applicable.
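Because a metagenome may contain a mixture of orientations at an invertible locus, a natural summary is the proportion of reads supporting each orientation. The Python sketch below is a simplified illustration of that idea, not PhaseFinder's implementation: it assumes reads have already been aligned to both orientations and counted, and the region names, counts, 10-read minimum, and 5%/95% "mixed" bounds are all hypothetical.

```python
# Minimal sketch: summarize, for each candidate invertible region, the
# fraction of reads that support the inverted (reverse) orientation, assuming
# reads have already been aligned to both orientations and counted. Region
# names, counts, and thresholds are hypothetical.


def reverse_fraction(forward, reverse, min_reads=10):
    """Fraction of orientation-informative reads supporting the reverse
    orientation, or None if there are too few reads to say."""
    total = forward + reverse
    if total < min_reads:
        return None
    return reverse / total


def summarize(counts, min_reads=10, mixed_bounds=(0.05, 0.95)):
    """Label each region as forward, reverse, mixed, or low_coverage."""
    lo, hi = mixed_bounds
    labels = {}
    for region, (fwd, rev) in counts.items():
        frac = reverse_fraction(fwd, rev, min_reads)
        if frac is None:
            labels[region] = ("low_coverage", None)
        elif frac < lo:
            labels[region] = ("forward", frac)
        elif frac > hi:
            labels[region] = ("reverse", frac)
        else:
            labels[region] = ("mixed", frac)  # both orientations present
    return labels


counts = {"inv1": (90, 10), "inv2": (0, 50), "inv3": (3, 2), "inv4": (200, 1)}
for region, label in summarize(counts).items():
    print(region, label)
```

A "mixed" label is where metagenomes are most informative, since it suggests that the population carries both orientations at once.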
Detecting SV via long-read sequencing
Long read sequencing technologies offer the potential to expand our ability to identify and characterize SV. For example, long read sequencing revealed SVs in the human genome that had been missed for years by short read sequencing [51]. Long reads can span entire SV events, including large, multi-kb events, and assemble through flanking repetitive regions, enabling precise SV detection, simplifying detection approaches, and even resolving SV events undetectable with short read sequencing data [52–54].
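The advantage of spanning an event can be stated as a simple length condition: a read resolves an SV unambiguously only if it covers the whole event plus enough unique flanking sequence on each side to anchor its alignment. The Python sketch below checks this for a few reads; the event coordinates, read intervals, and 500 bp anchor length are hypothetical illustration values.

```python
# Minimal sketch: decide whether an aligned read fully spans an SV event with
# enough flanking sequence on each side to anchor it. Coordinates and the
# 500 bp anchor length are hypothetical illustration values.


def spans_event(read_start, read_end, event_start, event_end, min_anchor=500):
    """True if the read covers the event plus min_anchor bp on both sides."""
    return (read_start <= event_start - min_anchor
            and read_end >= event_end + min_anchor)


def min_spanning_read_length(event_start, event_end, min_anchor=500):
    """Shortest read that could span the event with the required anchors."""
    return (event_end - event_start) + 2 * min_anchor


event = (10_000, 15_000)                     # a hypothetical 5 kb event
reads = [(10_000, 10_150),                   # 150 bp short read
         (8_000, 18_000),                    # 10 kb long read
         (9_600, 15_400)]                    # long read with only 400 bp anchors
n_spanning = sum(spans_event(s, e, *event) for s, e in reads)
print(n_spanning)                            # 1
print(min_spanning_read_length(*event))      # 6000
```

No 150 bp read can satisfy this condition for a 5 kb event, which is why short read methods must instead rely on indirect signals such as discordant pairs and read depth.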
Two technologies currently dominate long read sequencing [55]: Oxford Nanopore and PacBio single molecule real time (SMRT) sequencing. SMRT sequencing uses a polymerase tethered to the bottom of a well, and a fluorescent signal is detected as each nucleotide is incorporated [56]. Nanopore sequencing passes a DNA molecule through a protein nanopore, and the nucleotide sequence is inferred from the resulting fluctuations in ionic current [57]. Both technologies have historically suffered from sequencing error rates much higher than those of short read sequencing. A high error rate complicates the detection of some SVs, particularly rare and low coverage events. In nanopore data, indels and substitutions introduced by sequencing error are frequent but not uniformly distributed; homopolymers are particularly difficult to resolve [58]. SMRT sequencing similarly struggles with resolving homopolymers [59]. Despite the issues posed by higher error rates, long reads are invaluable for detecting SV events currently missed by short read sequencing, and long read technologies continue to improve, with read-level error rates reportedly reduced to below 1% for both SMRT [59] and nanopore [60] sequencers.
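Since homopolymers are a known weak point for both platforms, one practical step is to locate homopolymer runs so that SV breakpoints falling near them can be inspected more carefully. The Python sketch below does this with a regular expression; the 5 bp minimum run length and the window size are hypothetical choices, not published thresholds.

```python
# Minimal sketch: locate homopolymer runs, which are error-prone for both
# nanopore and SMRT reads, and flag positions near them. The minimum run
# length and window size are hypothetical choices.
import re

HOMOPOLYMER = re.compile(r"([ACGT])\1*")


def homopolymer_runs(seq, min_len=5):
    """Return (start, base, length) for runs of one base at least min_len long."""
    runs = []
    for m in HOMOPOLYMER.finditer(seq.upper()):
        length = m.end() - m.start()
        if length >= min_len:
            runs.append((m.start(), m.group(1), length))
    return runs


def near_homopolymer(pos, runs, window=10):
    """True if `pos` lies within `window` bp of any run in `runs`."""
    return any(start - window <= pos <= start + length - 1 + window
               for start, _base, length in runs)


seq = "ACGTTTTTTGAAAAAC"
runs = homopolymer_runs(seq)
print(runs)                                  # [(3, 'T', 6), (10, 'A', 5)]
print(near_homopolymer(12, runs, window=2))  # True
```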
Methods for SV detection using long read sequencing exist but are fewer in number. Notably, none have been tested on shotgun metagenomic data. cuteSV uses long read mapping to extract a variety of alignment signatures for SV identification [61]. Sniffles also uses a long read mapping approach, combined with subsampling of read alignments to adjust mismapping parameters for each individual dataset, resulting in higher precision [50]. combiSV [62] combines the output of six long read SV callers to improve overall precision and recall. In addition, both mapping and assembly based SV detection methods rely on high quality genome assemblies to make inferences. Here, long read sequencing has already made large inroads by enabling reliable reconstruction of high quality genomes from isolate [63,64] and metagenomic [54,65] sequencing. This is particularly exciting for metagenomics, because short read metagenomic assemblies are typically more fragmented than isolate assemblies, resulting in missed SV events.
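Combining callers, as combiSV does with six long read callers, requires deciding when calls from different tools describe the same event. The Python sketch below shows one simple way to do this, in the same spirit but not combiSV's actual algorithm: calls of the same type whose start and end agree within a tolerance are grouped greedily, and groups supported by enough distinct callers are kept. The `SVCall` record, caller labels, 100 bp tolerance, and two-caller threshold are hypothetical.

```python
# Minimal sketch of consensus SV calling across several callers: group calls
# of the same type whose breakpoints agree within a tolerance and keep groups
# supported by enough distinct callers. Caller labels, tolerance, and the
# support threshold are hypothetical; this is not combiSV's actual algorithm.
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    caller: str
    sv_type: str
    start: int
    end: int


def consensus_calls(calls, tolerance=100, min_callers=2):
    """Greedily group compatible calls; return (type, start, end, callers)."""
    groups = []
    for call in sorted(calls, key=lambda c: (c.sv_type, c.start)):
        for group in groups:
            seed = group[0]  # compare against the first call in each group
            if (call.sv_type == seed.sv_type
                    and abs(call.start - seed.start) <= tolerance
                    and abs(call.end - seed.end) <= tolerance):
                group.append(call)
                break
        else:
            groups.append([call])  # no compatible group: start a new one
    consensus = []
    for group in groups:
        callers = sorted({c.caller for c in group})
        if len(callers) >= min_callers:
            seed = group[0]
            consensus.append((seed.sv_type, seed.start, seed.end, callers))
    return consensus


calls = [
    SVCall("caller_a", "INV", 10_000, 25_000),
    SVCall("caller_b", "INV", 10_040, 24_980),
    SVCall("caller_c", "DEL", 50_000, 50_800),
    SVCall("caller_a", "DEL", 90_000, 90_500),
]
print(consensus_calls(calls))  # [('INV', 10000, 25000, ['caller_a', 'caller_b'])]
```

Requiring agreement between callers trades some recall for precision; lowering `min_callers` to 1 keeps every grouped call instead.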
Conclusion
Over the last two decades, next generation sequencing coupled with computational analysis has revolutionized bacteriology. Much of the increase in bacterial genomic data has relied on short read sequencing technology. Despite the explosion in available data and genomic information, the limitations of short read sequencing have left fundamental questions about SV in bacterial genomes unanswered. Advances in long read sequencing technologies are helping to address this gap in knowledge. Additionally, public microbial sequencing datasets and assembled reference genomes represent a largely untapped resource for SV mining and can be combined with newly generated datasets to make new inferences. Further development of methods for identifying SVs in both bacterial isolate and metagenomic sequencing samples will be needed to make full use of these resources. Given the rich diversity of SVs in bacteria, ranging from inversions to duplications to insertions, we anticipate that applying improved tools to study these events will enable us to better understand how SVs contribute to bacterial phenotypic variability and adaptability.
Highlights.
Bacteria can adapt to environmental fluctuations through genome rearrangement
Inversions, duplications, translocations, and insertions are common mechanisms of adaptation
Short read sequencing can identify various structural variants
Long read sequencing enables detection of a greater breadth of structural variants and variants undetectable via short read sequencing
Structural variation plays a significant and largely underappreciated role in biology
Funding:
This work was supported by the National Institutes of Health R01 AI148623, R01 AI143757 and DK12829201, a grant from the Emerson Collective, and an award from the Stand Up 2 Cancer Foundation. R.B.C. is supported by a National Institutes of Health T32 training grant HG000044.
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest.
- 1. Trzilova D, Tamayo R: Site-Specific Recombination - How Simple DNA Inversions Produce Complex Phenotypic Heterogeneity in Bacterial Populations. Trends Genet 2021, 37:59–72. •The authors of this recent review provide a comprehensive and detailed summary of the mechanisms of site-specific recombination in bacteria and their phenotypic outcomes.
- 2. Johnson RC: Hin/Gin-Mediated Site-Specific DNA Inversion. In Brenner’s Encyclopedia of Genetics (Second Edition). Edited by Maloy S, Hughes K. Academic Press; 2013:462–466.
- 3. Grindley NDF, Whiteson KL, Rice PA: Mechanisms of site-specific recombination. Annu Rev Biochem 2006, 75:567–605.
- 4. Stark WM: The Serine Recombinases. Microbiol Spectr 2014, 2.
- 5. Johnson RC: Mechanism of site-specific DNA inversion in bacteria. Curr Opin Genet Dev 1991, 1:404–411.
- 6. Meinke G, Bohm A, Hauber J, Pisabarro MT, Buchholz F: Cre Recombinase and Other Tyrosine Recombinases. Chem Rev 2016, 116:12785–12820.
- 7. Johnson RC: Site-specific DNA Inversion by Serine Recombinases. Microbiol Spectr 2015, 3:MDNA3-0047-2014.
- 8. Jiang X, Hall AB, Arthur TD, Plichta DR, Covington CT, Poyet M, Crothers J, Moses PL, Tolonen AC, Vlamakis H, et al. : Invertible promoters mediate bacterial phase variation, antibiotic resistance, and host adaptation in the gut. Science 2019, 363:181–187. ••The authors develop a method for identification of invertible sequences, or invertons, in bacterial genomes and use it to show they are much more prevalent than previously thought and that antibiotic treatment can induce inversion of promoter containing invertons adjacent to antibiotic resistance genes.
- 9. Sekulovic O, Mathias Garrett E, Bourgeois J, Tamayo R, Shen A, Camilli A: Genome-wide detection of conservative site-specific recombination in bacteria. PLoS Genet 2018, 14:e1007332.
- 10. Nakayama-Imaohji H, Hirakawa H, Ichimura M, Wakimoto S, Kuhara S, Hayashi T, Kuwahara T: Identification of the site-specific DNA invertase responsible for the phase variation of SusC/SusD family outer membrane proteins in Bacteroides fragilis. J Bacteriol 2009, 191:6003–6011.
- 11. Kuwahara T, Yamashita A, Hirakawa H, Nakayama H, Toh H, Okada N, Kuhara S, Hattori M, Hayashi T, Ohnishi Y: Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci U S A 2004, 101:14919–14924.
- 12. Li J, Li J-W, Feng Z, Wang J, An H, Liu Y, Wang Y, Wang K, Zhang X, Miao Z, et al. : Epigenetic Switch Driven by DNA Inversions Dictates Phase Variation in Streptococcus pneumoniae. PLoS Pathog 2016, 12:e1005762.
- 13. Li J-W, Li J, Wang J, Li C, Zhang J-R: Molecular Mechanisms of hsdS Inversions in the cod Locus of Streptococcus pneumoniae. J Bacteriol 2019, 201.
- 14. Guérillot R, Kostoulias X, Donovan L, Li L, Carter GP, Hachani A, Vandelannoote K, Giulieri S, Monk IR, Kunimoto M, et al. : Unstable chromosome rearrangements in Staphylococcus aureus cause phenotype switching associated with persistent infections. Proc Natl Acad Sci U S A 2019, 116:20135–20140.
- 15. Yebra G, Haag AF, Neamah MM, Wee BA, Richardson EJ, Horcajo P, Granneman S, Tormo-Más MÁ, de la Fuente R, Fitzgerald JR, et al. : Radical genome remodelling accompanied the emergence of a novel host-restricted bacterial pathogen. PLoS Pathog 2021, 17:e1009606.
- 16. Fitzgerald SF, Lupolova N, Shaaban S, Dallman TJ, Greig D, Allison L, Tongue SC, Evans J, Henry MK, McNeilly TN, et al. : Genome structural variation in Escherichia coli O157:H7. Microb Genom 2021, 7.
- 17. Gutiérrez R, Markus B, Carstens Marques de Sousa K, Marcos-Hadad E, Mugasimangalam RC, Nachum-Biala Y, Hawlena H, Covo S, Harrus S: Prophage-Driven Genomic Structural Changes Promote Bartonella Vertical Evolution. Genome Biol Evol 2018, 10:3089–3103.
- 18. Brandis G, Hughes D: The SNAP hypothesis: Chromosomal rearrangements could emerge from positive Selection during Niche Adaptation. PLoS Genet 2020, 16:e1008615.
- 19. Hughes D: Evaluating genome dynamics: the constraints on rearrangements within bacterial genomes. Genome Biol 2000, 1:REVIEWS0006.
- 20. Zhang J: Evolution by gene duplication: an update. Trends in Ecology & Evolution 2003, 18:292–298.
- 21. Klappenbach JA, Dunbar JM, Schmidt TM: rRNA operon copy number reflects ecological strategies of bacteria. Appl Environ Microbiol 2000, 66:1328–1333.
- 22. Slack A, Thornton PC, Magner DB, Rosenberg SM, Hastings PJ: On the mechanism of gene amplification induced under stress in Escherichia coli. PLoS Genet 2006, 2:e48.
- 23. Hastings PJ: Adaptive amplification. Crit Rev Biochem Mol Biol 2007, 42:271–283.
- 24. Romero D, Palacios R: Gene amplification and genomic plasticity in prokaryotes. Annu Rev Genet 1997, 31:91–111.
- 25. Elliott KT, Cuff LE, Neidle EL: Copy number change: evolving views on gene amplification. Future Microbiol 2013, 8:887–899.
- 26. Nicoloff H, Hjort K, Levin BR, Andersson DI: The high prevalence of antibiotic heteroresistance in pathogenic bacteria is mainly caused by gene amplification. Nat Microbiol 2019, 4:504–514.
- 27. Belikova D, Jochim A, Power J, Holden MTG, Heilbronner S: “Gene accordions” cause genotypic and phenotypic heterogeneity in clonal populations of Staphylococcus aureus. Nat Commun 2020, 11:3526. •The authors use both long and short read sequencing to identify regions of Staphylococcus aureus clinical isolate genomes that rapidly expand during in vitro and in vivo growth. They further characterize the functional consequences of one of these genes, csa1, which has an immunomodulatory effect.
- 28. van Belkum A, Scherer S, van Alphen L, Verbrugh H: Short-sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev 1998, 62:275–293.
- 29. van Belkum A, van Leeuwen W, Scherer S, Verbrugh H: Occurrence and structure-function relationship of pentameric short sequence repeats in microbial genomes. Res Microbiol 1999, 150:617–626.
- 30. Levinson G, Gutman GA: Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 1987, 4:203–221.
- 31. Moxon R, Bayliss C, Hood D: Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu Rev Genet 2006, 40:307–333.
- 32. Atack JM, Guo C, Yang L, Zhou Y, Jennings MP: DNA sequence repeats identify numerous Type I restriction-modification systems that are potential epigenetic regulators controlling phase-variable regulons; phasevarions. FASEB J 2020, 34:1038–1051.
- 33. Bellanger X, Payot S, Leblond-Bourget N, Guédon G: Conjugative and mobilizable genomic islands in bacteria: evolution and diversity. FEMS Microbiol Rev 2014, 38:720–760.
- 34. Gillings MR: Integrons: past, present, and future. Microbiol Mol Biol Rev 2014, 78:257–277.
- 35. Rankin DJ, Rocha EPC, Brown SP: What traits are carried on mobile genetic elements, and why? Heredity 2011, 106:1–10.
- 36. Stokes HW, Gillings MR: Gene flow, mobile genetic elements and the recruitment of antibiotic resistance genes into Gram-negative pathogens. FEMS Microbiol Rev 2011, 35:790–819.
- 37. Zlitni S, Bishara A, Moss EL, Tkachenko E, Kang JB, Culver RN, Andermann TM, Weng Z, Wood C, Handy C, et al. : Strain-resolved microbiome sequencing reveals mobile elements that drive bacterial competition on a clinical timescale. Genome Med 2020, 12:50.
- 38. Chu ND, Clarke SA, Timberlake S, Polz MF, Grossman AD, Alm EJ: A Mobile Element in mutS Drives Hypermutation in a Marine Vibrio. MBio 2017, 8.
- 39. Toussaint A, Rice PA: Transposable phages, DNA reorganization and transfer. Curr Opin Microbiol 2017, 38:88–94.
- 40. Zeevi D, Korem T, Godneva A, Bar N, Kurilshikov A, Lotan-Pompan M, Weinberger A, Fu J, Wijmenga C, Zhernakova A, et al. : Structural variation in the gut microbiome associates with host health. Nature 2019, 568:43–48. •The authors develop a method for detection of insertions and deletions within shotgun metagenomic sequencing samples, SGVFinder, and show that SV within gut microbiota are highly prevalent, specific to an individual host’s gut community, and highly associated with disease risk factors.
- 41. Wang D, Doestzada M, Chen L, Andreu-Sánchez S, van den Munckhof ICL, Augustijn HE, Koehorst M, Ruiz-Moreno AJ, Bloks VW, Riksen NP, et al. : Characterization of gut microbial structural variations as determinants of human bile acid metabolism. Cell Host Microbe 2021, 29:1802–1814.e5. ••The authors profile insertions and deletions in metagenome assembled genomes from the microbiomes of over 1,400 individuals, revealing strong associations between bacterial SV events and host bile acid composition.
- 42. Garud NR, Good BH, Hallatschek O, Pollard KS: Evolutionary dynamics of bacteria in the gut microbiome within and across hosts. PLoS Biol 2019, 17:e3000102.
- 43. Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ: Structural variant calling: the long and the short of it. Genome Biol 2019, 20:246.
- 44. Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet 2012, 44:226–232.
- 45. Tian S, Yan H, Klee EW, Kalmbach M, Slager SL: Comparative analysis of de novo assemblers for variation discovery in personal genomes. Brief Bioinform 2018, 19:893–904.
- 46. Layer RM, Chiang C, Quinlan AR, Hall IM: LUMPY: a probabilistic framework for structural variant discovery. Genome Biol 2014, 15:R84.
- 47. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, Cox AJ, Kruglyak S, Saunders CT: Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 2016, 32:1220–1222.
- 48. Durrant MG, Li MM, Siranosian BA, Montgomery SB, Bhatt AS: A Bioinformatic Analysis of Integrative Mobile Genetic Elements Highlights Their Role in Bacterial Adaptation. Cell Host Microbe 2020, 27:140–153.e9. •The authors develop a toolkit for mobile genetic element and insertion site detection, MGEfinder, and use it to identify mobile genetic elements and frequent sites of insertion in over 12,000 bacterial isolate genomes.
- 49. Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y: Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 2019, 20:117.
- 50. Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC: Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods 2018, 15:461–468.
- 51. Chaisson MJP, Wilson RK, Eichler EE: Genetic variation and the de novo assembly of human genomes. Nat Rev Genet 2015, 16:627–640.
- 52. Ring N, Abrahams JS, Jain M, Olsen H, Preston A, Bagby S: Resolving the complex Bordetella pertussis genome using barcoded nanopore sequencing. Microb Genom 2018, 4.
- 53. Page AJ, Ainsworth EV, Langridge GC: socru: typing of genome-level order and orientation around ribosomal operons in bacteria. Microb Genom 2020, 6.
- 54. Chen L, Zhao N, Cao J, Liu X, Xu J, Ma Y, Yu Y, Zhang X, Zhang W, Guan X, et al.: Short- and long-read metagenomics expand individualized structural variations in gut microbiomes. Nat Commun 2022, 13:3175.
- 55. Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q: Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020, 21:30.
- 56. Roberts RJ, Carneiro MO, Schatz MC: The advantages of SMRT sequencing. Genome Biol 2013, 14:405.
- 57. Rang FJ, Kloosterman WP, de Ridder J: From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy. Genome Biol 2018, 19:90.
- 58. Delahaye C, Nicolas J: Sequencing DNA with nanopores: Troubles and biases. PLoS One 2021, 16:e0257521.
- 59. Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, et al.: Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 2019, 37:1155–1162.
- 60. Sereika M, Kirkegaard RH, Karst SM, Michaelsen TY, Sørensen EA, Wollenberg RD, Albertsen M: Oxford Nanopore R10.4 long-read sequencing enables near-perfect bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. bioRxiv 2021, doi: 10.1101/2021.10.27.466057.
- 61. Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y: Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 2020, 21:189.
- 62. Dierckxsens N, Li T, Vermeesch JR, Xie Z: A benchmark of structural variation detection by long reads through a realistic simulated model. Genome Biol 2021, 22:342. •The authors test existing long-read structural variant (SV) callers on both simulated long-read datasets and simulated SV events, and then develop their own meta-caller, combiSV, which combines the results of existing methods to improve recall.
- 63. Wick RR, Judd LM, Gorrie CL, Holt KE: Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom 2017, 3:e000132.
- 64. De Maio N, Shaw LP, Hubbard A, George S, Sanderson ND, Swann J, Wick R, AbuOun M, Stubberfield E, Hoosdally SJ, et al.: Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes. Microb Genom 2019, 5.
- 65. Moss EL, Maghini DG, Bhatt AS: Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat Biotechnol 2020, 38:701–707.
- 66. De Ste Croix M, Vacca I, Kwun MJ, Ralph JD, Bentley SD, Haigh R, Croucher NJ, Oggioni MR: Phase-variable methylation and epigenetic regulation by type I restriction-modification systems. FEMS Microbiol Rev 2017, 41:S3–S15.
- 67. Cerdeño-Tárraga AM, Patrick S, Crossman LC, Blakely G, Abratt V, Lennard N, Poxton I, Duerden B, Harris B, Quail MA, et al.: Extensive DNA inversions in the B. fragilis genome control variable gene expression. Science 2005, 307:1463–1465.
- 68. Adamczyk-Poplawska M, Lower M, Piekarowicz A: Deletion of one nucleotide within the homonucleotide tract present in the hsdS gene alters the DNA sequence specificity of type I restriction-modification system NgoAV. J Bacteriol 2011, 193:6750–6759.
- 69. Barrick JE, Colburn G, Deatherage DE, Traverse CC, Strand MD, Borges JJ, Knoester DB, Reba A, Meyer AG: Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq. BMC Genomics 2014, 15:1039.
