Nucleic Acids Research. 2024 Oct 26;52(21):13128–13137. doi: 10.1093/nar/gkae948

Insertion sequence elements and unique symmetrical genomic regions mediate chromosomal inversions in Streptococcus pyogenes

Magnus G Jespersen 1, Andrew J Hayes 2, Steven Y C Tong 3,4, Mark R Davies 5
PMCID: PMC11602124  PMID: 39460626

Abstract

Chromosomal inversions are a phenomenon in many bacterial species, often across the axis of replication. Inversions have been shown to alter gene expression, changing persistence of colonisation and infection following environmental stresses. In Streptococcus pyogenes, inversions have been reported, but their frequency and molecular markers have not been systematically examined. Here, 249 complete S. pyogenes genomes were analysed using a pangenomic core gene synteny framework to identify sequences associated with inversions. 47% of genomes (118/249) contained at least one inversion, from 23 unique inversion locations. Chromosomal locations enabling inversions were usually associated with mobile elements (insertion sequences n = 9 and prophages n = 7). Two insertion sequences, IS1548 and IS1239, accounted for >80% of insertion sequences and were the only insertion sequences associated with inversions. The most observed inversion location (n = 104 genomes, 88% of genomes with an inversion) occurs between two conserved regions encoding rRNAs, tRNAs and sigma factor genes. The regions are symmetrically placed around the origin of replication, forming a unique chromosomal structure in S. pyogenes relative to other streptococci. Cataloguing the chromosomal location and frequency of inversions can direct dissection of phenotypic changes following chromosomal inversions. The framework used here can be transferred to other bacterial species to characterise chromosomal inversions.

Graphical Abstract


Introduction

With the ever-growing number of publicly available bacterial genomes, chromosomal inversions have emerged as a common component of the dynamic genome structure of some bacteria, including several pathogens (1–5). Chromosomal inversions are often mediated either by stretches of highly similar sequence or by site-specific recognition enzymes, which allow for homologous recombination and chromosomal rearrangements (6,7). However, not all inversions are equally viable. One way of grouping chromosomal inversions is into symmetric and asymmetric categories. This is determined relative to the axis formed by the origin of chromosomal replication (oriC) and the terminus (Ter), which lies approximately opposite oriC in the genome (8). Studies suggest that symmetric or almost symmetric (quasi-symmetric) inversions are more favourable to selection (2,9). Inversions across the oriC/Ter axis, or between replichores (each half of the chromosome), conserve gene orientation. Additionally, symmetry of inversions conserves the distance of genes from the replication axis, and thus the potential copy number variation arising from unfinished replication of the chromosome during rapid cell growth. Therefore, symmetrical inversions across the replication axis maintain the potentially fine-tuned genomic positioning of genes (1).

Inversions, both symmetric and asymmetric, have also been shown to change gene expression profiles and phenotypic characteristics in a range of bacterial species, enabling adaptation to stressful environments such as antibiotic and immune pressure (10–15). As such, it is proposed that inversions likely play a role in both short- and long-term adaptation of bacteria (16). Chromosomal inversions may become fixed in a population and thus allow for reconstruction of phylogenetic relationships of isolates (9,17–19). The size of chromosomal inversions is also seemingly under selective pressure, with observed inversions being smaller than would be expected if no selection acted upon them (9). Furthermore, experimental measurement of both induced and naturally occurring inversions in several Gram-positive (12,13,20–25) and Gram-negative (8,9,11,14,17,18) species has revealed that chromosomal inversions occur frequently, with many being deleterious to cell viability and others affecting growth rates (9–14,17,18,20).

Several large-scale symmetric inversions across replichores have been documented in the major human pathogen, S. pyogenes. The first observed large-scale inversion occurred between two tRNA, rRNA and alternative sigma factor X (comX/sigX) encoding regions (23). Subsequent studies highlighted that prophages and insertion sequences (ISs) flanked other chromosomal inversions (21,22,26), which have occurred across many clinically dominant S. pyogenes lineages (emm types) including emm3, emm12, emm23, emm28 and emm49 (21–23,26,27). Phenotypic changes resulting from inversions in S. pyogenes are yet to be fully explored, but current studies suggest that altered growth kinetics and altered transcriptional profiles can occur as a result (24,26). Additional studies discovered an inversion affecting the key virulence gene, streptolysin O, supporting the potential for inversions to alter key virulence determinants amongst other phenotypes (15,25). Most discoveries of chromosomal inversions in S. pyogenes are serendipitous, leaving the full landscape of chromosomal inversions, their associated molecular markers, and their frequency to be catalogued. In this study, 249 complete genomes of S. pyogenes were analysed using a pangenome framework that integrates core gene synteny to detect chromosomal inversions. This provides the first systematic characterisation of chromosomal inversions in S. pyogenes, enabling targeted downstream analysis and an increased understanding of this genomic phenomenon.

Materials and methods

Genomes and pangenome

Complete RefSeq genomes of S. pyogenes (as of 15th January 2022, n = 249) were downloaded and quality assured, using socru, for extrachromosomal elements not related to mobile genetic elements and for overall genome structure (28). A pangenome was constructed using Panaroo v.1.2.9 with the clean-mode set to sensitive, percent identity to 98% and percent length to 95%, as previously described (29). Core gene synteny was summarised using Corekaburra v.0.0.4 with a core gene presence of 99% (30,31). To determine unique genome architectures, core gene synteny at 100% prevalence (1280 genes) was used.

Chromosomal inversions

Chromosomal inversions were identified using core gene pairs from Corekaburra (31). Alternative gene pairs across genomes were manually curated to identify uncommon core gene pairs arising from chromosomal inversions rather than absence of a core gene. Symmetric and asymmetric inversions were defined based on the difference between the left and right offset relative to the origin of replication in the S. pyogenes S119 reference genome (GenBank LR031521) (32). The same method as implemented by Darling et al. (equation 1) was used to calculate a symmetry statistic (9). Log values of this symmetry statistic ≤9 (rounded to two significant figures) were classed as symmetric, which equated to ∼30 kb deviation from exact symmetry. All inversions present only on a single replichore were defined as asymmetric.
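As an illustration of the symmetric/asymmetric distinction, the following minimal sketch classifies an inversion from its two breakpoint coordinates. It does not reproduce the symmetry statistic of Darling et al.; it assumes, for illustration only, a plain absolute-difference threshold of 30 kb between the two breakpoint offsets, and it assumes the terminus lies exactly opposite oriC. The function names and the `tolerance` parameter are hypothetical.

```python
def replichore_offset(pos, ori, genome_len):
    """Return (replichore, distance from oriC) for a position on a circular genome.

    Assumption: the terminus lies exactly opposite oriC, so positions up to
    half a genome length clockwise of oriC are on the 'right' replichore.
    """
    clockwise = (pos - ori) % genome_len
    if clockwise <= genome_len / 2:
        return "right", clockwise
    return "left", genome_len - clockwise


def classify_inversion(bp1, bp2, ori, genome_len, tolerance=30_000):
    """Classify an inversion by its two breakpoints.

    Assumption: symmetry is judged by the absolute difference between the
    two offsets, not by the published symmetry statistic.
    """
    side1, off1 = replichore_offset(bp1, ori, genome_len)
    side2, off2 = replichore_offset(bp2, ori, genome_len)
    if side1 == side2:
        # Both breakpoints on one replichore: asymmetric by definition.
        return "asymmetric"
    return "symmetric" if abs(off1 - off2) <= tolerance else "asymmetric"


if __name__ == "__main__":
    # Hypothetical 1.8 Mb genome with oriC at position 0.
    print(classify_inversion(100_000, 1_720_000, ori=0, genome_len=1_800_000))
```

In this hypothetical example the breakpoints sit 100 kb and 80 kb from oriC on opposite replichores, so the 20 kb difference falls within the tolerance.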

Regions containing putative chromosomal inversion breakpoints (endpoints) were assessed for the presence of known inversion-associated sequence features (IS elements, tRNA/rRNA regions and prophages) using region alignment and gene content curation between core gene pairs. Associated sequence features were then assigned to each breakpoint where applicable (33,34). Chromosomal inversions were plotted using R and circlize v.0.4.15 (35).

Mobile elements

Prophage insertion sites were mapped using blastn and manual inspection, based on papers that systematically name these sites (36,37). The presence of prophages was confirmed by an elevated number of accessory genes between core genes predicted by Corekaburra (31) and by manual verification. Alignment of prophages and identification of regions with high similarity was conducted using R v.4.1.1, with ggplot2 v.3.4.0 and gggenes v.0.5.0.

ISs were identified using publicly available hidden Markov models and HMMER v.3.3.2 to search a dataset of representative protein sequences from the pangenome (38–41). IS1548 and IS1239 pangenome clusters were identified using blastp v.2.6.0 with representative sequences of the two insertion sequence proteins (42). Plots involving ISs were produced using R v.4.1.1, with circlize v.0.4.15, ggplot2 v.3.4.0 and ggprism v.1.0.3 (35).

Occupancy of insertion sites was inferred from the presence of >10 000 bp of accessory sequence between core genes flanking known insertion sites as determined by Magphi (43).
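The occupancy rule above can be sketched as a simple threshold on the amount of sequence between flanking core genes. The input layout (a dictionary of per-genome distances for each site) and the function name are assumptions made for illustration; they do not reflect the actual output format of Magphi.

```python
def occupancy_by_site(distances, min_accessory_bp=10_000):
    """Fraction of genomes in which each insertion site is inferred as occupied.

    distances: {site name: [bp between the flanking core genes, one per genome]}
    A site counts as occupied when that distance is strictly greater than
    min_accessory_bp (the >10 000 bp rule described in the text).
    """
    return {
        site: sum(d > min_accessory_bp for d in per_genome) / len(per_genome)
        for site, per_genome in distances.items()
    }


if __name__ == "__main__":
    # Hypothetical distances (bp) for two sites across a handful of genomes.
    example = {"D": [12_000, 500, 300, 20_000], "K": [15_000, 11_000]}
    print(occupancy_by_site(example))
```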

Chromosomal similarity screen across streptococcal species

To identify regions of genomes with high similarity across streptococcal species, all complete RefSeq genomes from S. pneumoniae, S. thermophilus, S. suis, S. agalactiae and S. dysgalactiae (as of 12th December 2022) were downloaded. These, along with S. pyogenes, were run through custom scripts utilising cd-hit v.4.8.1, circlator v.1.5.5 and seqkit v.2.3.1 (44–47). In short, both scripts rotate each genome to start at the first codon of DnaA, divide the genome into two equally sized regions, and segment each chunk into 3 kb sized regions with 500 bp overlap. One script searches for symmetrically placed repeated regions (defined as the corresponding position plus or minus 30 kb) across the two genome segments by iteratively using cd-hit-est-2d with coverage 80% and identity 80% of a 30 kb window moving along each region of the genome with a step size of 10 kb. The second script searches for regions similar to any region from the opposite region of the genome. This is done using cd-hit-est-2d with 80% coverage and 80% identity, comparing each region from one genome segment against every region from the other half of the genome separately. The scripts detect both direct (same strand) and inverted (opposing strand) repeats. This was chosen due to the dynamic nature of inversions. In a scenario where a small inversion occurs within a genome or mobile element, strand placement is inverted relative to the remainder of the genome. A repeat in this region may then facilitate a subsequent larger inversion with a now ‘inverted’ repeat across the opposing replichores.

In both scripts, clustered regions are merged if neighbouring on both genome replichores. Both scripts used can be found at: github.com/milnus/Cross_replichore_similarities v.0.0.1.
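The preprocessing steps described above (rotation, halving and segmentation) can be sketched as follows. This is not the code from the linked repository; it is a minimal illustration that assumes the genome is held as a plain string, that the start coordinate of the first DnaA codon is already known, and that the two halves are simply the first and second halves of the rotated sequence. All function names are hypothetical.

```python
def rotate(seq, start):
    """Rotate a circular sequence so that index `start` becomes position 0."""
    return seq[start:] + seq[:start]


def split_halves(seq):
    """Divide the rotated genome into two (near) equally sized regions."""
    mid = len(seq) // 2
    return seq[:mid], seq[mid:]


def windows(seq, size=3_000, overlap=500):
    """Segment a region into windows of `size` bp sharing `overlap` bp.

    Returns (start, subsequence) tuples; the last window may be shorter.
    """
    step = size - overlap
    out = []
    for start in range(0, len(seq), step):
        out.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break  # the end of the region has been covered
    return out


if __name__ == "__main__":
    genome = "ACGT" * 5_000            # hypothetical 20 kb circular genome
    rotated = rotate(genome, 1_234)    # hypothetical DnaA start coordinate
    first, second = split_halves(rotated)
    print(len(windows(first)), len(windows(second)))
```

The resulting windows from one half would then be compared against windows from the other half with a clustering tool; that comparison step is not shown here.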

Statistics and data handling

Statistical tests for the percentage of genomes with similar regions across different species were carried out using R and a pairwise Wilcoxon test between S. pneumoniae, S. thermophilus, S. dysgalactiae and S. pyogenes. Results were corrected for multiple testing using Holm's method (48).
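For readers unfamiliar with Holm's method, the step-down adjustment can be sketched in a few lines. The study performed this step in R; the Python function below is an illustrative re-implementation with a hypothetical name, and the example P-values are invented.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment of a list of P-values.

    The smallest P-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing in the
    order of the sorted raw P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P-values from three pairwise comparisons.
    print(holm_adjust([0.01, 0.04, 0.03]))
```

Results are returned in the original input order, so each adjusted value can be matched back to its pairwise comparison.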

Results

S. pyogenes pangenome architecture

To quantify chromosomal inversions in S. pyogenes, a reference pangenome was constructed from 249 complete genomes of S. pyogenes using Panaroo, with core gene synteny summarised using Corekaburra, enabling chromosomal position and size to be defined across genetically diverse genomic backgrounds (29–31). The 249 genomes represent 103 emm types (emm28, n = 33; emm4, n = 25; emm1, n = 21; emm12, n = 12; emm89, n = 12; others, n ≤ 6) and 103 multilocus sequence types. This identified 1390 core genes with informative positional information, enabling a ‘common’ genomic structure to be determined. Deviations from this ‘common’ structure were used to identify chromosomal inversions, their proximate breakpoints and size. A total of 25 unique core gene architectures (order of core genes across the chromosome) were observed among the 249 complete genome sequences included in the study. The most common genomic architecture was present in 128 isolates, with the next most common in 74. Of the 25 syntenies, 15 were found in only a single reference genome.
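The idea of flagging deviations from the ‘common’ structure can be illustrated with a toy comparison of neighbouring core gene pairs. This sketch is not Corekaburra's implementation; it assumes each genome is reduced to an ordered list of core gene names on a circular chromosome, ignores gene orientation, and uses hypothetical function names.

```python
def adjacent_pairs(order):
    """Unordered neighbouring core gene pairs on a circular chromosome."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def novel_pairs(reference_order, genome_order):
    """Core gene pairs adjacent in a genome but not in the reference order.

    A single inversion is expected to create two such pairs, one at each
    breakpoint, while leaving all other neighbourhoods unchanged.
    """
    return adjacent_pairs(genome_order) - adjacent_pairs(reference_order)


if __name__ == "__main__":
    reference = list("ABCDEFGH")   # hypothetical common core gene order
    inverted = list("ABEDCFGH")    # the segment C-D-E has been inverted
    for pair in novel_pairs(reference, inverted):
        print(sorted(pair))
```

In the toy example the novel pairs are B/E and C/F, which mark the two breakpoints of the inverted segment.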

Chromosomal inversions

Of the 249 genomes analysed, 118 contained at least one chromosomal inversion relative to the most common S. pyogenes genomic architecture. Classifying inversion location, association with molecular markers, and proximate breakpoints, 23 unique inversions were identified (Supplementary Table S1). Molecular markers associated with inversions were identified as ISs (n = 9), prophages (n = 8), regions encoding tRNA, rRNA and comX (n = 4) and a putative toxin–antitoxin system (n = 1) (Figure 1A). For one inversion it was not possible to identify any associated sequences, possibly due to it being a result of multiple independent inversions, being caused by a site-specific recombinase, or through the deletion of the associated sequence after the inversion occurred.

Figure 1.


Location and size of 23 unique chromosomal inversions within 249 complete S. pyogenes genomes. (A) Representation of chromosomal inversion locations relative to the S119 emm1 genome (RefSeq ID GCF_900608505.1). Each inversion is represented by a line ordered from outer to inner by longest to shortest. Colours of lines indicate the cause of the inversion, and tRNAs and rRNAs are marked as indicated by the legend. Selected symmetric inversions highlighted in the results section are marked by a square and a triangle. The rRNA/tRNA/comX regions are marked by two yellow rectangles on the outer chromosome circle. (B) Relative size of the S119 genome affected by each inversion (points). The size of points indicates the number of times a given inversion was observed (refer to legend) with colour reflecting the associated markers as per (A).

The frequency, size and position of the inversions across the oriC/Ter axis varied widely. Most of the 23 unique inversions were observed only once across genomes (14/23, 61%), and 20 (20/23, 87%) appeared in five or fewer genomes. This contrasted with the most common chromosomal inversion, which was associated with two rRNA/tRNA/comX regions and occurred in 87 of the 118 inverted genomes. Three of the 23 unique inversions had breakpoints in these two rRNA/tRNA/comX regions. These could be found in 87, 16 and 1 of the 118 genome sequences, with an additional inversion between one rRNA/tRNA/comX region and an alternative rRNA/tRNA encoding region (marked by a square on Figure 1A). Among the 103 represented emm types, 47 had an inversion with breakpoints across these regions, indicating that this does not appear to be a lineage-specific adaptation. Other than the rRNA/tRNA/comX associated inversions, an insertion sequence associated inversion was the only event to be observed across multiple emm lineages. The smallest observed inversion was ∼3 kb (0.16% of S119 genome size) and the largest ∼915 kb (48.2% of S119 genome size) (Figure 1B). Additional characterisation of the inversions associated with mobile genetic elements was performed to further quantify evolutionary restrictions relative to these molecular markers.

Prophages

Prophage elements integrate in a site-specific manner within the S. pyogenes genome, with many sites systematically named in recent studies (36,37,49). To investigate the association between prophage insertion sites and chromosomal inversions, both prophage insertion sites and predicted sites of chromosomal inversions were assessed (36,37). Mapping of inversion breakpoints identified four prophage insertion sites (F, D, H and K) that were over-represented among chromosomal inversions (Figure 2A). These sites were the most frequently occupied full-length phage integration sites within the reference genomes assessed, being occupied in between 27% (site D) and 53% (site K) of genomes (Supplementary Figure S1). Of the site pairs, sites F–H, D–M and G–H are located such that in S119 they are approximately symmetric (Supplementary Table S1), while site K has a significant offset from its inversion partner phage. This suggests that the position of insertion sites, as well as the prevalence of phage carriage in a site, could both be key to participating in multiple inversions. Expanding on previous observations, the S. pyogenes prophages implicated in chromosomal inversions were found to have regions of high nucleotide similarity (Figure 2B).

Figure 2.


Multiple prophage insertion regions are implicated in chromosomal inversions in S. pyogenes. (A) Position of inversions between prophage insertion regions, shown relative to their position in the circular emm1 S119 genome. Prophage sites are annotated as per McShan et al. 2019, Javan et al. 2019 and Xie et al. 2023 (36,37,49). Each line across the circle’s centre connecting two insertion regions indicates a chromosomal inversion location. (B) Pairwise comparison of prophages associated with chromosomal inversions at prophage insertion sites S–sE and D–M. Regions of high sequence homology are connected across prophages by coloured shading, with nucleotide homology provided. Orange shading relates to putative breakpoints for chromosomal inversions. Grey arrows represent prophage encoded genes, with core genes flanking prophages shown as red arrows. Gene name and locus_tag identifiers are provided according to the SF370 reference genome annotation. Forward and reverse strands of the genome are given by plus and minus signs, respectively, for each region. Scale bars for nucleotide distance are given for each prophage pair. Inversion of S and sE occurred in isolate GCF_004154405, and inversion of D and M occurred in isolate GCF_016549335.

Insertion sequences

ISs are small genetic elements that encode one, or a small number, of open reading frames, often only related to their own transposition (50,51). Such transposition can lead to duplication of their genetic material across chromosomes and plasmids, initiated through a variety of mechanisms (50). Multiple families and subgroups of ISs have been characterised (50). Nine unique chromosomal inversions were found to be associated with ISs, all of which belong to two subgroups, IS1239 (IS30 family) and IS1548 (ISAs1 family) (Figure 3A).

Figure 3.


IS1239 and IS1548 are common insertion sequences in the S. pyogenes chromosome. (A) Location of IS1548 (middle ring, orange) and IS1239 (inner ring, blue) ISs identified in 249 complete S. pyogenes reference genomes. Positions of insertion sites are scaffolded against the S119 reference genome for context. Connected lines illustrate genome regions associated with insertion sequence mediated inversions and are coloured by insertion sequence type involved. (B) Accumulated count of 16 different insertion sequence families as defined by Khedkar et al. 2022 and IS30 elements from ISfinder (38,40) in 249 S. pyogenes genomes. Insertion sequence families representing IS1548 and IS1239 are coloured according to the legend, with other IS types in black. (C) Frequency of IS1239, IS1548 and other ISs in individual S. pyogenes genomes, as coloured by the legend. Arrows indicate genomes that had an inversion detected between IS1239 or IS1548.

An inversion flanked by two IS1548 sequences (triangle on Figure 1A) was the most observed inversion (n = 10) other than the rRNA/tRNA/comX associated inversion (square on Figure 1A). This inversion occurs between IS1548 sequences: one inserted between the scpA (C5a peptidase) and lmb (laminin-binding protein) virulence genes, and the other located at a distance of 363 kbp, between SPY_RS01085 and SPY_RS01075 of SF370. This inversion is found across two emm types, emm1 (n = 9 strains) and emm204 (n = 1 strain), indicating an event associated with insertion sequence acquisition beyond a single lineage. The inversion is symmetric around oriC (Supplementary Table S1) and is the only observed inversion likely to mimic the scale and symmetry of the common rRNA/tRNA/comX inversion (Figure 1A; Supplementary Figure S2).

As IS1239 and IS1548 were the only ISs mediating inversions identified across multiple S. pyogenes lineages, the correlation between the overall number of ISs in the chromosome and inversions was further investigated. The S. pyogenes pangenome was examined for signatures of insertion sequence families using a Hidden Markov Model approach (38–40). In total, 1517 ISs were identified across 16 families, of which IS1239 and IS1548 were the most frequently observed, 1234/1517 (81.3%) (Figure 3B). The copy number of these two ISs ranged from 0 to 21 per S. pyogenes genome (Figure 3C) and displayed very low sequence variation within a genome (97%–100% sequence identity). Genomes with a high copy number of IS1239 and IS1548 elements correlated with a signature of chromosomal inversions relative to low copy number genomes (Figure 3C). While this result is skewed slightly by the phylogenetic distribution of the isolates in this dataset, these two ISs were widely distributed throughout multiple independent lineages and genomic locations in the S. pyogenes pangenome (Figure 3A). This increased copy number, likely combined with other factors such as site and orientation of the element insertion, provides an explanation for why the observed chromosomal inversions were largely constrained to these two high copy number insertion sequence subgroups.

Detection of cross-replichore repeat regions across streptococcal species

The most commonly observed inversion occurs between two conserved rRNA/tRNA/comX repetitive regions equidistant from the origin of replication (oriC) (square on Figure 1A). The two rRNA/tRNA/comX regions are highly conserved across S. pyogenes genomes (Figure 4A). This reflects a possible conserved mechanism of chromosomal inversions, independent of the composition of mobile genetic elements, in S. pyogenes. To examine whether chromosomal repeat regions may represent a species-specific phenomenon, genome sequences of five different pathogenic and non-pathogenic streptococci were examined for high similarity genetic regions equidistant from the oriC. Of the species examined, S. pyogenes was the only species to display evidence of highly conserved regions equidistantly located relative to the oriC, resulting in symmetrical inversions across the replichores (Figure 4B and C; Supplementary Figure S4). To further explore if this finding was constrained to symmetrically placed regions or a broader feature of the S. pyogenes genome, all regions independent of distance from the oriC were analysed for increased identity (based on a threshold of 80% sequence coverage and 80% sequence identity) (Supplementary Figure S3). These asymmetric regions of similarity were seemingly driven by rRNA encoding regions (Figure 4B). S. thermophilus was the only species that was not statistically different from S. pyogenes in a pairwise comparison of S. pneumoniae, S. dysgalactiae, S. thermophilus and S. pyogenes (Table 1 and Figure 4C; Supplementary Figure S5; Supplementary Table S2). However, the frequency of highly identical regions across the genomes was greater in S. pyogenes than in any other streptococcal species. This observation may in part be related to the differential rRNA distribution across the Streptococcus genus, with rRNA operons in S. agalactiae and S. suis co-localised to a single replichore (Supplementary Figure S6).

Figure 4.


Repeated sequences across the replichore in streptococci and their importance in S. pyogenes. (A) Mean sequence identity across alignment of the rRNA/tRNA/comX region of S. pyogenes (n = 491 rRNA/tRNA/comX sequences). The horizontal axis is the relative position of the alignment, and the vertical axis is identity across a 10 bp sliding window. The shaded area marks the rRNA/tRNA/comX region with genes of the region marked above, coloured by function of gene as per the legend on the right. (B) Density of both symmetric and non-symmetric (‘All’) genomic repeated regions (minimum threshold of 3 kb at 80% coverage and 80% identity) across S. thermophilus (n = 84) and S. pyogenes (n = 249) genomes. Lines indicate similar regions across the replication axis and have an opacity of 10%, such that darker regions are shared across more genomes and lighter regions across fewer. White dots on the outer ring indicate relative placement of ribosomal RNA regions. (C) Distributions of highly similar regions of the opposing replichore in six streptococcal species (horizontal axis). The vertical axis shows the percentage of the genome from each streptococcal species with symmetrical and asymmetrical placement, relative to the origin of replication, as given by the colour legend. Each point represents a single genome from the given species; S. agalactiae (n = 136), S. suis (n = 117), S. pneumoniae (n = 141), S. dysgalactiae (n = 21), S. thermophilus (n = 84) and S. pyogenes (n = 249).

Table 1.

Percentages and tests of high identity regions across replichores in streptococcal species

Species            S. pneumoniae       S. dysgalactiae     S. thermophilus     S. pyogenes
Genomes            n = 124             n = 21              n = 84              n = 192
Median (Q1, Q3)    0.95 (0.85, 1.10)   1.75 (1.60, 1.85)   2.29 (2.00, 2.50)   2.20 (2.00, 2.58)
S. dysgalactiae    9.8 × 10^-8
S. thermophilus    <2 × 10^-16         5.1 × 10^-8
S. pyogenes        <2 × 10^-16         3.3 × 10^-9         0.99

Top row indicates streptococcal species and number of genomes examined. The second row indicates quantiles for the percentage of high identity regions found asymmetrically placed across replichores of genomes. Remaining rows indicate the Holm corrected P-values of pairwise Wilcoxon two-sided tests between asymmetric percentage distributions across four streptococcal species.

Discussion

Chromosomal inversions, often stochastic, are a mechanism by which bacteria can promote genotypic and phenotypic heterogeneity (10,12,13). Yet, inversions are not easily identifiable due to their association with highly homologous or complex regions in the chromosome. Common approaches to identify chromosomal inversions in some bacteria such as S. pyogenes are based on pairwise alignment of genomes (22,23), yet population level methods to define chromosomal architecture are lacking. In this study, we provide an approach to systematically catalogue the frequency and location of chromosomal inversions in complete genomes using a pangenome guided approach with integrated core gene synteny. Analysing 249 complete genomes of S. pyogenes, evidence of chromosomal inversions was detectable in ∼50% of S. pyogenes reference genomes. Over 23 chromosomal regions associated with inversion breakpoints are summarised in this study, expanding on the five events identified in a previous analysis using 24 reference genomes (54). These inversions correlated with the presence of conserved core chromosomal features, prophages and high-copy number ISs, as has been described for other species (11,13,14,21–23,26,52). We hypothesise that genomic inversions may be a conserved mechanism to generate genotypic, and potentially phenotypic, diversity in S. pyogenes.

Chromosomal inversions are often associated with repetitive regions across a genome, with symmetry across the replichore one of the major factors associated with inversion likelihood (2,4,5,9). This is supported in S. pyogenes, with the most commonly observed inversion occurring between two conserved rRNA/tRNA/comX regions. Because these regions are located equidistant from the origin of replication (oriC), this results in a symmetrical inversion. The mirrored large homologous regions (∼6.5 kb) across oriC appear to be a unique genomic feature of S. pyogenes relative to other streptococcal species and may play a role as an inversion hotspot. In a broader context, we observe that the distribution of rRNA genes in S. pyogenes contrasts with that of S. agalactiae and S. suis, which encode all rRNA regions on one replichore. The placement of rRNA regions on both replichores and the participation of both regions in the inversion could help balance rRNA distribution and placement on the chromosome (53).

Systematically exploring the landscape of chromosomal inversions in S. pyogenes has also enabled the characterisation of predictive molecular markers. We observed an over-representation of insertion sequence subgroups IS1239 and IS1548 associated with chromosomal inversions. IS1239 and IS1548 have both previously been associated with inversions in S. pyogenes (21,26,54). IS1239 has been observed to insert upstream of streptococcal superantigen in S. pyogenes and be present in multiple copies across a range of emm types (55,56). The chromosomal insertion of IS1548 between the virulence genes lmb and scpA has been identified in S. pyogenes and S. agalactiae (lmb and scpB) genomes (57,58). In S. pyogenes the insertion of the IS1548 element in this region can facilitate an inversion that moves the position of the Mga locus (responsible for the production of the M protein, a major surface antigen used to type S. pyogenes) to the opposing replichore. In general, the movement of genes from one replichore to another could alter the three-dimensional location of genes in the chromosome, resulting in differing expression of genes and phenotype. In S. agalactiae the IS1548 between lmb and scpB could potentiate chromosomal inversions, but none have yet been reported. The occurrence of such an inversion seems likely given the findings observed for S. pyogenes. Therefore, our method can extend knowledge across species barriers. This is further exemplified by findings from Bordetella pertussis in which high frequency ISs were associated with chromosomal inversions, similar to the phenomenon observed in S. pyogenes (52).

One limitation of our pangenome based approach to detect chromosomal inversions is the requirement for core gene locations. The method is therefore unable to detect inversions not affecting at least two core genes in the database being analysed. As a result, inversions only affecting accessory (non-core) genes will not be detected. Examples of this are inversions limited to within mobile genetic elements or other variably encoded genes. This can be mitigated on a study-by-study basis by sub-setting data by shared accessory gene content, to increase ‘core gene’ numbers. In addition, the dynamic nature of some chromosomal inversions can result in multiple chromosomal orientations in a population that are not represented by the single consensus genome assembly. This was reported in a recent study where polymerase chain reaction (PCR) and long-read sequencing supported the presence of multiple chromosomal orientations in a single S. pyogenes M12 culture (59). While not a standard process when generating reference genomes, the frequency of dynamic inversions should be monitored and catalogued when generating consensus assemblies.

Another limitation of existing chromosomal inversion approaches is the inability to detect chromosomal inversions from short read sequencing and associated ‘draft’ genome assemblies. This is primarily due to an inability to resolve large regions with high sequence identity (60), which is a feature of breakpoints in many chromosomal inversions. Bridging repetitive chromosomal regions of high identity is key to consistently identifying chromosomal inversions. This approach has already been deployed with long read sequencing for L. lactis after unsuccessful elucidation of chromosomal inversions using short read sequencing (10,13). As long read sequencing technologies become commonplace in bacterial genomic studies, chromosomal inversions will become easier to detect and to analyse at a population level.

To date there has been limited phenotypic characterisation of changes that may arise from chromosomal inversions in S. pyogenes. A challenge to phenotypic testing is the dynamic nature of some chromosomal inversions, which may result in mixed populations in culture. This prevents simple phenotypic characterisation of a single chromosomal conformation, with removal or alteration of the regions facilitating inversion required to lock a conformation in place. Several studies have identified that chromosomal inversions in S. pyogenes can alter gene expression networks and population competition in vitro (24,26). However, these phenotypic studies have been restricted to large scale asymmetric inversions across the replication axis, which are likely representative of rare inversions. This emphasises the need for more common and potentially biologically relevant inversions to be studied in depth.

The rapid implementation of long-read sequencing in population genomic studies will expedite the detection and tracking of bacterial chromosomal inversions. Our approach is not limited by species and can easily be applied to other bacteria with complete genomes available. The integration of such approaches will lead to a better understanding of the underlying evolutionary pressures, mechanisms and phylogenetic importance of chromosomal inversions in bacterial population biology.

Supplementary Material

gkae948_Supplemental_Files

Acknowledgements

We thank Dr Ouli Xie for critical reading and feedback on the manuscript.

Contributor Information

Magnus G Jespersen, Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia.

Andrew J Hayes, Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia.

Steven Y C Tong, Department of Infectious Diseases, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia; Victorian Infectious Diseases Service, The Royal Melbourne Hospital, at the Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia.

Mark R Davies, Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia.

Data availability

The list of complete genomes used throughout the study can be found in Supplementary Table S2. Software used for analyses and illustration of Figure 4B, Supplementary Figure S4, and Supplementary Figure S5 can be found at: github.com/milnus/Cross_replichore_similarities v.0.0.1. The relevant code is also archived at Zenodo: https://doi.org/10.5281/zenodo.10159646.

Supplementary data

Supplementary Data are available at NAR Online.

Funding

M.G.J. was supported by a Research Scholarship from The University of Melbourne. Funding for open access charge: The University of Melbourne.

Conflict of interest statement. None declared.

References

  • 1. Eisen J.A., Heidelberg J.F., White O., Salzberg S.L. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000; 1:RESEARCH0011.
  • 2. Repar J., Warnecke T. Non-random inversion landscapes in prokaryotic genomes are shaped by heterogeneous selection pressures. Mol. Biol. Evol. 2017; 34:1902–1911.
  • 3. D’Iorio M., Dewar K. Replication-associated inversions are the dominant form of bacterial chromosome structural variation. Life Sci. Alliance. 2023; 6:e202201434.
  • 4. Mackiewicz P., Mackiewicz D., Kowalczuk M., Cebrat S. Flip-flop around the origin and terminus of replication in prokaryotic genomes. Genome Biol. 2001; 2:INTERACTIONS1004.
  • 5. Makino S., Suzuki M. Bacterial genomic reorganization upon DNA replication. Science. 2001; 292:803.
  • 6. West P.T., Chanin R.B., Bhatt A.S. From genome structure to function: insights into structural variation in microbiology. Curr. Opin. Microbiol. 2022; 69:102192.
  • 7. Noureen M., Tada I., Kawashima T., Arita M. Rearrangement analysis of multiple bacterial genomes. BMC Bioinf. 2019; 20:631.
  • 8. Nzabarushimana E., Tang H. Insertion sequence elements-mediated structural variations in bacterial genomes. Mob. DNA. 2018; 9:29.
  • 9. Darling A.E., Miklós I., Ragan M.A. Dynamics of genome rearrangement in bacterial populations. PLoS Genet. 2008; 4:e1000128.
  • 10. Cui L., Neoh H., Iwamoto A., Hiramatsu K. Coordinated phenotype switching with large-scale chromosome flip-flop inversion observed in bacteria. Proc. Natl. Acad. Sci. U.S.A. 2012; 109:E1647–E1656.
  • 11. Irvine S., Bunk B., Bayes H.K., Spröer C., Connolly J.P.R., Six A., Evans T.J., Roe A.J., Overmann J., Walker D. Genomic and transcriptomic characterization of Pseudomonas aeruginosa small colony variants derived from a chronic infection model. Microb. Genom. 2019; 5:e000262.
  • 12. Guérillot R., Kostoulias X., Donovan L., Li L., Carter G.P., Hachani A., Vandelannoote K., Giulieri S., Monk I.R., Kunimoto M. et al. Unstable chromosome rearrangements in Staphylococcus aureus cause phenotype switching associated with persistent infections. Proc. Natl. Acad. Sci. U.S.A. 2019; 116:20135–20140.
  • 13. Kojic M., Jovcic B., Miljkovic M., Novovic K., Begovic J., Studholme D.J. Large-scale chromosome flip-flop reversible inversion mediates phenotypic switching of expression of antibiotic resistance in lactococci. Microbiol. Res. 2020; 241:126583.
  • 14. Fitzgerald S.F., Lupolova N., Shaaban S., Dallman T.J., Greig D., Allison L., Tongue S.C., Evans J., Henry M.K., McNeilly T.N. et al. Genome structural variation in Escherichia coli O157:H7. Microb. Genom. 2021; 7:000682.
  • 15. Savic D.J., Ferretti J.J. Novel genomic rearrangement that affects expression of the Streptococcus pyogenes streptolysin O (slo) gene. J. Bacteriol. 2003; 185:1857–1869.
  • 16. Merrikh C.N., Merrikh H. Gene inversion potentiates bacterial evolvability and virulence. Nat. Commun. 2018; 9:4662.
  • 17. Luhmann N., Doerr D., Chauve C. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes. Microb. Genom. 2017; 3:e000123.
  • 18. Bochkareva O.O., Dranenko N.O., Ocheredko E.S., Kanevsky G.M., Lozinsky Y.N., Khalaycheva V.A., Artamonova I.I., Gelfand M.S. Genome rearrangements and phylogeny reconstruction in Yersinia pestis. PeerJ. 2018; 6:e4545.
  • 19. Clark C., Jonušas J., Mitchell J.D., Francis A. An algebraic model for inversion and deletion in bacterial genome rearrangement. J. Math. Biol. 2023; 87:34.
  • 20. Campo N., Dias M.J., Daveran-Mingot M.-L., Ritzenthaler P., Le Bourgeois P. Chromosomal constraints in gram-positive bacteria revealed by artificial inversions. Mol. Microbiol. 2004; 51:511–522.
  • 21. Tse H., Bao J.Y.J., Davies M.R., Maamary P., Tsoi H.-W., Tong A.H.Y., Ho T.C.C., Lin C.-H., Gillen C.M., Barnett T.C. et al. Molecular characterization of the 2011 Hong Kong scarlet fever outbreak. J. Infect. Dis. 2012; 206:341–351.
  • 22. Bao Y., Liang Z., Booyjzsen C., Mayfield J.A., Li Y., Lee S.W., Ploplis V.A., Song H., Castellino F.J. Unique genomic arrangements in an invasive serotype M23 strain of Streptococcus pyogenes identify genes that induce hypervirulence. J. Bacteriol. 2014; 196:4089–4102.
  • 23. Nakagawa I., Kurokawa K., Yamashita A., Nakata M., Tomiyasu Y., Okahashi N., Kawabata S., Yamazaki K., Shiba T., Yasunaga T. et al. Genome sequence of an M3 strain of Streptococcus pyogenes reveals a large-scale genomic rearrangement in invasive strains and new insights into phage evolution. Genome Res. 2003; 13:1042–1055.
  • 24. Savic D.J., Nguyen S.V., McCullor K., McShan W.M. Biological impact of a large-scale genomic inversion that grossly disrupts the relative positions of the origin and terminus loci of the Streptococcus pyogenes chromosome. J. Bacteriol. 2019; 201:e00090–e19.
  • 25. Savic D.J., Ferretti J.J. Evidence for a site specific genomic rearrangement in the slo region of Streptococcus pyogenes. Adv. Exp. Med. Biol. 1997; 418:983–985.
  • 26. Bao Y.-J., Liang Z., Mayfield J.A., McShan W.M., Lee S.W., Ploplis V.A., Castellino F.J. Novel genomic rearrangements mediated by multiple genetic elements in Streptococcus pyogenes M23ND confer potential for evolutionary persistence. Microbiology (Reading). 2016; 162:1346–1359.
  • 27. Longo M., De Jode M., Plainvert C., Weckel A., Hua A., Château A., Glaser P., Poyart C., Fouet A. Complete genome sequence of Streptococcus pyogenes emm28 clinical isolate M28PF1, responsible for a puerperal fever. Genome Announc. 2015; 3:e00750-15.
  • 28. Page A.J., Ainsworth E.V., Langridge G.C. socru: typing of genome-level order and orientation around ribosomal operons in bacteria. Microb. Genom. 2020; 6:mgen000396.
  • 29. Jespersen M.G., Hayes A.J., Tong S.Y.C., Davies M.R. Pangenome evaluation of gene essentiality in Streptococcus pyogenes. Microbiol. Spectr. 2024; 12:e03240-23.
  • 30. Tonkin-Hill G., MacAlasdair N., Ruis C., Weimann A., Horesh G., Lees J.A., Gladstone R.A., Lo S., Beaudoin C., Floto R.A. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 2020; 21:180.
  • 31. Jespersen M.G., Hayes A., Davies M.R. Corekaburra: pan-genome post-processing using core gene synteny. J. Open Source Software. 2022; 7:4910.
  • 32. Rosinski-Chupin I., Sauvage E., Fouet A., Poyart C., Glaser P. Conserved and specific features of Streptococcus pyogenes and Streptococcus agalactiae transcriptional landscapes. BMC Genomics. 2019; 20:236.
  • 33. Sullivan M.J., Petty N.K., Beatson S.A. Easyfig: a genome comparison visualizer. Bioinformatics. 2011; 27:1009–1010.
  • 34. Sievers F., Higgins D.G. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 2018; 27:135–145.
  • 35. Gu Z., Gu L., Eils R., Schlesner M., Brors B. Circlize implements and enhances circular visualization in R. Bioinformatics. 2014; 30:2811–2812.
  • 36. Rezaei Javan R., Ramos-Sevillano E., Akter A., Brown J., Brueggemann A.B. Prophages and satellite prophages are widespread in Streptococcus and may play a role in pneumococcal pathogenesis. Nat. Commun. 2019; 10:4852.
  • 37. McShan W.M., McCullor K.A., Nguyen S.V. The bacteriophages of Streptococcus pyogenes. Microbiol. Spectr. 2019; 7:
  • 38. Siguier P., Varani A., Perochon J., Chandler M. Exploring bacterial insertion sequences with ISfinder: objectives, uses, and future developments. Methods Mol. Biol. 2012; 859:91–103.
  • 39. Siguier P., Perochon J., Lestrade L., Mahillon J., Chandler M. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 2006; 34:D32–D36.
  • 40. Khedkar S., Smyshlyaev G., Letunic I., Maistrenko O.M., Coelho L.P., Orakov A., Forslund S.K., Hildebrand F., Luetge M., Schmidt T.S.B. et al. Landscape of mobile genetic elements and their antibiotic resistance cargo in prokaryotic genomes. Nucleic Acids Res. 2022; 50:3155–3168.
  • 41. Meng X., Ji Y. Modern computational techniques for the HMMER sequence analysis. ISRN Bioinform. 2013; 2013:252183.
  • 42. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990; 215:403–410.
  • 43. Jespersen M.G., Hayes A., Davies M.R. Magphi: sequence extraction tool from FASTA and GFF3 files using seed pairs. J. Open Source Software. 2022; 7:4369.
  • 44. Fu L., Niu B., Zhu Z., Wu S., Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012; 28:3150–3152.
  • 45. Li W., Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006; 22:1658–1659.
  • 46. Shen W., Le S., Li Y., Hu F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One. 2016; 11:e0163962.
  • 47. Hunt M., Silva N.D., Otto T.D., Parkhill J., Keane J.A., Harris S.R. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 2015; 16:294.
  • 48. Holm S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979; 6:65–70.
  • 49. Xie O., Morris J.M., Hayes A.J., Towers R.J., Jespersen M.G., Lees J.A., Zakour N.L.B., Berking O., Baines S.L., Carter G.P. et al. Inter-species gene flow drives ongoing evolution of Streptococcus pyogenes and Streptococcus dysgalactiae subsp. equisimilis. Nat. Commun. 2024; 15:2286.
  • 50. Siguier P., Gourbeyre E., Varani A., Ton-Hoang B., Chandler M. Everyman's guide to bacterial insertion sequences. Microbiol. Spectr. 2015; 3:MDNA3-0030-2014.
  • 51. Siguier P., Gourbeyre E., Chandler M. Bacterial insertion sequences: their genomic impact and diversity. FEMS Microbiol. Rev. 2014; 38:865–891.
  • 52. Weigand M.R., Peng Y., Batra D., Burroughs M., Davis J.K., Knipe K., Loparev V.N., Johnson T., Juieng P., Rowe L.A. et al. Conserved patterns of symmetric inversion in the genome evolution of Bordetella respiratory pathogens. mSystems. 2019; 4:e00702-19.
  • 53. Tillier E.R.M., Collins R.A. Genome rearrangement by replication-directed translocation. Nat. Genet. 2000; 26:195–197.
  • 54. Bessen D.E., McShan W.M., Nguyen S.V., Shetty A., Agrawal S., Tettelin H. Molecular epidemiology and genomics of group A Streptococcus. Infect. Genet. Evol. 2015; 33:393–418.
  • 55. Kapur V., Reda K.B., Li L.L., Ho L.J., Rich R.R., Musser J.M. Characterization and distribution of insertion sequence IS1239 in Streptococcus pyogenes. Gene. 1994; 150:135–140.
  • 56. Reda K.B., Kapur V., Goela D., Lamphear J.G., Musser J.M., Rich R.R. Phylogenetic distribution of streptococcal superantigen SSA allelic variants provides evidence for horizontal transfer of ssa within Streptococcus pyogenes. Infect. Immun. 1996; 64:1161–1165.
  • 57. Granlund M., Oberg L., Sellin M., Norgren M. Identification of a novel insertion element, IS1548, in group B streptococci, predominantly in strains causing endocarditis. J. Infect. Dis. 1998; 177:967–976.
  • 58. Al Safadi R., Amor S., Hery-Arnaud G., Spellerberg B., Lanotte P., Mereghetti L., Gannier F., Quentin R., Rosenau A. Enhanced expression of lmb gene encoding laminin-binding protein in Streptococcus agalactiae strains harboring IS1548 in scpB-lmb intergenic region. PLoS One. 2010; 5:e10794.
  • 59. Walker M.J., Brouwer S., Forde B.M., Worthing K.A., McIntyre L., Sundac L., Maloney S., Roberts L.W., Barnett T.C., Richter J. et al. Detection of epidemic scarlet fever group A Streptococcus in Australia. Clin. Infect. Dis. 2019; 69:1232–1234.
  • 60. Adams M.D., Bishop B., Wright M.S. Quantitative assessment of insertion sequence impact on bacterial genome architecture. Microb. Genom. 2016; 2:e000062.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press