mBio. 2016 May 31;7(3):e00403-16. doi: 10.1128/mBio.00403-16

Transcriptome Remodeling Contributes to Epidemic Disease Caused by the Human Pathogen Streptococcus pyogenes

Stephen B Beres, Priyanka Kachroo, Waleed Nasser, Randall J Olsen, Luchang Zhu, Anthony R Flores, Ivan de la Riva, Jesus Paez-Mayorga, Francisco E Jimenez, Concepcion Cantu, Jaana Vuopio, Jari Jalava, Karl G Kristinsson, Magnus Gottfredsson, Jukka Corander, Nahuel Fittipaldi, Maria Chiara Di Luca, Dezemona Petrelli, Luca A Vitali, Annessa Raiford, Leslie Jenkins, James M Musser
PMCID: PMC4895104  PMID: 27247229

ABSTRACT

For over a century, a fundamental objective in infection biology research has been to understand the molecular processes contributing to the origin and perpetuation of epidemics. Divergent hypotheses have emerged concerning the extent to which environmental events or pathogen evolution dominates in these processes. Remarkably few studies bear on this important issue. Based on population pathogenomic analysis of 1,200 Streptococcus pyogenes type emm89 infection isolates, we report that a series of horizontal gene transfer events produced a new pathogenic genotype with increased ability to cause infection, leading to an epidemic wave of disease on at least two continents. In the aggregate, these and other genetic changes substantially remodeled the transcriptomes of the evolved progeny, causing extensive differential expression of virulence genes and altered pathogen-host interaction, including enhanced immune evasion. Our findings delineate the precise molecular genetic changes that occurred and enhance our understanding of the evolutionary processes that contribute to the emergence and persistence of epidemically successful pathogen clones. The data have significant implications for understanding bacterial epidemics and for translational research efforts to blunt their detrimental effects.

IMPORTANCE

The confluence of studies of molecular events underlying pathogen strain emergence, evolutionary genetic processes mediating altered virulence, and epidemics is in its infancy. Although understanding these events is necessary to develop new or improved strategies to protect health, surprisingly few studies have addressed this issue, in particular, at the comprehensive population genomic level. Herein we establish that substantial remodeling of the transcriptome of the human-specific pathogen Streptococcus pyogenes by horizontal gene flow and other evolutionary genetic changes is a central factor in precipitating and perpetuating epidemic disease. The data unambiguously show that the key outcome of these molecular events is evolution of a new, more virulent pathogenic genotype. Our findings provide new understanding of epidemic disease.

INTRODUCTION

Genetic diversity begets phenotypic variation and with it the possibility of a different life. Considerable effort has been expended in the last 40 years to understand the genetic diversity and population structure of many bacterial pathogens, especially those that detrimentally affect human and livestock health and cause epidemics (1–27). These studies have led to the general concept that some bacterial species are clonal, with relatively little evidence that horizontal gene transfer (HGT) and recombination shape species diversity, whereas other bacterial pathogens are highly recombinogenic, with species diversity mediated by extensive HGT events (1–27). Genetic studies have been greatly facilitated in recent years by relatively inexpensive large-scale comparative DNA sequencing, which now makes it possible to precisely delineate the nature and extent of genomic variation present in large populations (hundreds to many thousands of isolates) of individual pathogenic bacterial species (4, 5, 10–18, 23–26). For example, analyses of important pathogens such as Staphylococcus aureus, Streptococcus pyogenes, Streptococcus pneumoniae, Escherichia coli, Salmonella enterica serovars, and Legionella pneumophila have been conducted, yielding much new information about genetic variation in these and other species (4, 5, 10–18, 23–26, 28–35).

In parallel with studies of bacterial population genetic structure, there has been interest in identifying the precise genomic changes that contribute to the emergence, numerical success, and epidemic behavior of members of some bacterial species. A major effort has been devoted to analysis of comprehensive, population-based samples of the strict human pathogen S. pyogenes (commonly, group A streptococcus [GAS]) as a model pathogen (28–35). S. pyogenes is endemic in humans worldwide and periodically causes epidemics of superficial (e.g., pharyngitis and impetigo) and invasive (e.g., necrotizing fasciitis, pneumonia, myositis) infections. Globally, the organism causes an estimated 711 million human infections and over 500,000 deaths annually (36). The species is genetically diverse: more than 240 emm-types (typing based on sequence differences in the hypervariable amino-terminal portion of the emm gene encoding the antiphagocytic Emm virulence protein; http://www.cdc.gov/abcs/index.html) and approximately 650 multilocus sequence types (MLSTs) (http://spyogenes.mlst.net) have been described.

In the early 1980s, a dramatic increase in the frequency and severity of infections caused by S. pyogenes led to the recognition of a global pandemic caused by emm1 strains (37–44). This pandemic afforded the opportunity to compare preepidemic and epidemic strains for potential bacterial factors contributing to this global health problem. To gain insight into the emergence, dissemination, and diversification of emm1 strains causing this pandemic, we sequenced the genomes of 3,615 emm1 infection isolates (32). Phylogenetic analyses revealed that the pandemic emm1 strains that emerged are a genetically closely related clonal population that evolved from its most recent preepidemic progenitor in the early 1980s. The key genetic event underpinning the pandemic was acquisition by HGT and recombinational replacement of a 36-kb segment of the S. pyogenes core chromosome (i.e., the portion of the chromosome that is largely conserved across emm-types and not present on obvious mobile genetic elements such as phages and integrative-conjugative elements). This event mediated enhanced production of the toxins NAD+ glycohydrolase (SPN [S. pyogenes NADase]) and streptolysin O (SLO) (32). A subsequent study (35) showed that the striking upregulation of SPN and SLO production and the altered virulence phenotype of members of the pandemic clone occurred as a consequence of only three single nucleotide polymorphisms (SNPs). Two are located in the −35 to −10 spacer region of the promoter upstream of the nga-ifs-slo transcriptional unit and increase gene expression. The third, a nonsynonymous SNP in the nga gene, increases the activity of SPN, a secreted cytotoxin virulence factor (45). Additional evidence supporting upregulation of SPN and SLO as a contributing cause of S. pyogenes epidemic disease was found by sequence analysis of 1,125 emm89 genomes (35) obtained in comprehensive population-based surveillance studies conducted in the United States, Finland, and Iceland between 1995 and 2013. Among these emm89 strains, we identified three distinct phylogenetic clades (designated clade 1, clade 2, and clade 3). The recent worldwide increase in the incidence of emm89 invasive infections corresponded temporally with the emergence and expansion of clade 3 strains with upregulated SPN and SLO production (35, 46).

Thus, progress is being made in understanding genomic alterations that are linked with increases in disease frequency and severity in some human pathogens. However, despite these advances, very little analogous work has been conducted to investigate global changes in gene expression that may contribute to the origin and perpetuation of bacterial epidemics. Similarly, there is a general lack of studies linking genome variation, transcriptional changes, and altered virulence in epidemic forms. The primary goal of this investigation was to study how genome variation linked with changes in transcriptome and altered virulence might contribute to the origin and perpetuation of bacterial epidemics, using the ongoing S. pyogenes emm89 epidemic as a convenient model system. We used comparative pathogenomics to dissect the precise molecular genetic events that have mediated the evolutionary origin and diversification of the epidemic emm89 strains. Unexpectedly, we found that a high frequency of HGT events has shaped the emm89 population genetic structure to a far greater extent than vertically inherited SNPs and short insertions and deletions (indels). Three main mechanisms that mediate HGT in bacteria have been described: conjugation, transduction, and transformation. Although S. pyogenes is not considered to be naturally competent, analysis of MLST data found S. pyogenes to have a level of recombination comparable to that of Streptococcus pneumoniae, a species that is naturally competent (47). The mechanism mediating the relatively high level of recombination detected in GAS is not known, but, given the prevalence of phage in GAS genomes, generalized transduction may play an important role. Global transcriptome sequencing (RNAseq) analysis was conducted on genetically representative preepidemic and epidemic emm89 strains to determine the extent to which the genomic changes causing altered gene expression may have contributed to the epidemic. 
We found that HGT is extensive in the emm89 population and has contributed disproportionately to the diversification of virulence factors and their expression. Nonsynonymous SNPs in major regulatory genes and other modest genetic changes have also led to transcriptome remodeling intimately linked with the origination and perpetuation of the epidemic. The results have significant implications for understanding epidemic bacterial disease and for translational research efforts designed to control or limit the detrimental effect of infectious agents. The overall strategy used as described here is of general utility and pertinence to the investigation of other pathogens.

RESULTS AND DISCUSSION

Population genetic structure and contribution of horizontal gene transfer (HGT).

We studied 1,200 emm89 S. pyogenes strains, virtually all (n = 1,198) cultured from patients with invasive infections that occurred between 1995 and 2014 (Fig. 1; see also Table S1 in the supplemental material). The great majority of strains (n = 1,180) were collected as part of comprehensive population-based studies conducted in the United States, Finland, and Iceland. The genomes of all 1,200 strains were sequenced to a mean 60-fold depth of coverage (range, 13-fold to 440-fold) using an Illumina paired-end strategy, and polymorphisms were identified. Inference of genetic relationships using core chromosomal SNPs revealed that these emm89 strains have a major population of 1,193 strains and a minor population of 7 substantially divergent genetic outlier strains (Fig. 2A). Bayesian clustering showed the major emm89 population of 1,193 strains to comprise 3 primary clades (Fig. 2B). The genomes of 3 strains representing the genetic backgrounds of organisms assigned to the three primary clades (MGAS11027 of clade 1, MGAS23530 of clade 2, and MGAS27061 of clade 3) were closed and annotated (see Fig. S1). The epidemiological information available for the 1,200 strains revealed that clade 3 strains emerged and expanded rapidly in the United States, Finland, and Iceland, displacing the corresponding predecessor clade 1 and 2 strains in the populations studied (Fig. 1). These findings are consistent with the preliminary data that we recently reported (35).

FIG 1

Temporal and geographic distribution of the emm89 strain cohort. Shown is the temporal distribution of the emm89 strains by clade. The inset shows the geographic distribution of the isolates by country. The colored horizontal bars at the bottom of the figure show the temporal distribution of the strains by country. A single isolate from Italy is not illustrated. The reduced number of cases in 2014 reflects the unavailability of United States isolates for study rather than a decline in the frequency of infections. Clade 3 strains emerged in 2003, expanded greatly in number, and displaced the predecessor clade 1 and 2 strains in all 3 populations studied (United States, Finland, and Iceland).

FIG 2

Genetic relationships among emm89 strains. Genetic relationships were inferred by the neighbor-joining method based on concatenated core chromosomal SNP data using SplitsTree. (A) Genetic relationships based on 28,425 SNPs identified among the members of the entire population of 1,200 emm89 strains. (B) Genetic relationships based on 11,846 SNPs identified among the 1,193 major population strains. Isolates are colored by cluster as determined using Bayesian analysis of population structure (BAPS) as indicated in the hierarchy below the figure. Three major clades (C1, C2, and C3) are defined at the first level of clustering. Subclade 3D (SC3D), a recently emerged and expanding population of strains in Finland, is defined at the second level of clustering. Indicated for the inferred phylogenies are the mean genetic distances (MGDs), both inter- and intraclade, measured as differences in core chromosomal SNPs. The MGD among strains within a clade is less than the MGD to strains of the nearest neighboring clade(s). Bootstrap analysis with 100 iterations gives 100% support for all of the clade-to-clade branches (i.e., C1-C2, C2-C3, and C3-SC3D). (C) Genetic relationships based on 8,989 SNPs identified among the major population of 1,193 strains, filtered to exclude horizontally acquired sites as inferred using Gubbins. Excluding sites attributed to horizontal gene transfer events sharply reduces the strain-to-strain MGD both within and between the clades. The MGD within the clades remains less than the MGD to the nearest neighboring clade(s). Trees in panels B and C are illustrated at the same scale.

The emm89 population genomic data revealed an unprecedented level of genetic diversity for strains of a single S. pyogenes emm-type. Comparison of the emm89 genome sequences with data available for 37 S. pyogenes genomes of 18 other emm-types (see Table S2 in the supplemental material) showed that the emm89 strains are the only emm-type to have two deeply rooted branches in the phylogenetic network (Fig. 3). Although we found evidence of recombination within the emm89 population, the random distribution of SNPs and the lack of sequence identity of the 7 minor population emm89 outlier strains with sequences of another GAS emm-type or MLST argue that these strains have not arisen through emm-type switching.

FIG 3

Genetic relationships between strains of various Emm/M protein serotypes. Genetic relationships were inferred among 49 GAS strains of 20 M types based on 75,184 concatenated core chromosomal SNPs by the neighbor network method. The analysis is based on 42 closed genomes and 7 whole-genome-sequenced emm89 genetic outlier strains (indicated in italics). The mean interserotype MGD is 16,340 SNPs. emm89 strains are the only emm-type strains with two distinct lineages (L1 and L2) in the interserotype network. The MGD of 14,247 SNPs between the emm89 L1 and L2 genomes is greater than the MGD of 11,548 SNPs among the serotype M5, M6, M18, and M23 genomes. Of note, the emm89 L1 to L2 MGD is greater even than the emm89 L1 to M53 genome MGD of 14,194 SNPs.

We identified extensive genomic diversity between and within the three primary emm89 clades. The mean genetic distance (MGD) among the 1,193 strains of the 3 clades was 610 SNPs in the core genome (Fig. 2B). In striking contrast, among 3,615 emm1 strains collected in 8 countries on two continents over 45 years (i.e., a collection 3 times larger, from a broader geographic region, and a period 2.5 times longer than those used for analysis of the emm89 sample), the MGD was only 106 core SNPs (32).
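The MGD values above are averages of pairwise core-SNP differences between strains. As a minimal sketch, assuming each strain is represented by a string of core-SNP alleles aligned at the same sites (the toy strings and the function names below are hypothetical, not data or code from this study), within-group and between-group MGD can be computed as follows:

```python
from itertools import combinations, product
from typing import Sequence


def snp_distance(a: str, b: str) -> int:
    """Number of aligned SNP sites at which two strains carry different alleles."""
    if len(a) != len(b):
        raise ValueError("allele strings must cover the same SNP sites")
    return sum(1 for x, y in zip(a, b) if x != y)


def mgd_within(group: Sequence[str]) -> float:
    """Mean SNP distance over all unordered strain pairs inside one group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("need at least two strains")
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)


def mgd_between(group1: Sequence[str], group2: Sequence[str]) -> float:
    """Mean SNP distance over all pairs with one strain drawn from each group."""
    pairs = list(product(group1, group2))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment of 4 SNP sites for two small "clades"
clade_x = ["AACG", "AACT", "AATG"]
clade_y = ["GGCG", "GGCT"]
print(mgd_within(clade_x), mgd_between(clade_x, clade_y))
```

An all-pairs comparison grows quadratically with the number of strains, so for roughly 1,200 genomes and thousands of SNP sites a vectorized or compiled implementation would be the practical choice; the sketch only makes the definition concrete.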

There was a nonrandom distribution of SNPs throughout the emm89 core genomes. Multiple regions had elevated SNP density, indicating HGT and a mosaic evolutionary history of the core genome (Fig. 4; see also Fig. S1 in the supplemental material). Gubbins (genealogies unbiased by recombinations in nucleotide sequences) statistical analysis of SNP distribution (48) identified 2,316 regions of putative HGT with a mean size of 3,695 bp (range, 4 bp to 71,774 bp) at 526 loci around the genome. Because HGT can distort inferences of genetic relationships and evolutionary history, the phylogeny of the strains was reassessed using sequences filtered to exclude regions of recombination (Fig. 2C). This filtering reduced the MGD (i.e., average pairwise core SNP differences) among the 1,193 strains by 78% (from 610 to 134), a level similar to that found in 3,615 emm1 strains (32). The MGD from clade 1 to clade 2 and the MGD from clade 2 to clade 3 were reduced by 87% and 76%, respectively. The strain-to-strain MGD within each of the clades also was substantially reduced: within clade 1 from 226 to 100 (−56%), within clade 2 from 83 to 63 (−24%), and within clade 3 from 244 to 45 (−82%). Importantly, however, after exclusion of SNPs present in chromosomal segments associated with HGT events, the 3 primary clades remained distinct among the 1,193 strains.
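The filtering step described above can be sketched as interval masking. This is not Gubbins itself, which infers recombination on individual branches of the phylogeny; the sketch simply assumes that a list of inclusive (start, stop) recombination intervals has already been predicted and drops SNP positions that fall inside any of them. The helper names and toy positions are hypothetical. The percent-reduction helper reproduces the arithmetic reported in the paragraph above.

```python
import bisect
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive genome coordinates (start, stop)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or abutting inclusive intervals."""
    merged: List[Interval] = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def mask_recombinant_sites(positions: List[int], intervals: List[Interval]) -> List[int]:
    """Keep only SNP positions that lie outside every recombination interval."""
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    kept = []
    for pos in positions:
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos <= merged[i][1]:
            continue  # inside a recombination block, so exclude this site
        kept.append(pos)
    return kept


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


print(mask_recombinant_sites([5, 10, 15, 20, 25], [(8, 12), (18, 22), (11, 16)]))
print(round(percent_reduction(610, 134)))  # overall MGD reported above
```

Merging the intervals first keeps each lookup to a single binary search, which matters when thousands of predicted blocks and SNP sites are involved.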

FIG 4

Distribution of SNPs and regions of horizontal gene transfer. Illustrated in the genome atlas of clade 2 strain MGAS23530 from the 1st (outermost) ring to the 7th (innermost) ring are the following. (Ring 1) Genome size in megabase pairs (black). (Ring 2) Landmarks: rRNA, 23S 16S-5S rRNA; FCT, fibronectin/collagen/T-antigen; SLS, streptolysin S; SRT, streptin; SAL, salivaricin; MGA, mga operon; HAS, hasABC capsule synthesis operon. (Rings 3 and 4) Coding sequences on the forward (light green) and reverse (dark green) strands. (Ring 5) Clade 1 strain MGAS11027 SNPs (n = 1,915, light blue) relative to clade 2 strain MGAS23530. (Ring 6) Clade 3 strain MGAS27061 SNPs (n = 415, red) relative to clade 2 strain MGAS23530. (Ring 7) Predicted regions of horizontal gene transfer separating clade 1 and 2 strains (light blue), clade 2 and 3 strains (red), and clade 3 and subclade 3D strains (dark blue) as listed in Table 1. SNPs are nonrandomly distributed. Regions of elevated SNP density correspond to predicted horizontal gene transfer/recombination blocks.

Outgroup rooting with the genome of emm1 reference strain SF370 showed that the evolutionary pathway leading to the current emm89 epidemic lineage had clades branching in the sequence of clade 1 followed by clade 2 and then clade 3 (see Fig. S2 in the supplemental material). Clade 1 and clade 2 strains differed by 8 regions of HGT encompassing 171.1 kb or 10% of the genome, and clade 2 and clade 3 strains differed by 6 regions of HGT encompassing 15.3 kb or 0.9% of the genome (Fig. 4 and Table 1). Seven of the 8 HGT regions differentiating clade 1 and clade 2 are most similar in sequence to regions in emm2 reference genome MGAS10270 (see Fig. S3). Of special note, 33 isolates in clade 3 differed from the 725 other clade 3 strains by one additional HGT. These strains, designated subclade 3D (SC-3D) (Fig. 2B), first occurred in the Finland sample in 2009 and have disproportionately increased in prevalence in recent years as a cause of bloodstream infections in that country (see Table S1 and Fig. S4).

TABLE 1

HGT recombination blocks separating GAS emm89/M89 clades

Block    Clades    Start(a)     Stop(a)      Length (bp)  SNPs  Genes  M-like  % ID
RB1(b)   C1-C2     92,389       164,162      71,774       411   72     M2      88.74(b)
RB2      C1-C2     295,481      297,574      2,094        9     2      M2      100.00
RB3(c)   C1-C2     773,487      780,634      7,148        55    8      M2      99.55
RB4(c)   C1-C2     794,417      800,659      6,243        28    8      M2      99.55
RB5      C1-C2     921,261      960,297      39,037       100   41     M2      99.68
RB6      C1-C2     1,022,619    1,030,407    7,789        20    6      M2      99.97
RB7      C1-C2     1,543,651    1,561,165    17,515       103   13     M2      97.17
RB8      C1-C2     1,577,916    1,597,447    19,532       138   21     M28     99.58
RB9      C2-C3     86,603       88,366       1,764        12    2      M5/M23  99.38
RB10     C2-C3     145,163      155,569      10,407       59    11     M1/M12  98.52
RB11     C2-C3     244,407      244,758      352          5     1      M5      100.00
RB12     C2-C3     1,472,262    1,473,025    764          9     1      M12     100.00
RB13     C2-C3     1,558,898    1,559,698    801          7     2      M49     99.75
RB14     C2-C3     1,693,613    1,694,805    1,193        6     2      M5/M6   100.00
RB15     C3-SC3D   341,762      359,579      17,818       106   21     M1      99.67

(a) The start and stop positions provided are relative to the MGAS23530 genome.
(b) The first 18.6 kb and last 39.6 kb are M2-like (>99% identity [ID]); however, the central 13.5-kb FCT pilus-encoding region is unlike that of any other sequenced GAS emm-type.
(c) RB3 and RB4 likely represent a single HGT event that also encompasses the intervening streptin lantibiotic synthesis genes, which would make a larger single recombination of 26,697 bp.
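As a consistency check on Table 1, the block lengths can be recomputed from the start and stop coordinates, assuming inclusive coordinates (length = stop - start + 1), and summed per clade transition. The sums match the 171.1 kb (clade 1 to clade 2) and 15.3 kb (clade 2 to clade 3) quoted in the text. RB3 and RB4 are kept as separate rows, as listed. The genome length needed to express these totals as a percentage is not stated in this section, so the sketch leaves it as a caller-supplied parameter; the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (block, clade transition, start, stop, length listed in Table 1); MGAS23530 coordinates
TABLE1: List[Tuple[str, str, int, int, int]] = [
    ("RB1", "C1-C2", 92_389, 164_162, 71_774),
    ("RB2", "C1-C2", 295_481, 297_574, 2_094),
    ("RB3", "C1-C2", 773_487, 780_634, 7_148),
    ("RB4", "C1-C2", 794_417, 800_659, 6_243),
    ("RB5", "C1-C2", 921_261, 960_297, 39_037),
    ("RB6", "C1-C2", 1_022_619, 1_030_407, 7_789),
    ("RB7", "C1-C2", 1_543_651, 1_561_165, 17_515),
    ("RB8", "C1-C2", 1_577_916, 1_597_447, 19_532),
    ("RB9", "C2-C3", 86_603, 88_366, 1_764),
    ("RB10", "C2-C3", 145_163, 155_569, 10_407),
    ("RB11", "C2-C3", 244_407, 244_758, 352),
    ("RB12", "C2-C3", 1_472_262, 1_473_025, 764),
    ("RB13", "C2-C3", 1_558_898, 1_559_698, 801),
    ("RB14", "C2-C3", 1_693_613, 1_694_805, 1_193),
    ("RB15", "C3-SC3D", 341_762, 359_579, 17_818),
]


def block_length(start: int, stop: int) -> int:
    """Length of an inclusive coordinate interval."""
    return stop - start + 1


def total_by_transition(rows: List[Tuple[str, str, int, int, int]]) -> Dict[str, int]:
    """Sum block lengths per clade transition, checking each against the table."""
    totals: Dict[str, int] = defaultdict(int)
    for name, transition, start, stop, listed in rows:
        length = block_length(start, stop)
        if length != listed:
            raise ValueError(f"{name}: computed {length} bp, table lists {listed} bp")
        totals[transition] += length
    return dict(totals)


def genome_fraction(total_bp: int, genome_bp: int) -> float:
    """Fraction of a genome of length genome_bp (supplied by the caller)."""
    return total_bp / genome_bp


print(total_by_transition(TABLE1))
# {'C1-C2': 171132, 'C2-C3': 15281, 'C3-SC3D': 17818}
```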

HGT events are responsible for the bulk of the core sequence differences between the clades. The transferred sequences encompass multiple genes encoding many known secreted and cell surface-associated virulence factors, including the pilus/T-antigen adhesin, fibronectin-binding protein FbaB, the toxin pair NGA and SLO, internalin InlA, C5a peptidase ScpA, antiphagocytic M-like proteins Enn and Mrp, virulence regulators Mga and Ihk-Irr, immunogenic secreted protein Isp1, and the HasABC capsule synthesis enzymes (49). These HGT events have had important consequences. For example, clade 1 strains differ from clade 2 and 3 strains in pilus/T-antigen, and the clade 3 strains cannot produce capsule due to loss of the hasABC genes. Of note, different pilus types have been shown to vary in cell adherence and tissue tropism, and differences in the levels of production of capsule and SPN and SLO cytotoxins can alter virulence (35, 49, 50).

Consistent with SPN and SLO playing a key role in S. pyogenes strain emergence and enhanced fitness, each of the three clades has a distinct nga-ifs-slo region resulting from two independent HGT events. In addition, SC-3D strains differ from the other clade 3 strains due to HGT of a region encoding the SpyA and SpeJ virulence factors (49, 51–53). Inasmuch as these multiple HGT events involve regions encoding virulence factors, it is reasonable to hypothesize that many of these HGT events alter host-pathogen interactions.

Variation in gene content and phage genotype.

HGT in bacteria can be mediated by mobile genetic elements (MGEs) such as phages and integrative-conjugative elements (ICEs). S. pyogenes phages commonly encode one or more secreted virulence factors such as streptococcal pyrogenic exotoxin superantigens and streptococcal phage DNases (54, 55). S. pyogenes ICEs usually encode one or more factors mediating resistance to antibiotics such as tetracycline and macrolides (54). Horizontal acquisition of antibiotic resistance and novel virulence factor genes, mediated by ICEs and phages, has been associated with localized outbreaks and large epidemics of S. pyogenes infections (29). MGE content was investigated in 1,193 emm89 isolates relative to the combined gene content (>53,000 genes) of 30 GAS genomes of 18 emm-types (see Tables S1 and S2 in the supplemental material). This analysis identified 64 different profiles of MGE content (Fig. 5). ICEs were infrequent in the strain sample. The three most prevalent MGE content profiles, or phage genotypes (PGs), accounted for 72% of the strains (Fig. 5). These three phage genotypes (PG01, PG02, and PG03) correspond to the phage content of the reference genomes for each of the three primary clades (Fig. S1 in the supplemental material). With the exception of PG02 (defined as lack of prophages), most phage genotypes were confined to a single clade. The most prevalent (43%) PG in clade 1 was PG03 (phage 11027.1 encoding SpeC and Spd1 and phage 11027.2 encoding Sdn). Also prevalent were PG05 (13%) and PG06 (11%) strains, potentially derived from PG03 strains by phage loss. Most clade 2 strains are PG02 (72%), having no phages. The abundance of PG02 strains, which represent 20% of the entire emm89 cohort, is unusual because, prior to our investigation, nearly all S. pyogenes genomes had been found to be polylysogenic (55). Most clade 3 strains are PG01 (62%), having phage 27061.1 encoding SpeC and Spd1, followed next in prevalence by PG02 (22%).
Of note, although phages 11027.1 and 27061.1 are integrated at the same genomic locus and encode the same two secreted virulence factors, they are different phages (see Fig. S5). PG01 (presence of 27061.1) first occurred in our strain samples in 2003, a time that corresponds to the emergence of the epidemic clade 3 strains. However, acquisition of 27061.1 did not provide the epidemic clade 3 strains with any phage-encoded virulence genes that were not already prevalent in the preepidemic clade 1 strains. This finding suggests that acquisition of phage-encoded virulence genes was not a key driver of the emergence of epidemic clade 3 organisms, as has been speculated (46).

FIG 5

Prophage content of the emm89 strains. Shown is the phylogeny inferred by neighbor-joining for the 1,193 clade 1, 2, and 3 isolates based on 8,989 core SNPs filtered to exclude SNPs acquired by horizontal gene transfer events. Isolates are colored by phage genotype (PG) as indicated in the index. PGs were assigned in order of prevalence of occurrence in the strain sample. With the exception of PG02 (absence of phage), most of the PGs are exclusive to a single clade. PG01 was first present in the strain sample in 2003 in two isolates, one each of clades 2 and 3. The year 2003 is also when epidemic clade 3 strains were first present in the strain sample.
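Assigning phage genotypes "in order of prevalence" amounts to grouping strains by identical MGE content and ranking the groups by size. A minimal sketch, assuming each strain's profile is simply the set of prophages/ICEs detected in it (the strain and phage names below are hypothetical, and the tie-breaking rule is our own choice, not one stated in the source):

```python
from collections import Counter
from typing import Dict, Iterable


def assign_phage_genotypes(profiles: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Label each strain's MGE profile PG01, PG02, ... in order of prevalence."""
    # Order of elements within a profile is irrelevant, so use frozensets
    frozen = {strain: frozenset(mges) for strain, mges in profiles.items()}
    counts = Counter(frozen.values())
    # Most common profile first; ties broken by the sorted element names
    ranked = sorted(counts, key=lambda p: (-counts[p], sorted(p)))
    labels = {profile: f"PG{rank:02d}" for rank, profile in enumerate(ranked, start=1)}
    return {strain: labels[profile] for strain, profile in frozen.items()}


# Hypothetical strains and prophage names, for illustration only
example = {
    "s1": ["phiA", "phiB"], "s2": ["phiB", "phiA"], "s3": [],
    "s4": ["phiA", "phiB"], "s5": [], "s6": ["phiC"],
}
print(assign_phage_genotypes(example))
# {'s1': 'PG01', 's2': 'PG01', 's3': 'PG02', 's4': 'PG01', 's5': 'PG02', 's6': 'PG03'}
```

An empty profile (no prophages) is treated as a genotype like any other, mirroring how PG02 is defined in the text.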

HGT and extensively remodeled global transcriptomes.

One school of thought postulates that HGT events are similar to point mutations in that most of them are neutral, or nearly so, and have little effect on pathogen traits. The magnitude of HGT in the study population, unexpected given previous findings for other S. pyogenes emm-types, provided a unique opportunity to test the hypothesis that these HGT events have enhanced the virulence of the epidemic emm89 strains by remodeling the global transcriptome. Because of the greater technical difficulty and expense involved, global transcriptional variation has been far less studied than genomic variation in bacterial pathogens. Moreover, because the samples studied here were population based and comprehensive and included temporal-spatial information, we had the additional opportunity to assess the potential effect of transcriptome remodeling on strain emergence and dissemination. We used RNAseq to compare transcript variations at two growth points among genetically representative strains of clades 1, 2, and 3 (Fig. 6). These strains have the allelic variant of the major virulence regulators covRS, mga, and ropB that is most common in the clades they represent. These regulator alleles lack known function-altering polymorphisms that influence S. pyogenes gene expression and virulence (49, 56–60). The number of genes differentially expressed in stationary-phase growth exceeded the number in exponential-phase growth by approximately 3-fold in all of the clade-to-clade comparisons (Fig. 7A). In general, the greater the genetic distance between strains, the greater the number of genes with significantly altered transcription. The largest number of differentially expressed genes was recorded between strains MGAS11027 (clade 1) and MGAS23530 (clade 2), consistent with strains in these clades being separated by the greatest MGD (Fig. 2).
Genes altered in transcript level by 1.5-fold or greater accounted for 14% and 36% of the genome at the exponential and stationary growth phases, respectively, in comparisons of MGAS11027 (clade 1) and MGAS23530 (clade 2) (see Table S3, section 1, in the supplemental material). Although genes (n = 182) located within the eight distinct regions of HGT differentiating clade 1 and clade 2 comprise only 11% of the gene content, they accounted for 24% of the differentially expressed genes at exponential growth, a highly nonrandom occurrence (P < 0.0001). Importantly, genes encoding many key virulence factors had significantly different transcript levels, including the fibronectin/collagen/T-antigen (FCT) region pilin genes, nga-ifs-slo, speG, ideS, ska, sclA, fba, enn, emm, mrp, and mga (49). Collectively, these findings demonstrate that the genome segments that had been horizontally acquired and retained on the evolutionary pathway leading from clade 1 to clade 2 strains have contributed disproportionately to remodeling the global transcriptome, including many virulence genes, and argue that they are likely not selectively neutral.
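The source does not name the test behind the P < 0.0001 overrepresentation result. One common way to ask whether differentially expressed (DE) genes are enriched in HGT regions is a one-sided hypergeometric test (equivalent to a one-sided Fisher exact test). A dependency-free sketch, offered as an illustration of that technique rather than the authors' method:

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws).

    Interpretation here: N genes in the genome, K genes inside HGT regions,
    n differentially expressed (DE) genes, k DE genes inside HGT regions.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(n, K) + 1))
    # Python divides the exact big integers and rounds the quotient to a float
    return tail / comb(N, n)


# Toy check: drawing all 5 marked items in 5 draws from 10 items
print(hypergeom_upper_tail(10, 5, 5, 5))
```

With real data, a call would look like `hypergeom_upper_tail(N=total_genes, K=182, n=de_genes, k=de_in_hgt)`, where every name except the 182 HGT-region genes is a placeholder. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same tail probability.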

FIG 6

Transcriptome analysis of genetically representative preepidemic and epidemic emm89 strains. RNAseq analysis was done in triplicate for six genetically representative strains. The strain index provided in panel C applies to all of the panels. (A) Growth curves. The graph shows the averages of growth curves analyzed in triplicate. The growth curves were closely similar for all strains. Cells were harvested for RNA isolation at mid-exponential growth (ME = optical density at 600 nm [OD600] of 0.5) and early stationary growth (ES = 2 h post-exponential phase). (B and C) Principal component analyses. Transcriptional variance among the strains is shown along the first and second principal components, which capture the two largest uncorrelated sources of variance in the data. Replicates of each strain cluster together, illustrating good reproducibility.
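The principal component plots summarized above project each replicate onto the directions of greatest variance. As a minimal, dependency-free sketch, assuming a samples-by-genes matrix of already-normalized expression values (the normalization used for Fig. 6 is not stated here, and the toy matrix is hypothetical), the leading components can be found by power iteration on the samples-by-samples Gram matrix, which stays small when genes far outnumber samples:

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(M: Matrix, v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def leading_eigenvector(M: Matrix, iters: int = 500, seed: int = 0) -> List[float]:
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(M))]
    for _ in range(iters):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left to extract
        v = [x / norm for x in w]
    return v


def pca_scores(X: Matrix, n_components: int = 2) -> List[List[float]]:
    """Per-sample scores on the leading PCs; X is samples (rows) x genes (columns)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]  # center each gene
    # Gram matrix Xc Xc^T: its eigenvectors, scaled by sqrt(eigenvalue), are PC scores
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)] for i in range(n)]
    components = []
    for _ in range(n_components):
        u = leading_eigenvector(G)
        mu = sum(a * b for a, b in zip(u, matvec(G, u)))  # eigenvalue estimate
        components.append([x * math.sqrt(max(mu, 0.0)) for x in u])
        # Deflate so the next pass finds the next-largest component
        G = [[G[i][j] - mu * u[i] * u[j] for j in range(n)] for i in range(n)]
    return components


# Hypothetical toy data: 4 samples x 2 genes; gene 1 separates the samples most
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
pc1, pc2 = pca_scores(toy)
print(pc1, pc2)
```

The sign of each component is arbitrary. In practice one would call `numpy.linalg.svd` or `sklearn.decomposition.PCA` instead; power iteration is used here only to keep the sketch self-contained.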

FIG 7

RNAseq and qPCR expression analyses. (A) Genes significantly altered in transcript level at 1.5-fold change or greater in RNAseq. The numbers shown refer to the total number of differentially expressed genes for each comparison. The representative strains of each clade analyzed are C2/C1 (MGAS23530/MGAS11027), C3/C2 (MGAS26844/MGAS23530), and SC-3D/C3 (MGAS27520/MGAS26844). (B) Transcript levels for the nga-ifs-slo operon. The transcript levels of nga, ifs, and slo were significantly (4- to 8-fold; P < 0.05) greater in the epidemic strains than in the preepidemic strains. The index in panel B applies to panels B, C, and D. (C) Transcript levels of speC and spd1. Transcript levels of speC and spd1 were significantly greater in the preepidemic strain at the exponential growth phase (P < 0.01). (D) Transcript levels for the hasABC operon. Transcription of hasABC was very weak for the clade 2 strain at both growth phases and was significantly less than for the clade 1 strain at the mid-exponential growth phase (P < 0.002). (E, F, and G) Relative transcript levels as measured by qPCR for nga, slo, and hasA, respectively. Differences in strain-to-strain expression were assessed by one-way ANOVA. Expression of nga and slo was significantly greater for all 5 clade 3 strains than for all 6 clade 1 and 2 strains (P < 0.001). All 3 clade 1 and 2 strains with the weak/repressed hasABC promoter pattern A expressed significantly less hasA than all 3 clade 1 strains with the strong/derepressed promoter pattern B (P < 0.001). Levels of hasA expression did not differ significantly between the 3 strains with weak/repressed promoter pattern A and the 5 genetically acapsular clade 3 strains. RB, recombination block; RPKM, reads per kilobase per million reads mapped.
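The RPKM unit and the 1.5-fold cutoff used in panel A can be made concrete with a short sketch. It assumes RPKM = reads × 10^9 / (gene length in bp × total mapped reads), the standard form of "reads per kilobase per million reads mapped", and compares means of replicate values. It covers only the fold-change threshold; the significance testing that also defines differential expression in this study is not reproduced, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def rpkm(reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return reads * 1e9 / (gene_length_bp * total_mapped_reads)


def fold_change(a: Sequence[float], b: Sequence[float]) -> float:
    """Ratio of mean expression in group a to group b (e.g., triplicate RPKM)."""
    return fmean(a) / fmean(b)


def meets_threshold(fc: float, threshold: float = 1.5) -> bool:
    """True if the change is at least `threshold`-fold in either direction."""
    if fc <= 0:
        raise ValueError("fold change must be positive")
    return max(fc, 1.0 / fc) >= threshold


# Hypothetical gene: 500 reads on a 1-kb gene in a library of 10 million mapped reads
print(rpkm(500, 1000, 10_000_000))
print(meets_threshold(fold_change([3.0, 3.0, 3.0], [2.0, 2.0, 2.0])))
```

Testing the ratio symmetrically means a 1.5-fold decrease (ratio of about 0.67) passes the filter just as a 1.5-fold increase does.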

The genomic changes accruing in the molecular evolution of clade 2 to clade 3 are of considerable interest because they are associated with the emergence, dissemination, and recent rapid increase in the frequency of emm89 invasive infections recorded in many countries (46, 61–65). In contrast to the 11% of the gene content reshaped by HGT in the transition from clade 1 to clade 2, a more modest 1% was reshaped in the transition from clade 2 to clade 3. Despite this modest change, comparison of the transcriptomes of clade 2 strain MGAS23530 and clade 3 strain MGAS26844 showed that 4% and 11% of the genes were differentially expressed at the exponential and stationary growth phases, respectively (Fig. 7A; see also Table S3, section 2, in the supplemental material). Genes located within regions of HGT were significantly overrepresented among the differentially expressed genes in exponential growth (P < 0.0001). Included among the 28 genes with significantly increased expression in exponential growth were the critical virulence genes nga-ifs-slo (Fig. 7B). To confirm that increased expression of nga and slo is a trait broadly common to clade 3 strains, we assessed the expression of these genes by quantitative PCR (qPCR) in 11 strains selected to represent the range of genetic and geographic diversity present in the emm89 major population (Fig. 7E and F). The subclades represented by these 11 isolates encompass 1,120 (94%) of the 1,193 strains of the major emm89 population. All 5 of the clade 3 strains had significantly greater nga and slo expression than all 6 of the clade 1 and 2 strains (P < 0.001). This is consistent with previous findings for Nga NADase activity assessed for 27 strains of the cohort (50). Importantly, significantly increased transcription of nga-ifs-slo was also associated with the emergence and epidemic increase in S. pyogenes emm1 invasive infections (32, 34, 35).
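The Fig. 7 legend states that strain-to-strain qPCR differences were assessed by one-way ANOVA. The core of that test is an F statistic comparing between-group to within-group variance. A minimal sketch, assuming each group is a list of relative expression values (for example, replicate measurements per strain); the exact grouping and replicate structure used in the study are not given here, and the toy numbers are hypothetical:

```python
from statistics import fmean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    values = [x for g in groups for x in g]
    k, n_total = len(groups), len(values)
    if k < 2 or n_total <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more values than groups")
    grand_mean = fmean(values)
    means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative expression values for two strains, triplicate each
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

Turning F into a P value requires the F distribution, which the standard library lacks; `scipy.stats.f_oneway(*groups)` returns both the statistic and the P value.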

Additional genetic changes that differentiate epidemic clade 3 strains from the most recent predecessor clade 2 strains are acquisition of phage 27061.1 encoding speC and spd1 and loss of the hasABC capsule synthesis genes. To explore the role these genetic changes have potentially played in contributing to the emergence of the epidemic clade 3 strains, we inspected transcript data for the speC and spd1 genes and hasABC virulence factor genes between the preepidemic (clade 1 and clade 2) and epidemic (clade 3) emm89 representative strains. Transcript levels of speC and spd1 were significantly greater for the preepidemic clade 1 MGAS11027 strain than for the epidemic clade 3 MGAS26844 strain at both phases of growth assessed (Fig. 7C). The finding of significantly lower levels of speC and spd1 transcripts in the genetically representative epidemic clade 3 strain further argues that presence of these virulence factors in the clade 3 lineage is unlikely to have conferred a fitness advantage relative to clade 1 strains and therefore is an unlikely mechanism for the emergence of the epidemic clone and displacement of the predecessor clade 1 and clade 2 strains (46). Similarly, although the epidemic clade 3 strains are incapable of producing the antiphagocytic hyaluronic acid (HA) capsule due to HGT-mediated loss of the hasABC genes, the transcript data indicate that this gene loss was likely not responsible for a significant decrease in capsule production between the clade 2 and 3 strains. We found that transcription of hasABC was very weak in clade 2 strain MGAS23530 under both growth conditions assessed (Fig. 7D), arguing that capsule production was already negligible before the HGT-mediated loss of the hasABC genes by the clade 3 lineage. Capsule production was strong only for clade 1 strain MGAS11027 at the exponential growth phase.

We next investigated the molecular basis for the differences in capsule production using all strains of clades 1 and 2. Sequence variation in the hasABC promoter has been reported to alter transcription and capsule production (66). Inspection of the genome sequence data, coupled with Sanger sequencing of the hasABC promoter for all clade 1 and 2 strains, identified two major variants (see Fig. S6A in the supplemental material). These promoter variants corresponded to strong clade 1 strain MGAS11027 and weak clade 2 strain MGAS23530 hasABC transcription. Whereas the two promoter variants are equally represented among clade 1 strains, the vast majority (88.5%) of clade 2 strains had the weak transcription variant (see Fig. S6B and C in the supplemental material). Expression of hasA correlated perfectly with the promoter variant (Fig. 7G), which is consistent with results of HA production assays previously reported for 27 strains of the cohort (50). Importantly, hasA transcript levels for strains with the weak promoter variant were not significantly different from those of the clade 3 strains that lack the hasABC genes. Thus, the evolution of clade 3 from a clade 2 progenitor strain likely involved a transition from very little capsule production to no capsule production. This again argues that loss of the hasABC genes by the clade 3 lineage is unlikely to confer a fitness advantage relative to the clade 1 and 2 strains and is therefore an unlikely mechanism for the epidemic emergence and displacement of the predecessor lineages. Whereas some S. pyogenes outbreaks have been associated with strains having a hyperencapsulation phenotype (33, 67), we are unaware of a body of epidemiological data associating GAS epidemic outbreaks with a loss-of-capsule phenotype.

To summarize, the global transcriptome data comparing the preepidemic and epidemic strains show that neither production of the phage-encoded virulence factors SpeC and Spd1 nor lack of production of the antiphagocytic HA capsule is a characteristic unique to the emergent clade 3 strains relative to the predecessor clade 1 and 2 strains; therefore, neither accounts for the epidemic increase in invasive infections.

The very recent emergence of SC-3D strains is temporally associated with a single HGT event in which SC-3D strains acquired an 18 kb sequence containing 21 genes, including genes encoding the secreted virulence proteins SpyA, a C3-like ADP-ribosyltransferase, and SpeJ, a pyrogenic exotoxin superantigen (49, 51–53). On the basis of the nearly identical sequences, this 18 kb region was likely acquired from an epidemic emm1 clone donor. Differentially expressed genes accounted for 2% and 11% of the genome at the exponential and stationary growth phases, respectively, in comparisons of the transcriptomes of strains MGAS26844 (clade 3) and MGAS27520 (SC-3D) (Fig. 7A; see also Table S3, section 3, in the supplemental material). This was the lowest number of differentially expressed genes among the three pairwise comparisons of the four genetically representative strains studied, consistent with SC-3D strains being a recently emerged, closely related subset of the epidemic clade 3 strains.

Further transcriptome remodeling and epidemic perpetuation.

The discovery that HGT events significantly altered transcriptomes, and the role of these changes in the emergence and dissemination of clade 3 organisms, led us to investigate the hypothesis that additional transcriptome remodeling contributed to perpetuating the emm89 epidemic. We tested this hypothesis by focusing on SC-3D strains, because these organisms disproportionately increased in frequency in Finland starting in 2013 (Fig. 2A; see also Fig. S4 and Table S1 in the supplemental material). Given the relatively modest number of genes differentially expressed between MGAS26844 (clade 3) and MGAS27520 (subclade 3D), we interrogated the genome data for candidate polymorphisms that might further alter the transcriptome and potentially influence pathogen behavior. Analysis of the genome sequences of the 33 SC-3D strains identified unique single amino acid replacements in the gene regulators CovR (S130N) and LiaS (K214R). These polymorphisms were prevalent among the SC-3D strains: 11 strains had the CovR (S130N) change, and 6 strains had the LiaS (K214R) change (see Fig. S4). In contrast, none of the other 1,183 emm89 strains or the 3,615 emm1 strains (32) studied had these polymorphisms. The branching of the strains with these mutations in the inferred phylogeny, together with their absence from other S. pyogenes strains, indicates identity by descent rather than identity by independent mutation (i.e., commonality by evolutionary convergence).
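The identity-by-descent argument rests on the mutant strains "branching together", that is, forming a single clade. The minimal sketch below checks that property on a rooted tree written as nested tuples. It assumes the tree is already rooted (Fig. S4 roots the neighbor-joining tree on MGAS27061), and the strain labels S1 to S6 are hypothetical. It does not reproduce the SplitsTree analysis itself.

```python
from typing import Union

# A rooted tree as nested tuples; leaves are strain names (hypothetical labels).
Tree = Union[str, tuple]


def leaf_set(node: Tree) -> frozenset:
    """All leaf (strain) names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def all_clades(node: Tree) -> list:
    """Leaf sets of every subtree, i.e. every clade the tree defines."""
    if isinstance(node, str):
        return [frozenset([node])]
    clades = [leaf_set(node)]
    for child in node:
        clades.extend(all_clades(child))
    return clades


def is_monophyletic(tree: Tree, carriers: set) -> bool:
    """True if the strains carrying a mutation form exactly one clade."""
    return frozenset(carriers) in all_clades(tree)


tree = ((("S1", "S2"), "S3"), (("S4", "S5"), "S6"))
print(is_monophyletic(tree, {"S1", "S2"}))        # True: a single shared ancestor
print(is_monophyletic(tree, {"S1", "S2", "S4"}))  # False: carriers split across branches
```

A strictly monophyletic set of carriers fits a single ancestral mutation. A scattered set, like the one LiaS (K214R) strain noted in Fig. S4, needs further explanation.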

Repeated recovery of clonal progeny with either the CovR (S130N) or LiaS (K214R) polymorphism from invasive episodes has not been reported previously and thus was unexpected. Because relatively little is known about liaS in S. pyogenes, we elected to study the LiaS (K214R) polymorphism in more detail. Consistent with our altered-transcriptome hypothesis, RNAseq analysis showed that the transcriptome of strain MGAS27710 LiaS (K214R) differed from that of SC-3D LiaS wild-type strain MGAS27520, including significant changes in expression of several virulence genes (data not presented). However, because these two strains are not isogenic, the extent to which the altered transcription was due to the LiaS (K214R) polymorphism could not be assessed. To address this issue, we constructed a LiaS (K214R) isogenic mutant from parental strain MGAS27556 and conducted RNAseq analysis. Compared to the wild-type parental strain, the LiaS (K214R) isogenic mutant had 127 and 70 differentially expressed genes in exponential-phase and stationary-phase growth, respectively (see Table S3, section 4, in the supplemental material). Virulence genes with significantly increased expression in the LiaS (K214R) isogenic mutant included all 9 genes of the streptolysin S biosynthesis operon (sagABCDEFGHI) in exponential phase and speG, encoding streptococcal pyrogenic exotoxin G, in stationary phase.

The capacity of the CovR (S130N) and LiaS (K214R) naturally occurring mutant strains to repeatedly cause serious infections means that they can effectively spread between hosts and implies that they are not attenuated in the ability to survive in the upper respiratory tract, the more common S. pyogenes niche. Consistent with this idea, we found that the naturally occurring mutant strains had an enhanced ability to survive in human saliva ex vivo relative to SC-3D wild-type strain MGAS27520 (Fig. 8H). These results contrast with data showing that strains with other covR or covS (covR/S) mutations have reduced survival in human saliva relative to wild-type strains (68).

FIG 8

Virulence assays. (A) Kaplan-Meier survival curves for mice (n = 25/strain) inoculated intramuscularly in the right hind limb with 2.5 × 10^8 CFU. The genetically representative epidemic strain (MGAS26844) was significantly more lethal than the preepidemic strains throughout the period of observation. The index of the strains compared in panel A applies to panels A to G. (B) Histopathology scores for muscle tissue sections as determined by pathologists blinded to the infecting strain. Data represent means (n = 5 assessments/strain) ± standard errors of the means (SEM). (C) Cynomolgus macaques were inoculated intramuscularly in the anterior thigh with 1.0 × 10^9 CFU/kg of body mass. Shown at the same magnification are micrographs of muscle tissue sections from the site of inoculation. (D and E) Epidemic strain MGAS26844 caused significantly larger lesions (D) with greater tissue destruction (E) than preepidemic strain MGAS11027. (F and G) Although the bacterial burdens were similar at the site of inoculation (F), they were significantly greater for the epidemic strain than for the preepidemic strain at the distal margin (G), indicating greater dissemination. P values for panels B, D, E, F, and G were determined with the Mann-Whitney test. (H) Viability of naturally occurring variant strains MGAS28980 CovR (S130N) and MGAS27552 LiaS (K214R) in human saliva persisted for 2 and 4 weeks longer, respectively, than that of wild-type strain MGAS27520. No growth, <10 CFU/ml for a 1:10 dilution. IM, intramuscularly; NHP, nonhuman primate.
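Panels B and D to G rely on the Mann-Whitney test. The minimal sketch below illustrates what that test compares. It computes the U statistic by going through every cross-strain pair of values and counting how often a value from the first strain exceeds a value from the second, with ties counting one-half. Any scores passed to it here are hypothetical. Converting U to a P value (exact or by normal approximation) is left to a statistics package; `scipy.stats.mannwhitneyu`, for example, returns both the statistic and the P value.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic for sample x versus sample y.

    Counts pairs (xi, yj) with xi > yj; tied pairs contribute 0.5.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

The two directions always sum to the number of pairs: `mann_whitney_u(x, y) + mann_whitney_u(y, x) == len(x) * len(y)`. When every value in one group exceeds every value in the other, U reaches that maximum.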

Comparative strain virulence.

The epidemiological, comparative genomic, and transcriptome data demonstrate that clade 1, 2, and 3 organisms are genotypically and phenotypically distinct and strongly suggest differences in virulence. To test this hypothesis, the genetically representative reference strains of the three clades were compared in mouse and nonhuman primate models of necrotizing fasciitis (NF) (69–71). Epidemic clade 3 reference strain MGAS26844 was significantly more lethal and caused significantly greater tissue damage in the mouse NF infection model than the two preepidemic reference strains (Fig. 8A and B). Moreover, relative to clade 1 reference strain MGAS11027, epidemic clade 3 strain MGAS26844 caused significantly larger lesions with greater tissue damage in a nonhuman primate model of NF (Fig. 8C to E).
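The lethality comparison in Fig. 8A is presented as Kaplan-Meier survival curves. The minimal sketch below shows the product-limit estimator behind such a curve. At each time point where deaths occur, survival is multiplied by the fraction of still-at-risk animals that survived that time point. Animals alive at the end of observation are treated as censored. The times and outcomes in the usage note are hypothetical, and the test used in the study to compare curves is not reproduced here.

```python
def kaplan_meier(times: list[float], died: list[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times: time of death or of censoring for each animal.
    died:  True if the animal died at that time, False if censored
           (for example, still alive when observation ended).
    Returns (time, estimated survival) at each time with at least one death.
    """
    records = sorted(zip(times, died))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = censored = 0
        # Pool every animal whose event falls at this same time point.
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Conditional probability of surviving past t, given alive just before t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored animals leave the risk set after t.
        at_risk -= deaths + censored
    return curve
```

Take five hypothetical mice with events at 24, 48, 48, 72, and 96 h, where the second 48-h animal and the 96-h animal are censored. The estimate steps down to about 0.8, 0.6, and 0.3 at 24, 48, and 72 h.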

Concluding comment.

We have used S. pyogenes as a model pathogen for studying the evolutionary genomics of epidemic disease and the molecular basis of bacterial pathogenesis. The organism is a strict human pathogen, causes abundant infections worldwide, and has a relatively small genome (~1.8 Mb). In addition to its propensity to cause epidemic waves, the availability of comprehensive, population-based strain collections from many countries, coupled with the fact that humans are its only natural host, means that the history of underlying events that generate genomic diversity is not obscured by molecular processes occurring in nonhuman hosts or environmental reservoirs. These factors afford considerable advantages in the use of S. pyogenes as a model system compared to many other pathogenic bacteria such as E. coli, S. enterica, and S. aureus.

The primary goal of our study was to determine if genomic changes linked with the origin and perpetuation of human epidemic disease have remodeled global gene expression and altered virulence in the model pathogen S. pyogenes. We were especially interested in determining the effect, if any, of horizontally acquired genome segments on global gene expression and virulence of the progeny strains. Despite the importance of bacterial pathogens in human and veterinary health, remarkably few studies have addressed how transcriptome remodeling contributes to the origin and perpetuation of epidemics. Zhou et al. (26) studied diversity in 149 genomes of S. enterica serovar Paratyphi A and used the resulting data to speculate that most recent increases in frequencies of bacterial diseases are due to environmental changes rather than to the novel evolution of pathogenic bacteria. In essence, it was suggested that many epidemics and pandemics of bacterial disease in humans did not involve recent evolution of particularly virulent organisms but instead reflected chance environmental events. A similar conclusion was reached in studies of other pathogens, for example, Yersinia pestis, S. enterica serovar Agona, Mycobacterium tuberculosis, Mycobacterium leprae, and Shigella sonnei (17, 25). Although this may be the case for some pathogens, on the basis of the full-genome data from 4,815 strains, human patient information (33), analysis of isogenic mutant strains, RNAseq studies, and experimental animal infection, we arrive at a fundamentally different conclusion for emm89 and emm1 S. pyogenes, organisms that have caused epidemics involving tens of millions of human infections in the last 30 years. In particular, our results unambiguously show that newly emerged clones causing epidemic disease are more virulent than previously circulating precursor organisms. 
For clarity, we consider all steps in pathogen-host interaction to potentially contribute to the virulence phenotype, including survival and proliferation after initial contact with the host through invasion of deeper tissues and spread to new hosts. Conclusions about molecular pathogenesis and virulence based solely or predominantly on population genomic analyses of a convenience sample of strains and resulting inferences are not likely to fully reflect the biology of pathogen and host interaction. This issue may be especially problematic if only one or a few nucleotide changes significantly alter virulence.

We believe that our findings have important implications for bacterial pathogens that must successfully circumvent host defenses, at both the individual level and the population level. Our analysis demonstrated that, among the various emm89 clades and subclades, considerable variation exists among global transcriptomes, in both the spectrum of genes expressed and their magnitude of expression. This means that, in essence, many different antigen, toxin, and virulence factor profiles can be and are being displayed to host populations as a function of individual strain genotype and not necessarily of emm-type. In the absence of one or a small number of conserved antigens mediating protective immunity, regardless of the microbe, significant variations in antigen repertoire have implications for vaccine research, formulation, and deployment.

Many elegant studies of the population genomics of bacterial pathogens have been published over the last decade (4, 5, 10–18, 23, 25, 26, 32, 72–76). There is a small but emerging literature bearing on the impact of regulatory plasticity in bacterial evolution and fitness (77–81). However, there has been very little work designed to integrate microbial population genomics, molecular pathogenesis processes, microbial emergence, transcriptome remodeling, and virulence. Our findings suggest that this could be a fruitful area of research for other microbial pathogens. The resulting data are likely to have significant implications for understanding bacterial epidemics and for translational research efforts to blunt their detrimental effects.

MATERIALS AND METHODS

Further details of the materials and methods used are described in Text S1 in the supplemental material.

Bacterial strains.

We studied 1,200 GAS emm89 strains, including 1,198 strains causing invasive infections and two from pharyngitis patients (see Table S1 in the supplemental material). The vast majority of the strains (n = 1,178) were collected as part of comprehensive population-based public health surveillance of GAS invasive infections conducted in the United States, Finland, and Iceland between 1995 and 2014. The remaining emm89 strains were recovered from invasive disease cases in Ontario, Canada, and from a pharyngitis case in Italy. A subset of this population has been previously studied, and preliminary genetic findings have been presented (35, 48).

Genome sequencing.

Isolation of chromosomal DNA, generation of paired-end libraries, and multiplexed sequencing were accomplished as described previously (32, 35) using Illumina (San Diego, CA) instruments (HiSeq2500, MiSeq, and NextSeq). Whole-genome sequencing data for the 1,200 isolates studied were deposited in the NCBI Sequence Read Archive.

Reference genome assembly, annotation, and polymorphism discovery.

The bioinformatics tools used for assembling and annotating the reference genomes and for identifying and analyzing polymorphisms in the population studied are described in Text S1 in the supplemental material. Complete genome sequences for genetically representative strains MGAS11027, MGAS23530, and MGAS27061 were deposited in the NCBI GenBank database. MGAS11027, MGAS23530, and MGAS27061 were deposited in the BEIR strain repository.

Phylogenetic inference and population structure.

The bioinformatics tools used for sequence alignments, detection, and filtering of HGT polymorphisms, for clustering and phylogenetic inference, and for analysis of the population structure are described in Text S1 in the supplemental material.

Gene content and mobile genetic element analysis.

The known GAS pangenome core and accessory gene content was determined based on 30 complete genomes of 18 different emm-types (see Table S2 in the supplemental material) as described in Text S1 in the supplemental material. Among the 53,336 coding sequences (CDSs) of the 30 genomes, PanOCT identified 3,338 ortholog clusters, culled by BLAST reciprocal-best-hit analysis to 2,835 on the basis of the criterion of no two clusters sharing >95% amino acid identity. A GAS pseudo-pangenome sequence of ~3 Mbp was generated by concatenating onto the emm89 MGAS23530 reference genome all accessory gene content not already present in the genome, starting with emm89 strains MGAS11027 and MGAS27061, and then the remaining 27 genomes by emm-type (i.e., emm1, emm2, emm3, etc.). Based on mapping of the emm89 reference genome sequencing reads to the GAS-30 pangenome, an RPKM (reads per kilobase of transcript per million reads mapped) value of >50 corresponded to gene presence. A phage was called present if a minimum of 80% of its gene content represented in the GAS-30 pangenome was found to be present. Reads not mapping to the GAS-30 pangenome were assembled de novo using SPAdes. Resultant contigs with greater than 100 nucleotides were queried against the NCBI nonredundant database using BLAST to determine their nature.
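The presence calls described above combine two thresholds. A gene is called present when its RPKM exceeds 50. A phage is called present when at least 80% of its genes represented in the GAS-30 pangenome are present. The minimal sketch below restates those rules. It assumes the standard RPKM definition, which divides the read count by gene length in kilobases and by total mapped reads in millions. The function names and the read counts in the usage note are hypothetical.

```python
def rpkm(reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return reads / (gene_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def gene_present(reads: int, gene_length_bp: int, total_mapped_reads: int,
                 min_rpkm: float = 50.0) -> bool:
    """Gene call: present only if RPKM is strictly greater than the threshold."""
    return rpkm(reads, gene_length_bp, total_mapped_reads) > min_rpkm


def phage_present(gene_calls: list[bool], min_fraction: float = 0.8) -> bool:
    """Phage call: present if at least min_fraction of its genes are present."""
    return sum(gene_calls) / len(gene_calls) >= min_fraction
```

Consider a hypothetical 1,000 bp gene in a sample with 10 million mapped reads. With 500 reads, its RPKM is exactly 50, which does not pass the strict ">50" rule. With 1,000 reads, its RPKM is 100 and it is called present.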

Construction of isogenic mutant strains.

The construction of the liaS isogenic mutant strain was accomplished by allelic exchange as previously described (35). Briefly, MGAS27556 LiaS (K214R) was generated by introducing the liaS A641G SNP into wild-type strain MGAS27556, using DNA amplified from strain MGAS27552, a clinical isolate with a naturally occurring liaS A641G SNP (i.e., LiaS K214R substitution) as the template. Successful introduction of the desired SNP and the absence of spontaneous spurious mutations were confirmed in candidate isogenic mutants by whole-genome sequencing. Primers, plasmids, and restriction enzymes used in the construction are listed in Text S1 in the supplemental material.
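The nucleotide and amino acid designations of this substitution can be cross-checked with simple codon arithmetic. Position 641 falls in codon 214, at the second base of that codon. Changing A to G at that base turns either lysine codon (AAA or AAG) into an arginine codon (AGA or AGG). The minimal sketch below makes that check. It assumes positions are counted from the first base of the liaS start codon and uses the standard genetic code. The actual liaS codon 214 sequence is not given in the text, so both lysine codons are tested.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def locate_in_codon(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based CDS nucleotide position to (codon number, position in codon), both 1-based."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def mutate_codon(codon: str, pos_in_codon: int, alt: str) -> str:
    """Return the codon with the base at pos_in_codon (1-based) replaced by alt."""
    return codon[:pos_in_codon - 1] + alt + codon[pos_in_codon:]


codon_number, pos = locate_in_codon(641)  # liaS A641G
for ref_codon in ("AAA", "AAG"):          # both lysine codons; actual codon not stated
    alt_codon = mutate_codon(ref_codon, pos, "G")
    print(codon_number, ref_codon, CODON_TABLE[ref_codon], "->", alt_codon, CODON_TABLE[alt_codon])
# 214 AAA K -> AGA R
# 214 AAG K -> AGG R
```
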

Transcriptome sequencing and expression analysis.

Whole-genome transcriptional analysis was conducted for strains genetically representative of the clades and subclades studied using RNAseq as previously described (35, 82), with minor modifications. Briefly, RNA was isolated from triplicate cultures grown in Todd-Hewitt broth supplemented with yeast extract (THY). Multiplexed libraries were subjected to single-end sequencing (50 bp) to high depth (~10 million reads/sample) on an Illumina HiSeq2500 instrument. RNAseq reads were mapped to the genome of the most closely related emm89 reference strain (for example, clade 3 strains were mapped to the genome of reference strain MGAS27061). Use of multiple reference sequences was critical, as a single common reference did not permit accurate quantitative read mapping to the divergent sequences in the regions of HGT. RNAseq data were normalized, and genes differentially expressed at a minimum 1.5-fold change in mean transcript level after Benjamini-Hochberg correction were identified using the bioinformatics tools provided in Text S1 in the supplemental material. RNAseq transcriptome data were deposited in the NCBI Gene Expression Omnibus database. Expression levels of the key virulence genes nga, slo, and hasA were assessed by quantitative real-time PCR in triplicate for 11 strains genetically representative of the most abundant subclades of the major population, using primers, probes, and protocols previously described (50). The significance of strain-to-strain differences in expression was assessed by one-way analysis of variance (ANOVA).
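The gene-calling rule above has two filters: a Benjamini-Hochberg correction of per-gene P values and a minimum 1.5-fold change in mean transcript level. The minimal sketch below combines them. The per-gene P values are taken as given, since the tools that produce them are in Text S1. The 0.05 false-discovery cutoff is an assumption, because no cutoff is stated in this section. Mean levels are assumed to be strictly positive normalized values, and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(mean_a: list[float], mean_b: list[float],
                             pvalues: list[float], alpha: float = 0.05,
                             min_fold: float = 1.5) -> list[int]:
    """Indices of genes passing BH correction and a minimum fold change in either direction.

    mean_a, mean_b: normalized mean transcript levels (assumed > 0) for two strains.
    alpha: assumed false-discovery cutoff (not stated in the source).
    """
    qvalues = benjamini_hochberg(pvalues)
    hits = []
    for idx, (a, b, q) in enumerate(zip(mean_a, mean_b, qvalues)):
        fold = max(a, b) / min(a, b)  # symmetric: counts increases and decreases
        if q < alpha and fold >= min_fold:
            hits.append(idx)
    return hits
```

Taking the fold change as max/min counts a 1.5-fold decrease the same as a 1.5-fold increase. This matches the caption's description of genes "altered in transcript level at 1.5-fold change or greater".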

Experimental animal infections.

The virulence of serotype emm89 reference strains MGAS11027, MGAS23530, and MGAS26844 was assessed using mouse and nonhuman primate models of necrotizing fasciitis (32, 69–71). These strains have a wild-type (i.e., the most commonly occurring) allele for all major transcription regulators, including covR/S, ropB, and mga. All animal experiments were approved by the Institutional Animal Care and Use Committee of Houston Methodist Research Institute.

Accession numbers.

Whole-genome sequencing data for the 1,200 isolates studied were deposited in the NCBI Sequence Read Archive under accession number SRP059971. Complete genome sequences for genetically representative strains MGAS11027, MGAS23530, and MGAS27061 were deposited in the NCBI GenBank database under accession numbers CP013838, CP013839, and CP013840, respectively. MGAS11027, MGAS23530, and MGAS27061 were deposited in the BEIR strain repository under accession numbers NR-33707, NR-33706, and NR-50285, respectively. RNAseq transcriptome data were deposited in the NCBI Gene Expression Omnibus database under accession number GSE76816.

SUPPLEMENTAL MATERIAL

Text S1 

Supplemental Materials and Methods.

Figure S1 

Atlases for the three emm89 reference genomes. Shown from the outermost (1st) ring to the innermost (12th) ring are the following: ring 1, megabase pairs (black); ring 2, gene or operon landmarks; rings 3 and 4, coding sequences on the forward strand (light blue) and reverse strand (dark blue); rings 5, 7, and 9, BLAST nucleotide sequence comparison with the genomes indicated in the respective indexes; rings 6, 8, and 10, distribution of SNPs for the genomes indicated in the respective indexes; ring 11, G+C relative to the mean; and ring 12, GC skew. BLAST nucleotide sequence comparisons were made between the genomes of the clade 1, 2, and 3 reference strains and with a de novo assembly of strain MGAS27450, the most phylogenetically distant emm89 outlier strain.

Figure S2 

Genetic relationships among emm89 reference strains, with emm1 reference strain SF370 used as the rooting outgroup. Genetic relationships among the three emm89 reference strains and the seven outlier strains are shown using emm1 reference strain SF370 as an outgroup. Relationships were inferred based on 26,371 core SNPs by neighbor-network decomposition of splits. The sequence of branching of the three numerically dominant emm89 primary clades along the evolutionary path leading to the contemporary epidemic emm89 strains is clade 1 (MGAS11027) followed by clade 2 (MGAS23530) and then epidemic clade 3 (MGAS27061).

Figure S3 

Potential horizontal gene transfer (HGT) region donors. The genetic relationships among the three emm89 clade reference strains and 39 strains of 18 other emm-types for which complete genome sequences were publicly available as of 10 July 2015 are shown for each of the predicted recombination blocks (RB) separating the clades. Sequences flanking the predicted recombination blocks in the strain MGAS23530 genome were used to define the corresponding regions in the other strains using BLASTn. The sequences corresponding to the predicted recombination blocks among all 42 strains were aligned using MAFFT, and relationships were inferred by neighbor-network decomposition of splits using SplitsTree. The length of each recombination block and the locus tags of the genes involved are listed relative to strain MGAS23530. Of note, seven of the eight recombination blocks separating all 359 clade 1 strains from all 78 clade 2 strains share a more recent common ancestor with emm2 reference strain MGAS10270 than with reference strains of any of the other emm-types used in this comparison.

Figure S4 

Genetic relationships among emm89 subclade 3D strains. Genetic relationships among the 33 subclade 3D strains are shown using clade 3 reference strain MGAS27061 as the outgroup. Relationships were inferred based on 157 core SNPs by neighbor-joining using SplitsTree. All subclade 3D strains differ from all progenitor clade 3 strains by an 18 kb region of HGT involving the virulence factors SpyA and SpeJ (Table 1; recombination block 15). To constrain the inference primarily to vertically inherited SNPs, SNPs within putative regions of HGT were identified and filtered out using Gubbins. The 11 strains with the CovR (S130N) substitution branch together, indicating inheritance by descent. Similarly, all but 1 of the 6 strains with the LiaS (K214R) substitution branch together, again indicating inheritance by descent. We attribute the single LiaS (K214R) strain not branching with the others as likely being due to a few scant horizontally acquired polymorphisms that were insufficient to statistically significantly elevate the SNP density and therefore were not detected/excluded by Gubbins.
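Gubbins finds putative recombination blocks by looking for elevated SNP density. Once the blocks are known, SNPs inside them are excluded before the tree is built. The minimal sketch below shows only that downstream masking step, not the Gubbins detection algorithm. It assumes blocks are given as inclusive, 1-based (start, end) genome coordinates. The coordinates and the function name are hypothetical.

```python
def filter_recombinant_snps(snp_positions: list[int],
                            blocks: list[tuple[int, int]]) -> list[int]:
    """Keep only SNP positions lying outside every recombination block.

    blocks: inclusive, 1-based (start, end) coordinates of putative HGT regions.
    """
    return [
        pos for pos in snp_positions
        if not any(start <= pos <= end for start, end in blocks)
    ]
```

With hypothetical SNPs at 100, 250, 400, and 1,200 and blocks spanning 200 to 300 and 1,000 to 1,500, only the SNPs at 100 and 400 are kept for phylogenetic inference. The caption's point is that a region with too few horizontally acquired SNPs is never flagged as a block in the first place, so its SNPs pass this filter.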

Figure S5 

Comparison of phages 11027.1 and 27061.1. Shown above is a percent identity plot, and below is a dot matrix alignment. The phages are similar over the first ~13 kb at the 5′ end, which includes the integrase, replication, and lytic/lysogenic regulatory genes. They diverge over most of the central portion encoding head and tail coat proteins and are then similar again over the last ~3 kb at the 3′ end, which encodes the secreted virulence factors streptococcal pyrogenic exotoxin C (SpeC) superantigen and streptococcal phage DNase 1 (Spd1). The sequence divergence between phages 11027.1 and 27061.1 means that 27061.1 did not evolve from 11027.1 through a simple single deletion event. Despite being integrated at the same genomic locus and encoding the same virulence factors, they are distinct mosaic phages.

Figure S6 

hasABC promoter variants. (A) hasABC promoter pattern variants identified among the emm89 clade 1 and clade 2 strains are illustrated. Patterns A and B account for 99% of the strains. Pattern B has a 38-bp deletion relative to pattern A which eliminates a putative Rho-independent terminator. In M1 strain MGAS2221, deletion of this terminator results in release of hasABC from CovR repression, resulting in enhanced capsule production. (B) Distribution of hasABC promoter variants among the clade 1 and 2 strains. (C) Distribution of hasABC promoter variants among the clade 1 and clade 2 strains. Genetic relationships among the emm89 clade 1 and clade 2 strains inferred by neighbor-joining based on 5,663 core SNPs filtered using Gubbins to exclude regions of horizontal gene transfer are illustrated. Strains are colored by promoter variant as indicated in the index. Clade 1 strains are a mix of nearly equal amounts of pattern A (weak/repressed) and pattern B (strong/derepressed) promoter variants, whereas the vast majority of clade 2 strains are of pattern A. These findings are consistent with the level of hasABC transcripts for clade 1 strain MGAS11027 being significantly greater than that for clade 2 strain MGAS23530 as determined by RNAseq.

Table S1 

Strains and characteristics.

Table S2 

Streptococcus pyogenes complete genome sequences.

Table S3 

RNAseq transcriptome analyses.

ACKNOWLEDGMENTS

We thank FiRe (the Finnish Study Group on Antimicrobial Resistance); Chris A. Van Beneden, Bernard Beall, and the Active Bacterial Core surveillance (ABCs) of the Emerging Infections Programs network of the Centers for Disease Control and Prevention (CDC); Kathryn Stockbauer and Helen Chifotides for critical comment and editorial assistance; and Hanne-Leena Hyyrylainen, Kai Puhakainen, and Francesca Latronico for microbiological and epidemiological assistance.

This project was supported in part by the Fondren Foundation, Houston Methodist Hospital; the Academy of Finland (grant 255636); and the European Society of Clinical Microbiology and Infectious Diseases Training Fellowship 2011 and the Federation of European Societies of Microbiology Research Fellowship 2011-1 (awarded to M.C.D.L.).

Footnotes

Citation Beres SB, Kachroo P, Nasser W, Olsen RJ, Zhu L, Flores AR, de la Riva I, Paez-Mayorga J, Jimenez FE, Cantu C, Vuopio J, Jalava J, Kristinsson KG, Gottfredsson M, Corander J, Fittipaldi N, Di Luca MC, Petrelli D, Vitali LA, Raiford A, Jenkins L, Musser JM. 2016. Transcriptome remodeling contributes to epidemic disease caused by the human pathogen Streptococcus pyogenes. mBio 7(3):e00403-16. doi:10.1128/mBio.00403-16.

REFERENCES

  • 1. Achtman M. 2008. Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu Rev Microbiol 62:53–70. doi: 10.1146/annurev.micro.62.081307.162832.
  • 2. Bobay LM, Traverse CC, Ochman H. 2015. Impermanence of bacterial clones. Proc Natl Acad Sci U S A 112:8893–8900. doi: 10.1073/pnas.1501724112.
  • 3. Caugant DA, Frøholm LO, Bøvre K, Holten E, Frasch CE, Mocca LF, Zollinger WD, Selander RK. 1986. Intercontinental spread of a genetically distinctive complex of clones of Neisseria meningitidis causing epidemic disease. Proc Natl Acad Sci U S A 83:4927–4931. doi: 10.1073/pnas.83.13.4927.
  • 4. Chewapreecha C, Harris SR, Croucher NJ, Turner C, Marttinen P, Cheng L, Pessia A, Aanensen DM, Mather AE, Page AJ, Salter SJ, Harris D, Nosten F, Goldblatt D, Corander J, Parkhill J, Turner P, Bentley SD. 2014. Dense genomic sampling identifies highways of pneumococcal recombination. Nat Genet 46:305–309. doi: 10.1038/ng.2895.
  • 5. Cui Y, Yu C, Yan Y, Li D, Li Y, Jombart T, Weinert LA, Wang Z, Guo Z, Xu L, Zhang Y, Zheng H, Qin N, Xiao X, Wu M, Wang X, Zhou D, Qi Z, Du Z, Wu H, Yang X, Cao H, Wang H, Wang J, Yao S, Rakin A, Li Y, Falush D, Balloux F, Achtman M, Song Y, Wang J, Yang R. 2013. Historical variations in mutation rate in an epidemic pathogen, Yersinia pestis. Proc Natl Acad Sci U S A 110:577–582. doi: 10.1073/pnas.1205750110.
  • 6. Falush D, Kraft C, Taylor NS, Correa P, Fox JG, Achtman M, Suerbaum S. 2001. Recombination and mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size, and minimal age. Proc Natl Acad Sci U S A 98:15056–15061. doi: 10.1073/pnas.251396098.
  • 7. Feil EJ, Holmes EC, Bessen DE, Chan MS, Day NP, Enright MC, Goldstein R, Hood DW, Kalia A, Moore CE, Zhou J, Spratt BG. 2001. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci U S A 98:182–187. doi: 10.1073/pnas.98.1.182.
  • 8. Feil EJ, Spratt BG. 2001. Recombination and the population structures of bacterial pathogens. Annu Rev Microbiol 55:561–590. doi: 10.1146/annurev.micro.55.1.561.
  • 9. Fraser C, Hanage WP, Spratt BG. 2005. Neutral microepidemic evolution of bacterial pathogens. Proc Natl Acad Sci U S A 102:1968–1973. doi: 10.1073/pnas.0406993102.
  • 10. Harris SR, Feil EJ, Holden MT, Quail MA, Nickerson EK, Chantratita N, Gardete S, Tavares A, Day N, Lindsay JA, Edgeworth JD, de Lencastre H, Parkhill J, Peacock SJ, Bentley SD. 2010. Evolution of MRSA during hospital transmission and intercontinental spread. Science 327:469–474. doi: 10.1126/science.1182395.
  • 11. Holden MT, Hsu LY, Kurt K, Weinert LA, Mather AE, Harris SR, Strommenger B, Layer F, Witte W, de Lencastre H, Skov R, Westh H, Zemlicková H, Coombs G, Kearns AM, Hill RL, Edgeworth J, Gould I, Gant V, Cooke J, Edwards GF, McAdam PR, Templeton KE, McCann A, Zhou Z, Castillo-Ramirez S, Feil EJ, Hudson LO, Enright MC, Balloux F, Aanensen DM, Spratt BG, Fitzgerald JR, Parkhill J, Achtman M, Bentley SD, Nubel U. 2013. A genomic portrait of the emergence, evolution, and global spread of a methicillin-resistant Staphylococcus aureus pandemic. Genome Res 23:653–664. doi: 10.1101/gr.147710.112.
  • 12. Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M, Dougan G. 2008. High-throughput sequencing provides insights into genome variation and evolution in Salmonella typhi. Nat Genet 40:987–993. doi: 10.1038/ng.195.
  • 13. Holt KE, Thieu Nga TV, Thanh DP, Vinh H, Kim DW, Vu Tra MP, Campbell JI, Hoang NV, Vinh NT, Minh PV, Thuy CT, Nga TT, Thompson C, Dung TT, Nhu NT, Vinh PV, Tuyet PT, Phuc HL, Lien NT, Phu BD, Ai NT, Tien NM, Dong N, Parry CM, Hien TT, Farrar JJ, Parkhill J, Dougan G, Thomson NR, Baker S. 2013. Tracking the establishment of local endemic populations of an emergent enteric pathogen. Proc Natl Acad Sci U S A 110:17522–17527. doi: 10.1073/pnas.1308632110.
  • 14. Mather AE, Reid SW, Maskell DJ, Parkhill J, Fookes MC, Harris SR, Brown DJ, Coia JE, Mulvey MR, Gilmour MW, Petrovska L, de Pinna E, Kuroda M, Akiba M, Izumiya H, Connor TR, Suchard MA, Lemey P, Mellor DJ, Haydon DT, Thomson NR. 2013. Distinguishable epidemics of multidrug-resistant Salmonella Typhimurium DT104 in different hosts. Science 341:1514–1517. doi: 10.1126/science.1240578.
  • 15. McAdam PR, Vander Broek CW, Lindsay DS, Ward MJ, Hanson MF, Gillies M, Watson M, Stevens JM, Edwards GF, Fitzgerald JR. 2014. Gene flow in environmental Legionella pneumophila leads to genetic and pathogenic heterogeneity within a Legionnaires’ disease outbreak. Genome Biol 15:504. doi: 10.1186/PREACCEPT-1675723368141690.
  • 16. Morelli G, Song Y, Mazzoni CJ, Eppinger M, Roumagnac P, Wagner DM, Feldkamp M, Kusecek B, Vogler AJ, Li Y, Cui Y, Thomson NR, Jombart T, Leblois R, Lichtner P, Rahalison L, Petersen JM, Balloux F, Keim P, Wirth T, Ravel J, Yang R, Carniel E, Achtman M. 2010. Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity. Nat Genet 42:1140–1143. doi: 10.1038/ng.705.
  • 17. Reuter S, Connor TR, Barquist L, Walker D, Feltwell T, Harris SR, Fookes M, Hall ME, Petty NK, Fuchs TM, Corander J, Dufour M, Ringwood T, Savin C, Bouchier C, Martin L, Miettinen M, Shubin M, Riehm JM, Laukkanen-Ninios R, Sihvonen LM, Siitonen A, Skurnik M, Falcao JP, Fukushima H, Scholz HC, Prentice MB, Wren BW, Parkhill J, Carniel E, Achtman M, McNally A, Thomson NR. 2014. Parallel independent evolution of pathogenicity within the genus Yersinia. Proc Natl Acad Sci U S A 111:6768–6773. doi: 10.1073/pnas.1317161111.
  • 18. Sánchez-Busó L, Comas I, Jorques G, González-Candelas F. 2014. Recombination drives genome evolution in outbreak-related Legionella pneumophila isolates. Nat Genet 46:1205–1211. doi: 10.1038/ng.3114.
  • 19. Selander RK, Levin BR. 1980. Genetic diversity and structure in Escherichia coli populations. Science 210:545–547. doi: 10.1126/science.6999623.
  • 20. Selander RK, Musser JM, Caugant DA, Gilmour MN, Whittam TS. 1987. Population genetics of pathogenic bacteria. Microb Pathog 3:1–7.
  • 21. Smith JM, Smith NH, O’Rourke M, Spratt BG. 1993. How clonal are bacteria? Proc Natl Acad Sci U S A 90:4384–4388. doi: 10.1073/pnas.90.10.4384.
  • 22. Sreevatsan S, Pan X, Stockbauer KE, Connell ND, Kreiswirth BN, Whittam TS, Musser JM. 1997. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A 94:9869–9874. doi: 10.1073/pnas.94.18.9869.
  • 23. Wong VK, Baker S, Pickard DJ, Parkhill J, Page AJ, Feasey NA, Kingsley RA, Thomson NR, Keane JA, Weill FX, Edwards DJ, Hawkey J, Harris SR, Mather AE, Cain AK, Hadfield J, Hart PJ, Thieu NT, Klemm EJ, Glinos DA, Breiman RF, Watson CH, Kariuki S, Gordon MA, Heyderman RS, Okoro C, Jacobs J, Lunguya O, Edmunds WJ, Msefula C, Chabalgoity JA, Kama M, Jenkins K, Dutta S, Marks F, Campos J, Thompson C, Obaro S, MacLennan CA, Dolecek C, Keddy KH, Smith AM, Parry CM, Karkey A, Mulholland EK, Campbell JI, Dongol S, Basnyat B, Dufour M, Bandaranayake D. 2015. Phylogeographical analysis of the dominant multidrug-resistant H58 clade of Salmonella typhi identifies inter- and intracontinental transmission events. Nat Genet 47:632–639. doi: 10.1038/ng.3281.
  • 24. Wyres KL, Lambertsen LM, Croucher NJ, McGee L, von Gottberg A, Liñares J, Jacobs MR, Kristinsson KG, Beall BW, Klugman KP, Parkhill J, Hakenbeck R, Bentley SD, Brueggemann AB. 2012. The multidrug-resistant PMEN1 pneumococcus is a paradigm for genetic success. Genome Biol 13:R103. doi: 10.1186/gb-2012-13-11-r103.
  • 25. Zhou Z, McCann A, Litrup E, Murphy R, Cormican M, Fanning S, Brown D, Guttman DS, Brisse S, Achtman M. 2013. Neutral genomic microevolution of a recently emerged pathogen, Salmonella enterica serovar Agona. PLoS Genet 9:e1003471. doi: 10.1371/journal.pgen.1003471.
  • 26. Zhou Z, McCann A, Weill FX, Blin C, Nair S, Wain J, Dougan G, Achtman M. 2014. Transient Darwinian selection in Salmonella enterica serovar Paratyphi A during 450 years of global spread of enteric fever. Proc Natl Acad Sci U S A 111:12199–12204. doi: 10.1073/pnas.1411012111.
  • 27. Zhu P, van der Ende A, Falush D, Brieske N, Morelli G, Linz B, Popovic T, Schuurman IG, Adegbola RA, Zurth K, Gagneux S, Platonov AE, Riou JY, Caugant DA, Nicolas P, Achtman M. 2001. Fit genotypes and escape variants of subgroup III Neisseria meningitidis during three pandemics of epidemic meningitis. Proc Natl Acad Sci U S A 98:5234–5239. doi: 10.1073/pnas.061386098.
  • 28. Banks DJ, Porcella SF, Barbian KD, Beres SB, Philips LE, Voyich JM, DeLeo FR, Martin JM, Somerville GA, Musser JM. 2004. Progress toward characterization of the group A streptococcus metagenome: complete genome sequence of a macrolide-resistant serotype M6 strain. J Infect Dis 190:727–738. doi: 10.1086/422697.
  • 29. Beres SB, Sylva GL, Barbian KD, Lei B, Hoff JS, Mammarella ND, Liu MY, Smoot JC, Porcella SF, Parkins LD, Campbell DS, Smith TM, McCormick JK, Leung DY, Schlievert PM, Musser JM. 2002. Genome sequence of a serotype M3 strain of group A streptococcus: phage-encoded toxins, the high-virulence phenotype, and clone emergence. Proc Natl Acad Sci U S A 99:10078–10083. doi: 10.1073/pnas.152298499.
  • 30. Fittipaldi N, Beres SB, Olsen RJ, Kapur V, Shea PR, Watkins ME, Cantu CC, Laucirica DR, Jenkins L, Flores AR, Lovgren M, Ardanuy C, Liñares J, Low DE, Tyrrell GJ, Musser JM. 2012. Full-genome dissection of an epidemic of severe invasive disease caused by a hypervirulent, recently emerged clone of group A streptococcus. Am J Pathol 180:1522–1534. doi: 10.1016/j.ajpath.2011.12.037.
  • 31. Green NM, Zhang S, Porcella SF, Nagiec MJ, Barbian KD, Beres SB, LeFebvre RB, Musser JM. 2005. Genome sequence of a serotype M28 strain of group A streptococcus: potential new insights into puerperal sepsis and bacterial disease specificity. J Infect Dis 192:760–770. doi: 10.1086/430618.
  • 32. Nasser W, Beres SB, Olsen RJ, Dean MA, Rice KA, Long SW, Kristinsson KG, Gottfredsson M, Vuopio J, Raisanen K, Caugant DA, Steinbakk M, Low DE, McGeer A, Darenberg J, Henriques-Normark B, Van Beneden CA, Hoffmann S, Musser JM. 2014. Evolutionary pathway to increased virulence and epidemic group A streptococcus disease derived from 3,615 genome sequences. Proc Natl Acad Sci U S A 111:E1768–E1776. doi: 10.1073/pnas.1403138111.
  • 33. Smoot JC, Barbian KD, Van Gompel JJ, Smoot LM, Chaussee MS, Sylva GL, Sturdevant DE, Ricklefs SM, Porcella SF, Parkins LD, Beres SB, Campbell DS, Smith TM, Zhang Q, Kapur V, Daly JA, Veasy LG, Musser JM. 2002. Genome sequence and comparative microarray analysis of serotype M18 group A streptococcus strains associated with acute rheumatic fever outbreaks. Proc Natl Acad Sci U S A 99:4668–4673. doi: 10.1073/pnas.062526099.
  • 34. Sumby P, Porcella SF, Madrigal AG, Barbian KD, Virtaneva K, Ricklefs SM, Sturdevant DE, Graham MR, Vuopio-Varkila J, Hoe NP, Musser JM. 2005. Evolutionary origin and emergence of a highly successful clone of serotype M1 group A streptococcus involved multiple horizontal gene transfer events. J Infect Dis 192:771–782. doi: 10.1086/432514.
  • 35. Zhu L, Olsen RJ, Nasser W, Beres SB, Vuopio J, Kristinsson KG, Gottfredsson M, Porter AR, DeLeo FR, Musser JM. 2015. A molecular trigger for intercontinental epidemics of group A streptococcus. J Clin Invest 125:3545–3559. doi: 10.1172/JCI82478.
  • 36. Carapetis JR, Steer AC, Mulholland EK, Weber M. 2005. The global burden of group A streptococcal diseases. Lancet Infect Dis 5:685–694. doi: 10.1016/S1473-3099(05)70267-X.
  • 37. Martin PR, Høiby EA. 1990. Streptococcal serogroup A epidemic in Norway 1987–1988. Scand J Infect Dis 22:421–429. doi: 10.3109/00365549009027073.
  • 38. Muotiala A, Seppälä H, Huovinen P, Vuopio-Varkila J. 1997. Molecular comparison of group A streptococci of T1M1 serotype from invasive and noninvasive infections in Finland. J Infect Dis 175:392–399. doi: 10.1093/infdis/175.2.392.
  • 39. Musser JM, Krause RM. 1998. The revival of group A streptococcal diseases, with a commentary on staphylococcal toxic shock syndrome, p 185–218. In Krause RM (ed), Emerging infections. Academic Press, New York, NY.
  • 40. O’Brien KL, Beall B, Barrett NL, Cieslak PR, Reingold A, Farley MM, Danila R, Zell ER, Facklam R, Schwartz B, Schuchat A. 2002. Epidemiology of invasive group A streptococcus disease in the United States, 1995–1999. Clin Infect Dis 35:268–276. doi: 10.1086/341409.
  • 41. Schwartz B, Facklam RR, Breiman RF. 1990. Changing epidemiology of group A streptococcal infection in the USA. Lancet 336:1167–1171. doi: 10.1016/0140-6736(90)92777-F.
  • 42. Sharkawy A, Low DE, Saginur R, Gregson D, Schwartz B, Jessamine P, Green K, McGeer A, Ontario Group A Streptococcal Study Group. 2002. Severe group A streptococcal soft-tissue infections in Ontario: 1992–1996. Clin Infect Dis 34:454–460. doi: 10.1086/338466.
  • 43. Stevens DL, Tanner MH, Winship J, Swarts R, Ries KM, Schlievert PM, Kaplan E. 1989. Severe group A streptococcal infections associated with a toxic shock-like syndrome and scarlet fever toxin A. N Engl J Med 321:1–7. doi: 10.1056/NEJM198907063210101.
  • 44. Strömberg A, Romanus V, Burman LG. 1991. Outbreak of group A streptococcal bacteremia in Sweden: an epidemiologic and clinical study. J Infect Dis 164:595–598. doi: 10.1093/infdis/164.3.595.
  • 45. Bricker AL, Carey VJ, Wessels MR. 2005. Role of NADase in virulence in experimental invasive group A streptococcal infection. Infect Immun 73:6562–6566. doi: 10.1128/IAI.73.10.6562-6566.2005.
  • 46. Turner CE, Abbott J, Lamagni T, Holden MT, David S, Jones MD, Game L, Efstratiou A, Sriskandan S. 2015. Emergence of a new highly successful acapsular group A streptococcus clade of genotype emm89 in the United Kingdom. mBio 6:e00622-15. doi: 10.1128/mBio.00622-15.
  • 47. Vos M, Didelot X. 2009. A comparison of homologous recombination rates in bacteria and archaea. ISME J 3:199–208. doi: 10.1038/ismej.2008.93.
  • 48. Croucher NJ, Page AJ, Connor TR, Delaney AJ, Keane JA, Bentley SD, Parkhill J, Harris SR. 2015. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res 43:e15. doi: 10.1093/nar/gku1196.
  • 49. Walker MJ, Barnett TC, McArthur JD, Cole JN, Gillen CM, Henningham A, Sriprakash KS, Sanderson-Smith ML, Nizet V. 2014. Disease manifestations and pathogenic mechanisms of group A streptococcus. Clin Microbiol Rev 27:264–301. doi: 10.1128/CMR.00101-13.
  • 50. Zhu L, Olsen RJ, Nasser W, de la Riva Morales I, Musser JM. 2015. Trading capsule for increased cytotoxin production: contribution to virulence of a newly emerged clade of emm89 Streptococcus pyogenes. mBio 6:e01378-15. doi: 10.1128/mBio.01378-15.
  • 51. Coye LH, Collins CM. 2004. Identification of SpyA, a novel ADP-ribosyltransferase of Streptococcus pyogenes. Mol Microbiol 54:89–98. doi: 10.1111/j.1365-2958.2004.04262.x.
  • 52. Hoff JS, DeWald M, Moseley SL, Collins CM, Voyich JM. 2011. SpyA, a C3-like ADP-ribosyltransferase, contributes to virulence in a mouse subcutaneous model of Streptococcus pyogenes infection. Infect Immun 79:2404–2411. doi: 10.1128/IAI.01191-10.
  • 53. McCormick JK, Pragman AA, Stolpa JC, Leung DY, Schlievert PM. 2001. Functional characterization of streptococcal pyrogenic exotoxin J, a novel superantigen. Infect Immun 69:1381–1388. doi: 10.1128/IAI.69.3.1381-1388.2001.
  • 54. Beres SB, Musser JM. 2007. Contribution of exogenous genetic elements to the group A streptococcus metagenome. PLoS One 2:e800. doi: 10.1371/journal.pone.0000800.
  • 55. Bessen DE, McShan WM, Nguyen SV, Shetty A, Agrawal S, Tettelin H. 2015. Molecular epidemiology and genomics of group A streptococcus. Infect Genet Evol 33:393–418. doi: 10.1016/j.meegid.2014.10.011.
  • 56. Carroll RK, Shelburne SA III, Olsen RJ, Suber B, Sahasrabhojane P, Kumaraswami M, Beres SB, Shea PR, Flores AR, Musser JM. 2011. Naturally occurring single amino acid replacements in a regulatory protein alter streptococcal gene expression and virulence in mice. J Clin Invest 121:1956–1968. doi: 10.1172/JCI45169.
  • 57. Chaussee MS, Sylva GL, Sturdevant DE, Smoot LM, Graham MR, Watson RO, Musser JM. 2002. Rgg influences the expression of multiple regulatory loci to coregulate virulence factor expression in Streptococcus pyogenes. Infect Immun 70:762–770. doi: 10.1128/IAI.70.2.762-770.2002.
  • 58. Graham MR, Smoot LM, Migliaccio CA, Virtaneva K, Sturdevant DE, Porcella SF, Federle MJ, Adams GJ, Scott JR, Musser JM. 2002. Virulence control in group A streptococcus by a two-component gene regulatory system: global expression profiling and in vivo infection modeling. Proc Natl Acad Sci U S A 99:13855–13860. doi: 10.1073/pnas.202353699.
  • 59. Horstmann N, Sahasrabhojane P, Suber B, Kumaraswami M, Olsen RJ, Flores A, Musser JM, Brennan RG, Shelburne SA III. 2011. Distinct single amino acid replacements in the control of virulence regulator protein differentially impact streptococcal pathogenesis. PLoS Pathog 7:e1002311. doi: 10.1371/journal.ppat.1002311.
  • 60. McIver KS, Scott JR. 1997. Role of mga in growth phase regulation of virulence genes of the group A streptococcus. J Bacteriol 179:5178–5187.
  • 61. Karaky NM, Araj GF, Tokajian ST. 2014. Molecular characterization of Streptococcus pyogenes group A isolates from a tertiary hospital in Lebanon. J Med Microbiol 63:1197–1204. doi: 10.1099/jmm.0.063412-0.
  • 62. Luca-Harari B, Darenberg J, Neal S, Siljander T, Strakova L, Tanna A, Creti R, Ekelund K, Koliou M, Tassios PT, van der Linden M, Straut M, Vuopio-Varkila J, Bouvet A, Efstratiou A, Schalén C, Henriques-Normark B, Strep-EURO Study Group, Jasir A. 2009. Clinical and microbiological characteristics of severe Streptococcus pyogenes disease in Europe. J Clin Microbiol 47:1155–1165. doi: 10.1128/JCM.02155-08.
  • 63. Olafsdottir LB, Erlendsdóttir H, Melo-Cristino J, Weinberger DM, Ramirez M, Kristinsson KG, Gottfredsson M. 2014. Invasive infections due to Streptococcus pyogenes: seasonal variation of severity and clinical characteristics, Iceland, 1975 to 2012. Euro Surveill 19:5–14. doi: 10.2807/1560-7917.ES2014.19.17.20784.
  • 64. Shea PR, Ewbank AL, Gonzalez-Lugo JH, Martagon-Rosado AJ, Martinez-Gutierrez JC, Rehman HA, Serrano-Gonzalez M, Fittipaldi N, Beres SB, Flores AR, Low DE, Willey BM, Musser JM. 2011. Group A Streptococcus emm gene types in pharyngeal isolates, Ontario, Canada, 2002–2010. Emerg Infect Dis 17:2010–2017. doi: 10.3201/eid1711.110159.
  • 65. Williamson DA, Morgan J, Hope V, Fraser JD, Moreland NJ, Proft T, Mackereth G, Lennon D, Baker MG, Carter PE. 2015. Increasing incidence of invasive group A streptococcus disease in New Zealand, 2002–2012: a national population-based study. J Infect 70:127–134. doi: 10.1016/j.jinf.2014.09.001.
  • 66. Falaleeva M, Zurek OW, Watkins RL, Reed RW, Ali H, Sumby P, Voyich JM, Korotkova N. 2014. Transcription of the Streptococcus pyogenes hyaluronic acid capsule biosynthesis operon is regulated by previously unknown upstream elements. Infect Immun 82:5293–5307. doi: 10.1128/IAI.02035-14.
  • 67. Smoot JC, Korgenski EK, Daly JA, Veasy LG, Musser JM. 2002. Molecular analysis of group A streptococcus type emm18 isolates temporally associated with acute rheumatic fever outbreaks in Salt Lake City, Utah. J Clin Microbiol 40:1805–1810. doi: 10.1128/JCM.40.5.1805-1810.2002.
  • 68. Treviño J, Perez N, Ramirez-Peña E, Liu Z, Shelburne SA III, Musser JM, Sumby P. 2009. CovS simultaneously activates and inhibits the CovR-mediated repression of distinct subsets of group A streptococcus virulence factor-encoding genes. Infect Immun 77:3141–3149. doi: 10.1128/IAI.01560-08.
  • 69. Olsen RJ, Musser JM. 2010. Molecular pathogenesis of necrotizing fasciitis. Annu Rev Pathol 5:1–31. doi: 10.1146/annurev-pathol-121808-102135.
  • 70. Olsen RJ, Sitkiewicz I, Ayeras AA, Gonulal VE, Cantu C, Beres SB, Green NM, Lei B, Humbird T, Greaver J, Chang E, Ragasa WP, Montgomery CA, Cartwright J Jr, McGeer A, Low DE, Whitney AR, Cagle PT, Blasdel TL, DeLeo FR, Musser JM. 2010. Decreased necrotizing fasciitis capacity caused by a single nucleotide mutation that alters a multiple gene virulence axis. Proc Natl Acad Sci U S A 107:888–893. doi: 10.1073/pnas.0911811107.
  • 71. Virtaneva K, Porcella SF, Graham MR, Ireland RM, Johnson CA, Ricklefs SM, Babar I, Parkins LD, Romero RA, Corn GJ, Gardner DJ, Bailey JR, Parnell MJ, Musser JM. 2005. Longitudinal analysis of the group A streptococcus transcriptome in experimental pharyngitis in cynomolgus macaques. Proc Natl Acad Sci U S A 102:9014–9019. doi: 10.1073/pnas.0503671102.
  • 72. Comas I, Chakravartti J, Small PM, Galagan J, Niemann S, Kremer K, Ernst JD, Gagneux S. 2010. Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nat Genet 42:498–503. doi: 10.1038/ng.590.
  • 73. Flores AR, Galloway-Peña J, Sahasrabhojane P, Saldaña M, Yao H, Su X, Ajami NJ, Holder ME, Petrosino JF, Thompson E, Margarit Y Ros I, Rosini R, Grandi G, Horstmann N, Teatero S, McGeer A, Fittipaldi N, Rappuoli R, Baker CJ, Shelburne SA. 2015. Sequence type 1 group B streptococcus, an emerging cause of invasive disease in adults, evolves by small genetic changes. Proc Natl Acad Sci U S A 112:6431–6436. doi: 10.1073/pnas.1504725112.
  • 74. Holt KE, Baker S, Dongol S, Basnyat B, Adhikari N, Thorson S, Pulickal AS, Song Y, Parkhill J, Farrar JJ, Murdoch DR, Kelly DF, Pollard AJ, Dougan G. 2010. High-throughput bacterial SNP typing identifies distinct clusters of Salmonella typhi causing typhoid in Nepalese children. BMC Infect Dis 10:144. doi: 10.1186/1471-2334-10-144.
  • 75. Holt KE, Baker S, Weill FX, Holmes EC, Kitchen A, Yu J, Sangal V, Brown DJ, Coia JE, Kim DW, Choi SY, Kim SH, da Silveira WD, Pickard DJ, Farrar JJ, Parkhill J, Dougan G, Thomson NR. 2012. Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe. Nat Genet 44:1056–1059. doi: 10.1038/ng.2369.
  • 76. Schuenemann VJ, Singh P, Mendum TA, Krause-Kyora B, Jäger G, Bos KI, Herbig A, Economou C, Benjak A, Busso P, Nebel A, Boldsen JL, Kjellström A, Wu H, Stewart GR, Taylor GM, Bauer P, Lee OY, Wu HH, Minnikin DE, Besra GS, Tucker K, Roffey S, Sow SO, Cole ST, Nieselt K, Krause J. 2013. Genome-wide comparison of medieval and modern Mycobacterium leprae. Science 341:179–183. doi: 10.1126/science.1238286.
  • 77. Kingsley RA, Kay S, Connor T, Barquist L, Sait L, Holt KE, Sivaraman K, Wileman T, Goulding D, Clare S, Hale C, Seshasayee A, Harris S, Thomson NR, Gardner P, Rabsch W, Wigley P, Humphrey T, Parkhill J, Dougan G. 2013. Genome and transcriptome adaptation accompanying emergence of the definitive type 2 host-restricted Salmonella enterica serovar Typhimurium pathovar. mBio 4:e00565-13. doi: 10.1128/mBio.00565-13.
  • 78. Konstantinidis KT, Serres MH, Romine MF, Rodrigues JL, Auchtung J, McCue LA, Lipton MS, Obraztsova A, Giometti CS, Nealson KH, Fredrickson JK, Tiedje JM. 2009. Comparative systems biology across an evolutionary gradient within the Shewanella genus. Proc Natl Acad Sci U S A 106:15909–15914. doi: 10.1073/pnas.0902000106.
  • 79. Oren Y, Smith MB, Johns NI, Kaplan Zeevi M, Biran D, Ron EZ, Corander J, Wang HH, Alm EJ, Pupko T. 2014. Transfer of noncoding DNA drives regulatory rewiring in bacteria. Proc Natl Acad Sci U S A 111:16112–16117. doi: 10.1073/pnas.1413272111.
  • 80. Philippe N, Crozat E, Lenski RE, Schneider D. 2007. Evolution of global regulatory networks during a long-term experiment with Escherichia coli. Bioessays 29:846–860. doi: 10.1002/bies.20629.
  • 81. Vital M, Chai B, Østman B, Cole J, Konstantinidis KT, Tiedje JM. 2015. Gene expression analysis of E. coli strains provides insights into the role of gene regulation in diversification. ISME J 9:1130–1140. doi: 10.1038/ismej.2014.204.
  • 82. Olsen RJ, Fittipaldi N, Kachroo P, Sanson MA, Long SW, Como-Sabetti KJ, Valson C, Cantu C, Lynfield R, Van Beneden C, Beres SB, Musser JM. 2014. Clinical laboratory response to a mock outbreak of invasive bacterial infections: a preparedness study. J Clin Microbiol 52:4210–4216. doi: 10.1128/JCM.02164-14.

Supplementary Materials

Text S1 

Supplemental Materials and Methods.

Figure S1 

Atlases for the three emm89 reference genomes. Shown from the outermost (1st) ring to the innermost (12th) ring are the following: ring 1, megabase pairs (black); ring 2, gene or operon landmarks; rings 3 and 4, coding sequences on the forward strand (light blue) and reverse strand (dark blue); rings 5, 7, and 9, BLAST nucleotide sequence comparison with the genomes indicated in the respective indexes; rings 6, 8, and 10, distribution of SNPs for the genomes indicated in the respective indexes; ring 11, G+C content relative to the mean; and ring 12, GC skew. BLAST nucleotide sequence comparisons were made among the genomes of the clade 1, 2, and 3 reference strains and against a de novo assembly of strain MGAS27450, the most phylogenetically distant emm89 outlier strain.

Figure S2 

Genetic relationships among emm89 reference strains, with emm1 reference strain SF370 used as the rooting outgroup. Genetic relationships among the three emm89 reference strains and the seven outlier strains are shown using emm1 reference strain SF370 as an outgroup. Relationships were inferred from 26,371 core SNPs by neighbor-network decomposition of splits. Along the evolutionary path leading to the contemporary epidemic emm89 strains, the three numerically dominant primary clades branch in the order clade 1 (MGAS11027), clade 2 (MGAS23530), and then epidemic clade 3 (MGAS27061).

Figure S3 

Potential horizontal gene transfer (HGT) region donors. For each of the predicted recombination blocks (RB) separating the clades, the genetic relationships are shown among the three emm89 clade reference strains and 39 strains of 18 other emm types for which complete genome sequences were publicly available as of 10 July 2015. Sequences flanking the predicted recombination blocks in the strain MGAS23530 genome were used to define the corresponding regions in the other strains using BLASTn. The sequences corresponding to the predicted recombination blocks among all 42 strains were aligned using MAFFT, and relationships were inferred by neighbor-network decomposition of splits using SplitsTree. The length of each recombination block and the locus tags of the genes involved are listed relative to strain MGAS23530. Of note, seven of the eight recombination blocks separating all 359 clade 1 strains from all 78 clade 2 strains share a more recent common ancestor with emm2 reference strain MGAS10270 than with reference strains of any of the other emm types used in this comparison.

Figure S4 

Genetic relationships among emm89 subclade 3D strains. Genetic relationships among the 33 subclade 3D strains are shown using clade 3 reference strain MGAS27061 as the outgroup. Relationships were inferred from 157 core SNPs by neighbor-joining using SplitsTree. All subclade 3D strains differ from all progenitor clade 3 strains by an 18-kb region of HGT involving the virulence factors SpyA and SpeJ (Table 1; recombination block 15). To constrain the inference primarily to vertically inherited SNPs, SNPs within putative regions of HGT were identified and filtered out using Gubbins. The 11 strains with the CovR (S130N) substitution branch together, indicating inheritance by descent. Similarly, all but 1 of the 6 strains with the LiaS (K214R) substitution branch together, again indicating inheritance by descent. We attribute the failure of the remaining LiaS (K214R) strain to branch with the others to a few horizontally acquired polymorphisms that did not raise the local SNP density enough to reach statistical significance and therefore were not detected and excluded by Gubbins.

Figure S5 

Comparison of phages 11027.1 and 27061.1. A percent identity plot is shown above, and a dot matrix alignment below. The phages are similar over the first ~13 kb at the 5′ end, which includes the integrase, replication, and lytic/lysogenic regulatory genes; they diverge over most of the central portion encoding head and tail coat proteins; and they are similar again over the last ~3 kb at the 3′ end, which encodes the secreted virulence factors streptococcal pyrogenic exotoxin C (SpeC) superantigen and streptococcal phage DNase 1 (Spd1). The sequence divergence between phages 11027.1 and 27061.1 means that 27061.1 did not evolve from 11027.1 through a single simple deletion event. Despite being integrated at the same genomic locus and encoding the same virulence factors, they are distinct mosaic phages.

Figure S6 

hasABC promoter variants. (A) hasABC promoter pattern variants identified among the emm89 clade 1 and clade 2 strains are illustrated. Patterns A and B account for 99% of the strains. Pattern B has a 38-bp deletion relative to pattern A that eliminates a putative Rho-independent terminator. In M1 strain MGAS2221, deletion of this terminator relieves CovR repression of hasABC and enhances capsule production. (B) Distribution of hasABC promoter variants among the clade 1 and clade 2 strains. (C) Distribution of hasABC promoter variants among the clade 1 and clade 2 strains, shown on their genetic relationships. The relationships among the emm89 clade 1 and clade 2 strains were inferred by neighbor-joining from 5,663 core SNPs filtered using Gubbins to exclude regions of horizontal gene transfer. Strains are colored by promoter variant as indicated in the index. Clade 1 strains are a nearly even mix of pattern A (weak/repressed) and pattern B (strong/derepressed) promoter variants, whereas the vast majority of clade 2 strains are of pattern A. These findings are consistent with the RNAseq result that hasABC transcript levels are significantly higher in clade 1 strain MGAS11027 than in clade 2 strain MGAS23530.

Table S1 

Strains and characteristics.

Table S2 

Streptococcus pyogenes complete genome sequences.

Table S3 

RNAseq transcriptome analyses.


Articles from mBio are provided here courtesy of American Society for Microbiology (ASM)
