Abstract
Two technologies, DNA microarray and optical mapping, were used to quickly assess the gene content and genomic architecture of recently emerged Escherichia coli O104:H4 and related strains. In real-time outbreak investigations, these technologies can provide congruent perspectives on strain, serotype, and pathotype relationships. Our data demonstrated clear discrimination among clinically, temporally, and geographically distinct O104:H4 isolates and rapid characterization of strain differences.
TEXT
The combined magnitude of illness, hemolytic uremic syndrome, and death caused by the recent outbreak of an uncommon O104:H4 Shiga toxin-producing Escherichia coli (STEC) pathogen in Europe surpassed that of previous outbreaks of known STEC etiology (http://ecdc.europa.eu/en/aboutus/organisation/Director%20Speeches/201109_MarcSprenger_STEC_ICAAC.pdf). This newly observed pathogen carries a rare combination of attributes from two virulence pathotypes, enterohemorrhagic E. coli (EHEC) and enteroaggregative E. coli (EAEC), which manifest as Shiga toxin production and as adherence to intestinal epithelial cells via specialized fimbrial structures, respectively (6, 11). Another hallmark of this outbreak was the concurrent research interest in competing next-generation sequencing platforms for genomic investigation. Within days of pathogen identification, the first raw genome sequence, generated on the Ion Torrent platform, was released. Two weeks later, the first whole-genome assembly incorporating Illumina HiSeq data became available, followed by efforts employing the Pacific Biosciences platform. In aggregate, these whole-genome sequencing efforts produced nearly identical de novo genome sequences from a number of outbreak isolates (3, 6, 11, 14, 17).
It is noteworthy that the utility and speed of these technologies are typically limited by initial access to bona fide outbreak isolates with sound epidemiological links to the clinical illness. Furthermore, from a practical, applied, and regulatory perspective, whole-genome sequencing and the subsequent bioinformatic identification of strain- or outbreak-specific attributes are tempered by the need for rapid front-end identification, subtyping, and source tracking. In the past, we have used microarray genotyping data to triage outbreak strains for secondary analyses. For example, array data from 250 clinical and bovine isolates from the 2006 spinach-associated E. coli O157:H7 outbreak clustered into a single major group and two outliers, which guided the selection of isolates for mapping and sequencing (10). More recent examples include the clustering, by microarray and optical mapping, of highly clonal Salmonella enterica serovar Enteritidis from an egg outbreak, S. enterica serovar Typhimurium from peanut butter/paste, and S. enterica serovar Montevideo from pepper spice (unpublished data).
Here, we present a rapid genomic-scale analysis regimen for emerging food-borne outbreaks, applied to this recent O104:H4 serotype, that uses custom high-density microarray genotyping in concert with high-resolution optical restriction fragment mapping. These techniques provide distinct but complementary perspectives: the first on unique gene content, and the second on genomic architecture, with spatial relationships anchored to genetic landmarks and to novel insertions and deletions. Both techniques draw on reference genome sequences as well as internal laboratory-generated genomic-scale databases for rapid (2- to 3-day) comparative assessments that limit the need for de novo annotation and subsequent bioinformatic manipulation.
Strains.
In-depth laboratory analysis was conducted on independent O104:H4 isolates associated with the German outbreak (M-GOS and C-GOS) and on two additional O104:H4 isolates (RG1 and RG2) obtained from patients in the Republic of Georgia in 2009. Other comparisons involved two O104:H21 isolates, MON1 and MON2, from a 1994 milk-associated outbreak in Montana (7), and an O139:H28 enterotoxigenic E. coli (ETEC) strain, E24377A (GenBank accession no. NC_009801) (13). Two in silico reference strains included EAEC 55989 (GenBank accession no. NC_011748), an O104:H4 isolate from the Central African Republic (2, 12, 18), and O157:H7 Sakai (GenBank accession no. NC_002695).
Genome-scale genotyping.
The FDA E. coli-Shigella (ECSG) custom high-density Affymetrix DNA microarray platform represents the entire gene content of 31 chromosomal and 46 plasmid sequences from a diverse set of E. coli and Shigella isolates. In total, more than 21,000 unique genes are represented on the ECSG array, covering all 137,600 chromosomal genes and 2,868 plasmid-borne genes in the sequences available at the time of design. The gene content of an individual strain can be determined in an absolute manner, without a comparative reference strain, by using the Affymetrix probe set design strategy, which incorporates 11 perfect-match and 11 mismatch probes for each gene target, together with a rigorous custom probe set summarization technique (9) (see the supplemental material).
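The custom summarization used for the ECSG array is described in reference 9 and is not reproduced here. As a rough illustration of how 11 perfect-match (PM) and mismatch (MM) probe pairs can support an absolute presence call without a reference strain, the Python sketch below applies an exact one-sided sign test to the PM/MM pairs of one gene target. The sign test, the function names, the 0.04 threshold, and the intensities are all illustrative assumptions (Affymetrix's MAS5 detection call, for instance, uses a Wilcoxon signed-rank test on discrimination scores instead); this is not the authors' method.

```python
from math import comb

def sign_test_p_value(pm, mm):
    """One-sided exact sign test of H0: PM and MM are equally likely to be larger.

    Returns P(at least k of n untied pairs have PM > MM) if each pair were a
    fair coin flip. Tied pairs (PM == MM) are dropped, as usual for a sign test.
    """
    if len(pm) != len(mm):
        raise ValueError("PM and MM intensity lists must be the same length")
    diffs = [p - m for p, m in zip(pm, mm) if p != m]
    n = len(diffs)
    if n == 0:
        return 1.0  # no informative pairs: no evidence of hybridization
    k = sum(1 for d in diffs if d > 0)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def call_present(pm, mm, alpha=0.04):
    """Hypothetical presence call: gene present if PM significantly exceeds MM."""
    return sign_test_p_value(pm, mm) < alpha

# Invented intensities for one gene target (11 PM/MM pairs each).
pm_hit = [820, 640, 910, 700, 560, 880, 760, 690, 930, 610, 720]
mm_hit = [150, 200, 180, 130, 210, 160, 140, 190, 170, 220, 150]
pm_miss = [120, 95, 140, 110, 130, 100, 125, 90, 135, 105, 115]
mm_miss = [118, 99, 132, 115, 128, 104, 121, 96, 130, 110, 112]

print(call_present(pm_hit, mm_hit))    # True: all 11 pairs have PM > MM
print(call_present(pm_miss, mm_miss))  # False: only 6 of 11 pairs have PM > MM
```

Because the test uses only the sign of each PM minus MM difference, it is insensitive to overall brightness, which is the intuition behind calling presence without a reference strain.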
Hierarchical clustering of array data from 380 temporally and geographically diverse E. coli and Shigella isolates representing several pathotypes and more than 100 serotypes illustrates the broad genomic diversity within this species (Fig. 1A). The branches of the dendrogram are color coded into eight clusters, and the O104 serogroup isolates examined in this study fall within the heterogeneous cluster 6. The O104:H4 GOS and RG isolates form a distinct subgroup within this cluster (Fig. 1B). Similarly, the O104:H21 MON isolates form their own distinct subgroup within this cluster (Fig. 1B). In the FDA microarray database, the closest neighbor to the German outbreak strain is the O139:H28 ETEC strain E24377A, the same close association reflected in the whole-genome phylogeny of Mellmann et al. (11).
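The distance measure and linkage behind Fig. 1 are not specified here beyond RMA summarization. As a minimal sketch, assuming each strain is reduced to binary presence/absence calls, the Python code below counts pairwise gene differences (a Hamming distance) and builds an average-linkage (UPGMA-style) tree, then cuts it into a chosen number of clusters. The strain names and profiles are invented, and the function names are hypothetical.

```python
from itertools import combinations

def gene_differences(a, b):
    """Number of genes whose presence/absence call differs between two strains."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genes")
    return sum(x != y for x, y in zip(a, b))

def average_linkage(c1, c2, profiles):
    """Mean pairwise gene difference between the members of two clusters."""
    total = sum(gene_differences(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles):
    """Repeatedly merge the closest pair of clusters; return the merge history.

    Each entry is (cluster_a, cluster_b, height), where height is the average
    number of gene differences between the two merged clusters.
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        a, b = clusters[i], clusters[j]
        merges.append((a, b, average_linkage(a, b, profiles)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges

def cut_tree(profiles, k):
    """Replay the merge history until only k clusters remain."""
    clusters = [frozenset([name]) for name in profiles]
    for a, b, _ in agglomerate(profiles):
        if len(clusters) <= k:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Invented 0/1 gene-presence profiles for four hypothetical strains.
profiles = {
    "A1": (1, 1, 1, 0, 0, 0),
    "A2": (1, 1, 1, 0, 0, 1),
    "B1": (0, 0, 0, 1, 1, 1),
    "B2": (0, 0, 1, 1, 1, 1),
}
for a, b, h in agglomerate(profiles):
    print(sorted(a), sorted(b), h)
print([sorted(c) for c in cut_tree(profiles, 2)])
```

Average linkage is used here only because it is simple; single or complete linkage, or clustering on continuous RMA intensities, would give different branch heights.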
Fig 1.

(A) Hierarchical clustering of RMA-summarized microarray data employing a database of 380 historical reference and outbreak strains from internal FDA collections integrated with O104 outbreak strains analyzed in this study. The resulting dendrogram resolves into eight large color-differentiated clusters. (B) Enlargement of cluster 6 region containing O104 isolates. Note branch heights, which are representative of the number of gene differences occurring between individual strains.
At the tips of the hierarchical tree, branch lengths correspond to the number of gene differences, which can be visualized in scatter plots comparing strain pairs, where each spot represents the robust multiarray average (RMA)-summarized intensity of one probe set (4). Plotting the resulting presence/absence calls for a pair of strains provides an effective comparative genome graphic, especially when the outbreak strain is used as the reference (Fig. 2). In such pairwise scatter plots, M-GOS and C-GOS are genotypically indistinguishable and have correspondingly minimal branch lengths on the tree (Fig. 1B and 2A). Similarly, the two MON isolates have minimal branch lengths and are indistinguishable. In contrast, the RG isolates show numerous detectable gene differences, as reflected in their branch lengths and scatter plot comparisons (Fig. 1B and 2B). Overall, the pairwise scatter plots display only about 10 to 200 gene differences among these four O104:H4 strains (Fig. 2A to D). Much larger numbers of gene differences are apparent between M-GOS and the more distant MON1 and E24377A isolates (Fig. 2E and F, respectively).
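RMA (4) comprises background correction, quantile normalization across arrays, log2 transformation, and median-polish summarization of probe intensities. The Python sketch below shows only the quantile-normalization step, which forces every array to share the same intensity distribution so that a scatter plot of two strains compares like with like. Ties are ordered by position (a simplification of the published method), and the toy intensities and function name are invented.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity vectors (one per array).

    Each value is replaced by the mean, across arrays, of the values that
    occupy the same rank; tied values are ordered by position.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same probes")
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for values in arrays:
        order = sorted(range(n), key=lambda i: values[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

chip_a = [5.0, 2.0, 3.0]
chip_b = [4.0, 1.0, 6.0]
print(quantile_normalize([chip_a, chip_b]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the same set of values, so any remaining off-diagonal spots in a pairwise scatter plot reflect rank differences, i.e., candidate gene-content differences, rather than chip-wide brightness.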
Fig 2.
Microarray scatter plots of pairwise comparisons demonstrating gene-level differences between strains analyzed in this study.
A descriptive analysis of the O104:H4 German strain and the two Republic of Georgia strains revealed 253 gene differences (for strain-specific comparisons, see Table S1 in the supplemental material). Of these, 97 had functional annotations, and most (>50%) of the annotated differences were phage related. A noteworthy difference in probe signals was observed for the tetracycline and mercury resistance probe sets, which were absent in the RG2 strain. We followed up this observation by PCR amplification of six genes (dfrA, sulI, sulII, strA, merA, and tetA) located at the selC insertion hot spot in the genome sequence of the German outbreak strain (for primer sequences, see Table S2 in the supplemental material). Primer pairs for tetA and merA failed to amplify PCR products from RG2 but generated amplicons of the expected sizes from M-GOS, C-GOS, and RG1 (data not shown).
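Expected amplicon sizes for such a PCR check can be predicted from the outbreak genome sequence before any wet-lab work. The Python sketch below is a minimal exact-match in silico PCR: it looks for the forward primer and the reverse complement of the reverse primer on either strand of a template and reports the product length. The primers and template are invented toys, not the Table S2 primers; real primer binding tolerates mismatches, and only the first forward-primer match per strand is considered, both simplifications assumed here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence (returned uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def _amplicon_on_strand(strand, fwd, rev):
    """Product length on one strand, or None; first forward match only."""
    start = strand.find(fwd)
    if start == -1:
        return None
    # The reverse primer's binding site appears on this strand as its reverse complement.
    rev_site = reverse_complement(rev)
    end = strand.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start

def predicted_amplicon_size(template, fwd, rev):
    """Expected product length in bp, or None if no exact-match product."""
    fwd, rev = fwd.upper(), rev.upper()
    for strand in (template.upper(), reverse_complement(template)):
        size = _amplicon_on_strand(strand, fwd, rev)
        if size is not None:
            return size
    return None

# Toy template: forward site, 10-bp spacer, reverse-primer site.
template = "AAAA" + "ATGCGTAC" + "G" * 10 + "TTGACCAA" + "CCCC"
fwd, rev = "ATGCGTAC", "TTGGTCAA"
print(predicted_amplicon_size(template, fwd, rev))  # 26
print(predicted_amplicon_size("A" * 40, fwd, rev))  # None
```

A predicted size of None on a strain's sequence corresponds to the kind of amplification failure observed for tetA and merA in RG2.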
Chromosome mapping.
Optical mapping provides a rapid, broad overview of the arrangement of a bacterial chromosome, including prominent inversions, insertions, and deletions, whose content can be quickly inferred by comparison with reference genomes (10). Bacterial isolates were gently lysed to generate high-molecular-weight DNA, which was immobilized, digested with BamHI, and stained with a fluorescent dye. Using the Argus optical mapping system (OpGen, Inc.; see the supplemental material), restriction fragments were assembled into overlapping contiguous regions to create a closed, circular map with a lower limit of detection of 750 to 1,000 bp.
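In silico reference maps, such as those compared against the optical maps here, are derived from sequence by simulating the restriction digest. The Python sketch below cuts a circular (or linear) sequence at every BamHI site (GGATCC, cleaved after the first G), including a site that spans the origin of a circular chromosome, and flags fragments under an assumed 750-bp sizing floor taken from the lower end of the stated detection limit. Because GGATCC is palindromic, searching one strand suffices. The function names and toy sequences are hypothetical, and the sketch is not the Argus software's procedure.

```python
BAMHI_SITE = "GGATCC"  # palindromic, so one strand suffices
CUT_OFFSET = 1         # BamHI cleaves G^GATCC

def bamhi_fragments(sequence, circular=True):
    """Fragment sizes (bp) from an in silico BamHI digest, in map order."""
    seq = sequence.upper()
    n = len(seq)
    # For a circular chromosome, extend the search so sites spanning the origin are found.
    search = seq + seq[:len(BAMHI_SITE) - 1] if circular else seq
    cuts = []
    pos = search.find(BAMHI_SITE)
    while pos != -1 and pos < n:
        cuts.append((pos + CUT_OFFSET) % n)
        pos = search.find(BAMHI_SITE, pos + 1)
    cuts.sort()
    if not cuts:
        return [n]
    if circular:
        frags = [b - a for a, b in zip(cuts, cuts[1:])]
        frags.append(n - cuts[-1] + cuts[0])  # fragment spanning the origin
        return frags
    bounds = [0] + cuts + [n]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def below_limit(fragments, min_bp=750):
    """Fragments smaller than an assumed optical sizing floor (bp)."""
    return [f for f in fragments if f < min_bp]

toy = "A" * 10 + "GGATCC" + "T" * 20 + "GGATCC" + "C" * 4
print(bamhi_fragments(toy))                  # [26, 20]
print(bamhi_fragments(toy, circular=False))  # [11, 26, 9]
```

On a real chromosome the same routine would return several hundred fragments; those flagged by below_limit are the ones an optical map would be expected to miss or merge.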
Pairwise map comparisons to the German outbreak strain, M-GOS, revealed differences ranging from 0.4% to 75%. The Georgian strains RG1 and RG2 are 0.4% and 1.1% different from M-GOS, respectively (Fig. 3A and B) and 1% different from each other (Fig. 3C). Strain 55989, an O104:H4 EAEC isolate shown previously (11) to be closely related to M-GOS, was 6% different (Fig. 3D). In contrast, the optical maps of the O104:H21 MON1, O139:H28 E24377A, and O157:H7 Sakai isolates are 21.5%, 30.6%, and 75% different from the M-GOS map, respectively (Fig. 3E to G).
Fig 3.
Pairwise alignments of optical and in silico sequence-based reference maps. Aligned restriction fragments are green, and unaligned fragments are white. Locations of the wrbA, serX, and selC islands are indicated by the red, blue, and orange boxes, respectively. The location of the yehV island in the O104:H21 MON1 strain is shown by the yellow box. The stx1 prophage at yehV and the locus of enterocyte effacement (LEE) island at selC in the O157:H7 Sakai strain are depicted as purple and light blue boxes, respectively. Regions of difference among the M-GOS, RG1, and RG2 O104:H4 strains described in the text are indicated by the arrows.
Whereas map differences of <10% generally reflect insertions and deletions (indels), larger map differences (>25%) contain substantial amounts of syntenic variant regions (SVRs): blocks of comparably sized but unaligned fragments, distinct from indels. These unaligned blocks usually represent slightly divergent restriction fragment profiles of otherwise similar DNA segments and account for most of the difference when strains differ by more than about 25%.
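How the mapping software computes a percentage map difference is not stated here. Assuming it is the share of total map length that falls in unaligned fragments, the Python sketch below computes it for one map of a pairwise alignment and attaches the rough interpretation given above (<10%, mostly indels; >25%, mostly SVRs). The "intermediate" label for 10% to 25% is a placeholder, since that range is not characterized in the text; the fragment lists and function names are invented.

```python
def percent_difference(fragments):
    """Percentage of map length in unaligned fragments.

    fragments: list of (size_bp, aligned) tuples for one map of a pairwise
    alignment. This definition is an assumption, not the vendor's formula.
    """
    total = sum(size for size, _ in fragments)
    if total == 0:
        raise ValueError("empty map")
    unaligned = sum(size for size, aligned in fragments if not aligned)
    return 100.0 * unaligned / total

def interpret(pct):
    """Rough reading of a map difference using the thresholds in the text."""
    if pct < 10:
        return "mostly indels"
    if pct > 25:
        return "mostly syntenic variant regions"
    return "intermediate (not characterized)"

close_pair = [(45_000, True), (1_000, False), (54_000, True)]
distant_pair = [(30_000, False), (70_000, True)]
for frags in (close_pair, distant_pair):
    pct = percent_difference(frags)
    print(f"{pct:.1f}% -> {interpret(pct)}")
```

A length-weighted measure like this one treats a single large indel and many scattered divergent fragments alike, which is why the text distinguishes indels from SVRs by inspecting the alignment rather than by the percentage alone.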
Genomic landscape hallmarks.
The optical maps of the German outbreak strain (M-GOS) and the Georgian (RG1 and RG2) O104:H4 isolates all contain similar unique insertions at the hallmark wrbA-serX and selC chromosomal loci that differentiate the EAEC O104:H4 cluster (Fig. 3A; see also Fig. 3 in reference 11). These insertions, in addition to a number of other differences, can be distinguished in the comparison of the M-GOS optical map with the EAEC 55989 in silico map (Fig. 3D). The German and Georgian isolates contain a large 146-kbp insertion near the wrbA locus, a site often occupied by lambdoid prophages in other pathogenic E. coli strains. Sequence data from a number of German outbreak isolates show a 61-kbp Shiga toxin 2 gene (stx2)-containing prophage at wrbA, with the remainder of the 146-kbp insertion attributable to a tellurite resistance island inserted 30 kbp downstream from wrbA at serX. This 86-kbp insertion at serX is similar to, but distinct from, a tellurite resistance cassette found at serX in O157:H7 strains. The third hallmark is a 48-kbp insertion at selC, a common island integration site in several E. coli pathotypes. Sequence data from the German outbreak identify this region as a multidrug resistance island (17).
Although the two Georgian isolates share the unique insertions found in the German O104:H4 outbreak strain, they have several notable differences that account for a 1% difference in their optical maps. Relative to M-GOS, both RG1 and RG2 have a 14-kbp deletion near thrW, the site of a number of cryptic prophages in other E. coli strains (Fig. 3A and B, arrows). In addition, there are three chromosomal differences between RG1 and RG2 (Fig. 3C, arrows). RG2 contains a 49-kbp insertion near potB and a 17-kbp insertion near argW; both loci are common prophage integration sites in other pathogenic E. coli strains. The third difference is a 9.4-kbp deletion at selC in RG2. Notably, this partial deletion of the larger selC multidrug resistance island is consistent with the array and PCR data showing that the tetracycline and mercury resistance genes are missing from RG2.
Finally, although MON1 is stx2 positive (7), there is no insertion at the wrbA locus comparable to the insertions observed in the M-GOS and RG maps. Instead, MON1 contains a 53-kbp insertion near yehV, a common locus for stx phages and a likely site for the stx2 genes in this strain (Fig. 3E).
Conclusions.
The 2011 German O104:H4 outbreak strain is a chimera whose collection of virulence attributes at key chromosomal loci resembles that of the 2009 Georgian strains but not that of EAEC 55989. Thus, serotype information alone provides little insight into strain relatedness or into a strain's fundamental virulence attributes, despite its use as a primary characteristic in outbreak investigations. Underscoring this point, phylogenetic studies have shown that the independent acquisition of virulence genes creates pathotypes that extend across the E. coli phylogenetic landscape (15). Consequently, the development of group-, clade-, or gene-specific assays for prototypic classes of pathogenic E. coli must be balanced by the need to delineate productive virulence combinations for surveillance across serogroups previously considered unremarkable (1, 5).
Our comparative genomic studies demonstrate rapid identification of similar and dissimilar isolates, informative measures of difference, and scoring of novel combinations of virulence factors and genes in prophages and pathogenicity islands. The power of these analyses lies in comparative database profiling. When employed together, the two analyses also offer speed and immediate annotation of genetic content. In addition, rapidly generated optical maps can facilitate closure of whole-genome sequencing projects by providing scaffolds on which to anchor sequence data, and they are a powerful tool for detecting misassemblies (8, 16). Whatever the database development (robustness) and maintenance demands of any high-resolution format, these techniques should prove independently useful as companions to whole-genome sequencing pipelines, saving time, effort, and expense.
Supplementary Material
ACKNOWLEDGMENTS
We thank the Massachusetts Department of Public Health (MDPH) State Lab for a clinical isolate of the German sprout-associated outbreak strain (M-GOS), in addition to the CDC as an independent source for the O104:H4 MDPH strain (C-GOS) and strains from the Republic of Georgia (RG1 and -2). We also thank Peter Feng (FDA-CFSAN) for the O104:H21 isolates (MON1 and -2) and for being a direct connection to the MDPH for the M-GOS isolate. Finally, we acknowledge Joan Shields (FDA-CFSAN) for facilitating timely transfer of the O104:H4 strains from the CDC.
Seminal funding and support were received from the National Bioforensics Analysis Center and the Department of Homeland Security for microarray and optical mapping program development; we acknowledge Joseph E. LeClerc (FDA-CFSAN) for contributions to this work. This study was funded by the Center for Food Safety and Applied Nutrition.
The views presented in this article do not necessarily reflect those of the Food and Drug Administration.
Footnotes
Published ahead of print 30 December 2011
Supplemental material for this article may be found at http://aem.asm.org/.
REFERENCES
- 1. Bae WK, et al. 2006. A case of hemolytic uremic syndrome caused by Escherichia coli O104:H4. Yonsei Med. J. 47:437–439
- 2. Bernier C, Gounon P, Le Bouguénec C. 2002. Identification of an aggregative adhesion fimbria (AAF) type III-encoding operon in enteroaggregative Escherichia coli as a sensitive probe for detecting the AAF-encoding operon family. Infect. Immun. 70:4302–4311
- 3. Bielaszewska M, et al. 2011. Characterization of the Escherichia coli strain associated with an outbreak of haemolytic uraemic syndrome in Germany, 2011: a microbiological study. Lancet Infect. Dis. 11:671–676
- 4. Bolstad BM, Irizarry RA, Astrand M, Speed TP. 2003. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185–193
- 5. Bosilevac JM, Koohmaraie M. 2011. Prevalence and characterization of non-O157 Shiga toxin-producing Escherichia coli isolates from commercial ground beef in the United States. Appl. Environ. Microbiol. 77:2103–2112
- 6. Brzuszkiewicz E, et al. 2011. Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: entero-aggregative-haemorrhagic Escherichia coli (EAHEC). Arch. Microbiol. 193:883–891
- 7. Feng P, Weagant SD, Monday SR. 2001. Genetic analysis for virulence factors in Escherichia coli O104:H21 that was implicated in an outbreak of hemorrhagic colitis. J. Clin. Microbiol. 39:24–28
- 8. Giongo A, Tyler HL, Zipperer UN, Triplett EW. 2010. Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission. Stand. Genomic Sci. 15:309–317
- 9. Jackson SA, Patel IR, Barnaba T, Leclerc JE, Cebula TA. 2011. Investigating the global genomic diversity of Escherichia coli using a multi-genome DNA microarray platform with novel gene prediction strategies. BMC Genomics 12:349
- 10. Kotewicz ML, Mammel MK, LeClerc JE, Cebula TA. 2008. Optical mapping and 454 sequencing of Escherichia coli O157:H7 isolates linked to the US 2006 spinach-associated outbreak. Microbiology 154:3518–3528
- 11. Mellmann A, et al. 2011. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS One 6:e22751
- 12. Mossoro C, et al. 2002. Chronic diarrhea, hemorrhagic colitis, and hemolytic-uremic syndrome associated with HEp-2 adherent Escherichia coli in adults infected with human immunodeficiency virus in Bangui, Central African Republic. J. Clin. Microbiol. 40:3086–3088
- 13. Rasko DA, et al. 2008. The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J. Bacteriol. 190:6881–6893
- 14. Rasko DA, et al. 2011. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. N. Engl. J. Med. 365:709–717
- 15. Reid SD, Herbelin CJ, Bumbaugh AC, Selander RK, Whittam TS. 2000. Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406:64–67
- 16. Reslewic S, et al. 2005. Whole-genome shotgun optical mapping of Rhodospirillum rubrum. Appl. Environ. Microbiol. 71:5511–5522
- 17. Rohde H, et al. 2011. Open-source genomic analysis of Shiga toxin-producing E. coli O104:H4. N. Engl. J. Med. 365:718–724
- 18. Touchon M, et al. 2009. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5:e1000344