Journal of Bacteriology. 2006 Sep 8;188(22):7713–7721. doi: 10.1128/JB.01043-06

Design of a Seven-Genome Escherichia coli Microarray for Comparative Genomic Profiling

Hanni Willenbrock 1, Anne Petersen 1,2, Camilla Sekse 2, Kristoffer Kiil 1, Yngvild Wasteson 2, David W Ussery 1,*
PMCID: PMC1636325  PMID: 16963574

Abstract

We describe the design and evaluate the use of a high-density oligonucleotide microarray covering seven sequenced Escherichia coli genomes in addition to several sequenced E. coli plasmids, bacteriophages, pathogenicity islands, and virulence genes. Its utility is demonstrated for comparative genomic profiling of two unsequenced strains, O175:H16 D1 and O157:H7 3538 (Δstx2::cat), as well as two well-known control strains, K-12 W3110 and O157:H7 EDL933. By querying the microarrays with fluorescently labeled genomic DNA, analyzing common virulence genes and phage elements, and performing whole-genome comparisons, we observed that O175:H16 D1 is a K-12-like strain and confirmed that its φ3538 (Δstx2::cat) phage element originated from the E. coli 3538 (Δstx2::cat) strain, with which it shares a substantial proportion of phage elements. Moreover, a number of genes involved in DNA transfer and recombination were identified in both new strains, providing a likely explanation for their capability to transfer φ3538 (Δstx2::cat) between them. Analyses of control samples demonstrated that results using our custom-designed microarray were representative of the true biology, e.g., by confirming the presence of all known chromosomal phage elements as well as 98.8 and 97.7% of queried chromosomal genes for the two control strains, respectively. Finally, we demonstrate that use of spatial information, in terms of the physical chromosomal locations of probes, improves the analysis.


The Escherichia coli species is a complex group of bacteria comprising several intestinal and extraintestinal pathogroups as well as commensal bacteria that are normal inhabitants of the intestinal tract of all warm-blooded animals and humans. Shiga toxin-producing E. coli (STEC) has emerged as an important food-borne pathogen causing diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome. Healthy ruminants such as cattle and sheep are regarded as the primary reservoir of STEC, which may be pathogenic to humans depending on the genomic content and combination of pathogenicity factors.

The Shiga toxins (Stx) are the main pathogenicity factors of STEC. Stx-encoding genes (stx) are located on lambdoid bacteriophages known as stx phages (17, 30). The stx phages are not only passive vectors for the dissemination of stx but also genetic entities where the characteristics of the phage itself may influence toxin production and, thus, virulence of the host bacterium (35, 36). Dissemination of stx genes by transduction is the most likely mechanism for intra- and intergenic spread of stx and subsequent development of new STEC. The host range of stx phages is highly variable, and phage transduction into E. coli and Shigella strains has been shown in different laboratory and animal experiments (1, 12, 16, 28, 34). Transduction of the bacteriophage φ3538 (Δstx2::cat) from E. coli O157:H7 3538 (Δstx2::cat) (28) has been shown in porcine loops (34) and, more recently, in sheep fed E. coli O157:H7 3538 (Δstx2::cat) (C. Sekse, H. Solheim, A. M. Urdahl, Y. Wasteson, et al., unpublished data). This latter experiment resulted in the isolation of a transductant, E. coli O175:H16 D1, from sheep feces. Consequently, E. coli O157:H7 3538 (Δstx2::cat) and E. coli O175:H16 D1 both contain φ3538 (Δstx2::cat), a detoxified derivative of an stx2 phage from a human E. coli O157:H7 type strain in which most of the stx2 gene is replaced by a chloramphenicol acetyltransferase gene, cat (28). However, little is known about the host specificity of stx phages or about the similarities and differences between the E. coli donor and recipient strains taking part in transduction events.

Genome sizes among natural isolates of E. coli vary considerably, ranging over more than a million base pairs (4). Furthermore, substantial diversity and genetic polymorphism exist even within the set of “core genes” found in most E. coli genomes (3, 8, 10, 20). Comparative genomic profiling using microarray chips designed to cover entire genomes is one strategy to obtain information about the variability between different strains of the same species and indications of horizontal gene transfer (3, 10, 21). Many commercial chips contain oligonucleotides from only one genome, such as the Campylobacter jejuni and Streptococcus pneumoniae chips and the E. coli K-12 chip (Ocimum Biosolutions, Affymetrix). The new E. coli Genome 2.0 array from Affymetrix covers four genomes: K-12 MG1655 and three pathogenic E. coli strains (CFT073 and two O157:H7-type strains). With at least seven E. coli genome sequences now publicly available, it is possible to design high-density microarrays covering all seven of the fully sequenced genomes, in addition to selected genes for virulence factors, plasmids, phages, and mobile elements.

High-density oligonucleotide arrays provide large amounts of data. Consequently, automated analysis tools are necessary to identify probes corresponding to the presence or absence of specific genomic segments. Comparative genomic DNA hybridization experiments on bacterial genomes typically use simple cutoff values to partition data points into present and absent DNA sequence segments, based either on estimates from known reference hybridizations (3) or on standard deviation estimates (11). However, the physical chromosomal position (mapping) of a probe is often ignored when analyzing this type of data. Statistical approaches that exploit probe position have been widely developed for copy number analyses in cancer research. These methods partition probes into sets with the same copy number (corresponding to the same level of DNA). Recent evaluations of their performance have demonstrated their usefulness and their superiority over one-probe-at-a-time approaches (38).
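
As a point of comparison for the position-aware methods, the following minimal Python sketch shows a one-probe-at-a-time cutoff of the standard-deviation kind. It assumes log2 intensities in a plain list and a hypothetical rule (a probe is absent if it lies more than k standard deviations below the median of all probes); the function name `call_by_cutoff` and the default k = 2 are illustrative choices, not parameters taken from the cited studies.

```python
import statistics


def call_by_cutoff(log2_intensities, k=2.0):
    """Call each probe present (True) or absent (False) independently.

    Hypothetical rule: a probe is absent if its log2 intensity lies more
    than k sample standard deviations below the median of all probes.
    Chromosomal position is ignored entirely.
    """
    median = statistics.median(log2_intensities)
    sd = statistics.stdev(log2_intensities)
    cutoff = median - k * sd
    return [x >= cutoff for x in log2_intensities]
```

Because each probe is judged in isolation, a single weak but genuinely present probe inside a present gene can be called absent; position-aware segmentation reduces this risk by borrowing strength from neighboring probes.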

In early 2005, when this study began, seven completely sequenced E. coli genomes were publicly available, including both pathogenic and nonpathogenic strains. These genomes vary in size from approximately 4.6 to 5.5 Mbp and show considerable diversity, as illustrated by the matrix in Fig. 1A, which compares the coding sequence overlap between the seven E. coli genomes. Next to the matrix, their relatedness is illustrated by a phylogenetic tree based on their 16S rRNAs. The low relatedness of CFT073 to the other strains is also reflected in several large distinct chromosomal regions that contain genes unique to CFT073 among the sequenced E. coli genomes (Fig. 1B).

FIG. 1.


Comparison of sequenced E. coli genomes. (A) BLAST matrix comparing the seven known genomes. The diagonal (red) represents internal homologues, and the other boxes (green) show the numbers and percentages of homologues for E. coli genomes in the columns found in E. coli genomes in the rows. On the right side is a phylogenetic tree of the strains based on alignment of 16S rRNAs. (B) BLAST atlas comparing the seven sequenced E. coli genomes. Here, the CFT073 genome is used as a reference, and for each gene in this genome, the best match in the other genomes is plotted in the various circles.

Here we describe the design and use of a high-density oligonucleotide microarray covering seven sequenced E. coli genomes as well as several sequenced E. coli plasmids, bacteriophages, pathogenicity islands, and virulence genes. The performance of this microarray is evaluated, and its utility is illustrated by hybridizing genomic DNA from two uncharacterized, unsequenced E. coli strains in order to compare them with the seven known, sequenced E. coli strains. Recent advances in the analysis of genomic DNA hybridization data were exploited; in particular, the physical mapping information was used to classify genes detected in the hybridization data into present and absent chromosomal segments.

MATERIALS AND METHODS

In this paper, we distinguish between the sequenced E. coli strains for which probes were designed on our custom-made microarray chip and the genomic DNA from E. coli experimental strains that were actually hybridized to the custom-designed microarrays.

Microarray probe design.

For probe design, the following sequences were considered (Table 1): whole-genome sequences of the seven E. coli strains, K-12 MG1655 (6), K-12 W3110 (14), O157:H7 EDL933 (24), O157:H7 RIMD0509952 (15), CFT073 (37), 042 (Sanger Institute, unpublished), and E2348/69 (Sanger Institute, unpublished). These strains will be referred to as MG1655, W3110, EDL933, RIMD0509952, CFT073, 042, and E2348, respectively. Additionally, 104 E. coli genes involved in virulence (8), 39 E. coli bacteriophages, 29 E. coli plasmids, three genomic islands from E. coli strain Nissle 1917 (13), and four pathogenic islands from E. coli strain 536 (9, 29) were extracted from GenBank release 146 (see our supplementary information at http://www.cbs.dtu.dk/∼hanni/Ecolichip1 for a detailed list).

TABLE 1.

Overview of known sequenced E. coli genomes considered for the microarray probe design

Strain | Isolate | Size (bp) | No. of EasyGene genes | No. of probes | Reference
K-12 | MG1655 | 4,639,675 | 4,122 | 141,483 | 6
K-12 | W3110 | 4,641,433 | 4,153 | 141,285 | 14
O157:H7 | EDL933 | 5,528,445 | 4,990 | 139,445 | 24
O157:H7 | RIMD0509952 | 5,498,450 | 4,986 | 141,691 | 15
CFT073 | | 5,231,428 | 4,653 | 127,261 | 37
042 | | 5,241,977 | 4,607 | 130,869 | Sanger Institute, unpublished
E2348/69 | | 5,074,835 | 4,599 | 124,103 | Sanger Institute, unpublished

The probe design software OligoWiz (http://www.cbs.dtu.dk/services/OligoWiz2) (18a) was used to place probes within both unique areas and conserved areas of sequences shared by two or more open reading frames predicted by EasyGene (http://www.cbs.dtu.dk/services/EasyGene) (16a). Conservation scores for aligned sequences were used by OligoWiz to place probes in the most conserved areas. Additional probes were placed in the 200-bp upstream regions of E. coli MG1655. A total of 271,693 E. coli-specific probes were designed based on these sequences.

E. coli experimental strains and culture conditions.

Experimentally, we examined the four E. coli strains, W3110 (14), EDL933 (19), O157:H7 3538 (Δstx2::cat) (referred to in the following as strain 3538) (28), and O175:H16 D1 (referred to in the following as strain D1) (Sekse et al., unpublished), and bacteriophage φ3538 (Δstx2::cat). The strains were grown overnight in Luria-Bertani broth with continuous agitation (26), and DNA was isolated using the QIAGEN Genomic Tip 500/G (QIAGEN, Hilden, Germany) and the Genomic DNA buffer set (QIAGEN). Independent triplicates of genomic DNA from each strain were prepared according to the manufacturer's protocol. φ3538 (Δstx2::cat) was induced from E. coli 3538 (Δstx2::cat) with mitomycin C, and DNA was extracted and purified as described by Muniesa et al. (18). Independent duplicates of the phage DNA were prepared.

Microarray labeling and hybridization.

Seven micrograms of genomic DNA was fragmented with 0.7 U of DNase I (Amersham Biosciences) for 10 to 12 min at 37°C in 1× One-Phor All Plus buffer (Amersham Biosciences) to obtain fragments of 50 to 200 bp. Fragmented DNA was labeled according to the manufacturer's instructions (Affymetrix Inc.; http://www.affymetrix.com/Auth/support/downloads/manuals/expression_s3_manual.pdf) for terminal labeling of fragmented cDNA derived from mRNA for prokaryotic arrays. The labeled DNA was hybridized to custom-made NimbleExpress arrays (Affymetrix) for 15 to 17 h at 45°C. Standard protocols from Affymetrix for hybridization, washing, and staining were followed using a hybridization oven, a Fluidics Station 450, and a GeneChip Scanner 3000 (Affymetrix).

Data analysis.

Exact sequence matching was used to map each probe to specific chromosomal locations in the seven E. coli design genomes and to specific locations within the 39 bacteriophage elements, 26 EDL933 or MG1655 genomic phage elements, four pathogenicity islands, and 104 virulence genes. In the subsequent data analysis, a position-dependent segmentation algorithm was employed to partition data points into present and absent sequence segments. For this, we used circular binary segmentation (22) as implemented in DNAcopy developmental version 1.2.1 available for the R statistical language (http://www.bioconductor.org/). As recommended by those authors, the data were first smoothed and subsequently segmented. Segmentation was followed by merging the output with MergeLevels (38). In cases where the algorithm was not able to find an optimal threshold, the threshold was fixed at the median absolute difference between segmented values assigned by DNAcopy and observed log2 intensities.
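
The two steps described in this paragraph, exact-match probe mapping and position-dependent segmentation, can be illustrated with the following simplified Python sketch. It is not DNAcopy: it omits the smoothing step, uses a fixed t-statistic threshold instead of DNAcopy's permutation-based significance test, and does not implement MergeLevels. It does use the arc statistic that gives circular binary segmentation its name, so that a short absent region in the middle of a present stretch can be isolated in one step. All function names and the default `threshold` and `min_len` values are hypothetical.

```python
import statistics

# Complement table for reverse-complementing uppercase ACGT strings.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def map_probe(probe, genome):
    """Return sorted 0-based start positions of exact matches on either strand."""
    hits = set()
    for query in (probe, probe.translate(COMPLEMENT)[::-1]):
        start = genome.find(query)
        while start != -1:
            hits.add(start)
            start = genome.find(query, start + 1)
    return sorted(hits)


def _best_arc(x, min_len):
    """Find the arc x[i:j] whose mean differs most from the rest of x.

    Returns (t statistic, i, j); i is None when x is too short to split.
    """
    n = len(x)
    best = (0.0, None, None)
    if n - 2 <= 0:
        return best
    for i in range(n):
        for j in range(i + min_len, n + 1):
            inner, outer = x[i:j], x[:i] + x[j:]
            if len(outer) < min_len:
                continue
            mi, mo = statistics.fmean(inner), statistics.fmean(outer)
            # Pooled residual sum of squares for the two-group model.
            ss = sum((v - mi) ** 2 for v in inner) + sum((v - mo) ** 2 for v in outer)
            s = (ss / (n - 2)) ** 0.5
            diff = abs(mi - mo)
            if s == 0:
                t = float("inf") if diff > 0 else 0.0
            else:
                t = diff / (s * (1 / len(inner) + 1 / len(outer)) ** 0.5)
            if t > best[0]:
                best = (t, i, j)
    return best


def circular_binary_segment(x, threshold=5.0, min_len=2):
    """Simplified circular binary segmentation with a fixed t threshold.

    Returns (start, end) index pairs, end exclusive, in probe order.
    """
    def recurse(lo, hi):
        t, i, j = _best_arc(x[lo:hi], min_len)
        if i is None or t < threshold:
            return [(lo, hi)]
        # Split into up to three pieces and segment each one again.
        cuts = sorted({lo, lo + i, lo + j, hi})
        segments = []
        for a, b in zip(cuts, cuts[1:]):
            segments.extend(recurse(a, b))
        return segments
    return recurse(0, len(x))


def segment_means(x, segments):
    """Replace each probe value by the mean of the segment it belongs to."""
    fitted = []
    for lo, hi in segments:
        fitted.extend([statistics.fmean(x[lo:hi])] * (hi - lo))
    return fitted


def fallback_threshold(x, fitted):
    """Median absolute difference between observed and segmented values."""
    return statistics.median([abs(a - b) for a, b in zip(x, fitted)])
```

For example, `circular_binary_segment([10.0, 10.2, 9.9, 10.1, 3.0, 3.2, 2.9, 3.1, 10.0, 10.1])` returns `[(0, 4), (4, 8), (8, 10)]`, with the middle segment corresponding to an absent region. The brute-force arc search is cubic in segment length and is meant only for illustration.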

For the analysis of specific chromosomal genes, phage elements, and virulence genes, only genes or phage elements to which at least five probes mapped were considered. Log2 intensities were analyzed using the above-described segmentation approach. For chromosomal genes, it was safe to assume that a majority of them were present; therefore, the present level was determined as the median value of merged segment means. Segments were classified as present if their mean values were at or above the level closest to this median for experimental strains or, for φ3538 (Δstx2::cat) experiments, closest to the median of probes located in the known BP-933W phage sequence [BP-933W is the known sequenced equivalent of φ3538 (Δstx2::cat)]. Chromosomal genes were considered present if at least two of the three replicate experiments had present probes spanning at least 90% of the covered gene sequence. Virulence genes and phage elements were inspected visually if they met at least one of the following three criteria in at least one analyzed sample: (i) at least 10% of the sequence in present segments, (ii) a continuous segment spanning at least 100 bp, or (iii) at least 5% of present probes in the largest segment.
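
The segment and gene calling rules in this paragraph can be sketched in Python as follows, under stated assumptions. Each probe is represented as a hypothetical half-open interval `(start, end, is_present)` in gene coordinates, where `is_present` comes from the segment classification; "covered gene sequence" is interpreted as the union of all probe intervals, and the present level is taken from the median of the merged segment means, as written in the text. The visual-inspection criteria (i) to (iii) are not implemented, and all function names are illustrative.

```python
import statistics


def present_segments(segment_means):
    """Flag merged segments as present (True) or absent (False).

    Assumes most probes are present, so the segment level closest to the
    median of the segment means is taken as the 'present' level.
    """
    median = statistics.median(segment_means)
    present_level = min(set(segment_means), key=lambda level: abs(level - median))
    return [m >= present_level for m in segment_means]


def union_length(intervals):
    """Total number of bases covered by half-open (start, end) intervals."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def replicate_present(probes, min_fraction=0.9):
    """True if present probes span >= min_fraction of the probe-covered sequence."""
    covered = union_length([(s, e) for s, e, _ in probes])
    present = union_length([(s, e) for s, e, p in probes if p])
    return covered > 0 and present / covered >= min_fraction


def gene_present(replicates, min_probes=5, min_replicates=2, min_fraction=0.9):
    """Apply the calling rule to one gene across replicate experiments.

    Returns None if fewer than min_probes probes map to the gene.
    """
    if len(replicates[0]) < min_probes:
        return None
    votes = sum(replicate_present(r, min_fraction) for r in replicates)
    return votes >= min_replicates
```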

Hierarchical cluster analysis was based on measurements for all probes using Pearson correlation distances and complete linkage. To reduce experimental data for replicate experiments into one set of probe values for each experimental strain, a one-sided Student's t test was used to estimate a P value between 0 and 1 for each probe, where a P value close to 0 corresponded to a probe being significantly below the median intensity for the three replicate experiments for a given experimental strain and, consequently, significantly absent. Corresponding sets of theoretical binary P values of either 0 or 1 were constructed for each of the seven known E. coli strains, where 0 corresponded to no match anywhere in the sequenced genome and 1 corresponded to at least one match.
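
The replicate reduction and clustering described above can be illustrated with the following Python sketch. It assumes exactly three replicates, so that the one-sided t test has 2 degrees of freedom and its P value can be computed from the closed-form Student's t CDF, F(t) = 1/2 + t/(2√(2 + t²)), without a statistics library; the reference value `mu` (for example, a strain's median log2 intensity) is supplied by the caller. Clustering uses 1 − Pearson correlation as the distance with complete linkage, matching the choices stated in the text, but this naive implementation and its function names are hypothetical and are not the software used in the study.

```python
import math
import statistics


def one_sided_p_below(values, mu):
    """One-sided one-sample t test P value for H1: mean(values) < mu.

    Uses the closed-form CDF of Student's t with 2 degrees of freedom,
    F(t) = 1/2 + t / (2 * sqrt(2 + t**2)), so it requires three replicates.
    """
    if len(values) != 3:
        raise ValueError("closed form assumes three replicates (df = 2)")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # Identical replicates: decide by the sign of the difference alone.
        return 0.0 if mean < mu else (1.0 if mean > mu else 0.5)
    t = (mean - mu) / (sd / math.sqrt(3))
    return 0.5 + t / (2 * math.sqrt(2 + t * t))


def pearson_distance(a, b):
    """1 minus the Pearson correlation of two equal-length profiles."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("Pearson distance is undefined for constant profiles")
    return 1.0 - cov / math.sqrt(va * vb)


def complete_linkage(profiles):
    """Naive agglomerative clustering with complete linkage.

    profiles: dict mapping sample name -> list of per-probe values.
    Returns the final tree as nested 2-tuples of sample names.
    """
    clusters = [([name], name) for name in profiles]  # (members, subtree)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # distance between any pair of their members.
                d = max(pearson_distance(profiles[a], profiles[b])
                        for a in clusters[i][0] for b in clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]
```

Theoretical profiles for the sequenced strains (1 if a probe has at least one exact match, otherwise 0) can be placed in the same `profiles` dictionary as the experimental P-value vectors, which mirrors how Fig. 4B combines the two kinds of data.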

Atlases were created using the GeneWiz software (23). The BLAST atlases were constructed as described previously (31). Common E. coli genes as well as strain-specific genes were identified by BLASTP version 2.2.11 (2), using 1e-10 as the E-value cutoff and a minimum alignment ratio of 0.75 (the alignment length divided by the length of the longest compared gene).
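
The BLASTP filtering criteria can be expressed as a small Python sketch. It assumes hits have already been parsed from tabular BLASTP output into a hypothetical `Hit` record, that protein lengths are available in a dictionary, and that an E-value equal to the cutoff counts as passing; `common_genes` illustrates one plausible way to derive genes shared by all genomes, not necessarily the exact procedure used in the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One parsed BLASTP hit (hypothetical record, not a BLAST API type)."""
    query: str
    subject: str
    evalue: float
    aln_len: int


def passes(hit, lengths, max_evalue=1e-10, min_ratio=0.75):
    """Apply the E-value cutoff and the alignment-ratio criterion.

    The ratio is the alignment length divided by the length of the longer
    of the two compared genes, as described in the text.
    """
    longest = max(lengths[hit.query], lengths[hit.subject])
    return hit.evalue <= max_evalue and hit.aln_len / longest >= min_ratio


def common_genes(hits, lengths, genome_of, other_genomes):
    """Reference genes with at least one passing hit in every other genome."""
    found = {}
    for hit in hits:
        if passes(hit, lengths):
            found.setdefault(hit.query, set()).add(genome_of[hit.subject])
    return {gene for gene, genomes in found.items() if set(other_genomes) <= genomes}
```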

Microarray accession number.

The microarray data have been deposited in the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/) with the series accession number GSE4690.

RESULTS AND DISCUSSION

Visualization.

Probe intensities were visualized in whole-genome hybridization atlases, as shown in Fig. 2, for each of the seven known E. coli genomes considered in this study. Probes were mapped to each of the seven fully sequenced E. coli strains by exact sequence matching to the known sequence, and the resulting probe coverage patterns are visible in the innermost circle. The probes appeared well distributed for all strains, while several distinct regions were apparent for individual strains. Corresponding median intensities were visualized for each experimental E. coli strain (second to fifth circles) as well as for the φ3538 (Δstx2::cat) phage experiments (outermost circle). It was possible to distinguish genuinely distinct regions from gaps in the outer circles that were due to poor probe coverage. This allowed us to identify areas unique to each experimental strain. For instance, all experimental strains were missing large portions of the CFT073 genome at various sites (Fig. 2C). As expected from the comparison to known sequenced genomes in Fig. 1B, these regions were also missing in the two known E. coli strains included as control experiments (W3110 and EDL933).

FIG. 2.


Hybridization atlases, visualizing median probe intensities for the four experimental strains and phage φ3538 (Δstx2::cat), mapped to the seven known E. coli genomes. A region of extremely high similarity is clearly visible in the atlases for both E. coli O157:H7-type strains (red outermost circles in panels D and E). Log intensities were median centered at 1. See our supplementary material at the website http://www.cbs.dtu.dk/∼hanni/Ecolichip1 for full-size atlases.

Many probes were unique for individual strains, as evidenced by several gaps in the measured intensities. For example, EDL933 mapped to W3110 had gaps, while W3110 probe intensities covered the entire W3110 genome (Fig. 2B). Moreover, both intensity patterns for the two control strains, EDL933 and W3110, closely resembled their corresponding probe coverage patterns, as expected (Fig. 2B and E).

Experimental strains D1 and 3538 (Δstx2::cat) possess the same bacteriophage, φ3538 (Δstx2::cat), which has been transferred from strain 3538 (Δstx2::cat) to strain D1 (Sekse et al., unpublished). This phage is very similar to the BP-933W phage element in E. coli EDL933, located at ∼1.33 to ∼1.39 Mbp, and a region of extremely high similarity is clearly visible in the atlases for both E. coli O157:H7-type strains (Fig. 2D and E). A zoom of the BP-933W phage area on EDL933 shows that phage φ3538 (Δstx2::cat) resembles the corresponding phage elements of 3538 (Δstx2::cat) and D1 more closely than the BP-933W phage element of E. coli EDL933 (Fig. 3).

FIG. 3.


Zoom of the BP-933W phage area on EDL933, with known genes indicated. Intensity measurements for EDL933 experiments (C) were clearly as expected from the probe coverage pattern (G), and both the experimental strains 3538 (Δstx2::cat) (C) and D1 (F) have intensity patterns clearly similar to that of φ3538 (Δstx2::cat) (A).

Strain comparison.

To investigate how the E. coli D1, W3110, 3538 (Δstx2::cat), and EDL933 strains were related to each other, an unsupervised cluster analysis of all three replicate experiments for each of the four experimental strains was performed, based on intensities for all probes on the microarray, as shown in Fig. 4A. Here, D1 appears more closely related to W3110 than to the other experimental strains, while 3538 clusters with EDL933. Since the 3538 (Δstx2::cat) strain had been serotyped as O157:H7 (same as EDL933), we expected these two to be more closely related to each other than to the other experimental strains.

FIG. 4.


(A) Hierarchical cluster analysis based on all probe intensities for the three replicate experiments for each of the four experimental E. coli strains. (B) Cluster analysis based on continuous P values between 0 and 1 (t test) for the four experimental E. coli strains (indicated in bold and with the postfix “exp”) and binary values of either 0 or 1 based on theoretical probe absence or presence for all seven sequenced E. coli strains considered in this study. Both cluster analyses are based on Pearson correlation distances and complete linkage.

Because experimental data characterizing the resemblance of these strains to other E. coli strains were not available, we constructed theoretical data based on all seven known E. coli strains considered in this study (see Materials and Methods for details). In this cluster analysis, control strains clustered as expected from their phylogenetic tree based on 16S rRNAs (Fig. 1A), although experimental noise was considerable: based on experimentally determined probe values, the experimental K-12-type (and O157:H7-type) samples clustered more closely with each other than with the corresponding theoretical strains (Fig. 4B). However, since the two K-12 strains, MG1655 and W3110, are almost indistinguishable in terms of their genomic sequence, this result was expected. Moreover, this analysis confirms that D1 is much more closely related to K-12 strains than to other known strains, such as E2348 and CFT073.

Analysis of strain D1 and strain 3538 genes.

Among the seven E. coli strains used for chip design, 3,475 genes were found to be in common by BLAST analysis (the complete list may be found in our supplementary material at http://www.cbs.dtu.dk/∼hanni/Ecolichip1). Of these E. coli “core” genes, ∼3,100 were identified in D1 and 3538 samples, indicating that the D1 and 3538 strains have slightly different subsets of E. coli core genes than the seven E. coli design strains. This is consistent with the observation that the number of E. coli core genes tends to decrease as the full genomic sequences of new E. coli strains continue to become available (H. Tipmann and D. W. Ussery, unpublished results). A thorough discussion of E. coli core genes will be presented elsewhere, since there are now at least 20 sequenced E. coli genomes available for such an analysis (5).

Among noncore genes, we searched for genes specific to each of the seven E. coli design strains, where genes specific to either the K-12-type or the O157:H7-type strains were combined into lists of K-12- and O157-specific genes (i.e., present in either W3110, MG1655, or both but differing from non-K-12 strains, or present in either EDL933, RIMD0509952, or both but differing from non-O157:H7 strains).

D1 genes specific to the seven E. coli design strains were analyzed further. A total of 150 K-12-specific genes were found, supporting the previous finding that D1 resembles the K-12 strains. Furthermore, the finding of 210 O157:H7-specific genes indicates that D1 has acquired many O157:H7-specific genes, in accordance with the known transfer of the 3538 phage. Although D1 has many O157:H7-specific genes, it has far fewer than the known O157:H7-type strain E. coli 3538, for which a total of 543 O157:H7-specific genes were identified.

Identified D1 and 3538 genes specific to the seven E. coli design strains were annotated by BLASTP comparison to the National Center for Biotechnology Information's nonredundant database (http://www.ncbi.nlm.nih.gov). Predicted genes for which a reliable match was found were examined more closely (refer to our supplementary information at http://www.cbs.dtu.dk/∼hanni/Ecolichip1 for a detailed list). For the D1 genes specific to the seven E. coli design strains, we identified a large number of O157:H7 chromosomal phage genes (discussed further in “Benchmarking,” below), while the majority of K-12-specific genes were unrelated to pathogenicity, e.g., genes in the phenylacetic acid degradation operon, genes involved in energy/metabolism, and membrane proteins. The CFT073-specific genes mainly consisted of genes involved in metabolism (e.g., pyruvate dehydrogenase) or translation/transcription (e.g., rpoC, rpoD, and DNA polymerase I). Among the 042-specific genes were eight putative phage elements, a putative IS element, two putative transposases, and the cat gene. The latter was expected, since a similar cat gene is present in φ3538 (Δstx2::cat). Finally, a whole series of conjugal transfer proteins (7 of TrbA to TrbJ and 17 of TraB to TraQ) were identified among E2348-specific genes. These genes comprise a large section of the E2348 chromosome and are clearly visible for D1 samples in the 5-Mb region of Fig. 2F. This demonstrates that D1 is susceptible to foreign DNA, which might have facilitated the uptake of the E. coli 3538 genomic phage φ3538 (Δstx2::cat). Moreover, we found that for strain 3538, all but 2 of its 30 genes specific to E. coli E2348 were transposases, indicative of elevated levels of recombination in E. coli 3538 compared to other O157:H7-type strains. This further provides a likely explanation for the observed transfer of φ3538 (Δstx2::cat) from strain 3538 to strain D1.

D1 pathogenicity.

To further characterize the experimental strains D1 and 3538 (Δstx2::cat), the data for probes covering known virulence genes and phage elements were analyzed. A minimum of five probes mapped within 96 of the 104 known virulence genes and within all 39 non-MG1655 and non-EDL933 bacteriophages and all four pathogenicity islands.

After removal of sequences absent in all samples (see Materials and Methods for details on filtering criteria), the numbers of sequences were further decreased to 21 (of 96) virulence genes, 14 (of 39) bacteriophages, and two (of four) pathogenicity islands.

Results are illustrated for these remaining virulence genes (Fig. 5) and for phage sequences and pathogenicity islands (see also supplementary Fig. S1 at the website http://www.cbs.dtu.dk/∼hanni/Ecolichip1), in which present and missing fragments were clearly visible. While W3110 had few virulence factors, EDL933 had many, including the stx genes. Based on virulence genes, D1 clustered with the K-12-type strain (W3110), as it did when clustering was based on all probe data (Fig. 4A), indicating that D1 and W3110 have more virulence genes in common with each other than with the other strains. Furthermore, as expected, EDL933 and 3538 (Δstx2::cat) share more virulence gene segments with each other than with the other strains.

FIG. 5.


E. coli virulence genes, as illustrated by log2 probe intensities (gray dots) overlaid with segmentation/merging results. Red lines correspond to segments identified as present in the experiment. Dark gray lines are those segments identified as not present. Only virulence genes with a segment present in at least one sample are included. Experiments are clustered according to segmentation and merging results (left). The sample is indicated to the right. Note: the order in which the virulence genes are concatenated does not signify importance.

By further analyzing the virulence genes present in strain D1, we found that it lacked the hemolysin gene (ehxA), the type III secretion genes located at the locus of enterocyte effacement in E. coli O157:H7 (espA, -B, and -D and tir), and the eae gene, all of which were present in EDL933 and 3538 (Δstx2::cat), as expected, since these strains are human pathogens (enterohemorrhagic E. coli) (7, 19). Almost the complete sequence of bacteriophage V from Shigella flexneri was found in the genome of D1 and may be responsible for transferring φ3538 (Δstx2::cat) to D1, as bacteriophage V plays an important role in serotype conversion and is associated with antigenic variation.

Although D1 has acquired genes often found in emerging pathogens, it is evident from the analysis of virulence genes that D1 is probably still a commensal E. coli and not yet a pathogen, given its relatedness to K-12 strains. The K-12 strain is a commensal bacterium originally isolated from a stool sample from a diphtheria patient in 1922; it later developed into different substrains, none of which have been reported to cause illness. D1 was isolated from a sheep stool sample, and its serotype, O175:H16, has been reported in the literature on only a few occasions. Although strains of this serotype can be Shiga toxin producing, no illness has been associated with it (25, 27, 32, 33), consistent with our findings.

Interestingly, based on the phage analysis (see our supplementary Fig. S1 at the website http://www.cbs.dtu.dk/∼hanni/Ecolichip1), the D1 genome clusters with the phage φ3538 (Δstx2::cat) and 3538 (Δstx2::cat) samples rather than with W3110 samples, in this case disregarding EDL933- and MG1655-specific genomic phage elements. This indicates that D1 shares a significant proportion of phage elements with 3538 (Δstx2::cat). Supporting this is the pattern of present segments in common for the Shiga toxin-related phage elements Stx1, Stx2-I, Stx2-II, and VT2-Sa. Especially noticeable is the fact that although the same phage elements are present for EDL933 samples, the exact pattern differs, indicating obvious divergence in the phage sequence and confirming that a transfer of the phage φ3538 (Δstx2::cat) element has taken place from 3538 (Δstx2::cat) to D1.

Hence, based on the above results, we conclude that D1 is a nonpathogenic K-12-like strain with an increased ability to obtain foreign DNA and that it has acquired a substantial number of 3538 (Δstx2::cat) phage elements. Nonetheless, the present analysis includes only chromosomal genes, not potential virulence genes carried on plasmids. Therefore, D1 plasmids would have to be purified and analyzed in a similar fashion for potential virulence genes before more can be said about their possible role in pathogenicity.

Benchmarking.

To estimate whether the above results reflected the true biology, a number of quality issues were explored, including the variability between replicate experiments and comparisons of results from control experiments with the known sequences.

First, the performance of the custom-designed DNA microarray was evaluated further by analyzing the control strains, W3110 and EDL933, after mapping each to the other's genome. By varying the threshold cutoff for calling absence/presence on raw data, a detailed performance analysis at the probe level could be achieved (Fig. 6). In this way, it was possible to view how a gain in sensitivity (the fraction of present probes that were identified) would concurrently increase the false-positive rate. The performance when using segmentation and merging was clearly above the receiver operating characteristic (ROC) curve for the simple threshold approach, indicating superior performance and confirming that segmentation approaches improved the analysis.
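
A probe-level ROC analysis of this kind can be sketched in Python as below, assuming a list of log2 intensities and a matching list of true present/absent labels derived from the known genome sequence. Each threshold yields one (false-positive rate, sensitivity) point; calls produced by segmentation and merging give a single point that can be compared against this curve with the same `rates` helper. Function names are hypothetical.

```python
def confusion(calls, truth):
    """Count (tp, fp, tn, fn) for boolean present calls against known truth."""
    tp = sum(1 for c, y in zip(calls, truth) if c and y)
    fp = sum(1 for c, y in zip(calls, truth) if c and not y)
    tn = sum(1 for c, y in zip(calls, truth) if not c and not y)
    fn = sum(1 for c, y in zip(calls, truth) if not c and y)
    return tp, fp, tn, fn


def rates(calls, truth):
    """Return (false-positive rate, sensitivity) for one set of calls."""
    tp, fp, tn, fn = confusion(calls, truth)
    return fp / (fp + tn), tp / (tp + fn)


def roc_points(intensities, truth, thresholds):
    """One ROC point per threshold, calling a probe present if intensity >= t."""
    return [rates([x >= t for x in intensities], truth) for t in thresholds]
```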

FIG. 6.


ROC curve showing the performance for different analysis approaches. Black lines and triangles, W3110 samples mapped to the EDL933 genome; grey line and circles, EDL933 samples mapped to the W3110 genome. The plot shows the performance when applying a threshold to log2 intensities (black and grey lines) and when segmenting and merging log2 intensities (grey circles and black triangles). Black and grey lines indicate the sensitivity as a function of the false-positive rate (1 − specificity). Grey circles and black triangles show that the performance when using segmentation and merging was clearly above the ROC curve for the simple threshold approach.

Next, the 25 control EDL933 and MG1655 genomic phage elements were analyzed in the same way as the virulence genes and the non-EDL933 and non-MG1655 genomic phages (see our supplementary Fig. S2 at the website http://www.cbs.dtu.dk/∼hanni/Ecolichip1). Analysis of these genomic phage elements confirmed the reliability of our analysis approach, as they were all identified as present in the expected experimental strain. Thus, all K-12 isolate MG1655 phage elements were identified in their full length in the K-12 isolate W3110 samples, and all O157:H7 isolate EDL933 phage elements were identified in the EDL933 samples. Only the end fragment of CP-933R was missing in both EDL933 and 3538 (Δstx2::cat) samples. However, since this fragment is part of an unstable cryptic prophage, it may easily have been lost, as it is presumably of no use to the bacteria.

Generally, the variability between replicate DNA samples was low, i.e., only a small fraction of genes were not consistently called present or absent across all replicates. For example, between replicate W3110 samples mapped to the EDL933 genome, the number of genes identified differed by only 0.9%. For the control strains, W3110 and EDL933, 98.8% and 97.7% of all genes, respectively, were identified as present in their corresponding samples. Moreover, sensitivities of 0.92 and 0.94 were obtained when confirming the presence of W3110 genes in EDL933 samples and EDL933 genes in W3110 samples, respectively, while maintaining a false discovery rate of 0.05. A closer examination of the false positives revealed that a majority of these corresponded to genes that might have been misclassified as negative by BLASTP (e.g., with an E-value close to the cutoff), while false negatives were most likely falsely predicted genes. Consequently, results obtained using our seven-genome E. coli microarray platform are highly accurate and reflect true biology. Moreover, if the analysis is repeated with high-quality gene annotations when they become available, the sensitivity and false discovery rate may prove even better than initially anticipated.
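
For reference, the two gene-level metrics quoted in this paragraph follow the usual definitions, sensitivity = TP/(TP + FN) and false discovery rate = FP/(TP + FP). A minimal Python sketch over sets of gene identifiers, with a hypothetical function name, is:

```python
def sensitivity_and_fdr(called_present, truly_present):
    """Gene-level sensitivity and false discovery rate.

    called_present, truly_present: sets of gene identifiers.
    Returns (sensitivity, fdr); a metric with an empty denominator is 0.0.
    """
    tp = len(called_present & truly_present)
    fp = len(called_present - truly_present)
    fn = len(truly_present - called_present)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sensitivity, fdr
```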

Acknowledgments

We thank Peter Hallin for assistance with probe design and the BLAST matrix and the Sanger Institute for providing sequence data for E. coli strains produced by the Microbial Sequencing Group at the Sanger Institute (http://www.sanger.ac.uk/Projects/Escherichia_Shigella/).

This study was supported partly by grant no. 147145 from the Research Council of Norway (A.P., C.S., and Y.W.) and by The Danish Center for Scientific Computing and The Danish Technical Research Council (H.W., K.K., and D.W.U.).

Footnotes

Published ahead of print on 8 September 2006.

REFERENCES

  • 1. Acheson, D. W., J. Reidl, X. Zhang, G. T. Keusch, J. J. Mekalanos, and M. K. Waldor. 1998. In vivo transduction with Shiga toxin 1-encoding phage. Infect. Immun. 66:4496-4498.
  • 2. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
  • 3. Anjum, M. F., S. Lucchini, A. Thompson, J. C. Hinton, and M. J. Woodward. 2003. Comparative genomic indexing reveals the phylogenomics of Escherichia coli pathogens. Infect. Immun. 71:4674-4683.
  • 4. Bergthorsson, U., and H. Ochman. 1998. Distribution of chromosome length variation in natural isolates of Escherichia coli. Mol. Biol. Evol. 15:6-16.
  • 5. Binnewies, T. T., Y. Motro, P. F. Hallin, O. Lund, D. Dunn, T. La, D. J. Hampson, M. Bellgard, T. M. Wassenaar, and D. W. Ussery. 2006. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries. Funct. Integr. Genomics 6:165-185.
  • 6. Blattner, F. R., G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474.
  • 7. Caprioli, A., S. Morabito, H. Brugere, and E. Oswald. 2005. Enterohaemorrhagic Escherichia coli: emerging issues on virulence and modes of transmission. Vet. Res. 36:289-311.
  • 8. Dobrindt, U., F. Agerer, K. Michaelis, A. Janka, C. Buchrieser, M. Samuelson, C. Svanborg, G. Gottschalk, H. Karch, and J. Hacker. 2003. Analysis of genome plasticity in pathogenic and commensal Escherichia coli isolates by use of DNA arrays. J. Bacteriol. 185:1831-1840.
  • 9. Dobrindt, U., G. Blum-Oehler, G. Nagy, G. Schneider, A. Johann, G. Gottschalk, and J. Hacker. 2002. Genetic structure and distribution of four pathogenicity islands (PAI I536 to PAI IV536) of uropathogenic Escherichia coli strain 536. Infect. Immun. 70:6365-6372.
  • 10. Fukiya, S., H. Mizoguchi, T. Tobe, and H. Mori. 2004. Extensive genomic diversity in pathogenic Escherichia coli and Shigella strains revealed by comparative genomic hybridization microarray. J. Bacteriol. 186:3911-3921.
  • 11. Gagne, S. E., R. Jensen, A. Polvi, M. Da Costa, D. Ginzinger, J. T. Efird, E. A. Holly, T. Darragh, and J. M. Palefsky. 2005. High-resolution analysis of genomic alterations and human papillomavirus integration in anal intraepithelial neoplasia. J. Acquir. Immune Defic. Syndr. 40:182-189.
  • 12. Gamage, S. D., A. K. Patton, J. F. Hanson, and A. A. Weiss. 2004. Diversity and host range of Shiga toxin-encoding phage. Infect. Immun. 72:7131-7139.
  • 13. Grozdanov, L., C. Raasch, J. Schulze, U. Sonnenborn, G. Gottschalk, J. Hacker, and U. Dobrindt. 2004. Analysis of the genome structure of the nonpathogenic probiotic Escherichia coli strain Nissle 1917. J. Bacteriol. 186:5432-5441.
  • 14. Hayashi, K., N. Morooka, Y. Yamamoto, K. Fujita, K. Isono, S. Choi, E. Ohtsubo, T. Baba, B. L. Wanner, H. Mori, and T. Horiuchi. 21 February 2006, posting date. Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol. 2:2006.0007. [Epub ahead of print.]
  • 15. Hayashi, T., K. Makino, M. Ohnishi, K. Kurokawa, K. Ishii, K. Yokoyama, C. G. Han, E. Ohtsubo, K. Nakayama, T. Murata, M. Tanaka, T. Tobe, T. Iida, H. Takami, T. Honda, C. Sasakawa, N. Ogasawara, T. Yasunaga, S. Kuhara, T. Shiba, M. Hattori, and H. Shinagawa. 2001. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 8:11-22.
  • 16. James, C. E., K. N. Stanley, H. E. Allison, H. J. Flint, C. S. Stewart, R. J. Sharp, J. R. Saunders, and A. J. McCarthy. 2001. Lytic and lysogenic infection of diverse Escherichia coli and Shigella strains with a verocytotoxigenic bacteriophage. Appl. Environ. Microbiol. 67:4335-4337.
  • 16a. Larsen, T. S., and A. Krogh. 2003. EasyGene—a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics 4:21.
  • 17. Miao, E. A., and S. I. Miller. 1999. Bacteriophages in the evolution of pathogen-host interactions. Proc. Natl. Acad. Sci. USA 96:9452-9454.
  • 18. Muniesa, M., M. de Simon, G. Prats, D. Ferrer, H. Panella, and J. Jofre. 2003. Shiga toxin 2-converting bacteriophages associated with clonal variability in Escherichia coli O157:H7 strains of human origin isolated from a single outbreak. Infect. Immun. 71:4554-4562.
  • 18a. Nielsen, H. B., R. Wernersson, and S. Knudsen. 2003. Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays. Nucleic Acids Res. 31:3491-3496.
  • 19. O'Brien, A. D., J. W. Newland, S. F. Miller, R. K. Holmes, H. W. Smith, and S. B. Formal. 1984. Shiga-like toxin-converting phages from Escherichia coli strains that cause hemorrhagic colitis or infantile diarrhea. Science 226:694-696.
  • 20. Ochman, H., and I. B. Jones. 2000. Evolutionary dynamics of full genome content in Escherichia coli. EMBO J. 19:6637-6643.
  • 21. Ogura, Y., K. Kurokawa, T. Ooka, K. Tashiro, T. Tobe, M. Ohnishi, K. Nakayama, T. Morimoto, J. Terajima, H. Watanabe, S. Kuhara, and T. Hayashi. 2006. Complexity of the genomic diversity in enterohemorrhagic Escherichia coli O157 revealed by the combinational use of the O157 Sakai OligoDNA microarray and the whole genome PCR scanning. DNA Res. 13:3-14.
  • 22. Olshen, A. B., E. S. Venkatraman, R. Lucito, and M. Wigler. 2004. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5:557-572.
  • 23. Pedersen, A. G., L. J. Jensen, S. Brunak, H. H. Staerfeldt, and D. W. Ussery. 2000. A DNA structural atlas for Escherichia coli. J. Mol. Biol. 299:907-930.
  • 24. Perna, N. T., G. Plunkett III, V. Burland, B. Mau, J. D. Glasner, D. J. Rose, G. F. Mayhew, P. S. Evans, J. Gregor, H. A. Kirkpatrick, G. Posfai, J. Hackett, S. Klink, A. Boutin, Y. Shao, L. Miller, E. J. Grotbeck, N. W. Davis, A. Lim, E. T. Dimalanta, K. D. Potamousis, J. Apodaca, T. S. Anantharaman, J. Lin, G. Yen, D. C. Schwartz, R. A. Welch, and F. R. Blattner. 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529-533.
  • 25. Pradel, N., V. Livrelli, C. De Champs, J. B. Palcoux, A. Reynaud, F. Scheutz, J. Sirot, B. Joly, and C. Forestier. 2000. Prevalence and characterization of Shiga toxin-producing Escherichia coli isolated from cattle, food, and children during a one-year prospective study in France. J. Clin. Microbiol. 38:1023-1031.
  • 26. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
  • 27. Scheutz, F., T. Cheasty, D. Woodward, and H. R. Smith. 2004. Designation of O174 and O175 to temporary O groups OX3 and OX7, and six new E. coli O groups that include Verocytotoxin-producing E. coli (VTEC): O176, O177, O178, O179, O180 and O181. APMIS 112:569-584.
  • 28. Schmidt, H., M. Bielaszewska, and H. Karch. 1999. Transduction of enteric Escherichia coli isolates with a derivative of Shiga toxin 2-encoding bacteriophage φ3538 isolated from Escherichia coli O157:H7. Appl. Environ. Microbiol. 65:3855-3861.
  • 29. Schneider, G., U. Dobrindt, H. Bruggemann, G. Nagy, B. Janke, G. Blum-Oehler, C. Buchrieser, G. Gottschalk, L. Emody, and J. Hacker. 2004. The pathogenicity island-associated K15 capsule determinant exhibits a novel genetic structure and correlates with virulence in uropathogenic Escherichia coli strain 536. Infect. Immun. 72:5993-6001.
  • 30. Shaikh, N., and P. I. Tarr. 2003. Escherichia coli O157:H7 Shiga toxin-encoding bacteriophages: integrations, excisions, truncations, and evolutionary implications. J. Bacteriol. 185:3596-3605.
  • 31. Skovgaard, M., L. J. Jensen, C. Friis, H. H. Stærfeldt, P. Worning, S. Brunak, and D. W. Ussery. 2002. The atlas visualisation of genome-wide information, p. 49-63. In B. Wren and N. Dorrell (ed.), Methods in microbiology, vol. 33. Academic Press, London, United Kingdom.
  • 32. Stephan, R., and L. E. Hoelzle. 2000. Characterization of Shiga toxin type 2 variant B-subunit in Escherichia coli strains from asymptomatic human carriers by PCR-RFLP. Lett. Appl. Microbiol. 31:139-142.
  • 33. Stephan, R., S. Ragettli, and F. Untermann. 2000. Prevalence and characteristics of verotoxin-producing Escherichia coli (VTEC) in stool samples from asymptomatic human carriers working in the meat processing industry in Switzerland. J. Appl. Microbiol. 88:335-341.
  • 34. Toth, I., H. Schmidt, M. Dow, A. Malik, E. Oswald, and B. Nagy. 2003. Transduction of porcine enteropathogenic Escherichia coli with a derivative of a Shiga toxin 2-encoding bacteriophage in a porcine ligated ileal loop system. Appl. Environ. Microbiol. 69:7242-7247.
  • 35. Wagner, P. L., D. W. Acheson, and M. K. Waldor. 1999. Isogenic lysogens of diverse Shiga toxin 2-encoding bacteriophages produce markedly different amounts of Shiga toxin. Infect. Immun. 67:6710-6714.
  • 36. Wagner, P. L., M. N. Neely, X. Zhang, D. W. Acheson, M. K. Waldor, and D. I. Friedman. 2001. Role for a phage promoter in Shiga toxin 2 expression from a pathogenic Escherichia coli strain. J. Bacteriol. 183:2081-2085.
  • 37. Welch, R. A., V. Burland, G. Plunkett III, P. Redford, P. Roesch, D. Rasko, E. L. Buckles, S. R. Liou, A. Boutin, J. Hackett, D. Stroud, G. F. Mayhew, D. J. Rose, S. Zhou, D. C. Schwartz, N. T. Perna, H. L. Mobley, M. S. Donnenberg, and F. R. Blattner. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. USA 99:17020-17024.
  • 38. Willenbrock, H., and J. Fridlyand. 2005. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics 21:4084-4091.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)
