Abstract
Escherichia coli O157, Salmonella enterica O30, and Citrobacter freundii F90 have identical O-antigen structures, as do E. coli O55 and S. enterica O50. The O-antigen gene cluster sequences for E. coli O157 and E. coli O55 have been published, and the genes necessary for O-antigen biosynthesis have been identified, although the transferase genes for glycosidic linkages have been identified only generically and have not been allocated to specific linkages. We determined sequences for S. enterica O30 and C. freundii F90 O-antigen gene clusters and compared them to the sequence of the previously described E. coli O157 cluster. We also determined the sequence of the S. enterica O50 O-antigen gene cluster and compared it to the sequence of the previously described E. coli O55 cluster. For both the S. enterica O30-C. freundii F90-E. coli O157 group and the S. enterica O50-E. coli O55 group of O antigens, the gene clusters have identical or nearly identical organizations. The two sets of gene clusters had comparable overall levels of similarity in their genes: 55 to 65% for the genes encoding glycosyltransferases and O-antigen processing proteins and 75 to 93% for the nucleotide-sugar pathway genes. These levels were lower than the levels determined for housekeeping genes for these species. Nonetheless, the similarity of the levels of divergence in the five gene clusters required us to consider the possibility that the parent gene cluster for each structure was in the common ancestor of the species and that divergence was faster than expected for the common ancestor hypothesis. We propose that the identical O-antigen gene clusters originated from a common ancestor, and we discuss some possible explanations for the increased rate of divergence that is seen in these genes.
Lipopolysaccharide is an important component of the outer membrane of gram-negative bacteria and usually consists of three distinct regions: lipid A, core oligosaccharide, and O antigen. The O antigen functions as a receptor for bacteriophages and is also important in the host immune response. It consists of a number of repeats of an oligosaccharide, the O unit, which generally has between two and six sugar residues (16). Extensive variation is seen in O-antigen structures, and this variation is determined by the nature, order, and linkage of the different sugars within the polysaccharide. For example, 186 and 54 O antigens have been documented in the Escherichia coli (including Shigella) and Salmonella enterica typing schemes, respectively (4, 12, 14), and 45 O-antigen-based serogroups have been identified in the closely related organism Citrobacter freundii (6). Indeed, many species that have been studied have 40 or more serogroups defined by their O antigens (6), clearly demonstrating the enormous diversity of these structures in gram-negative bacteria.
The genes involved in O-antigen biosynthesis are generally found on the chromosome as an O-antigen gene cluster, and in E. coli, S. enterica, and C. freundii these gene clusters generally lie between the galF and gnd genes (15). We, among others, have undertaken an extensive analysis of many O-antigen gene clusters to determine the genetic basis of O-antigen evolution. More than 20 E. coli and 7 S. enterica O-antigen gene clusters have been sequenced (17), but until now no sequence data regarding a C. freundii O-antigen gene cluster has been reported. Overall, O-antigen gene clusters have G+C contents lower than the genome average (usually less than 40% in E. coli and S. enterica, compared to the usual 51%). This atypical G+C content seen in so many O-antigen gene clusters provides strong genetic evidence that the O-antigen gene clusters were acquired, by interspecies lateral transfer, from different bacterial species—in each case a species whose genome has an average G+C content lower than that of these gram-negative bacteria. It is presumed that the extensive diversity of O antigens is a result of this lateral gene transfer. Indeed, evidence for recombination events involving O-antigen gene clusters within and between different species has been found (5, 19, 21, 22, 28).
E. coli, S. enterica, and C. freundii are closely related. However, a combination of serological and structural analyses of the corresponding O antigens has revealed that they have very few structures in common. Only three structures are found in both E. coli and S. enterica; one structure is found in E. coli O111 and S. enterica O35 (8), the second structure is found in E. coli O55 and S. enterica O50 (8, 11), and the third structure is found in E. coli O157, S. enterica O30, and C. freundii F90 (3, 13, 24). Other examples include the C. freundii O35 and S. enterica O59 O antigens, which are also identical (9).
In general, we can envisage three possible explanations for the presence of a common O antigen in two different species. The simplest explanation is that the O-antigen gene cluster was present in the common ancestral species. In this case the genomic organizations of the individual gene clusters would be similar, and the level of variation between the genes would be typical of the level of variation found in housekeeping genes. Second, the O-antigen structures may have arisen as a result of acquisition of the cluster by interspecies lateral transfer since species divergence. In this scenario the genomic organizations of the gene clusters would be similar, but the level of variation between the genes would be unrelated to the level of variation expected for typical E. coli, S. enterica, and C. freundii housekeeping genes; it would be lower if the transfer was among the species being considered and higher if the cluster came from more divergent species. Finally, the corresponding O-antigen gene clusters could have arisen independently, in which case the genomic organization of the individual gene clusters would be different and the level of variation in the genes would generally be high.
The fact that there are very few O antigens common to the related and well-studied species E. coli and S. enterica shows that there has been extreme turnover of O antigens since species divergence. In this paper we examine the basis for the presence of identical O-antigen structures in these species and also C. freundii. The E. coli O111 and S. enterica O35 O-antigen gene clusters have been sequenced previously, and it has been proposed that the most likely reason for the presence of identical O antigens in these two organisms is that the two clusters diverged from a common ancestor (27). The E. coli O157 and E. coli O55 O-antigen gene clusters have also been sequenced previously (25, 26) (accession numbers AF061251 and AF461121), and the genes necessary for O-antigen biosynthesis have been identified. We report here additional sequencing of the S. enterica O30, C. freundii F90, and S. enterica O50 O-antigen gene clusters, which allowed analysis of the genetics of all O antigens common to E. coli and S. enterica. Below we comment on the basis for the identity of different O-antigen structures across species.
MATERIALS AND METHODS
Abbreviations.
d-Col, d-colitose; l-Fuc, l-fucose; d-Gal, d-galactose; d-Gal2NAc, d-2-N-acetylgalactosamine; d-Glc, d-glucose; d-Glc2NAc, d-2-N-acetylglucosamine; d-Per, d-perosamine; d-Per4NAc, d-4-N-acetylperosamine; CA, colanic acid.
Bacterial strains.
S. enterica O30 (lab stock number M284) and S. enterica O50 (lab stock number M290) were obtained from the Institute of Medical and Veterinary Sciences, Adelaide, Australia.
C. freundii F90 (lab stock number M1972) was obtained from N. Strockbine, Centers for Disease Control and Prevention, Atlanta, Ga. (3).
Construction of random shotgun libraries.
Chromosomal DNA was prepared by using Wizard DNA preparation kits from Promega. The O-antigen clusters were PCR amplified from the chromosomal DNA by using the Expand Long Template PCR system (Roche). The primers used were 1523 (5′-ATTGTGGCTGCAGGGATCAAAGAAATC) and degenerate primer 1524 (5′-TAGTCRCGCTGNGCCTGRATYARGTTMGC) (M = A or C; N = A, C, G, or T; R = A or G; Y = C or T), which bind to the upstream galF gene and the downstream gnd gene, respectively. The PCR cycles were as follows: 94°C for 2 min; 30 cycles of 94°C for 10 s, 60°C for 30 s, and 68°C for 15 min; and then 68°C for 7 min. The PCR products were sheared by using Geneworks Hydroshear according to the manufacturer's instructions. The DNA was then purified by using a Wizard PCR DNA preparation kit (Promega) and was resuspended in 35 μl of water. Eight nanograms of DNA was subjected to T4 DNA polymerase repair and single deoxyribosyladenine tailing with a Novagen single deoxyribosyladenine tailing kit. The reaction product (85 μl) was then extracted with chloroform-isoamyl alcohol (24:1) and ligated to pGEM-T-easy (Promega) according to the manufacturer's instructions. Ligation was carried out overnight at 4°C, and the ligated DNA was precipitated and resuspended in 20 μl of water before it was electroporated into E. coli JM109 and plated on agar plates containing 5-bromo-4-chloro-3-indolyl β-d-galactopyranoside and isopropyl-β-d-1-thiogalactopyranoside (IPTG). A DNA template was prepared from the resultant colonies by using a 96-well-format Millipore plasmid DNA miniprep kit. Three microliters of DNA was sequenced with primers M13F (5′-TGTAAAACGACGGCCAGT) and M13R (5′-CAGGAAACAGCTATGAC).
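The degeneracy of primer 1524 follows directly from the ambiguity codes defined above. The following is a minimal sketch, not part of the original protocol, that counts and enumerates the concrete sequences encoded by the primer; the function names `degeneracy` and `expand` are illustrative assumptions.

```python
from itertools import product

# Ambiguity codes as defined in the text for primer 1524 (M, N, R, Y),
# plus the four unambiguous bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "Y": "CT", "N": "ACGT",
}

# Degenerate primer 1524, copied from the text (binds in gnd).
PRIMER_1524 = "TAGTCRCGCTGNGCCTGRATYARGTTMGC"

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence represented by the primer."""
    choices = (IUPAC[base] for base in primer)
    return ["".join(combo) for combo in product(*choices)]

variants = expand(PRIMER_1524)
print(len(PRIMER_1524), degeneracy(PRIMER_1524), len(variants))
```

With five twofold positions and one fourfold position, the 29-mer represents 128 distinct sequences.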
Specific primers were designed to PCR amplify any regions of the DNA in which sequence was missing. Each PCR was performed in a 50-μl (total volume) mixture by using Taq polymerase (NEB) as recommended by the protocol. The PCR cycles were as follows: 94°C for 2 min; 30 cycles of 94°C for 30 s, 56°C for 30 s, and 72°C for 2 min; and 72°C for 5 min. Two microliters of the PCR products was electrophoresed on an agarose gel to check for amplified DNA and was subsequently purified by using a Wizard PCR DNA preparation kit (Promega) and resuspended in 35 μl of water. Three microliters of DNA was sequenced with the same primers used for PCR amplification.
PCR of upstream and downstream regions of S. enterica O50.
Chromosomal DNA was prepared by using Wizard DNA preparation kits from Promega. The gne gene, which encodes N-acetylglucosamine 4-epimerase (2), was PCR amplified by using primers 5278 (5′-ACAGATTGGTGATGTTCG) and 5280 (5′-GATTTCTTTGATCCCTGCAGCCAC), which bind at the 5′ end of the gne gene and in the downstream galF gene, respectively. The PCR was performed in a 50-μl (total volume) mixture by using Taq polymerase (NEB) as recommended by the protocol. The PCR cycles were as follows: 94°C for 2 min; 30 cycles of 94°C for 30 s, 56°C for 30 s, and 72°C for 2 min; and 72°C for 5 min. Two microliters of the PCR products was electrophoresed on an agarose gel to check for amplified DNA and was subsequently purified by using a Wizard PCR DNA preparation kit (Promega) and resuspended in 35 μl of water. Three microliters of DNA was separately sequenced with primers 5278 and 5280. A similar method was used for PCR amplification of the downstream regions.
Sequencing and analysis.
Sequencing was performed with an Applied Biosystems 377 automated DNA sequencer. Sequence data were assembled and analyzed by using the Australian National Genomic Information Service, which incorporates several sets of programs (A. H. Reisner, C. A. Bucholtz, J. Smelt, and S. McNeil, Proc. 26th Annu. Hawaii Int. Conf. Systems Sci., 1993). BLAST and PSI-BLAST (1) were used for searching databases, including the GenBank and Pfam protein motif databases, for possible functions. Sequence alignment and comparisons were performed by using the ClustalW program (23). The TMHMM v2.0 analysis program (http://www.cbs.dtu.dk/services/TMHMM-2.0/) was used to identify potential transmembrane segments from the amino acid sequence.
Nucleotide sequence accession numbers.
The DNA sequences of the three O-antigen gene clusters have been deposited in the GenBank database under accession numbers AY730592, AY730593, and AY730594.
RESULTS
E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters.
The E. coli O157, S. enterica O30, and C. freundii F90 O antigens contain d-Per4NAc, d-Glc, l-Fuc, and d-Gal2NAc (Fig. 1A), and their O-antigen gene clusters would be expected to contain all the genes necessary for biosynthesis of the O antigen. The E. coli O157 O-antigen gene cluster has been sequenced previously and is 14.0 kb long (26). It contains 12 genes, all of which have the same transcriptional direction, and includes the genes required for the biosynthesis of both GDP-d-Per and GDP-l-Fuc, namely, manB, manC, gmd, per, and fcl (Fig. 2A). There is a putative N-acetyltransferase gene, wbdR, at the 3′ end of the O157 O-antigen gene cluster, which was presumed to be involved in the conversion of GDP-d-Per to GDP-d-Per4NAc. WbdR is in Pfam family PF00132, which includes N-acetyltransferases, and there is no other obvious role for it. In addition, the gene cluster contains a gmm gene. gmm is generally found in E. coli and S. enterica O-antigen gene clusters that include gmd, and Gmm is thought to control the accumulation of certain GDP-sugars (27). There are three genes that encode putative glycosyltransferases in the gene cluster (wbdN, wbdO, and wbdP), as well as the two O-antigen processing genes, wzx and wzy.
FIG. 1.
O-antigen structures. (A) E. coli O157-S. enterica O30-C. freundii F90 O-antigen structure. (B) E. coli O55-S. enterica O50 O-antigen structure.
FIG. 2.
O-antigen gene clusters. (A) E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters. (B) E. coli O55 and S. enterica O50 O-antigen gene clusters. Glycosyltransferase genes whose designations begin with w are indicated by the final letter of the designation (e.g., E. coli O157 wbdN is indicated by N, and E. coli O55 wbgM is indicated by M).
By using primers specific for sequences in both the galF and gnd genes, the S. enterica O30 and C. freundii F90 O-antigen gene clusters were amplified by long-range PCR and sequenced, as described in Materials and Methods. These clusters were 14.3 and 14.2 kb long, respectively. Sequence analysis of both gene clusters predicted the presence of 11 open reading frames, all having the same transcriptional direction. Both gene clusters have the same genes and the same organization (Fig. 2A), which is nearly identical to that of the E. coli O157 O-antigen gene cluster; the only difference is that the N-acetyltransferase gene, wbdR, at the 3′ end of the E. coli O157 gene cluster is absent in the other two clusters. Thus, although both the S. enterica O30 and C. freundii F90 O antigens contain d-Per4NAc, neither of the corresponding O-antigen gene clusters contains wbdR or any other N-acetyltransferase gene. wbdR is situated downstream of manB and is adjacent to a Hinc repeat (H-rpt) remnant. H-rpts (29) have been identified at recombination junctions in several other O-antigen gene clusters, including the S. enterica D2 O-antigen gene cluster (28). They are thought to be mediators of lateral gene transfer, and a model has been proposed for this role (28). Thus, it is likely that the S. enterica O30 and C. freundii F90 O-antigen gene clusters represent the ancestral gene cluster whose corresponding O antigen contained d-Per. The acquisition of the E. coli O157 wbdR gene, which was probably mediated by the H-rpt, most likely occurred more recently and allowed the biosynthesis of GDP-d-Per4NAc. If this is the case, then there is presumably an N-acetyltransferase gene on both the S. enterica O30 and C. freundii F90 chromosomes that is involved in the conversion of GDP-d-Per to GDP-d-Per4NAc, although attempts to identify these genes by PCR and Southern blotting methods based on the sequence of wbdR were unsuccessful.
The orders of the 11 genes common to the E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters are identical, indicating that the clusters have a common ancestor. If the O-antigen gene clusters were acquired via a lateral transfer event(s), the levels of similarity would be unrelated to the levels of similarity generally observed for housekeeping genes in these species. In order to ascertain the relationships of conserved housekeeping genes across the different species, we compared six housekeeping genes that have been sequenced previously in E. coli, S. enterica, and C. freundii. The levels of identity between equivalent genes in E. coli and S. enterica ranged from 86.3 to 91.4% (Table 1). This is in accordance with Sharp's observation that 93% of E. coli and S. enterica housekeeping genes have levels of identity between 76.3 and 100% (18). When either E. coli or S. enterica was compared to C. freundii, the levels of identity of the housekeeping genes were very similar, ranging from 84.6 to 89.7% (Table 1). Table 2 shows the levels of amino acid identity for the putative proteins encoded by the genes in each of the O-antigen gene clusters.
TABLE 1.
Levels of DNA identity for six E. coli, S. enterica, and C. freundii housekeeping genes
Gene | % DNA identity for C. freundii and S. enterica | % DNA identity for E. coli and S. enterica | % DNA identity for C. freundii and E. coli |
---|---|---|---|
adk | 87.64 | 87.12 | 88.04 |
gyrB | 85.65 | 88.45 | 84.56 |
mdh | 85.39 | 86.25 | 87.38 |
metG | 88.01 | 87.78 | 88.23 |
purA | 88.49 | 87.98 | 89.7 |
recA | 88.6 | 91.37 | 89.19 |
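Identity values of the kind shown in Table 1 are derived from pairwise alignments. The sketch below is one common way to compute percent identity from two already aligned sequences; the treatment of gaps (columns with a gap in either sequence are skipped) is an assumption, since the text does not state how gaps were handled, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical toy alignment, not taken from the sequenced clusters.
print(percent_identity("ATGC-A", "ATGTTA"))
```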
TABLE 2.
Levels of amino acid identity for the proteins encoded in the E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters and G+C content of each gene in the clusters
Gene | % Amino acid identity for C. freundii F90 and S. enterica O30 | % Amino acid identity for C. freundii F90 and E. coli O157 | % Amino acid identity for E. coli O157 and S. enterica O30 | G+C content (%), C. freundii F90 | G+C content (%), E. coli O157 | G+C content (%), S. enterica O30 |
---|---|---|---|---|---|---|
galF | 92.0 | 90.9 | 93.6 | 50.0 | 50.5 | 51.7 |
wbdN | 62.5 | 59.4 | 61.1 | 29.8 | 29.9 | 29.3 |
wzy | 60.6 | 54.8 | 59.5 | 29.8 | 30.0 | 28.0 |
wbdO | 62.1 | 59.3 | 60.4 | 30.5 | 30.2 | 27.6 |
wzx | 63.8 | 62.5 | 69.4 | 32.7 | 31.3 | 30.6 |
per | 80.2 | 81.3 | 83.8 | 35.3 | 34.7 | 34.6 |
wbdP | 64.5 | 60.2 | 66.5 | 32.9 | 31.6 | 33.0 |
gmd | 93.2 | 87.0 | 90.8 | 45.1 | 38.9 | 55.4 |
fcl | 86.3 | 76.9 | 75.5 | 47.6 | 38.4 | 58.6 |
gmm | 65.1 | 59.4 | 59.4 | 43.6 | 39.0 | 56.8 |
manC | 78.2 | 78.0 | 80.9 | 36.2 | 37.6 | 39.2 |
manB | 87.1 | 88.6 | 87.7 | 57.2 | 54.1 | 61.0 |
wbdR | | | | | 37.5 | |
gnd | 95.6 | 96.4 | 97.2 | 51.8 | 50.8 | 52.0 |
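The G+C content values in Table 2 are simple base-composition percentages. A minimal sketch of the calculation is given here; ignoring characters other than A, C, G, and T is an assumption made for illustration, and the example sequence is hypothetical.

```python
def gc_content(seq):
    """G+C content (%) of a DNA sequence, counting only A, C, G, and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / (gc + at)

# Hypothetical toy sequence, not taken from the sequenced clusters.
print(gc_content("ATGAAATTTGGC"))
```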
Comparison of the E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene cluster genes revealed that the nucleotide-sugar biosynthesis proteins have levels of amino acid identity in the range from 59.4 to 93.2% and that the glycosyltransferases and O-antigen processing proteins, while more closely related to each other than to any other proteins in the databases, have lower levels of amino acid identity (54.8 to 69.4%) (Table 2). Since a recent lateral transfer event from the other species under consideration would have resulted in much higher levels of similarity between the genes, there is no support for such an event for these gene clusters. The two remaining alternatives are (i) that the gene cluster was present in the common ancestor of the species but that for the most part the genes diverged at a much higher rate than the housekeeping genes and (ii) that the divergence among the gene clusters took place in other species over a much longer period and the gene clusters were acquired independently by E. coli, S. enterica, and C. freundii from these different species (one of which was the common ancestor of these three species). However, despite the higher-than-expected levels of sequence divergence, there is support for the former hypothesis; the levels of divergence for each O-antigen gene are very similar for any two of the three species. This is also true for housekeeping genes, but the levels of divergence are higher for O-antigen genes. If the gene clusters were derived by lateral transfer from three more distantly related species, it would be remarkable if these species were equidistant from each other. We propose several explanations that could account for the different levels of divergence observed for the different genes of the gene clusters, which we discuss below after presentation of the data for S. enterica O50.
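The observation that the three pairwise comparisons give similar values for each gene can be checked directly from Table 2. The sketch below computes, for each gene, the spread (maximum minus minimum) of the three pairwise amino acid identities. The genes gmd, fcl, gmm, and manB are left out here because the Discussion attributes their values to gene conversion; that selection, and the helper name `spread`, are choices made for this illustration.

```python
# Amino acid identity (%) copied from Table 2, in the order:
# (C. freundii F90 vs S. enterica O30,
#  C. freundii F90 vs E. coli O157,
#  E. coli O157 vs S. enterica O30).
TABLE2_IDENTITY = {
    "wbdN": (62.5, 59.4, 61.1),
    "wzy": (60.6, 54.8, 59.5),
    "wbdO": (62.1, 59.3, 60.4),
    "wzx": (63.8, 62.5, 69.4),
    "per": (80.2, 81.3, 83.8),
    "wbdP": (64.5, 60.2, 66.5),
    "manC": (78.2, 78.0, 80.9),
}

def spread(values):
    """Difference between the largest and smallest pairwise identity."""
    return max(values) - min(values)

spreads = {gene: round(spread(v), 1) for gene, v in TABLE2_IDENTITY.items()}
for gene, value in spreads.items():
    print(f"{gene}: {value} percentage points")
```

For these genes the three pairwise values differ by less than 7 percentage points, even though the identities themselves range from about 55% to 84%.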
E. coli O55 and S. enterica O50 O-antigen gene clusters.
The S. enterica O50 and E. coli O55 O antigens contain d-Gal, d-Gal2NAc, d-Glc2NAc, and d-Col (Fig. 1B). The E. coli O55 O-antigen gene cluster (25) contains nine genes between galF and gnd, including the genes required for the initial stages of GDP-d-Col synthesis, namely, manB, manC, and gmd (manA is found elsewhere on the chromosome). The gene cluster is atypical in that col1 and col2, the genes required for the final stages of GDP-d-Col biosynthesis, are downstream of gnd (25), in a region that can be thought of as an extension of the typical O-antigen gene cluster. The E. coli O55 O-antigen gene cluster also contains four glycosyltransferase genes (wbgM, wbgN, wbgO, and wbgP) (25) and the O-antigen processing genes, wzx and wzy. Finally, immediately upstream of galF there is a gne gene for biosynthesis of UDP-d-Gal2NAc from UDP-d-Glc2NAc. Most O-antigen gene clusters include all of the genes between the galF and gnd genes. Clustering of genes in this manner is best explained by the selfish operon model (10), in which clustering confers selective benefit to genes that together have a function subject to lateral transfer. This is because being in a cluster facilitates lateral transfer as a group. O-antigen gene clusters are well known to undergo lateral transfer, and the E. coli O157 gene cluster is only one case that has been documented (22, 25). In the case of O antigens, lateral transfer occurs readily within a species, where it usually involves replacement of the preexisting O-antigen gene cluster by homologous recombination outside the gene cluster. The arrangement in this group of O antigens allows cotransfer but is not the most efficient arrangement and can be seen as an intermediate stage in the process of bringing all of the genes into a single cluster (25).
The S. enterica O50 O-antigen gene cluster was amplified by long-range PCR by using primers based on the galF and gnd genes, and it was sequenced as described in Materials and Methods. Genes both upstream and downstream of this region were also sequenced. The extended 21.5-kb S. enterica O50 O-antigen gene cluster has the same genes in the same order as the E. coli O55 O-antigen gene cluster (Fig. 2B), indicating that these clusters have a common ancestor. Similar to the situation seen with E. coli O157, S. enterica O30, and C. freundii F90, the level of similarity of genes within the E. coli O55 and S. enterica O50 O-antigen gene clusters is lower than the level of similarity for housekeeping genes. Furthermore, although all four S. enterica O50 glycosyltransferases and the O-antigen processing proteins are most similar to the proteins encoded by the E. coli O55 O-antigen gene cluster, the levels of similarity between the molecules are still relatively low, lower than the levels of similarity for the nucleotide-sugar biosynthesis proteins (Tables 3, 4, and 5). As with the E. coli O157-S. enterica O30-C. freundii F90 group, we discuss probable explanations for these observations below.
TABLE 3.
Levels of DNA identity for the GDP-l-Fuc and GDP-d-Per4NAc biosynthesis genes in the E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters
Gene | % DNA identity for C. freundii F90 and E. coli O157 | % DNA identity for C. freundii F90 and S. enterica O30 | % DNA identity for S. enterica O30 and E. coli O157 |
---|---|---|---|
per | 73.78 | 74.88 | 77.68 |
gmd | 74.82 | 78.58 | 74.37 |
fcl | 70.97 | 76.09 | 67.84 |
gmm | 63.26 | 66.66 | 65.12 |
manC | 72.36 | 74.47 | 76.21 |
manB | 80.4 | 82.85 | 79.61 |
TABLE 4.
Levels of DNA identity for the GDP-d-Col biosynthesis genes in the E. coli O55 and S. enterica O50 O-antigen gene clusters
Gene | % DNA identity |
---|---|
gmd | 77.30 |
manC | 70.49 |
manB | 78.59 |
col1 | 86.25 |
col2 | 87.06 |
TABLE 5.
Levels of amino acid identity for the proteins encoded by the genes in the E. coli O55 and S. enterica O50 O-antigen gene clusters and G+C content of each gene
Gene | % Amino acid identity | G+C content (%), E. coli O55 | G+C content (%), S. enterica O50 |
---|---|---|---|
gne | 94.9 | 45.5 | 42.2 |
galF | 93.2 | 49.8 | 50 |
wbgM | 64.3 | 32.5 | 29.6 |
gmd | 91.7 | 43.7 | 54.8 |
fcl remnant | | | 56.8 |
gmm | 51.7 | 33.9 | 56.4 |
manC | 74.4 | 36.3 | 39.4 |
manB | 84.3 | 40.3 | 60.2 |
wbgN | 60.4 | 34.2 | 32.2 |
wzy | 55.0 | 29.5 | 26.6 |
wzx | 58.9 | 31.1 | 30.2 |
wbgO | 62.5 | 30 | 30 |
wbgP | 59.0 | 32.4 | 33.1 |
gnd | 92.9 | 47.5 | 52.2 |
col1 | 88.3 | 33.7 | 31.6 |
col2 | 92.5 | 33.7 | 34.4 |
ugd | 89.4 | 43.5 | 43.8 |
wzz | 67.8 | 44.6 | 49.5 |
The E. coli O55 and S. enterica O50 col1 and col2 genes have much higher levels of similarity to each other than do the genes in the traditional O-antigen gene cluster region between gnd and galF; however, even though they are situated outside the O-antigen gene clusters, they have a low G+C content. In contrast, the genes adjacent to col1 and col2 do not have a low G+C content. Wang et al. (25) suggested the recent acquisition, by interspecies lateral gene transfer, of the GDP-d-Col synthesis genes, and it is likely that the genes originated from a species that has a genome with a low average G+C content. E. coli O55 has an H-rpt remnant downstream of col2, which may have mediated the lateral transfer of col1 and col2. There is no H-rpt remnant in S. enterica O50, but there is a remnant transposase sequence downstream of col2. These data suggest one of two possibilities. The first possibility is that the acquisition of col1 and col2 occurred independently in E. coli O55 and S. enterica O50. However, this seems highly unlikely since the col1 and col2 genes not only have high levels of identity (94.5 to 95.9% similarity) but also are inserted at the same location in the genome. It is more probable that there was a single transfer of the col1 and col2 genes, perhaps involving either the H-rpt or the transposase. Since this transfer there have been further recombination events in one of the genomes, but the remnants of both the H-rpt and the transposase are too small to relate either in any specific way to the transfer event.
The presence of GDP-sugar synthesis pathway genes in the O-antigen gene cluster indicates that before acquisition of col1 and col2, the ancestral gene cluster included a sugar pathway for a GDP-sugar other than GDP-d-Col. Incorporation of the two GDP-d-Col synthesis genes by lateral transfer, probably mediated by H-rpt and/or a transposase, subsequently allowed synthesis of GDP-d-Col. We suggest that the ancestral GDP-sugar was GDP-l-Fuc, which is very closely related to GDP-d-Col, because in addition to manA, manB, manC, and gmd, which are common to many GDP-sugar pathways (17), there is a remnant of the GDP-fucose-specific fcl gene situated between gmd and gmm in the S. enterica O50 O-antigen gene cluster. Except for an additional transferase gene, this is the same gene order as the order of the GDP-l-Fuc biosynthesis genes in the E. coli and S. enterica CA clusters. There is no fcl gene in the E. coli O55 O-antigen gene cluster, presumably due to more extensive deletions than in S. enterica O50, which have occurred since the acquisition of the GDP-d-Col synthesis genes. It is interesting that there is no transferase gene associated with the col genes, and presumably the transferase that originally transferred Fuc now transfers Col. The product of wbgM, upstream of gmd, has similarity to GDP-l-Fuc transferases and is the most likely candidate.
Interestingly, neither the E. coli O55 nor the S. enterica O50 O-antigen gene cluster contains a complete gmm gene, as is found in all other E. coli and S. enterica clusters whose corresponding O antigens contain a sugar whose GDP-linked precursor is synthesized from GDP-mannose. However, in both clusters there is a remnant gmm gene (57% identity to O157 gmm) between gmd and manC. In S. enterica O50 this remnant gene is situated adjacent to the remnant fcl gene, and it is possible that mutation of the S. enterica O50 gmm and fcl genes resulted from a single deletion event. Similarly, it is likely that the deletion event that resulted in the mutation of E. coli O55 gmm also caused the complete deletion of the E. coli O55 fcl gene.
DISCUSSION
Very few O-antigen structures are common to E. coli, S. enterica, and/or C. freundii, and only three O antigens have been identified in both E. coli and S. enterica. These are the E. coli O111-S. enterica O35 O antigen, the E. coli O55-S. enterica O50 O antigen, and the E. coli O157-S. enterica O30-C. freundii F90 O antigen. The E. coli O111, S. enterica O35, E. coli O157, and E. coli O55 O-antigen gene cluster sequences have been published. We report here sequencing of the S. enterica O50, S. enterica O30, and C. freundii F90 O-antigen gene clusters. Thus, we now have complete gene cluster sequences for the three sets of identical O-antigen structures, and since these structures include the only known structures common to E. coli and S. enterica, this provides an opportunity for generalizations; we discuss the three structures together below. With one exception, the related gene clusters have the same number of genes and the same gene order. The exception is an acetyltransferase gene present at the 3′ end of the E. coli O157 gene cluster but not in the S. enterica O30 and C. freundii F90 gene clusters.
We first looked at the complicating issue of an interaction between the O-antigen gene clusters and the CA gene cluster. The CA gene cluster, just upstream of galF in E. coli, S. enterica, and presumably C. freundii, contains a set of GDP-l-Fuc pathway genes (Fig. 3). These genes have a relatively high G+C content, which interestingly is higher in S. enterica LT2 than in E. coli K-12. Of interest is the finding that in several cases the manB gene of the O-antigen gene cluster has the G+C content of the CA manB gene and a high level of sequence similarity to it (7, 20). This is attributed to some form of gene rearrangement that at least has the effect of a gene conversion event, in which the gene in the O-antigen gene cluster is replaced by the equivalent gene of the CA gene cluster. The same situation has been shown to apply to some gmd genes, and in the strains discussed here the findings are applicable to gmd, fcl, gmm, and manB of S. enterica O30 and O50. Each of these genes has a high G+C content (54.8 to 61.0%) (Tables 2 and 5), and the sequence is very similar to the sequences of the corresponding S. enterica CA genes (Table 6). We concluded that there were gene conversion events in which the S. enterica O30 and O50 gmd, fcl (now a remnant in S. enterica O50), gmm, and manB genes were replaced by genes from the CA gene cluster. Indeed, examination of the DNA sequence at the 3′ end of the CA and O-antigen gmm genes allowed us to identify a putative recombination site 40 bp prior to the end of the coding sequence (Fig. 4). This phenomenon also applies to the manB gene of the E. coli O157 and C. freundii gene clusters. It is interesting that the manC gene is not affected in any of the cases. The C. freundii gmd, fcl, and gmm genes may also have undergone conversion, as the G+C contents are higher than those for other genes in the gene cluster; however, because the G+C content is only about 45%, the situation is not as clear as it is for S. enterica, and we do not have C. freundii CA genes for sequence comparison.
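The G+C argument above can be illustrated with the S. enterica O30 values from Table 2. The sketch below flags genes whose G+C content lies well above the cluster median; the 15-point margin and the function name `flag_high_gc` are assumptions chosen for illustration, not a criterion used in the analysis, and a high G+C content alone does not demonstrate gene conversion without the sequence comparison in Table 6.

```python
from statistics import median

# G+C content (%) of the S. enterica O30 cluster genes, copied from Table 2.
GC_O30 = {
    "wbdN": 29.3, "wzy": 28.0, "wbdO": 27.6, "wzx": 30.6, "per": 34.6,
    "wbdP": 33.0, "gmd": 55.4, "fcl": 58.6, "gmm": 56.8, "manC": 39.2,
    "manB": 61.0,
}

def flag_high_gc(gc_by_gene, margin=15.0):
    """Return genes whose G+C content exceeds the cluster median by more
    than `margin` percentage points (margin is a hypothetical cutoff)."""
    baseline = median(gc_by_gene.values())
    return sorted(g for g, gc in gc_by_gene.items() if gc - baseline > margin)

print(flag_high_gc(GC_O30))
```

With these inputs the flagged genes are gmd, fcl, gmm, and manB, whereas manC is not flagged, in line with the statement that manC is not affected.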
FIG. 3.
Region of the E. coli CA gene cluster that contains the GDP-l-Fuc biosynthesis genes.
TABLE 6.
Levels of amino acid identity (%) for the proteins encoded in the CA gene cluster and by O-antigen gene cluster GDP-l-Fuc biosynthesis genes^a
Gene | E. coli O157 and E. coli CA genes | S. enterica O30 and S. enterica CA genes | S. enterica O50 and S. enterica CA genes | E. coli O55 and E. coli CA genes |
---|---|---|---|---|
gmd | 89.51 | 98.65 | 97.30 | 90.86 |
fcl | 78.19 | 98.44 | 93.69 | |
gmm | 60.37 | 93.63 | 88.53 | 52.53 |
manC | 60.58 | 60.16 | 60.16 | 59.53 |
manB | 97.36 | 94.73 | 91.44 | 85.08 |
^a The genes in the E. coli CA cluster were compared to the genes in the E. coli O157 and E. coli O55 O-antigen gene clusters, and the genes in the S. enterica CA gene cluster were compared to the genes in the S. enterica O50 and O30 O-antigen gene clusters.
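For readers who want to reproduce identity values of the kind reported in Table 6, the following is a minimal sketch of a percent identity calculation from a pairwise protein alignment. The convention used (identical residues divided by the number of columns in which neither sequence has a gap) and the name `percent_identity` are assumptions made for illustration; the table does not state which convention was applied, and other conventions give slightly different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns with a gap ('-') in either sequence are
    excluded from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment for illustration only (not real protein sequences).
    print(percent_identity("MKV-LT", "MRVALT"))
```

The input must already be aligned, for example with a program such as CLUSTAL W; the function itself performs no alignment.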
FIG. 4.
Comparison of the DNA sequences at the 3′ end of the gmm gene in the S. enterica O50 CA gene cluster and the 3′ end of the gmm gene in the S. enterica O50 O-antigen gene cluster for identification of a putative recombination site.
We have to consider seriously the possibility that the E. coli O157-S. enterica O30-C. freundii F90 and E. coli O55-S. enterica O50 groups of gene clusters were derived from two gene clusters in the common ancestor and diverged at a much higher rate than the housekeeping genes, as this would account for the consistency of the levels of divergence observed in these cases.
There are several factors that could contribute to higher-than-usual levels of divergence. First, housekeeping genes are subject to random genetic drift across species. Mutations neutral to natural selection accumulate, and sequences of related species diverge. The level of fitness loss that natural selection treats as neutral depends on the effective population size (Ne). For housekeeping genes Ne is related to the whole species, although population structure has a complicating effect. In contrast, for genes in polymorphic gene clusters, like those discussed here, Ne is much smaller, so a mutant with a greater loss of fitness can still be treated as neutral and be subject to fixation by random genetic drift. Thus, a higher proportion of mildly deleterious mutations would be liable to fixation by random genetic drift, increasing the rate of sequence divergence. A second possibility is related to the presumed origin of the genes from outside the E. coli-S. enterica-C. freundii species group, based on their generally low G+C content. There could well be ongoing selection pressure for better adaptation to the enterobacterial situation after transfer from low-G+C-content species. This would lead to fixation of mutations for better adaptation and, as there are probably many routes to such adaptation, would also increase the rate of sequence divergence. Finally, the fact that the genes are present in only a small proportion of the species may affect opportunities for recombination. Strains with any given O antigen are part of the whole species in terms of the chance of DNA transfer, and in most cases such DNA is from cells with a different O antigen, although some of the same genes may be present. The effect of this on the dynamics of recombination at the molecular level is not known, but it could well influence the rate of sequence divergence for an O antigen in two related species.
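The first argument, that a smaller Ne lets mildly deleterious mutations behave as if neutral, can be illustrated numerically. The sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutant in a haploid Wright-Fisher population, P = (1 - e^(-2s)) / (1 - e^(-2·Ne·s)). This model, the function name `fixation_probability`, and the parameter values are assumptions introduced here for illustration; no such calculation is part of this study.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Fixation probability of a single new mutant with selection coefficient s.

    Assumes a haploid Wright-Fisher population whose census size equals the
    effective size n_e (Kimura's diffusion approximation).
    """
    if s == 0.0:
        return 1.0 / n_e  # strictly neutral case
    x = -2.0 * n_e * s
    if x > 700.0:
        # Strongly deleterious relative to drift: the denominator would
        # overflow, and the probability is vanishingly small.
        return 0.0
    # expm1 keeps precision when s is very small.
    return math.expm1(-2.0 * s) / math.expm1(x)


if __name__ == "__main__":
    s = -1e-6  # hypothetical mildly deleterious mutation
    for n_e in (1e4, 1e6, 1e8):  # hypothetical effective population sizes
        ratio = fixation_probability(s, n_e) * n_e  # relative to a neutral mutant
        print(f"Ne = {n_e:.0e}: fixation probability relative to neutral = {ratio:.3g}")
```

Under these assumptions the same mutation fixes almost as readily as a neutral one when Ne is small and essentially never when Ne is large, which is the qualitative behavior the argument relies on.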
If the divergence observed for genes in the E. coli O157-S. enterica O30-C. freundii F90 and E. coli O55-S. enterica O50 groups of gene clusters is due to an increased rate of divergence of these genes during derivation from gene clusters in the common ancestor, then we have to account for the situation found in the E. coli O111-S. enterica O35 pair. These clusters have levels of divergence that were considered in 2000 (27) to be due to derivation from a gene cluster in the common ancestor. If the rate of divergence in O-antigen gene clusters is generally higher than that for housekeeping genes, then this view would have to be revised, and it becomes likely that the gene cluster was transferred from one species to the other since species divergence.
The E. coli O157-S. enterica O30-C. freundii F90 and E. coli O55-S. enterica O50 groups of gene clusters have very similar patterns, and the levels of amino acid identity are generally between 55 and 65% for glycosyltransferase and O-antigen processing genes (69.4% for Wzx of E. coli O157 and S. enterica O30) and between 75 and 93% for nucleotide-sugar pathway genes. For the E. coli O111-S. enterica O35 pair the ranges are 75 to 83% and 87 to 93%, respectively. There is a consistent pattern of greater divergence in the glycosyltransferase and O-antigen processing genes than in the nucleotide-sugar pathway genes. This has to be accounted for regardless of the time taken for divergence, as in each case it seems clear that the gene clusters have a common ancestor. It has been observed previously that nucleotide-sugar biosynthesis genes are generally more conserved than the glycosyltransferase and O-antigen processing genes, but this could be related to the different specificity of the latter classes of genes. However, if the gene clusters discussed here originated from a common ancestor, we have to ask why the glycosyltransferase and O-antigen processing genes are diverging at a higher rate than the nucleotide-sugar pathway genes. We now have the sequences of three sets of the same gene cluster in two or three species, and there is good evidence that the genes in these gene clusters diverge at different rates. We can only assume that there is some difference in the pressures exerted by natural selection or drift that results in this consistent pattern of divergence in the genes of O-antigen gene clusters.
It is clear that we do not at present have sufficient evidence to determine unequivocally the origins of the gene clusters for the three structures discussed here. There is a strong possibility that two of the three clusters were in the common ancestor, in which case there has been a much higher-than-normal rate of divergence for these gene clusters. There is a need for both experimental and theoretical analyses of the phenomenon and of the reasons for different rates of divergence for different classes of genes if this is indeed the case.
Acknowledgments
This work was supported by grants from the Australian Research Council.
REFERENCES
- 1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3398-3402.
- 2. Bengoechea, J. A., E. Pinta, T. Salminen, C. Oertelt, O. Holst, J. Radziejewska-Lebrecht, Z. Piotrowska-Seget, R. Venho, and M. Skurnik. 2002. Functional characterization of Gne (UDP-N-acetylglucosamine-4-epimerase), Wzz (chain length determinant), and Wzy (O-antigen polymerase) of Yersinia enterocolitica serotype O:8. J. Bacteriol. 184:4277-4287.
- 3. Bettelheim, K. A., H. Evangelidis, J. L. Pearce, E. Sowers, and N. A. Strockbine. 1993. Isolation of a Citrobacter freundii strain which carries the Escherichia coli O157 antigen. J. Clin. Microbiol. 31:760-761.
- 4. Centers for Disease Control and Prevention. 1999. Laboratory methods for the diagnosis of epidemic dysentery and cholera. Centers for Disease Control and Prevention, Atlanta, Ga.
- 5. Curd, H., D. Liu, and P. R. Reeves. 1998. Relationships among the O-antigen gene clusters of Salmonella enterica groups B, D1, D2, and D3. J. Bacteriol. 180:1002-1007.
- 6. Jansson, P.-E. 1999. The chemistry of O-polysaccharide chains in bacterial lipopolysaccharides, p. 155-178. In H. Brade, S. M. Opal, S. N. Vogel, and D. C. Morrison (ed.), Endotoxin in health and disease. Marcel Dekker, New York, N.Y.
- 7. Jensen, S. O., and P. R. Reeves. 2001. Molecular evolution of the GDP-mannose pathway genes (manB and manC) in Salmonella enterica. Microbiology 147:599-610.
- 8. Kenne, L., B. Lindberg, E. Söderholm, D. R. Bundle, and D. W. Griffith. 1983. Structural studies of the O-antigens from Salmonella greenside and Salmonella adelaide. Carbohydr. Res. 111:289-296.
- 9. Kocharova, N. A., Y. A. Knirel, E. S. Stanislavsky, E. V. Kholodkova, C. Lugowski, W. Jachymek, and E. Romanowska. 1996. Structural and serological studies of lipopolysaccharides of Citrobacter O35 and O38 antigenically related to Salmonella. FEMS Immunol. Med. Microbiol. 13:1-8.
- 10. Lawrence, J. G., and J. R. Roth. 1996. Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143:1843-1860.
- 11. Lindberg, B., F. Lindh, J. Lönngren, A. A. Lindberg, and S. B. Svenson. 1981. Structural studies of the O-specific side-chain of the lipopolysaccharide from Escherichia coli O55. Carbohydr. Res. 97:105-112.
- 12. Lior, H. 1994. Classification of Escherichia coli, p. 31-72. In C. L. Gyles (ed.), Escherichia coli in domestic animals and humans. CAB International, Wallingford, United Kingdom.
- 13. Perry, M. B., L. MacLean, and D. W. Griffith. 1986. Structure of the O-chain polysaccharide of the phenol-phase soluble lipopolysaccharide of Escherichia coli O:157:H7. Biochem. Cell Biol. 64:21-28.
- 14. Popoff, M. Y., and L. L. Minor. 1997. Antigenic formulas of the Salmonella serovars, 7th revision. W.H.O. Collaborating Centre for Reference and Research on Salmonella, Institut Pasteur, Paris, France.
- 15. Reeves, P. R. 1994. Biosynthesis and assembly of lipopolysaccharide, p. 281-314. In A. Neuberger and L. L. M. van Deenen (ed.), Bacterial cell wall, vol. 27. Elsevier Science Publishers, Amsterdam, The Netherlands.
- 16. Reeves, P. R., M. Hobbs, M. Valvano, M. Skurnik, C. Whitfield, D. Coplin, N. Kido, J. Klena, D. Maskell, C. Raetz, and P. Rick. 1996. Bacterial polysaccharide synthesis and gene nomenclature. Trends Microbiol. 4:495-503.
- 17. Samuel, G., and P. R. Reeves. 2003. Biosynthesis of O-antigens: genes and pathways involved in nucleotide sugar precursor synthesis and O-antigen assembly. Carbohydr. Res. 338:2503-2519.
- 18. Sharp, P. M. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33:23-33.
- 19. Shepherd, J. G., L. Wang, and P. R. Reeves. 2000. Comparison of O-antigen gene clusters of Escherichia coli (Shigella) Sonnei and Plesiomonas shigelloides O17: Sonnei gained its current plasmid-borne O-antigen genes from P. shigelloides in a recent event. Infect. Immun. 68:6056-6061.
- 20. Stevenson, G., R. Lan, and P. R. Reeves. 2000. The colanic acid gene cluster of Salmonella enterica has a complex history. FEMS Microbiol. Lett. 191:11-16.
- 21. Sugiyama, T., N. Kido, Y. Kato, N. Koide, T. Yoshida, and T. Yokochi. 1998. Generation of Escherichia coli O9a serotype, a subtype of E. coli O9, by transfer of the wb* gene cluster of Klebsiella O3 into E. coli via recombination. J. Bacteriol. 180:2775-2778.
- 22. Tarr, P. I., L. M. Schoening, Y. L. Yea, T. R. Ward, S. Jelacic, and T. S. Whittam. 2000. Acquisition of the rfb-gnd cluster in evolution of Escherichia coli O55 and O157. J. Bacteriol. 182:6183-6191.
- 23. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
- 24. Vinogradov, E. V., J. W. Conlan, and M. B. Perry. 2000. Serological cross-reaction between the lipopolysaccharide O-polysaccharide antigens of Escherichia coli O157:H7 and strains of Citrobacter freundii and Citrobacter sedlakii. FEMS Microbiol. Lett. 190:157-161.
- 25. Wang, L., S. Huskic, A. Cisterne, D. Rothemund, and P. R. Reeves. 2002. The O antigen gene cluster of Escherichia coli O55:H7 and identification of a new UDP-GlcNAc C4 epimerase gene. J. Bacteriol. 184:2620-2625.
- 26. Wang, L., and P. R. Reeves. 1998. Organization of Escherichia coli O157 O antigen gene cluster and identification of its specific genes. Infect. Immun. 66:3545-3551.
- 27. Wang, L., and P. R. Reeves. 2000. The Escherichia coli O111 and Salmonella enterica O35 gene clusters: gene clusters encoding the same colitose-containing O antigen are highly conserved. J. Bacteriol. 182:5256-5261.
- 28. Xiang, S. H., M. Hobbs, and P. R. Reeves. 1994. Molecular analysis of the rfb gene cluster of a group D2 Salmonella enterica strain: evidence for its origin from an insertion sequence-mediated recombination event between group E and D1 strains. J. Bacteriol. 176:4357-4365.
- 29. Zhao, S., C. H. Sandt, G. Feulner, D. A. Vlazny, J. A. Gray, and C. W. Hill. 1993. Rhs elements of Escherichia coli K-12: complex composites of shared and unique components that have different evolutionary histories. J. Bacteriol. 175:2799-2808.