Journal of Bacteriology. 2005 Apr;187(8):2698–2704. doi: 10.1128/JB.187.8.2698-2704.2005

Nucleotide Substitution and Recombination at Orthologous Loci in Staphylococcus aureus

Austin L Hughes 1,*, Robert Friedman 1
PMCID: PMC1070384  PMID: 15805516

Abstract

The pattern of nucleotide substitution was examined at 2,129 orthologous loci among five genomes of Staphylococcus aureus, which included two sister pairs of closely related genomes (MW2/MSSA476 and Mu50/N315) and the more distantly related MRSA252. A total of 108 loci were unusual in lacking any synonymous differences among the five genomes; most of these were short genes encoding proteins highly conserved at the amino acid sequence level (including many ribosomal proteins) or predicted genes of unknown function. In contrast, 45 genes were identified that showed anomalously high divergence at synonymous sites. The latter genes were evidently introduced by homologous recombination from distantly related genomes, and in many cases, the pattern of nucleotide substitution made it possible to reconstruct the most probable recombination event involved. These recombination events introduced genes encoding proteins that differed in amino acid sequence and thus potentially in function. Several of the proteins are known or likely to be involved in pathogenesis (e.g., staphylocoagulase, exotoxin, Ser-Asp fibrinogen-binding bone sialoprotein-binding protein, fibrinogen and keratin-10 binding surface-anchored protein, fibrinogen-binding protein ClfA, and enterotoxin P). Therefore, the results support the hypothesis that exchange of homologous genes among S. aureus genomes can play a role in the evolution of pathogenesis in this species.


Genomic evolution is believed to involve the interaction of processes such as genetic drift, natural selection, and recombination (17). The recent availability of complete sequences of multiple conspecific bacterial genomes has, for the first time, made it possible to study the effects of these processes on a genome-wide scale (16, 25). In bacteria, recombination is a particularly important process, since it may transfer groups of genes conferring adaptive traits, including those involved in antibiotic resistance or pathogenicity, from widely divergent genetic backgrounds (5, 6).

Comparisons of complete genomes of different Staphylococcus aureus isolates have shown that, in addition to a conserved core of shared genes, different isolates have independently acquired different sets of large mobile elements carrying genes responsible for virulence or drug resistance (11). The best-studied such element is the staphylococcal cassette chromosome mec (SCCmec), which confers resistance to β-lactam antibiotics (3, 10). However, no study has so far addressed the extent to which the genes shared among S. aureus genomes have come to exhibit divergent sequence patterns as a result of homologous recombination between genomes.

Comparisons of pairs of closely related bacterial genomes have revealed certain orthologous gene pairs that show anomalously high divergence at synonymous nucleotide sites (12, 13). Since synonymous mutations do not affect amino acid sequence, they are generally not expected to be subject to strong natural selection. In some bacterial species there is evidence of selection on codon usage, but selection coefficients are estimated to be quite low (2, 9). Thus, synonymous differences accumulate mainly as a function of mutation rate and evolutionary time. In the absence of high intragenomic variation in mutation rate, a gene pair showing much greater than average synonymous divergence between two related genomes therefore indicates a homologous recombination event that brought in one of the two orthologs from a genetic background distinct from that of the rest of the genome (12, 13).

Here we apply this reasoning to a set of orthologous genes (i.e., genes descended from a common ancestor without gene duplication) found in each of five complete genomes of S. aureus. We used phylogenetic analysis to identify sister pairs of genomes and then used multivariate statistical methods to identify pairs of orthologs that were unusually divergent at synonymous sites between these pairs of genomes. Then we used the organismal phylogeny to reconstruct the recombinational events that gave rise to these divergent sequences.

MATERIALS AND METHODS

Sequences.

The following five complete genomes of S. aureus were used in analyses: the hospital-acquired methicillin-resistant S. aureus (MRSA) strain MRSA252 (NC_002952), the hospital-acquired MRSA and vancomycin intermediately susceptible strain N315 (NC_002745), the hospital-acquired MRSA and vancomycin intermediately susceptible strain Mu50 (NC_002758), the community-acquired MRSA strain MW2 (NC_003923), and the community-acquired methicillin-susceptible S. aureus (MSSA) strain MSSA476 (NC_002953). To provide an outgroup for rooting the phylogenetic tree, we used genomes of Bacillus halodurans (NC_002570), Bacillus subtilis (NC_000964), Listeria innocua (NC_003212), and Listeria monocytogenes (NC_003210).

To assign genes to gene families, we applied the BLASTCLUST program (1), which assembles families by homology search using a single-link method, to the sets of predicted protein sequences from the above-mentioned genomes. We used different homology search criteria depending on the goal of the search. To identify a set of putative orthologs present in a single copy in every S. aureus genome and every outgroup genome, we used an E value of 10⁻⁶ for the BLASTP homology search and scored a match between two sequences only if they were at least 30% identical over at least 50% of their length. Using these criteria, we found 506 families represented by a single member in each of the nine genomes. To identify a set of putative orthologs present in a single copy in all S. aureus genomes, we used stricter criteria: an E value of 10⁻⁶ for the BLASTP homology search, with a match between two sequences scored only if they were at least 50% identical over at least 70% of their length. These criteria identified 2,129 gene families represented by a single member in each S. aureus genome (see Table S1 in the supplemental material).
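The family-assembly step can be sketched as single-linkage clustering over pairwise hits that pass identity and coverage thresholds, followed by a filter that keeps only families with exactly one member per genome. The Python below is a minimal illustration under stated assumptions, not BLASTCLUST itself: the `Hit` record, the function names, and the toy proteins are hypothetical; the E value is treated here as a simple filter on precomputed hits; and `coverage` is assumed to be the fraction of length covered, already checked for both sequences.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Sequence


class Hit(NamedTuple):
    """Hypothetical record for one pairwise BLASTP hit."""
    query: str
    subject: str
    identity: float  # percent identity of the aligned region
    coverage: float  # fraction of length covered (assumed checked for both sequences)
    evalue: float


def single_link_families(proteins: Sequence[str], hits: Sequence[Hit],
                         min_identity: float, min_coverage: float,
                         max_evalue: float = 1e-6) -> List[List[str]]:
    """Join proteins into families whenever any one qualifying hit links them."""
    parent = {p: p for p in proteins}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for h in hits:
        if h.evalue <= max_evalue and h.identity >= min_identity and h.coverage >= min_coverage:
            root_q, root_s = find(h.query), find(h.subject)
            if root_q != root_s:
                parent[root_q] = root_s  # single link: one hit merges two families
    families: Dict[str, List[str]] = defaultdict(list)
    for p in proteins:
        families[find(p)].append(p)
    return list(families.values())


def single_copy_families(families: List[List[str]], genome_of: Dict[str, str],
                         genomes: Sequence[str]) -> List[List[str]]:
    """Keep families with exactly one member from every listed genome."""
    keep = []
    for fam in families:
        counts = Counter(genome_of[p] for p in fam)
        if set(counts) == set(genomes) and all(c == 1 for c in counts.values()):
            keep.append(sorted(fam))
    return keep


# Toy example with three hypothetical genomes A, B, and C.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C"}
proteins = list(genome_of)
hits = [
    Hit("a1", "b1", 62.0, 0.95, 1e-40),
    Hit("b1", "c1", 55.0, 0.80, 1e-30),
    Hit("a2", "c2", 35.0, 0.60, 1e-8),
]
strict = single_link_families(proteins, hits, min_identity=50, min_coverage=0.7)
lenient = single_link_families(proteins, hits, min_identity=30, min_coverage=0.5)
print(single_copy_families(strict, genome_of, ["A", "B", "C"]))
print(single_copy_families(lenient, genome_of, ["A", "B", "C"]))
```

With the looser thresholds, `a2` and `c2` merge into one family, but that family still lacks a member from genome B and is discarded, which illustrates how the choice of thresholds changes family membership.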

Phylogenetic analyses.

Homologous sequences were aligned at the amino acid level with the CLUSTALW program (23), and phylogenetic analyses were applied to the concatenated amino acid sequence of the 506 protein families represented by a single copy in both the S. aureus and outgroup genomes. The following methods of phylogenetic reconstruction were used: (i) the maximum parsimony (MP) method, implemented in the PAUP* program (22); (ii) the quartet maximum-likelihood method (QML), implemented in the PUZZLE 5.2 program (21); and (iii) the neighbor-joining (NJ) method (19), implemented in the MEGA2 program (15). The NJ tree was based on the gamma-corrected amino acid distance, with the shape parameter (α = 0.64) estimated by the PUZZLE 5.2 program. In the MP and NJ trees, the reliability of the internal branches was assessed by bootstrapping (7); 1,000 bootstrap pseudosamples were used. In the QML tree, the percentage of puzzling steps supporting a given branch provides a conservative test of the reliability of the branch, analogous to bootstrapping (21).
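As an illustration of the gamma correction, the sketch below converts the proportion of differing amino acid sites (p) into a distance with the Poisson-gamma formula d = α[(1 − p)^(−1/α) − 1], using α = 0.64 as in the text. This is one standard form of gamma-corrected amino acid distance offered by programs such as MEGA; the text does not state which variant was used, and the pairwise deletion of gapped sites and the toy sequences here are assumptions.

```python
import math


def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Proportion of differing residues, ignoring sites with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)


def gamma_distance(p: float, alpha: float) -> float:
    """Poisson-gamma distance d = alpha * ((1 - p) ** (-1 / alpha) - 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


def poisson_distance(p: float) -> float:
    """Poisson distance with equal rates across sites (the alpha -> infinity limit)."""
    return -math.log(1.0 - p)


# Hypothetical aligned fragments; the gapped column is excluded.
p = p_distance("MKLV-AQ", "MKIVGAQ")
print(p, poisson_distance(p), gamma_distance(p, alpha=0.64))
```

A small α such as 0.64 implies strong rate variation among sites, so the gamma-corrected distance exceeds the equal-rates Poisson distance for the same observed p.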

Nucleotide substitution.

After the amino acid sequence alignment was imposed on the DNA sequences, the number of synonymous nucleotide substitutions per synonymous site (dS) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) were estimated by a maximum-likelihood method (27) using the software package PAML (26). Genes with anomalously high degrees of synonymous divergence were identified by k-means clustering (12). We conducted nonhierarchical k-means clustering using MacQueen's algorithm (14). This method partitions multivariate observations into clusters such that variability within clusters is minimized and variability between clusters is maximized. All statistical analyses were conducted with the Minitab statistical package, release 13 (http://www.minitab.com/).
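The clustering step can be illustrated with a minimal pure-Python sketch of MacQueen-style k-means, in which each gene is a vector of its six pairwise dS values among MW2, MSSA476, Mu50, and N315. It seeds the centroids with the first k observations and updates them sequentially as running means, as in MacQueen's original description, and then refines assignments until they stop changing. This is an illustration under assumptions, not Minitab's implementation; the function names and the toy dS values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def macqueen_kmeans(points: List[Vector], k: int,
                    max_passes: int = 100) -> Tuple[List[int], List[List[float]]]:
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # MacQueen-style seeding: the first k observations become the initial centroids.
    centroids = [list(p) for p in points[:k]]
    counts = [1] * k
    # Sequential pass: each remaining point joins its nearest centroid,
    # which is then updated as a running mean.
    for p in points[k:]:
        j = nearest(p, centroids)
        counts[j] += 1
        centroids[j] = [c + (x - c) / counts[j] for c, x in zip(centroids[j], p)]
    # Refinement: reassign every point and recompute means until nothing moves.
    labels: List[int] = []
    for _ in range(max_passes):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical genes; columns are dS for MSSA476-MW2, MSSA476-Mu50, MSSA476-N315,
# MW2-Mu50, MW2-N315, and Mu50-N315.
genes = [
    [0.0, 0.30, 0.30, 0.30, 0.30, 0.0],
    [0.0, 1.10, 1.10, 0.80, 0.80, 0.0],
    [0.0, 0.01, 0.01, 0.01, 0.01, 0.0],
    [0.0, 0.02, 0.02, 0.02, 0.02, 0.0],
    [0.0, 0.015, 0.015, 0.015, 0.015, 0.0],
    [0.0, 0.35, 0.35, 0.35, 0.35, 0.0],
]
labels, centroids = macqueen_kmeans(genes, k=3)
print(labels)
```

On real data one would rerun with increasing k, as described in Results, and stop at the smallest k at which small clusters of genes with very high between-pair dS separate from the bulk. Because the seeding uses the first k observations, results can depend on input order.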

RESULTS

Phylogenetic analyses.

All methods of phylogenetic analysis yielded the same tree topology when applied to the concatenated amino acid sequence of 506 orthologous genes present in a single copy in each S. aureus and outgroup genome (Fig. 1). In both NJ and MP trees, all branches received 100% bootstrap support, and in the QML tree, all branches were supported in 100% of the puzzling steps. Consistent with previously published phylogenies (4, 8, 11), MW2 and MSSA476 formed a sister pair, as did Mu50 and N315; and MRSA252 formed an outgroup to these two sister pairs (Fig. 1).

FIG. 1.

Topology of phylogenetic trees constructed by the NJ, MP, and QML methods based on 506 orthologous genes present in S. aureus and outgroup genomes (151,846 aligned amino acid residues). Numbers on the branches represent the percentage of 1,000 bootstrap samples supporting the branch in both NJ and MP trees; the values were the same in both trees and also were identical to the proportion of puzzling steps supporting the branches in the QML tree.

Nucleotide substitution.

By homology search, we identified 2,129 putative orthologous genes present in a single copy in each of the five S. aureus genomes. For each of these genes, we estimated dS and dN for each pairwise comparison among the genomes. The results were consistent with the phylogenetic analysis (Fig. 1): both median and mean dS and dN were lowest for comparisons within the two sister pairs of genomes (i.e., MW2 versus MSSA476 and Mu50 versus N315) and highest for comparisons with MRSA252 (Table 1). Median values were consistently lower than mean values (Table 1), indicating that the distributions of dS and dN were skewed to the right. Within each sister pair, the median values of both dS and dN were zero (Table 1). Thus, over half of the orthologous gene pairs showed no synonymous or nonsynonymous differences between the members of each sister pair.

TABLE 1.

Numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site at 2,127 orthologous loci in pairwise comparisons between S. aureus genomes. Each cell gives the median (mean ± SE) number of substitutions per site.

dS
Genome | MSSA476 | Mu50 | N315 | MRSA252
MW2 | 0.0000 (0.0028 ± 0.0012) | 0.0130 (0.0284 ± 0.0017) | 0.0130 (0.0290 ± 0.0018) | 0.0438 (0.0841 ± 0.0035)
MSSA476 | | 0.0131 (0.0298 ± 0.0020) | 0.0130 (0.0304 ± 0.0021) | 0.0439 (0.0855 ± 0.0036)
Mu50 | | | 0.0000 (0.0013 ± 0.0005) | 0.0439 (0.0834 ± 0.0036)
N315 | | | | 0.0439 (0.0839 ± 0.0034)

dN
Genome | MSSA476 | Mu50 | N315 | MRSA252
MW2 | 0.0000 (0.0004 ± 0.0001) | 0.0011 (0.0032 ± 0.0002) | 0.0010 (0.0032 ± 0.0002) | 0.0029 (0.0074 ± 0.0004)
MSSA476 | | 0.0011 (0.0034 ± 0.0003) | 0.0010 (0.0034 ± 0.0003) | 0.0030 (0.0076 ± 0.0004)
Mu50 | | | 0.0000 (0.0003 ± 0.0001) | 0.0031 (0.0075 ± 0.0004)
N315 | | | | 0.0031 (0.0075 ± 0.0004)

Surprisingly, we found 108 orthologous genes (see Table S2 in the supplemental material) that were identical at synonymous sites among all five S. aureus genomes. Within each sister pair, identity at synonymous sites was unsurprising; indeed, the median dS was zero (Table 1). But in more distant comparisons, particularly those between MRSA252 and the other genomes, the lack of any synonymous difference was striking given the overall mean level of synonymous divergence (Table 1).

One possible explanation for such homogeneity at synonymous sites among genomes is that these 108 genes were recently transferred among genomes by recombination. An alternative hypothesis is that these genes are subject to some unusual constraint at synonymous sites that substantially reduces the rate of synonymous substitution. To decide between these hypotheses, we compared the 108 genes lacking synonymous differences with the other genes in our sample (Table 2). Both median and mean length (number of aligned codons) were significantly lower in these 108 genes than in the other genes (Table 2). Percent G+C at third codon positions was slightly but significantly lower in the 108 genes than in the other genes (Table 2). Finally, both mean and median dN in comparisons between the other genomes and the outgroup MRSA252 were significantly lower in the 108 genes than in the other genes (Table 2).

TABLE 2.

Comparison of orthologous genes identical at synonymous sites among five S. aureus genomes with other orthologous genes

Values are median (mean ± SE).

Parameter | Genes identical at synonymous sites (n = 108) | Other genes (n = 2,019) [a]
No. of codons | 97.5 (105.9 ± 4.6) | 286.0*** (318.0 ± 4.4)**
G+C at 3rd position (%) | 21.8 (22.1 ± 0.6) | 22.4* (22.6 ± 0.1)
Mean dN vs MRSA252 | 0.0000 (0.0021 ± 0.0005) | 0.0033*** (0.0078 ± 0.0004)**

[a] Two-tailed tests of equality of medians (Mann-Whitney U test) and means (t test) between genes identical at synonymous sites and other genes: *, P < 0.05; **, P < 0.01; ***, P < 0.0001.
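The medians in Table 2 were compared with the Mann-Whitney U test. A minimal sketch of that test is shown below, using midranks for tied values and the normal approximation with a tie-corrected variance and no continuity correction; Minitab's exact procedure may differ in these details, and the function name and example values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-tailed P) using the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(list(x) + list(y))
    # Midrank for each distinct value: the average of the 1-based positions it occupies.
    midrank: Dict[float, float] = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_x = sum(midrank[v] for v in x)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0  # every value tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical codon counts for short conserved genes versus other genes.
short_genes = [80, 95, 100, 110, 120]
other_genes = [250, 280, 300, 320, 400]
print(mann_whitney_u(short_genes, other_genes))
```

For samples as large as those in Table 2 (108 and 2,019 genes), the normal approximation is generally adequate.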

Evidence of recombination.

Excluding the 108 loci lacking synonymous substitutions, we used a k-means clustering procedure to identify genes with highly unusual patterns of synonymous substitution. The procedure was applied to the pairwise dS values among MW2, MSSA476, Mu50, and N315. We repeated the k-means analysis with increasing k until a set of highly unusual genes was identified. The smallest value of k that identified this set of highly unusual genes was k = 3 (Table 3). This procedure identified three clusters, two of which (clusters 2 and 3) were characterized by extraordinarily high median and mean values of dS between the two sister pairs of genomes: that is, in comparisons between MW2 or MSSA476 and Mu50 or N315 (Table 3).

TABLE 3.

Median and mean dS for clusters of genes identified by k-means clustering

Values are median (mean ± SE) dS for each cluster [a].

Comparison | Cluster 1 (n = 1,976) | Cluster 2 (n = 35) | Cluster 3 (n = 8)
MSSA476 vs MW2 | 0.0000 (0.0004 ± 0.0001) | 0.0000 (0.0426 ± 0.0281) | 0.0000 (0.4800 ± 0.2440)
MSSA476 vs Mu50 | 0.0139 (0.0213 ± 0.0001) | 0.3220 (0.3313 ± 0.0223) | 1.0968 (1.2130 ± 0.1560)
MSSA476 vs N315 | 0.1390 (0.0213 ± 0.0006) | 0.3725 (0.3746 ± 0.0237) | 1.0968 (1.2130 ± 0.1560)
MW2 vs Mu50 | 0.0139 (0.0213 ± 0.0006) | 0.3220 (0.3333 ± 0.0233) | 0.8274 (0.8170 ± 0.2360)
MW2 vs N315 | 0.0138 (0.0212 ± 0.0006) | 0.3701 (0.3759 ± 0.0243) | 0.8274 (0.8170 ± 0.2360)
Mu50 vs N315 | 0.0000 (0.0006 ± 0.0002) | 0.0000 (0.0484 ± 0.0293) | 0.0000 (0.0000 ± 0.0000)

[a] Tests of equality of medians (Kruskal-Wallis test) and means (one-way analysis of variance) among clusters were all significant at P < 0.001.

The two clusters of unusual genes included a total of 43 loci (Table 3). That the values of dS in clusters 2 and 3 were unusually high for our data set is confirmed by comparing the median and mean values for these clusters (Table 3) with the overall median and mean values for the comparisons among these four genomes (Table 1). The median and mean values for cluster 2 were an order of magnitude greater than the overall median and mean values, while those for cluster 3 were nearly 2 orders of magnitude greater than the overall median and mean values (Tables 1 and 3). Indeed, in many comparisons among genomes, synonymous sites were nearly saturated with changes, as indicated by estimates of dS greater than 1.0 (Table 3).
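Why dS estimates above 1.0 signal near-saturation can be illustrated with the Jukes-Cantor correction, d = −(3/4) ln(1 − 4p/3), which relates the observed proportion of differing sites p to the estimated number of substitutions per site d. This model is much simpler than the codon-based estimator used in this study, so the sketch is only illustrative: it shows that doubling d from 1 to 2 raises the expected proportion of differing sites only from about 0.55 to about 0.70, so large d values rest on small differences in observed p.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) under the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_expected_p(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# The observed proportion approaches its ceiling of 0.75 as d grows.
for d in (0.01, 0.1, 0.5, 1.0, 2.0):
    print(f"d = {d:4.2f}  ->  expected p = {jc_expected_p(d):.3f}")
```

Near the ceiling, a small sampling error in p translates into a large error in d, which is why such estimates are best read as "very divergent" rather than as precise values.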

In spite of the high dS values, the corresponding values of dN were not exceptionally high (Table 4). As with dS, there were significant differences among clusters with respect to both median and mean dN (Table 4). The fact that the dN values were moderate implies that the anomalously high dS values were not caused by faulty alignment. Although somewhat divergent in amino acid sequence, the genes in clusters 2 and 3 were extraordinarily divergent only at synonymous sites.

TABLE 4.

Median and mean dN for clusters of genes identified by k-means clustering

Values are median (mean ± SE) dN for each cluster [a].

Comparison | Cluster 1 (n = 1,976) | Cluster 2 (n = 35) | Cluster 3 (n = 8)
MSSA476 vs MW2 | 0.0000 (0.0002 ± 0.0001) | 0.0000 (0.0038 ± 0.0023) | 0.0009 (0.0565 ± 0.0276)
MSSA476 vs Mu50 | 0.0012 (0.0024 ± 0.0001) | 0.0237 (0.0335 ± 0.0059) | 0.1628 (0.1467 ± 0.0204)
MSSA476 vs N315 | 0.0011 (0.0023 ± 0.0001) | 0.0273 (0.0379 ± 0.0061) | 0.1628 (0.1467 ± 0.0204)
MW2 vs Mu50 | 0.0011 (0.0024 ± 0.0001) | 0.0237 (0.0336 ± 0.0059) | 0.1173 (0.1030 ± 0.0265)
MW2 vs N315 | 0.0010 (0.0023 ± 0.0001) | 0.0273 (0.0381 ± 0.0061) | 0.1173 (0.1030 ± 0.0265)
Mu50 vs N315 | 0.0000 (0.0002 ± 0.0000) | 0.0000 (0.0068 ± 0.0040) | 0.0000 (0.0000 ± 0.0000)

[a] Tests of equality of medians (Kruskal-Wallis test) and means (one-way analysis of variance) among clusters were all significant at P < 0.005.

By examining in detail the pattern of synonymous substitution among the five genomes at these 43 loci (see Table S1 in the supplemental material), we reconstructed the most likely scenarios of homologous recombination that gave rise to the unusually high dS values seen in certain comparisons (Fig. 2 and Table 5). The most commonly observed pattern (seen in 19 of 43 genes) was one in which the members of the sister pairs were identical to each other or nearly so at synonymous sites, but the two pairs were very divergent from each other and from MRSA252. This pattern is most easily explained by the occurrence of two independent recombination events, one involving recombination in the ancestor of MW2 and MSSA476 and the other in the ancestor of Mu50 and N315 (Fig. 2A).

FIG. 2.

Hypothetical scenarios for recombination events in the history of S. aureus genomes.

TABLE 5.

Candidates for intergenomic recombination identified by k-means clustering

Locus identification (MW2, N315) | Protein function | Recombination pattern [a]
MW0035, SA0054 | Unknown | F
MW0037, SA0056 | Unknown | F
MW0038, SA0057 | Cassette chromosome recombinase B | D
MW0039, SA0058 | Cassette chromosome recombinase A | D
MW0041, SA0060 | Unknown | D
MW0206, SA0222 | Staphylocoagulase | A
MW0322, SA0334 | TatC sec-independent protein translocase | A
MW0325, SA0337 | Helix-turn-helix XRE family-like protein | B
MW0337, SA0349 | Mechanosensitive ion channel | A
MW0338, SA0350 | DUF951 protein | F
MW0339, SA0351 | GTP-binding protein | C
MW0382, SA0382 | Exotoxin | A
MW0394, SA0393 | Exotoxin | A
MW0395, SA0393 | Unknown | A
MW0404, SA0408 | Unknown | B
MW0405, SA0409 | Unknown | B
MW0516, SA0519 | Ser-Asp fibrinogen-binding bone sialoprotein-binding protein (SdrC) | A
MW0518, SA0521 | Ser-Asp fibrinogen-binding bone sialoprotein-binding protein (SdrE) | A
MW0551, SA0519 | Fibrinogen and keratin-10 binding surface-anchored protein | A
MW0764, SA0742 | Fibrinogen-binding protein ClfA | A
MW0862, SA0840 | Phosphatidylethanolamine-binding protein (PEBP) | B
MW1038, SA1001 | Unknown | A
MW1206, SA1155 | Cardiolipin synthetase | A
MW1328, SA1271 | Alanine dehydrogenase | B
MW1354, SA1297 | 3-Phosphoshikimate 1-carboxyvinyltransferase (AroA) | A
MW1355, SA1298 | 3-Dehydroquinate synthase (AroB) | A
MW1356, SA1299 | Chorismate synthase (AroC) | A
MW1738, SA1618 | Unknown | C
MW1889, SA1761 | Enterotoxin P | E
MW1895, SA1765 | Phage tail length tape measure protein | C
MW1928, SA1799 | Unknown | E
MW1932, SA1801 | Phage anti-repressor | G
MW1962, SA1843 | Receptor histidine kinase (AgrC) | B
MW2132, SA2008 | Alpha-acetolactate synthase | A
MW2254, SA2125 | Formiminoglutamase | C
MW2320, SA2186 | Uroporphyrin-III C-methyl transferase (NasF) | A
MW2321, SA2187 | Assimilatory nitrite reductase | A
MW2396, SA2260 | Glucose-1-dehydrogenase | B
MW2449, SA2317 | Acetyl transferase | B
MW2466, SA2333 | Hydroxymethylglutaryl-CoA [b] reductase (MraA) | C
MW2507, SA2373 | Unknown |
MW2551, SA2423 | Clumping factor B | A
MW2576, SA2448 | Flavoprotein oxygenase | B

[a] Based on hypothetical recombination scenarios illustrated in Fig. 2.

[b] CoA, coenzyme A.

In such cases, the source of the recombinant gene was unknown. Among genes showing this pattern were the three linked genes of the aro cluster (aroA, aroB, and aroC) and certain genes important for pathogenesis, such as two linked genes encoding exotoxins and two linked genes encoding Ser-Asp fibrinogen-binding bone sialoprotein-binding proteins (Table 5). For some genes, Mu50 and N315 were very similar to each other but highly divergent from all other genomes, suggesting a recombination event in the ancestor of these two genomes (Fig. 2B). Similarly, for other genes MW2 and MSSA476 were very similar to each other but highly divergent from all other genomes, suggesting a recombination event in the ancestor of these two genomes (Fig. 2C).

The genes for cassette chromosome recombinases A and B and a linked gene of unknown function (Table 5) in MSSA476 were highly divergent at synonymous sites from the orthologous genes in the other genomes. This pattern suggests a recombination into MSSA476 from an unknown source (Fig. 2D). Alternatively, it is possible that MSSA476 possesses the ancestral form of this set of linked genes and that the sequences of these genes in MW2, Mu50, N315, and MRSA252 have resulted from independent events of recombination from an unknown source. Similarly, N315 possessed a sequence at the locus encoding enterotoxin P (Table 5) that was highly divergent at synonymous sites from the orthologous genes in the other genomes, suggesting a recombination into that genome from an unknown source (Fig. 2E).

Finally, there were cases in which certain genomes showed sequences identical at both synonymous and nonsynonymous sites to those of the distantly related genome MRSA252 but highly divergent from the other genomes. Such a pattern suggests a recent recombination from MRSA252 or from a genome closely related to MRSA252 (Fig. 2F and G). Alternatively, independent events of recombination may have occurred in both MRSA252 and these other genomes. In the case of three genes of unknown function, a sequence identical to that of MRSA252 was shared by Mu50 and N315 (Table 5), suggesting recombination into the ancestor of these two genomes (Fig. 2F). In the gene for a phage anti-repressor (Table 5), the MRSA252-like sequence was found only in N315, suggesting recombination into that genome (Fig. 2G).
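The reasoning used to assign genes to the scenarios in Fig. 2 can be sketched as a two-step procedure: group genomes whose sequences are nearly identical at synonymous sites (single linkage on pairwise dS below a threshold), then match the resulting partition against the scenarios described above. Both the threshold of 0.1 and the partition-to-pattern table are hypothetical, reflecting our reading of the text rather than the authors' procedure. In particular, scenarios C and F produce the same partition and are separated here only by whether Mu50 and N315 are identical to MRSA252 (dS = 0), and the table is meant only for genes already flagged as anomalous by the clustering step.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple

GENOMES = ("MW2", "MSSA476", "Mu50", "N315", "MRSA252")
Pair = FrozenSet[str]
Partition = FrozenSet[FrozenSet[str]]


def make_ds(default: float, overrides: Dict[Tuple[str, str], float]) -> Dict[Pair, float]:
    """Build a full pairwise dS table: `default` everywhere except the listed pairs."""
    ds = {frozenset(p): default for p in combinations(GENOMES, 2)}
    for (a, b), value in overrides.items():
        ds[frozenset((a, b))] = value
    return ds


def near_identical_groups(ds: Dict[Pair, float], threshold: float) -> Partition:
    """Single linkage: genomes are grouped whenever a pairwise dS falls below threshold."""
    group = {g: {g} for g in GENOMES}
    for a, b in combinations(GENOMES, 2):
        if ds[frozenset((a, b))] < threshold and group[a] is not group[b]:
            merged = group[a] | group[b]
            for g in merged:
                group[g] = merged
    return frozenset(frozenset(s) for s in group.values())


def part(*groups: set) -> Partition:
    return frozenset(frozenset(g) for g in groups)


# Hypothetical mapping from partitions to the scenarios of Fig. 2.
PATTERNS: Dict[Partition, str] = {
    part({"MW2", "MSSA476"}, {"Mu50", "N315"}, {"MRSA252"}): "A",
    part({"Mu50", "N315"}, {"MW2", "MSSA476", "MRSA252"}): "B",
    part({"MW2", "MSSA476"}, {"Mu50", "N315", "MRSA252"}): "C or F",
    part({"MSSA476"}, {"MW2", "Mu50", "N315", "MRSA252"}): "D",
    part({"N315"}, {"MW2", "MSSA476", "Mu50", "MRSA252"}): "E",
    part({"N315", "MRSA252"}, {"MW2", "MSSA476", "Mu50"}): "G",
}


def classify(ds: Dict[Pair, float], threshold: float = 0.1) -> Optional[str]:
    """Return a scenario label, or None if the partition matches no scenario."""
    label = PATTERNS.get(near_identical_groups(ds, threshold))
    if label == "C or F":
        identical = all(ds[frozenset((g, "MRSA252"))] == 0.0 for g in ("Mu50", "N315"))
        return "F" if identical else "C"
    return label


pattern_a = make_ds(0.5, {("MW2", "MSSA476"): 0.0, ("Mu50", "N315"): 0.0})
print(classify(pattern_a))
```

A gene with uniformly low dS among all five genomes forms a single group and returns None, consistent with restricting this step to the loci in clusters 2 and 3.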

DISCUSSION

We examined the pattern of nucleotide substitution at 2,129 orthologous loci among five genomes of S. aureus, which included two sister pairs of closely related genomes (MW2/MSSA476 and Mu50/N315). Unusual patterns of substitution were identified at a number of loci. There were 108 loci that showed no synonymous differences among the five genomes, even though the average gene showed substantial synonymous divergence except in comparisons within sister pairs. These 108 loci had atypically short coding regions, slightly lower G+C content at third codon positions than other genes, and unusually low divergence at nonsynonymous sites.

These 108 loci included a number of known genes encoding short, highly conserved proteins: for example, 25 of 108 loci (23.1%) encoded ribosomal proteins (see Table S2 in the supplemental material). Thus, it seems likely that most of these genes are subject to strong functional constraints at the amino acid level. The absence of synonymous substitution suggests that there may be additional strong constraints on synonymous codon usage. However, a number of the predicted proteins were of unknown function (see Table S2 in the supplemental material). Thus, it is possible that some of these predicted genes do not really correspond to protein-coding genes but rather represent noncoding sequences that are conserved for other reasons. A possible example was the set of predicted orthologous loci represented by MW060 in the MW2 genome; the predicted protein in this case is only 26 amino acids long (see Table S2 in the supplemental material).

A second group of unusual genes consisted of 45 genes that showed anomalously high levels of synonymous substitution. At these loci, the value of dS in one or more comparisons among the four most closely related genomes (MW2, MSSA476, Mu50, and N315) was 1 or even 2 orders of magnitude higher than at typical loci. There is no known form of natural selection that can cause such an elevation in the rate of synonymous substitution (12). The only known form of selection affecting synonymous sites is selection on synonymous codon usage, but this selection is likely to be purifying and thus to reduce the rate of synonymous substitution rather than enhance it (20). Therefore, any gene that has an unusually high dS in the comparison between two closely related genomes is likely to have been introduced into one of the two genomes by homologous recombination from a more distantly related genome (12, 13).

In the case of the 45 genes identified by unusually divergent values of dS in the present data set, we examined the pattern of synonymous substitution in order to reconstruct the hypothetical recombination event or events. The most common type of event involved independent recombination into the common ancestors of each of the two sister pairs (Fig. 2A). Several of the genes showing this pattern were potentially important for pathogenesis, including genes encoding staphylocoagulase, exotoxins, and the Ser-Asp fibrinogen-binding bone sialoprotein-binding proteins SdrC and SdrE (Table 5). The latter in particular are known to play a key role in hematogenous tissue infection of humans (24). In a number of cases, the genes involved in putative recombination events were linked together in a cluster, suggesting that recombination events can span a number of linked loci. A striking example involved the linked aroA, aroB, and aroC genes (Table 5), which function in the synthesis of aromatic amino acids (18).

Putatively recombinant genes detected by our approach included genes apparently introduced by mobile genetic elements such as phage (phage tail length tape measure protein and phage anti-repressor; Table 5) and the SCCmec element. The latter included the linked genes for cassette chromosome recombinases A and B, for which MSSA476 had a sequence highly divergent at synonymous sites from those of the other genomes (Table 5). This pattern suggested that either (i) MSSA476 received these genes by recombination or (ii) MSSA476 possesses the ancestral form of this set of linked genes and the sequences of these genes in MW2, Mu50, N315, and MRSA252 have resulted from independent events of recombination from the same source. Given the phylogeny of the genomes (Fig. 1), at least three independent recombination events (to MW2, to MRSA252, and to the ancestor of Mu50 and N315) would be required under the latter hypothesis. Thus, this hypothesis is not as parsimonious as the hypothesis of a single recombination event into MSSA476. However, the less parsimonious hypothesis is attractive because the cassette chromosome recombinases are associated with the SCCmec island, which is found in MW2, Mu50, N315, and MRSA252 but not in MSSA476.
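The event counts in this argument can be checked with a small Sankoff-style parsimony calculation on the genome tree in Fig. 1, rooted so that MRSA252 is the outgroup to the two sister pairs. Each genome is scored as carrying either the MSSA476-type sequence or the form shared by the other genomes, and each change of state along a branch is counted as one recombination event. This is an illustrative sketch; the tree encoding, state labels, and unit cost per event are assumptions, not part of the original analysis.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
INF = float("inf")


def sankoff(tree: Tree, leaf_states: Dict[str, str],
            states: Tuple[str, ...]) -> Dict[str, float]:
    """Minimum number of state changes in `tree`, for each possible state at its root."""
    if isinstance(tree, str):
        # A leaf costs nothing in its observed state and is impossible otherwise.
        return {s: 0 if leaf_states[tree] == s else INF for s in states}
    child_costs = [sankoff(child, leaf_states, states) for child in tree]
    # For each parent state, each child takes its cheapest state, paying 1 per change.
    return {
        s: sum(min(c[t] + (t != s) for t in states) for c in child_costs)
        for s in states
    }


STATES = ("MSSA476-type", "shared")
TREE: Tree = ("MRSA252", (("MW2", "MSSA476"), ("Mu50", "N315")))
LEAVES = {"MW2": "shared", "MSSA476": "MSSA476-type", "Mu50": "shared",
          "N315": "shared", "MRSA252": "shared"}

root_costs = sankoff(TREE, LEAVES, STATES)
print("fewest events overall:", min(root_costs.values()))
print("events if the MSSA476 form is ancestral:", root_costs["MSSA476-type"])
```

The unconstrained minimum is one event (into MSSA476), whereas fixing the MSSA476 form at the root requires three, matching the count given above.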

It was also of interest that both Mu50 and N315 possessed three genes that were identical at synonymous sites to the corresponding genes of the distantly related MRSA252 genome but highly divergent from the corresponding genes of MW2 and MSSA476 (Table 5). Similarly, the phage anti-repressor gene of N315 was identical at both synonymous and nonsynonymous sites to that of MRSA252 but divergent from homologues in more closely related genomes (Table 5). These patterns suggest recombination events in which the donor sequence was MRSA252 or a closely related genome (Fig. 2F and G). Again a less parsimonious alternative is possible, which would involve independent recombination events from an unknown source into both MRSA252 and these other genomes.

Although much less divergent at nonsynonymous sites than at synonymous sites (Tables 3 and 4), the genes introduced by recombination did show differences at the amino acid level. The results thus indicate that homologous recombination in S. aureus can be a source of genes encoding proteins having amino acid sequence differences and thus potential functional differences. The proteins encoded by these recombinant loci include some with known or likely functions in pathogenesis (e.g., staphylocoagulase, exotoxin, Ser-Asp fibrinogen-binding bone sialoprotein-binding protein, fibrinogen and keratin-10 binding surface-anchored protein, fibrinogen-binding protein ClfA, and enterotoxin P). Thus, the results support the hypothesis that exchange of homologous genes among S. aureus genomes plays a role in the evolution of pathogenesis in this species.

Supplementary Material

[Supplemental material]

Acknowledgments

This research was supported by grant GM43940 from the National Institutes of Health to A.L.H.

Footnotes

Supplemental material for this article may be found at http://jb.asm.org/.

REFERENCES

1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
2. Bulmer, M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897-907.
3. Crisóstomo, M. I., H. Westh, A. Tomasz, M. Chung, D. C. Oliveira, and H. de Lencastre. 2001. The evolution of methicillin resistance in Staphylococcus aureus: similarity of genetic backgrounds in historically early methicillin-susceptible and -resistant isolates and contemporary epidemic clones. Proc. Natl. Acad. Sci. USA 98:9865-9870.
4. Enright, M. C., D. A. Robinson, G. Randle, E. J. Feil, H. Grundmann, and B. G. Spratt. 2002. The evolutionary history of methicillin-resistant Staphylococcus aureus (MRSA). Proc. Natl. Acad. Sci. USA 99:7687-7692.
5. Feil, E. J., and B. G. Spratt. 2001. Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55:561-590.
6. Feil, E. J., M. C. Enright, and B. G. Spratt. 2000. Estimating the relative contributions of mutation and recombination to clonal diversification: a comparison between Neisseria meningitidis and Streptococcus pneumoniae. Res. Microbiol. 151:465-469.
7. Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
8. Fitzgerald, J. R., S. D. Reid, E. Ruotsalainen, T. J. Tripp, M. Y. Liu, R. Cole, P. Kuusela, P. M. Schlievert, A. Järvinen, and J. M. Musser. 2003. Genome diversification in Staphylococcus aureus: molecular evolution of a highly variable chromosomal region encoding the staphylococcal exotoxin-like family of proteins. Infect. Immun. 71:2827-2838.
9. Grantham, R., C. Gautier, M. Gouy, R. Mercier, and A. Pave. 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8:r49-r62.
10. Hiramatsu, K., L. Cui, M. Kuroda, and T. Ito. 2001. The emergence and evolution of methicillin-resistant Staphylococcus aureus. Trends Microbiol. 9:486-493.
11. Holden, M. T. G., E. J. Feil, J. A. Lindsay, S. J. Peacock, N. P. J. Day, M. C. Enright, T. J. Foster, C. E. Moore, L. Hurst, R. Atkin, A. Barron, N. Bason, S. D. Bentley, C. Chillingworth, T. Chillingworth, C. Churcher, L. Clark, C. Corton, A. Cronin, J. Doggett, L. Dowd, T. Feltwell, Z. Hance, B. Harris, H. Hauser, S. Holroyd, K. Jagels, K. D. James, N. Lennard, A. Line, R. Mayes, S. Moule, K. Mungall, D. Ormond, M. A. Quail, E. Rabbinowitsch, K. Rutherford, M. Sanders, S. Sharp, M. Simmonds, K. Stevens, S. Whitehead, B. G. Barrell, B. G. Spratt, and J. Parkhill. 2004. Complete genomes of two clinical Staphylococcus aureus strains: evidence for the rapid evolution of virulence and drug resistance. Proc. Natl. Acad. Sci. USA 101:9786-9791.
12. Hughes, A. L., and R. Friedman. 2004. Patterns of sequence divergence in 5′ intergenic spacers and linked coding regions in 10 species of pathogenic bacteria reveal distinct recombinational histories. Genetics 168:1795-1803.
13. Hughes, A. L., R. Friedman, and M. Murray. 2002. Genomewide pattern of synonymous nucleotide substitution in two complete genomes of Mycobacterium tuberculosis. Emerg. Infect. Dis. 8:1342-1346.
14. Johnson, R., and D. Wichern. 1992. Applied multivariate statistical methods, 3rd ed. Prentice-Hall, Englewood Cliffs, N.J.
15. Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.
16. Lan, R., and P. R. Reeves. 2000. Intraspecies variation in bacterial genomes: the need for a species genome concept. Trends Microbiol. 8:396-401.
17. Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York, N.Y.
18. O'Connell, C., P. A. Pattee, and T. J. Foster. 1993. Sequence and mapping of the aroA gene of Staphylococcus aureus 8325-4. J. Gen. Microbiol. 139:1449-1460.
19. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
20. Sharp, P. M. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33:23-33.
21. Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.
22. Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other methods). Sinauer, Sunderland, Mass.
23. Thompson, J. D., D. G. Higgins, and T. Gibson. 1994. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
24. Tristan, A., L. Ying, M. Bes, J. Etienne, F. Vandenesch, and G. Lina. 2003. Use of multiplex PCR to identify Staphylococcus aureus adhesins involved in human hematogenous infections. J. Clin. Microbiol. 41:4465-4467.
25. Whittam, T. S., and A. C. Bumbaugh. 2002. Inferences from whole-genome sequences of bacterial pathogens. Curr. Opin. Genet. Dev. 12:719-725.
26. Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.
27. Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43.


Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)
