Heat shock protein 60 sequence comparisons: Duplications, lateral transfer, and mitochondrial evolution

Samuel Karlin; Luciano Brocchieri

doi:10.1073/pnas.97.21.11348

Proc Natl Acad Sci U S A. 2000 Oct 10;97(21):11348–11353. doi: 10.1073/pnas.97.21.11348

PMCID: PMC17203 PMID: 11027334

Abstract

Heat shock proteins 60 (GroEL) are highly expressed essential proteins in eubacterial genomes and in eukaryotic organelles. These chaperone proteins have been advanced as propitious marker sequences for tracing the evolution of mitochondrial (Mt) genomes. Similarities among HSP60 sequences based on significant segment pair alignment calculations are used to deduce associations of sequences taking into account GroEL functional/structural domain differences and to relate HSP60 duplications pervasive in α-proteobacterial lineages to the dynamics of lateral transfer and plasmid integration. Multiple alignments with consensuses are determined for 10 natural groups. The group consensuses sharpen the similarity contrasts among individual sequences. In particular, the Mt group matches best with the classical α-proteobacteria and closely with Rickettsia but significantly worse with the rickettsial groups Ehrlichia and Orientia. However, across broad protein sequence comparisons, there appears to be no consistent prokaryote whose protein sequences align best with animal Mt genomes. There are plausible scenarios indicating that the nuclear-encoded HSP60 (and HSP70) sequences functioning in Mt are results of lateral transfer and are probably derived from an α-proteobacterium. This hypothesis relates to the plethora of duplicated HSP60 sequences among the classical α-proteobacteria contrasted with no duplications of HSP60 among other clades of proteobacterial genomes. Evolutionary relations are confounded by differential selection pressures, convergence, variable mutational rates, site variability, and lateral gene transfer.

Keywords: HSP60, GroEL, mitochondria, protein similarity, gene duplication

Heat shock protein 60 (GroEL) is an abundant essential protein in all Escherichia coli life stages (e.g., during log growth and in stationary phase) and in most bacteria (1). The family of HSP60 proteins is well studied for its role as chaperone facilitators of protein folding and in rescuing the cell from stress conditions (e.g., see review in ref. 2). HSP60 proteins are ubiquitous in eubacterial cells and also function in the mitochondrial (Mt) and plastid organelles of eukaryotes. In particular, HSP60 (and HSP70) facilitate bidirectional traffic between the mitochondrion and the cytoplasm. HSP60 protein structures form heptameric rings that dimerize in a barrel-like complex with a central plane of symmetry (3, 4). Each monomer folds into a structure divided into three domains: Domain E occupies the Equatorial section of the double ring, Domain A consists of the Apical part, and the Intermediate Domain I connects the previous two. Each Domain contributes a specific function in the HSP60 complex. Domain E includes most of the connections between monomers of the same ring and between rings and contains the ATP/ADP/Mg²⁺-binding pocket. Domain I closes on the binding pocket, providing essential residues for ATP hydrolysis. Domain A binds to HSP10 (GroES) and to the target substrate. Proteins of the extended HSP60 family (eubacterial GroEL, eukaryotic Tcp1, and archaeal thermosome) are highly expressed. This property, observed in many evolutionary lineages, may be related to the mobility of the HSP60 genes within and between genomes. Duplication and horizontal gene transfer of HSP60 may promote functional adaptation and differentiation.

Several chaperone and degradation proteins that function in Mt organelles of eukaryotes have been proposed as good marker sequences for tracing the evolution of Mt genomes (5, 6). The current Mt endosymbiont hypothesis, inferred from rRNA gene sequence comparisons, proposes that Mt genomes were acquired from a Gram-negative α-proteobacterium. Viale and Arakaki (7), by using the neighbor-joining algorithm (8) applied to HSP60 protein sequences, proposed that Mt sequences are most related to the Rickettsia tsutsugamushi sequence, now reclassified as Orientia tsutsugamushi (9). Andersson et al. (10) proposed Rickettsia prowazekii (RICPR in the SwissProt genus–species nomenclature; see Fig. 1) as the likely endosymbiont forebear of the Mt organelle. Recently, this candidate has been supplanted by some unspecified member of the α-proteobacterial clade (e.g., refs. 11, 12). However, analysis of different protein families reveals no consistent prokaryotic organism most similar to mitochondria (see below). A growing body of literature now advocates the view that the first eukaryotes and mitochondria arose in unison (13–15). It is also recognized that genomes of many organisms, especially prokaryotes and primitive eukaryotes, consist of “heterogeneous unions,” “consortia,” and chimeras to which lateral transfer and/or close associations have substantially contributed (13–16).

Fig. 1. SSPA similarities of bacterial and organellar HSP60 sequences.

The main objective of this paper is to evaluate similarities between HSP60 sequences on the basis of significant segment pair alignment (SSPA) calculations, with special attention to Mt evolution (the SSPA method is formally described in Methods), taking into account GroEL structural properties, paralogs, and influences of lateral transfer.

Methods

SSPA.

For convenience, we outline the SSPA protocol (for elaborations, see refs. 17, 18). A pairwise amino acid similarity matrix s(i,j) (e.g., BLOSUM62; for review, see ref. 19) is used to score pairwise amino acid similarities. Given two sequences to be aligned, pairs of sequence segments are identified that attain an aggregate score exceeding the score attained for corresponding random sequences of the same composition with probability < 0.01. Extant high-scoring matching segments among protein sequences putatively imply conservation because of essential biological structure/function. The global similarity between two protein sequences is scored as follows: first, all HSSPs (high-scoring segment pairs) significant at the 1% level are identified. Next, the HSSPs are combined into a consistent alignment. The alignment (SSPA) score is the maximal value with respect to all sets of consistent matching sequence segments calculated by summing HSSP segment scores and then normalizing to allow comparison of proteins of different sizes and quality. For the sequence pairings with at least one hit (i.e., a segment having a significantly high SSPA match), additional segments are identified by using a less stringent probability threshold (typically 0.50). The use of this second, relaxed threshold helps to fill in regions between the more significant HSSPs. The SSPA scores are used to deduce groupings of sequences. A group is deemed coherent if the SSPA scores within the group almost invariably exceed the SSPA scores with sequences not in the group, and if the scores with sequences of other groups are consistent for all members of the groups.
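
The step of combining HSSPs into a consistent alignment of maximal summed score can be illustrated with a small dynamic program. The sketch below is not the published SSPA implementation (see refs. 17, 18); it assumes that "consistent" means collinear and non-overlapping in both sequences, it takes the HSSPs and their scores as already computed, and it omits the normalization step, which is not specified here. The class name HSSP, the function name best_consistent_chain, and the example coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class HSSP(NamedTuple):
    """One high-scoring segment pair; coordinates are 0-based, end-exclusive."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    score: float


def best_consistent_chain(hssps: List[HSSP]) -> Tuple[float, List[HSSP]]:
    """Pick the collinear, non-overlapping subset of HSSPs with maximal summed score."""
    # Sorting by end position guarantees that every admissible predecessor
    # of a segment pair is processed before the pair itself.
    order = sorted(hssps, key=lambda h: (h.a_end, h.b_end))
    best: List[Tuple[float, List[HSSP]]] = []  # best chain ending at order[i]
    for i, h in enumerate(order):
        top_score, top_chain = 0.0, []
        for j in range(i):
            p = order[j]
            # Assumed consistency rule: p lies entirely before h in both sequences.
            consistent = p.a_end <= h.a_start and p.b_end <= h.b_start
            if consistent and best[j][0] > top_score:
                top_score, top_chain = best[j]
        best.append((top_score + h.score, top_chain + [h]))
    return max(best, key=lambda t: t[0], default=(0.0, []))


# Hypothetical segment pairs, invented for illustration only.
example = [
    HSSP(0, 10, 0, 10, 50.0),
    HSSP(5, 20, 5, 20, 60.0),
    HSSP(12, 30, 12, 30, 40.0),
    HSSP(40, 50, 2, 8, 70.0),
]
total, chain = best_consistent_chain(example)
```

In this toy case the first and third segment pairs are chained (summed score 90), because the second overlaps both of them and the fourth breaks collinearity. The quadratic loop is adequate for the small number of HSSPs expected per sequence pair.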

Data.

From more than 150 distinct HSP60 sequences found, a culled collection was formed retaining 43 representative sequences of mutual SSPA scores not exceeding 80% (Fig. 1). These include sequences from six β- or γ-proteobacteria, four α-proteobacteria, four Rickettsia/Orientia/Ehrlichia, one ɛ-type (Helicobacter pylori), six singular Gram(−), five low G + C Gram(+), three high G + C Gram(+), one mycoplasma, two cyanobacteria, seven sequences classified as Mt-like, including one hydrogenosomal (Hy) sequence from the aMt eukaryote Trichomonas vaginalis (TRIVA), and four Plastid sequences.
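
The text does not state how the culled collection was assembled. One simple way to retain representatives whose mutual scores do not exceed a ceiling is a greedy filter, sketched below under that assumption. The function name cull_representatives, the sequence labels, and the toy scores are hypothetical; the result of a greedy pass depends on the order in which sequences are presented.

```python
from typing import Callable, Dict, FrozenSet, List


def cull_representatives(
    names: List[str],
    score: Callable[[str, str], float],
    max_score: float = 80.0,
) -> List[str]:
    """Keep a sequence only if its score with every sequence already kept is <= max_score."""
    kept: List[str] = []
    for name in names:
        if all(score(name, k) <= max_score for k in kept):
            kept.append(name)
    return kept


# Toy pairwise scores, invented for illustration only.
toy_scores: Dict[FrozenSet[str], float] = {
    frozenset(("A", "B")): 92.0,
    frozenset(("A", "C")): 70.0,
    frozenset(("A", "D")): 60.0,
    frozenset(("B", "C")): 71.0,
    frozenset(("B", "D")): 61.0,
    frozenset(("C", "D")): 85.0,
}


def toy_score(x: str, y: str) -> float:
    """Symmetric lookup into the toy score table."""
    return toy_scores[frozenset((x, y))]


representatives = cull_representatives(["A", "B", "C", "D"], toy_score)
```

Here B is dropped because it scores 92 with A, and D is dropped because it scores 85 with C, leaving A and C as representatives.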

The classical α-proteobacterial sequences divide into two major subgroups, one important in nitrogen fixation (e.g., Rhizobium spp.) and a second found predominantly in soil and marine habitats and performing anoxygenic photosynthesis (e.g., Rhodobacter spp.). A tentative third group, the Rickettsiales, including the obligate intracellular parasites Rickettsia, Orientia, and Ehrlichia genera, has been grouped with α-proteobacteria, apparently on the basis of 16S rRNA gene comparisons. However, genome signature and protein comparisons (see Figs. 1 and 2) indicate drastic discrepancies between classical α and Rickettsiales (15). The classical α genomes are pervasively of high G + C content (>60%), whereas rickettsial genomes are of low G + C content (<32%).

Fig. 2. SSPA similarities between eubacterial and mitochondrial sequences are indicated for different protein families. Abr, *Azospirillum brasilense*; Aca, *Acanthamoeba castellanii*; Ama, *Allomyces macrogynus* (Chytridiomycota); Ath, *Arabidopsis thaliana*; Bca, *Bacillus caldotenax*; Bsu, *Bacillus subtilis*; Cab, *Clostridium acetobutylicum*; Ccr, *Chondrus crispus* (Rhodophyta); Cel, *Caenorhabditis elegans*; Cma, *Cucurbita maxima* (pumpkin); Cre, *Chlamydomonas reinhardtii*; Dme, *Drosophila melanogaster*; Eco, *Escherichia coli*; Gga, *Gallus gallus*; Hum, human; Lta, *Leishmania tarentolae*; Mmu, mouse; Mpo, *Marchantia polymorpha* (liverwort); Ncr, *Neurospora crassa*; PARDE, *Paracoccus denitrificans*; Pfa, *Plasmodium falciparum*; Pwi, *Prototheca wickerhamii* (Chlorophyta); Ram, *Reclinomonas americana*; Rca, RHOCA, *Rhodobacter capsulatus*; Rme, *Rhizobium meliloti*; Rpr, *Rickettsia prowazekii*; Rsp, *Rhodobacter sphaeroides*; Sau, *Streptomyces aureofaciens*; Sce, *Saccharomyces cerevisiae*; Smu, *Streptococcus mutans*; Zma, maize. ¹α, *Bradyrhizobium japonicum*, *Rhodobacter leguminosarum*, *R. sphaeroides*; Nem, nematodes *Ascaris suum* (pig roundworm) and *C. elegans*; Fun, fungi *Emericella nidulans*, *N. crassa*, *S. pombe*, and *S. cerevisiae*; Pla, plants *A. thaliana* and maize; Kpl, kinetoplastida *L. tarentolae* and *Trypanosoma brucei brucei*. ²α, *P. denitrificans* and *R. sphaeroides*; Nem, *A. suum* and *C. elegans*; Fun, *N. crassa*, *S. cerevisiae*, and *S. pombe*; Pla, plants *D. carota* (carrot) and maize. ³Mam, human and mouse; Nem, *A. suum* and *C. elegans*; Fun, *N. crassa*, *S. pombe* and *Schizophyllum commune*. ⁴Pla, *A. thaliana* and liverwort. ⁵Pla, *A. thaliana* and wheat. ⁶α, *Rhodospirillum rubrum* and *R. capsulatus*. ⁷Fun, *Candida glabrata* and *S. cerevisiae*. ⁸α, *R. sphaeroides*, *R. capsulatus*, and *R. rubrum*; Ver, vertebrates human, mouse, and chick; Fun, *N. crassa* and *S. cerevisiae*. ⁹α, *R. meliloti* and *C. crescentus*. ¹⁰γ, *H. influenzae*, *E. coli*, and *Pseudomonas putida*; Gr(−), Gram(−) BORBU, CHLTR, HELPY, TREPA, and SYNY3; Gr(+), Gram(+) BACSU and *S. aureofaciens*.

Results and Discussion

SSPA Values Among HSP60 Bacterial Sequences.

We implemented the SSPA methodology to ascertain similarities among the 43 representative HSP60 sequences. Regions of divergence and conservation among sequences were analyzed through their multiple alignment, secured with the iteralign protocol reported in ref. 20.

HSP60 sequences of the β- and γ-proteobacteria (ECOLI, CHRVI, AMOPS, BORPE, COXBU, and NEIFL; see Fig. 1) have mutually high SSPA scores in the range 73–80 and scores <71 (usually much less) with the other sequences. Classical α-proteobacterial sequences (CAUCR, AGRTU, and BRAJA) also cluster (SSPA values 75–79). ZYMMO vs. classical α produces similarity scores 67–70 but significantly lower (61) with (β + γ)-proteobacteria, relegating ZYMMO to be a distant relative of the α-proteobacteria. The alignment of the RICPR sequence relative to classical α-proteobacterial HSP60 sequences produces SSPA scores of about 68 to 71. However, contrary to RICPR comparisons, the three sequences of ORITS, EHRCH, and EHRRI align to the α-sequences at the diminished SSPA levels 53–59. HSP60 sequence similarities unambiguously separate Orientia and Ehrlichia from each other (SSPA value 50) and from all currently available bacterial groups.

The Gram(+) sequences divide into two groups: low G + C (BACSU, STAEP, LACLA, CLOPE, and CLOTM) and high G + C (STRCO, MYCLE, but excluding MYCTU), with SSPA scores 66–77 within groups. The SSPA scores between these groups are in the range 60–66, not dissimilar from their scores with most other bacterial sequences. Other singular Gram(−) bacterial sequences (TREPA, LEPIN, BORBU, THETH, CHLTR, and PORGI) and the two Cyanobacteria SYNY3 and SYNP7 score among themselves in the range 60–67 as they score with most other bacterial sequences including proteobacteria and Gram(+) in support of their classification as isolated and/or early branching bacterial lineages.
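
The first part of the coherence criterion given in Methods (within-group scores almost invariably exceeding scores with outside sequences) can be checked mechanically once pairwise scores are available. The sketch below is one possible reading of that criterion, not the procedure used to form the groups above. The 0.9 cutoff standing in for "almost invariably", the function names, and the toy scores are all assumptions; the second part of the criterion (consistent scores with other groups) is not covered.

```python
from typing import Callable, Dict, FrozenSet, List


def coherence_fraction(
    group: List[str],
    others: List[str],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of (within-group, outside) score comparisons won by the within-group score."""
    wins = 0
    total = 0
    for member in group:
        inside = [score(member, g) for g in group if g != member]
        outside = [score(member, o) for o in others]
        for s_in in inside:
            for s_out in outside:
                total += 1
                wins += int(s_in > s_out)
    return wins / total if total else 0.0


def is_coherent(fraction: float, cutoff: float = 0.9) -> bool:
    """Assumed numerical reading of 'almost invariably'; the cutoff is hypothetical."""
    return fraction >= cutoff


# Toy scores, invented for illustration only (a* form the group, g* are outsiders).
toy: Dict[FrozenSet[str], float] = {
    frozenset(("a1", "a2")): 77, frozenset(("a1", "a3")): 75, frozenset(("a2", "a3")): 79,
    frozenset(("a1", "g1")): 68, frozenset(("a2", "g1")): 70, frozenset(("a3", "g1")): 66,
    frozenset(("a1", "g2")): 71, frozenset(("a2", "g2")): 69, frozenset(("a3", "g2")): 76,
}

fraction = coherence_fraction(
    ["a1", "a2", "a3"], ["g1", "g2"], lambda x, y: toy[frozenset((x, y))]
)
```

In the toy data a single comparison fails (a3 scores 75 with a1 but 76 with g2), so 11 of 12 comparisons favor the group and it still passes the assumed cutoff.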

Mt HSP60 Sequences.

Apart from the outlier sequence PLAFG-Mt from the protist Plasmodium falciparum, Mt sequences mutually score in the range 47–65, the highest score being between the two animal sequences HUMAN-Mt and HELVI-Mt (nocturnal moth Heliothis virescens), and the lowest score being between HELVI-Mt and TRYCR-Mt. Comparisons of RICPR to Mt HSP60 sequences yield SSPA scores in the range 48–63, similar to the scores of classical α-proteobacteria vs. Mt (50–63). The HSP60 ORITS aligns to Mt sequences about 10 points lower (43).

Similarities and Differences in the Structural Domains.

As noted earlier, each GroEL monomer possesses three structural Domains: Apical, Intermediate, and Equatorial. Similarity comparisons of SSPA scores were ascertained separately for the sequences over each of the three structural domains of HSP60 (for detailed data, see http://gea.stanford.edu/ luciano/hsp60.sspa). The RICPR sequence similarities to Mt sequences parallel those of classical α-proteobacteria in Domains E and A but are lower in Domain I (SSPA values 62–68 vs. 64–73). For ORITS, similarity to Mt is about 10 points lower than that of α-proteobacteria in all three structural domains. ORITS has scores about equal to RICPR in Domain I but divergent in Domains A and E. Ehrlichia sequences, in comparisons with Mt sequences, are equivalent to Orientia in Domains A and I but are sharply divergent in Domain E, with assessments of similarities 20 points lower than for classical α-proteobacteria and lower than for any other group. The reduced similarities of HELVI-Mt with most other sequences can be attributed to its pronounced divergence within Domain A. The diminished matching exhibited by the ZYMMO sequence vs. classical α-proteobacterial sequences can be explained by differences in Domain E, where ZYMMO vs. Mt scores are 41–48, individually lower than those of (β + γ)-proteobacteria, 45–54, and of α-proteobacteria, 49–60. Within Domain A, the ZYMMO sequence scores like a classical α-proteobacterium (ZYMMO/α-proteobacteria, 80–84). The high G + C Gram(+) Mycobacterium tuberculosis is lower in both Domains A and E but not in Domain I. Mycoplasma genitalium diverges in all three domains. PLAFG-Mt is the most deviant sequence in all three domains, with lowest SSPA scores in Domain E, generally less than 30, and marginally elevated SSPA scores in Domains A and I (30).

Multiple Alignment of HSP60 Consensus Group Sequences.

We used the multiple alignment program iteralign (20) to identify blocks of alignment among HSP60 sequences (see also http://gea.stanford.edu/luciano/hsp60.alignment). A consensus sequence is derived that best summarizes the residue composition of the alignment. All HSP60 sequences align with few indels. Actually, bacterial sequences match from the N terminus on (apart from one to three residues). By contrast, organelle sequences generally include an expanded N-terminal segment (presumably a peptide leader sequence) of variable length of 23–68 amino acids. The C-terminal region is unaligned or poorly aligned and generally contains repetitive elements (20).
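
How a consensus summarizes the residue composition of an alignment can be illustrated with a plain plurality rule applied column by column. This is a simplified stand-in, not the consensus definition used by iteralign (ref. 20). The function name plurality_consensus, the min_fraction parameter, the placeholder character "x", and the toy alignment are hypothetical.

```python
from collections import Counter
from typing import List


def plurality_consensus(
    aligned: List[str], min_fraction: float = 0.5, unknown: str = "x"
) -> str:
    """Column-wise consensus: the most common symbol if frequent enough, else a placeholder."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        consensus.append(symbol if count / len(column) >= min_fraction else unknown)
    return "".join(consensus)


# Toy alignment, invented for illustration only ('-' marks a gap).
toy_alignment = ["MAAKD", "MAAKE", "MSAKD", "M-AKN"]
loose = plurality_consensus(toy_alignment)
strict = plurality_consensus(toy_alignment, min_fraction=0.75)
```

Raising min_fraction turns weakly supported columns into placeholders, which is one way to expose where a group consensus is well defined and where it is not.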

Multiple alignments were determined separately for each of the following HSP60 groups: (i) six (β + γ)-proteobacteria; (ii) four classical α-proteobacteria; (iii) the single RICPR sequence; (iv) the “sister” Rickettsial sequences ORITS, EHRCH, and EHRRI; (v) seven singular Gram(−); (vi) five low G + C Gram(+); (vii) three high G + C Gram(+); (viii) two cyanobacteria; (ix) four Chl sequences; (x) five Mt (excluding PLAFG-Mt) + 1 Hy sequence. The sequence names are given in Table 1. The consensus derived from the multiple alignment of the group consensuses showed an impressively high similarity (91%) to the global consensus of the individual sequences. Predictably (18), SSPA values among group consensuses sharpen the contrasts among individual and group sequences (Table 1). In particular, the Mt group aligns best with the classical α-proteobacteria (SSPA score 66) but registers very close scores (64, 65) to other Gram(−) group sequences. The consensus of the group ORITS, EHRCH, and EHRRI aligns best with RICPR (66) but is significantly lower with the consensus Mt sequence (55). Consistent with the endosymbiont hypothesis, the chloroplast sequences score about 73 with cyanobacterial sequences and <68 (typically much less) in comparisons with all other sequences.

Table 1.

SSPA values of HSP60 group consensuses

	1	2	3	4	5	6	7	8	9	10
6 β + γ proteobacteria	100	76	66	59	73	65	63	64	63	64
4 α-proteobacteria	76	100	75	63	74	67	62	64	64	66
RICPR	66	75	100	66	62	58	67	60	58	63
ORITS + 2 Ehrlichia's	59	63	66	100	61	58	51	54	55	55
7 singular Gram(−)	73	74	62	61	100	73	67	70	67	65
5 low G + C Gram(+)	65	67	58	58	73	100	69	68	68	60
3 high G + C Gram(+)	63	62	67	51	67	69	100	65	62	55
2 cyanobacteria	64	64	60	54	70	68	65	100	73	56
4 Chl sequences	63	64	58	55	67	68	62	73	100	59
5 Mt + 1 Hy sequence	64	66	63	55	65	60	55	56	59	100

SSPA similarities are between consensus sequences derived from the alignment of the following groups: (i) β + γ proteobacteria ECOLI, CHRVI, AMOPS, COXBU, BORPE, NEIFL; (ii) α-proteobacteria ZYMMO, CAUCR, AGRTU, BRAJA; (iii) RICPR; (iv) Divergent Rickettsiales ORITS, EHRCH, EHRRI; (v) singular Gram(−) CHLTR, PORGI, TREPA, LEPIN, BORBU, THETH; (vi) low G + C Gram(+) BACSU, STAEP, LACLA, CLOPE, CLOTM; (vii) high G + C Gram(+) STRCO, MYCLE, MYCTU; (viii) cyanobacteria SYNY3, SYNP7; (ix) chloroplast sequences PYRSA-Chl, GALSU-Chl, WHEAT-Chl, ARATH-Chl; (x) Mt and hydrogenosomal sequences HUMAN-Mt, HELVI-Mt, YEAST-Mt, MAIZE-Mt, TRYCR-Mt, TRIVA-Hy. See legend to Fig. 1 for full names.
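
For readers who wish to query Table 1 directly, the matrix can be transcribed and ranked as in the sketch below. The values are copied from Table 1; the ASCII group labels and the function names is_symmetric and best_partner are conveniences introduced here.

```python
from typing import List, Tuple

GROUPS: List[str] = [
    "beta+gamma proteobacteria", "alpha-proteobacteria", "RICPR",
    "ORITS + 2 Ehrlichia", "singular Gram(-)", "low G+C Gram(+)",
    "high G+C Gram(+)", "cyanobacteria", "Chl", "Mt + Hy",
]

# SSPA values of the group consensuses, transcribed from Table 1.
SSPA: List[List[int]] = [
    [100, 76, 66, 59, 73, 65, 63, 64, 63, 64],
    [76, 100, 75, 63, 74, 67, 62, 64, 64, 66],
    [66, 75, 100, 66, 62, 58, 67, 60, 58, 63],
    [59, 63, 66, 100, 61, 58, 51, 54, 55, 55],
    [73, 74, 62, 61, 100, 73, 67, 70, 67, 65],
    [65, 67, 58, 58, 73, 100, 69, 68, 68, 60],
    [63, 62, 67, 51, 67, 69, 100, 65, 62, 55],
    [64, 64, 60, 54, 70, 68, 65, 100, 73, 56],
    [63, 64, 58, 55, 67, 68, 62, 73, 100, 59],
    [64, 66, 63, 55, 65, 60, 55, 56, 59, 100],
]


def is_symmetric(matrix: List[List[int]]) -> bool:
    """Check that the score of (i, j) equals the score of (j, i) for all pairs."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def best_partner(group: str) -> Tuple[str, int]:
    """Return the other group with the highest consensus SSPA score, and that score."""
    i = GROUPS.index(group)
    j = max((k for k in range(len(GROUPS)) if k != i), key=lambda k: SSPA[i][k])
    return GROUPS[j], SSPA[i][j]


mt_best = best_partner("Mt + Hy")
orits_best = best_partner("ORITS + 2 Ehrlichia")
chl_best = best_partner("Chl")
```

The three queries reproduce the statements in the text: the Mt group scores highest with the classical α-proteobacteria (66), the ORITS/Ehrlichia consensus with RICPR (66), and the chloroplast group with the cyanobacteria (73).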

Duplications of the HSP60 Gene.

Many species contain multiple copies of HSP60, a priori paralogs, with high mutual SSPA values. Notably, several α-proteobacterial genomes feature multiple HSP60 copies. Specifically, Rhizobium meliloti (RHIME) possesses at least five distinct HSP60 sequences (21, 22) of mutual SSPA scores in the range 75–95; Bradyrhizobium japonicum (BRAJA) contains at least five distinct HSP60 sequences (23) with high SSPA scores; and Rhodobacter sphaeroides (RHOSH) contains two HSP60 sequences of about 75% identity. Multiple HSP60 sequences also exist in the cyanobacterium Synechocystis sp. (two copies), in the Gram(+) Streptomyces lividus (two), M. leprae (two), and M. tuberculosis (two), the respective pairs being all about 80% similar. Strikingly, of the aggregate proteobacterial collection, multiple HSP60 sequences have to date been identified only among α-proteobacteria.

The primitive aMt eukaryote T. vaginalis contains two very similar HSP60 sequences of mutual SSPA score 73. One of the proteins functions in the hydrogenosome organelle (24), and the other is of unknown localization. To date, only a single copy of HSP60 has been found in Giardia lamblia (25) and in Entamoeba histolytica (26) with low similarities to eubacterial and higher eukaryote organellar sequences, in the range 30–35% and 40–45% identity, respectively. P. falciparum carries at least two HSP60 sequences (27, 28). One is Mt like, and the other shows highest similarity (42%) to the GALSU (red alga) HSP60 sequence.

A version of HSP60 binds to the rubisco protein in the chloroplast. Multiple such sequences have to date been identified in Arabidopsis thaliana, in the pea plant, in Brassica, and in the green alga Chlamydomonas reinhardtii. The Tcp1 and thermosome proteins are recognized as the eukaryotic and archaeal homologues of HSP60 although their SSPA scores with respect to eubacterial HSP60 range only from 0 to 10% identity. In surveying the current complete genomes, we found that A. fulgidus, A. pernix, and M. thermoautotrophicum contain two thermosome sequences, whereas M. jannaschii contains a single sequence. Sulfolobus acidocaldarius possesses at least two homologues.

Expression of GroEL Genes.

The pattern of expression of GroEL genes has been well studied in BRAJA (23). BRAJA GroEL-3 (the third copy, 58 kDa) synthesis is coregulated with the nitrogen fixation system, whose genes are transcribed from a σ⁵⁴ promoter. It is possible that this GroEL protein is a requirement for NifA folding and nitrogenase assembly (29). GroEL-2 and GroEL-4 apparently are constitutively expressed. GroEL-1 is under heat shock control. GroEL-5 expression is regulated independently of NifA by cellular oxygen conditions. Moreover, GroEL-2, GroEL-4, and GroEL-5 can functionally replace each other (23). Thus, the five GroEL/GroES complexes are differentially regulated, allowing a flexible response to varying environmental conditions and physiological needs. In RHIME, the DNA-binding activity of NodD requires the product of GroELc (22). This gene is encoded on one of the two megaplasmids of RHIME, possibly allowing for lateral transfer during plasmid movements.

Why Multiple Copies of a Gene?

(i) Gene duplication conceivably can increase the expression level of the encoded protein at various times and places and under special conditions. (ii) The duplicated copies can functionally diverge or participate in heterooligomer complexes. This appears to be the nature of some HSP60 structures, for example, HSP60 rubisco-binding proteins in plastids. Duplicated genes freed from functional constraints can evolve faster and adapt to new needs. (iii) Duplication may provide insurance against extreme fluctuation of expression and against mutation or other detrimental events. (iv) The genome may be simply large enough to tolerate duplicated benign genes. Mechanisms for duplication and mobility include transposition, recombination, conjugation, transformation, and transduction.

It is accepted that DNA sequences can be laterally transferred between organisms (30, 31) and have been transferred in evolution from cytoplasmic organelles to the nucleus and/or between organelles (32). The presence of multiple copies suggests mobility of HSP60 genes in α-proteobacteria. The multiplicity of HSP60 sequences attests to its dynamic character and may suggest high intrinsic potential for lateral transfer from some α-proteobacterium.

Other Protein Sequence Comparisons.

It is useful to summarize SSPA similarity values for various classes of protein sequences, emphasizing classical α-proteobacteria, RICPR, and Mt sequences. A manifest conclusion emerging from the data is the lack of a prokaryotic group that is consistently most similar to animal Mt sequences (see Fig. 2).

Proteins encoded in animal Mt genomes.

For cytochrome oxidase I (CoxI), CoxIII, ATPase F₁, cytochrome c, and NADH units 2 and 4, the Mt sequences match better with at least some classical α-sequences than with RICPR. For the proteins NADH 5, NADH 11, and cytochrome b sequences, similarity attainments of α-proteobacteria vs. Mt are about the same as for RICPR vs. Mt. The proteins CoxII and NADH 7 show the alignment inequality RICPR vs. Mt > α vs. Mt.

Mt aminoacyl-tRNA synthetases.

Arginyl: yeast Mt vs. γ-proteobacterial sequences reach 19–22% identity, 3-fold better than the comparisons of yeast Mt vs. RICPR showing only about 7% identity. Aspartyl: yeast Mt vs. BORBU attains the SSPA level 31, which dominates yeast Mt vs. RICPR 22. Threonyl: fungal Mt sequences from Schizosaccharomyces pombe, Saccharomyces cerevisiae, and Candida albicans compared with γ-proteobacteria and BACSU carry 30–36% identity but in alignment to RICPR, 27–29% identity. Tyrosyl: BACSU matches with yeast Mt at 38% identity, whereas RICPR vs. yeast Mt have 28% identity (data not shown). Glutamyl: yeast Mt vs. BACSU carries 31% identity compared with Mt vs. RICPR of 22% identity.

Chaperones and proteases functioning in the Mt (data not shown).

For the Lon degradosome gene: (BACSU vs. Mt) 38–40% > (α vs. Mt) 34–38% and RICPR vs. Mt 33–35% identity. For the metalloprotease FtsH: Mt sequences tentatively match best with Streptococcus pneumoniae with substantially diminished identity to RICPR. ClpP: RICPR vs. γ, 70–75%, RICPR vs. α, 62–66%, indicating for this degradation protein that RICPR is more similar to γ-types than to α-types.

Other proteins.

For the proteins DnaA, elongation factor EF-Tu (data not shown), superoxide dismutase (SOD), and the RNA polymerase unit β', RICPR matches to γ bacterial sequences significantly better than in matching to classical α-types. For example, with respect to the detoxification protein SOD, RICPR vs. γ-types align at 50–56% identity, but RICPR vs. RHOCA shows only 40% identity. The σ⁷⁰ factor aligns RECAM-Mt poorly with RICPR and α-proteobacteria.

Functional Specialization and Selective Convergence of HSP60 Proteins.

Although organisms of recent common origin are expected to exhibit higher sequence similarity, evolutionary relations can be obscured by convergence, lateral gene transfer, variable mutation rates, site variability, etc. Moreover, proteins may be subject to variable selective pressures, depending on physiological and/or ecological conditions. This can cause sequence divergence at different rates in different organisms (the problem of unequal evolutionary rates). A few characters suggest some form of functional differentiation among HSP60 proteins: (i) Many HSP60 sequences include the iterated C-terminal tripeptide GGM. The function of these tails is unknown, but similar iterations are present as C-terminal elements in the unrelated HSP70 chaperone proteins. Some HSP60 proteins, however, do not have these repeats but incorporate instead C-terminal tails emphasizing multiple histidines. Notably, MYCTU (Mycobacterium tuberculosis) has two HSP60 genes, one coding for a protein with C-terminal GGM elements and a second with multiple histidines. It is possible that the switch between these types corresponds to different functional specializations. (ii) Among the ATP/ADP-binding positions of HSP60, position 52 is a conserved lysine (K) in proteobacteria, some nonproteobacterial Gram-negative and Mt sequences, but is a pervasive asparagine (N) in Gram-positive sequences and Cyanobacteria. Many other positions surrounding the ATP/ADP-binding pocket switch from highly conserved residues to other residues in some sequences. Switch positions may suggest different mechanisms of coupling ATP hydrolysis to the substrate refolding process. (iii) There are differences in the ways the GroEL protein complex assembles and functions. For example, in mitochondria, cpn60 can function as a single ring, whereas two rings are needed in E. coli (33).

Convergent evolution has never been proved between molecular sequences (e.g., ref. 34). However, the same environmental conditions that may suggest the evolution of Rickettsiales into mitochondria may also suggest convergent evolution of the HSP60 and other sequences. The endosymbiotic lifestyle of mitochondria and parasitic lifestyle of Rickettsia correspond to specializations of their metabolism that result in reduction and simplification of their genomic and protein content. Consider also switch positions with respect to ATP-binding sites (20) that may relate to convergent function specialization.

HSP60s of mitochondria and of many Rickettsiales appear to have been subjected to fast evolutionary rates, as reflected by the fact that they generate long branches in phylogenetic tree reconstructions (e.g., ref. 7). At the same time, the estimated common ancestors of these groups are separated by much shorter distances. Similarities involving Mt sequences and Rickettsial sequences may be sufficiently low to place them in the region where phylogenetic information is largely lost. In clustering Mt with Rickettsiales, the phenomenon of long branch attraction may be at play.

Perspectives.

Methods have been developed (e.g., refs. 8, 35) that seek to reconstruct evolutionary relations by building tree-like topologies, where branch lengths are putatively proportional to evolutionary time, and the branch topology reflects events of speciation. However, tree-making procedures rest on uncertain assumptions and problematic approximations (36). Results are often influenced by many factors, including problems with alignments, definition of homology, lateral gene transfer, gene loss, data set, clustering method, and models of evolution. In particular, different protocols have been proposed to estimate evolutionary distances between pairs of sequences based on their sequence similarity.
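
As one concrete instance of a protocol that converts sequence similarity into an evolutionary distance, the textbook Poisson correction estimates the number of substitutions per site as d = −ln(1 − p), where p is the observed fraction of differing aligned sites. The sketch below illustrates only that standard formula; it is not the SSPA score and is not a method used in this paper. The function names and toy sequences are hypothetical.

```python
import math


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of differing residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free aligned columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p); undefined once p reaches 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# Toy aligned fragments, invented for illustration only.
p = p_distance("MKLV", "MKIV")
d = poisson_distance(p)
```

The correction grows without bound as p approaches 1, which is one way to see why highly divergent sequence pairs, such as those discussed above for Mt and Rickettsial HSP60, carry little reliable distance information.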

Our analysis indicates that HSP60 contained in mitochondria is closest (by SSPA analyses) to the classical α-proteobacteria and RICPR. Does this imply that the Mt is the remnant of an ancient α-proteobacterial organism? This inference is not supported by other genomic characters [e.g., genomic signature (15), other protein comparisons; see text]. Current studies of molecular evolution emphasize lateral gene transfer as a major evolutionary mechanism (14, 30, 31). Lateral transfer among all organisms (bacteria, fungi, plants, animals, protists, etc.) may be involved in promotion of new species (30). The ubiquitous role of lateral transfer in affecting and shaping prokaryotic and eukaryotic species is increasingly appreciated. For example, reacquisition and dissemination of antibiotic resistance genes via mobile genetic elements (e.g., conjugative plasmids, phages, free DNA, and transposons) is an established paradigm of lateral transfer.

Primitive organisms probably engaged in much reduction, acquisition, and lateral transfer of DNA, producing chimeric genomes. We propose that the nuclear encoded HSP60 sequences functioning in Mt are a result of lateral transfer and are probably derived from a classical α-proteobacterial progenitor. One of the requirements for the successful acquisition of a laterally transferred gene is its utility to the recipient organism. HSP60 proteins facilitate folding of a great variety of proteins, acting on substrates whose selection seems to be solely constrained by their size (37). One would expect that genes highly advantageous to the recipient organism bear potential for successful interspecies gene flow. From this perspective, the nonspecificity of HSP60 proteins to their targets makes them likely viable gene acquisitions. In this context, α-proteobacteria possess multiple features that suggest that they may be likely donors of HSP60 genes: (i) they possess unusual facility for HSP60 gene duplication and transposition, as indicated by the plethora of HSP60 sequence duplications among classical α-proteobacteria, in contrast to no paralogs of HSP60 sequences in the other clades (γ, β, δ, ɛ) of proteobacterial genomes; (ii) α-proteobacteria establish close spatial and functional relations, often endosymbiotic, with plant eukaryotic organisms. Lateral gene transfer provides a means of quick response to strong selection pressures reflected by virulence factors ranging from toxin production to immune evasions. Examples include Ti plasmids of Agrobacterium tumefaciens, nodulation plasmids of Rhizobium, and virulence plasmids of Shigella and Yersinia; (iii) copies of α-proteobacterial HSP60 genes have been found to reside on extrachromosomal elements of the α-proteobacterial genome, e.g., the megaplasmids pSyma and pSymb of R. meliloti (22).

Acknowledgments

We thank B. E. Blaisdell and A. M. Campbell for critical reading of the manuscript. This work was supported in part by National Institutes of Health Grants 5R01GM10452–36 and 5R01HG00335–12 and by National Science Foundation Grant DMS9704552–002.

Abbreviations

Mt: mitochondrial
SSPA: significant segment pair alignment

References

1. Karlin S, Mrázek J. J Bacteriol. 2000;182:5238–5250. doi: 10.1128/jb.182.18.5238-5250.2000.
2. Sigler P, Zhaohui X, Rye H S, Burston S G, Fenton W A, Horwich A L. Annu Rev Biochem. 1998;67:581–608. doi: 10.1146/annurev.biochem.67.1.581.
3. Boisvert D C, Wang J, Otwinowski Z, Horwich A L, Sigler P B. Nat Struct Biol. 1996;3:170–177. doi: 10.1038/nsb0296-170.
4. Xu Z, Horwich A L, Sigler P B. Nature (London). 1997;388:741–750. doi: 10.1038/41944.
5. Budin K, Philippe H. Mol Biol Evol. 1998;15:943–956. doi: 10.1093/oxfordjournals.molbev.a026010.
6. Gupta R S. Microbiol Mol Biol Rev. 1998;62:1435–1491. doi: 10.1128/mmbr.62.4.1435-1491.1998.
7. Viale A M, Arakaki A K. FEBS Lett. 1994;341:146–151. doi: 10.1016/0014-5793(94)80446-x.
8. Saitou N, Nei M. Mol Biol Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
9. Tamura A, Ohashi N, Urakami H, Mijamura S. Int J Syst Bacteriol. 1995;45:589–591. doi: 10.1099/00207713-45-3-589.
10. Andersson S G, Zomorodipour A, Andersson J O, Sicheritz-Ponten T, Alsmark U C, Podowski R M, Naslund A K, Eriksson A S, Winkler H H, Kurland C G. Nature (London). 1998;396:133–140. doi: 10.1038/24094.
11. Gray M W, Burger G, Franz Lang B. Science. 1999;283:1476–1481. doi: 10.1126/science.283.5407.1476.
12. Andersson S G, Kurland C G. Curr Opin Microbiol. 1999;2:535–541. doi: 10.1016/s1369-5274(99)00013-2.
13. Martin W, Müller M. Nature (London). 1998;392:37–41. doi: 10.1038/32096.
14. Lopez-Garcia P, Moreira D. Trends Biochem Sci. 1999;24:88–93. doi: 10.1016/s0968-0004(98)01342-5.
15. Karlin S, Brocchieri L, Mrázek J, Campbell A M, Spormann A M. Proc Natl Acad Sci USA. 1999;96:9190–9195. doi: 10.1073/pnas.96.16.9190.
16. Moreira D, Lopez-Garcia P. J Mol Evol. 1998;47:517–530. doi: 10.1007/pl00006408.
17. Karlin S, Weinstock G M, Brendel V. J Bacteriol. 1995;177:6881–6893. doi: 10.1128/jb.177.23.6881-6893.1995.
18. Brocchieri L, Karlin S. J Mol Biol. 1998;276:249–264. doi: 10.1006/jmbi.1997.1527.
19. Brendel V. Adv Comput Biol. 1996;2:121–160.
20. Brocchieri L, Karlin S. Protein Sci. 2000;9:476–486. doi: 10.1110/ps.9.3.476.
21. Rusanganwa E, Gupta R S. Gene. 1993;126:67–75. doi: 10.1016/0378-1119(93)90591-p.
22. Ogawa J, Long R. Genes Dev. 1995;9:714–729. doi: 10.1101/gad.9.6.714.
23. Fischer H M, Babst M, Kaspar T, Acuña G, Arigoni F, Hennecke H. EMBO J. 1993;12:2901–2912. doi: 10.1002/j.1460-2075.1993.tb05952.x.
24. Bui E T N, Bradley P J, Johnson P. Proc Natl Acad Sci USA. 1996;93:9651–9656. doi: 10.1073/pnas.93.18.9651.
25. Roger A J, Svard S G, Tovar J, Clark C G, Smith M W, Gillin F D, Sogin M L. Proc Natl Acad Sci USA. 1998;95:229–234. doi: 10.1073/pnas.95.1.229.
26. Clark C G, Roger A J. Proc Natl Acad Sci USA. 1995;92:6518–6521. doi: 10.1073/pnas.92.14.6518.
27. Holloway S P, Min W, Inselburg J W. Mol Biochem Parasitol. 1994;64:25–32. doi: 10.1016/0166-6851(94)90131-7.
28. Syin C, Goldman N D. Mol Biochem Parasitol. 1996;79:13–19. doi: 10.1016/0166-6851(96)02633-3.
29. Govezensky D, Greener T, Segal G, Zamir A. J Bacteriol. 1991;173:6339–6346. doi: 10.1128/jb.173.20.6339-6346.1991.
30. de la Cruz I, Davies I. Trends Microbiol. 2000;8:128–133. doi: 10.1016/s0966-842x(00)01703-0.
31. Campbell A M. Theor Popul Biol. 2000;57:71–77. doi: 10.1006/tpbi.2000.1454.
32. Martin W, Herrmann R G. Plant Physiol. 1998;118:9–17. doi: 10.1104/pp.118.1.9.
33. Nielsen K L, Cowan N J. Mol Cell. 1998;2:93–99. doi: 10.1016/s1097-2765(00)80117-3.
34. Doolittle R F. Trends Biochem Sci. 1994;19:15–18. doi: 10.1016/0968-0004(94)90167-8.
35. Fitch W M. Syst Zool. 1971;20:406–416.
36. Brocchieri L. Theor Popul Biol. 2000; in press.
37. Houry W A, Frischman D, Eckerskorn C, Lottspelch F, Hartl F U. Nature (London). 1999;402:147–154. doi: 10.1038/45977.

