Abstract
Background
Elongation factor G (EFG) is a core translational protein that catalyzes the elongation and recycling phases of translation. A more complex picture of EFG's evolution and function than previously accepted is emerging from analyses of heterogeneous EFG family members. Since gene duplication is postulated to be a prominent factor in creating functional novelty, the striking divergence between EFG paralogs can be interpreted in terms of innovation in gene function.
Methodology/Principal Findings
We present a computational study of the EFG protein family that addresses the role of gene duplication in the evolution of protein function. Using phylogenetic methods, genome context conservation and insertion/deletion (indel) analysis, we demonstrate that the EFG gene copies form four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. These ancient gene families differ in their indispensability, degree of divergence and number of indels. We show the distribution of EFG subfamilies and describe evidence for lateral gene transfer and recent duplications. Extended studies of the EFG II subfamily concern its diverged nature. Remarkably, EFG II appears to be a widely distributed and much-diversified subfamily whose subdivisions correlate with phylum or class borders. The EFG II subfamily-specific characteristics are low conservation of the GTPase domain and of domains II and III; absence of the trGTPase-specific G2 consensus motif “RGITI”; and twelve conserved positions common to the whole subfamily. The EFG II-specific functional changes could be related to changes in the properties of nucleotide binding and hydrolysis and to strengthened ionic interactions between EFG II and the ribosome, particularly between parts of the decoding site and loop I of domain IV.
Conclusions/Significance
Our work, for the first time, comprehensively identifies and describes EFG subfamilies and improves our understanding of the function and evolution of EFG duplicated genes.
Introduction
Gene duplication is postulated to have played an important role in prokaryotic evolution; the divergence accumulated in the sequences of new gene copies could be considered as a major contribution to the evolution of novel gene functions [1], [2], [3]. Complete genome sequences have been surveyed for trGTPases [4], [5] but present knowledge does not include systematically structured information concerning EFG duplications in bacteria.
Elongation factor G (EFG) is an indispensable protein present in bacteria (EFG), archaea (aEF2), and eukaryotes (eEF2) [6]. Data gathered since the 1960s concerning EFG are mainly based on the Escherichia coli (E. coli) model system [7], [8]. EFG is the translocase of translation: it catalyzes the movement of the peptidyl-tRNA from the A-site to the P-site and of the deacylated tRNA from the P-site to the E-site of the ribosome [9], [10]. In addition, EFG together with ribosome recycling factor (RRF) participates in the disassembly of the post-termination ribosomal complex [11], [12]. These EFG functions, catalyzing translocation and ribosome recycling, are indispensable to cells.
EFG belongs to the translational GTPase (trGTPase) superfamily, whose bacterial members (IF-2, EF-Tu, EFG, SelB, CysN, RF3, TypA/BipA, LepA, Tet/RPP) are associated with diverse biological roles [13], [14], [15], [16]. Four large families, for which an ancestral protein existed in the last universal common ancestor (LUCA), can be identified [17]. The members of the EFG/EF2 family (EFG, TypA/BipA, LepA, RF3, and Tet/RPP) are successful descendants of the functional diversification resulting from gene duplications.
It is believed that highly expressed genes evolve slowly and that their duplication is avoided or counter-selected, which could be related to the unique structural or functional features that constrain their sequences [18], [19]. However, data obtained from complete bacterial genomes have demonstrated that two highly expressed trGTPase genes, tuf (EF-Tu) and fus (EFG), are often represented by multiple copies [4], [20]. Moreover, EF-Tu duplicates are restricted to a few phylogenetic groups (Proteobacteria, Thermus-Deinococcus and class Clostridia), whereas genomes containing duplicate genes for EFG are represented among all phyla [5]. Compared with EF-Tu, where both copies are almost identical due to gene conversion between paralogues [21], the EFG gene family is significantly divergent, with the paralogues sharing approximately 30–40% identity [5]. To investigate how selective pressures that avoid or favor divergence act on EFG duplicate genes, EFG subfamilies were identified and characterized. Phylogenetic reconstruction of bacterial EFGs revealed that during the course of evolution EFG gene duplicates have evolved under differential selective pressures, resulting in four distinct subfamilies: EFG I; spdEFG1; spdEFG2; and EFG II.
Although the great potential of gene duplication as a process involved in creating biological novelty is well known, there is still not enough information concerning the mechanisms responsible for creating functional divergence. Recently, the functional divergence of different EFG gene duplicates has attracted much attention; independent studies have revealed that EFG functions vary within the EFG family. For example, Connell et al. demonstrated that EF-G-2 in Thermus thermophilus binds and hydrolyzes GTP and is active in poly(Phe) synthesis [22]. Seshadri et al. demonstrated that MsmEF-G-2 in Mycobacterium smegmatis binds guanine nucleotides but lacks the ribosome-dependent GTPase activity characteristic of EFGs [23]. Another study demonstrated that translocation and ribosome recycling, two functions catalyzed by EFG, have been split between EFG paralogues in Borrelia burgdorferi [24]. Therefore, the EFG family provides an interesting example of the fate of a duplicated gene and could be used as a model for in-depth study of changes that arise through gene duplication and divergence.
One of the aims of this study concerning gene duplications was to detect the rearrangements in functional regions of EFG that could be involved in creating altered functions on the same structural template. A large fraction of EFG duplications that have not previously been described were investigated as a separate EFG subfamily (the EFG II subfamily). This group of EFG duplications was chosen owing to its wide distribution among all bacterial species and a high degree of divergence, which could be accompanied by functional novelty. The detailed analysis of the EFG II subfamily is essential for understanding how the duplication events contribute to evolutionary advantage.
Results and Discussion
Identification and characterization of EFG subfamilies
An initial set of 305 complete genomes was used to identify duplications of EFG genes. Because we focused on the determination of EFG subfamilies, data from genomes with a single EFG gene were excluded from this analysis, and the first set of sequences (214 EFG sequences) was limited to the 99 genomes that exhibited multiple EFGs. Phylogenetic trees for determining EFG subfamilies were constructed using Bayesian inference (BI) and maximum likelihood (ML) methods. We show that EFG duplicate genes form four subfamilies within the phylogenetic tree: the EFG I subfamily; the spdEFG1 subfamily; the spdEFG2 subfamily; and the EFG II subfamily (Figure 1).
Two additional types of evidence, conserved insertions or deletions in sequence alignment (conserved indels) and genome context conservation confirmed that two of the EFG subfamilies were distinct groups. Firstly, the conserved genome context characterized the EFG I subfamily, the EFG coding fus gene being located in the str operon. The str operon of E. coli contains the genes for ribosomal proteins S12 (rpsL), S7 (rpsG), and elongation factors EFG (fus), and EF-Tu (tuf) [25]. The genome context conservation analysis was performed on the initial set of 305 genomes; genomes with a single EFG gene and those with multiplied EFG genes were included. Secondly, the indel analysis demonstrated that the spdEFG1 has a specific three amino acid insertion with a consensus “KDG” in the switch I region (Figure S1). This conserved insertion was used to resolve the evolutionary history of the spdEFG1 genes.
We have found that the majority of bacteria studied (97%) have at least one gene for EFG I (Figure 2). The EFG I tree is provided in Figure S2. We highlight that where there is a single EFG gene in a genome, it belongs to the EFG I subfamily without exception (Figure 2). These findings are consistent with EFG's functional importance in the cell. We note that in E. coli, for which there are clear and experimentally well-characterized descriptions of EFG function(s), there is a single EFG gene. The EFG I gene normally resides in the str operon (Figure 2). Therefore, the assumption is that after gene duplication, the original copy of the fus gene (fusA), which maintains the original genome context, evolves under similar constraints in all bacteria and remains stable throughout evolution. However, there are additional EFG I genes that were acquired by LGT or recent duplications and do not reside in the str operon (see below).
The distribution of spdEFG subfamilies (spdEFG1 and spdEFG2) is restricted to three taxonomic divisions: Spirochaetes, Planctomycetes and δ-proteobacteria (Figure 2). The prefix “spd” is composed of the first letters of the taxonomic divisions where these subfamilies were found [26]. The most striking feature of spdEFG1 and spdEFG2 is their co-occurrence in the same genome if there is no gene for EFG I present in that genome (Figure 2). It has been shown previously that spdEFG1 and spdEFG2 form distinct groups with the mitochondrial EFGs mtEFG1 and mtEFG2, respectively [26]. In cells that lack EFG I (in the phyla Spirochaetes, Planctomycetes, and in various species of δ-proteobacteria), the essential functions of EFG I are thought to be carried out by spdEFG1 and spdEFG2. This view is consistent with the recent work of Suematsu et al., who showed that in B. burgdorferi the functions of bacterial EFG are split between EFG paralogues [24]. Similarly, Tsuboi et al. demonstrated that the two functions of bacterial EFG are divided between mtEFG1 and mtEFG2 in human mitochondria [27].
This is the first time that the EFG II subfamily has been characterized as a separate EFG subfamily. Some members of the EFG II subfamily have been recognized by genome annotators as “EFG-2” or “EFG-Like”, and there are two clusters of diverged EFGs (clustering threshold 50% of identity) named “EFG-Like” in Uniprot/KB. These clusters are composed of diverged α-proteobacteria/Cyanobacteria (UniRef50_Q55421) and Actinomycetes (UniRef50_O07170) sequences, which were identified as belonging to the EFG II subfamily in the present study. EFG II sequences comprise the most numerous group of EFG duplicate genes in bacteria. The data presented here demonstrate that in the EFG phylogenetic tree the EFG II subfamily forms a separate branch, which is strongly supported by the high maximum likelihood bootstrap percentage (MLBP 100) and the Bayesian inference posterior probability (BIPP 1.0) (Figure 1).
The EFG II subfamily is highly divergent in its primary sequence; only 18% of positions were conserved within the EFG II subfamily compared with 52% overall conservation within the EFG I subfamily. In contrast to other EFG subfamilies (EFG I, spdEFG1, spdEFG2), a tendency towards an increased rate of evolution (Figure 1) and a vastly increased number of indels were evident in the EFG II subfamily (Figure S1). This could explain why the EFG II gene is always accompanied by another EFG, predominantly EFG I (Figure 2).
The emergence and distribution of EFG subfamilies
EFG subfamilies have emerged from ancient duplications
Well-established phylogenetic methods have demonstrated that EFG duplicate genes form four distinct subfamilies (see above). Three independent observations support the hypothesis that the four EFG subfamilies are the result of ancient duplication. Firstly, deep branches on the EFG phylogenetic tree indicate early divergence from one another (Figure 1). Secondly, the monophyly of spdEFG1 with mtEFG1 provides evidence for a common origin for these proteins [26]. Thirdly, the presence of EFG II in almost all phyla (Figure 2) suggests that the duplication event that gave rise to the EFG II subfamily occurred early in prokaryotic evolution. As the branching order of EFG subfamilies is not unambiguously determined, it complicates the picture of how EFG subfamilies emerged. Therefore, it would be intriguing to question how many gene duplications directly gave rise to those ancient subfamilies, and at which evolutionary stage they apparently took place. However, determining which is the most ancient subfamily of EFG gene duplication(s), and their exact branching order relating to that family, remains outside the scope of current research.
Recent duplications and LGT in EFG subfamilies
Using current data from complete genome sequences, we analyzed how recent duplications and cases of lateral gene transfer (LGT) contribute to EFG subfamilies. Interestingly, recent duplications and LGT between phyla/classes that gave rise to an additional gene have shaped the EFG I subfamily but not the EFG II subfamily (Figure 2). The occurrence of an EFG I type EFG gene outside the str operon in class γ-proteobacteria indicates a successful fixation of sequence(s) acquired laterally, although not all species from γ-proteobacteria share this extra EFG copy (Figure 2). Another single LGT case was detected in Cyanobacteria (Figure 2 and Figure S2). Unfortunately, the role of LGT in spdEFGs could not be resolved owing to the limited number of complete genomes with spdEFG coding genes. The phylogenetic analysis demonstrates that within the EFG I subfamily there is a small fraction of recent duplications (Figure 2 and Figure S2). Recent duplications were identified as the source of the second EFG gene in thirteen genomes (eleven in β-proteobacteria and two in γ-proteobacteria (family Pseudomonas); Figure S2). The high identity of EFG I gene copies at the protein level indicates retention of the original function but does not supply sufficient information to discuss the fate of the duplicates.
Predicting fate of recent duplicates
To investigate how our data fit gene duplicate retention models, we used a model derived from data on small-scale gene duplications [28]. The inputs for this model are the values of dS (substitutions per synonymous site) and dN (substitutions per non-synonymous site), calculated as cumulative values for each pair of sequences using PAL2NAL [29]. The plot of dN as a function of dS was reproduced using equations (4) and (5) of [28], and our data points were added (Figure S3). All data points exceed the lower quintile of the 90% confidence interval of the neo-functionalization model for mammals. When the larger population size and shorter generation time typical of bacteria are considered (the data points shift closer to the mean trend line), our data fit the mammalian neo-functionalization model even better (Figure S3). The gene death rate function of the same model (a Weibull survival function) predicts that 95% of gene duplicates are lost before the gene copy starts to evolve under purifying selection [28]. To find the most parsimonious placement of the gene duplication event on the species tree for the recent duplicates (in β- and γ-proteobacteria), a reconciliation between the gene tree (EFG I) and the species tree was computed using SoftParsMap [30]. Two alternative scenarios of gene duplication are mapped onto the improved species tree (Figure S4). The first scenario, one duplication and ten deletions, leads to a situation where 86% of genomes have lost a duplicate and therefore supports the neo-functionalization model (the Weibull survival function predicts 95% losses). The second scenario, two duplications and four deletions, implies that only 16% of genomes have lost a duplicate (versus the predicted 95%) and therefore contradicts the neo-functionalization model but supports the gene dosage model.
Moreover, the high identity at the protein level between paralogues is in agreement with the increased dosage model of gene duplicate retention, which postulates increased expression from a gene that is already highly expressed and has little mutational capacity [31]. However, as long as the precise position of the gene duplication remains ambiguous and the only parameters we estimate are cumulative values of dN and dS, the prediction of the fate of recent duplicates cannot be more precise.
It is likely that each of the four subfamilies has taken a different evolutionary route to functional diversification. Overall conservation of EFG I together with the widespread appearance of EFG II in bacteria suggests that the presence of both in the genome is the best evolutionary scenario for the majority of bacteria with duplicate EFG genes, in the light of compromise between conservation and innovation. EFG I is considered to be indispensable; any other subfamily alone cannot replace the core function performed by EFG I. However, a pair of spdEFGs can replace EFG I due to the split of EFG I functions between the paralogues (spdEFG1 and spdEFG2) [24]. Therefore, it is very probable that the function(s) that the spdEFG1 and the spdEFG2 perform is not as unique as the function(s) of the EFG II. In addition, the spdEFGs have not been distributed throughout bacteria as successfully as EFG I and the EFG II (Figure 2). The wide distribution of the EFG II subfamily evident today is likely to be an indication of the important role for this type of EFG duplication in the evolution of bacteria.
EFG II phylogeny reveals specific sub-subgroups supported by indels
BI and ML methods were used to reconstruct the phylogeny of 141 EFG II protein sequences gathered from 590 genomes. The EFG II phylogeny is intriguing in two respects. First, the relatively long branches that are characteristic of the EFG II tree indicate a high evolutionary rate for this gene family (Figure 3). Second, the phylogenetic signal at deeper nodes (phylum/class level) is erased. In addition, the deeper branching order is not supported by independent data such as insertions/deletions (indels) (Figure 3).
Indels are considered to be rare genomic changes that are more stable and easier to interpret than point mutations. Alignment regions with gaps were designated as indel regions when the specific insertion or deletion was detected in five or more sequences. Each indel region was labeled with a Roman numeral from I to XI (Figure 3, Figure 4 and Figure S1). Interestingly, indels were prevalent in the EFG II subfamily but uncommon in other EFG subfamilies. Insertions and deletions in EFG II were interpreted as independent data that support the EFG II phylogeny. In addition, the indels could be regions of interest for studying functional changes in EFG II. Generally, two types of indels can be distinguished within the EFG II subfamily: (1) indels with conserved length and/or composition common to groups of closely related sequences, and (2) regions where the majority of EFG II sequences have indels. One of the two indel regions within EFG II where most sequences have indels is region III, which is located in the G′ subdomain. The second indel-rich position in EFG II is region VI, between domains I and II. Both regions predominantly contain deletions, but in β- and γ-proteobacteria there is a non-specific insertion in indel region VI (Figure 3). The number of indels is directly related to the distance from the root of the tree. In particular, the more distant groups of closely related sequences (α-proteobacteria/Cyanobacteria, Actinobacteria, β- and γ-proteobacteria) are highly diverged and possess a large number of indels; groups near the root of the tree (δ-proteobacteria, Clostridia) are less diverged (Figure 3). However, no conserved indels were common to two different groups of closely related sequences. Therefore, it is not possible to use indels to resolve the deep branching order.
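The five-sequence criterion described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `indel_regions`, the toy alignment, and the exact reading of the criterion (a column counts when at least five sequences carry a gap and at least five carry a residue) are assumptions made for illustration only.

```python
from typing import List, Tuple


def indel_regions(alignment: List[str], min_seqs: int = 5,
                  gap: str = "-") -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, 0-based and end-exclusive.

    Assumption: a column belongs to an indel region when at least
    `min_seqs` sequences carry a gap and at least `min_seqs` carry a
    residue, so that both shared deletions and shared insertions count.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    regions = []
    start = None
    for col in range(length):
        gaps = sum(1 for seq in alignment if seq[col] == gap)
        residues = len(alignment) - gaps
        is_indel = min(gaps, residues) >= min_seqs
        if is_indel and start is None:
            start = col                      # open a new region
        elif not is_indel and start is not None:
            regions.append((start, col))     # close the current region
            start = None
    if start is not None:                    # region runs to the last column
        regions.append((start, length))
    return regions


# Invented toy alignment: five sequences share a two-residue deletion.
toy = ["MKAAGTLV"] * 5 + ["MK--GTLV"] * 5
print(indel_regions(toy))  # [(2, 4)]
```

With only four gapped sequences the same columns would not be reported, which mirrors the threshold stated in the text.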
On the EFG II phylogenetic tree, sequences from the same phylum/class form monophyletic groups, with one exception (see below) (Figure 3). The structure of the EFG II phylogenetic tree reveals clearly distinguishable separate groups, termed sub-subgroups, within the EFG II subfamily (Figure 3). These sub-subgroups are identified by the phylogenetic methods used (BI and ML) and by independent data such as conserved indels (Figure 3). Phylum/class names are used to designate the sub-subgroups. Generally, the borders of the sub-subgroups correlate with phylum/class borders; no sequences from another phylum contaminate the sub-subgroups. The one exception is the case in which EFG II sequences from two different phyla (α-proteobacteria and Cyanobacteria) form one sub-subgroup (Figure 3). The common origin of the EFG II sequences forming this sub-subgroup is well supported by both tree construction methods (BIPP 1.0, MLBP 100), by shared deletions in regions III and VI, and by an insertion in region XI (Figure 3).
No LGT was observed between the sub-subgroups, i.e. EFG II is not transferred between sub-subgroups. It is probable that sub-subgroup-specific constraints exist that prevent transfer between sub-subgroups. However, EFG II gene transfer by LGT is evident within sub-subgroups. We found an LGT case inside the α-proteobacteria/Cyanobacteria sub-subgroup, in which a gene originating from Cyanobacteria has been transferred to a fraction of α-proteobacteria. This LGT case is supported by two indels: the six amino acid deletion in region VII and the insertion in region VIII (Figure 3). In addition, in a few cases the incongruence between the 16S rRNA tree and the EFG II tree could be interpreted as LGT within sub-subgroups (two cases in β-proteobacteria and two cases in Actinobacteria) (Figure S5).
Comparison of the EFG I and EFG II subfamily
To reveal the characteristics peculiar to EFG II the variations in its primary sequence were analyzed by comparing domains and consensus elements in EFG I and EFG II. Here a short overview of EFG structural domains and assigned functions is presented.
EFG consists of five structurally well-defined domains [32], [33] (Figure 4). The first domain (GTPase domain) binds and hydrolyzes GTP and is common to all P-loop GTPases. Domains III, IV and V mimic aa-tRNA when it is bound to EF-Tu·GTP in the ternary complex [34]. Domain III affects GTP hydrolysis and translocation [35], and domains IV and V are required for translocation but not for GTP hydrolysis [36], [37]. Translocation and ribosome dissociation into subunits at the end of translation, both functions of EFG, are GTP dependent [8], [38]. The GTPase domain (domain I) contains five consensus elements – G1, G2, G3, G4, and G5 – which form the GTP binding pocket [39], [40] (Figure 4). The overall architecture of the GTPase domain is the same in all P-loop GTPases. The translational GTPases have the family-specific consensus RGITI in G2 [39]. Between G4 and G5 there is an insertion with an approximate length of 90–120 aa, called the G′ subdomain [32], [41].
Domain conservation comparison between EFG I and EFG II
Domain conservation comparison between the EFG I and EFG II subfamilies revealed major differences in the first three domains (domains I, II and III), which affect GTP binding and hydrolysis. These domains are unequally conserved between EFG I and EFG II, whereas domain IV was equally conserved in both subfamilies. The conservation of domains I, II and III was 55%, 47% and 67% in EFG I and 11%, 13% and 15% in EFG II, respectively (Figure 5A). In addition, the relatively short domain V was found to be less conserved in EFG II. The high divergence of the EFG II subfamily is, therefore, predominantly related to the first three domains. It follows that the first three domains in these subfamilies are evolving under different constraints, resulting in divergence within EFG II and homogeneity in the EFG I subfamily.
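A per-domain conservation percentage of this kind can be sketched in Python as follows. The definition of a conserved column used here (one residue present in at least a chosen fraction of all sequences), the default threshold, the function name `percent_conserved`, and the toy alignment are all assumptions for illustration; the criterion actually applied in this study may differ.

```python
from collections import Counter
from typing import List


def percent_conserved(alignment: List[str], start: int, end: int,
                      threshold: float = 0.9, gap: str = "-") -> float:
    """Percentage of columns in [start, end) that count as conserved.

    Assumption: a column is conserved when its most frequent residue
    (gaps ignored) occurs in at least `threshold` of all sequences.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    n_seqs = len(alignment)
    conserved = 0
    for col in range(start, end):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if counts and counts.most_common(1)[0][1] / n_seqs >= threshold:
            conserved += 1
    return 100.0 * conserved / (end - start)


# Invented toy alignment; columns 0-1 and 2-3 stand in for two "domains".
toy = ["ACDE", "ACDF", "ACGH", "ACDK"]
print(percent_conserved(toy, 0, 2, threshold=0.75))  # 100.0
print(percent_conserved(toy, 2, 4, threshold=0.75))  # 50.0
```

Running the function over the column ranges of each domain, separately for the EFG I and EFG II alignments, would give values comparable in spirit to those quoted above, although the numbers depend on the threshold chosen.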
To exclude the possibility that the observed high divergence within the first three domains is caused by sub-subgroup-specific conservation of these domains, domain conservation analysis for sub-subgroups containing at least 20 sequences was carried out (Clostridia and α-proteobacteria & Cyanobacteria). The overall domain conservation was higher, and differences between domain conservations were smaller, among sub-subgroups. Furthermore, the EFG II subfamily-specific divergence of the first three domains was confirmed at the sub-subgroup level (Figure S6).
Motif conservation comparison between EFG I and EFG II
The GTPase domain consensus elements G1 (GhxxxGKT), G3 (DxPG), G4 (NKxD) and G5 (gSAx) were conserved in the EFG II subfamily. Moreover, the negatively charged region in the G′ subdomain, which interacts with the L7/L12 stalk on the ribosome and is crucial for inducing GTP hydrolysis [42], [43], [44], is also conserved (Figure 5). Intriguingly, the trGTPase-specific consensus RGITI in the G2 motif is relaxed in the EFG II subfamily. The degenerate consensus of the G2 motif in EFG II is xxxSx. RGITI contains a specific Thr, which coordinates the Mg2+ ion of the GTPase-bound guanine nucleotide [40]. In EFG II, Ser instead of Thr was conserved in the fourth position of the G2 motif. However, Ser instead of Thr has been observed in several P-loop GTPases (SelB – A. aeolicus; aIF-2γ – M. jannaschii; and the kinesin-myosin family) [17]. Therefore, it is concluded that the crucial position in the G2 motif (Thr/Ser), which is part of the universal ‘spring loaded’ switch mechanism for G proteins [45], is maintained.
To determine whether the G2 motif conservation is maintained among closely related EFG II sequences, the G2 motif variants of the EFG II sub-subgroups were analyzed. The EFG II sub-subgroup-specific G2 motif variants are as follows: RxxT/SI (δ-proteobacteria), xxHSL (γ- and β-proteobacteria), qqRSV (Actinobacteria), R/HxMS/GV (α-proteobacteria and Cyanobacteria), r/kGxSx (Thermotogae), r/kxxSI (Chloroflexi), RxxSI (Clostridia), YGYSV (Bacteroidetes), and rxhSl (Chlorobi) (Figure 6). Overall divergence in the G2 motif of EFG II is associated with two types of changes: (a) the trGTPase-specific consensus RGITI is changed to the sub-subgroup-specific G2 motif variant and (b) Thr is replaced with Ser or, exceptionally, with Gly (Figure 6).
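Consensus strings of the kind listed above (uppercase for strongly conserved residues, lowercase for weaker conservation, x for variable positions) can be derived from aligned motif instances with a short Python sketch. The two frequency cut-offs, the function name `motif_consensus`, and the example sequences are hypothetical and chosen only for illustration; alternatives written with a slash (for example T/S) are not handled.

```python
from collections import Counter
from typing import List


def motif_consensus(motifs: List[str], strong: float = 0.9,
                    weak: float = 0.5) -> str:
    """Build a consensus string from equal-length motif instances.

    Assumed convention: uppercase if the top residue reaches `strong`,
    lowercase if it reaches `weak`, and 'x' otherwise.
    """
    if not motifs:
        raise ValueError("at least one motif is required")
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("motifs must have equal length")

    consensus = []
    for i in range(length):
        residue, count = Counter(m[i] for m in motifs).most_common(1)[0]
        fraction = count / len(motifs)
        if fraction >= strong:
            consensus.append(residue.upper())
        elif fraction >= weak:
            consensus.append(residue.lower())
        else:
            consensus.append("x")
    return "".join(consensus)


# Invented motif instances, not taken from the EFG II alignment.
print(motif_consensus(["RAKSI", "RGQSI", "RTHSI", "RLESI"]))  # RxxSI
```

A fully conserved input such as ten copies of RGITI returns RGITI unchanged, which corresponds to the trGTPase-specific consensus discussed in the text.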
Conserved and relaxed regions on the surface of EFG
The relative site-specific substitution rates for the EFG subfamilies were calculated using Rate4Site [46] and the ConSurf web server [47]. One of the advantages of ConSurf in comparison to other methods is the accurate computation of the evolutionary rate using either an empirical Bayesian method or a maximum likelihood (ML) method [48]. Thus, it can correctly discriminate between conservation due to short evolutionary time and genuine sequence conservation.
The ConSurf results for the EFG I and EFG II subfamilies are mapped onto the surface of the crystal structure (Figure 7B and C, respectively). The analysis reveals high conservation of the ribosome-facing surface of EFG I, whereas the same region is relaxed in EFG II (Figure 7, left, B and C, respectively). In contrast, the opposite sides are equally highly variable in both subfamilies (Figure 7, right, B and C, respectively). Two regions, the tip of the G′ subdomain and the tip of domain IV, show moderately higher conservation in the EFG II than in the EFG I subfamily (Figure 7, right, B and C, respectively). Generally, the ConSurf analysis correlates well with the domain conservation comparison (see above) and complements the observed relaxation of the first three domains of EFG II by localizing the subfamily-specific relaxation to the ribosome-facing surface of EFG.
Conserved positions in the EFG II subfamily
Comparison of the conserved positions in EFG II (127 positions) with the conserved positions in EFG I (360 positions) revealed that the former are a subset of the latter, with a few exceptions (Figure S7). Those exceptions fall into two categories. The first category consists of the five positions where different amino acids are conserved in the EFG I and EFG II subfamilies (type I conserved positions). The second category consists of seven positions that are relaxed in EFG I but are under stronger selection in the EFG II subfamily (type II conserved positions).
Each of the five type I conserved positions is associated with substantial changes in physical-chemical properties (Table 1). These five positions are restricted to the first two domains, the GTPase domain and domain II. The first two positions, 16 and 25, are in the P-loop (numbering follows the T. thermophilus EFG-2 structure 1WDT). The conserved Gly16 (Ala in EFG I) increases hydrophilicity, and Leu25 (Thr in EFG I) increases hydrophobicity (Table 1). The other three type I conserved positions (Thr-291, Gly-333, and Lys-352) are in domain II (Table 1) and increase hydrophilicity. Seven type II conserved positions were identified in the EFG II subfamily (Table 2). Type II conserved positions are more uniformly distributed over EFG than type I conserved positions: three are located in the GTPase domain, three in domain IV and one in domain V (Table 2). Type II conserved positions are not associated with considerable changes in physical-chemical properties.
Table 1. Type I conserved positions.
Position (1) | EFG I amino acid | EFG I hp index (2) | EFG I cons % (3) | EFG II amino acid | EFG II hp index (2) | EFG II cons % (3) | Location
16 (19) | Ala | 1.8 | 100 | Gly | −0.4 | 86 | Domain I (GTPase domain)
25 (28) | Thr | −0.7 | 95 | Leu | 3.8 | 99 | Domain I (GTPase domain)
61 (64) | Thr | −0.7 | 100 | Ser | −0.8 | 76 | Domain I (GTPase domain)
291 (316) | Ile | 4.5 | 87* | Thr | −0.7 | 81 | Domain II
333 (360) | Ala | 1.8 | 80 | Gly | −0.4 | 98 | Domain II
352 (379) | Gly | −0.4 | 100 | Lys (Arg) | −3.9 | 86 (14)** | Domain II
(1) Amino acid positions are numbered according to the T. thermophilus EFG-2 structure 1WDT. An alternative numbering (EFG-1 of T. thermophilus) is given in brackets.
(2) Hydropathy index (a positive value indicates hydrophobicity and a negative value indicates hydrophilicity) [71].
(3) Amino acid conservation is given in %.
*Substitution of Ile with Val or Leu results in a minimal change in hydrophobicity.
**Replacement of Lys by Arg retains the positive charge in this position.
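The direction and size of the hydropathy change at each position in Table 1 can be checked with a short Python sketch. The hydropathy values and residue pairs are copied from Table 1; the names `HYDROPATHY`, `TYPE_I` and `hydropathy_shift` are introduced here for illustration and are not part of the original analysis.

```python
# Hydropathy index values as listed in Table 1.
HYDROPATHY = {
    "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Leu": 3.8,
    "Ser": -0.8, "Ile": 4.5, "Lys": -3.9,
}

# (position, residue conserved in EFG I, residue conserved in EFG II)
TYPE_I = [
    (16, "Ala", "Gly"),
    (25, "Thr", "Leu"),
    (61, "Thr", "Ser"),
    (291, "Ile", "Thr"),
    (333, "Ala", "Gly"),
    (352, "Gly", "Lys"),
]


def hydropathy_shift(efg1_residue: str, efg2_residue: str) -> float:
    """EFG II index minus EFG I index; negative means more hydrophilic."""
    return round(HYDROPATHY[efg2_residue] - HYDROPATHY[efg1_residue], 1)


for position, res1, res2 in TYPE_I:
    print(position, res1, "->", res2, hydropathy_shift(res1, res2))
```

A negative shift indicates increased hydrophilicity in EFG II and a positive shift indicates increased hydrophobicity; position 25 is the only row with a positive shift.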
Table 2. Type II conserved positions.
Position (1) | EFG I amino acids (2) | EFG I % (3) | EFG II amino acid | EFG II % | Difference (4) % | Location
216 (224) | D, S, n * | 61, 26, 10 | D | 88 | 27 | Domain I (GTPase domain)
250 (258) | V, M, a | 42, 33, 10 | V | 90 | 48 | Domain I (GTPase domain)
264 (272) | L, M, V | 38, 36, 19 | L | 88 | 50 | Domain I (GTPase domain)
471 (498) | V, K, I | 45, 35, 15 | K | 84 | 39 | Domain IV
472 (499) | K, R, h | 56, 43, 1 | K | 89 | 33 | Domain IV
513 (543) | E, D, n | 36, 35, 6 | E | 87 | 51 | Domain IV
603 (633) | G, a, d | 71, 9, 9 | G | 96 | 25 | Domain V
(1) Amino acid positions are numbered according to the T. thermophilus EFG-2 structure 1WDT. An alternative numbering (EFG-1 of T. thermophilus) is given in brackets.
(2) The three most represented amino acids, in single-letter code separated by commas. An amino acid is shown in lower case when its conservation is <10%.
(3) Percentage of conservation corresponding to the amino acids found in these positions.
(4) Only those positions are shown where the difference in conservation of the most conserved amino acid exceeds 25% between the EFG I and EFG II subfamilies.
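The difference column of Table 2 can be reproduced from the two conservation columns with a minimal Python sketch. The percentages are copied from Table 2; the names `TABLE_2` and `conservation_gain` are hypothetical, and the calculation (EFG II percentage minus the percentage of the most represented EFG I residue) is our reading of the table footnote rather than a documented procedure.

```python
# position: (% of the most represented EFG I residue,
#            % of the conserved EFG II residue), values from Table 2
TABLE_2 = {
    216: (61, 88),
    250: (42, 90),
    264: (38, 88),
    471: (45, 84),
    472: (56, 89),
    513: (36, 87),
    603: (71, 96),
}


def conservation_gain(position: int) -> int:
    """Percentage-point gain in conservation from EFG I to EFG II."""
    efg1_top, efg2_top = TABLE_2[position]
    return efg2_top - efg1_top


for position in sorted(TABLE_2):
    print(position, conservation_gain(position))
```

Note that at position 471 the most represented EFG I residue (V) differs from the residue conserved in EFG II (K), so the difference there compares two different amino acids.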
EFG II specific conserved positions point to changed functionality
To investigate how rearrangements in functional regions could influence the capability of EFG II to perform the translocase function, a set of positions that could be associated with altered functionality was analyzed. EFG II-specific conserved positions (five type I and seven type II conserved positions) fall within the functionally important regions in the GTPase domain (domain I) and domains II, IV and V. To avoid limiting the analysis of these changes to the EFG II primary sequence, these positions were mapped onto the tertiary structure of EFG (1WDT) and onto the structure of EFG with the ribosome in the pseudo-posttranslocational state [22] and the posttranslocational state [49].
Type I conserved positions have an effect on the GTPase domain and domain II
Positions 16 (Ala/Gly in EFG I/II respectively) and 25 (Thr/Leu) in the P-loop (Table 1) are located in the GTPase domain (Figure 8B). The GTPase domain binds and hydrolyzes GTP [45]. This activity is associated with the binding of EFG to the ribosome and translocation [50], [51], and with dissociation of the post-termination complex [38]. Three differentially conserved positions are located in domain II. These positions are 291 (Ile/Thr in EFG I/II respectively), 333 (Ala/Gly) and 352 (Gly/[Lys,Arg]) (Table 1 and Figure 8A). Domain II contacts the 30S subunit, but no definite function has been assigned to this domain. It has been shown that domain II interacts with EFG domains I and III and with helices 5 and 15 (h5 and h15) of the 16S ribosomal RNA [22], [49].
Position 25 (the last position in the P-loop) contains a well-conserved (99%) Leu that increases hydrophobicity. In the 1WDT structure, Leu25 is located close to helix E1 and the G5 motif. Next to the G5 motif (7 amino acids towards the C terminus), another EFG II-specific conserved hydrophobic amino acid, Leu264, was identified (Figure S7). In the crystal structure (1WDT) the van der Waals radii of these two amino acids (Leu25 and Leu264) are in contact (Figure 8B), which indicates a hydrophobic interaction between them. Moreover, the results demonstrate that Leu264 (the interaction partner of Leu25) is highly conserved (83%) in EFG II but not in EFG I (Table 2). These observations support the presence of EFG II-specific hydrophobic interactions inside the GTPase core domain, which strengthen the interaction between the P-loop and the G5 motif. This interaction increases the tightness of the GTPase core domain and also decreases the flexibility of the P-loop.
The crystal structures do not reveal any interactions between positions 16 and 25 (Gly16 and Leu25 in EFG II) and the bound nucleotide. For position 16 it has been demonstrated that replacing Ala with Gly in aEF-2 of Sulfolobus solfataricus increases intrinsic GTP hydrolysis (measured in the absence of ribosomes) and decreases the poly(Phe) synthesis rate [52]. More importantly, Connell et al. showed that EFG-2 (EFG II) of T. thermophilus has higher intrinsic GTPase activity and a slightly lower poly(Phe) synthesis rate in cell-free assays compared with EFG-1 (EFG I) [22]. On the basis of these data, we propose that the conservation of Gly in position 16 (conserved 86%) is related to higher intrinsic GTPase activity in EFG II. Position 25, which is 99% conserved in EFG II, is likely to have the potential to modulate GTPase activity.
Domain II has not been studied extensively, and no specific function has been assigned to it, which makes it difficult to propose functional roles for the differentially conserved positions (type I conserved positions) located in this domain (positions 291, 333 and 352) (Figure 8A). In EFG II, Gly in position 333 (Ala in EFG I), which is located in the loop between β sheets 8₂ and 9₂ facing switch I of the GTPase domain, increases hydrophilicity, which could influence the interaction between switch I and domain II. The Lys in position 352 increases the positive charge at the proximal tip of β sheet 7₂ and contributes to an interaction with the backbone of the conserved uridines U367 and U368 in helix 15 (h15) of the 16S rRNA (Figure 8C) [22]. The interaction between h15/h5 and domain II of EFG is also detected in the structure (2WRK) where the ribosome is trapped with EFG in the posttranslocational state [49]. The same proximity between the β-barrel domain II and h15/h5 is present in ribosomes in the pre-translocational intermediate state (TIPRE) [53]. Therefore, Lys352 has the potential to influence the interaction between EFG II and the ribosome throughout different states of translocation. Two aspects of these three type I conserved amino acids located in domain II are worth highlighting. First, all three amino acid changes increase hydrophilicity (Table 1); second, each of these three amino acids points towards a different interaction partner of domain II (Figure 8B).
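The direction of the hydrophobicity changes discussed above can be checked with a short calculation. The sketch below assumes that the Kyte-Doolittle hydropathy scale [71] is an appropriate measure; the function and variable names are illustrative, and for position 352 only the Lys variant is shown.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
# Only the residues discussed in the text are listed.
KYTE_DOOLITTLE = {
    "A": 1.8, "G": -0.4, "I": 4.5, "T": -0.7,
    "L": 3.8, "K": -3.9, "R": -4.5,
}

# Type I conserved positions: (position, EFG I residue, EFG II residue).
# Position 352 is Lys or Arg in EFG II; Lys is used here for illustration.
TYPE_I_POSITIONS = [
    (16, "A", "G"),
    (25, "T", "L"),
    (291, "I", "T"),
    (333, "A", "G"),
    (352, "G", "K"),
]


def hydropathy_shift(efg1_residue, efg2_residue):
    """Return the change in hydropathy going from the EFG I to the EFG II residue."""
    return KYTE_DOOLITTLE[efg2_residue] - KYTE_DOOLITTLE[efg1_residue]


for position, res1, res2 in TYPE_I_POSITIONS:
    shift = hydropathy_shift(res1, res2)
    trend = "more hydrophobic" if shift > 0 else "more hydrophilic"
    print(f"{position}: {res1}->{res2}  shift={shift:+.1f}  ({trend})")
```

On this scale the three domain II substitutions (291, 333 and 352) all shift towards hydrophilicity, whereas the Thr-to-Leu change at position 25 shifts towards hydrophobicity, in line with the description in the text.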
Type II conserved positions and translocation
Whereas type I conserved positions were identified in the first two domains, the type II conserved positions were located in domains I, IV and V. Asp216, Val250, and Leu264 are the three type II conserved positions located in the GTPase domain (Table 2 and Figure 8A). Val250 is turned towards the N-terminal part of the G′ subdomain, but owing to the low conservation of the closest hydrophobic amino acids in the G′ subdomain, no specific interactions were identified. However, considering that Val250 and Leu264 surround the G5 motif, they are probably related to modified properties of the nucleotide-binding center.
Two of the type II conserved positions (471 and 472) are located in domain IV, which is required for translocation [36]. These two conserved Lys residues increase the positive charge of the loop I region (Table 2). More intriguingly, two additional adjacent positions, 469 and 470, contribute to the positive charge of that region (Figure 9A). These positions do not meet the threshold for single-position conservation and are therefore not shown in Table 2. However, together the four consecutive positions form a single positively charged motif. To illustrate its interaction with the negatively charged backbone of rRNA and tRNA, the amino acid residues of loop I were modified in silico to those conserved in EFG II (Figure 9). It has previously been shown that replacing Lys with hydrophobic Ile in position 496 reduces the poly(Phe) synthesis efficiency more than twofold [54]. Therefore, it is assumed that translation efficiency depends on the strength of the interaction between EFG and the decoding center, and that a stronger interaction could increase translocation efficiency, particularly under physiological conditions where it could be critical.
The divergent nature of the EFG II subfamily encourages us to ask what function(s) this protein actually performs. On the one hand, in the EFG II subfamily the weakened selection on duplicated genes can be observed as a vastly increased evolutionary rate and an increased number of indels. On the other hand, among members of the EFG II subfamily there is particularly intense selection for certain characteristics, such as positions that are conserved throughout the entire subfamily. The presence of conserved characteristics in the otherwise highly diverged sequences of EFG II, which appear to correlate with unique functional peculiarities, can guide and inform the design of future experiments in this area of research. Our results suggest that EFG II specializes in some of the roles assigned to EFG I, but the possibility of a functional shift should also be considered. The positions that are differentially conserved in EFG I and EFG II (type I conserved positions) and the positions under stronger selection in EFG II (type II conserved positions) are the specific characteristics that provide information about functional divergence. They pinpoint a set of features that open the door to further biochemical studies targeting the altered functionality of EFG.
Materials and Methods
Identifying EFG sequences
EFG protein sequences were identified using HMMSEARCH [55] and TBLASTN [56] according to the procedure described by Margus et al. 2007 [5]. Searches were performed against the NCBI Ref-Seq database of completed bacterial genomes. Three sets of EFG sequence data were used: the first contained 214 EFG sequences from 99 genomes with multiple fus genes; the second contained EFG I sequences from genomes with single and multiple fus genes; the third contained 141 EFG II sequences collected from 590 genomes. The first two sets were based on the Ref-Seq database of completed genomes dated October 2006, and the third set was based on the Ref-Seq database as of March 2008 [57].
Computing multiple sequence alignments
The preliminary alignment of the first dataset was carried out with MAFFT version 5.861 [58] using strategy L-INS-i. Two highly diverged EFGs from Leptospira interrogans were excluded from the dataset used for tree building because of extensive deletions within the sequence. The final alignment was computed with T-COFFEE [59] where, in addition to the default methods, the results of threading to the EFG tertiary structure 1FNM with FUGUE [60] were taken into account. The dataset was split into 50 sequence groups; each contained the corresponding guide sequence (gi|55981664) and the reference to the structure (1FNM) for threading. The computed alignments were merged into one alignment and the guide sequences were removed. This alignment was used for computing the phylogenetic tree of EFG subfamilies. Alignments for computing the phylogeny of EFG I and EFG II and for determining indels were computed by MAFFT using strategy L-INS-i [58].
Estimating conserved positions
The EFG alignment was modified by removing all insertions relative to Thermus thermophilus EFG I (gi|55981664). Sequence logos for EFG subgroup alignments were calculated using the Sequence Logo website (version 2.8) [61]. The EFG I subfamily contained 114 sequences and the EFG II subfamily contained 140 sequences. These 114 EFG I sequences adequately represent the conservation/variation pattern specific to the EFG I subfamily, and incorporating more sequences from genomes with a single EFG gene does not change our results. A position was counted as conserved if the height of the sequence logo was at least three bits.
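The three-bit criterion can be illustrated with a minimal sketch of the information content of an alignment column. It assumes a 20-letter amino acid alphabet, ignores gap characters and applies no small-sample correction, so the values may differ slightly from those reported by the logo software; the function names and the toy alignment are illustrative only.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies; no small-sample correction is applied.
    """
    residues = [r for r in column if r not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def conserved_positions(alignment, threshold_bits=3.0):
    """Return 1-based column indices whose information content reaches the threshold."""
    return [
        index
        for index, column in enumerate(zip(*alignment), start=1)
        if column_information(column) >= threshold_bits
    ]


# Toy alignment (hypothetical sequences, not taken from the EFG datasets).
toy_alignment = ["GAK", "GSK", "GTK", "GAR"]
print(conserved_positions(toy_alignment))
```

In the toy alignment the invariant first column and the mostly invariant third column pass the threshold, whereas the variable second column does not.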
Estimating position specific amino acid substitution rates
The relative site-specific substitution rates for EFG subfamilies were calculated using Rate4Site [46] and the ConSurf web server (http://consurf.tau.ac.il/) [47]. Alignments were computed by MAFFT using strategy L-INS-i [58]. Sequences more than 97% identical were removed from the EFG I dataset, which resulted in 190 sequences (EFG I from all genomes used); the EFG II subfamily contained 140 sequences. The run was carried out using PDB code 1FNM, and the surface plot was generated using the PyMOL script output by ConSurf [47].
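One way to apply an identity cutoff of this kind is a greedy filter over an existing alignment. The sketch below is an assumption about how such a filter could be implemented, not a description of the tool actually used; identity is computed over columns where neither sequence has a gap, and all names and sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def filter_redundant(aligned, max_identity=97.0):
    """Greedily keep sequences that are at most max_identity % identical to every kept one.

    `aligned` maps sequence names to aligned sequences of equal length;
    the result depends on the input order.
    """
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept


# Hypothetical aligned sequences: seq2 is 99% identical to seq1, seq3 is 90% identical.
toy = {
    "seq1": "A" * 100,
    "seq2": "A" * 99 + "C",
    "seq3": "A" * 90 + "C" * 10,
}
print(list(filter_redundant(toy)))
```

Because the filter is greedy, the first member of each near-identical group is retained; a different input order can retain a different representative.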
Methods used to predict fate of recent duplicates
The ratio of non-synonymous to synonymous substitution rates was estimated using the codon models of sequence evolution implemented in CodeML [62]. Values of dS (substitutions per synonymous site) and dN (substitutions per non-synonymous site) were calculated as cumulative values for each pair of sequences using PAL2NAL [29]. When the dN/dS ratio (ω) is much lower than one (ω≪1), the gene is considered to be under purifying selection; when it is close to one (dN/dS∼1), the gene is considered to evolve under the neutral model (no selection). The accumulation of mutations is considered to be close to saturation when dS>3, and such pairs were removed from further analysis. To plot dN as a function of dS (Figure S3), equations (4) and (5) with predetermined values of free parameters [28] and the statistical software R [63] were used. Data points of dN and dS, determined for recent duplicates of EFG I genes, were added to the figure. To place the gene duplication event(s) for recent duplicates in β- and γ-proteobacteria on the species tree, a reconciliation tree between the gene tree (EFG I) and the species tree was computed with SoftParsMap [30]. A 16S rRNA-based species tree and an EFG I protein sequence-based tree were used as input for SoftParsMap [30]. Two alternative scenarios of gene gain and loss were mapped onto the improved species tree (Figure S4).
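The decision rules described above can be written down as a small classification function. This is a sketch only: the saturation cutoff (dS>3) comes from the text, whereas the tolerance used to decide that ω is "close to one" is a hypothetical value chosen for illustration, and the input values in the example are invented.

```python
def classify_pair(dn, ds, saturation_ds=3.0, neutral_tolerance=0.2):
    """Classify a pair of duplicated genes from its dN and dS values.

    saturation_ds follows the dS > 3 rule in the text; neutral_tolerance
    is an assumed cutoff for "omega close to one".
    """
    if ds > saturation_ds:
        return "saturated"  # pair excluded from further analysis
    if ds == 0:
        return "undefined"  # omega cannot be computed
    omega = dn / ds
    if abs(omega - 1.0) <= neutral_tolerance:
        return "neutral"
    if omega < 1.0:
        return "purifying selection"
    return "omega above one"


# Invented example values, not data from this study.
for dn, ds in [(0.05, 1.0), (0.95, 1.0), (0.5, 3.5)]:
    print(dn, ds, classify_pair(dn, ds))
```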
Determining the type I and type II conserved positions
Positions that are highly conserved in the EFG I subfamily but carry a different conserved amino acid in EFG II were identified (type I conserved positions). A preliminary set of such positions was obtained using the conservation criterion (3 bits). Only those positions where conservation of the different amino acids exceeds 80% in both subfamilies were selected. Positions that are conserved in EFG II but relaxed in EFG I (type II conserved positions) were also identified. In addition to the position conservation criterion (3 bits), a criterion for amino acid conservation in EFG II (80%) was applied. Furthermore, the difference in amino acid conservation between the subfamilies had to exceed 25%.
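The two sets of criteria can be summarized in a short sketch. It assumes that per-position residue percentages and logo heights (in bits) are already available for both subfamilies, and that the difference for type II positions is taken between the most conserved amino acid of each subfamily, as in Table 2. The function names are illustrative, and the bit values and the EFG I percentage for position 25 in the example are hypothetical.

```python
def top_residue(freqs):
    """Return (residue, percent) for the most frequent residue at a position."""
    residue = max(freqs, key=freqs.get)
    return residue, freqs[residue]


def classify_position(freqs_efg1, freqs_efg2, bits_efg1, bits_efg2,
                      min_bits=3.0, min_percent=80.0, min_difference=25.0):
    """Classify one alignment position as 'type I', 'type II' or None."""
    res1, pct1 = top_residue(freqs_efg1)
    res2, pct2 = top_residue(freqs_efg2)
    # Type I: conserved in both subfamilies, but with different amino acids.
    if (bits_efg1 >= min_bits and bits_efg2 >= min_bits and res1 != res2
            and pct1 > min_percent and pct2 > min_percent):
        return "type I"
    # Type II: conserved in EFG II, relaxed in EFG I.
    if (bits_efg2 >= min_bits and pct2 >= min_percent
            and pct2 - pct1 > min_difference):
        return "type II"
    return None


# Position 250: percentages from Table 2, bit values hypothetical.
print(classify_position({"V": 42, "M": 33, "A": 10}, {"V": 90}, 1.5, 3.6))
# Position 25: Thr in EFG I, Leu (99%) in EFG II; EFG I percentage and bits hypothetical.
print(classify_position({"T": 95}, {"L": 99}, 3.9, 4.2))
```

Checking the type I rule first keeps the two classes mutually exclusive in this sketch.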
Computing phylogenetic trees
Bayesian tree searching was carried out using MrBayes 3.1.2 [64], [65] and a mixture of amino acid substitution models. Maximum likelihood trees were calculated with RAxML-VI-HPC 2.2.3 [66] using the PROTCATWAG amino acid substitution model. A gamma distribution with the α shape parameter estimated by the programs was used. Tree manipulations (computing the consensus tree from RAxML bootstraps, joining groups in the tree and other simple manipulations) were carried out with MEGA3 [67].
For computing species trees, pre-aligned 16S rRNA sequences were downloaded from RDP II [68]. Bayesian tree searching was carried out with MrBayes 3.1.2 [64] under the GTR+I+Γ model for up to 1 million iterations. For the 214 EFG protein sequences (excluding the second EFGs of Leptospira interrogans) from 99 genomes (first dataset), Bayesian tree searching was run for 2.5 million iterations.
To calculate the tree for EFG I, the second dataset was used. For rooting, one Pirellula EFG (gi|32475048, belonging to spdEFG2) was added. The multiple sequence alignment was generated with MAFFT version 5.861 [58] using strategy L-INS-i. Bayesian tree searching was run for up to 2.14 million iterations. For the EFG II subfamily (third dataset), Bayesian tree searching was run for 5 million iterations, and a maximum likelihood tree was calculated and bootstrapped 500 times.
Finding genome context conservation
To determine the genome context of EFG genes, the orthologs of E. coli genes in other genomes were determined by INPARANOID [69]. For clustering genes with a similar set of neighboring genes, five genes before and five genes after the gene of interest (the EFG gene) were taken into account, without considering gene order. The distances between the queried genes (EFG genes) were calculated on the basis of the number of shared surrounding genes. A distance matrix was calculated in a format that served as input for the program NEIGHBOR from the PHYLIP package [70]. This approach was useful for identifying EFGs in the str operon. In other cases, the calculated similarity was rechecked manually, because the capacity of the method to find similar genes is restricted to the gene repertoire of E. coli.
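The neighborhood comparison can be sketched as follows. The text states only that distances were based on the number of shared surrounding genes, so the exact formula used here (one minus the fraction of shared neighbors out of the ten possible) is an assumption, as are the function names and the toy gene orders; genes are represented by the name of their E. coli ortholog, and genome circularity is ignored.

```python
def neighborhood(gene_order, index, flank=5):
    """Return the set of ortholog labels for up to `flank` genes on each side of `index`."""
    start = max(0, index - flank)
    before = gene_order[start:index]
    after = gene_order[index + 1:index + 1 + flank]
    return set(before) | set(after)


def context_distance(neigh_a, neigh_b, flank=5):
    """Assumed distance: 1 minus the shared neighbors divided by the maximum of 2 * flank."""
    shared = len(neigh_a & neigh_b)
    return 1.0 - shared / (2 * flank)


def distance_matrix(neighborhoods, flank=5):
    """Return (names, matrix) of pairwise context distances between EFG genes."""
    names = list(neighborhoods)
    matrix = [
        [0.0 if a == b else context_distance(neighborhoods[a], neighborhoods[b], flank)
         for b in names]
        for a in names
    ]
    return names, matrix


# Toy gene orders labeled by E. coli ortholog names (hypothetical layouts).
genome_a = ["rpsL", "rpsG", "fusA", "tufA", "rpsJ"]
genome_b = ["rpsL", "rpsG", "fusA", "tufA", "rplC"]
neighborhoods = {
    "EFG_a": neighborhood(genome_a, genome_a.index("fusA")),
    "EFG_b": neighborhood(genome_b, genome_b.index("fusA")),
}
names, matrix = distance_matrix(neighborhoods)
print(names, matrix)
```

Using sets makes the comparison independent of gene order, as described in the text; the resulting matrix would still need to be written out in the input format expected by the tree-building program.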
Supporting Information
Acknowledgments
We thank Gemma C. Atkinson for helpful discussion of phylogeny and Ülo Maiväli for critical reading of the manuscript. We thank Phillip Endicott and Djuddah A.J. Leijen for correcting language and helpful hints for writing.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was funded by grants SF0180026s09 and SF0180166s08 from the Estonian Ministry of Education and Research and by the EU through the European Regional Development Fund through the Estonian Centre of Excellence in Genomics and through the Center of Excellence in Chemical Biology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Gevers D, Vandepoele K, Simillon C, Van de Peer Y. Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends Microbiol. 2004;12:148–154. doi: 10.1016/j.tim.2004.02.007.
- 2. Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV. Selection in the evolution of gene duplications. Genome Biol. 2002;3:RESEARCH0008. doi: 10.1186/gb-2002-3-2-research0008.
- 3. Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010;11:97–108. doi: 10.1038/nrg2689.
- 4. Pandit SB, Srinivasan N. Survey for g-proteins in the prokaryotic genomes: prediction of functional roles based on classification. Proteins. 2003;52:585–597. doi: 10.1002/prot.10420.
- 5. Margus T, Remm M, Tenson T. Phylogenetic distribution of translational GTPases in bacteria. BMC Genomics. 2007;8:15. doi: 10.1186/1471-2164-8-15.
- 6. Caldon CE, March PE. Function of the universally conserved bacterial GTPases. Curr Opin Microbiol. 2003;6:135–139. doi: 10.1016/s1369-5274(03)00037-7.
- 7. Nishizuka Y, Lipmann F. Comparison of guanosine triphosphate split and polypeptide synthesis with a purified E. coli system. Proc Natl Acad Sci U S A. 1966;55:212–219. doi: 10.1073/pnas.55.1.212.
- 8. Pestka S. Studies on the formation of transfer ribonucleic acid-ribosome complexes. V. On the function of a soluble transfer factor in protein synthesis. Proc Natl Acad Sci U S A. 1968;61:726–733. doi: 10.1073/pnas.61.2.726.
- 9. Rheinberger HJ, Nierhaus KH. Testing an alternative model for the ribosomal peptide elongation cycle. Proc Natl Acad Sci U S A. 1983;80:4213–4217. doi: 10.1073/pnas.80.14.4213.
- 10. Liljas A. 2004. Structural Aspects of Protein Synthesis: World Scientific Publishing Co. Pte. Ltd.
- 11. Hirashima A, Kaji A. Role of elongation factor G and a protein factor on the release of ribosomes from messenger ribonucleic acid. J Biol Chem. 1973;248:7580–7587.
- 12. Hirokawa G, Demeshkina N, Iwakura N, Kaji H, Kaji A. The ribosome-recycling step: consensus or controversy? Trends Biochem Sci. 2006;31:143–149. doi: 10.1016/j.tibs.2006.01.007.
- 13. Caldon CE, Yoong P, March PE. Evolution of a molecular switch: universal bacterial GTPases regulate ribosome function. Mol Microbiol. 2001;41:289–297. doi: 10.1046/j.1365-2958.2001.02536.x.
- 14. Inagaki Y, Doolittle WF, Baldauf SL, Roger AJ. Lateral transfer of an EF-1alpha gene: origin and evolution of the large subunit of ATP sulfurylase in eubacteria. Curr Biol. 2002;12:772–776. doi: 10.1016/s0960-9822(02)00816-3.
- 15. Connell SR, Trieber CA, Dinos GP, Einfeldt E, Taylor DE, et al. Mechanism of Tet(O)-mediated tetracycline resistance. EMBO J. 2003;22:945–953. doi: 10.1093/emboj/cdg093.
- 16. Owens RM, Pritchard G, Skipp P, Hodey M, Connell SR, et al. A dedicated translation factor controls the synthesis of the global regulator Fis. EMBO J. 2004;23:3375–3385. doi: 10.1038/sj.emboj.7600343. [Retracted]
- 17. Leipe DD, Wolf YI, Koonin EV, Aravind L. Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol. 2002;317:41–72. doi: 10.1006/jmbi.2001.5378.
- 18. Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A. 2005;102:14338–14343. doi: 10.1073/pnas.0504070102.
- 19. Hooper SD, Berg OG. On the nature of gene innovation: duplication patterns in microbial genomes. Mol Biol Evol. 2003;20:945–954. doi: 10.1093/molbev/msg101.
- 20. Lathe WC 3rd, Bork P. Evolution of tuf genes: ancient duplication, differential loss and gene conversion. FEBS Lett. 2001;502:113–116. doi: 10.1016/s0014-5793(01)02639-4.
- 21. Abdulkarim F, Hughes D. Homologous recombination between the tuf genes of Salmonella typhimurium. J Mol Biol. 1996;260:506–522. doi: 10.1006/jmbi.1996.0418.
- 22. Connell SR, Takemoto C, Wilson DN, Wang H, Murayama K, et al. Structural basis for interaction of the ribosome with the switch regions of GTP-bound elongation factors. Mol Cell. 2007;25:751–764. doi: 10.1016/j.molcel.2007.01.027.
- 23. Seshadri A, Samhita L, Gaur R, Malshetty V, Varshney U. Analysis of the fusA2 locus encoding EFG2 in Mycobacterium smegmatis. Tuberculosis (Edinb). 2009;89:453–464. doi: 10.1016/j.tube.2009.06.003.
- 24. Suematsu T, Yokobori SI, Morita H, Yoshinari S, Ueda T, et al. A bacterial elongation factor G homolog exclusively functions in ribosome recycling in the spirochaete Borrelia burgdorferi. Mol Microbiol. 2010. doi: 10.1111/j.1365-2958.2010.07067.x.
- 25. Jaskunas SR, Fallon AM, Nomura M. Identification and organization of ribosomal protein genes of Escherichia coli carried by lambdafus2 transducing phage. J Biol Chem. 1977;252:7323–7336.
- 26. Atkinson GC, Baldauf SL. Evolution of elongation factor G and the origins of mitochondrial and chloroplast forms. Mol Biol Evol. 2010. doi: 10.1093/molbev/msq316.
- 27. Tsuboi M, Morita H, Nozaki Y, Akama K, Ueda T, et al. EF-G2mt is an exclusive recycling factor in mammalian mitochondrial protein synthesis. Mol Cell. 2009;35:502–510. doi: 10.1016/j.molcel.2009.06.028.
- 28. Hughes T, Liberles DA. The pattern of evolution of smaller-scale gene duplicates in mammalian genomes is more consistent with neo- than subfunctionalisation. J Mol Evol. 2007;65:574–588. doi: 10.1007/s00239-007-9041-9.
- 29. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–612. doi: 10.1093/nar/gkl315.
- 30. Berglund-Sonnhammer AC, Steffansson P, Betts MJ, Liberles DA. Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. J Mol Evol. 2006;63:240–250. doi: 10.1007/s00239-005-0096-1.
- 31. Aury JM, Jaillon O, Duret L, Noel B, Jubin C, et al. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 2006;444:171–178. doi: 10.1038/nature05230.
- 32. AEvarsson A, Brazhnikov E, Garber M, Zheltonosova J, Chirgadze Y, et al. Three-dimensional structure of the ribosomal translocase: elongation factor G from Thermus thermophilus. EMBO J. 1994;13:3669–3677. doi: 10.1002/j.1460-2075.1994.tb06676.x.
- 33. Czworkowski J, Wang J, Steitz TA, Moore PB. The crystal structure of elongation factor G complexed with GDP, at 2.7 A resolution. EMBO J. 1994;13:3661–3668. doi: 10.1002/j.1460-2075.1994.tb06675.x.
- 34. Nissen P, Kjeldgaard M, Thirup S, Polekhina G, Reshetnikova L, et al. Crystal structure of the ternary complex of Phe-tRNAPhe, EF-Tu, and a GTP analog. Science. 1995;270:1464–1472. doi: 10.1126/science.270.5241.1464.
- 35. Martemyanov KA, Gudkov AT. Domain III of elongation factor G from Thermus thermophilus is essential for induction of GTP hydrolysis on the ribosome. J Biol Chem. 2000;275:35820–35824. doi: 10.1074/jbc.M002656200.
- 36. Martemyanov KA, Gudkov AT. Domain IV of elongation factor G from Thermus thermophilus is strictly required for translocation. FEBS Lett. 1999;452:155–159. doi: 10.1016/s0014-5793(99)00635-3.
- 37. Savelsbergh A, Matassova NB, Rodnina MV, Wintermeyer W. Role of Domains 4 and 5 in Elongation Factor G Functions on the Ribosome. J Mol Biol. 2000;300:951–961. doi: 10.1006/jmbi.2000.3886.
- 38. Hirokawa G, Kiel MC, Muto A, Selmer M, Raj VS, et al. Post-termination complex disassembly by ribosome recycling factor, a functional tRNA mimic. EMBO J. 2002;21:2272–2281. doi: 10.1093/emboj/21.9.2272.
- 39. Bourne HR, Sanders DA, McCormick F. The GTPase superfamily: a conserved switch for diverse cell functions. Nature. 1990;348:125–132. doi: 10.1038/348125a0.
- 40. Sprang SR. G protein mechanisms: insights from structural analysis. Annu Rev Biochem. 1997;66:639–678. doi: 10.1146/annurev.biochem.66.1.639.
- 41. AEvarsson A. Structure-based sequence alignment of elongation factors Tu and G with related GTPases involved in translation. J Mol Evol. 1995;41:1096–1104.
- 42. Hamel E, Koka M, Nakamoto T. Requirement of an Escherichia coli 50 S ribosomal protein component for effective interaction of the ribosome with T and G factors and with guanosine triphosphate. J Biol Chem. 1972;247:805–814.
- 43. Diaconu M, Kothe U, Schlunzen F, Fischer N, Harms JM, et al. Structural basis for the function of the ribosomal L7/12 stalk in factor binding and GTPase activation. Cell. 2005;121:991–1004. doi: 10.1016/j.cell.2005.04.015.
- 44. Nechifor R, Murataliev M, Wilson KS. Functional interactions between the G′ subdomain of bacterial translation factor EF-G and ribosomal protein L7/L12. J Biol Chem. 2007;282:36998–37005. doi: 10.1074/jbc.M707179200.
- 45. Vetter IR, Wittinghofer A. The guanine nucleotide-binding switch in three dimensions. Science. 2001;294:1299–1304. doi: 10.1126/science.1062023.
- 46. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics. 2002;18(Suppl 1):S71–77. doi: 10.1093/bioinformatics/18.suppl_1.s71.
- 47. Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 2010;38:W529–533. doi: 10.1093/nar/gkq399.
- 48. Mayrose I, Graur D, Ben-Tal N, Pupko T. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol. 2004;21:1781–1791. doi: 10.1093/molbev/msh194.
- 49. Gao YG, Selmer M, Dunham CM, Weixlbaumer A, Kelley AC, et al. The structure of the ribosome with elongation factor G trapped in the posttranslocational state. Science. 2009;326:694–699. doi: 10.1126/science.1179709.
- 50. Kaziro Y. The role of guanosine 5′-triphosphate in polypeptide chain elongation. Biochim Biophys Acta. 1978;505:95–127. doi: 10.1016/0304-4173(78)90009-5.
- 51. Hauryliuk V, Hansson S, Ehrenberg M. Cofactor dependent conformational switching of GTPases. Biophys J. 2008;95:1704–1715. doi: 10.1529/biophysj.107.127290.
- 52. De Vendittis E, Adinolfi BS, Amatruda MR, Raimo G, Masullo M, et al. The A26G replacement in the consensus sequence A-X-X-X-X-G-K-[T,S] of the guanine nucleotide binding site activates the intrinsic GTPase of the elongation factor 2 from the archaeon Sulfolobus solfataricus. Eur J Biochem. 1999;262:600–605. doi: 10.1046/j.1432-1327.1999.00428.x.
- 53. Ratje AH, Loerke J, Mikolajka A, Brunner M, Hildebrand PW, et al. Head swivel on the ribosome facilitates translocation by means of intra-subunit tRNA hybrid sites. Nature. 2010;468:713–716. doi: 10.1038/nature09547.
- 54. Kovtun AA, Minchenko AG, Gudkov AT. [Mutation analysis of functional role of amino acid residues in domain IV of elongation factor G]. Mol Biol (Mosk). 2006;40:850–856.
- 55. Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755.
- 56. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
- 57. NCBI. Bacterial sequence database. NCBI 2008.
- 58. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518. doi: 10.1093/nar/gki198.
- 59. Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–217. doi: 10.1006/jmbi.2000.4042.
- 60. Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol. 2001;310:243–257. doi: 10.1006/jmbi.2001.4762.
- 61. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
- 62. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–736. doi: 10.1093/oxfordjournals.molbev.a040153.
- 63. R Development Core Team. R: A Language and Environment for Statistical Computing. 2.13 ed. Vienna: R Foundation for Statistical Computing; 2011. http://www.R-project.org.
- 64. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17:754–755. doi: 10.1093/bioinformatics/17.8.754.
- 65. Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F. Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics. 2004;20:407–415. doi: 10.1093/bioinformatics/btg427.
- 66. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. doi: 10.1093/bioinformatics/btl446.
- 67. Kumar S, Tamura K, Nei M. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004;5:150–163. doi: 10.1093/bib/5.2.150.
- 68. Maidak BL, Cole JR, Lilburn TG, Parker CT Jr, Saxman PR, et al. The RDP-II (Ribosomal Database Project). Nucleic Acids Res. 2001;29:173–174. doi: 10.1093/nar/29.1.173.
- 69. Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001;314:1041–1052. doi: 10.1006/jmbi.2000.5197.
- 70. Felsenstein J. 2004. PHYLIP (Phylogeny Inference Package) version 3.63.
- 71. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0.