Proceedings of the National Academy of Sciences of the United States of America. 1997 May 13;94(10):5183–5188. doi: 10.1073/pnas.94.10.5183

Four primordial modes of tRNA-synthetase recognition, determined by the (G,C) operational code

Sergei N Rodin, Susumu Ohno
PMCID: PMC24653  PMID: 9144212

Abstract

In distinction to single-stranded anticodons built of G, C, A, and U bases, their presumable double-stranded precursors at the first three positions of the acceptor stem are composed almost invariably of G-C and C-G base pairs. Thus, the “second” operational RNA code responsible for correct aminoacylation seems to be a (G,C) code preceding the classic genetic code. Although historically rooted, the two codes were destined to diverge quite early. However, closer inspection revealed that two complementary catalytic domains of class I and class II aminoacyl-tRNA synthetases (aaRSs) multiplied by two, also complementary, G2-C71 and C2-G71 targets in tRNA acceptors, yield four (2 × 2) different modes of recognition. It appears therefore that the core four-column organization of the genetic code, associated with the most conservative central base of anticodons and codons, was in essence predetermined by these four recognition modes of the (G,C) operational code. The general conclusion follows that the genetic code per se looks like a “frozen accident” but only beyond the “2 × 2 = 4” scope. The four primordial modes of tRNA–aaRS recognition are amenable to direct experimental verification.

Keywords: origin of the genetic code, transfer RNA, aminoacyl-tRNA synthetases


Since the complete decipherment of the genetic code, its origin has become a most tempting cryptographic challenge. The main stumbling block is that in a typical tRNA, with its three-dimensional L-shape, the site of amino acid attachment is at one end of the molecule, whereas the anticodon trinucleotide is at the opposite end, at a distance exceeding 70 Å. There actually exist two genetic codes: one is the “classic” code represented in tRNA by an anticodon for reading codons in mRNA, and the other is the “second” (1) operational RNA code (2–5) mapped mainly to the acceptor for appropriate aminoacylation at its 3′ terminus. As far as translation is concerned, it does not make sense to consider one code without the other. But in contrast to the relatively simple codon–anticodon interactions, the present-day operational RNA code is quite intricately written in the structure of tRNA acceptors and cognate aminoacyl-tRNA synthetases (aaRSs). Because of the long coevolution between tRNAs and aaRSs, it is naive to hope that this “nonclassic” code would be finally interpreted in a literal form of the classic codon (anticodon)-amino acid assignment (Table 1). The present congruity of the two codes is provided by 20 specific aaRSs divided into two strikingly dissimilar classes of 10 members each. However, what could mediate the original matching between the earliest tRNAs and amino acids at the very start of the genetic code evolution, in the absence of aaRSs? In fact, we address here the “chicken-or-egg” dilemma. Aggravating the dilemma is the modular structure of aaRSs, which suggests that these proteins evolved from two class-defining, rather small modules, each having sites to bind ATP, cognate amino acids, and acceptors, but not anticodons (2–5).

Table 1.

The genetic code: anticodons

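5′ base | Central A | Central G || Central U | Central C | 3′ base
A | AAA Phe | AGA Ser || AUA Tyr | ACA Cys | A
G | GAA Phe | GGA Ser || GUA Tyr | GCA Cys | A
U | UAA Leu | UGA Ser || UUA Stop | UCA Stop | A
C | CAA Leu | CGA Ser || CUA Stop | CCA Trp | A
A | AAG Leu | AGG Pro || AUG His | ACG Arg | G
G | GAG Leu | GGG Pro || GUG His | GCG Arg | G
U | UAG Leu | UGG Pro || UUG Gln | UCG Arg | G
C | CAG Leu | CGG Pro || CUG Gln | CCG Arg | G
A | AAU Ile | AGU Thr || AUU Asn | ACU Ser | U
G | GAU Ile | GGU Thr || GUU Asn | GCU Ser | U
U | UAU Ile | UGU Thr || UUU Lys | UCU Arg | U
C | CAU Met | CGU Thr || CUU Lys | CCU Arg | U
A | AAC Val | AGC Ala || AUC Asp | ACC Gly | C
G | GAC Val | GGC Ala || GUC Asp | GCC Gly | C
U | UAC Val | UGC Ala || UUC Glu | UCC Gly | C
C | CAC Val | CGC Ala || CUC Glu | CCC Gly | C
Consensus second base pair of the acceptor: G2-C71 (left half) || C2-G71 (right half)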

Boxed areas are amino acids activated by class I aaRSs, and shaded areas are those activated by class II aaRSs. The vertical line divides the table into two halves in accordance with the consensus second base pair (shown below) in the acceptor helix of corresponding tRNAs when they are recognized by the opposite aaRSs (see text and Fig. 1 for detail).  

We reported recently (6) that in pairs of consensus tRNAs with complementary anticodons, the second bases in their acceptor stems proved to be complementarily related as well. For short, we shall refer to this relationship as the dual complementarity. Even modern tRNAs of individual species (e.g., Escherichia coli, Haloferax volcanii, Saccharomyces cerevisiae) and consensus tRNAs representing main kingdoms still exhibit statistically significant relics of the dual complementarity, mitochondrial tRNAs being the only exception (Table 2). This previously unrecognized relationship between acceptor and anticodon points to the historically common root of the two codes. The value of 70 Å is no longer a critical distance simply because the two codes appear to have been originally one and the same. It still remains obscure whether amino acids were able to interact stereospecifically with anticodon/codon-like precursors in such close proximity to the amino acid attachment site of primordial acceptors. But in any case, such a favorable location of the code essentially alleviated the problem of specific aminoacylation for the first minimalist aaRSs, which supposedly had no anticodon-recognizing sites (2–5).
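The definition of dual complementarity can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the example second bases (G2 for tRNAAla, C2 for tRNAGly) are taken from the left and right halves of Table 1 rather than from individual sequence data.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the antiparallel complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def dual_complementary(trna_a, trna_b):
    """Each tRNA is (anticodon 5'->3', second base of the 5' acceptor strand).

    True when the anticodons are complementary AND the second acceptor
    bases are complementary as well.
    """
    anticodon_a, second_a = trna_a
    anticodon_b, second_b = trna_b
    return (reverse_complement(anticodon_a) == anticodon_b
            and COMPLEMENT[second_a] == second_b)

# Illustrative values (assumption): Ala GGC with G2, Gly GCC with C2.
print(dual_complementary(("GGC", "G"), ("GCC", "C")))
```

A pair fails the check if either the anticodons or the second acceptor bases are not complementary, which is how the "noncomplementary" column of Table 2 would be populated under this reading.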

Table 2.

Dual complementarity of second bases in separate organisms and consensus tRNAs representing main kingdoms

Organism or group | No. of pairs of tRNAs with complementary anticodons | No. of pairs with complementary second bases in the acceptors | No. of pairs with noncomplementary second bases in the acceptors
E. coli (Eubacteria) | 32 | 24 | 8**
H. volcanii (Archaebacteria) | 29 | 24 | 5**
S. cerevisiae (Yeast) | 24 | 20 | 4**
Chloroplasts | 26 | 19 | 7**
Cytoplasm of plants | 20 | 16 | 4**
Cytoplasm of animals | 27 | 18 | 9**
Mitochondria of single cell or fungi | 18 | 12 | 6*
Mitochondria of plants | 17 | 12 | 5*
Mitochondria of animals | 17 | 9 | 8*
Pooled data | 210 | 154 | 56**
Common consensus (from ref. 6) | 32 | 29 | 3**

All tRNA sequence data are retrieved from ref. 7. The table shows that the almost ideal dual complementarity represented by common consensus tRNAs diverged in different phyletic pathways. Therefore, consensus tRNAs show the highest index of the dual complementarity. Yet, all groups tested, except for mitochondrial tRNAs, still show significantly nonrandom dual complementarity. Nonrandomness of the dual complementarity was statistically tested using the normal approximation to the binomial distribution (without continuity correction) and assuming two-tail P values.

** Extremely nonrandom (P = 0.0004–0.008). ** Significantly nonrandom (P = 0.016–0.04). * Insignificantly nonrandom (P ≥ 0.16).
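The test named in the table note can be sketched as follows. This is a minimal illustration, not the authors' script: the function name is hypothetical, and the null probability of 0.5 for a complementary second base is an assumption that the source does not state explicitly.

```python
import math

def two_tail_p_normal(k, n, p0=0.5):
    """Two-tail P value for k successes in n trials, using the normal
    approximation to the binomial without continuity correction.

    p0 is the null probability of success (assumed 0.5 here).
    """
    mean = n * p0
    sd = math.sqrt(n * p0 * (1.0 - p0))
    z = (k - mean) / sd
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# 24 complementary pairs out of 29 (H. volcanii row of Table 2).
print(two_tail_p_normal(24, 29))
```

Under these assumptions the H. volcanii row gives a value of about 0.0004, the lower end of the reported range. Other rows need not reproduce the published categories exactly if the original null model differed.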

The above connection between acceptor and anticodon is consonant with the sequence complementarity that we found earlier: (i) for the anticodon stem-loop sequences antiparallelly aligned in the pairs of consensus tRNAs with complementary anticodons (8, 9), and (ii) for head-to-tail aligned class-defining signature motifs of aaRSs from the two opposite classes, remarkably only within their catalytic domains (10). Accordingly, we proposed that (i) tRNAs could emerge concertedly as pairs of double-stranded complementary palindromes, the precursors of both the acceptor and anticodon arms of the molecules (6, 8, 9), and (ii) the two aaRSs could also have been originally encoded by complementary strands of the same ancestral gene (10). Furthermore, in a binary (purine-pyrimidine, R-Y) approximation, the dual complementarity in tRNAs is almost ideally balanced with the division of 20 aaRSs and cognate amino acids on the two complementary classes (6).

Fortuitous independent emergence of all these complementarities (each itself being quite surprising) is very unlikely (6). Yet, in modern tRNAs, the two codes appear as markedly different (otherwise their common origin would have been discovered long ago by routine comparisons). In this paper we consider the “complementary scenario” in more detail, introduce the “2 × 2 = 4” formula for primordial tRNA–aaRS recognition inherited by the two codes, and explore why these two codes have diverged. Data and methods used are the same as described in ref. 6.

Nondegeneracy of the Acceptor Code

That the relic second base complementarity in the acceptor is conserved in parallel with the complementarity of anticodons could mean that the operational RNA code as we see it today actually developed from a few ancient anticodon/codon-like pairs located in the first three positions of the acceptor stem (6). Consistent with this is the following “coincidence” (6, 11). Inasmuch as there exist only 20 aaRSs, one for each amino acid (and, respectively, for isoacceptor tRNAs), the operational code is certainly nondegenerate (1). Therefore, the anticodon precursors in the acceptor are not necessarily expected to be as variable at their first position (which corresponds to the degenerate third position of codons) as are the real single-stranded anticodons. This is exactly the case for the more than 2,000 tRNAs and tRNA genes (excluding mitochondrial) sequenced by now: their G1-C72 base pair is almost invariant. Not surprisingly, unlike the second and partly third bases, the first bases of acceptor and anticodon do not exhibit the dual complementary relationships (6, 11). This observation strengthens the hypothesis (2–5) that both aaRS class-defining catalytic domains with the inserted acceptor-binding site emerged earlier than anticodon-binding domains. Such nondegeneracy, inherent only to the acceptor code, may indicate the historically subsidiary role of anticodons in aminoacylation. Otherwise, more than 20 aaRSs could exist, one for each anticodon rather than one for each amino acid.

The Early (G,C) Code

In contrast with anticodons (and also codons), which are built of four bases, G, A, C, and U, their double-stranded precursors in the 1-2-3 positions of acceptors appear as triplets almost invariably composed of G-C and C-G base pairs. Even in the majority of modern systems, including the deep Archaea (12), specific aminoacylation of the acceptor 3′ terminus operates within the (G,C) variety of the first three base pairs. Moreover, in spite of the AT-richness of mitochondrial genomes, their tRNAs still show a tendency to keep G-C and C-G base pairs at 1-2-3 positions, contrary to other positions of similar variability inside as well as outside of the acceptor domain (S.N.R., unpublished data). Particularly impressive in this regard is a comparison of the two extreme cases, one presented by tRNAAla and tRNASer, and the other by tRNAVal and tRNAHis. The anticodon in the tRNAs from the former group proved to be unnecessary for correct aminoacylation, with the acceptor playing a dominant role in their recognition (2). Accordingly, the first three base pairs in the acceptor of the mitochondrial tRNAs for Ser and Ala are especially (G,C) conservative. The opposite is true for the second group: the mitochondrial tRNAHis and tRNAVal show a very strong bias toward A-U and U-A base pairs instead of G-C and C-G base pairs, emphatically so at the second position. Of course, this bias does not exclude the contribution of the acceptors to the tRNAHis and tRNAVal identities, especially in the case of tRNAHis with its unique base pair-determinator G-1:C73 (13), which remains invariant in mitochondrial tRNAHis as well.

The archaic (G,C) code was hypothesized long ago (14–16). The general thesis has been argued that, like all other evolving systems, the code “began simply and evolved to the present complexity” (17). The very fact that the anticodon/codon-like triplets in the acceptor helix turned out to be composed of predominantly G-C and C-G base pairs strongly supports this hypothesis.

The origin of single-stranded anticodons from the (G,C) precursors in double-stranded acceptors leads to the following conclusions: (i) although historically connected by the common (G,C) ancestor, the two physically linked codes had segregated quite early, certainly before the final shaping of the codon (anticodon)-amino acid assignment; (ii) the classic code per se expanded from the (G,C) toward the complete (G,C,A,U) variant in the course of early coevolution between tRNAs and mRNAs, while the antedating acceptor code coevolved with cognate aaRSs essentially within the limits of the original (G,C) base repertoire; (iii) these two processes of coevolution eventually resulted in two different codes, which do look independent in superficial comparison.

All the above draws us to the question: Which part of the classic genetic code has been “preserved” by the earliest code written in proto-acceptor microhelices and the two minimalist proto-aaRSs?

The Primordial Modes of tRNA–aaRS Recognition: 2 × 2 = 4?

We suppose this trivial formula prompts an answer. Indeed, the probability of random second base complementarity occurring in the two acceptors simultaneously with the complementarity of anticodons in the corresponding pairs of consensus tRNAs is very low, ≈10⁻⁶ to be more exact (6). However, even taking such a dual complementarity and the restricted (binary) base diversity (G,C) at the second position of the tRNA acceptor stem as granted, there still remains an opportunity for G2-C71 and C2-G71 combinations to be distributed randomly among all pairs of tRNAs with complementary anticodons. The actual distribution appears strikingly nonrandom, certainly due to those pairs of tRNAs with complementary anticodons that are aminoacylated by the opposite synthetases. Indeed, consider 18 such pairs of tRNAs depicted in Fig. 1. Out of these 18, four pairs do not strictly satisfy the outlined conditions, since one of the partners has an ambiguous consensus second base (S = G or C), and thus ought to be excluded from the further statistical analysis. Also excluded is the Phe-Glu pair, which has noncomplementary second bases in the acceptors and the reverse aaRS class membership. On the other hand, we did not include in the analysis numerous pairs of “quasi-complementary” anticodons (8, 9) with allowed “wobbling” G-U and U-G pairing between their first and third bases. This exclusion, however, is unfair to our hypothesis, because we actually consider the situation when these flanking bases are not fixed. Anyway, out of the remaining 13 pairs of legitimately complementary anticodons, we have 6 pairs in the upper part of Fig. 1 (second column amino acid/class II aaRS and fourth column amino acid/class I aaRS), four of which show the G2–C2 combination, and one (tRNAAla–tRNAArg), the “wobbling” G2–U2. Assuming that U at the second position of tRNAArg is a derivative of a C2 → U2 transition, we added this pair to the above four. Seven pairs in the lower part of Fig. 1 (first column amino acid/class I aaRS and third column amino acid/class II aaRS) all possess the G2-C2 combination. Pooling both upper and lower parts of Fig. 1 together, we obtain 12 of 13, a highly nonrandom result. Pooling is justified by the fact that it corresponds to the “vertical” division of Fig. 1 such that the first and second columns are grouped in one half (as well as the third and fourth columns, respectively) as a precondition. The probability of randomly obtaining 12 of 13 (or better) for a two-tail binomial distribution is about 0.006. This said, we would like to note that our stringent criteria for the selection of “eligible” pairs significantly increased the random probability value. A rough estimate for the more realistic setup gives about 10⁻⁵. Yet, the 0.006 value is, obviously, highly significant by itself while being guarded against type I error. We conclude, therefore, that the 2-71 position distribution is definitely nonrandom.
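For readers who wish to check the order of magnitude of the 12-of-13 result, an exact binomial tail can be computed directly. The sketch below assumes a null probability of 0.5 and doubles the upper tail to obtain a two-tail value; both choices, and the function names, are assumptions of this illustration. The figure it yields depends on the convention used (exact tail, normal approximation, continuity correction), so it need not coincide digit for digit with the value quoted in the text.

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """Exact probability of k or more successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def two_tail_exact(k, n):
    """Two-tail value for a symmetric null (p = 0.5): twice the upper tail,
    capped at 1. Intended for k at or above n / 2."""
    return min(1.0, 2.0 * upper_tail(k, n))

# 12 of the 13 eligible pairs fit the 2 x 2 scheme.
print(two_tail_exact(12, 13))
```

Whichever convention is chosen, the result stays well below the usual 0.05 threshold.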

Figure 1.

Dual complementarity of the second bases in consensus tRNAs and their aaRS class membership. Shown are class I × class II pairs of amino acids with their corresponding complementary anticodons and second bases in the 5′ acceptor strand of cognate consensus tRNA genes. The pairs are arranged according to the four-column organization of the genetic code (Table 1). As a result, the four nonrandom (2 × 2) modes of primordial tRNA–aaRS recognition become evident. Indeed, although four hydrophobic amino acids of the first column (Val, Ile, Met, and Leu) and three small amino acids of the second column (Ala, Pro, Ser) have the same G2-C71 base pair, these two groups are recognized by different synthetases. The same is true for their complementary partners with C2-G71 from the third and fourth columns, respectively. An asterisk marks the “nonsense” anticodon TCA that might have been originally assigned to Trp, as in the present-day mitochondrial code (18). D, mean values of physico-chemical distances (from ref. 19), calculated for each of the four groups and for the two pairs of complementarily linked groups (bold). The ambiguous “S” symbolizes either G or C. Even if the corresponding four pairs (with tRNACys and tRNATrp) are included in the statistical test, they should fit the “2 × 2” matrix of the genetic code in at least two cases. In sum, 14 of 17 pairs show the dual complementarity correlated with this “2 × 2” matrix. If in addition the “wobbling” G-U partnership is assumed between the first and third bases of anticodons (see text), this gives a total of 24 of 28 such pairs, which is evidently a nonrandom value.

Thus, with regard to the second base pair of the acceptor stem, the genetic code is dissected in two “vertical” complementary halves (Table 1), in accordance with the first step of the gradual ambiguity reduction process (20). Subsequently, the core four-column structure of the code has been generated from transitions G → A and C → U at the central position of the originally also (G,C)-containing single-stranded anticodons (Fig. 1). Remarkably, this four-column organization was, in fact, pre-ordained by the double-stranded acceptor code. Indeed, Figs. 1 and 2A make it easier to notice that the two complementary minimalist aaRSs multiplied by the two, also complementary, G2-C71 and C2-G71 targets give precisely the desirable four modes of recognition. Note also that the second base pair-associated distribution of tRNAs (Fig. 2) correlates with amino acid similarity (see average D values in Fig. 1) and, moreover, nearly conforms to the partition of aaRSs in subsequent groups within each of the two classes (21–25).

Figure 2.

Shown are the schematic modes of original tRNA–aaRS recognition with the single-stranded anticodon derived either from the 5′ strand (A) or the 3′ strand (B) of the primordial palindromic acceptor (6). The latter is presented in the center with the second base pair in boldface. The two types of proto-aaRSs are drawn as approaching the acceptor from the opposite sides, in accordance with the complementarity of the head-to-tail aligned signature motifs of aaRSs from the two classes, remarkably just within their catalytic domains (6, 10). In turn, each of the two proto-aaRSs is subdivided into two groups (I1, I2 and symmetrically II2, II1) depending on the second base pair, C2-G71 or G2-C71, respectively. For each amino acid, the corresponding third base of the acceptor and the 73rd base-determinator are shown in brackets; the G3 for ALA bears an asterisk since the opposite “wobbling” U70 base strongly determines the identity of tRNAAla (2, 3). tRNAHis is also marked as a unique tRNA with the paired base-determinator, G-1:C73. Some of the amino acids are marked by α or β in accordance with either the α-helical or the β-barrel structure of the aaRS anticodon-binding domain. Italicized are tRNAs for Cys and Trp with ambiguous consensus “S” at the acceptor second position symbolizing either C or G (Fig. 1); accordingly, their anticodons could originate from the 5′ or the 3′ strand. In bold are shown the amino acids whose anticodons have the same second base as in the acceptor. Four of them, GLY, ALA, PRO (class II), and ARG (class I), are shown in boldfaced capitals as they are presumably the first ones recruited in translation. Note that tRNAs for all four “early” amino acids have the anticodon of the 5′ strand origin (A). In normal type are shown the amino acids with a changed central base in the anticodon caused by C → U or G → A transitions. Two-sided arrows connect tRNAs with complementary anticodons. In A, most such tRNAs are aminoacylated by complementary aaRSs, i.e., I1 × II1 and I2 × II2. Contrarily, in B, all complementarily connected amino acids except Arg have been involved later in translation. Again, a majority of these are activated by aaRSs of the same class, i.e., I1 × I2 and II1 × II2. N = {G,C,A,U}, B = {G,C,U}, R = {G,A}, Y = {C,U}, K = {G,U}, M = {A,C}, S = {G,C}, W = {A,U}.

In previous studies (6, 10) we repeatedly accentuated how truly puzzling is the hypothesis of independent origins of the two aaRS classes in two archaic translation systems with their subsequent “symbiotic” integration in evolution. Instead, we showed a possibility of their concerted origin from the complementary strands of primordial short genes (10). However, it was a sort of an ad hoc explanation ignoring the question of why this possibility has been realized. Formulated more concretely, the question is whether usage of the two proto-aaRSs was “necessitated” by the primordial operational code. The question is not idle, since each of the two aaRS classes seems to have evolved from a sufficiently flexible structural prototype to potentially generate all 20 individual enzymes, 1 for each of 20 amino acids. Quite ironically, it is the anticodon-binding units of aaRS which illustrate this flexibility since both aaRS classes are equally idiosyncratic in using either α-helical or β-barrel domains for anticodon recognition (Fig. 2). Not so with the acceptor-binding domains. The two aaRSs recognize the acceptor helix from opposite sides: a class I aaRS approaches the helix from the side of its minor groove and attaches the amino acid to the 2′OH group of the terminal adenosine ribose while a class II aaRS approaches from the side of major groove and attaches the amino acid to the 3′OH group (21). As a consequence, the two complete tRNA–aaRS complexes look like mirror images of one another (22). Thus, rationality of using the two complementary aaRSs instead of either one could be originally dictated by the double-helical structure of the acceptors. If so, aaRS domains interacting (if at all) with single-stranded anticodons are not expected to be class-defining units. This actually holds as we just emphasized, and once again points to the historically second-rate role of the anticodon in charging the earliest tRNAs with cognate amino acids.

An Enlarged Complementary Partnership of the Early (G,C) Code

Fig. 1 shows that 17 of 20 amino acids are encompassed by the class I × class II mode of tRNA–aaRS recognition. The only exceptions are three amino acids, Tyr, Gln (class I, four anticodons total), and Gly (class II, four anticodons). However, the acceptor (G,C) code assumed a broader complementary partnership between amino acids than that in the subsequent complete (G,C,A,U) code. For example, the single-stranded anticodons GAC of Val (class I) and GCC of Gly (class II) are not complementary because of the central mispairing of A·C, while their presumed precursors in the acceptor (G and C) do complement each other (Fig. 1). In other words, the precursor of tRNAGly might have originated in a pair with that for tRNAVal as well as tRNAAla. Thus, again, unlike the classic code, its (G,C) precursor in the acceptor better fits the rule we heuristically formulated earlier (10): tRNAs with complementary anticodons are recognized by the complementary synthetases. This explains an overall balance between the two classes of aaRSs and the two equal partitions they determine, one for amino acids (10:10) and another for cognate codons (32:32), respectively (6). Such a balance would be unthinkable assuming the independent origin of the two aaRS classes.
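The Val/Gly example can be made concrete with a small sketch. It assumes, following the text, that the present central base of an anticodon arose from a (G,C) precursor by a G → A or C → U transition, so the precursor is recovered by reversing that transition at the central position only. The function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Reverse the proposed central-base transitions: A came from G, U came from C.
CENTRAL_ANCESTOR = {"A": "G", "U": "C", "G": "G", "C": "C"}

def reverse_complement(rna):
    """Antiparallel complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def gc_precursor(anticodon):
    """Presumed (G,C) precursor: only the central base is reverted."""
    first, central, third = anticodon
    return first + CENTRAL_ANCESTOR[central] + third

def complementary(a, b):
    return reverse_complement(a) == b

val, gly, ala = "GAC", "GCC", "GGC"
print(complementary(val, gly))                # present-day anticodons
print(complementary(gc_precursor(val), gly))  # (G,C) precursor of Val vs Gly
print(gc_precursor(val) == ala)               # Val precursor coincides with Ala
```

In this reading the precursor of the Val anticodon is identical to the present Ala anticodon, which is why Gly can be paired with either partner at the (G,C) stage.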

Why Didn’t the Acceptor Code of Aminoacylation Get Extended in Unison with and Exactly in the Same Way as the Classic Genetic Code?

It would seem that exactly the same parallel (G,C) → (G,C,A,U) expansion of the acceptor code could indeed be the easiest solution of all the problems. We think that this path has not been exploited not only because the (G,C) operational code, like any fundamental invention, was established once and forever, but also because it is double-stranded. Consequently, a transition from the G2-C71 pair to the A2-U71 pair requires two independent base substitutions in the 5′ and 3′ strands of the acceptor stem and, what is critical, this transition cannot pass through the destructive A·C intermediate (when G mutates to A first). The same obstacle is evident for the complementary C2-G71 → U2-A71 events in the mirror proto-acceptor (Fig. 2). The primordial aaRSs have had to wait for coadaptive changes as well. Moreover, the same constraints have had to be very critical for the two aaRSs themselves, if they were originally also double-strand encoded. Interestingly, of all inferred consensus acceptors, there is only one (tRNAArg with the anticodon GCG) that contains the U2-A71 pair, whereas the wobbling G2-U71 and U2-G71 pairs are not so rare. With more exceptions, the same pattern holds for the flanking 1–72 and 3–70 paired positions.
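The two-step nature of a G-C → A-U replacement can be enumerated explicitly. The sketch below lists the two possible intermediates, depending on which strand mutates first, and classifies each as Watson-Crick, wobble, or mismatch. Names are hypothetical, and treating only G·U and U·G as tolerated wobble pairs is a simplifying assumption.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}

def classify(pair):
    """Classify a (5' strand base, 3' strand base) pair."""
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"

def intermediates(start, end):
    """Intermediate pairs when one strand mutates before the other."""
    (s5, s3), (e5, e3) = start, end
    return {
        "5' strand mutates first": ((e5, s3), classify((e5, s3))),
        "3' strand mutates first": ((s5, e3), classify((s5, e3))),
    }

# G2-C71 -> A2-U71 and the mirror C2-G71 -> U2-A71.
for start, end in ((("G", "C"), ("A", "U")), (("C", "G"), ("U", "A"))):
    for order, (pair, kind) in intermediates(start, end).items():
        print(start, "->", end, "|", order, "|", pair, kind)
```

In both cases one order of events goes through a wobble pair and the other through a purine-pyrimidine mismatch (A·C or C·A), consistent with the observation that wobbling pairs at position 2-71 are not rare while full A-U replacements are.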

The above argument looks contradictory in the case of (A,U)-rich acceptors of mitochondrial tRNAs. However, in our scenario, the mitochondrial versions of both the operational and classic codes have a secondary origin—i.e., mitochondrial coding system evolved after the two “cytoplasmic” codes have been basically shaped. Under shelter of the two codes, mitochondrial tRNAs might be less constrained with regard to G-C → A-U double transitions in their double-helical regions. Consequently, with few exceptions, mitochondria-specific synthetases have evolved to charge their own mitochondrial tRNAs.

In any case, it was certainly simpler for early aaRSs to recruit the unpaired 73rd “base-determinator” to reduce ambiguity of the operational code. One can see in Fig. 2 that in addition to the four major groups, determined by the “2 × 2 = 4” matrix, a primordial diversity of the 73rd base together with the third base pair might be sufficient to discriminate, weakly at first, most of the “canonical” 20 amino acids within as well as between the four groups. Moreover, since the two “mirror” aaRSs could preferably recognize the opposite, complementary, faces of their tRNA acceptors, the unpaired 73rd base-determinator on the tRNA 3′ tail should be differently exposed to class I and class II aaRSs. According to the scheme in Fig. 2, a more important role is expected for the base-determinator in recognition by class II enzymes. Indeed, as has already been noticed, tRNAs aminoacylated by class II aaRSs do show larger diversity at the 73rd site (12, 21). Needless to say, later in tRNA–aaRS coevolution, some bases (both unpaired and paired) outside of the acceptor were recruited as additional determinants of tRNA identity (12).

Complementary Partnership of Evolutionarily Early and Late Amino Acids

Four complementary partnerships are theoretically conceivable for amino acids grouped by the second base pair: the two that are recognized by the complementary aaRSs (Fig. 2, I1 × II1 and I2 × II2) and the other two recognized by aaRSs of the same class (I1 × I2 and II1 × II2). The actual partnership turned out to depend on which of the acceptor helix strands, 5′ or 3′, served as a progenitor of the single-stranded anticodon. The (I1 × II1) and (I2 × II2) complementary partners are typical for the 5′ strand origin of anticodons (Fig. 2A), whereas the (I1 × I2) and (II1 × II2) ones evidently prevail for the 3′ strand origin of anticodons (Fig. 2B). The 3′ strand origin may explain some relationships between amino acids, their tRNAs, and cognate aaRSs which may seem illogical at first glance. For example, tRNAGly and tRNAPhe have “diametrically” different anticodons (Table 1), but they turned out to share quite similar acceptors (Fig. 2). Therefore, it is not surprising that these two very different amino acids (DPhe↔Gly = 4.14 in ref. 19) are activated by such similar aaRSs (both are of a unique α2β2 quaternary structure and share in total 100 identical residues within the catalytic domain), including evidently homologous 20-residue-long fragments in the N terminus (23).

The 5′ strand of the acceptor outnumbers the 3′ strand as a precursor of single-stranded anticodons. This seems quite rational because complementary synthetases activate amino acids with contrasting properties (see average D values in Fig. 1). Therefore, if the 5′ and 3′ strands were equally used to generate complementary anticodons (Fig. 3A), then the effect of wrong aminoacylation would be far more negative than it actually is in Fig. 2A. The appearance of single-stranded anticodons is followed by the C → U and G → A transitions in the central site of their original (G,C) versions. Alternatively, the more uncommon variants shown in Fig. 2B might have emerged later from G ↔ C transversions involving the second base of either the acceptor or the anticodon of the 5′ strand origin. Either way, there is one principal difference between Fig. 2 A and B: all the (G,C)-composed anticodons (for Gly, Ala, Pro, and Arg) are contained in Fig. 2A, and correlatively all “early” amino acids are here as well. Included in Fig. 2B are the tRNA–aaRS complexes and amino acids of certainly late recruitment in protein synthesis, most likely already after single-stranded anticodons began to play a significant role in aminoacylation, likely by the time when translation fidelity had become quite sufficient to protect ancestral genomes from a risk of “error catastrophe” (26), and hence to increase their informational capacity by gene duplications. Accordingly, coevolution between redundant tRNAs and aaRSs could become more free and easy. We believe this is the case for tRNAs for six amino acids (Val, Ile, Lys, Leu, and probably Cys and Trp) with varying G2-C71 or C2-G71 base pairs.

Figure 3.

(A) Hypothetical pair of tRNAArg and tRNAAla with complementary anticodons originated from the opposite strands of the same acceptor. This pair looks extremely disadvantageous because incorrect aminoacylation, quite likely for identical acceptors, had to involve very dissimilar amino acids, thus leading to large pleiotropic perturbations of all early proteins. (B) The above second base-associated disadvantage, inherent in the situation (A), appears as an important constraint on the primordial double-strand coding. Shown is an example of a quite conservative mutation (Arg → Lys) caused by a G → A transition at the codon second position in one strand, inevitably accompanied by a detrimental mutation (Ser → Phe) caused by the symmetric C → T transition in the complementary strand. Analysis of the code (Table 1) shows that most such pairs of complementary base substitutions result in one or even two detrimental mutations. (C) Two examples showing that of the two DNA strands, the coding nontranscribed one is, on average, more vulnerable regarding protein function when the C → T transition originates at the hypermutable CpG dinucleotides within TCG and CGA codons, respectively. The nonconservative missense mutation Ser → Leu in the TCG codon on the nontranscribed (coding) strand is complemented by a silent mutation on the opposite transcribed (noncoding) strand; similarly, the detrimental Arg → Stop mutation in the CGA codon on the nontranscribed strand is mirrored by an evidently more conservative Arg → Gln missense mutation on the transcribed strand. All other CpG-containing codons (three XCG and three CGX) exhibit the same strand asymmetry. Its application to the comparison of different tumor-attributed versus evolutionary mutation databases will be published separately.

The Double-Stranded (G,C) Code-Based Mutational Asymmetry of DNA Strands

So, based on the double-stranded (G,C) operational code and ancient double-strand coding for the two aaRS classes (10), the assignment of amino acids to the four tRNA–aaRS recognition modes (Fig. 2A) was optimal enough to minimize the negative effects of wrong aminoacylation. But this is clearly not so with the double-strand coding itself. Originally the latter seems to have been quite advantageous, when erroneous primitive translation essentially restrained any increase in genome size (10). At the same time, it is the second base of codons that strongly constrained direct coevolution of archaic proteins encoded by complementary strands, because a second base-associated conservative amino acid substitution on one strand is usually complemented by a detrimental substitution on the opposite strand (Fig. 3B). This hindrance was circumvented by the “discrimination” of the DNA strands into coding (“sense”) and noncoding (“antisense”) ones. Yet, because of its early double-stranded form, the code was somewhat destined not to be mutationally resistant to this subsequent discrimination. Interestingly, the largest mutational asymmetry of DNA strands is associated with the eight codons (CGX and XCG) containing the CpG mutational hotspot. Since CpG is a palindrome, its complementary counterpart is also CpG. However, as Fig. 3C exemplifies, spontaneous CpG → TpG transitions (especially frequent for methylated mCpGs) cause definitely more severe amino acid substitutions when they occur on the coding (nontranscribed) strand than when the mirror-symmetric events occur on the noncoding (transcribed) strand. Accordingly, by measuring the simple ratio (CpG → TpG)/(CpG → CpA), we can estimate the contribution of selection to the mutational spectra of various genes, and so distinguish selection from the preceding processes of mutagenesis and repair.
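The strand asymmetry at CpG sites can be tabulated with a short Python sketch. It assumes the standard genetic code and the DNA alphabet, and the names CODE, CPG_CODONS, and cpg_outcomes are hypothetical helpers introduced here. A C → T transition on the coding strand is modeled as CpG → TpG in the codon, and the mirror event on the transcribed strand as CpG → CpA.

```python
from itertools import product

# Standard genetic code in TCAG order (third base varies fastest); '*' = stop.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The eight codons that contain a CpG dinucleotide: four CGX and four XCG.
CPG_CODONS = [c for c in CODE if "CG" in c]


def cpg_outcomes(codon):
    """Return (original, coding-strand hit, transcribed-strand hit) amino acids."""
    i = codon.index("CG")
    coding_hit = codon[:i] + "TG" + codon[i + 2:]       # CpG -> TpG
    transcribed_hit = codon[:i] + "CA" + codon[i + 2:]  # CpG -> CpA
    return CODE[codon], CODE[coding_hit], CODE[transcribed_hit]


for codon in CPG_CODONS:
    print(codon, *cpg_outcomes(codon))
```

For TCG and CGA this gives Ser → Leu versus a silent change and Arg → Stop versus Arg → Gln, as in Fig. 3C.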

On Ribozyme Precursors of aaRSs

In all the above considerations, very short complementary proteins were posited as remote ancestors of the two catalytic domains of modern aaRSs. If even weak selective complexes (of the type analyzed in ref. 27) between amino acids and the anticodon-like triplets at the 1-2-3 positions of the acceptor were possible, then ribozymic predecessors of aaRSs need not be postulated. Yet, it has not escaped our attention that the perfect complementary symmetry of real tRNA recognition by the hypothetical ribozymes invites us to consider the possibility that one tRNA could originally catalyze the aminoacylation of its complementary partner, and vice versa. The tRNA-like structure of the catalytic center of many ribozymes, and that of the group I introns in particular (28), is of certain interest in this respect. All the more intriguing is the fact that the 5′ transcribed mRNA sequence of E. coli ThrRS folds into a cloverleaf structure based on significant homology with the tRNAThr isoacceptors (29).

Conclusion

The operational RNA code of aminoacylation (25) seems to have started as (G,C)-restricted anticodon/codon-like pairs at the 1-2-3 positions of the original acceptor microhelix. Duplications of their 5′ strands produced anticodons per se, thus making possible the genetically coded synthesis of the first short proteins, with predecessors of the two complementary aminoacylating modules of modern aaRSs likely among them. These two complementary proto-aaRSs interacted in a mirror manner with the two palindromic acceptors having in their center complementary second base pairs, G2-C71 and C2-G71, thus generating four (2 × 2) modes of recognition. These four primordial modes in fact “presaged” the core four-column organization of the classic genetic code (Table 1) and may be a “missing link” in the scenario (5, 30) of evolutionary transition from noncoded protein synthesis to genuine translation.

Because of the double-helix structure, the operational code of the acceptor was unable to expand from the limited binary (G,C) form toward the complete (G,C,A,U) one. It appears that the evolutionary paths of the two codes were condemned to diverge. Obviously, the expansion of single-stranded anticodons was free of the constraints so critical for their double-stranded ancestors. Quite the opposite, the concepts of evolutionary continuity in general and ambiguity reduction in particular (20) lead us to state that only after sufficient fidelity of aminoacylation based on the nondegenerate operational (G,C) code of the acceptor had been secured and, accordingly, only after the first primitive aaRSs had become reproducible, did the possibility of the (G,C) → (G,C,A,U) expansion of the single-stranded classic code emerge. In this scenario, the genetic code looks like a “frozen” accident (ref. 31, see also refs. 1 and 2), but only beyond the “2 × 2 = 4” scope.

Acknowledgments

We thank A. Rodin, E. Roberts, H. Hartman, and reviewers for critical comments and useful suggestions. Special thanks are due to S. Bates for reading the manuscript. This work was partly supported by Russian Academy of Sciences programs “Geninform”, “Human Genome,” and “Frontiers in Genetics.”

Abbreviation

aaRS, aminoacyl-tRNA synthetase(s)

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
