Abstract
Background/Objectives: tRNAs, tRNAomes, aminoacyl-tRNA synthetases (AARSs), the first proteins, ribosomes and the genetic code coevolved. We utilize sequence data to reconstruct key steps in establishing the first code on Earth. Methods: Networks were constructed to describe initial tRNAome and AARSome evolution. Results: tRNA-34 wobble and tRNA-37 modifications were necessary to evolve the code, as were additional tRNA modifications, so diverse tRNA modification enzymes (i.e., histidyl-tRNA -1 GTP synthase) are among the first proteins. tRNA-linked chemistry brought asparagine, glutamine, cysteine and possibly additional amino acids into the code. tRNA, tRNA modifications and tRNA-linked chemistry were core founding innovations for code evolution. Coevolution of AARSomes was also essential. Class II and class I AARSs have distinct folds but are nonetheless homologs by sequence. Early AARS enzymes folded around Zn motifs. Networks were generated for tRNAomes and AARSomes in ancient Archaea, because Archaea are the closest living organisms to the last universal common ancestor. Conclusions: The first code on Earth was surprisingly ordered, and the few apparent deviations from the regular order can yet be explained. Early in the evolution of the code, innovation was more strongly selected than accuracy. The code froze, however, because of evolving fidelity mechanisms. A historical record was documented in tRNA and in the genetic code structure and has been preserved in living organism sequences. AARSome structure describes the first code evolution more adequately than tRNAomes.
Keywords: genetic code, tRNA, aminoacyl-tRNA synthetase, tRNA modifications, network analyses, last universal common (cellular) ancestor, tRNA-linked chemistry, abiogenesis, astrobiology
1. Introduction
To evolve complex life requires a genetic code, which requires a genetic adapter. Without a code supported by an adapter molecule, the potential to evolve enduring and replicated complexity based on pre-life metabolic systems remained limited [1,2,3,4]. Life on Earth evolved around tRNA, tRNAomes, AARSomes, first proteins, ribosomes and the genetic code [5,6,7,8,9,10]. The purpose of this review is to concentrate on early tRNAome and AARSome networks to describe the evolution of the first code on Earth.
For evolution of the code, tRNAs must diversify to tRNAomes. Most tRNAs are type I, initially, with a 5 nt V loop (V for variable). In Archaea, longer type II V arms (initially 14 nt) are utilized by tRNALeu (5 tRNALeu) and tRNASer (4 tRNASer). The type I V loop was processed from the primitive type II V arm by a 9 nt internal deletion [11]. Leucine and serine occupy 6-codon sectors of the code, so their longer V arms were used in place of the anticodon stem–loop–stem as a major determinant for cognate AARS recognition. Arginine is also found in a 6-codon sector of the code (5 tRNAArg). Arginine utilizes significant anticodon loop unwinding to expose additional bases for recognition. It is not likely that the strategy utilized for arginine could support three amino acids in 6-codon boxes. Anticodon loop unwinding indicates allosteric effects of cognate tRNA-AARS binding [11,12,13,14]. Complex life on Earth evolved around tRNA, tRNAomes and AARSomes.
AARSomes diverged from class II to class I enzymes [15,16,17,18]. GlyRS-IIA (class II; subclass A) appears to be closest to the founding AARS. It follows that glycine was the founding amino acid in the code [19,20]. All class II enzymes were derived from GlyRS-IIA as the root sequence. A primitive ValRS-IA (class I; subclass A) was derived from a primitive GlyRS-IIA by appending an N-terminal extension, which redirected to the class I AARS fold. Early folds of class II and class I AARS were directed by Zn binding. All class I AARS appear derived from a primordial ValRS-IA as the root enzyme. AARS enzymes are analyzed for: (1) tRNA contacts; (2) tRNA deformation (allostery); (3) modifications of the anticodon loop; (4) amino acid identity (chemical features); and (5) fidelity (i.e., editing/proofreading). These characteristics appear to be most central to the establishment of the first code. At early stages, code innovation was more important than fidelity. At late stages, fidelity mechanisms froze the code.
We utilize the ancient Archaeon Pyrococcus furiosus as a reference species that may be close to the last universal common (cellular) ancestor (LUCA) for translation functions [21]. The P. furiosus tRNAome is tightly clustered around the primordial tRNA sequence. Similarly, the AARSome appears to be diverged in an orderly manner from the primitive GlyRS-IIA root sequence. Of course, tRNAomes and AARSomes must diverge from root sequences to maintain cognate translational discrimination and accuracy.
As described in analyses of tRNAome and AARSome sequences, the evolution of tRNA and the genetic code fits naturally with the amyloid hypothesis for the origin of life [22,23]. We imagine that a complex world of ribozymes, co-factors, tRNA–amino acids, tRNA–peptides, metabolism, polyglycine, polyGADV, amyloids, coacervates and protocells coevolved to emulsify and form the first cells [24,25,26,27,28,29]. In our view, an RNA world lacking complementary components is an oversimplification. Analyses of tRNA sequences and evolutionary history reveal a complex pre-life world with substantial chemical and metabolic capacities. The first proteins that coevolved with the code were long and complex and had specialized functions [15,30,31].
2. Materials and Methods
Based on our previous analyses, P. furiosus was the reference species chosen to be similar to LUCA for translation functions [21]. In the future, selecting a more advantageous reference species that is closer to LUCA may be possible and may provide additional insights. The idea behind the choice of P. furiosus was to anchor our research to a system lacking huge divergence from the first code. The root sequence for tRNA evolution has been determined and essentially matches a typical tRNA from P. furiosus [31]. Defining or estimating root sequences is fundamental to understanding early evolution and the pre-life-to-life transition. It appears that a primitive GlyRS-IIA diversified to all class II AARSs. A primitive GlyRS-IIA apparently diverged to a primitive ValRS-IA by attachment of an N-terminal segment that redirected the protein fold [15]. All class I AARSs appear to diverge from a primitive ValRS-IA.
Sequence similarity of class II and class I AARSs has been demonstrated. For instance, the sequence similarity of Methanobacterium bryantii IleRS-IA (a class I AARS) and Methanococcoides burtonii GlyRS-IIA (a class II AARS) was determined with an e-value of 4 × 10⁻¹² for a substantial in-phase alignment [15,18]. The e-value represents about 1 chance in 2.5 × 10¹¹ of the alignment resulting from a random occurrence. Many more examples of class II versus class I AARS homology can readily be obtained. These data are inconsistent with some other published models for class II and class I AARS evolution [32,33,34,35]. The Carter–Ohno–Rodin model indicated that class I and class II AARSs evolved separately, each from a complementary strand of a short ancestral bidirectional gene. In their model, small class I and class II AARS “urzymes” expanded to modern AARS enzymes. Because class II and class I AARS enzymes are linear, direct homologs by sequence, the Carter–Ohno–Rodin model cannot be correct. Also, class II and class I AARSs were complex enzymes and not urzymes, being first enzymes that coevolved with the genetic code.
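The conversion from the quoted e-value to "about 1 chance in N" is a simple reciprocal. The short sketch below reproduces that arithmetic only; it treats the e-value as an approximate chance of a random match, as the text does, and the variable names are illustrative.

```python
# Reciprocal of the reported e-value, read as "about 1 chance in N".
# The e-value is the one quoted in the text for the IleRS-IA / GlyRS-IIA alignment.
e_value = 4e-12

one_in_n = 1.0 / e_value

print(f"about 1 chance in {one_in_n:.1e}")
```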
A 2-dimensional network for P. furiosus AARS enzymes was previously published [18]. Phyre 2 scoring of structural and sequence similarity was used to draw maps for class II and class I AARSs separately. Because Phyre 2 assigns high homology scores to closely related enzymes, reciprocal scores were used as distances to draw the maps, so that closely related enzymes lie close together. At the time the maps were constructed, there was no objective mechanism to incorporate the sequence similarity of GlyRS-IIA, IleRS-IA and ValRS-IA. Phyre 2 could not generate or score this alignment because of the distinct class IIA and class IA folds.
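The use of reciprocal scores can be illustrated with a minimal sketch. The similarity values below are invented for illustration and are not Phyre 2 output; the function name is hypothetical. The point is only that taking 1/score turns a high similarity into a short map distance.

```python
# Hypothetical pairwise similarity scores (higher = more closely related).
# These numbers are invented for illustration; they are not Phyre 2 output.
similarity = {
    ("ValRS-IA", "IleRS-IA"): 400.0,
    ("ValRS-IA", "MetRS-IA"): 200.0,
    ("IleRS-IA", "MetRS-IA"): 250.0,
}


def reciprocal_distances(scores):
    """Convert similarity scores to map distances by taking 1 / score."""
    return {pair: 1.0 / score for pair, score in scores.items() if score > 0}


distances = reciprocal_distances(similarity)

# List pairs from closest to most distant.
for (a, b), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a} - {b}: distance {d:.4f}")
```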
AARS enzymes utilize editing reactions in their aminoacylating active sites (i.e., ValRS-IA, MetRS-IA, IleRS-IA, LeuRS-IA, AlaRS-IID, ThrRS-IIA, ProRS-IIA and SerRS-IIA) that block the incorporation of non-cognate amino acids. In separate proofreading active sites (i.e., ValRS-IA, IleRS-IA, LeuRS-IA, PheRS-IIC, AlaRS-IID and ThrRS-IIA), some AARSs remove non-cognate amino acids after attachment to a cognate tRNA [36,37]. Editing and proofreading assays were done in a small number of reference organisms, so results may not apply universally.
Structural studies also tend to be done in a small number of reference organisms. ChimeraX (version 1.10) was used to generate molecular graphics [38,39,40]. AARS structures were selected that were close to the first enzymes. AARS structures were drawn to describe the most important tRNA contacts and deformations and are not meant to be a complete description.
The Modomics database was used to identify anticodon loop modifications [41]. P. furiosus anticodon loop modification data were obtained from reference [42].
tRNA structures were drawn and colored according to internal homologies, based on the three 31 nt minihelix tRNA evolution theorem, as previously described [15,30,31]. Historical numbering of tRNAs can be confusing, particularly within the D loop and V loop. Here, we number the D loop D1 to D17 and the V loop V1 to Vn (V loop of n bases; initially, V1 to V5 for type I and V1 to V14 for type II; V1 to V5 for type I align with V1 to V5 for type II). Type I V loops and type II V arms have been misaligned in tRNA databases.
3. Evolution of tRNA
The evolution of tRNA has been solved to the last nucleotide (Figure 1 and Figure 2) [15,30,31]. The sequences of tRNAs in living organisms validate the model. In Figure 1, the linear sequences that formed tRNAs are shown. In Figure 2, the folding of the linear tRNA sequence is indicated. tRNA evolved from 100% RNA repeats and inverted repeats of known sequence plus 3′-ACCA (Figure 1; lines 1–8). ACCA was the primordial adapter molecule. In pre-life, ACCA-Gly was ligated to numerous RNAs to synthesize polyglycine. Multiple ACCA-Gly (line 6) paired with an extended GCGGCGGCG repeat (line 1) are capable of synthesizing polyglycine using wet–dry cycling and published procedures [43]. tRNAomes in ancient Archaea validate the model for tRNA evolution to such an extent that the model is referred to as a theorem (a proven theory). A typical tRNA from an ancient Archaeon has the same sequence as is shown (Figure 1; lines 12 and 13) [15,30,31]. Sequence logos demonstrate the theorem. tRNA and 31 nt minihelices evolved initially as improved mechanisms to synthesize polyglycine. tRNA evolved from the ligation of three 31 nt minihelices (one D loop minihelix (line 9) and two anticodon (Ac) stem–loop–stem minihelices (line 10)). The ligation of minihelices formed long RNAs from which segments were removed by endo- and exo-nucleases, selecting RNA stem lengths and stem–loop–stems. In tRNA, the 17 nt Ac stem–loop–stem and the 17 nt T stem–loop–stem are homologs, and obviously so, just from inspection. By contrast, the 31 nt D loop minihelix had a 17 nt UAGCC repeat core (D1-UAGCCUAGCCUAGCCUA-D17) (line 7). The small number of sequence changes from the primordial sequence, indicated in white bold, were selected to support the tRNA fold.
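The repeat-based sequences named in the text can be regenerated by tiling a repeat unit and truncating it. The sketch below is illustrative only: the helper name is hypothetical, and the 7 nt acceptor stem length is an assumption taken from the GCGGCGG example quoted elsewhere in this review.

```python
def repeat_to_length(unit: str, length: int) -> str:
    """Tile a repeat unit and truncate to the requested length."""
    copies = -(-length // len(unit))  # ceiling division
    return (unit * copies)[:length]


# 17 nt D loop core (D1 to D17) built from the UAGCC repeat.
d_loop_core = repeat_to_length("UAGCC", 17)

# 7 nt 5'-acceptor stem built from the GCG repeat (length is an assumption).
acceptor_5p = repeat_to_length("GCG", 7)

print(d_loop_core)
print(acceptor_5p)
```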
Figure 1.
tRNA evolved from RNA repeats and inverted repeats (stem–loop–stems) of known and conserved sequence. See the text for details. ACCA-Gly (line 6) was the primordial adapter molecule. Colors reflect internal homologies and are consistent throughout the figures. The tRNA precursor (line 11) was generated by the ligation of a 31 nt D loop minihelix (line 9, lacking 3′-ACCA) and two 31 nt anticodon (Ac) minihelices (line 10, lacking 3′-ACCA). Internally deleted bases are indicated in bold with strike-throughs. Bases in white bold are anticodon bases or sequence changes to support the tRNA fold. During pre-life, RNAs were formed with and without ligated 3′-ACCA. Underscored positions separate sequence elements and indicate how stem and loop sequences were selected. We posit that GCC may have been the primordial anticodon. GCC can utilize a GGC repeat (line 1) as a pre-mRNA. Secondary structures are indicated (parentheses stand for stems; * stands for loops).
Figure 2.
The tRNA fold caused selection of a small number of systematic sequence changes compared to the tRNA precursor. (A) A linear comparison of the tRNA precursor, type II tRNAPri and type I tRNAPri. (B) Type I tRNAPri. (C) Type II tRNAPri. B indicates G, C or U, not A (A is not utilized in the wobble tRNA-34 position in Archaea). Black lines indicate the Levitt base pair (Lbp) and elbow contacts (where the D loop binds the T loop). White bold bases are the anticodon and systematic sequence changes before LUCA to support the tRNA fold. As stands for acceptor stems; Ac stands for anticodons; SLS stands for stem–loop–stems.
Figure 2 shows how the tRNA fold caused selection of a small number of changes in the primordial tRNA precursor sequence. The tRNA precursor was a replication intermediate for the 31 nt minihelix world. tRNA was generated by a folding and processing “error”. Type II tRNA was generated from the precursor by a single internal 9 nt deletion within ligated 3′- and 5′-acceptor stems. Type I tRNA was generated by an additional internal 9 nt deletion within ligated 3′- and 5′-acceptor stems, within the V arm region, so a primitive type II tRNA could have been processed by an internal 9 nt deletion to type I tRNA. The two 9 nt internal deletions to form type I tRNA were identical on complementary strands.
There are alternative published views of tRNA evolution, but they cannot be correct because they are falsified by existing tRNA sequences. No accretion model, convergent model or two minihelix model can possibly be correct [44,45,46,47,48,49,50,51]. Clearly, the 17 nt anticodon stem–loop–stem and the 17 nt T stem–loop–stem of tRNA are homologous sequences [15,30,31]. No two minihelix model, convergent model or accretion model is consistent with this obvious homology. Just as clearly, the 17 nt D loop minihelix core is based on a UAGCC repeat. Many D loops in ancient Archaea have two perfect UAGCC repeats [52,53,54]. Acceptor stems and acceptor stem remnants (i.e., type II V arms and type I V loops) are based on GCG and CGC repeats. tRNAomes diverged from a single root tRNA sequence. The three 31 nt minihelix model made and fulfilled predictions: (1) tRNA sequences in ancient Archaea and their radiations to tRNAomes can be understood; (2) sequence logos support the model; (3) statistical evaluations of sequences support the model; (4) primordial tRNA sequences have been confirmed; (5) proper alignments can be adjusted for D loops and V loops; and (6) the proper alignment of type II and type I V arms and V loops has been obtained (a type I V loop was derived from a 9 nt deletion within a 14 nt type II V arm) [11,15,30,31] (Figure 1 and Figure 2). A tRNA sequence has been traced to its root in pre-life. Tracing root sequences to pre-life strongly supports tRNA as the primordial function around which translation systems and the genetic code evolved. If the three 31 nt minihelix tRNA evolution theorem is accepted and applied, tRNA evolution and the evolution of the genetic code can be understood. If the theorem is rejected, no adequate description of tRNA evolution or of the genetic code can be reasonably inferred. All two minihelix and accretion models are random sequence models that cannot be supported or rationalized using sequences of ancient Archaeon species.
4. AARS Enzymes at the Base of Code Evolution
4.1. The AARS Mechanism
The AARS enzyme reaction is complex [36,37,55]. Within the aminoacylating active site of AARSs, the amino acid carboxy terminus reacts with ATP to form an AMP adduct (aa-C=O, -O–AMP), releasing pyrophosphate. The tRNA 73-NCCA (N is the discriminator base) end displaces AMP to bind the aa-C=O, -O-tRNA, releasing AMP. Class II AARSs attach the 76A ribose 3′-O to the cognate amino acid. Class I AARSs attach the 76A ribose 2′-O to the cognate amino acid. ATP, the cognate amino acid and the cognate tRNA are substrates. Because the reaction progresses in two steps, the order of substrate additions may be important for AARS enzymes, affecting whatever reaction intermediate analogues or AARS-tRNA conformations can be visualized in crystal or cryo-electron microscopy structures. Pyrophosphate, AMP and aa-tRNA are products. In structures, non-reactive aa-AMP analogues were sometimes used to mimic a reaction intermediate.
4.2. GlyRS-IIA
A primordial GlyRS-IIA appears to be the founding AARS. All class II and class I AARS enzymes appear to be derived from this root. Figure 3 shows a GlyRS-IIA-tRNAGly (CCC) structure from Homo sapiens [56]. Human GlyRS-IIA is similar in structure and sequence to archaeal GlyRS-IIA. GlyRS-IIA is an α2-dimer, but the image is of only a single GlyRS-IIA-tRNAGly (CCC). The image was selected to demonstrate primary tRNAGly contacts to the anticodon loop and the tRNA 73-ACCA-76 3′-end. As a class II AARS, GlyRS-IIA has its aminoacylating active site on a surface of antiparallel β-sheets. The GAP reaction intermediate analogue and the 73-ACCA-76 sequence identify the aminoacylating active site. tRNAGly utilizes 34-CCC-36, UCC and GCC anticodons (anticodons are underlined for clarity), so the strongest interactions with GlyRS-IIA might be 35-CCA-37, as indicated in the structure.
Figure 3.
GlyRS-IIA-tRNAGly (CCC) from H. sapiens. A primitive GlyRS-IIA appears to be the founding AARS. This image was selected to emphasize tRNA contacts. Proteins are in beige. β-sheets are in cyan. tRNAs are colored according to the three 31 nt minihelix tRNA evolution theorem (Figure 1 and Figure 2). Some GlyRS-IIA amino acids that were not imaged are noted. Lbp stands for the Levitt base pair. The elbow is where the D loop binds the T loop.
Figure 4 shows tRNAGly (CCC). Figure 4A is the primordial tRNAGly (CCC). Figure 4B is the P. furiosus tRNAGly (CCC). Figure 4C is the human (Hsa) tRNAGly (CCC), as in the structure in Figure 3 [41,54,57,58]. Modifications to the anticodon loop are indicated and explained in the figure legend. Conserved bases compared to the primordial tRNAGly are in bold in Figure 4B,C. As previously described, the primordial tRNAGly (CCC) is a highly ordered sequence formed from GCG (5′-acceptor stem and 5′-acceptor stem remnant (5′-As*)), CGC (3′-acceptor stem and 3′-acceptor stem remnant (type I V loop)), and UAGCC (D loop) repeats and inverted repeats (stem–loop–stem; ~CCGGG_CU/CCCAA_CCCGG; _ indicates separation of stem and loop; / indicates a U-turn; the anticodon is underlined) [15,31]. A few deviations from the perfectly ordered initial sequence are noted (Figure 4A). These deviations, which pre-date LUCA, support the tRNA fold. D12G (replacing D12A in the third UAGCC repeat) intercalates between 57A and 58A and hydrogen bonds to 55U. D13G forms a Watson–Crick pair with 56C. These are referred to as “elbow” contacts, where the D loop binds the T loop to stabilize the tRNA form [15,59]. The T loop is strongly selected to obtain the typical sequence UU/CAAAU to maintain the interaction with the D loop at the elbow.
Figure 4.
tRNAGly (CCC). (A) A primordial tRNAGly (CCC). (B) P. furiosus tRNAGly (CCC). (C) Human tRNAGly (CCC). tRNAs are colored according to the three 31 nt minihelix tRNA evolution theorem (Figure 1 and Figure 2) [15,31]. In Figure 4A, the Levitt base pair (D8G = V5C) (Lbp) and some elbow contacts are indicated. The Levitt base pair is a reverse Watson–Crick pair that forms two hydrogen bonds. D12G intercalates between 57A and 58A and hydrogen bonds to 55U [60]. D13G forms a Watson–Crick pair with T loop 56C. Anticodon sequences are underlined or in white bold. / indicates a U-turn. Modifications of the anticodon loop are indicated in the sequences. Modomics notation is used for tRNA anticodon loop modifications [41]. xU indicates an unknown 5-carbon U modification to suppress superwobbling. Yellow arrows indicate features that may be of interest. 32U-38U is expected to alter dynamics of the anticodon loop.
In P. furiosus, tRNAGly is the most similar tRNA to tRNAPri (Pri stands for primordial) [21,41]. The acceptor stem matches the primordial sequence in all but 1 bp. The primordial acceptor stem sequence is matched perfectly in some Archaea (i.e., Staphylothermus marinus tRNAGly (GCC); GCGGCGG; a GCG repeat) [52]. In tRNAGly (CCC) of P. furiosus, the D loop is intact (D1-UAGUCUAGCCUGGUCUA-D17; Figure 4B) and very similar to tRNAPri (D1-UAGCCUAGCCUGGCCUA-D17; Figure 4A), with only two base changes from the primordial sequence and no deleted bases relative to tRNAPri. The 5′-As* sequence GGACG varies in only a single base from the typical primordial sequence GGGCG. The anticodon stem matches tRNAPri in 4 of 5 bp. The V loop sequence C_GAC matches the primordial sequence CCGCC in 3 of 5 positions. The T stem–loop–stem matches tRNAPri at every position. As expected, human tRNAGly is more innovated from tRNAPri, but tRNAGly (CCC) in humans is so similar that it might be monophyletic with tRNAGly (CCC) in Archaea such as P. furiosus.
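The position-by-position comparisons quoted above can be checked with a simple mismatch count over the aligned sequences. This is a minimal sketch rather than the alignment procedure used for the figures; the function name is hypothetical, the sequences are copied from the text, and "_" is treated as an ordinary mismatch.

```python
def mismatches(seq: str, ref: str):
    """Return 1-based positions where two equal-length aligned sequences differ."""
    if len(seq) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(seq, ref), start=1) if a != b]


# (P. furiosus tRNAGly (CCC), tRNAPri) as quoted in the text; "_" marks a gap.
pairs = {
    "D loop": ("UAGUCUAGCCUGGUCUA", "UAGCCUAGCCUGGCCUA"),
    "5'-As*": ("GGACG", "GGGCG"),
    "V loop": ("C_GAC", "CCGCC"),
}

for name, (pfu, pri) in pairs.items():
    diff = mismatches(pfu, pri)
    print(f"{name}: {len(pfu) - len(diff)} of {len(pfu)} match; differs at {diff}")
```

With these inputs, the D loop differs at D4 and D14, the 5′-As* differs at one position, and the V loop matches at 3 of 5 positions, in agreement with the counts given in the paragraph.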
In P. furiosus, only the 34U anticodon loop for tRNAGly appears to be modified [42]. The 34cnm5U modification is initiated by Elp3, which is a first enzyme that is as ancient as the genetic code. A 34U modified at the 5-carbon position is expected to limit superwobbling. Unmodified 34U can read the codon wobble 3-A,G,C and U. In mitochondria, superwobbling is utilized in 4-codon boxes to shrink the size of the tRNAome [61,62,63]. A single unmodified 34U tRNA can read an entire 4-codon box. Because glycine is in a 4-codon box, unmodified 34U might be tolerated, but the 34cnm5U modification would limit reading to codons 3-A and 3-G.
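The effect of the tRNA-34 modification on codon reading can be sketched as a lookup table. The entries for unmodified U and for cnm5U follow the text; the entries for G and C are standard wobble pairing and are included here as an assumption. Function and table names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases read by each tRNA-34 wobble base.
# "U" (unmodified, superwobbling) and "cnm5U" follow the text; the "G" and "C"
# entries are standard wobble pairing, added here as an assumption.
WOBBLE_READS = {
    "U": "AGCU",
    "cnm5U": "AG",
    "G": "CU",
    "C": "G",
}


def codons_read(wobble34: str, base35: str, base36: str):
    """List codons (5'->3') read by an anticodon written 34-35-36."""
    first = COMPLEMENT[base36]   # codon position 1 pairs with tRNA-36
    second = COMPLEMENT[base35]  # codon position 2 pairs with tRNA-35
    return [first + second + third for third in WOBBLE_READS[wobble34]]


# Glycine anticodons NCC: compare unmodified and modified 34U.
for wobble in ("C", "G", "U", "cnm5U"):
    print(wobble + "CC", codons_read(wobble, "C", "C"))
```

In this sketch, an unmodified UCC anticodon reads all four glycine codons, whereas cnm5UCC is restricted to GGA and GGG.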
In anticodon loops, 32C and 38A form a hydrogen bond [60]. A 32Y-38Y (Y stands for pseudouridine), 32Um-38U or 32C-38m5C interaction would be expected to change the dynamics of the loop.
Glycine is the smallest and most flexible amino acid. Steric hindrance of larger amino acids may be a mechanism by which GlyRS-IIA limits misincorporation of non-cognate amino acids. In more innovated Bacteria (i.e., Escherichia coli), GlyRS-IID substitutes for GlyRS-IIA. GlyRS-IIA is the more ancient enzyme and appears to be the root of both the class II and class I AARS lineages [16,18,64].
The genetic code is hypothesized to have initially evolved to synthesize polyglycine, making tRNAGly the first tRNA, and a primitive GlyRS-IIA appears to be the founding AARS, as indicated by sequence alignment. In pre-life, single-stranded RNA may have been stabilized by methylation at the 2′-O [43]. This modification would render RNA resistant to ribozyme ribonucleases and base hydrolysis. If the 2′-O were modified in pre-life (i.e., by methylation), this might explain why GlyRS-IIA initially evolved to utilize the tRNA-76 ribose 3′-O to attach glycine.
4.3. ValRS-IA
The founding class I AARS appears to be a primitive version of ValRS-IA [16,17,18,64]. All class I AARSs appear to radiate from this root. A primitive ValRS-IA was derived from a primitive GlyRS-IIA by attachment of an N-terminal sequence that redirected to the distinct class I fold. In ancient Archaea, ValRS-IA and IleRS-IA have two Zn motifs: one in the added N-terminal segment, and one in the segment that is homologous to the N-terminal Zn motif in GlyRS-IIA. The added N-terminal Zn motif to form class I AARSs generates the class I fold. It appears that early folding of class II and class I AARSs was highly dependent on these Zn motifs. As evolution progressed, the Zn motifs were sometimes replaced by other folding determinants. Because AARSs are the first proteins (coevolved with the genetic code), early folding mechanisms dependent on Zn indicate early entry of cysteine into the code.
Glycine, alanine, aspartic acid and valine (GADV) are proposed to be the first encoded amino acids [65,66,67,68]. GADV (the four simplest amino acids) are located at the 4th row of the code (tRNA-36C). The genetic code appears to have been sectored primarily by code columns. It is hypothesized that the evolution of columns commenced by filling the code with GADV in the favored row 4 and, perhaps, expanding into other rows. It is hypothesized that earlier encoded amino acids occupied larger segments of the code that were then invaded by incoming amino acids. Amino acids that were added early subsequently retreated to occupy the most favored sectors of the code (i.e., tRNA-36C). Glycine is located at code column 4 (tRNA-35C). Alanine is located at code column 2 (tRNA-35G). Aspartic acid is located at code column 3 (tRNA-35U). Valine is located at code column 1 (tRNA-35A). It is hypothesized that row 4 (tRNA-36C) and column 4 (tRNA-35C) were the most favored in establishing the code. As the first encoded amino acid, glycine occupies the most favored row and column in the code. The genetic code is a highly ordered assembly.
Valine appears to be the founding amino acid for the assembly of column 1 of the code (tRNA-35A). Assembly of column 1 can be considered from the points of view of similar amino acids and homologous AARS enzymes. As an order of assembly, in column 1, we posit that V (row 4) evolved to L (rows 1 and 2), which evolved to I (row 3), which evolved to M (row 3). F appears to be a later addition to column 1, in the disfavored row 1. According to closely homologous AARS enzymes, consider the following order of evolution: ValRS-IA evolved to LeuRS-IA, which evolved to IleRS-IA, which evolved to MetRS-IA. The entry of phenylalanine and PheRS-IIC will be discussed below. The disfavored row 1 (tRNA-36A) appears to fill last and is a separate case. Sequence preference for rows appears to follow this order: C (row 4; tRNA-36C) was favored over G (row 2; tRNA-36G), which was favored over U (row 3; tRNA-36U), which was strongly favored over A (row 1; tRNA-36A). In row 3, it appears that Met invaded an Ile 4-codon sector, eliminating the UAU anticodon and inducing differential tRNA-34 modifications of CAU to discriminate Ile (GAU and agm2CAU) (agm2C for agmatidine) and Met (CAU (initiator) and CmAU (elongator)). Modification of the 2-carbon of C (agm2C) (Ile) slightly resembles G (Ile) and discriminates from Met (2-carbon C=O).
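The row and column assignments described above reduce to two lookups on the anticodon: tRNA-36 sets the row and tRNA-35 sets the column. The sketch below encodes only those assignments; the function name is hypothetical, and the representative Ala (GGC) and Asp (GUC) anticodons are standard assignments supplied here as an assumption, because the text gives only their tRNA-35 and tRNA-36 bases.

```python
# Row is set by tRNA-36 and column by tRNA-35, using the assignments in the text.
ROW_BY_36 = {"A": 1, "G": 2, "U": 3, "C": 4}
COLUMN_BY_35 = {"A": 1, "G": 2, "U": 3, "C": 4}


def code_sector(anticodon: str):
    """Return (row, column) for a 3 nt anticodon written 34-35-36."""
    _, base35, base36 = anticodon
    return ROW_BY_36[base36], COLUMN_BY_35[base35]


# Representative anticodons (34-35-36). Ala and Asp anticodons are assumed
# standard assignments; the others are mentioned in the text.
examples = [("Gly", "GCC"), ("Ala", "GGC"), ("Asp", "GUC"),
            ("Val", "GAC"), ("Ile", "GAU"), ("Met", "CAU")]

for name, anticodon in examples:
    row, column = code_sector(anticodon)
    print(f"{name} ({anticodon}): row {row}, column {column}")
```

Under these lookups, GADV all fall in row 4, one per column, and Ile and Met share row 3 of column 1.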
The ValRS-IA of Thermus thermophilus bound to tRNAVal (CAC) is shown in Figure 5 [69]. ValRS-IA functions as an α1-monomer. Because ValRS-IA is a class I AARS, the aminoacylating active site is at the C-terminal ends of a set of parallel β-sheets. The arrangement of parallel β-sheets has been described as a Rossmann fold, but, undoubtedly, the aminoacylating active site arrangement of ValRS-IA and other class I AARSs is genetically unrelated to Rossmann fold proteins. The aminoacylating active site can also be identified by the binding of VAA, a non-reactive Val-AMP analogue. 73-ACCA-76 is located at the separate editing active site that removes non-cognate amino acids from tRNAVal. Non-cognate homocysteine, serine, alanine and isoleucine can be removed by the separate proofreading (editing) active site after attachment to tRNAVal. Reactions within the ValRS-IA aminoacylating active site limit non-cognate threonine, α-aminobutyric acid, cysteine and norvaline attachments to tRNAVal. As a small, hydrophobic amino acid, valine has little chemical character, so editing reactions both before and after tRNAVal attachment are important to limit inaccurate translation [36,37].
Figure 5.
ValRS-IA-tRNAVal (CAC) from T. thermophilus. The color scheme used in this image is the same as that in Figure 3. Non-cognate amino acids that are blocked from incorporation within the aminoacylating active site are indicated in red. Non-cognate amino acids that are removed from tRNAVal after attachment within the separate proofreading (editing) active site are indicated in black.
tRNAVal is shown in Figure 6. In Figure 6A, P. furiosus tRNAVal (CAC) is shown. The acceptor stem matches the primordial tRNA sequence in 4 of 7 bp. The D loop has the sequence D1-UGGUCUAGACUGG_UUA-D17 and matches the primordial sequence in all but five positions, with a single base deleted from tRNAPri. The anticodon stem matches the primordial sequence in 3 of 5 bp. The T stem–loop–stem matches the primordial sequence in all but 1 stem bp. In P. furiosus, tRNAVal is similar to tRNAAla in sequence, indicating that Val and Ala may have entered the code at about the same time in evolution. GADV are proposed to have been the first four encoded amino acids [65,66,67]. The T. thermophilus tRNAVal (CAC) is more derived from the root sequence, as expected (Figure 6B). Bacteria are more derived from LUCA than Archaea.
Figure 6.
tRNAVal (CAC). (A) From P. furiosus (Pfu). The yellow arrow indicates an unmodified 34U. (B) From T. thermophilus (Tth). The yellow arrow indicates a 32C-38C arrangement, which may affect loop dynamics. Hvo indicates Haloferax volcanii. Eco indicates E. coli. Sgr indicates Streptomyces griseus.
tRNAVal utilizes CAC, UAC and GAC anticodons. In P. furiosus and H. volcanii, none of these anticodon loops appear to be modified [42,70]. Unmodified UAC would be predicted to read codon 3-A,G,C, and U by superwobbling [61,62,63]. Because valine occupies a 4-codon sector, such promiscuity would be tolerated and might be selected. The tRNAVal (CAC) anticodon loop is substantially unwound by ValRS-IA, indicating allosteric effects of ValRS-IA-tRNAVal binding. Allostery is likely important in selectively directing the tRNAVal 3′-end to the aminoacylating or separate proofreading active site. Because ValRS-IA makes elbow contacts, these might leverage allosteric effects mediated through tRNAVal. It appears that 35-AC-36 and 38C make the strongest contact with ValRS-IA, as might be expected. In P. furiosus, 38A is present, rather than 38C, as in E. coli. In P. furiosus, tRNAVal and tRNAAla are similar in sequence, consistent with the GADV hypothesis that indicates that valine and alanine were two of the first four encoded amino acids.
4.4. IleRS-IA
IleRS-IA-tRNAIle (GAU) from the Bacterium Staphylococcus aureus is shown in Figure 7 [71]. By structure and sequence, IleRS-IA is closely related to ValRS-IA. Both enzymes function as α1-monomers. Compared to ValRS-IA, IleRS-IA appears to make no elbow contacts and unwinds the tRNAIle anticodon loop (GAU and k2CAU, where k2C stands for lysidine; in Archaea, GAU and agm2CAU, where agm2C stands for agmatidine) to a somewhat lesser extent. Elbow contacts by an AARS may be used to leverage allosteric effects transmitted through a cognate tRNA to the aminoacylating or editing active sites. Apparently, k2C and agm2C can partly mimic G for IleRS-IA binding. Minimally, k2C and agm2C are better G mimics than 2-C=O, as in unmodified C at the 2-carbon position (i.e., Met anticodons). The agm2C modification is added by a first protein (tRNAIle2 2-agmatinylcytidine synthase). As noted above, the UAU anticodon is rarely encoded in Archaea. When the anticodon UAU is encoded, it is also modified to agm2CAU to encode Ile. Also, MetRS-IA utilizes CmAU (elongator) and CAU (initiator) anticodons. Because Ile anticodons have tRNA-36U, tRNA-37 is t6A or hn6A. It is hypothesized that modified tRNA-37A (i.e., t6A and hn6A) may have evolved to suppress wobbling at tRNA-36U.
Figure 7.
IleRS-IA-tRNAIle (GAU) from S. aureus. IleRS-IA has a separate proofreading active site that removes non-cognate homocysteine and cysteine attached to tRNAIle (black text). Non-cognate valine, norvaline and α-aminobutyrate are blocked from attachment to tRNAIle through reactions at the aminoacylating active site (red text) [36,37]. MRC binds the aminoacylating active site.
Because IleRS-IA is a class I AARS, the aminoacylating active site is at the C-terminal ends of a set of parallel β-sheets. The reaction intermediate analogue MRC binds at this site. 73-AC(CA) is not fully resolved in the structure. IleRS-IA has a separate editing active site that can remove non-cognate homocysteine and cysteine from tRNAIle. The IleRS-IA aminoacylating active site limits incorporation of valine, norvaline and α-aminobutyric acid. Similar to valine, isoleucine is a somewhat featureless amino acid that requires editing functions to suppress translation errors.
tRNAIle (GAU) is shown in Figure 8. In P. furiosus, tRNAIle (GAU) matches the primordial tRNA sequence in 4 bp within the acceptor stem. The D loop has the sequence D1-UGGCUCAGCCUGG_UCA-D17, matching the primordial tRNA in all but 6 positions, with a single base deleted from tRNAPri. The 5′-As* sequence is GAGCG versus GGGCG in the primordial tRNA [15,31]. The anticodon stem matches the primordial tRNA in 3 bp. The T stem–loop–stem matches 4 of 5 bp in the stem and all but two bases in the loop compared to tRNAPri. In S. aureus, the tRNAIle is more innovated.
Figure 8.
tRNAIle (GAU). (A) P. furiosus tRNAIle (GAU). (B) S. aureus tRNAIle (GAU). Modifications to the anticodon loop are as expected. Mca stands for Mycoplasma capricolum.
4.5. MetRS-IA
MetRS-IA-tRNAMet (CAU) from Aquifex aeolicus is shown in Figure 9 [72]. In sequence and structure, MetRS-IA is very similar to IleRS-IA and ValRS-IA. MetRS-IA functions as an α1-monomer. There are no tRNAMet (CAU) elbow contacts. Unwinding of the anticodon loop to expose CmAU or CAU is slight, perhaps because only a single CAU anticodon, lacking the agm2C or k2C modifications to encode isoleucine, is recognized. Because of deletion, MetRS-IA lacks an editing active site, but the aminoacylating active site of MetRS-IA limits incorporation of homocysteine. The aminoacylating active site is identified by MSP binding and a set of parallel β-sheets. 73-A(CCA) is partly resolved in the structure, possibly indicating allosteric effects (i.e., of MSP analogue binding).
Figure 9.
MetRS-IA-tRNAMet (CAU) from A. aeolicus.
It is hypothesized that methionine entered the genetic code by invading a 4-codon isoleucine sector [16]. This invasion resulted in the suppression of the UAU anticodon that would cause ambiguity between methionine and isoleucine coding. Methionine adopted the CmAU (elongator) and CAU (initiator) anticodons (Figure 8). In Archaea, to support a 3-codon box, isoleucine utilized the agm2CAU and GAU anticodons. In Bacteria, the k2CAU and GAU anticodons are utilized for isoleucine.
The P. furiosus elongator tRNAMet (CmAU) is a close match to the primordial tRNA sequence (Figure 10A). The acceptor stem matches in 5 bp out of 7. The D loop sequence D1-UAGCUUAGCCUGG_UCA-D17 matches in all but four positions, with a single base deleted from tRNAPri. The anticodon stem matches tRNAPri in 4 out of 5 bp. The T stem matches in 4 of 5 bp, and the T loop matches tRNAPri in all but two bases. The initiator tRNAMet (CAU) anticodon stem matches tRNAPri in 5 out of 5 bp (Figure 10B). The initiator tRNAMet (CAU) anticodon loop is unmodified. The 1A=72U pair in the initiator tRNAMet (CAU) is unusual (Figure 10B) and more readily melted than 1G=72C, which is typical. The tRNAIle (agm2CAU) sequence is similar to the elongator tRNAMet (CmAU) (Figure 10C), indicating that tRNAMet may have been derived from a primitive tRNAIle.
Figure 10.
tRNAMet (CmAU and CAU) and tRNAIle (agm2CAU). (A) P. furiosus elongator tRNAMet (CmAU). (B) P. furiosus initiator tRNAMet (CAU). (C) P. furiosus tRNAIle (agm2CAU).
4.6. LeuRS-IA
tRNALeu is a type II tRNA with a longer V arm (initially, 14 nt; a 3′-acceptor stem ligated to a 5′-acceptor stem; initially, CCGCCGC_GCGGCGG) (Figure 1). In Archaea, tRNALeu and tRNASer are type II tRNAs. Both leucine and serine are in 6-codon boxes. LeuRS-IA and SerRS-IIA utilize the longer tRNA V arms as major determinants for cognate tRNA charging rather than the anticodon loops, which LeuRS-IA and SerRS-IIA do not contact. Arginine is also in a 6-codon box, but tRNAArg is a type I tRNA. Rather than using a longer type II tRNAArg V arm, ArgRS-IA uses enhanced anticodon loop unwinding to expose bases for recognition. In Bacteria, tRNATyr is a type II tRNA, but the type II tRNATyr (GUA) and its recognition by bacterial TyrRS-IC are bacterial innovations.
The LeuRS-IA-tRNALeu (CAA) of Pyrococcus horikoshii is shown in Figure 11 [73,74]. By structure and sequence, LeuRS-IA is closely related to ValRS-IA, IleRS-IA and MetRS-IA. LeuRS-IA functions as an α1-monomer. Archaeal and bacterial LeuRS-IA have different modes of contacting the type II V arm of their cognate tRNALeu. In Archaea (but not in Bacteria), LeuRS-IA contacts the end loop of the V arm at the typical sequence UAG [11]. In both Archaea and Bacteria, the tRNALeu (CAA) anticodon loop is not contacted by LeuRS-IA. Of all the AARSs, only LeuRS-IA, SerRS-IIA and AlaRS-IID lack anticodon loop contacts for cognate tRNA recognition. A C-terminal region of archaeal LeuRS-IA contacts the elbow. The aminoacylating active site is identified by parallel β-sheets and 73-ACCA-76. The tRNALeu (CAA) 73-ACCA-76 is in the catalytic “hairpin” conformation for class I AARS enzymes, curving down into the aminoacylating active site. LeuRS-IA has a separate editing active site that removes non-cognate valine, α-aminobutyrate and methionine from tRNALeu. The LeuRS-IA aminoacylating active site limits non-cognate norvaline, homocysteine, γ-OH-leucine and isoleucine incorporation [36]. Leucine is a hydrophobic amino acid with little chemical character and so requires editing to maintain translational accuracy.
Figure 11.
LeuRS-IA-tRNALeu (CAA) of P. horikoshii. LeuRS-IA has a separate editing active site that removes non-cognate valine, α-aminobutyric acid and methionine from tRNALeu (black text). The aminoacylating active site blocks norvaline, homocysteine, γ-hydroxy leucine and isoleucine incorporation (red text). In P. horikoshii, tRNALeu is a type II tRNA with a 14 nt V arm.
Figure 12 shows a comparison of a primordial type II tRNA (Figure 12A) and archaeal tRNALeu (CAA) (Figure 12B). A bacterial tRNALeu is also shown (Figure 12C). In P. horikoshii, the tRNALeu (CAA) type II V arm is 14 nt, as in type II tRNAPri. Most archaeal tRNALeu type II V arms are 14 nt in length, the primordial length. In Archaea, the tRNALeu type II V arm is a major determinant for cognate tRNALeu charging with leucine. The trajectory of the type II V arm is different than that for tRNASer, with two bases separating the 3′-V arm stem and the Levitt base for tRNALeu and with one base separating the 3′-V arm stem and the Levitt base for tRNASer, in Archaea. For archaeal tRNALeu, the V arm end loop includes the UAG consensus to bind LeuRS-IA (Figure 12B). The V arm end loop contact is not utilized by bacterial LeuRS-IA (Figure 12C). tRNALeu utilizes CAA, UAA, CAG, GAG and UAG anticodons. Because of superwobbling, unmodified UAA anticodons, in principle, might utilize a phenylalanine UUU or UUC codon, substituting leucine for phenylalanine. We are not certain what limits miscoding in this case. The anticodon cnm5UAG is in a 4-codon box, but it has the 5-carbon U modification that should suppress superwobbling.
Figure 12.
Comparison of type II tRNAPri and tRNALeu. (A) Type II tRNAPri. In the anticodon, B indicates G, C or U, but not A. (B) P. horikoshii tRNALeu (CAA). In Archaea, two bases separate the 3′-V arm stem from the Levitt base (V14C), giving the trajectory of the V arm. The V6-UAG-V8 consensus to bind LeuRS-IA is indicated. In principle, the unmodified CU/UAAGA anticodon loop could cause leucine substitution for phenylalanine by superwobbling. (C) Bacterial T. thermophilus tRNALeu (CAA) has a different trajectory of the V arm and lacks the V arm end loop UAG consensus. In Bacteria, the trajectory of the V arm is given by the number of unpaired bases (one) separating the 3′-V arm stem and the Levitt base V15U.
The P. horikoshii tRNALeu (CAA) acceptor stem matches tRNAPri in 6 out of 7 bp (Figure 12B). The D loop has the sequence D1-UUGCCGAGCCUGGUCAA-D17, matching the tRNAPri sequence in all but four positions and including no deletions compared to tRNAPri. The 5′-As* sequence is AGGCG, matching the typical tRNAPri GGGCG in all but one position. Two bases separate the type II 3′-V arm stem and the Levitt base. The V arm end loop has the sequence V5-GUAG-V8 that includes the V6-UAG-V8 consensus to bind LeuRS-IA (Figure 11 and Figure 12B). The T stem matches tRNAPri in all but 1 bp. The T loop matches in all but one base.
4.7. SerRS-IIA
At the base of code evolution, only tRNALeu and tRNASer were selected to be type II tRNAs. The number of amino acids served by type II tRNAs in an organism or domain is determined by the number of allowed trajectories of the V arm. In Archaea, the number of trajectories is two. In Bacteria, the number is three. In Archaea and Bacteria, the trajectories of type II V arms are different for tRNALeu and tRNASer. SerRS-IIA is a very different AARS compared to LeuRS-IA. From sequences, however, it appears that type II tRNASer may have been derived from type II tRNALeu. To maintain translational accuracy, the type II V arm of tRNASer is recognized very differently than the type II V arm of tRNALeu. The type II tRNASer V arm has a different trajectory from its tRNA body compared to the tRNALeu V arm [11]. The trajectory of the type II V arm depends on the number of unpaired bases between the 3′-V arm stem and the Levitt reverse Watson–Crick base pair (i.e., D8G=V14C). In Archaea, for tRNALeu, the number is two unpaired bases (in Bacteria, the number is one unpaired base for the tRNALeu type II V arm). For tRNASer, the number in Archaea is one unpaired base (in Bacteria, the number is zero unpaired bases for the tRNASer type II V arm).
The human SerRS-IIA-tRNASer (UGA) is shown in Figure 13 [75]. SerRS-IIA has an N-terminal helix hairpin that lies across the type II V arm and interacts with the tRNASer elbow. SerRS-IIA functions as an α2-dimer. The aminoacylating active site and the helix hairpin for a single tRNASer are on separate α-subunits. As noted above, no contact is made by SerRS-IIA with the tRNASer anticodon loop. Instead, SerRS-IIA recognizes the type II V arm as a major determinant. The aminoacylating active site is on the surface of antiparallel β-sheets. The SerRS-IIA aminoacylating active site limits non-cognate attachment of threonine, cysteine and alanine to tRNASer.
Figure 13.
SerRS-IIA-tRNASer (UGA) from H. sapiens. The full α2-dimer is shown. One α-subunit is colored white; one is colored wheat. β-sheets are colored light pink. HH indicates the N-terminal helix hairpin that binds the type II V arm stems and the elbow of tRNASer.
tRNASer anticodon loop modifications are interesting and slightly unanticipated (Figure 14). Generally, tRNA-36U is associated with a modified tRNA-37A, as is observed. Generally, tRNA-36A is associated with a modified tRNA-37G, but GGA in H. volcanii is followed by unmodified 37A. In P. furiosus, UGA is unmodified at 34U, but UGA is in a 4-codon box, so implied superwobbling would not cause miscoding. In the P. furiosus tRNASer (UGA), the acceptor stem matches that of tRNAPri in 4 of 7 bp. In H. sapiens, tRNASer (UGA) matches the tRNAPri acceptor stem in 5 of 7 bp (Figure 14B). In the D loop, tRNASer (UGA) of P. furiosus has two perfect UAGCC repeats, consistent with the three 31 nt tRNA evolution theorem (Figure 1 and Figure 2). The D loop sequence is D1-UAGCCUAGCCUGG__UA-D17, matching the primordial tRNA sequence in all but two deleted positions. The 5′-As* sequence AGGCG matches tRNAPri in 4 of 5 positions. The T stem–loop–stem of P. furiosus matches tRNAPri in all but 1 stem bp.
Figure 14.
tRNASer (UGA). (A) P. furiosus tRNASer (UGA). In Archaea, one base separates the 3′-V stem and the Levitt base (in this case, V15C). (B) H. sapiens tRNASer (UGA). (C) T. thermophilus tRNASer (UGA). In Bacteria, typically, zero bases separate the 3′-V arm stem and the Levitt base (V19C).
In Bacteria, trajectories of tRNA type II V arms are different than in Archaea (compare Figure 14A,C). In Archaea, tRNALeu has two unpaired bases (Figure 12B), and tRNASer has one unpaired base (Figure 14A) separating the 3′-V arm stem and the Levitt base. In Bacteria, tRNATyr has two unpaired bases (see below), tRNALeu has one unpaired base (Figure 12C), and tRNASer has zero unpaired bases (Figure 14C) separating the type II V arm 3′-stem and the Levitt base [11]. Differences in type II V arm trajectories cause different modes of cognate AARS-tRNA recognition and are expected to limit horizontal type II tRNA gene transfers between Bacteria and Archaea.
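The trajectory rule summarized above can be tabulated in a short sketch. The counts are the numbers of unpaired bases stated in the text. The dictionary layout and the helper names `trajectories_distinct` and `shifted_between_domains` are hypothetical and serve only as illustration.

```python
# Unpaired bases separating the 3'-V arm stem and the Levitt base,
# as stated in the text for each domain and type II tRNA.
UNPAIRED_BASES = {
    "Archaea": {"tRNALeu": 2, "tRNASer": 1},
    "Bacteria": {"tRNATyr": 2, "tRNALeu": 1, "tRNASer": 0},
}

def trajectories_distinct(domain: str) -> bool:
    """True if every type II tRNA in the domain has its own V arm trajectory."""
    counts = list(UNPAIRED_BASES[domain].values())
    return len(counts) == len(set(counts))

def shifted_between_domains(trna: str) -> bool:
    """True if the same tRNA uses different trajectories in Archaea and Bacteria."""
    return UNPAIRED_BASES["Archaea"][trna] != UNPAIRED_BASES["Bacteria"][trna]

for domain in UNPAIRED_BASES:
    print(domain, "trajectories distinct:", trajectories_distinct(domain))
for trna in ("tRNALeu", "tRNASer"):
    print(trna, "differs between domains:", shifted_between_domains(trna))
```

Within each domain, every type II tRNA has a unique count, which is what allows the V arm to serve as a determinant. Each shared tRNA also has a different count in the other domain, consistent with the expected barrier to horizontal transfer of type II tRNA genes.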
Serine is the only amino acid that is located at two separate columns of the genetic code. It is hypothesized that serine jumped from column 2 to column 4 during code evolution. Being a type II tRNA lacking anticodon recognition by SerRS-IIA probably facilitated jumping. We suggest that jumping of serine in code evolution may correlate with the introduction of cysteine into the code (see below).
4.8. ArgRS-IA
The ArgRS-IA-tRNAArg (ICG) of S. cerevisiae is shown in Figure 15 [76]. As noted above, although arginine is in a 6-codon box, tRNAArg is a type I tRNA. Compared to LeuRS-IA and SerRS-IIA, ArgRS-IA utilizes the alternate strategy of increased unwinding of the type I tRNAArg anticodon loop to expose additional bases for cognate recognition. Three amino acids probably could not occupy 6-codon sectors in the code using the strategy that evolved for arginine, explaining why tRNALeu and tRNASer evolved to substitute recognition of longer type II V arms, rather than utilizing anticodon loop determinant contacts. For tRNAArg, the 34-ICGAA-38 sequence is substantially unwound. 35C, 37A and 38A appear to make strong contacts with ArgRS-IA. ArgRS-IA makes substantial elbow contacts that may help leverage anticodon loop opening through allosteric effects. 73-GCCA-76 is in the catalytic “hairpin” conformation for a class I AARS. Arginine binds at the aminoacylating active site. As expected for a class I AARS, parallel β-sheets approach the aminoacylating active site.
Figure 15.
ArgRS-IA-tRNAArg (ICG) of S. cerevisiae.
In P. furiosus, the GCG anticodon would most closely correspond with the ICG anticodon in S. cerevisiae. Generally, when encoded ACG is modified to ICG by deamination, in Bacteria and Eukarya, the corresponding GCG anticodon is not utilized. Similarly, Archaea do not appear to utilize the 34A→I modification but use the GCG anticodon instead. It is notable that 34A is not utilized in Archaea and, for the most part, in Bacteria (Bacteria utilize ICG to encode arginine). The lack of anticodon wobble base discrimination (i.e., pyrimidine versus purine only) causes genetic code degeneracy.
The P. furiosus tRNAArg (GCG) sequence is of interest (Figure 16A). The acceptor stem matches tRNAPri in 5 of 7 bp. The D loop has the sequence D1-UGGCCUAGCCUGG_AUA-D17, which varies in only three positions from the primordial D loop sequence, with a single base deleted relative to tRNAPri. The 5′-As* sequence is GGGCG, which matches the primordial tRNA sequence (GGGCG, rearranged from GGCGG before LUCA). The V loop sequence AGGUC is typical. The T stem–loop–stem matches in all but 1 stem bp. The S. cerevisiae tRNAArg (ICG) sequence is more derived from the root sequence, as expected (Figure 16B). In P. furiosus, the [cnm5U]CU anticodon is followed by an unmodified A, which is unexpected. Perhaps the cnm5U modification, or another feature, helps to compensate. Generally, 36U is followed by a modified 37A, as in CCU[t6A]. CGA and CGG codons are rare in Eukarya. The corresponding UCG and CCG anticodons are also rare in Eukarya.
Figure 16.
tRNAArg. (A) P. furiosus tRNAArg (GCG). (B) S. cerevisiae tRNAArg (ICG). Yellow arrows indicate features of possible interest. Bta indicates Bos taurus.
Arginine is an amino acid with significantly discriminating characteristics. Arginine is positively charged and much stiffer than lysine. Also, arginine has significant hydrogen bonding potential. These characteristics discriminate arginine from lysine, which is much more flexible and has a more concentrated positive charge. We consider the idea that the first encoded positively charged amino acid may have been ornithine [77]. Ornithine can be converted to arginine in two enzymatic steps, consistent with the notion that tRNA-linked chemistry may have contributed to the encoding of arginine and lysine. Ornithine can be converted to lysine in some Archaea and Bacteria [78,79,80]. Consistent with this idea, in P. furiosus, tRNAArg and tRNALys are similar in sequence.
4.9. CysRS-IA
In sequence and structure, CysRS-IA (Figure 17) is closely related to ArgRS-IA. Cysteine and arginine are located at column 4 of the genetic code, indicating evolution in code columns. Because CysRS-IA recognizes only the GCA anticodon, 34-GCA-36 can be recognized by the anticodon-binding domain [81]. In the structure, 73-UCCA-76 enters the CysRS-IA aminoacylating active site in the “hairpin” catalytic conformation. The discriminator 73U is rarely used. In P. furiosus, 73U is only found in tRNACys (1 tRNACys) and tRNAThr (3 tRNAThr). The aminoacylating active site of CysRS-IA is at the C-terminal ends of a set of parallel β-sheets, as expected. Cysteine is important for Zn binding. CysRS-IA utilizes Zn binding to bind and orient cysteine in its aminoacylating active site.
Figure 17.
CysRS-IA-tRNACys (GCA) from H. sapiens.
In P. furiosus, tRNACys (GCA) is of interest (Figure 18). The acceptor stem matches tRNAPri in 4 of 7 bp. The D loop sequence is D1-UAGCCUAG__AGG__CC-D17, matching the primordial tRNA sequence in the first eight positions. The 5′-As* sequence is AGGCG, matching tRNAPri GGGCG in 4 of 5 positions. The anticodon stem matches tRNAPri in 2 bp. The T stem–loop–stem matches the primordial tRNA sequence exactly. Interestingly, for 34-GCAG, the anticipated modified 37G is not present. The P. furiosus tRNACys (GCA) is very similar to the human tRNACys (GCA), possibly indicating a monophyletic relationship between tRNACys in Archaea and Eukarya.
Figure 18.
tRNACys (GCA). (A) P. furiosus tRNACys (GCA). (B) H. sapiens tRNACys (GCA).
Cysteine may have first entered the genetic code by tRNA-linked chemistry. There are two mechanisms by which Ser-tRNACys might be converted to Cys-tRNACys. pSer-tRNACys can be converted to Cys-tRNACys by pSer-tRNACys→Cys-tRNACys cysteine synthase (pSer stands for O-phosphoserine) [82]. Serine can also be acetylated and then converted to cysteine with H2S. It is hypothesized that serine jumping to column 4 of the genetic code from column 2 may have resulted from such a tRNA-linked mechanism. Cysteine ended up in column 4, row 1. Most row 1 amino acids (i.e., Phe, Tyr, Trp and Cys) appear to be among the last encoded. Cysteine, however, was important for Zn binding and protein folding (i.e., for AARS enzymes), indicating that cysteine must have entered the code earlier, before landing in its row 1 location [15,16,17]. Serine may have occupied a larger sector of column 2 (i.e., rows 2 and 3). Serine or serine converted to cysteine may have jumped to column 4 (i.e., from column 2, row 3A (GGU) to column 4, row 3A (GCU)). Serine converted to cysteine could have shifted to column 4, row 1 (GCU→GCA), and CysRS-IA could have evolved from a primitive ArgRS-IA. In this manner, cysteine could have entered the code early with tRNA-linked synthesis but found its eventual position late. The GCU within a disrupted arginine sector would then have reverted to a serine anticodon. In column 2 of the code, Thr and Pro displaced Ser to its location in column 2, row 1. SerRS-IIA recognizes a type II tRNASer, without anticodon recognition. A simple change in the tRNASer anticodon might, therefore, be sufficient to achieve the jump from column 2 to column 4, but the change in the anticodon would not affect SerRS-IIA recognition. Serine split what was probably an enlarged arginine sector by jumping into column 4. The jumping of serine from column 2 to column 4 was one of the few chaotic events in generating the standard code.
4.10. ThrRS-IIA
By structure and sequence, ThrRS-IIA (Figure 19) is very similar to SerRS-IIA and GlyRS-IIA. As a class II AARS, ThrRS-IIA has its aminoacylating active site on a surface of antiparallel β-sheets [83]. 73-ACCA-76 penetrates the aminoacylating active site, where AMP binds. In P. furiosus, the discriminator base is 73U rather than 73A, as in E. coli. ThrRS-IIA has a separate editing active site that removes non-cognate β-hydroxynorvaline and valine from tRNAThr. The aminoacylating active site of ThrRS-IIA limits non-cognate attachment of serine to tRNAThr. The anticodon binding region of ThrRS-IIA binds 35-GU[m6t6A]-37 (in E. coli).
Figure 19.
The ThrRS-IIA-tRNAThr (CGU) from E. coli. A* is 37m6t6A.
tRNAThr (CGU) is shown in Figure 20. In P. furiosus, the acceptor stem matches the primordial tRNA sequence in 4 of 7 bp. The D loop has the sequence UAGCCUAGCCUGG__UG, which matches the primordial tRNA sequence in the first 13 positions exactly and in all but three positions, two of which are deletions. The 5′-As* sequence is GGGCG, which is typical. The anticodon stem matches tRNAPri in 2 bp. The V loop sequence AGGUC is typical. In P. furiosus, the T stem–loop–stem matches the primordial tRNA sequence exactly. In Archaea and Bacteria, 36U is generally associated with a modified 37A (i.e., t6A or hn6A), as is observed. The modification of 37A may aid in accurate tRNAThr charging. Also, the modification of 37A may help to support the reading of 36U anticodons. In P. furiosus, tRNAThr resembles tRNASer in sequence, except for the V loop region (tRNAThr is type I; tRNASer is type II) [21].
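The position-by-position D loop comparisons used throughout this section can be sketched as a small helper. The two sequences below are the P. furiosus tRNASer (UGA) and tRNAThr (CGU) D loops exactly as quoted in the text, with underscores marking deleted positions. The function name `compare_aligned` is hypothetical, and the sketch assumes the sequences are already aligned on the D1 to D17 numbering.

```python
def compare_aligned(seq_a: str, seq_b: str, gap: str = "_") -> dict:
    """Compare two pre-aligned RNA strings of equal length.

    Returns 1-based mismatch positions (ignoring gapped positions) and
    the number of gap characters in each sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = [
        i
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b and gap not in (a, b)
    ]
    return {
        "mismatches": mismatches,
        "gaps_a": seq_a.count(gap),
        "gaps_b": seq_b.count(gap),
    }

# D loop sequences (D1 to D17) as quoted in the text.
trna_ser_d_loop = "UAGCCUAGCCUGG__UA"
trna_thr_d_loop = "UAGCCUAGCCUGG__UG"

print(compare_aligned(trna_ser_d_loop, trna_thr_d_loop))
```

For these two D loops, the only difference is at D17, and both carry the same two deleted positions. This agrees with the observation that tRNAThr resembles tRNASer outside the V loop region.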
Figure 20.
tRNAThr (CGU). (A) P. furiosus tRNAThr (CGU). (B) E. coli tRNAThr (CGU).
4.11. ProRS-IIA
ProRS-IIA-tRNAPro (CGG) is shown in Figure 21 [84]. ProRS-IIA is closely related to GlyRS-IIA, SerRS-IIA and ThrRS-IIA in sequence and structure. The aminoacylating active site is on the surface of antiparallel β-sheets. The reaction intermediate analogue P5A is located at the aminoacylating active site. In the structure, 70-(CCGACCA)-76 is disordered. The anticodon loop 34-CGGG-37 is substantially unwound, indicating allosteric effects, which may also be indicated by disorder of the tRNA 3′-end. As expected, anticodon loop 35-GGG-37 makes the strongest ProRS-IIA binding contacts.
Figure 21.
ProRS-IIA-tRNAPro (CGG) from T. thermophilus. P5A is a reaction intermediate analogue that binds in the aminoacylating active site.
T. thermophilus and P. furiosus ProRS-IIA lack a separate editing active site that is, however, present in more derived Bacteria, such as E. coli. The aminoacylating active site of ProRS-IIA limits non-cognate alanine attachment to tRNAPro.
The P. furiosus tRNAPro (CGG) matches the acceptor stem of the primordial tRNA in 5 of 7 positions (Figure 22). tRNAPro (CGG) has the D loop sequence D1-UAGGGUAGCUUGGCCCA-D17, which matches the primordial D loop in all but four positions and has no deleted bases relative to tRNAPri. The anticodon stem matches the primordial sequence in 3 of 5 bp. The V loop sequence C_GAC matches the primordial sequence CCGCC in three positions. The T stem–loop–stem sequence matches the primordial tRNA sequence exactly. Proline is in a 4-codon box and so utilizes CGG, GGG and UGG anticodons. Modifications are as expected, except that H. volcanii 34U is unmodified. Because proline occupies a 4-codon box, superwobbling need not necessarily be suppressed. By contrast, P. furiosus has the 34cnm5U modification to suppress superwobbling.
Figure 22.
tRNAPro (CGG). (A) P. furiosus tRNAPro (CGG). (B) T. thermophilus tRNAPro (CGG). Sty indicates Salmonella typhimurium.
4.12. AspRS-IIB
A primitive AspRS may be the founding AARS in column 3 of the code (tRNA-35U). The AspRS-IIB-tRNAAsp (GUC) of S. cerevisiae is shown in Figure 23 [85]. Column 3 is the most innovated column, dividing into 2-codon sectors. For tRNAAsp, only the GUC anticodon is utilized. The anticodon loop is substantially unwound, exposing 33-UGUCG-38 to make AspRS-IIB contacts. Anticodon loop unwinding indicates allosteric effects communicated to the AspRS-IIB aminoacylating active site through tRNAAsp (GUC). 73-GCCA-76 enters the aminoacylating active site, where ATP binds. As expected, a surface of antiparallel β-sheets is present at the aminoacylating active site.
Figure 23.
AspRS-IIB-tRNAAsp (GUC) from S. cerevisiae.
In Figure 24, part of a T. thermophilus transamidosome is shown [86]. This image provides a partial approximation of the mechanism by which asparagine and glutamine may have first entered the genetic code [87]. The α-subunit of the amidotransferase that modifies Asp-tRNAAsn to Asn-tRNAAsn (GUU) is homologous to an archaeal amidotransferase. Both asparagine and glutamine initially entered the code by tRNA-linked amidotransferase reactions. The tRNAAsn (GUU) anticodon loop is substantially unwound. 33-UGUUA-37 interacts with the AspRS-IIB anticodon-binding domain.
Figure 24.
tRNA-linked chemistry. A detail of the T. thermophilus transamidosome is shown.
Asp and Glu are closely related negatively charged amino acids that are located at column 3, row 4. Asp has a shorter side chain than Glu and so generally forms better ion pair allosteric switches, particularly with Arg, which is stiffer than Lys. In P. furiosus, tRNAAsp, tRNAGlu and tRNAGln are all closely related tRNAs by sequence [21]. In P. furiosus, tRNAAsn is most similar to tRNATyr. The deviation of tRNAAsn from tRNAAsp supports the discrimination of chemically similar amino acids in coding.
In Figure 25, tRNAAsp (GUC) and tRNAAsn (GUU) are compared. It is hypothesized that tRNAAsn (GUU) evolved from tRNAAsp (GUC) and that AsnRS-IIB evolved from AspRS-IIB by duplication and divergence. The acceptor stem of the P. furiosus tRNAAsp (GUC) matches the primordial sequence in 4 of 7 bp. The D loop has the sequence D1-UGGUGUAGCCCGGCCUA-D17, which differs in four positions from the primordial tRNA but includes no deletions relative to tRNAPri. The D loop sequence D6-UAGCCCGGCCUA-D17 has only a single mismatch compared with tRNAPri. The anticodon stem matches tRNAPri in 3 bp. The T stem–loop–stem exactly matches the primordial tRNA sequence. The tRNAAsp (GUC) anticodon loop has a 32C-38C arrangement, which should alter the dynamics of the loop relative to 32C-38A, which is most common and primordial. The P. furiosus tRNAAsn (GUU) matches the acceptor stem of the primordial tRNA in 5 of 7 bp. The D loop has the sequence D1-UAGCUUAG_CUGG__UG-D17, with three bases deleted from the primordial tRNA D loop but matching in sequence in all but five positions. The 5′-As* sequence GAGCG matches the primordial sequence GGGCG in all but one base. The anticodon stem matches the primordial tRNA in 4 of 5 bp. The V loop sequence CGGUC matches tRNAPri CCGCC in 3 of 5 positions. The T stem–loop–stem matches in all but 1 stem bp and one loop base.
Figure 25.
tRNAAsp (GUC) and tRNAAsn (GUU). (A) P. furiosus tRNAAsp (GUC). (B) S. cerevisiae tRNAAsp (GUC). (C) P. furiosus tRNAAsn (GUU).
4.13. HisRS-IIA
Another column-3 amino acid is histidine. HisRS-IIA-tRNAHis (GUG) from T. thermophilus is shown in Figure 26 [88]. HisRS-IIA functions as an α2-dimer. As a class II AARS, HisRS-IIA has the aminoacylating active site on the surface of antiparallel β-sheets. AMP and histidine bind in the aminoacylating active site, and 73-CCCA-76 enters the aminoacylating active site. On the ribosome, 74-CC-75 must pair with a GG sequence in the peptidyl site (P-site) of the peptidyl transferase center to orient the peptide–tRNA. Having the sequence 73-CCCA-76 in a tRNA, therefore, might cause problems with orienting the growing peptide chain during translation. To block 73C pairing with the ribosome G, tRNAHis (GUG) is modified by the addition of GTP at the -1 position. The enzyme that catalyzes this reaction is tRNAHis (-1) GTP transferase. This enzyme appears to be a first protein, as old as the genetic code. Also, the 73C=(-1)GTP base pair is a unique discriminator for the cognate tRNAHis (GUG) charging with histidine. As is also observed for tRNAAsp (GUC) and tRNAAsn (GUU), the tRNAHis (GUG) anticodon loop is unwound, exposing 34-GUGG-37 to bind the HisRS-IIA anticodon-binding domain. It is hypothesized that AspRS-IIB was originally AspRS-IIA but diverged to suppress tRNA charging errors. In this case, HisRS-IIA would have been derived from an AspRS-IIA prior to divergence.
Figure 26.
HisRS-IIA-tRNAHis (GUG) from T. thermophilus.
In P. furiosus, the tRNAHis (GUG) acceptor stem differs in only 2 bp from the primordial tRNA sequence (Figure 27). The D loop of tRNAHis (GUG) has the sequence D1-UGGUGUAGCCUGG_UUA-D17, differing in five positions from the primordial tRNA sequence, with a single base deletion relative to tRNAPri. In P. furiosus, the anticodon stem matches tRNAPri in 2 bp. In T. thermophilus, the anticodon stem matches tRNAPri in 4 bp. In P. furiosus, the T stem–loop–stem exactly matches the primordial tRNA sequence.
Figure 27.
tRNAHis (GUG). (A) P. furiosus tRNAHis (GUG). (B) T. thermophilus tRNAHis (GUG). The yellow arrows indicate the unique (-1)GTP=73C discriminators that also may suppress misalignment of the P-site peptide–tRNA on the ribosome.
4.14. GluRS-IB
Column 3 of the genetic code is the most innovated column and encodes the most amino acids. It appears that column 3 may have been sectored into 2-codon boxes initially by splitting Asp and Glu into a striped pattern of Asp in A rows (rows 2A, 3A and 4A) and Glu in B rows (rows 2B, 3B and 4B). A and B rows are distinguished by the wobble base tRNA-34. tRNA-34G is the anticodon base of the A row. At the base of code evolution, wobble tRNA-34A is rarely or never used. tRNA-34C or 34U is the anticodon base of the B row. Note that the related amino acids Asp, Asn and His, charged to their cognate tRNAs by the related enzymes AspRS-IIB, AsnRS-IIB and HisRS-IIA, are located at rows 4A, 3A and 2A. It is likely that AspRS was initially AspRS-IIA, which evolved to AspRS-IIB to suppress translation errors. Glu, Lys and Gln are located at rows 4B, 3B and 2B. GluRS-IB and LysRS-IB (in Archaea) are closely related enzymes. GlnRS-IB was derived from GluRS-IB in Eukarya (~2.5 billion years ago) and then transferred to many prokaryotic species by horizontal gene transfers. At LUCA, GluRS-IB added glutamate to tRNAGln. Glu-tRNAGln was converted to Gln-tRNAGln by an amidotransferase. This is a similar tRNA-linked chemistry mechanism to that by which asparagine first entered the code [87,89,90,91].
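The column and row bookkeeping can be sketched as follows. The sketch assumes that columns follow the codon second base and rows follow the codon first base, both in the order U, C, A, G. This ordering is inferred from the placements named in the text (for example, Asp at column 3, row 4A and Cys at column 4, row 1) and is not stated as a formal rule. The function name `code_position` is hypothetical.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumption: columns and rows are numbered by codon base in the order U, C, A, G.
ORDER = {"U": 1, "C": 2, "A": 3, "G": 4}

def code_position(anticodon: str) -> tuple[int, str]:
    """Map an anticodon (34-35-36) to (column, row) in the code table."""
    b34, b35, b36 = anticodon
    column = ORDER[COMPLEMENT[b35]]  # column set by tRNA-35 (codon position 2)
    row = ORDER[COMPLEMENT[b36]]     # row set by tRNA-36 (codon position 1)
    if b34 == "G":
        sub_row = "A"                # 34G is the anticodon base of the A row
    elif b34 in ("C", "U"):
        sub_row = "B"                # 34C or 34U is the anticodon base of the B row
    else:
        raise ValueError(f"wobble base {b34!r} is not covered by this sketch")
    return column, f"{row}{sub_row}"

# Anticodons named in the text for column 3 amino acids.
for name, anticodon in [("Asp", "GUC"), ("Asn", "GUU"), ("His", "GUG"),
                        ("Glu", "CUC"), ("Gln", "CUG")]:
    print(name, anticodon, code_position(anticodon))
```

Under this assumed numbering, the sketch reproduces the striped pattern: Asp, Asn and His fall in rows 4A, 3A and 2A, and Glu and Gln fall in rows 4B and 2B, all in column 3.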
GluRS-IB (Figure 28) [92] may be derived from a primitive ArgRS-IA by duplication and repurposing. In contrast to AspRS-IIB, which is a class II AARS and an α2-dimer, GluRS-IB is a typical class I AARS that functions as an α1-monomer. The GluRS-IB aminoacylating active site is at the C-terminal ends of a set of parallel β-sheets. 73-ACCA-76 penetrates to the aminoacylating active site in the catalytic hairpin conformation for class I AARSs. The non-reactive synthetic reaction intermediate analogue GOM binds here. In contrast to AspRS-IIB and AsnRS-IIB, the anticodon loop of tRNAGlu is not substantially unwound by GluRS-IB. This difference and the difference of discriminator bases (73G (i.e., Asp) versus 73A (i.e., Glu)) may contribute to Asp versus Glu discrimination in cognate tRNA charging. 34-CUC-36 binds the anticodon-binding domain. Glutamate is a negatively charged amino acid with significant chemical character. No editing reactions are identified for GluRS-IB, consistent with the idea that glutamate is more readily discriminated by GluRS-IB than column 1 (Val, Leu, Ile, Met and Phe) and column 2 (Ala, Thr, Pro and Ser) amino acids, which require cognate AARS enzymes that edit. Amino acids encoded in columns 3 and 4 have greater chemical character and less need for editing for error correction.
Figure 28.
GluRS-IB-tRNAGlu (CUC) from T. thermophilus.
tRNAGlu and tRNAGln are compared in Figure 29. In P. furiosus, tRNAGlu (CUC) (Figure 29A) is very close to the primordial tRNA sequence. The acceptor stem of tRNAGlu (CUC) varies by only 2 bp from tRNAPri. The D loop has the sequence D1-UGGUGUAGCCCGGUCAA-D17, differing from the primordial sequence in six positions but including no deletions relative to tRNAPri. By contrast, tRNAGln (CUG) has two deletions from the primordial sequence in the D loop (Figure 29B). For tRNAGlu (CUC), the anticodon stem matches tRNAPri in 3 of 5 bp. For tRNAGlu (CUC) and tRNAGln (CUG), the V loop has the sequence C_GAC, which matches the primordial sequence of CCGCC in three positions. Also, the V loop sequence C_GAC is found in tRNAAsp (GUC) (Figure 25A), indicating that tRNAGlu (CUC) may be derived from tRNAAsp, as might be expected. The T stem–loop–stem of tRNAGlu (CUC) is a perfect match to the primordial sequence. For tRNAGln (CUG), the T stem–loop–stem sequence is slightly altered relative to tRNAPri. We note that tRNAGln (CUG) has an unusual 1A=72U pair that is expected to separate more easily than 1G=72C in tRNAGlu (CUC) and many other P. furiosus tRNAs. Melting the 1A=72U pair in tRNAGln (CUG) should contribute to discriminator function (i.e., in pre-life and until eukaryogenesis, for the Glu to Gln amidotransferase). In T. thermophilus, the tRNAGlu (CUC) is similar to P. furiosus but more derived from the root sequence, as expected. As mentioned above, tRNAAsp, tRNAGlu and tRNAGln are closely related sequences in P. furiosus.
Figure 29.
tRNAGlu and tRNAGln. (A) P. furiosus tRNAGlu (CUC). (B) P. furiosus tRNAGln (CUG). (C) T. thermophilus tRNAGlu (CUC). Sgr for S. griseus. xU indicates an unknown 5-carbon U34 modification to suppress superwobbling.
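The position-by-position comparisons quoted for these tRNAs (for example, the V loop C_GAC matching the primordial CCGCC in three positions) can be reproduced with a simple aligned match count. The following is a minimal sketch only; the function name `count_matches` and the convention that `_` marks a deleted base that never counts as a match are assumptions made for illustration, not part of the original analysis.

```python
def count_matches(seq: str, reference: str, gap: str = "_") -> int:
    """Count aligned positions where seq and reference carry the same base.

    Both strings must already be aligned to the same length. A gap
    character marks a deletion and is never counted as a match.
    """
    if len(seq) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq, reference) if a == b and a != gap)


PRIMORDIAL_V_LOOP = "CCGCC"  # primordial V loop sequence quoted in the text

# V loop shared by P. furiosus tRNAGlu (CUC) and tRNAGln (CUG)
print(count_matches("C_GAC", PRIMORDIAL_V_LOOP))  # prints 3
```

The same helper applies to any of the aligned segments quoted in this section, provided deletions are written explicitly with `_` so that positions stay in register.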
4.15. LysRS-IB
Currently, no suitable demonstration structure of archaeal LysRS-IB-tRNALys is available. Because of homology, we assume the structure would be similar to the image of GluRS-IB-tRNAGlu (CUC) (Figure 28). LysRS-IB in Archaea appears to be the oldest LysRS. LysRS-IIB in Bacteria appears to be derived from AspRS-IIB, as a bacterial innovation. In Archaea, GluRS-IB, LysRS-IB (archaeal type) and GlnRS-IB (from Eukarya) are closely related AARS enzymes.
In Figure 30, a P. furiosus tRNALys (CUU) is shown. The acceptor stem sequence varies in only 2 bp from the primordial tRNA sequence. The D loop has the sequence D1-UAGCUUAGCCUGG_UUA-D17, differing in three positions from the primordial sequence, including a single base deletion relative to tRNAPri. The 5′-As* sequence GAGCG differs in only one position from the typical primordial sequence GGGCG. The anticodon stem matches tRNAPri in 3 of 5 bp. The type I V loop sequence AGGUC is typical. The T stem–loop–stem matches the primordial sequence in all but 2 stem bp. The modifications of the anticodon loop are as expected. In P. furiosus, tRNALys is most similar to tRNAPhe and somewhat similar to tRNAArg. Lysine and arginine are positively charged amino acids and may both be derived from ornithine by pre-life metabolism [77].
Figure 30.
P. furiosus tRNALys (CUU). xU is an unidentified 5-carbon U34 modification to suppress superwobbling.
4.16. AlaRS-IID
Alanine is proposed to be the founding amino acid for column 2 of the genetic code. It is hypothesized that AlaRS-IID may have replaced a now extinct AlaRS-IIA before LUCA, so there may be no sequence record of an earlier AlaRS-IIA. Homology between IID and IIA AARSs is difficult to discern, indicating that these are very different enzymes. The reason for the replacement may have been to improve discrimination among alanine, serine, threonine and proline. Column 2 of the genetic code includes SerRS-IIA, ThrRS-IIA and ProRS-IIA, indicating evolution in code columns. Column 2 of the code is divided entirely into 4-codon boxes.
AlaRS-IID-tRNAAla (UGC) of Archaeon Archaeoglobus fulgidus is shown in Figure 31 [93]. AlaRS-IID functions as an α2-dimer. This image is of only half of the protein. Interestingly, although alanine is located at a 4-codon sector, AlaRS-IID makes no contact with the tRNAAla anticodon loop. AlaRS-IID makes extensive elbow contacts, which may indicate tRNAAla distortion and allosteric effects of tRNAAla binding. 73-ACCA-76 penetrates the aminoacylating active site, which is also identified by the surface of antiparallel β-sheets and A5A reaction intermediate analogue binding. AlaRS-IID includes a separate editing active site that removes non-cognate azetidine-2-carboxylic acid, cysteine and α-aminobutyrate from tRNAAla. The aminoacylating active site of AlaRS-IID limits non-cognate glycine and serine attachment to tRNAAla. In Archaea, a separate AlaX editing enzyme is also present that can remove non-cognate amino acids from tRNAAla [94]. In the image shown, AlaX is light pink and was overlaid with the AlaRS-IID structure to locate the editing active site domain. AlaX may partly compensate for the lack of anticodon recognition by AlaRS-IID. Ala is a small hydrophobic amino acid with little chemical character, which may explain why AlaRS-IID has editing functions, including the trans AlaX editing function.
Figure 31.
AlaRS-IID-tRNAAla (UGC) of A. fulgidus. The AlaX protein of P. horikoshii (pink) is also shown and overlaid on the A. fulgidus structure to locate the editing active site. No contact is made by AlaRS-IID with the tRNAAla anticodon loop.
The P. furiosus tRNAAla (UGC) is shown in Figure 32. The acceptor stem of tRNAAla (UGC) varies in only 2 bp from the primordial tRNA sequence. The D loop has the sequence D1-UAGCUCAGCCUGG_UAU-D17, matching the primordial sequence in all but six positions, with a single base deleted relative to tRNAPri. The 5′-As* sequence is GAGCG versus the typical GGGCG. The anticodon stem matches tRNAPri in 3 of 5 bp. The V loop has the sequence AGGCC versus CCGCC for the primordial tRNA and AGGUC for the typical tRNA. The T stem–loop–stem matches the primordial tRNA. The P. furiosus tRNAAla has the appearance of an ancient tRNA, consistent with GADV being the first four amino acids in the code. As noted above, the P. furiosus tRNAAla is similar in sequence to tRNAVal, consistent with alanine and valine being early additions to the code.
Figure 32.
P. furiosus tRNAAla (UGC).
4.17. PheRS-IIC
It is hypothesized that aromatic amino acids (Phe, Tyr and Trp) entered the genetic code as some of the last amino acids added, in the disfavored row 1 (tRNA-36A) [95]. It is suggested that row 1 (tRNA-36A) was disfavored because, initially, both the tRNA-34 and tRNA-36 positions of the anticodon were wobble positions. During the evolution of the code, tRNA-34 remained a wobble position, but wobbling at tRNA-36 was suppressed, in part, by modification of tRNA-37. Notably, if tRNA-36U is present, generally, tRNA-37A is modified (e.g., t6A or hn6A). If tRNA-36A is present, generally, tRNA-37G is modified (e.g., m1G). tRNA-37t6A may be more effective at suppressing tRNA-36U wobbling than tRNA-37m1G is at suppressing tRNA-36A wobbling. By contrast, wobbling at tRNA-34 could not be suppressed in evolution. For one thing, modification of tRNA-33U cannot alter tRNA-34 reading, because tRNA-33U is on the opposite side of the anticodon loop U-turn. At the base of code evolution, tRNA-33 is always U. Also, tRNA-35 is a Watson–Crick position that cannot be modified in any way that interferes with coding.
The following model is proposed for PheRS evolution. The initial PheRS may have been PheRS-IC derived distantly from a primitive ArgRS-IA or GluRS-IB. As TyrRS-IC and TrpRS-IC differentiated, there was insufficient discrimination between phenylalanine and tyrosine. PheRS-IC was then replaced by PheRS-IIC, before LUCA, leaving no sequence trace of PheRS-IC, except for TyrRS-IC and TrpRS-IC.
A detail of PheRS-IIC-tRNAPhe (GAA) from T. thermophilus is shown in Figure 33 [96,97]. PheRS-IIC in T. thermophilus functions as an α2β2-dimer, which is also the archaeal form. Only one αβ-unit is shown. To observe the relevant tRNA contacts, both tRNAPhe (GAA) were visualized. The aminoacylating active site is a surface of antiparallel β-sheets in the α-subunit. 73-ACCA-76 penetrates to the aminoacylating active sites. The separate editing active site is within the β-subunit. PheRS-IIC removes non-cognate tyrosine, meta- and para-substituted phenylalanine derivatives, leucine and isoleucine from tRNAPhe (GAA). An extrusion of the α-subunit makes elbow contact. An extrusion of the β-subunit makes anticodon contacts.
Figure 33.
PheRS-IIC-tRNAPhe (GAA) from T. thermophilus. Both tRNAPhe (GAA) are shown to indicate all relevant PheRS-IIC-tRNAPhe (GAA) contacts.
The P. furiosus tRNAPhe (GAA) is shown in Figure 34. The acceptor stem matches the primordial sequence in all but a single base pair. The D loop has the sequence D1-UAGCUCAGCCUGG__GA-D17, matching the primordial sequence in all but five positions, with two bases deleted relative to tRNAPri. The 5′-As* sequence GAGCA matches the primordial sequence GGGCG in three positions. The anticodon stem matches the primordial sequence in 4 of 5 positions. The V loop sequence GUGCC matches primordial CCGCC in three positions. The T stem–loop–stem is very similar to the primordial sequence. Interestingly, tRNAPhe (GAA) appears to be a relatively early tRNA, although phenylalanine appears to be a later entry into the code. In P. furiosus, tRNAPhe (GAA) is closely related in sequence to tRNALys (UUU) and (CUU).
Figure 34.
P. furiosus tRNAPhe (GAA).
4.18. TyrRS-IC
It is hypothesized that aromatic amino acids were a late addition to the genetic code along the disfavored row 1 (tRNA-36A). In evolution, TyrRS-IC and TrpRS-IC may be derived from a primitive ArgRS-IA or GluRS-IB. In contrast to most class I AARSs, which are α1-monomers, TyrRS-IC and TrpRS-IC are obligate α2-dimers, with the anticodon-binding and the aminoacylating active site for a single cognate tRNA in separate α-subunits. The TyrRS-IC-tRNATyr (GUA) from Archaeon Methanocaldococcus jannaschii is shown in Figure 35 [98]. Aminoacylating active sites are at the C-terminal ends of a set of parallel β-sheets. Tyrosine is bound at the aminoacylating active sites. 73-A(CCA)-76 is partly disordered in the structure. The anticodon 34-GUA-36 contacts the anticodon interaction domain. One tRNATyr (GUA) is white and mostly obscured in this image.
Figure 35.
TyrRS-IC-tRNATyr (GUA) of M. jannaschii.
In Figure 36A, the P. furiosus tRNATyr (GUA) is shown. The acceptor stem matches the primordial tRNA in all but two base pairs. The D loop sequence is D1-UAGCCUAGCCUGG_UAG-D17, matching the primordial sequence in all but four positions, with a single base deleted relative to tRNAPri. Consistent with the three 31 nt minihelix tRNA evolution theorem, the D loop sequence begins with two perfect UAGCC repeats. The 5′-As* sequence is UGGCG, matching the typical primordial sequence GGGCG in all but a single base. The anticodon stem matches tRNAPri in 3 of 5 bp. The type I V loop is the typical AGGUC. The T stem–loop–stem matches the primordial sequence in all but a single stem base pair. In Archaea, tRNATyr (GUA) is a type I tRNA. In Bacteria, by contrast, tRNATyr (GUA) is a type II tRNA [11] (Figure 36B). The difference appears to be a bacterial innovation.
Figure 36.
tRNATyr (GUA). (A) Archaeal P. furiosus tRNATyr (GUA) (type I). (B) Bacterial T. thermophilus tRNATyr (GUA) (type II). Lla for Lactobacillus lactis. In Bacteria, type II tRNATyr has two unpaired bases separating the 3′-V stem from the Levitt base.
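The observation that the P. furiosus tRNATyr (GUA) D loop begins with two perfect UAGCC repeats can be checked directly against the D1 to D17 sequence quoted above. This is a small sketch under stated assumptions; `leading_repeats` is a hypothetical helper name, and `_` is kept in the string only to mark the deleted base.

```python
def leading_repeats(seq: str, unit: str) -> int:
    """Return how many consecutive copies of unit occur at the start of seq."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count = 0
    # str.startswith accepts a start offset, so no slicing is needed.
    while seq.startswith(unit, count * len(unit)):
        count += 1
    return count


D_LOOP_TYR = "UAGCCUAGCCUGG_UAG"  # P. furiosus tRNATyr (GUA), D1 to D17

print(leading_repeats(D_LOOP_TYR, "UAGCC"))  # prints 2
```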
4.19. TrpRS-IC
TrpRS-IC [99] is a very similar enzyme to TyrRS-IC. In the H. sapiens TrpRS-IC-tRNATrp (CCA) structure (Figure 37), 73-ACCA-76 enters the aminoacylating active site, where tryptophan binds. A set of parallel β-sheets approaches the aminoacylating active site. There are substantial allosteric effects on tRNATrp (CCA) from TrpRS-IC binding. Elbow contacts between the D loop (D12-GG-D13) and the T loop (54-UU/CAA-58) are broken. The Levitt bp is also disrupted. Deformability of tRNATrp (CCA) may contribute to cognate tryptophan charging. Tryptophan is in a 1-codon box in the code, which is generally not allowed. Tryptophan can occupy a 1-codon box because Trp shares a 2-codon box with a stop codon (UGA), which is not read by a tRNA but is instead recognized on the ribosome by a protein release factor that binds the UGA stop codon in the mRNA. Methionine is also in a 1-codon box, which it shares with isoleucine (anticodon CAU). In this case, different wobble 34C modifications explain how translational accuracy is maintained, and the UAU Ile anticodon is generally not utilized.
Figure 37.
TrpRS-IC-tRNATrp (CCA) from H. sapiens.
The P. furiosus tRNATrp (CCA) is shown in Figure 38. The acceptor stem matches the primordial tRNA sequence at 4 bp. The tRNA has a D loop with the sequence D1-UGGUGUAGCCUGGUCCA-D17, matching the primordial sequence in all but five positions and including no deletions from tRNAPri. The anticodon stem matches tRNAPri in 4 of 5 bp. The T stem–loop–stem matches the primordial tRNA sequence in all but one stem base pair. In P. furiosus, tRNATrp (CCA) is similar to tRNAPro (GGG, CGG and UGG).
Figure 38.
P. furiosus tRNATrp (CCA).
5. The Genetic Code
A model for the evolution of the first genetic code is shown in Figure 39 [15,16,17,18,64]. Much of the data supporting this model are summarized in Figure 40 and Figure 41. The code is represented as a codon–anticodon table with a complexity of 32 assignments, rather than 64 assignments. Because the wobble position (tRNA-34) has only purine versus pyrimidine resolution, 32 assignments (2 × 4 × 4) is the maximum complexity of the genetic code in tRNA. The code is highly ordered. Most of the evolution is in code columns. The history of the evolution of AARS enzymes (summarized below) relates a fairly straightforward story of evolution of the code. The genetic code is simpler and more ancient in Archaea. The code is more innovated in Bacteria and Eukarya [61].
Figure 39.
A model for the evolution of the first code. A codon–anticodon table is shown with a maximum complexity of 32 assignments, as in tRNA. Codons are shown in sectors marked 1st, 2nd and 3rd. Anticodons (Ac) are indicated (i.e., 34-[A/G]AA-36). Anticodons that are not utilized are shown in red letters. No tRNA matches stop codons (UAA, UAG, and UGA). Blue 34U indicates a modification to limit superwobbling, such as 34cnm5U. As indicated above, some exceptions have been noted, but wobble 5C-U modifications to suppress superwobbling may have been universal at the inception of the first code. 37m1G is associated with 36A. 37t6A is associated with 36U. Column 1, row 3B, 34C modifications (orange) discriminate Ile and Met. The genetic code evolved primarily in columns, as indicated in the model. In column 1, ValRS-IA, LeuRS-IA, IleRS-IA and MetRS-IA are closely related enzymes (yellow type). In column 2, SerRS-IIA, ProRS-IIA and ThrRS-IIA are closely related enzymes (red type). In column 3, AspRS-IIB, AsnRS-IIB and HisRS-IIA are closely related (green type), and GluRS-IB, LysRS-IB (in Archaea) and GlnRS-IB (a eukaryotic innovation) are closely related (orange type). In column 4, ArgRS-IA and CysRS-IA are closely related. In row 1, TyrRS-IC and TrpRS-IC are closely related. In the anticodon loop image, modified bases are indicated in ball-and-stick representation. A similar figure was previously published and is republished here with permission [15,16].
Figure 40.
Evolution of AARS enzymes. Phyre 2 homology scoring, mostly to P. furiosus AARS sequences, was used to draw the class II and class I AARS maps. GlyRS-IIA is homologous to ValRS-IA and IleRS-IA by sequence, as indicated by the red arrow. AARSs with separate editing active sites are shaded gray. AARSs that have editing reactions only in their aminoacylating active sites are shaded pale yellow. Bacterial innovations are indicated (B). Archaeal-type AARSs are indicated (A). GlnRS-IB was a eukaryotic innovation (E). GlyRS-IIA appears to be the root of all class II and class I AARSs. A primitive ValRS-IA appears to be the root of all class I AARSs. PheRS-IIC and AlaRS-IID are in bold because these enzymes may have replaced PheRS-IC and AlaRS-IIA before LUCA. Sep stands for O-phosphoserine. Pyl stands for pyrrolysine.
Figure 41.
Relationship of AARS enzymes and the genetic code. Column 1 amino acids and AARSs are on an orange background. Column 2 amino acids and AARSs are on a blue background. Column 3 amino acids and AARSs are on a green background. Column 4 amino acids and AARSs are on a red background. Row 1 amino acids and AARSs are on a yellow background. Other indications are as in Figure 40.
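The 32-assignment complexity of the codon–anticodon table (2 × 4 × 4) can be made concrete by enumerating anticodon classes. The sketch below is illustrative only; the names `WOBBLE_CLASSES` and `assignments` are assumptions introduced here, and the enumeration simply encodes the statement that tRNA-34 resolves purine versus pyrimidine while tRNA-35 and tRNA-36 each resolve four bases.

```python
from itertools import product

BASES = "ACGU"
# Assumption for illustration: tRNA-34 is read only as purine versus pyrimidine.
WOBBLE_CLASSES = ("purine", "pyrimidine")

# Each assignment: (wobble class at tRNA-34, base at tRNA-35, base at tRNA-36)
assignments = list(product(WOBBLE_CLASSES, BASES, BASES))

print(len(assignments))                     # 2 x 4 x 4 = 32
print(len(list(product(BASES, repeat=3))))  # all 64 triplets, for comparison
```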
The model for the evolution of the genetic code relies on the solution of tRNA evolution [15,30,31]. For tRNA, the orderly mechanism of tRNA assembly and tRNA root sequences is known (Figure 1 and Figure 2). tRNA evolved according to the three 31 nt minihelix tRNA evolution theorem. Based on sequences, this is much more of a theorem (a proven theory) than a conjecture or hypothesis or model. The original tRNA sequence was 100% RNA repeats (GCG, CGC and UAGCC) and inverted repeats (~CCGGG_CU/GCCAA_CCCGG; _ separates stems and loops; / indicates a U-turn; the only sequence ambiguity is in the anticodon GCC, which has since been scrambled in coding). Because the initial tRNA was so highly ordered, tRNA evolution was solved by inspection as a simple puzzle. ACCA-Gly was ligated at the tRNA 3′-end to synthesize polyglycine. There is no “chicken and egg” problem in the evolution of the genetic code, because the code initially evolved to synthesize polyglycine and subsequently advanced to encode GADV polymers. The code did not need foresight of its evolving role in encoding RNA sequence-dependent proteins.
The evolution of the code makes the best sense when viewed by code columns. The model for filling the code might be the following: G (polyglycine) evolved to GADV, which evolved to GADVLSER, which evolved to GADVLSERNCQ, which evolved to GADVLSERNCQPTIMHK, which evolved to GADVLSERNCQPTIMHKFYW [15,16,100]. At about the 11 amino acid stage, GADVLSERNCQ might be expected to support synthesis of the first sequence-dependent proteins. Dividing the evolutionary history into code columns makes the best sense (Figure 39). NCQ, and possibly other amino acids, were added through tRNA-linked chemistry, giving insights into a major mechanism for RNA-linked pre-life metabolism.
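The proposed filling order can be written out as a sequence of growing amino acid alphabets, which makes the additions at each stage explicit. This is a minimal sketch; the stage strings are taken from the model above, while the helper name `additions` is hypothetical.

```python
STAGES = [
    "G",
    "GADV",
    "GADVLSER",
    "GADVLSERNCQ",
    "GADVLSERNCQPTIMHK",
    "GADVLSERNCQPTIMHKFYW",
]


def additions(stages):
    """Yield (stage, newly added one-letter codes) for each successive stage."""
    previous = ""
    for stage in stages:
        if not stage.startswith(previous):
            raise ValueError(f"{stage} does not extend {previous}")
        yield stage, stage[len(previous):]
        previous = stage


for stage, new in additions(STAGES):
    print(f"{len(stage):2d} amino acids: +{new}")
```

Run as written, the stage sizes are 1, 4, 8, 11, 17 and 20, with NCQ appearing as the step from 8 to 11 amino acids.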
5.1. Column 1
In column 1 (tRNA-35A), valine may be the founding amino acid. It appears that valine (tRNA-36C) goes to leucine, which goes to isoleucine, which goes to methionine. Phenylalanine is added last along the disfavored row 1 (tRNA-36A). In metabolism, valine can be converted to leucine in five steps. Thus, leucine may have been initially added to the code by tRNA-linked chemistry. Val-tRNAVal may have been converted to Leu-tRNALeu in several steps, either supported by ribozymes or by the first protein catalysts. We posit that tRNA-linked and RNA-linked chemistry were very ancient and were fundamental to the evolution of pre-life metabolism and the genetic code. Notably, the evolution of tRNA and the divergence of tRNAomes is a story of RNA–amino acid and RNA–protein evolution [43]. In the first code, tRNAVal is type I and tRNALeu is type II, and type I tRNA was processed from a primitive type II tRNA (Figure 1 and Figure 2). During early tRNAome assembly, tRNAs may have been mixed type I and type II, and the mixtures may have been sorted later by selection to construct the first code. In Archaea, only leucine and serine utilize type II tRNAs. In Bacteria, tyrosine, leucine and serine utilize type II tRNAs [11]. The number of amino acids supported by type II tRNAs was limited by the number of allowed trajectory set points of the type II V arm.
In Archaea, type II tRNAs encoding leucine and serine were selected to substitute longer V arms for anticodon loop recognition by their cognate AARSs, because 5 tRNALeu and 4 tRNASer were necessary [11]. Isoleucine was the next amino acid to enter column 1. Neither valine nor leucine can be converted to isoleucine. Threonine (tRNA-36U), which borders isoleucine (tRNA-36U) in column 2 of the code, however, can be converted to isoleucine. It may be that Thr-tRNAThr evolved to Ile-tRNAIle via tRNA-linked chemistry. It is hypothesized that isoleucine briefly occupied a 4-codon sector of the code that was then invaded by methionine. It appears that tRNAIle may have evolved to tRNAMet; in P. furiosus, tRNAIle and tRNAMet are similar. At the base of code evolution, the anticodon UAU was eliminated. Without modification, UAU would cause confusion between encoding isoleucine and methionine. Both initiator and elongator tRNAMet (CAU) evolved. The initiator tRNAMet (CAU) is unmodified at 34C. The elongator tRNAMet (CAU) utilizes the 34Cm modification. Phenylalanine was added to the code late (described below). In Archaea, 34agm2CAU (agm2C stands for agmatidine) encodes isoleucine. In Bacteria, 34k2CAU (k2C stands for lysidine) encodes isoleucine.
5.2. Column 2
Column 2 (tRNA-35G) of the code appears to have evolved from alanine to serine and then to proline and threonine. Serine appears to have jumped from column 2 to column 4 of the genetic code, perhaps, in part, to obtain a more favorable anticodon (tRNA-35C appears to be favored over tRNA-35G). Alanine can be converted to serine in several steps, so Ala-tRNAAla may have evolved to Ser-tRNASer. If this is the case, however, a type I tRNASer was probably later replaced by a type II tRNASer, because, from sequences, it appears that type II tRNASer was derived from type II tRNALeu. Serine is a special case, because serine is the only amino acid that appears to jump columns in the establishment of the genetic code. Also, serine can be converted to cysteine by tRNA-linked chemistry [82,101,102]. It is hypothesized that the conversion of serine to cysteine may relate to the jumping of serine from column 2 to column 4 of the code. To establish the standard code, cysteine landed in column 4, in the disfavored row 1. Cysteine, however, must have entered the code earlier, perhaps linked to serine within an expanded serine sector. In proteins, cysteine is necessary for Zn binding, which was required for folding of the first proteins. AARS enzymes are an example of first proteins, coevolved with the genetic code, whose folding depended on cysteine binding Zn. Cysteine may have occupied the disfavored row 1 (tRNA-36A) late in the evolution of the code. Serine may have displaced cysteine in row 3, column 4, where serine now resides. Serine appears to have jumped columns by invading and splitting an expanded arginine sector.
Because serine utilizes a type II tRNASer, and because SerRS-IIA lacks tRNASer anticodon recognition, these features may have facilitated serine or serine/cysteine jumping in the evolution of the code. A change of Ser/Cys-tRNASer/Cys (GGU)→Ser/Cys-tRNASer/Cys (GCU) could account for serine jumping columns. GGU becomes a threonine anticodon, indicating that the threonine 4-codon sector (column 2, row 3) displaced serine from an expanded serine sector. Threonine and serine are chemically related amino acids. Proline also appears to have displaced serine to form a 4-codon sector (column 2, row 2).
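The proposed serine jump can be illustrated by mapping an anticodon to its position in the codon–anticodon table. The sketch below is a hedged illustration: `locate` is a hypothetical helper, columns are keyed on tRNA-35 and rows on tRNA-36 as described in the text, and the entries for column 3 (tRNA-35U) and row 2 (tRNA-36G) are read off from anticodons quoted in this review (for example, tRNAAsp (GUC) in column 3 and tRNAGln (CUG) in row 2B) rather than stated explicitly. Purines at tRNA-34 are assigned to A rows and pyrimidines to B rows.

```python
# Column is set by tRNA-35; row is set by tRNA-36 (assumed mapping, see lead-in).
COLUMN_BY_35 = {"A": 1, "G": 2, "U": 3, "C": 4}
ROW_BY_36 = {"A": 1, "G": 2, "U": 3, "C": 4}


def locate(anticodon: str):
    """Map a 34-35-36 anticodon to (column, row) in the codon-anticodon table."""
    if len(anticodon) != 3 or any(base not in "ACGU" for base in anticodon):
        raise ValueError("anticodon must be three bases from A, C, G, U")
    b34, b35, b36 = anticodon
    half = "A" if b34 in "AG" else "B"  # purine at 34: A row; pyrimidine: B row
    return COLUMN_BY_35[b35], f"{ROW_BY_36[b36]}{half}"


# The proposed change GGU -> GCU alters only tRNA-35, so the column changes
# while the row is retained.
print(locate("GGU"))  # (2, '3A')
print(locate("GCU"))  # (4, '3A')
```

Under this mapping, a single G-to-C change at tRNA-35 moves the tRNA from column 2 to column 4 without changing its row, which is the sense in which serine is proposed to have jumped columns.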
It is hypothesized that the first AlaRS may have been an AlaRS-IIA, from which column 2 SerRS-IIA, ThrRS-IIA and ProRS-IIA were derived. As the genetic code was built up, however, we posit that AlaRS-IIA was replaced by AlaRS-IID. The proposed replacement is analogous to the replacement of archaeal-type GlyRS-IIA by GlyRS-IID in more derived Bacteria. If the AlaRS-IIA-to-AlaRS-IID replacement event was prior to LUCA, there may now be no sequence record of AlaRS-IIA. The AlaRS-IID innovation helped discriminate neutral amino acids, alanine, serine, threonine and proline. AlaRS-IID has editing functions, and AlaRS-IID has a separate editing domain. In ancient organisms, AlaRS-IID utilizes the AlaX editing protein to support accuracy (Figure 31). The AlaRS-IID aminoacylating active site also has editing functions. The AlaX protein may partially compensate for the lack of AlaRS-IID anticodon loop recognition. Alanine is in a 4-codon box, but AlaRS-IID does not utilize the tRNAAla anticodon loop as a determinant for accurate charging.
5.3. Column 3
Column 3 is the most innovated column in the code, encoding the most amino acids. Notably, column 3 is broken entirely into 2-codon sectors. It is hypothesized that column 3 may have been sectored by a slightly different mechanism compared to columns 1, 2 and 4. We suggest that early in code evolution both tRNA-34 and tRNA-36 were wobble positions, but only a single wobble position could be utilized at a time. According to this view, columns 1, 2 and 4 primarily utilized Watson–Crick 35 and wobble 36. Column 3 primarily utilized Watson–Crick 35 and wobble 34. tRNA-35 was always the easiest to read because this is the central base in the anticodon. In a wobble position, only purine–pyrimidine discrimination is achieved, so only two possible code assignments are obtained. In such a scenario, the complexity of the evolving code would be 2 × 4 (or 4 × 2) = 8 amino acids, regardless of which wobble position (tRNA-34 or tRNA-36) was used. Because of tRNA-linked chemistry adding NCQ, the limited code probably expanded to 11 amino acids (GADVLSER expanded to GADVLSERNCQ). We posit that the evolution of the genetic code stalled at 8 or 11 amino acids until wobbling at tRNA-36 could be suppressed. Wobbling at tRNA-36 was suppressed, in part, by modifications at tRNA-37. tRNA-37G modifications (e.g., m1G) were used to read tRNA-36A. tRNA-37A modifications (e.g., t6A) were used to read tRNA-36U. It appears that wobbling at tRNA-36U was more readily suppressed than wobbling at tRNA-36A. Notably, 37t6A to suppress 36U is a more dramatic modification than 37m1G to suppress 36A. In the evolution of the code, row 3 (tRNA-36U) of the genetic code appears to have been more favorable than row 1 (tRNA-36A).
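The counting argument in this paragraph can be written as a short calculation. This is a sketch under the stated assumptions of the model (one Watson–Crick position at full resolution, one wobble position with purine versus pyrimidine resolution); the names `early_code_capacity` and `MOD_37_BY_36` are hypothetical and introduced only for illustration.

```python
# tRNA-37 modification associated with each tRNA-36 base, as stated in the text.
MOD_37_BY_36 = {"U": "t6A", "A": "m1G"}


def early_code_capacity(wobble_classes: int = 2,
                        watson_crick_bases: int = 4,
                        trna_linked: int = 0) -> int:
    """Directly encoded assignments plus amino acids added by tRNA-linked chemistry."""
    return wobble_classes * watson_crick_bases + trna_linked


print(early_code_capacity())                        # 8  (GADVLSER)
print(early_code_capacity(trna_linked=len("NCQ")))  # 11 (GADVLSERNCQ)
```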
Column 3 appears to have first encoded aspartic acid. The chemically related glutamic acid may then have invaded the B rows. This resulted in a Glu-Asp-Glu-Asp-Glu-Asp (column 3, row 4B-4A-3B-3A-2B-2A) pattern (Figure 39). Asparagine displaced aspartic acid in column 3, row 3A. Histidine displaced aspartic acid in column 3, row 2A. Lysine displaced glutamate in column 3, row 3B. Glutamine displaced glutamate in column 3, row 2B. Stop codons and tyrosine were added late across the disfavored row 1.
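The striped starting pattern and the subsequent displacements in column 3 can be traced step by step. The sketch below is illustrative only; the variable names are assumptions, and the order in which the displacements are applied is arbitrary because the text does not specify their relative timing.

```python
ROWS = ["4B", "4A", "3B", "3A", "2B", "2A"]

# Initial striped pattern: Glu in B rows, Asp in A rows.
column3 = {row: ("Glu" if row.endswith("B") else "Asp") for row in ROWS}

# Displacements listed in the text: (row, displaced amino acid, incoming amino acid)
DISPLACEMENTS = [
    ("3A", "Asp", "Asn"),
    ("2A", "Asp", "His"),
    ("3B", "Glu", "Lys"),
    ("2B", "Glu", "Gln"),
]

for row, displaced, incoming in DISPLACEMENTS:
    if column3[row] != displaced:
        raise ValueError(f"row {row} does not hold {displaced}")
    column3[row] = incoming

# Row 1 was filled late: tyrosine in row 1A and stop codons in row 1B.
column3["1A"] = "Tyr"
column3["1B"] = "stop"

for row in ROWS + ["1B", "1A"]:
    print(row, column3[row])
```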
As soon as aspartic acid and glutamate entered the code, tRNA-linked chemistry generated asparagine and glutamine [87,90,91,103,104]. Serine can be converted to cysteine by tRNA-linked chemistry [82,101,102]. The 8 amino acid code (i.e., GADVLSER), therefore, rapidly evolved to an 11 amino acid code (GADVLSERNCQ) by tRNA-linked chemistry. The 11 amino acid code appears to be sufficient to generate the first RNA sequence-dependent proteins.
5.4. Column 4
Column 4 (tRNA-35C) appears to be the most favored code column. Glycine appears to occupy the most favored sector of the genetic code (tRNA-35C, tRNA-36C). It is hypothesized that glycine is the founding amino acid in the code. GADV, the four simplest and probably the four initial encoded amino acids, occupy the most favored row of the genetic code (tRNA-36C) [65,66,67,68,105,106]. These observations are consistent with glycine being the first encoded amino acid [19,20]. In P. furiosus, tRNAGly is the most similar to tRNAPri, indicating that glycine may have been the first encoded amino acid [21]. GlyRS-IIA appears to be the root for all class II and class I AARSs. Glycine is the smallest and the most flexible amino acid. It is very likely that glycine was the founding amino acid in the evolution of the genetic code.
In contrast with glycine, arginine, which is also in column 4, is a complex amino acid. It is hypothesized that ornithine may have been the founding positively charged amino acid [77]. Ornithine can be converted to arginine in two metabolic steps. In some Archaea and Bacteria, ornithine can be converted to lysine by the α-aminoadipate pathway [107,108,109,110,111,112]. Thus, arginine and lysine may have entered the genetic code through tRNA-linked reactions: Orn-tRNAOrn evolved to Arg-tRNAArg (column 4), and Orn-tRNAOrn (UCU, CCU) (column 4) evolved to Lys-tRNALys (UUU, CUU) (column 3).
ArgRS-IA is the closest relative of CysRS-IA, indicating how cysteine may have evolved to its current placement in column 4, row 1A, of the code. In P. furiosus, only tRNAThr (column 2) and tRNACys (column 4) utilize the discriminator base 73U. tRNAThr and tRNACys are similar in sequence in P. furiosus.
5.5. Disfavored Row 1
The disfavored row 1 of the genetic code appears to have been sectored last. Phenylalanine, tyrosine and tryptophan are complex aromatic amino acids. It has been hypothesized that, initially, phenylalanine spread across row 1, utilizing a primitive PheRS-IC. From PheRS-IC, both TyrRS-IC and TrpRS-IC may have been derived. To suppress translation errors, it is hypothesized that PheRS-IC was replaced by PheRS-IIC before LUCA. There now appears to be no sequence trace of PheRS-IC. PheRS-IIC has a separate editing active site to suppress non-cognate charging with tyrosine and other amino acids. In Bacteria, tRNATyr (GUA) is a type II tRNA, but this is a bacterial innovation, perhaps, to suppress translation errors (i.e., enhancing the discrimination of Phe and Tyr). Amino acids that differ only in a hydroxyl group are difficult for AARS enzymes to distinguish [36].
Serine appears to have ended up in column 2, row 1, as a result of its particularly chaotic history in genetic code evolution, which may have involved the invasion of an expanded serine sector by threonine and proline. Cysteine may have ended up in column 4, row 1A, as a consequence of serine jumping from column 2 to column 4.
We conclude that a rational explanation can be provided for the placements of: (1) all amino acids; (2) class II and class I AARS enzymes; and (3) many tRNAs in the evolution of the genetic code. When this project was started, this outcome was not anticipated.
5.6. Stop Codons and Evolution of Translational Fidelity
Stop codons are located at column 3, row 1B, and column 4, row 1B. We consider it likely that stop codons were a late addition to the code. The evolution of the genetic code can be viewed as the evolution of intellectual property initially to support polypeptide polymer synthesis as a pre-life chemistry emulsifier and then progressing to cognate coding with the inception of complex life. According to this view, initially, in pre-life, long protein polymers and innovation in amino acid additions were selected over fidelity. Translational fidelity became ever more important, however, as the genetic code evolved and the system developed intellectual property in tRNAomes, AARSomes and the first proteins that were more strongly selected.
Initially, the code evolved to synthesize polyglycine and then GADV polymers as emulsifiers for metabolic reaction components and to coalesce the first protocells. As accurate coding became more strongly selected, the selective pressure was toward the evolution of fidelity mechanisms, such as editing mechanisms in AARS aminoacylating active sites and separate proofreading domains. Stop codons and frame maintenance are also fidelity mechanisms.
Amino acids with little chemical character are located at the left half of the genetic code (Val, Met, Ile, Leu, Phe, Ala, Thr, Pro, and Ser) (columns 1 and 2). These amino acids are charged to cognate tRNAs by AARS enzymes that edit either within separate proofreading active sites, within the aminoacylating active site, or both. Amino acids from the right half of the code have more chemical character (Glu, Asp, Lys, Asn, Gln, His, Tyr, Gly, Arg, Trp and Cys). Cognate charging of right-half amino acids (columns 3 and 4) generally does not require editing. Right-half amino acids have more chemical character (i.e., charge, hydrogen bonding, and metal binding (i.e., Cys)) that is used to support accurate charging of their cognate tRNA.
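The left-half versus right-half partition described here can be checked for completeness with a small lookup. This is a rough sketch; the set names and the helper `generally_edited` are hypothetical, and the returned value reflects only the general tendency stated in the text, not a property verified for each individual enzyme.

```python
# Amino acid lists as given in the text (columns 1 and 2 versus columns 3 and 4).
LEFT_HALF = {"Val", "Met", "Ile", "Leu", "Phe", "Ala", "Thr", "Pro", "Ser"}
RIGHT_HALF = {"Glu", "Asp", "Lys", "Asn", "Gln", "His", "Tyr", "Gly", "Arg", "Trp", "Cys"}


def generally_edited(amino_acid: str) -> bool:
    """Return True if the cognate AARS is described as generally relying on editing."""
    if amino_acid in LEFT_HALF:
        return True
    if amino_acid in RIGHT_HALF:
        return False
    raise KeyError(f"unknown amino acid: {amino_acid}")


# The two lists are disjoint and together cover the 20 standard amino acids.
print(len(LEFT_HALF), len(RIGHT_HALF), len(LEFT_HALF | RIGHT_HALF))  # 9 11 20
```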
Initially, in pre-life, stop codons were less important than later in evolution, because making longer emulsifying polymers was more important than accurate stops. Also, because RNA ligations were used for RNA replication, combinations of primitive protein reading frames were strongly selected to generate more complex first proteins and new functions. When sequences were fused out of frame, therefore, more complex proteins were initially synthesized using frame shifts, before translation frames evolved to be in phase. A primitive class I ValRS-IA evolved by ligation of an N-terminal-encoding RNA to a primitive class II GlyRS-IIA-encoding RNA. In the absence of hard stop codons, the initial ligation was not necessarily in phase. Initially, innovation was strongly valued over accuracy. Stop codons in mRNA are read by protein release factors, which bind to the stop codon and effect the release of the nascent protein from the ribosome [113]. No tRNA is associated with stop codons and translation termination. In suppressor strains, a tRNA anticodon is mutated so that it pairs with a stop codon, and the suppressor tRNA adds an amino acid, somewhat inefficiently, in place of a stop.
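The pairing logic behind suppressor tRNAs can be illustrated with a small calculation. The sketch below uses strict Watson-Crick pairing only (no wobble or base modifications) to derive the anticodon that would read each stop codon. The function name `watson_crick_anticodon` is illustrative, and the tyrosine example is included only as one familiar case of a single anticodon substitution.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def watson_crick_anticodon(codon):
    """Return the anticodon (5'->3') that pairs with a codon (5'->3')
    by strict Watson-Crick rules; wobble pairing is deliberately ignored."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


STOP_CODONS = ("UAA", "UAG", "UGA")
suppressor_anticodons = {c: watson_crick_anticodon(c) for c in STOP_CODONS}
print(suppressor_anticodons)

# Illustrative case: a tyrosine anticodon GUA pairs with the Tyr codon UAC.
# One substitution at the first anticodon position (tRNA-34) gives CUA,
# which pairs with the UAG stop codon instead.
tyr_anticodon = "GUA"
amber_anticodon = suppressor_anticodons["UAG"]
print(mismatches(tyr_anticodon, amber_anticodon))
```

The point of the sketch is that the mutation lies in the tRNA anticodon, not in the stop codon itself; the mRNA stop codon is unchanged and is simply read by the altered tRNA in competition with release factors.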
6. Radiation of AARSomes
Much has been written about the radiations of AARSomes and individual AARS enzymes [36,37,114,115,116,117,118]. To our knowledge, however, others have not attempted to correlate the radiation of AARSomes to the structure of the first genetic code, as we attempt here. We do not claim that our effort will be the final word on this project. We present a working model that probably can be improved using emerging network analysis techniques. In support of what we have done, our analysis of AARSomes fits the structure of the genetic code very closely.
Figure 39, Figure 40 and Figure 41 document the evolution of the first genetic code. Figure 39 shows the ordered structure of the code, indicating the relatedness of AARS enzymes [15,16,17,18,64]. Most of the evolution is in code columns, as indicated in the model. Figure 40 shows the relationships among all class II and class I AARSs in the ancient Archaeon P. furiosus. Some AARSs missing in P. furiosus were supplied from other species, as appropriate. P. furiosus was selected because it has an ancient tRNAome that is similar to that of LUCA [21]. It was assumed that the P. furiosus AARSome would also be similar to that of LUCA. Figure 40 was prepared, as previously described, using Phyre2 homology scoring by structure and sequence [18,119]. The relatedness of GlyRS-IIA, ValRS-IA and IleRS-IA sequences has previously been demonstrated [15,16,17,18,64]. Figure 41 shows how the structure of the genetic code relates to the apparent lineages of AARS enzymes by correlating the map in Figure 39 to the pattern of AARS evolution shown in Figure 40. In Figure 41, AARS enzymes are assigned background colors that relate to the structure of the code shown in Figure 39. These three figures summarize an ordered model for AARS and genetic code evolution.
A primitive GlyRS-IIA appears to be the root of both the class II lineage and the class I lineage. Based on sequence homology, it is hypothesized that a primitive ValRS-IA was derived from a primitive GlyRS-IIA by ligation of an N-terminal-encoding RNA to a GlyRS-IIA-encoding RNA. In pre-life, the replication of RNAs required ribozyme ligases that generated long and complex RNAs and complex proteins very early in evolution. tRNA evolution required ligation and complementary replication to generate tRNA out of 31 nt minihelices. Attachment of the N-terminal-encoding RNA to the primitive GlyRS-IIA-encoding RNA altered the folding of the translated protein to a primitive ValRS-IA. Zn-binding motifs were important in folding the first AARS enzymes.
AARS enzymes were among the first proteins and coevolved with the genetic code. Without full tRNAomes and AARSomes, there is no standard code. It is hypothesized that sequence-dependent proteins emerged at about the 11 amino acid stage of code evolution (i.e., GADVLSERNCQ; R may initially have been O (ornithine), which radiated to R and K). The 11 amino acid stage provides sufficient chemical diversity (i.e., flexibility, hydrophobicity, hydrogen bonding, and charge) to encode the first proteins. The addition of amino acids to the code improved protein structure and function until 20 amino acids and stops were encoded. As the code froze, adding additional amino acids became more of a liability because of the threat posed to translational accuracy. There is tension between innovation and error catastrophe. To prevent error catastrophe, fidelity mechanisms such as amino acid identity (chemical character) and editing by AARSs froze the code. Early in code evolution, innovation was more strongly selected. Late in code evolution, fidelity mechanisms evolved to protect the intellectual property that pre-biology and emerging biology had generated.
The model for AARS radiation (Figure 39, Figure 40 and Figure 41) is a working model. More advanced network and evolutionary analyses will be necessary to confirm or improve the model. To enhance tRNAome networks, alignments of tRNAs must be optimized in the D loop region and V loop region.
7. Evolution of Complex Life
The pathway to evolve complex life on Earth, supported by a genetic adapter and genetic code, is mostly elucidated [15,30]. Once the genetic code arose, all features of complex life and biodiversity became possible. The solution is embedded in the sequence of tRNAPri and in the order of assembly of the genetic code. tRNA was formed from GCG, CGC and UAGCC repeats and inverted repeats (i.e., ~CCGGG_CU/GCCAA_CCCGG). tRNA evolved from ligation of three 31 nt minihelices of mostly known sequence (GCGGCGG_UAGCC_UAGCCUA_GCCUA_CCGCCGC and ~GCGGCGG_CCGGG_CU/GCCAA_CCCGG_CCGCCGC). ACCA-Gly was ligated to various RNAs, including tRNAs, to synthesize polyglycine. The genetic code evolved as described in this report. Primitive pre-mRNAs and pre-rRNAs were generated by similar processes of ligation and genetic recombination.
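The structural claims about the two minihelix sequences quoted above (31 nt length, complementary ends, a UAGCC repeat core or an inverted-repeat stem) can be verified directly from the sequences. The following is a minimal sketch; the labels `MINIHELIX_A` and `MINIHELIX_B` and the helper names are hypothetical, and the `~` and `/` annotation marks are simply stripped because their exact meaning is not defined in this section.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def segments(annotated):
    """Split an annotated sequence on '_' after dropping '~' and '/' marks."""
    return annotated.replace("~", "").replace("/", "").split("_")


# Sequences copied from the text; the A/B labels are illustrative only.
MINIHELIX_A = "GCGGCGG_UAGCC_UAGCCUA_GCCUA_CCGCCGC"
MINIHELIX_B = "~GCGGCGG_CCGGG_CU/GCCAA_CCCGG_CCGCCGC"


def describe(annotated):
    """Summarize length, end pairing and core structure of a minihelix."""
    parts = segments(annotated)
    five_prime, left, loop, right, three_prime = parts
    return {
        "length": sum(len(p) for p in parts),
        "segment_lengths": [len(p) for p in parts],
        # The 7 nt ends should be able to pair (GCG / CGC repeat derived).
        "ends_pair": reverse_complement(five_prime) == three_prime,
        # True when the 5 nt segments form an inverted repeat (a stem).
        "inner_stem_pairs": reverse_complement(left) == right,
        # True when the 17 nt core is a run of the UAGCC repeat.
        "core_is_UAGCC_repeat": (left + loop + right) == ("UAGCC" * 4)[:17],
    }


report_a = describe(MINIHELIX_A)
report_b = describe(MINIHELIX_B)
print(report_a)
print(report_b)
```

Both sequences come out at 31 nt with 7 nt ends that are reverse complements of each other. The first has a 17 nt core that is a continuous UAGCC repeat, whereas the second has a 5 nt inverted repeat flanking a 7 nt loop, consistent with the description of repeats and inverted repeats.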
Evolving tRNA required a small number of catalytic functions (i.e., ribozymes). The process required a mechanism to generate RNA repeats and inverted repeats. Multiple functions were necessary, including RNA ligase, RNA replicase (complementary replication), exo- and endo-nucleases, ribose 2′-O-methyltransferase (for RNA stability) and ACCA-Gly transferase. Complementary replication utilizing snap-back primers (i.e., 31 nt minihelices) was needed. With these ingredients and little else, it should be possible to recreate most of the origin of tRNA and the genetic code in a laboratory. The evolution of tRNA and the genetic code describes an RNA–amino acid and RNA–peptide world overlaid on primitive metabolism with coevolution of protocells to generate the first life on Earth.
8. Discussion
The genetic code coevolved with tRNA, tRNAomes, AARSomes, ribosomes and the first proteins [5,6,7,8,9,10]. The evolution of AARSomes is evident in genetic code columns. In column 1, ValRS-IA, LeuRS-IA, IleRS-IA and MetRS-IA are closely related enzymes. In column 2, SerRS-IIA, ProRS-IIA and ThrRS-IIA are closely related enzymes. AlaRS-IID may have replaced a now extinct AlaRS-IIA before LUCA. Column 3 demonstrates a striped pattern of related AARS enzymes. AspRS-IIB, AsnRS-IIB and HisRS-IIA are closely related enzymes in rows 4A, 3A and 2A. GluRS-IB, LysRS-IB (in Archaea) and GlnRS-IB (a eukaryotic innovation) are closely related enzymes in rows 4B, 3B and 2B. A primitive GlyRS-IIA appears to be the founding AARS. tRNAGly appears to be the founding tRNA that is most similar to tRNAPri [21]. Glycine appears to be the founding amino acid [19,20], and glycine occupies the most favored sector in the code (tRNA-35C, tRNA-36C). In column 4, ArgRS-IA and CysRS-IA are closely related enzymes. Row 1 of the genetic code appears to have been sectored last. TrpRS-IC and TyrRS-IC are closely related enzymes. PheRS-IIC appears to be a late substitution, perhaps for a PheRS-IC, from which TyrRS-IC and TrpRS-IC were derived. According to this view, the hypothesized PheRS-IC is now extinct. Cysteine may have first entered the code through tRNA-linked chemistry within an expanded serine sector.
Coding evolved around tRNA and the tRNA anticodon. Coding should be viewed as arising first in the tRNA anticodon. In tRNA, the maximum number of coding assignments is limited to 32 by wobbling. Unlike DNA and mRNA, tRNA cannot support 64 genetic code assignments. Coding coevolved from tRNA anticodons into mRNA codons and then was cast into DNA for more stable information storage. Degeneracy of the code is a feature of tRNA and the tRNA anticodon. Wobbling at tRNA-34 created code degeneracy. The suppression of wobbling at tRNA-36 gives the history of genetic code establishment. Even with modifications, no tRNA-34A is utilized at the base of genetic code evolution. Elp3 and subsequent tRNA-34U 5-carbon modifications suppressed superwobbling in order to evolve 2-codon sectors in the code (i.e., column 3) [61,62,63]. tRNA-34, tRNA-37 and other tRNA modifications were necessary to evolve the first code.
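The limit of 32 assignments follows from simple counting. The sketch below assumes that wobbling at tRNA-34 can distinguish only a pyrimidine (U/C) from a purine (A/G) at the third codon position, and that superwobbling reads all four third-position bases with one anticodon; the function names are illustrative.

```python
from itertools import product

BASES = "UCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def wobble_class(codon):
    """Collapse the third codon base to Y (U/C) or R (A/G).
    Assumption: tRNA-34 wobble resolves only purine versus pyrimidine."""
    third = "Y" if codon[2] in "UC" else "R"
    return codon[:2] + third


def superwobble_class(codon):
    """Collapse the third codon base entirely (one anticodon per 4-codon box).
    Assumption: superwobbling reads any base at the third position."""
    return codon[:2] + "N"


mrna_assignments = len(set(CODONS))                                # codon level
trna_assignments = len({wobble_class(c) for c in CODONS})          # wobble level
superwobble_assignments = len({superwobble_class(c) for c in CODONS})

print(mrna_assignments, trna_assignments, superwobble_assignments)
```

Under these assumptions, 64 codons collapse to 32 anticodon-level classes (16 boxes, each split into a pyrimidine pair and a purine pair), and to 16 classes if superwobbling is not suppressed. This is why suppressing superwobbling is a precondition for 2-codon sectors.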
Because of the placement of the anticodon loop U-turn, wobbling at tRNA-36 was suppressed, but wobbling at tRNA-34 was not. Next to tRNA-34 is tRNA-33U, which is on the opposite side of the anticodon U-turn. Because of the placement of the U-turn, modifying tRNA-33U would be unlikely to influence reading at tRNA-34. Also, tRNA-33U is almost never substituted, indicating that a purine at that position might disrupt loop geometry. Modifications of tRNA-35 cannot compensate because 35 is a Watson–Crick position for coding that cannot be specified in sequence or modified in a manner that affects coding. Apparently, modifications of tRNA-37 helped to suppress wobbling at tRNA-36, particularly for tRNA-36U (i.e., tRNA-37t6A) and tRNA-36A (i.e., tRNA-37m1G) [61]. To evolve the first code, these modifications may have been universal. As systems have evolved, some compensations for some modifications may have coevolved. Wobbling at tRNA-34 (regulated) versus tRNA-36 (suppressed) appears to explain why columns 1, 2 and 4 differ in their sectoring from column 3, which is the most innovated column.
9. Conclusions
The genetic code is simpler in Archaea than in Bacteria and Eukarya, indicating that the archaeal code is most similar to the LUCA code. The code in Archaea is highly ordered, and the order provides the history for first code establishment. tRNAomes are simpler in Archaea. Organisms with the simplest tRNAomes are the closest to LUCA. tRNAome and AARSome networks of ancient organisms describe the history of the establishment of the first code.
tRNA evolved from RNA repeats and inverted repeats of known sequence. Three 31 nt minihelices were ligated and processed by orderly internal 9 nt deletion(s) into type I and type II tRNAs [15,31,120] (Figure 1 and Figure 2). In pre-life, multiple RNAs were joined as replication intermediates, generating long functional RNAs, such as tRNAs, pre-mRNAs and primitive rRNAs. tRNA evolution is a story of amino acid–RNA and protein–RNA-linked chemistry [43], so life evolved from a complex RNA–amino acid–RNA–protein–metabolism world, packaged in coevolved protocells. When coupled with coded protein synthesis, this evolving pre-life world generated remarkable complexity and fostered surprising innovation. The first proteins that coevolved with the genetic code were highly evolved, innovated and complex constructs, many of which remain largely unaltered to the present day. With the freezing of the first code, life as currently known emerged on Earth. The history of tRNA evolution is embedded in tRNA sequences, which can be read. The history of the evolution of the genetic code is embedded in code structure and interacting tRNAome, AARSome and first protein networks.
The core history of abiogenesis is the evolution of tRNA, which was recorded and preserved in tRNA sequences. The history of genetic code evolution was written into the standard genetic code structure and AARSome radiation. AARSome structure provides a history that describes genetic code structure and evolution. tRNAome structure contributed to genetic code assembly but was less important for directing the structure of the code.
Based on tRNA sequences, we posit that RNA repeat and inverted repeat worlds gave way to a 31 nt minihelix world, which evolved to a tRNA world (Figure 1 and Figure 2). Thus, peptide synthesis pre-dated tRNA, and tRNA evolved from 31 nt minihelices as an improved mechanism to synthesize long peptides (i.e., polyglycine). tRNA evolved from a world that was capable of complex metabolism and RNA sequence manipulation (i.e., the selection and ordered assembly of short, stable RNA stems and 7 nt U-turn loops). Polyglycine and polyGADV amyloids and polymers were an important feature of emerging life and the evolving code. Cells were emulsified from the inside by polypeptide polymers, coacervates and amyloids and from the outside by membrane encapsulation.
For astrobiology, because of the challenges of generating a genetic adapter and a code, it is difficult to see how life could evolve separately on another planet or moon by a very different chemistry or different pathway. If there is a route to a suitable genetic adapter other than tRNA, we are not certain what that might be. Life without a genetic adapter and genetic code has limited possibilities.
Abbreviations
The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
| --- | --- |
| AARS | Aminoacyl-tRNA synthetase |
| Aae | Aquifex aeolicus |
| Bta | Bos taurus |
| Eco | Escherichia coli |
| Hvo | Haloferax volcanii |
| Hsa | Homo sapiens |
| Lbp | Levitt base pair |
| Lla | Lactobacillus lactis |
| LUCA | Last universal common (cellular) ancestor |
| Mca | Mycoplasma capricolum |
| Pri | Primordial |
| Pfu | Pyrococcus furiosus |
| Sau | Staphylococcus aureus |
| Sgr | Streptomyces griseus |
| Tac | Thermoplasma acidophilum |
| Tth | Thermus thermophilus |
Author Contributions
All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
No new data were created or analyzed in this study.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding Statement
This research received no external funding.
References
- 1. Pavlinova P., Lambert C.N., Malaterre C., Nghe P. Abiogenesis through gradual evolution of autocatalysis into template-based replication. FEBS Lett. 2023;597:344–379. doi: 10.1002/1873-3468.14507.
- 2. Peng Z., Linderoth J., Baum D.A. The hierarchical organization of autocatalytic reaction networks and its relevance to the origin of life. PLoS Comput. Biol. 2022;18:e1010498. doi: 10.1371/journal.pcbi.1010498.
- 3. Freeland S. Undefining life’s biochemistry: Implications for abiogenesis. J. R. Soc. Interface. 2022;19:20210814. doi: 10.1098/rsif.2021.0814.
- 4. Williamson M.P. Autocatalytic Selection as a Driver for the Origin of Life. Life. 2024;14:590. doi: 10.3390/life14050590.
- 5. Prosdocimi F., de Farias S.T. Origin of life: Drawing the big picture. Prog. Biophys. Mol. Biol. 2023;180–181:28–36. doi: 10.1016/j.pbiomolbio.2023.04.005.
- 6. Farias S.T., Prosdocimi F. RNP-world: The ultimate essence of life is a ribonucleoprotein process. Genet. Mol. Biol. 2022;45:e20220127. doi: 10.1590/1678-4685-gmb-2022-0127.
- 7. de Farias S.T., Rego T.G., Jose M.V. Origin of the 16S Ribosomal Molecule from Ancestor tRNAs. J. Mol. Evol. 2021;89:249–256. doi: 10.1007/s00239-021-10002-8.
- 8. de Farias S.T., Jose M.V. Transfer RNA: The molecular demiurge in the origin of biological systems. Prog. Biophys. Mol. Biol. 2020;153:28–34. doi: 10.1016/j.pbiomolbio.2020.02.006.
- 9. de Farias S.T., Rego T.G., Jose M.V. tRNA Core Hypothesis for the Transition from the RNA World to the Ribonucleoprotein World. Life. 2016;6:15. doi: 10.3390/life6020015.
- 10. de Farias S.T., do Rego T.G., Jose M.V. Evolution of transfer RNA and the origin of the translation system. Front. Genet. 2014;5:303. doi: 10.3389/fgene.2014.00303.
- 11. Lei L., Burton Z.F. Origin of Type II tRNA Variable Loops, Aminoacyl-tRNA Synthetase Allostery from Distal Determinants, and Diversification of Life. DNA. 2024;4:252–275. doi: 10.3390/dna4030017.
- 12. Li R., Macnamara L.M., Leuchter J.D., Alexander R.W., Cho S.S. MD Simulations of tRNA and Aminoacyl-tRNA Synthetases: Dynamics, Folding, Binding, and Allostery. Int. J. Mol. Sci. 2015;16:15872–15902. doi: 10.3390/ijms160715872.
- 13. Han Z., Wang X., Wu Z., Li C. Study of the Allosteric Mechanism of Human Mitochondrial Phenylalanyl-tRNA Synthetase by Transfer Entropy via an Improved Gaussian Network Model and Co-evolution Analyses. J. Phys. Chem. Lett. 2023;14:3452–3460. doi: 10.1021/acs.jpclett.3c00366.
- 14. Shao Q., Han Z., Cheng J., Wang Q., Gong W., Li C. Allosteric Mechanism of Human Mitochondrial Phenylalanyl-tRNA Synthetase: An Atomistic MD Simulation and a Mutual Information-Based Network Study. J. Phys. Chem. B. 2021;125:7651–7661. doi: 10.1021/acs.jpcb.1c03228.
- 15. Lei L., Burton Z.F. Chemical Evolution of Life on Earth. Genes. 2025;16:220. doi: 10.3390/genes16020220.
- 16. Lei L., Burton Z.F. Evolution of the genetic code. Transcription. 2021;12:28–53. doi: 10.1080/21541264.2021.1927652.
- 17. Lei L., Burton Z.F. Evolution of Life on Earth: tRNA, Aminoacyl-tRNA Synthetases and the Genetic Code. Life. 2020;10:21. doi: 10.3390/life10030021.
- 18. Kim Y., Opron K., Burton Z.F. A tRNA-and Anticodon-Centric View of the Evolution of Aminoacyl-tRNA Synthetases, tRNAomes, and the Genetic Code. Life. 2019;9:37. doi: 10.3390/life9020037.
- 19. Bernhardt H.S., Patrick W.M. Genetic code evolution started with the incorporation of glycine, followed by other small hydrophilic amino acids. J. Mol. Evol. 2014;78:307–309. doi: 10.1007/s00239-014-9627-y.
- 20. Bernhardt H.S., Tate W.P. Evidence from glycine transfer RNA of a frozen accident at the dawn of the genetic code. Biol. Direct. 2008;3:53. doi: 10.1186/1745-6150-3-53.
- 21. Pak D., Du N., Kim Y., Sun Y., Burton Z.F. Rooted tRNAomes and evolution of the genetic code. Transcription. 2018;9:137–151. doi: 10.1080/21541264.2018.1429837.
- 22. Greenwald J., Kwiatkowski W., Riek R. Peptide Amyloids in the Origin of Life. J. Mol. Biol. 2018;430:3735–3750. doi: 10.1016/j.jmb.2018.05.046.
- 23. Maury C.P.J. Origin of life: Beta-sheet amyloid conformers as the primordial functional polymers on the early Earth and their role in the emergence of complex dynamic networks. FEBS Lett. 2025;599:2693–2705. doi: 10.1002/1873-3468.70112.
- 24. Xavier J.C., Gerhards R.E., Wimmer J.L.E., Brueckner J., Tria F.D.K., Martin W.F. The metabolic network of the last bacterial common ancestor. Commun. Biol. 2021;4:413. doi: 10.1038/s42003-021-01918-4.
- 25. Weiss M.C., Preiner M., Xavier J.C., Zimorski V., Martin W.F. The last universal common ancestor between ancient Earth chemistry and the onset of genetics. PLoS Genet. 2018;14:e1007518. doi: 10.1371/journal.pgen.1007518.
- 26. Martin W.F., Weiss M.C., Neukirchen S., Nelson-Sathi S., Sousa F.L. Physiology, phylogeny, and LUCA. Microb. Cell. 2016;3:582–587. doi: 10.15698/mic2016.12.545.
- 27. Prosdocimi F., Farias S.T. Coacervates meet the RNP-world: Liquid-liquid phase separation and the emergence of biological compartmentalization. Biosystems. 2025;252:105480. doi: 10.1016/j.biosystems.2025.105480.
- 28. Jia T.Z. Primitive membraneless compartments as a window into the earliest cells. Biophys. Rev. 2023;15:1897–1900. doi: 10.1007/s12551-023-01135-9.
- 29. Stefano G.B., Kream R.M. Primordial Biochemicals Within Coacervate-Like Droplets and the Origins of Life. Viruses. 2025;17:146. doi: 10.3390/v17020146.
- 30. Lei L., Burton Z.F. A Recipe to Evolve Complex Life Chemically on Earth. Genes. 2025;16:1136. doi: 10.3390/genes16101136.
- 31. Lei L., Burton Z.F. The 3 31 Nucleotide Minihelix tRNA Evolution Theorem and the Origin of Life. Life. 2023;13:2224. doi: 10.3390/life13112224.
- 32. Martinez-Rodriguez L., Erdogan O., Jimenez-Rodriguez M., Gonzalez-Rivera K., Williams T., Li L., Weinreb V., Collier M., Chandrasekaran S.N., Ambroggio X., et al. Functional Class I and II Amino Acid-activating Enzymes Can Be Coded by Opposite Strands of the Same Gene. J. Biol. Chem. 2015;290:19710–19725. doi: 10.1074/jbc.M115.642876.
- 33. Carter C.W., Jr., Li L., Weinreb V., Collier M., Gonzalez-Rivera K., Jimenez-Rodriguez M., Erdogan O., Kuhlman B., Ambroggio X., Williams T., et al. The Rodin-Ohno hypothesis that two enzyme superfamilies descended from one ancestral gene: An unlikely scenario for the origins of translation that will not be dismissed. Biol. Direct. 2014;9:11. doi: 10.1186/1745-6150-9-11.
- 34. Chandrasekaran S.N., Yardimci G.G., Erdogan O., Roach J., Carter C.W., Jr. Statistical evaluation of the Rodin-Ohno hypothesis: Sense/antisense coding of ancestral class I and II aminoacyl-tRNA synthetases. Mol. Biol. Evol. 2013;30:1588–1604. doi: 10.1093/molbev/mst070.
- 35. Rodin A.S., Rodin S.N., Carter C.W., Jr. On primordial sense-antisense coding. J. Mol. Evol. 2009;69:555–567. doi: 10.1007/s00239-009-9288-4.
- 36. Tawfik D.S., Gruic-Sovulj I. How evolution shapes enzyme selectivity—Lessons from aminoacyl-tRNA synthetases and other amino acid utilizing enzymes. FEBS J. 2020;287:1284–1305. doi: 10.1111/febs.15199.
- 37. Giege R., Springer M. Aminoacyl-tRNA Synthetases in the Bacterial World. EcoSal Plus. 2016;7 doi: 10.1128/ecosalplus.esp-0002-2016.
- 38. Meng E.C., Goddard T.D., Pettersen E.F., Couch G.S., Pearson Z.J., Morris J.H., Ferrin T.E. UCSF ChimeraX: Tools for structure building and analysis. Protein Sci. 2023;32:e4792. doi: 10.1002/pro.4792.
- 39. Pettersen E.F., Goddard T.D., Huang C.C., Meng E.C., Couch G.S., Croll T.I., Morris J.H., Ferrin T.E. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 2021;30:70–82. doi: 10.1002/pro.3943.
- 40. Goddard T.D., Huang C.C., Meng E.C., Pettersen E.F., Couch G.S., Morris J.H., Ferrin T.E. UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci. 2018;27:14–25. doi: 10.1002/pro.3235.
- 41. Cappannini A., Ray A., Purta E., Mukherjee S., Boccaletto P., Moafinejad S.N., Lechner A., Barchet C., Klaholz B.P., Stefaniak F., et al. MODOMICS: A database of RNA modifications and related information. 2023 update. Nucleic Acids Res. 2024;52:D239–D244. doi: 10.1093/nar/gkad1083.
- 42. Wolff P., Villette C., Zumsteg J., Heintz D., Antoine L., Chane-Woon-Ming B., Droogmans L., Grosjean H., Westhof E. Comparative patterns of modified nucleotides in individual tRNA species from a mesophilic and two thermophilic archaea. RNA. 2020;26:1957–1975. doi: 10.1261/rna.077537.120.
- 43. Muller F., Escobar L., Xu F., Wegrzyn E., Nainyte M., Amatov T., Chan C.Y., Pichler A., Carell T. A prebiotically plausible scenario of an RNA-peptide world. Nature. 2022;605:279–284. doi: 10.1038/s41586-022-04676-3.
- 44. Caetano-Anolles G. The proteomic origin of the genetic code. Expert Rev. Proteom. 2026;23:79–98. doi: 10.1080/14789450.2026.2646677.
- 45. Di Giulio M. An RNA Ring was Not the Progenitor of the tRNA Molecule. J. Mol. Evol. 2020;88:228–233. doi: 10.1007/s00239-020-09927-3.
- 46. Demongeot J., Seligmann H. Theoretical minimal RNA rings mimick molecular evolution before tRNA-mediated translation: Codon-amino acid affinities increase from early to late RNA rings. C. R. Biol. 2020;343:111–122. doi: 10.5802/crbiol.1.
- 47. Demongeot J., Seligmann H. RNA Rings Strengthen Hairpin Accretion Hypotheses for tRNA Evolution: A Reply to Commentaries by Z.F. Burton and M. Di Giulio. J. Mol. Evol. 2020;88:243–252. doi: 10.1007/s00239-020-09929-1.
- 48. Demongeot J., Seligmann H. The primordial tRNA acceptor stem code from theoretical minimal RNA ring clusters. BMC Genet. 2020;21:7. doi: 10.1186/s12863-020-0812-2.
- 49. Demongeot J., Seligmann H. The Uroboros Theory of Life’s Origin: 22-Nucleotide Theoretical Minimal RNA Rings Reflect Evolution of Genetic Code and tRNA-rRNA Translation Machineries. Acta Biotheor. 2019;67:273–297. doi: 10.1007/s10441-019-09356-w.
- 50. Di Giulio M. A polyphyletic model for the origin of tRNAs has more support than a monophyletic model. J. Theor. Biol. 2013;318:124–128. doi: 10.1016/j.jtbi.2012.11.012.
- 51. Di Giulio M. The origin of the tRNA molecule: Independent data favor a specific model of its evolution. Biochimie. 2012;94:1464–1466. doi: 10.1016/j.biochi.2012.01.014.
- 52. Juhling F., Morl M., Hartmann R.K., Sprinzl M., Stadler P.F., Putz J. tRNAdb 2009: Compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 2009;37:D159–D162. doi: 10.1093/nar/gkn772.
- 53. Chan P.P., Lowe T.M. GtRNAdb 2.0: An expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Res. 2016;44:D184–D189. doi: 10.1093/nar/gkv1309.
- 54. Abe T., Inokuchi H., Yamada Y., Muto A., Iwasaki Y., Ikemura T. tRNADB-CE: tRNA gene database well-timed in the era of big sequence data. Front. Genet. 2014;5:114. doi: 10.3389/fgene.2014.00114.
- 55. Giege R., Eriani G. The tRNA identity landscape for aminoacylation and beyond. Nucleic Acids Res. 2023;51:1528–1570. doi: 10.1093/nar/gkad007.
- 56. Qin X., Deng X., Chen L., Xie W. Crystal Structure of the Wild-Type Human GlyRS Bound with tRNA(Gly) in a Productive Conformation. J. Mol. Biol. 2016;428:3603–3614. doi: 10.1016/j.jmb.2016.05.018.
- 57. Abe T., Ikemura T., Sugahara J., Kanai A., Ohara Y., Uehara H., Kinouchi M., Kanaya S., Yamada Y., Muto A., et al. tRNADB-CE 2011: tRNA gene database curated manually by experts. Nucleic Acids Res. 2011;39:D210–D213. doi: 10.1093/nar/gkq1007.
- 58. Abe T., Ikemura T., Ohara Y., Uehara H., Kinouchi M., Kanaya S., Yamada Y., Muto A., Inokuchi H. tRNADB-CE: tRNA gene database curated manually by experts. Nucleic Acids Res. 2009;37:D163–D168. doi: 10.1093/nar/gkn692.
- 59. Zhang J., Ferre-D’Amare A.R. The tRNA Elbow in Structure, Recognition and Evolution. Life. 2016;6:3. doi: 10.3390/life6010003.
- 60. Shi H., Moore P.B. The crystal structure of yeast phenylalanine tRNA at 1.93 A resolution: A classic structure revisited. RNA. 2000;6:1091–1105. doi: 10.1017/S1355838200000364.
- 61. Lei L., Burton Z.F. “Superwobbling” and tRNA-34 Wobble and tRNA-37 Anticodon Loop Modifications in Evolution and Devolution of the Genetic Code. Life. 2022;12:252. doi: 10.3390/life12020252.
- 62. Alkatib S., Scharff L.B., Rogalski M., Fleischmann T.T., Matthes A., Seeger S., Schottler M.A., Ruf S., Bock R. The contributions of wobbling and superwobbling to the reading of the genetic code. PLoS Genet. 2012;8:e1003076. doi: 10.1371/journal.pgen.1003076.
- 63. Rogalski M., Karcher D., Bock R. Superwobbling facilitates translation with reduced tRNA sets. Nat. Struct. Mol. Biol. 2008;15:192–198. doi: 10.1038/nsmb.1370.
- 64. Pak D., Kim Y., Burton Z.F. Aminoacyl-tRNA synthetase evolution and sectoring of the genetic code. Transcription. 2018;9:205–224. doi: 10.1080/21541264.2018.1429837.
- 65. Ikehara K. Why Were [GADV]-amino Acids and GNC Codons Selected and How Was GNC Primeval Genetic Code Established? Genes. 2023;14:375. doi: 10.3390/genes14020375.
- 66. Ikehara K. Evolutionary Steps in the Emergence of Life Deduced from the Bottom-Up Approach and GADV Hypothesis (Top-Down Approach). Life. 2016;6:6. doi: 10.3390/life6010006.
- 67. Ikehara K. [GADV]-protein world hypothesis on the origin of life. Orig. Life Evol. Biosph. 2014;44:299–302. doi: 10.1007/s11084-014-9383-4.
- 68. Ikehara K. Possible steps to the emergence of life: The [GADV]-protein world hypothesis. Chem. Rec. 2005;5:107–118. doi: 10.1002/tcr.20037.
- 69. Fukai S., Nureki O., Sekine S., Shimada A., Vassylyev D.G., Yokoyama S. Mechanism of molecular interactions for tRNA(Val) recognition by valyl-tRNA synthetase. RNA. 2003;9:100–111. doi: 10.1261/rna.2760703.
- 70. Sordyl D., Boileau E., Bernat A., Maiti S., Mukherjee S., Moafinejad S.N., Farsani M.A., Shavina A., Cappannini A., Agostini G., et al. MODOMICS: A database of RNA modifications and related information. 2025 update and 20th anniversary. Nucleic Acids Res. 2025;54:D219–D225. doi: 10.1093/nar/gkaf1284.
- 71. Silvian L.F., Wang J., Steitz T.A. Insights into editing from an ile-tRNA synthetase structure with tRNAile and mupirocin. Science. 1999;285:1074–1077. doi: 10.1126/science.285.5430.1074.
- 72. Nakanishi K., Ogiso Y., Nakama T., Fukai S., Nureki O. Structural basis for anticodon recognition by methionyl-tRNA synthetase. Nat. Struct. Mol. Biol. 2005;12:931–932. doi: 10.1038/nsmb988.
- 73. Fukunaga R., Yokoyama S. Aminoacylation complex structures of leucyl-tRNA synthetase and tRNALeu reveal two modes of discriminator-base recognition. Nat. Struct. Mol. Biol. 2005;12:915–922. doi: 10.1038/nsmb985.
- 74. Fukunaga R., Yokoyama S. Crystal structure of leucyl-tRNA synthetase from the archaeon Pyrococcus horikoshii reveals a novel editing domain orientation. J. Mol. Biol. 2005;346:57–71. doi: 10.1016/j.jmb.2004.11.060.
- 75. Throll P., Dolce L.G., Rico-Lastres P., Arnold K., Tengo L., Basu S., Kaiser S., Schneider R., Kowalinski E. Structural basis of tRNA recognition by the m(3)C RNA methyltransferase METTL6 in complex with SerRS seryl-tRNA synthetase. Nat. Struct. Mol. Biol. 2024;31:1614–1624. doi: 10.1038/s41594-024-01341-3.
- 76. Delagoutte B., Keith G., Moras D., Cavarelli J. Crystallization and preliminary X-ray crystallographic analysis of yeast arginyl-tRNA synthetase-yeast tRNAArg complexes. Acta Crystallogr. D Biol. Crystallogr. 2000;56:492–494. doi: 10.1107/S0907444900001700.
- 77. Longo L.M., Despotovic D., Weil-Ktorza O., Walker M.J., Jablonska J., Fridmann-Sirkis Y., Varani G., Metanis N., Tawfik D.S. Primordial emergence of a nucleic acid-binding protein via phase separation and statistical ornithine-to-arginine conversion. Proc. Natl. Acad. Sci. USA. 2020;117:15731–15739. doi: 10.1073/pnas.2001989117.
- 78. Shi W., Yoshida A., Kosono S., Nishiyama M. Evolution of lysine and arginine biosynthesis revealed by substrate specificity of lysine biosynthetic enzymes in Thermus thermophilus. FEBS J. 2025;293:1727–1740. doi: 10.1111/febs.70274.
- 79.Hashim M., Alam I., Ahmad M., Badruddeen, Akhtar J., Khan M.I., Islam A., Parveen S. Comprehensive Review of L-Lysine: Chemistry, Occurrence, and Physiological Roles. Curr. Protein Pept. Sci. 2025;27:150–162. doi: 10.2174/0113892037381647250526073248. [DOI] [PubMed] [Google Scholar]
- 80.Wu Y., Zhang J., Wang B., Zhang Y., Li H., Liu Y., Yin J., He D., Luo H., Gan F., et al. Dissecting the Arginine and Lysine Biosynthetic Pathways and Their Relationship in Haloarchaeon Natrinema gari J7-2 via Endogenous CRISPR-Cas System-Based Genome Editing. Microbiol. Spectr. 2023;11:e0028823. doi: 10.1128/spectrum.00288-23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Hauenstein S., Zhang C.M., Hou Y.M., Perona J.J. Shape-selective RNA recognition by cysteinyl-tRNA synthetase. Nat. Struct. Mol. Biol. 2004;11:1134–1141. doi: 10.1038/nsmb849. [DOI] [PubMed] [Google Scholar]
- 82.Mukai T., Crnkovic A., Umehara T., Ivanova N.N., Kyrpides N.C., Soll D. RNA-Dependent Cysteine Biosynthesis in Bacteria and Archaea. mBio. 2017;8:10–1128. doi: 10.1128/mBio.00561-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Sankaranarayanan R., Dock-Bregeon A.C., Romby P., Caillet J., Springer M., Rees B., Ehresmann C., Ehresmann B., Moras D. The structure of threonyl-tRNA synthetase-tRNA(Thr) complex enlightens its repressor activity and reveals an essential zinc ion in the active site. Cell. 1999;97:371–381. doi: 10.1016/S0092-8674(00)80746-1. [DOI] [PubMed] [Google Scholar]
- 84.Yaremchuk A., Cusack S., Tukalo M. Crystal structure of a eukaryote/archaeon-like protyl-tRNA synthetase and its complex with tRNAPro(CGG) EMBO J. 2000;19:4745–4758. doi: 10.1093/emboj/19.17.4745. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Cavarelli J., Eriani G., Rees B., Ruff M., Boeglin M., Mitschler A., Martin F., Gangloff J., Thierry J.C., Moras D. The active site of yeast aspartyl-tRNA synthetase: Structural and functional aspects of the aminoacylation reaction. EMBO J. 1994;13:327–337. doi: 10.1002/j.1460-2075.1994.tb06265.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Blaise M., Bailly M., Frechin M., Behrens M.A., Fischer F., Oliveira C.L., Becker H.D., Pedersen J.S., Thirup S., Kern D. Crystal structure of a transfer-ribonucleoprotein particle that promotes asparagine formation. EMBO J. 2010;29:3118–3129. doi: 10.1038/emboj.2010.192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Rampias T., Sheppard K., Soll D. The archaeal transamidosome for RNA-dependent glutamine biosynthesis. Nucleic Acids Res. 2010;38:5774–5783. doi: 10.1093/nar/gkq336. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Tian Q., Wang C., Liu Y., Xie W. Structural basis for recognition of G-1-containing tRNA by histidyl-tRNA synthetase. Nucleic Acids Res. 2015;43:2980–2990. doi: 10.1093/nar/gkv129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Di Giulio M. The phylogenetic distribution of the glutaminyl-tRNA synthetase and Glu-tRNA(Gln) amidotransferase in the fundamental lineages would imply that the ancestor of archaea, that of eukaryotes and LUCA were progenotes. Biosystems. 2020;196:104174. doi: 10.1016/j.biosystems.2020.104174. [DOI] [PubMed] [Google Scholar]
- 90.Raczniak G., Becker H.D., Min B., Soll D. A single amidotransferase forms asparaginyl-tRNA and glutaminyl-tRNA in Chlamydia trachomatis. J. Biol. Chem. 2001;276:45862–45867. doi: 10.1074/jbc.M109494200. [DOI] [PubMed] [Google Scholar]
- 91.Salazar J.C., Zuniga R., Raczniak G., Becker H., Soll D., Orellana O. A dual-specific Glu-tRNA(Gln) and Asp-tRNA(Asn) amidotransferase is involved in decoding glutamine and asparagine codons in Acidithiobacillus ferrooxidans. FEBS Lett. 2001;500:129–131. doi: 10.1016/S0014-5793(01)02600-X. [DOI] [PubMed] [Google Scholar]
- 92.Sekine S., Nureki O., Dubois D.Y., Bernier S., Chenevert R., Lapointe J., Vassylyev D.G., Yokoyama S. ATP binding by glutamyl-tRNA synthetase is switched to the productive mode by tRNA binding. EMBO J. 2003;22:676–688. doi: 10.1093/emboj/cdg053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Naganuma M., Sekine S., Chong Y.E., Guo M., Yang X.L., Gamper H., Hou Y.M., Schimmel P., Yokoyama S. The selective tRNA aminoacylation mechanism based on a single G*U pair. Nature. 2014;510:507–511. doi: 10.1038/nature13440. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Fukunaga R., Yokoyama S. Structure of the AlaX-M trans-editing enzyme from Pyrococcus horikoshii. Acta Crystallogr. D Biol. Crystallogr. 2007;63:390–400. doi: 10.1107/S090744490605640X. [DOI] [PubMed] [Google Scholar]
- 95.Fournier G.P., Alm E.J. Ancestral Reconstruction of a Pre-LUCA Aminoacyl-tRNA Synthetase Ancestor Supports the Late Addition of Trp to the Genetic Code. J. Mol. Evol. 2015;80:171–185. doi: 10.1007/s00239-015-9672-1. [DOI] [PubMed] [Google Scholar]
- 96.Moor N., Kotik-Kogan O., Tworowski D., Sukhanova M., Safro M. The crystal structure of the ternary complex of phenylalanyl-tRNA synthetase with tRNAPhe and a phenylalanyl-adenylate analogue reveals a conformational switch of the CCA end. Biochemistry. 2006;45:10572–10583. doi: 10.1021/bi060491l. [DOI] [PubMed] [Google Scholar]
- 97.Goldgur Y., Mosyak L., Reshetnikova L., Ankilova V., Lavrik O., Khodyreva S., Safro M. The crystal structure of phenylalanyl-tRNA synthetase from thermus thermophilus complexed with cognate tRNAPhe. Structure. 1997;5:59–68. doi: 10.1016/S0969-2126(97)00166-4. [DOI] [PubMed] [Google Scholar]
- 98.Kobayashi T., Nureki O., Ishitani R., Yaremchuk A., Tukalo M., Cusack S., Sakamoto K., Yokoyama S. Structural basis for orthogonal tRNA specificities of tyrosyl-tRNA synthetases for genetic code expansion. Nat. Struct. Biol. 2003;10:425–432. doi: 10.1038/nsb934. [DOI] [PubMed] [Google Scholar]
- 99.Shen N., Guo L., Yang B., Jin Y., Ding J. Structure of human tryptophanyl-tRNA synthetase in complex with tRNATrp reveals the molecular basis of tRNA recognition and specificity. Nucleic Acids Res. 2006;34:3246–3258. doi: 10.1093/nar/gkl441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100.Wehbi S., Wheeler A., Morel B., Manepalli N., Minh B.Q., Lauretta D.S., Masel J. Order of amino acid recruitment into the genetic code resolved by last universal common ancestor’s protein domains. Proc. Natl. Acad. Sci. USA. 2024;121:e2410311121. doi: 10.1073/pnas.2410311121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Sun F.J., Caetano-Anolles G. Transfer RNA and the origins of diversified life. Sci. Prog. 2008;91:265–284. doi: 10.3184/003685008X360650. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Hauenstein S.I., Perona J.J. Redundant synthesis of cysteinyl-tRNACys in Methanosarcina mazei. J. Biol. Chem. 2008;283:22007–22017. doi: 10.1074/jbc.M801839200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Tumbula-Hansen D., Feng L., Toogood H., Stetter K.O., Soll D. Evolutionary divergence of the archaeal aspartyl-tRNA synthetases into discriminating and nondiscriminating forms. J. Biol. Chem. 2002;277:37184–37190. doi: 10.1074/jbc.M204767200. [DOI] [PubMed] [Google Scholar]
- 104.Feng L., Stathopoulos C., Ahel I., Mitra A., Tumbula-Hansen D., Hartsch T., Soll D. Aminoacyl-tRNA formation in the extreme thermophile Thermus thermophilus. Extremophiles. 2002;6:167–174. doi: 10.1007/s007920100245. [DOI] [PubMed] [Google Scholar]
- 105.Ikehara K. Pseudo-replication of [GADV]-proteins and origin of life. Int. J. Mol. Sci. 2009;10:1525–1537. doi: 10.3390/ijms10041525. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106.Oba T., Fukushima J., Maruyama M., Iwamoto R., Ikehara K. Catalytic activities of [GADV]-peptides. Formation and establishment of [GADV]-protein world for the emergence of life. Orig. Life Evol. Biosph. 2005;35:447–460. doi: 10.1007/s11084-005-3519-5. [DOI] [PubMed] [Google Scholar]
- 107.Liras P., Martin J.F. Interconnected Set of Enzymes Provide Lysine Biosynthetic Intermediates and Ornithine Derivatives as Key Precursors for the Biosynthesis of Bioactive Secondary Metabolites. Antibiotics. 2023;12:159. doi: 10.3390/antibiotics12010159. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108.Fazius F., Zaehle C., Brock M. Lysine biosynthesis in microbes: Relevance as drug target and prospects for beta-lactam antibiotics production. Appl. Microbiol. Biotechnol. 2013;97:3763–3772. doi: 10.1007/s00253-013-4805-1. [DOI] [PubMed] [Google Scholar]
- 109.Ouchi T., Tomita T., Horie A., Yoshida A., Takahashi K., Nishida H., Lassak K., Taka H., Mineki R., Fujimura T., et al. Lysine and arginine biosyntheses mediated by a common carrier protein in Sulfolobus. Nat. Chem. Biol. 2013;9:277–283. doi: 10.1038/nchembio.1200. [DOI] [PubMed] [Google Scholar]
- 110.Nishida H., Nishiyama M. Evolution of lysine biosynthesis in the phylum deinococcus-thermus. Int. J. Evol. Biol. 2012;2012:745931. doi: 10.1155/2012/745931. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111.Miyazaki J., Kobashi N., Nishiyama M., Yamane H. Functional and evolutionary relationship between arginine biosynthesis and prokaryotic lysine biosynthesis through alpha-aminoadipate. J. Bacteriol. 2001;183:5067–5073. doi: 10.1128/JB.183.17.5067-5073.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112.Kosuge T., Hoshino T. Lysine is synthesized through the alpha-aminoadipate pathway in Thermus thermophilus. FEMS Microbiol. Lett. 1998;169:361–367. doi: 10.1111/j.1574-6968.1998.tb13341.x. [DOI] [PubMed] [Google Scholar]
- 113.Burroughs A.M., Aravind L. The Origin and Evolution of Release Factors: Implications for Translation Termination, Ribosome Rescue, and Quality Control Pathways. Int. J. Mol. Sci. 2019;20:1981. doi: 10.3390/ijms20081981. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Wolf Y.I., Aravind L., Grishin N.V., Koonin E.V. Evolution of aminoacyl-tRNA synthetases—Analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res. 1999;9:689–710. doi: 10.1101/gr.9.8.689. [DOI] [PubMed] [Google Scholar]
- 115.Kaiser F., Krautwurst S., Salentin S., Haupt V.J., Leberecht C., Bittrich S., Labudde D., Schroeder M. The structural basis of the genetic code: Amino acid recognition by aminoacyl-tRNA synthetases. Sci. Rep. 2020;10:12647. doi: 10.1038/s41598-020-69100-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.O’Donoghue P., Luthey-Schulten Z. On the evolution of structure in aminoacyl-tRNA synthetases. Microbiol. Mol. Biol. Rev. 2003;67:550–573. doi: 10.1128/MMBR.67.4.550-573.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Tennakoon R., Cui H. Aminoacyl-tRNA synthetases. Curr. Biol. 2024;34:R884–R888. doi: 10.1016/j.cub.2024.08.029. [DOI] [PubMed] [Google Scholar]
- 118.Giege R. The early history of tRNA recognition by aminoacyl-tRNA synthetases. J. Biosci. 2006;31:477–488. doi: 10.1007/BF02705187. [DOI] [PubMed] [Google Scholar]
- 119.Kelley L.A., Mezulis S., Yates C.M., Wass M.N., Sternberg M.J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 2015;10:845–858. doi: 10.1038/nprot.2015.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120.Burton Z.F. The 3-Minihelix tRNA Evolution Theorem. J. Mol. Evol. 2020;88:234–242. doi: 10.1007/s00239-020-09928-2. [DOI] [PubMed] [Google Scholar]
Data Availability Statement
No new data were created or analyzed in this study.