Nucleic Acids Research. 2024 Nov 4;52(21):13305–13324. doi: 10.1093/nar/gkae992

A genomic database furnishes minimal functional glycyl-tRNA synthetases homologous to other, designed class II urzymes

Sourav Kumar Patra 1, Jordan Douglas 2,3, Peter R Wills 4, Laurie Betts 5, Tang Guo Qing 6, Charles W Carter Jr 7
PMCID: PMC11602164  PMID: 39494520

Abstract

The hypothesis that conserved core catalytic sites could represent ancestral aminoacyl-tRNA synthetases (AARS) drove the design of functional TrpRS, LeuRS, and HisRS ‘urzymes’. We describe here new urzymes detected in the genomic record of the arctic fox, Vulpes lagopus. They are homologous to the α-subunit of bacterial heterotetrameric Class II glycyl-tRNA synthetase (GlyRS-B) enzymes. AlphaFold2 predicted that the N-terminal 81 amino acids would adopt a 3D structure nearly identical to our designed HisRS urzyme (HisCA1). We expressed and purified that N-terminal segment and the spliced open reading frame GlyCA1–2. Both exhibit robust single-turnover burst sizes and ATP consumption rates higher than those previously published for HisCA urzymes and comparable to those for LeuAC and TrpAC. GlyCA is more than twice as active in glycine activation by adenosine triphosphate as the full-length GlyRS-B α2 dimer. Michaelis–Menten rate constants for all three substrates reveal significant coupling between Exon2 and both substrates. GlyCA activation favors Class II amino acids that complement those favored by HisCA and LeuAC. Structural features help explain these results. These minimalist GlyRS catalysts are thus homologous to previously described urzymes. Their properties reinforce the notion that urzymes may have the requisite catalytic activities to implement a reduced, ancestral genetic coding alphabet.

Graphical Abstract

Introduction

Aminoacyl-tRNA synthetases (AARS) are nanomachines that translate genes into proteins (1). AARS and their cognate tRNAs store all of the stereochemical information necessary to convert the codon strings in genes into the alphabet of proteins. AARS function like computational AND gates. They synthesize code-specific aminoacyl-tRNA molecules if and only if they bind simultaneously to the correct amino acid and the correct tRNA (2). In that sense, they are ‘code-keys’ kept by Nature's locksmith. Nature selected them on the basis of their catalysis and specificity. We need experimental models for ancestral AARS•tRNA cognate pairs in order to measure these aspects of their enzymology. Only then can we propose and compare evolutionary routes to the coding table. That evolutionary history, in turn, is key to how AARS first learned to enforce the coding rules by which they assembled themselves. That reflexivity is the main challenge in working out how genetic coding began (3–5).

AARS urzymes (6–12) and protozymes (13) are the most extensively studied such models. The former are minimal excerpts of AARS catalytic domains that retain the full range of their catalytic properties, albeit at diminished rates and with substantial substrate promiscuity. The latter are a surprisingly active subset of the urzymes that accelerate amino acid activation by roughly a million-fold, despite having only 46 amino acid residues. Ribozymes have been described that can either activate amino acids (13,14), or acylate tRNAs with pre-activated amino acids (15,16). Unlike these ribozymes, AARS urzymes catalyze both amino acid activation and tRNA aminoacylation. Their catalytic rate accelerations exceed those of ribozymes by orders of magnitude. These properties, and the sequence conservation shared by all 10 from each Class, discussed next, give AARS urzymes special biological relevance as models for ancestral AARS.

The AARS also exist in two Classes that discriminate appropriately between amino acids from their own class (2,9,17,18). Class I active-site domains are based on the Rossmann dinucleotide binding fold (19) with parallel β-structure. In contrast, Class II AARS are built around an extended array of antiparallel β-strands. Thus, the two Classes are structurally unrelated. Nonetheless, statistical studies of the coding sequences by Rodin and Ohno (20,21) showed that the DNA sequences of Class-I defining HIGH and KMSKS active-site signatures had extraordinarily unlikely base pairing in antiparallel alignments with the Class-II defining motifs 2 and 1, respectively. They proposed on that basis that ancestral forms of the two AARS Classes originated on opposite strands of the same gene. We note that the inverted order of the complementary Class-defining signatures also implies an antiparallel alignment of the genes for the two Classes.

Previously studied urzymes were all engineered for the purpose of testing the Rodin-Ohno (20) hypothesis of bidirectional ancestry for Class I and II AARS. We constructed them by 3D superposition of known crystal structures, eliminating segments that were not highly conserved across the entire superfamily (12) and fusing the disparate pieces together with single peptide bonds. Thus, they were driven by hypothesis, not by observations from natural sources.

We recently identified a sequence in GenBank whose 3D structure predicted by AlphaFold2 (22) closely resembles a Class II urzyme. This curious putative protein is a truncation of the α chain of the bacterial ‘orphan’ glycyl-tRNA synthetase (GlyRS-B) (23). It is purported to reside in an Arctic Fox (Vulpes lagopus) genome. However, it is more likely a bacterial contaminant. Only the gene for the α-subunit is present.

GlyRS-B is an α2β2 tetrameric Class II AARS found in most bacteria (1,24) and is most closely related to AlaRS (25,26). It is distinct from the homodimeric GlyRS-A found in archaea, eukaryotes and some bacteria (3,27). GlyRS-B usually occurs as an α2β2 heterotetramer, with glycine activation being performed by the short α chains. Interestingly, in some bacteria and chloroplasts GlyRS-B is expressed as a fusion of the two chains (αβ)2 and the protein operates as a homodimer (28). tRNA recognition by GlyRS has been characterized structurally for both oligomerization modes α2β2 (29,30) and (αβ)2 (31). Only the α-chains resemble other Class II AARS in any way as they alone contain the three Class-defining motifs, 1–3 defined by Eriani (32). The α-subunit of the eubacterial human pathogen Campylobacter jejuni has been crystallized as an α2 dimer and characterized by steady-state kinetics (33).

The GlyRS-B α gene is from a distinct clade of the AARS phylogeny (3) compared with our previous urzymes, which were from (Class II) HisRS (10) and (Class I) LeuRS (7,34) and TrpRS (11,12). The former fragment contains only motifs 1 and 2, previously shown sufficient for the HisRS1 and 2 urzymes (9,10) to aminoacylate tRNAHis.

For consistency with our recent nomenclature for Class I and II urzymes (7), we will henceforth refer to the first 81 amino acids of ORF1 as GlyCA because it contains the coded information for Motifs 1 and 2 in the reverse order to those for the HIGH and KMSKS motifs in TrpAC and LeuAC (see Supplementary Figure S1A). There is no convenient corresponding designation for the product of splicing suggested by the adventitious database annotation, so we will refer to it as GlyCA1–2 to indicate the implied spliced gene containing exon 2. GlyCA1–2 differs from the intact α-gene because it is missing much of the antiparallel β-insertion domain as well as motif 3 (see Figure 1 and Section 3.1 for details). We retain the term ‘Exon’ when referring explicitly to its contribution to various rates and stoichiometry.

Figure 1.

Architecture of the ORFs from the V. lagopus GlyRS-B α-subunit. A. Schematic of unique secondary structures in heterotetrameric α2β2 GlyRS-B α-subunits. This figure shows a multiple structure alignment between the GlyCA AlphaFold structure and the alpha chains of three solved GlyRS-B structures (PDB codes indicated). Alpha helices are denoted by cylinders, beta strands by arrows, and white space denotes a gap in the alignment. The alignment was generated by 3DCOMB (48). Primary structures of motifs 1, 2 and 3 that characterize Class II AARSs are shown above. The C-terminal alpha helix bundle is unique to GlyRS-B synthetases and forms the dimer interface. B. Two inserted purine bases (vertical arrows) create frameshifting (tan) and an internal stop codon that produce two ORFs. ORF1 ends with a stop codon C-terminal to the red frameshifted sequence. These differences change the modularity of the V. lagopus GlyRS α-chain. C. AlphaFold2 prediction for the 3D structures of both ORFs. Modules formed by the residue numbers indicated are exploded around the structure of ORFs 1 and 2. Colors are those used in B. The AlphaFold2 prediction closely matches the structure observed in PDB ID 7YSE. That allows visualization of likely binding geometries for glycine-5′ sulfamoyl adenylate (GSA, spheres) and the 3′-terminal adenosine of tRNAGly (A76, dots). The annotated intron coding sequence is frame-shifted and corresponds to residues 133–210 (green) at the lower right. Motif 3 and an internal helix-turn-helix motif that covers the terminal adenosine in HisCA2 (10) are thus missing in GlyCA. Residues 88–119 were omitted from exon 1, leaving it with only 81 residues. (Created using PyMOL (49).)

We compare GlyCA and GlyCA1–2 in detail in this report. The comparison is of special interest because it further confirms the generality of our previous work. In this article, we demonstrate that the GlyCA urzyme shows aminoacylation activity. Much like our previous urzymes, GlyCA is highly promiscuous, favoring amino acids activated by its own AARS Class (Class II).

These constructs are urzymes for the fourth amino acid to be characterized. The four represent a balanced sample of Classes I and II AARS. Their properties argue that they do resemble the ancestral assignment catalysts that enabled nature to learn to read the genetic coding language. Both sets have extensive structural homology. All exhibit nearly the same amino acid activation and aminoacylation rates. Moreover, they have complementary amino acid specificities. These extensive similarities suggest that similar minimalist catalysts can be constructed readily from the remaining 16 AARS, and that they will behave in comparable ways to those we have already described.

Materials and methods

The GlyCA urzyme sequence

The GlyCA sequence was extracted from GenBank: gene number 121489478 (Supplementary Figure S1; (35)). We expressed both residues 6–87 of ORF1 and the fusion of ORFs 1 and 2. Both have comparable activities, although, as noted below, the presence of Exon2 has numerous subtle influences on the relative activities of glycine activation and interactions with tRNAGly.

Cell lysis and purification of GlyCA and GlyCA1–2

Plasmids pMAL-c2X containing the GlyCA and GlyCA1–2 genes were synthesized by Twist Bioscience and expressed in BL21 (DE3)pLysS competent cells (Promega). Cells were grown at 37°C, and early log phase cells were induced with 300 μM IPTG overnight. Harvested cells were resuspended in a buffer containing 20 mM Tris, pH 7.4, 1 mM EDTA, 5 mM β-ME, 17.5% glycerol, 0.1% NP40, 33 mM (NH4)2SO4, 1.25% glycine, 300 mM guanidine hydrochloride plus cOmplete protease inhibitor (Roche). The cell suspension was lysed using a glass homogenizer followed by sonication (8 pulses of 10 s at 70% amplitude, with 20 s pauses between pulses), keeping the tube on ice throughout. The GlyCA crude extract was then centrifuged at 15 000 rpm for 30 min at 4°C to pellet insoluble material. The supernatant was then diluted 1:4 with lysis buffer and loaded onto equilibrated Amylose FF resin (Cytiva). The resin was washed with five column volumes of buffer and the protein was eluted with 30 mM maltose in optimal buffer. The purified fraction was then dialyzed overnight against 50 mM HEPES buffer containing 1 mM EDTA, 5 mM β-mercaptoethanol and 17.5% glycerol. After dialysis, fractions containing protein were pooled, concentrated, mixed with 50% glycerol and stored at −80°C. Protein concentrations were determined using the Pierce™ Detergent-Compatible Bradford Assay Kit (Thermo Scientific) (7).

We used only an Amylose affinity column, rather than the combination of Amylose and Nickel-NTA affinity chromatography, to avoid losses of the active fractions. The Amylose column gives much purer samples than Ni-NTA columns. We also found single-band activity in zymography gels, which suggested that no other interfering proteins were present in our samples. We continue to explore the combined use of the two affinity methods in ongoing work aimed at structural studies.

Label free proteomics; LC-ESI-MS/MS analysis

The protein lysates were first lyophilized and concentrated then dissolved in 50 mM sodium acetate solution. Urea (8M) was added to the 150 μg in-solution protein sample, then reduced with 5 mM DTT for 30 min and alkylated with 15 mM iodoacetamide for 45 min. The samples were diluted to 1M urea, then digested with MS grade trypsin (Promega) at 37°C overnight. Peptides were desalted with peptide desalting spin columns (Thermo) and dried via vacuum centrifugation. Each sample was analyzed in duplicate by LC-MS/MS using Easy nLC 1200 coupled to a QExactive HF (Thermo Scientific). Data analysis was done by Proteome Discoverer version 2.5 (Thermo Scientific).

Peptides were separated on an Easy Spray PepMap C18 column (Thermo Scientific) over a 120 min method. The gradient for separation was from 5 to 36% mobile phase B at a 250 nl/min flow rate, where mobile phase A was 0.1% formic acid in water and mobile phase B consisted of 80% acetonitrile, 0.1% formic acid. The QExactive HF was operated in data-dependent mode where the 15 most intense precursors were selected for subsequent HCD fragmentation. Resolution for the precursor scan (m/z 350–1700) was set to 60 000 with a target value of 3 × 10⁶ ions and 100 ms inject time. MS/MS scan resolution was set to 15 000 with a target value of 1 × 10⁵ ions and 75 ms inject time. The normalized collision energy was set to 27% for HCD, with an isolation window of 1.6 m/z. Peptide match was set to preferred, and precursors with unknown charge or a charge state of 1 or ≥8 were excluded.

Peptides detected by LC-ESI-MS/MS were matched with the Escherichia coli database downloaded from Uniprot in FASTA format along with the GlyCA protein sequence. The following parameters were used to identify tryptic peptides: 10 ppm precursor ion mass tolerance; 0.02 Da product ion mass tolerance; up to two missed trypsin cleavage sites; (C) carbamidomethylation was set as a fixed modification; (M) oxidation was set as a variable modification. Peptide false discovery rates (FDR) were calculated by the percolator node using a decoy database search and peptides were filtered using a 1% FDR cutoff (36).
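The two mass tolerances quoted above work on different scales: the precursor tolerance is relative (ppm), whereas the product-ion tolerance is absolute (Da). The following Python sketch illustrates that distinction only. The function names and the example m/z values are hypothetical, and the code is not part of Proteome Discoverer.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million (a relative measure)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Precursor check: relative tolerance, 10 ppm as in the search settings."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def product_within_tolerance(observed_mz, theoretical_mz, tol_da=0.02):
    """Product-ion check: absolute tolerance, 0.02 Da as in the search settings."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# Hypothetical m/z values, chosen only to illustrate the two scales.
print(precursor_within_tolerance(1000.005, 1000.0))  # about 5 ppm off
print(product_within_tolerance(500.05, 500.0))       # 0.05 Da off
```

A relative tolerance widens in absolute terms as m/z increases, which is why it is the usual choice for high-resolution precursor scans.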

Single turnover active-site titration assay of GlyCA

Active-site titration (AST) assays were performed using the same principle as described in Francklyn et al. and Fersht et al. (37,38) with slight modifications. Briefly, 3 μM of GlyCA or GlyCA1–2 protein was added to a reaction mix containing 50 mM HEPES, pH 7.5, 10 mM MgCl2, 5 μM adenosine triphosphate (ATP), 50 mM glycine, 1 mM DTT, 0.005 units of inorganic pyrophosphatase and ∼5000 cpm of [α-32P]ATP to start the reaction. At each timepoint, 2 μL of the reaction was transferred to a separate tube containing 4 μL quenching buffer (0.4 M sodium acetate, 0.1% sodium dodecyl sulphate (SDS)) and kept on ice until all timepoints had been collected. Around 3 μL of each quenched sample was spotted on pre-run polyethyleneimine (PEI) thin-layer chromatography (TLC) plates and run in TLC running buffer containing 850 mM Tris, pH 8.0. The plate was then dried, exposed for varying amounts of time to a phosphor image screen and visualized with a Typhoon Scanner (Cytiva). The intensities of each nucleotide were quantified by densitometry using the measure functions of ImageJ (39). The time-dependences of loss (ATP) or de novo appearance (ADP, AMP) of the three adenine nucleotide phosphates were fitted using the nonlinear regression module of JMP™ Pro to Equation (1):

$$y(t) = A\,e^{-k_{chem}t} + k_{3}t + C \qquad (1)$$

where y(t) is the measured nucleotide fraction at time t, kchem is the first-order rate constant, k3 is the turnover rate, A is the amplitude of the first-order process and C is an offset (37).
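The roles of the four parameters are easiest to see by evaluating the curve. The Python sketch below assumes that Equation (1) is a single exponential of amplitude A and rate kchem, plus a linear turnover term k3·t and an offset C. That form is inferred from the parameter definitions above, and every numerical value is hypothetical.

```python
import math


def ast_curve(t, amplitude, k_chem, k3, offset):
    """Assumed Equation (1): exponential burst plus linear turnover plus offset."""
    return amplitude * math.exp(-k_chem * t) + k3 * t + offset


# Hypothetical parameters for the fraction of ATP remaining (time in minutes).
params = dict(amplitude=0.3, k_chem=2.0, k3=-0.01, offset=0.7)

for t in (0.0, 0.5, 1.0, 5.0, 10.0):
    print(f"t = {t:5.1f} min  ATP fraction = {ast_curve(t, **params):.3f}")
```

At long times the exponential term vanishes and the curve approaches the line k3·t + C, so under this assumed form the amplitude A measures the burst size and k3 the subsequent turnover.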

Zymography

To visualize amino acid activation chromogenically in a native polyacrylamide gel (PAGE), zymography was done using a 1.5 mm thick native gel (8% resolving, 5% stacking) devoid of SDS. Protein samples were prepared without SDS or β-ME to maintain the native conformation of the protein (40). Prepared protein (50 μg) was then loaded into two separate wells of the native gel. Electrophoresis was done at a steady current of 40 mA at 4°C and carefully observed. When the dye front reached the bottom of the gel, electrophoresis was continued for a further 30 min. The gel was removed from the glass plates and placed in a glass box. The gel was then washed with double-distilled water for 5 min with shaking, and the wash was repeated twice.

After the washing step, substrate reaction buffer was poured into the glass box containing the gel, and the setup was kept on a shaker for 45 min at 4°C. The buffer contained 50 mM HEPES, pH 7.5, 100 mM glycine, 20 mM MgCl2, 50 mM KCl, pyrophosphatase solution [NEB] (0.1 Unit/ml) and polyethylene glycol (PEG-8000; Sigma-Aldrich, Cat. No. 25322–68–3) at a final concentration of 5% (w/v). This step ensures complete soaking of the substrate mixture into the gel; the low temperature reduces inactivation of the enzyme during the shaken perfusion.

After the perfusion, most of the solution was decanted, leaving minimal solution in a static condition at 37°C. Amino acid activation was initiated by adding 5 mM ATP solution dropwise onto the gel surface to cover the whole gel. The gel was incubated in this condition for 30 min. After decanting the reaction mix from the gel box, the staining solution (0.05% Malachite Green (MG) in 0.1 N HCl and 5% hexa-ammonium heptamolybdate tetrahydrate solution in 4 N HCl) was added directly to the gel box. Staining was promoted by shaking for 2 min on a gyro-shaker.

The staining solution was made as described in Onodera et al. (14). The gradual development of green bands (620 nm) around the GlyCA protein in the gel signifies formation of the phosphomolybdate-MG complex and thus the in situ amino acid activation activity of GlyCA. The gel was photographed using a Gel Doc™ XR+ imaging system (Bio-Rad).

General considerations arising from enzymological analysis of urzymes

AARS urzymes are generally about 100 000 times less active than full-length AARS. That difference has important consequences for the conduct of activity assays. Product formation is proportional to the product of (Enz) and time. Statistically significant measurements of product formation thus require similar increases in the former, in the latter, or in the sensitivity of the assay. In practice, we have typically used increases in both (Enz) and time. This problem is most severe with aminoacylation assays of the two constructs described here. It is compounded by the low percentage of aminoacylability of tRNAGly. The concentration dependence of aminoacylation requires enzyme-to-substrate ratios high enough that we cannot accurately measure initial rates of aminoacylation. This is the case for the two GlyRS urzyme constructs and accounts for the low signal-to-noise ratio of the resulting steady-state parameters, which are thus somewhat qualitative.

This problem is not unique to these studies. It arises at the frontier of what is possible. An especially salient example is the difficulty of measuring the high affinity of GTPase enzymes for both guanine nucleotides, which took decades to resolve satisfactorily (41). The early, good-faith efforts to characterize their enzymology nevertheless allowed development of coherent, essentially correct models for their role in signaling (42).

We have addressed this problem in several ways. First, we assume that the resulting systematic errors are similar and thus have less impact on the relative values for different constructs. Second, we take care to find the best compromise among longer assay times, increased specific radioactivity of substrates and increased enzyme concentrations to limit experimental noise and produce sufficient signal to noise. Third, we rely on the variance of replicated data to assess significance. Finally, regression methods allow us to identify consistent, physically plausible ‘predictors’ responsible for significant differences between multiple measurements for different factorial combinations.
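The scaling argument at the start of this section can be made concrete with a short calculation. The Python sketch below assumes saturating substrate and initial-rate conditions, so that product formed equals kcat × (Enz) × time. The rate constants, concentrations and times are hypothetical and serve only to illustrate a 100 000-fold activity difference.

```python
def product_formed(k_cat, enzyme_conc, time_s):
    """Product (M) formed at saturating substrate: k_cat (1/s) * [Enz] (M) * time (s)."""
    return k_cat * enzyme_conc * time_s


# Hypothetical full-length enzyme: 10 nM assayed for 1 min.
full_length = product_formed(k_cat=10.0, enzyme_conc=1e-8, time_s=60.0)

# Hypothetical urzyme, 100 000-fold slower: 1000x more enzyme and 100x more time.
urzyme = product_formed(k_cat=10.0 / 1e5, enzyme_conc=1e-5, time_s=6000.0)

print(full_length, urzyme)
```

In this illustration, recovering the same amount of product requires micromolar enzyme and an assay lasting more than an hour, which is why enzyme-to-substrate ratios become high in the aminoacylation assays.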

Michaelis–Menten kinetics of amino acid activation

Michaelis–Menten assays of amino acid activation for individual amino acids were done using a reaction mixture of 50 mM HEPES, pH 7.0, 20 mM MgCl2, 50 mM KCl, 0.5 mM ATP, 0.1 Unit of inorganic pyrophosphatase [NEB], 7 μM of either GlyCA or GlyCA1–2 enzyme and varied concentrations of a particular amino acid in separate tubes, in a total reaction volume of 100 μL for each tube. The reaction was carried out at 37°C for 1 h along with an enzyme blank in a separate tube. After this, 400 μL of MG-ammonium molybdate solution was added to the reaction tubes, mixed thoroughly and left for 5 min to develop the phosphomolybdate complex. Around 40 μL of sodium citrate (w/v) was then added to the tubes, the solutions were allowed to stand for 20 min and the optical absorbance at 620 nm was measured with a DU800 spectrophotometer (Beckman Coulter Inc., Brea, CA, USA). The MG-ammonium molybdate solution was prepared as described by Onodera et al. (14). The phosphoric acid concentration was calculated from a calibration curve prepared using 0–250 μM K2HPO4, which relates phosphoric acid concentration to absorbance. The specific activity of GlyCA at each amino acid concentration was calculated and fitted by nonlinear regression to the modified Michaelis–Menten equation introduced by Johnson (43) (Equation (2)), as implemented in JMP™ Pro software. The KM and kcat of GlyCA for individual amino acids were determined, with standard deviations, from the maximum likelihood fit.

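$$\frac{v}{[E]} = \frac{k_{sp}[S]}{1 + k_{sp}[S]/k_{cat}} \qquad (2)$$

where v/[E] is the specific activity, [S] is the concentration of the varied substrate and ksp = kcat/KM.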
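The relationship between the specificity-constant form and the familiar Michaelis–Menten form can be checked numerically. The Python sketch below assumes that Equation (2) is the form published by Johnson, v/[E] = ksp[S]/(1 + ksp[S]/kcat). The function names and the constants are hypothetical.

```python
def rate_from_ksp(s, k_sp, k_cat):
    """v/[E] in the specificity-constant form assumed for Equation (2)."""
    return k_sp * s / (1.0 + k_sp * s / k_cat)


def rate_classic(s, k_cat, k_m):
    """v/[E] in the familiar form k_cat*[S] / (K_M + [S])."""
    return k_cat * s / (k_m + s)


# Hypothetical constants: k_cat in 1/s, K_M in mM, so k_sp is in 1/(mM*s).
k_cat, k_m = 2.0, 5.0
k_sp = k_cat / k_m

for s in (0.5, 5.0, 50.0):
    print(s, rate_from_ksp(s, k_sp, k_cat), rate_classic(s, k_cat, k_m))
```

The two forms give identical rates. The practical difference is that ksp and kcat are the fitted parameters, so the fit reports an uncertainty for kcat/KM directly instead of requiring it to be propagated from kcat and KM.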

Michaelis–Menten kinetics of tRNAGly aminoacylation

A gene with the oligonucleotide sequence of tRNAGly of S. alactolyticus was synthesized by Integrated DNA Technologies and used as template for in vitro transcription using commercially available T7 RNA polymerase (HiScribe® T7 High Yield RNA Synthesis Kit, NEB). Following a 4-h transcription at 37°C, the tRNA was extracted and purified by phenol-chloroform-isoamyl alcohol extraction, filter concentrated, aliquoted and stored at –20°C. The tRNA was labeled using [α-32P]ATP and CCA-adding enzymes as described by Hobson et al. (7). Aminoacylations were performed in 50 mM HEPES, pH 7.5, 10 mM MgCl2, 20 mM KCl and 5 mM DTT with indicated amounts of ATP and amino acids. Desired amounts of unlabeled tRNA mixed with [32P] A76-labeled tRNA for assays by GlyCA and GlyCA1–2 were heated in 37 mM HEPES, pH 7.5, 30 mM KCl to 90°C for 2 min. The tRNA was then cooled linearly (1°C per 30 s) until it reached 80°C, when MgCl2 was added to a final concentration of 10 mM. The tRNA continued to cool linearly until it reached 20°C. The desired amount of refolded tRNA was mixed with buffer and a zero timepoint collected prior to initiation of the reaction by addition of the enzyme to 15 μM. Ten timepoints were quenched by adding them to a solution of 0.4 M sodium acetate, pH 5.2, 6.25 mM Zn acetate and 10 U P1 nuclease and stored on ice until all timepoints had been collected. Quenched samples were incubated at 37°C for 10 min to allow digestion of the tRNA by the P1 nuclease. Samples were spotted on pre-run PEI TLC plates and developed in 10% NH4Cl, 5% acetic acid. Dried TLC plates were exposed overnight to a phosphor screen and visualized on a Typhoon Scanner. Integrated densities for AMP and glycyl-AMP were processed for analysis as single turnover experiments by first computing the fraction represented by Gly-AMP. Kinetic parameters came from fitting the tRNA-dependence to Equation (2).
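The first processing step, converting the two spot densities into a fraction of Gly-AMP, can be sketched in a few lines of Python. P1 digestion releases the labeled A76 as AMP from uncharged tRNA and as Gly-AMP from charged tRNA, so the ratio of the two spots reports the fraction aminoacylated. The function names, the densities and the subtraction of the zero timepoint as background are illustrative assumptions, not details taken from the protocol.

```python
def fraction_gly_amp(gly_amp_density, amp_density):
    """Fraction of the A76 label recovered as Gly-AMP after P1 digestion."""
    total = gly_amp_density + amp_density
    if total <= 0:
        raise ValueError("no signal in either spot")
    return gly_amp_density / total


def acylated_trna(fraction, fraction_at_zero, total_trna_um):
    """Background-corrected aminoacyl-tRNA concentration (uM); correction is assumed."""
    return (fraction - fraction_at_zero) * total_trna_um


# Hypothetical integrated densities (arbitrary units) for two timepoints.
f0 = fraction_gly_amp(gly_amp_density=20.0, amp_density=980.0)
f1 = fraction_gly_amp(gly_amp_density=120.0, amp_density=880.0)
print(acylated_trna(f1, f0, total_trna_um=10.0))
```

Working with the fraction rather than the raw Gly-AMP density makes each lane self-normalizing, so differences in the volume spotted on the TLC plate cancel.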

Data processing and statistical analysis by multiple regression methods

We visualized TLC plates with phosphor imaging screens, which we digitized using ImageJ. Data were transferred to JMP™ Pro 16 via Microsoft Excel (version 16.49), after intermediate calculations. We fitted AST curves and Michaelis–Menten assays using the nonlinear regression module of JMP™ Pro 16.

Multiple regression analyses of factorial designs exploit the replication inherent in the full collection of experiments to estimate experimental variances on the basis of t-test P-values, in contrast to presenting error bars that show the variance of individual datapoints. Multiple regression analyses reported here also entail triplicate experimental replicates, which enhance the associated analysis of variance.

We analyzed relationships between dependent (ΔG values for first-order, turnover and steady-state rates, n-values) and independent (presence of substrates, protein modules) variables in factorial design matrices, e.g. Tables 1, S3 and S4, with the Fit Model multiple regression analysis module of JMP™ Pro 16, using an appropriate form of Equation (3) (44).

$$Y_{obs} = \beta_0 + \sum_i \beta_i P_i + \sum_{i<j} \beta_{ij} P_i P_j + \epsilon \qquad (3)$$

where Yobs is a dependent variable, usually an experimental observation, β0 is a constant derived from the average value of Yobs, βi and βij are coefficients to be fitted, Pi,j are independent predictor variables from the design matrix and ϵ is a residual to be minimized. All rates were converted to free energies of activation, ΔG = –RT ln(k), before regression analysis because free energies are additive, whereas rates are multiplicative. For example, the activation free energy for the first-order decay rate in single-turnover experiments is ΔG(kchem).
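A two-factor example shows why the conversion to free energies matters. The Python sketch below converts hypothetical rates to ΔG and solves a saturated two-factor version of Equation (3) exactly. It assumes predictors coded as −1 (absent) and +1 (present), a temperature of 37°C and units of kcal/mol. With replicated data the coefficients would instead come from a least-squares fit, as in the Fit Model analysis described above.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(k, temp_k=310.15):
    """Free energy equivalent of a rate, dG = -RT ln(k), in kcal/mol."""
    return -R * temp_k * math.log(k)


def two_factor_fit(y_mm, y_pm, y_mp, y_pp):
    """Coefficients of Y = b0 + b1*P1 + b2*P2 + b12*P1*P2 for P1, P2 coded -1/+1.

    Arguments are responses at (P1, P2) = (-1,-1), (+1,-1), (-1,+1), (+1,+1).
    """
    b0 = (y_mm + y_pm + y_mp + y_pp) / 4.0
    b1 = (-y_mm + y_pm - y_mp + y_pp) / 4.0
    b2 = (-y_mm - y_pm + y_mp + y_pp) / 4.0
    b12 = (y_mm - y_pm - y_mp + y_pp) / 4.0
    return b0, b1, b2, b12


# Hypothetical rates: each factor multiplies the rate independently (10x and 100x).
rates = {"mm": 1e-3, "pm": 1e-2, "mp": 1e-1, "pp": 1.0}
dg = {key: delta_g(k) for key, k in rates.items()}
b0, b1, b2, b12 = two_factor_fit(dg["mm"], dg["pm"], dg["mp"], dg["pp"])
print(f"b0={b0:.3f}  b1={b1:.3f}  b2={b2:.3f}  b12={b12:.3f}")
```

Because the two hypothetical factors act multiplicatively on the rate, their effects on ΔG are additive and the interaction coefficient b12 is zero. A nonzero b12 in real data therefore indicates energetic coupling between the two predictors.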

Results

Provenance, nomenclature and likely architecture of the GlyCA urzyme

We identified the GlyRS urzyme sequence within the genome sequence for the Arctic Fox, Vulpes lagopus. We extracted the sequence from GenBank: gene number 121489478 (35). The annotation for the GenBank entry describes two open reading frames (ORFs 1, 2; see Figure 1). ORF-1 contains motifs 1 and 2 plus 30 residues from the N-terminus of the insertion domain. These are the only structures homologous to other Class II AARS. ORF-2 contains a three-helix bundle from the C-terminus of the α-subunit. The GlyRS-B α-Chains assemble into dimers through that helical bundle (33). These dimers can activate glycine even in the absence of the β subunit (45). They are not thought to have aminoacylation activity.

We recently introduced a nomenclature for Class I AARS urzymes, based on the sequential order of modules. The ATP-binding protozyme module containing the HIGH signature is N-terminal in all Class I AARS, and is denoted A. It also contains part of the amino acid binding site. A variable-length insertion element (connecting Peptide 1; CP1) follows the protozyme and contains various elements that enhance the amino acid specificity but are dispensable for both amino acid activation and aminoacylation. That insertion is segment B. The second signature, KMSKS, follows CP1 (46,47) and is segment C. The anticodon-binding domain, ABD, which is always C-terminal in Class I AARS, is segment D. We designated the LeuRS urzyme LeuAC because it contains segments A and C, but neither B nor D.

Motifs 1 and 2 in Class II urzymes have high codon middle-base pairing frequencies in the opposite order in antiparallel alignment with the protozyme (segment A) and the KMSKS signature (segment C) of Class I urzymes. The long, variable insertion in Class II is C-terminal to the protozyme and comes before motif 3 and the ABD. The order of comparable modules in Class II AARS is thus C-A-B-D. Inasmuch as the Class II ATP-binding module (motif 2) is C-terminal in Class II AARS, we will designate residues 6–87 of ORF1 as GlyCA (Figure 1A).

The tRNA binding modules are all located on the β subunit of GlyRS-B family members but are integral in all other Class II AARS and unlike any of the four domains in the β subunit. In any case, the V. lagopus genome deposition contains only a mutated α subunit without an accompanying β subunit. For this reason, we do not concern ourselves in this work either with the β-subunit or with the possible biology of the mutated α gene in the genomic database.

We selected the 81-residue GlyCA and the 229-residue GlyCA1–2 for detailed kinetic studies. We initially expressed and purified both constructs as maltose-binding protein (MBP)-fusion proteins. The entire annotated gene (GlyCA1–2, which contains part of ORF 1 and all of ORF 2) allowed us to determine the activity as represented in the genomic data. Residues 6–87 from ORF1 correspond to the motif 1 and 2 segments of the simplest HisRS urzyme (HisCA1) (10).

Several features of that construct drew our attention.

  • A BLAST search revealed that sequences of both ORFs are 99.75% identical to the corresponding sequences of the bacterial GlyRS-B from Streptococcus alactolyticus sequenced from pig gut. Thus, it resembles a typical bacterial heterotetrameric GlyRS and likely arises either via contamination or, less likely, via horizontal gene transfer from a commensal bacterium.

  • AlphaFold2 predicts a tertiary structure for the continuous coding sequences of ORFs 1 and 2 that is nearly identical to that of the catalytic center of the α-subunit in the 2.7 Å crystal structure of the E. coli α2β2 GlyRS-B (23) (PDB ID 7YSE).

  • There is no evidence of the mRNA being expressed in Arctic Fox tissues, and the splice site boundaries are computational predictions (35).

  • The premature stop codon is C-terminal to motif 2, but N-terminal to the insertion module preceding motif 3. It therefore includes the sequences homologous to the Class II HisCA1 (10). We elaborate on this point in the 'Discussion' section.

  • The second ORF covers the entire C-terminal α-helical domain that forms the interface between the two α-subunits but is missing the insertion domain and motif 3. GlyCA lacks the C-terminal three-helix bundle, so is expected to be monomeric.

  • There is no sequence in the V. lagopus genome for a corresponding β-subunit, a highly idiosyncratic protein unlike any other AARS subunit. The β-subunit provides a variety of RNA binding domains and is normally required for aminoacylation.

GlyCA and GlyCA1–2 are stable, soluble and functional amino acid activating enzymes

As models for early stages in the evolution of genetic coding, AARS urzymes are unique catalysts. A key distinction of the urzymes is that their catalytic activities are reduced by roughly five orders of magnitude from those of their putative descendant full-length AARS. The essential task is to prove that their evident catalysis is real. We address this question here with two new approaches proposed by one of us (SKP; (i) (ii) below), together with all of our previous approaches.

We purified both constructs as MBP fusion proteins and characterized them directly, without removal of the tag by TEV cleavage. The taxing protein chemistry of other AARS urzymes is greatly simplified because the tag makes them much more soluble than the urzymes themselves. It is far more practical to keep concentrated storage stocks of fusion protein. TEV cleavage before assaying enhances activity to variable extents, depending on the AARS from which it is derived. Thus, the robust catalysis documented below may underestimate the actual kinetic parameters.

Both purified fusion proteins are, for all intents and purposes, the exclusive sources of observed catalytic activity. We supplement the methods used previously to assure the authenticity of the observed catalysis with two new measurements ((i) and (ii) below) that provide new, direct, physical evidence for claims that previously had only indirect support. These additional data make the strongest case yet for authenticity.

i. LC-ESI-MS/MS analysis (Figure 2A) showed that GlyCA represented about 70% of the mass. The remaining proteins identified by the proteomics algorithm Proteome Discoverer (50) included numerous contaminants present at <3%. These contaminants included neither AARS nor other enzymes, such as kinases, ATPases or pyrophosphatases, that would generate either orthophosphate in the MG assay (14) or AMP in ASTs.

Figure 2.


Purity and authentication of GlyCA and GlyCA1–2 catalysis. A. Mass spectrometry of purified GlyCA and GlyCA1–2. Liquid chromatography electrospray ionization mass spectrometry was used to identify individual components of purified GlyCA and GlyCA1–2. Detailed summaries of contaminants are given in Supplementary Tables S1 and S2 and are discussed in Supplementary §B. GlyCA is about 2/3 and GlyCA1–2 about 3/4 of the respective purified samples. Contaminants in the two samples are individually broadly distributed among different metabolic pathways. None is capable of producing orthophosphate in the presence of amino acid and ATP. B, C. Comparisons of a Coomassie stained native gel (right) to a zymogram visualized in situ using absorption of the MG orthophosphate complex by GlyCA (B) and GlyCA1–2 (C) (52).

The identities of the prominent contaminants are given in Supplementary Tables S1, S2 and summarized in Supplementary section B. Note in particular that the relative concentrations of contaminants are so low that they remain beneath the detection limits of the Coomassie staining of native gels in Figure 2B, C. The sole exception is GroEL in the GlyCA1–2 sample, which is present at 9%. GroEL does consume ATP but is neither glycine- nor tRNAGly-dependent (see §3.5 and §4.4 below).

ii. A new zymographic technique ((51) Figure 2B,C) confirmed that conclusion. Zymography provides a way to visualize enzymatic activity by staining native PAGE gels. Separation of purified catalysts on native gels allowed us to use MG staining to visualize AARS activity. GlyCA and GlyCA1–2 are the only catalysts contributing to the achromatic green bands in the zymogram gels. This result confirms visually that the activity resides in the dominant molecular species with the molecular mass of the fusion protein.

The visible activity band corresponds to the major band in the Coomassie stained native PAGE gels. That clearly indicates that only GlyCA and GlyCA1–2 are active in the corresponding samples during amino acid activation assays using MG, even though other proteins are present in the sample. This is the first use of zymography to demonstrate amino acid activation by an AARS urzyme. It is hard to overstate the importance of this novel evidence for authenticity. There is a second band of activity in the GlyCA1–2 sample, suggesting limited proteolysis. Both zymogram and Coomassie stained gels in panel C have similar relative intensities for the two bands. For that reason, we do not address this question further here.

iii. Single turnover kinetics (Figure 3A) show high burst sizes (Figure 3C, D). AST assays (Supplementary Table S3) show that 0.85 ± 0.12 (Figure 3C) of the GlyCA molecules and 0.68 ± 0.13 (Figure 3D) of GlyCA1–2 molecules contribute to ATP consumption. These values refer to the relative purity of the samples determined by proteomics (Figure 2A). Previous studies of AARS urzymes relied heavily on AST (single turnover assays that estimate burst sizes) to demonstrate that observed catalytic activities originated from the principal component in purified samples (7,10,11,34).

Figure 3.


Single-turnover kinetic time-courses for ATP consumption, ADP production and AMP production. A, B. Autoradiograms of a single representative sample of AST assays. Data points and nonlinear regression fits for single turnover triplicate AST assay of GlyCA (C) and GlyCA1–2 (D), showing fitted curves and fitting parameters for Equation 1 from densitometry of TLC plate autoradiographs. Use of 32Pα-ATP allows visualization of all three adenine nucleotides and thus the distribution of the two products, ADP and AMP. Fitted first-order (kchem), steady-state (k3), burst (C) and offset (A) values, together with n-values for the burst sizes are given in Supplementary Table S3.

These data are illustrated in Supplementary Figure S3 and discussed in detail in Supplementary Section D. Together with the Michaelis–Menten data in the next section, they confirm the unique contribution made by the translated products from the adventitious V. lagopus GlyRS α gene.

Both GlyCA and GlyCA1–2 exhibit [tRNAGly]-dependent aminoacylation.

One of three replicates of tRNAGly-dependent aminoacylation by GlyCA and GlyCA1–2 is shown in Figure 4. Although it is not obvious from casual visual comparison of the autoradiograms in Figure 4A,B, the two catalysts appear to differ significantly in their apparent tRNAGly affinity. To assess that possibility, we fitted the data in Figure 5C,F to both linear and rectangular hyperbola (Johnson (43)) models. The Johnson steady state model gave a superior fit in both cases. This improvement is shown by the residuals between the two fitted models and the average observed data for each tRNAGly concentration (Figure 4C). The two curves are nearly perfectly quadratic, with systematic underestimation by the linear model at the two extremes and overestimation for the three intermediate data points. This behavior clearly justifies semi-quantitative comparisons described further in §3.5.

Figure 4.


Autoradiograms of thin-layer chromatograms showing [tRNAGly]-dependent aminoacylation activities of GlyCA (A) and GlyCA1–2 (B). tRNAGly concentrations are effective concentrations, corrected for fractional acylatability, determined separately. Molar concentrations of Gly-A76 produced at each concentration and used for subsequent kinetic analysis were determined from the ratio of the integrated intensities of Gly-A76 to the total of Gly-A76 plus that for A76 multiplied by the [tRNAGly] at each concentration. (C) Comparison between linear and steady-state kinetic fits to the data derived from densitometry and shown in Figure 5C,F.

Figure 5.


Michaelis–Menten kinetic analyses of glycine activation (MG assay) and tRNAGly aminoacylation (32P-labeled A76) by GlyCA (A–C) and GlyCA1–2 (D–F). The plots are fitted to the substrate concentration dependence of triplicate rate measurements. Ksp = kcat/KM and kcat parameter estimates are given with their variance parameters in the table as well as KM values derived from them. The mean correlation between Ksp and kcat values is ∼0.5 ± 0.01 except for those for aminoacylation, which are >0.95 for both GlyCA and GlyCA1–2.

Steady-state kinetic analyses are comparable to those of other AARS urzymes.

Fitted Michaelis–Menten kinetics of [glycine]-, [ATP]- and [tRNAGly]-dependence are shown, together with fitted parameters, in Figure 5A–C for GlyCA and Figure 5D–F for GlyCA1–2. [Glycine]- and [ATP]-dependent assays were done using the MG assay (52). The picomolar sensitivity of the assay allows us to observe saturation behavior over an unusually large (30 000-fold) range of glycine concentrations (50 μM–1.5 M).
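The relationship between the fitted parameters can be illustrated with a minimal sketch. It assumes the common steady-state form in which kcat and the specificity constant Ksp = kcat/KM are fitted directly and KM is derived from them; the function names are illustrative only, and the numbers are the 'All' ATP-dependence values for GlyCA listed in Table 1.

```python
def rate_per_enzyme(s, kcat, ksp):
    """Steady-state rate v/[E] (per s) at substrate concentration s (M).

    Assumed form: v/[E] = ksp*s / (1 + ksp*s/kcat), with ksp = kcat/KM.
    """
    return ksp * s / (1.0 + ksp * s / kcat)


def km_from(kcat, ksp):
    """Derive KM (M) from the two fitted parameters."""
    return kcat / ksp


# 'All' fit for the ATP dependence of GlyCA, as listed in Table 1.
kcat_atp = 8.25e-3   # /s
ksp_atp = 130.0      # /M/s

km_atp = km_from(kcat_atp, ksp_atp)
print(f"KM = {km_atp:.2e} M")
print(f"v/[E] at s = KM: {rate_per_enzyme(km_atp, kcat_atp, ksp_atp):.2e} /s")
```

In this parameterization the rate at s = KM is half of kcat, and the derived KM agrees with the tabulated value to within rounding of Ksp.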

Glycine binds relatively weakly to GlyCA, KM = ∼31 mM, consistent with the absence of a side chain to provide either buried surface area or electrostatic attraction. Corresponding values for full-length tetrameric GlyRS-B enzymes (110 μM for Aquifex aeolicus (45); 160 μM for E. coli (53)) are roughly two orders of magnitude tighter. The significant elevation of the GlyCA KM over those of the two full-length GlyRS-B measurements reinforces our conclusion from Figure 2 that the GlyCA urzyme is the sole source of observed catalytic activity.

ATP binds unusually strongly to both GlyCA and GlyCA1–2. This reinforces an observation we have made previously, that ATP affinity decreases progressively as ancestral AARS grow from protozyme to urzyme to full-length (13,54,55).

The curves in Figure 5C and F merit special comment. Neither GlyCA nor GlyCA1–2 is even close to saturation. It is not obvious that either plot differs significantly from a linear dependence. As noted in the 'Materials and methods' section, this results from a number of factors, including the difficulty of preparing tRNAGly with a high fraction of acylatable tRNA, which places a practical upper limit on the [tRNAGly] we can use in assays. The corresponding activation free energies for kcat/KM, ΔG (kcat/KM), are ∼ –1.2 kcal/mole for GlyCA and –2.2 kcal/mole for GlyCA1–2. Both values are comparable to the range (–3.6 to –1.3 kcal/mole) previously established for other AARS urzymes (2).
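The free energies quoted here and tabulated in Table 1 are numerically consistent with ΔG = −RT ln(k). The sketch below assumes a temperature of 298.15 K, which is not stated in this section; the constant and function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)
T_ASSUMED = 298.15    # K; an assumption, not stated in this section


def delta_g(k, temperature=T_ASSUMED):
    """Return -RT ln(k) in kcal/mole for a rate or binding constant k."""
    return -R_KCAL * temperature * math.log(k)


# kcat/KM values (/M/s) for tRNAGly acylation taken from Table 1.
for label, ksp in [("GlyCA", 7.86), ("GlyCA1-2", 38.99)]:
    print(f"{label}: dG(kcat/KM) = {delta_g(ksp):+.2f} kcal/mole")
```

Under that assumption the two values reproduce the ∼ –1.2 and –2.2 kcal/mole quoted above.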

Factorial analysis by multiple regression modeling identifies strong energetic coupling between Exon2, glycine and tRNA.

Both constructs derived from the adventitious Arctic Fox GlyRS-B α-subunit show aminoacylation activity (Figures 4 and 5). The replicated data for the steady-state kinetic parameters (Table 1) show that Exon2 has a surprising impact on the relative affinities for the three different substrates. High reproducibility of triplicate assays makes these effects especially instructive of the usefulness of factorial design.

Table 1.

Design matrix for factorial analysis of the interplay between steady-state kinetic parameters (columns 2–7) and independent parameters Exon2 and substrates (columns 8–11) for GlyCA (top) and GlyCA1–2 (bottom). Triplicate measurements for Glycine, ATP, and tRNAGly dependencies appear in succession for each catalyst, followed by averages and error estimates

Set kcat(/s) KM(M) kcat/KM(/M/s) ΔG(kcat) ΔG(KM) ΔG(kcat/KM) Exon2 Glycine ATP tRNAGly
GlyCA
All 0.0016 0.031 0.051 3.81 2.04 -1.00 0 1 0 0
1 0.0016 0.027 0.059 3.82 2.14 -1.06 0 1 0 0
2 0.0016 0.032 0.051 3.79 2.04 -0.98 0 1 0 0
3 0.0016 0.037 0.043 3.82 1.96 -1.02 0 1 0 0
Mean 0.0016 0.032 0.051 3.81 2.04 1.77
Stdev 0.00003 0.0040 0.0062 0.01 0.08 0.07
All 8.25E-03 6.37E-05 130 2.84 5.72 -2.88 0 0 1 0
1 7.97E-03 6.33E-05 126 2.86 5.72 -2.86 0 0 1 0
2 8.22E-03 6.86E-05 120 2.84 5.68 -2.83 0 0 1 0
3 7.73E-03 4.57E-05 169 2.88 5.92 -3.04 0 0 1 0
Mean 8.04E-03 6.03E-05 136 2.86 5.76 -2.90
Stdev 2.44E-04 1.00E-05 2.24E+01 1.80E-02 1.07E-01 9.14E-02
All 2.27E+0 2.89E-01 7.86 -0.49 0.74 -1.22 0 0 0 1
1 1.81E+0 2.68E-01 6.76 -0.35 0.78 -1.13 0 0 0 1
2 9.97E-01 1.35E-01 7.36 0.00 1.18 -1.18 0 0 0 1
3 1.95E+1 2.48E+0 7.86 -1.76 -0.54 -1.22 0 0 0 1
Mean 6.15E+0 7.94E-01 7.46 -0.65 0.54 -1.19
Stdev 8.93E+0 1.13E+0 5.22E-01 7.68E-01 7.47E-01 4.22E-02
GlyCA1–2
All 5.40E-04 7.20E-02 0.0072 4.45 1.53 2.92 1 1 0 0
1 4.75E-04 6.45E-02 0.0074 4.53 1.62 2.91 1 1 0 0
2 6.44E-04 1.23E-01 0.0052 4.35 1.24 3.11 1 1 0 0
3 6.27E-04 1.14E-01 0.0055 4.37 1.28 3.08 1 1 0 0
Mean 5.72E-04 9.41E-02 0.006 4.42 1.42 3.00
Stdev 7.89E-05 2.87E-02 1.11E-03 8.38E-02 1.86E-01 1.05E-01
All 1.18E-02 6.45E-05 186 2.63 5.67 -3.04 1 0 1 0
1 1.23E-02 7.59E-05 171 2.61 5.62 -3.01 1 0 1 0
2 1.15E-02 5.00E-05 162 2.64 5.86 -3.22 1 0 1 0
3 1.18E-02 6.37E-05 230 2.63 5.72 -3.09 1 0 1 0
Mean 1.19E-02 6.46E-05 187 2.63 5.72 -3.09
Stdev 3.06E-04 1.09E-05 3.03E+01 1.53E-02 1.05E-01 9.17E-02
All 1.25E-02 3.21E-04 39.00 2.30 4.43 -2.13 1 0 0 1
1 7.54E-03 1.90E-04 39.64 2.89 5.07 -2.18 1 0 0 1
2 2.05E-02 5.67E-04 36.23 2.59 4.76 -2.17 1 0 0 1
3 1.24E-02 3.03E-04 41.10 2.60 4.80 -2.20 1 0 0 1
Mean 1.33E-02 3.45E-04 38.99 2.60 4.76 -2.17
Stdev 5.38E-03 1.59E-04 2.04E+00 2.42E-01 2.65E-01 3.14E-02

A vexing possibility arises from the dual observation that the GlyCA1–2 protein shows indication of some proteolysis and that the proteomics indicates a high concentration (∼10%) of GroEL. Both raise the possibility that ATP consumption may be unduly influenced by the repeated ATP hydrolysis of the GroEL chaperone folding cycle. That is unlikely in this case because KM is also characteristic of the catalyst and we assay two different products – glycine activation and tRNA aminoacylation – in Table 1 and Figure 6.

We previously showed that data such as that in Table 1 can be represented in two alternate coordinate systems (6,34,56). Figure 6 shows both coordinate systems and the conversion of one into the other by regression methods (6,34,57). In this case, the transformation shows how tRNAGly binding significantly alters the active site.
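The change of coordinates can be sketched with the cell means of Table 1. The example below is only an illustration under stated assumptions: it uses the 'Mean' ΔG (kcat/KM) values, treats ATP-dependent activation by GlyCA as the reference cell, and solves a saturated model with two interaction terms. The coding of the design matrix and all names are hypothetical, and the regression models actually used for Figure 6 may differ.

```python
# Cell means of ΔG(kcat/KM) in kcal/mole from the "Mean" rows of Table 1.
# Keys are (Exon2 present, substrate).
means = {
    (0, "Gly"): 1.77, (0, "ATP"): -2.90, (0, "tRNA"): -1.19,
    (1, "Gly"): 3.00, (1, "ATP"): -3.09, (1, "tRNA"): -2.17,
}


def design_row(exon2, substrate):
    """Hypothetical coding: ATP dependence of GlyCA is the reference cell."""
    gly = 1.0 if substrate == "Gly" else 0.0
    trna = 1.0 if substrate == "tRNA" else 0.0
    return [1.0, float(exon2), gly, trna, exon2 * gly, exon2 * trna]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


cells = list(means)
X = [design_row(e, s) for e, s in cells]
y = [means[c] for c in cells]
names = ["intercept", "Exon2", "Gly", "tRNA", "Exon2*Gly", "Exon2*tRNA"]
coef = dict(zip(names, solve(X, y)))
for name in names:
    print(f"{name:>11s} {coef[name]:+.2f}")
```

With this coding, the Exon2 main effect describes its influence on the ATP-dependent reaction, while the two interaction terms describe how Exon2 shifts the glycine and tRNAGly values relative to that reference.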

Figure 6.


Factorial analysis of GlyCA and GlyCA1–2 steady-state kinetic parameters ΔGkcat (A, D, G), ΔGKM (B, E, H) and ΔGkcat/KM (C, F, I) for glycine- (blue bars) and ATP- (green bars) dependent amino acid activation. The panels are arranged to highlight the role of the regression models (middle row) as linear transformations relating the experimental measurements for different constructs with each substrate in the top row to their relative contributions to each rate or binding constant in the bottom row. (A–C) Mean values and standard deviations of the free energy parameters summarized in Table 1. Histogram bars are colored as in Table 1 to differentiate substrates glycine (G, blue), ATP (T, green) and tRNAGly (R, yellow). Note that the top histogram in C is the difference between, and is thus a linear combination of, histograms in B and A. (D–F) Regression models for conversion of values in A–C into relative free energy contributions of Exon2, amino acid and tRNAGly to values in A–C. (G–I) Histograms showing those relative free energy contributions to ΔG (kcat), ΔG (KM), and ΔG (kcat/KM). Histogram bars are colored differently, to highlight the new coordinate system. Significant favorable effects are colored green, adverse effects are colored red.

Comparison of Figure 6A–C with Table 1 should help clarify the unusual energetic coupling represented by the interactions of Exon2 with the amino acid and tRNA substrate (Figure 6G–I). tRNAGly increases ΔG (kcat/KM) for amino acid activation (Figure 6C). Indeed, entries in Table 1 show that the PPi exchange reaction by GlyCA is slower by two orders of magnitude than aminoacylation of tRNAGly.

Neither Exon2 nor tRNAGly has significant impact on use of the ATP substrate (green bars, Figure 6A–C). Exon2 and tRNAGly have significant and opposite effects on the second-order rate constants for amino acid (blue bars) and tRNAGly (yellow bars) substrates. Exon2 has no effect on the rate of glycine activation (Figure 6A, B), but it greatly reduces the affinity for glycine. Exon2 increases both the affinity for tRNAGly and the acylation rate constant (Figure 6A, B). As a consequence, the blue and yellow bars in Figure 6C change in opposite directions, slowing the activation rate for glycine while enhancing the acylation rate for tRNA.

Although counter-intuitive, these observations on the synergy between GlyCA1–2 and tRNAGly are borne out by single turnover experiments performed in the presence of tRNAGly (Figure 7). tRNAGly enhances the first stage of the two-stage reaction in two ways. First, it increases the first-order rate of glycine activation (Figure 7A). Second, it increases the proportion of ATP consumed that is productively converted into Gly-5′AMP, which is the requisite precursor to the acyl-transfer reaction (Figure 7B). Together, these two effects substantially resolve the puzzle over how the tRNAGly aminoacylation rate can exceed the rate of amino acid activation by such a large margin.

Figure 7.


tRNAGly-dependent active site titration shows that the tRNA substrate increases the first-order rate, kchem, of glycine activation (A) and the fraction of ATP that is converted into Gly-5′AMP (B). Plots show the correlation between observed values of the respective dependent variables, ΔG (kchem) in (A) and the ratio of AMP to ADP produced in the first round of catalysis (M/D) in (B), and values computed using the regression coefficients in the tables (bottom row).

Figure 7 has other curiosities that are harder to resolve. Notably, Exon2 substantially alters the relative affinities for amino acid and RNA substrates (Figure 7B). Part of the reason for this unusual effect appears to be that Exon2 provides a partial pocket covering the binding site for A76, as noted further in the Discussion (§4.4).

GlyCA prefers Class II to Class I amino acids and recognizes a set of amino acids complementary to those recognized by other AARS urzymes.

The MG assay enhances the throughput of enzymatic measurements, enabling us to perform Michaelis–Menten experiments for all 20 of the canonical amino acids. Free energies of activation for kcat/KM are summarized in Figure 8. The histograms divide the Class I (left) from Class II amino acids (right) and reflect the relative throughput of different amino acids. They are therefore a quantitative metric for the ability of GlyCA to administer the code for glycine. It is worth noting that the GlyCA specificity spectrum is the least ambiguous of the three we have published in the sense that glycine is preferred to a greater extent over other amino acids. Moreover, GlyCA favors a distinct set of amino acids almost orthogonal to those favored by LeuAC and HisCA. We did not characterize the specificity spectrum for GlyCA1–2 because the triple helix bundle it provides does not occur in any other Class II AARS.

Figure 8.


GlyCA amino acid specificity spectrum compared to those previously determined for LeuAC and HisCA2 (adapted from reference (2)). Michaelis–Menten experiments for GlyCA were performed using the MG assay as described in METHODS. Similar experiments for LeuAC and HisCA2 were performed using the conventional assay with radiolabeled 32P-pyrophosphate. The Class of each urzyme is appended in parentheses. Class I amino acids are sorted on the left, Class II amino acids on the right of each panel, according to their activation free energies, ΔGkcat/KM. Amino acids from the same Class as the urzyme are colored dark green; those from the opposite Class are colored light blue. All three urzymes are promiscuous. GlyCA favors a subset of Class II amino acids (see yellow shaded columns showing Class averages in black) as does HisCA; LeuAC favors a subset of Class I amino acids. Amino acids favored by GlyCA (Gly, Ala) complement those favored by LeuAC (Ile, Met, Leu) and HisCA (Lys, His), suggesting a functional, experimental basis for implementing ancestral genetic codes with reduced alphabets.

Structural modeling is qualitatively consistent with the GlyCA amino acid specificity spectrum.

Modeling amino acid side chains into the GlyCA active site poses several problems. First, we have only two choices for atomic coordinate sets. One, X-ray crystal structures of homologous proteins, cannot properly represent the structural consequences of either amino acid sequence changes or missing modules. The other, AlphaFold predictions, are based on AI neural networks, not experimental data. Such predictions are also strongly conditioned by the available crystal structures. Moreover, they cannot represent ligand positions. The near identity of the AlphaFold GlyCA prediction (Figure 1) and homologous 7YSE coordinates led to the approximate representations in Figure 1C.

Crystallographic coordinates are available for all 20 aminoacyl-5′AMP ligands. However, Class I adenylate ligands cannot be compared directly, because the aminoacyl phosphate bonds all link to a different prochiral phosphate oxygen position (Figure 9A). Thus, we constructed a set of enantiomeric Class I adenylates for proper comparison. These ligands are illustrated in Figure 9B. Better substrates appear qualitatively to fit better within the active site. Worse substrates either create bad van der Waals contacts or project into solution, consistent with the pattern of free energies derived from the specificity constants, kcat/KM, illustrated in Figure 8.

Figure 9.


Comparison of amino acid substrate binding to GlyCA. Aminoacyl-5′AMP or -5′sulfamoylAMP structures were superimposed onto the Gly-5′sulfamoylAMP of PDB ID 7YSE using only the adenosine coordinates to emphasize different potential interactions with the GlyRS active site. A. Direct comparison of amino acid substrate binding to the GlyCA active site is complicated by the inherent diastereoisomerism of the naturally occurring aminoacyl-5′AMP complexes taken from Class I and II crystal structures. All amino acids in proteins are L stereoisomers. Class I amino acids, however, excepting Trp and Tyr, are transferred to the opposite prochiral site on the ATP α-phosphate by Class I AARS. This orientation is highlighted in blue for Class I and red for Class II amino acids. The comparison in B therefore required flipping the Class I amino acids over to the Class II orientation, which is directly aligned for nucleophilic attack from the 3′OH group of A76. B. Structural modeling of all 20 putative canonical amino acid binding modes. Central panel is taken from PDB ID 7YSE. Others were taken directly from appropriate X-ray crystal structures. Class I aminoacyl-5′AMP structures were corrected for the enantiomeric configuration of the α-phosphate as described in the text. Amino acid side chains are labeled either as better substrates on the left or worse substrates on the right. Class I amino acid names are blue; Class II names are red (created using PyMOL (49)).

Efforts to quantify the structural comparisons using metrics derived from the structures (i.e. relative buried surface areas, matching hydrogen bond donors with acceptors and compiling clash lists using MolProbity (58)) were not successful. Regression models failed to produce coefficients with significant t-test P-values.

Discussion

The GlyCA urzyme arose from a different paradigm from that of previously characterized AARS urzymes.

What appears to be a minimally modified S. alactolyticus GlyRS-B α-subunit gene is either a contaminant (V. lagopus is the only one of four fox genomes that has this gene) or the result of horizontal gene transfer (59). The absence of the corresponding GlyRS-B β-subunit gene makes the former possibility more likely. Moreover, the GlyCA constructs may result from adventitious sequencing errors, as we see neither rhyme nor reason for the creation of the intron. Thus, a genomic database has spawned a spontaneous example of a functional Class II GlyRS-B urzyme.

The Arctic fox GlyCA and its matching gene (Streptococcus alactolyticus GlyRS) differ at two positions (positions 358 and 500). A single-nucleotide insertion after the codon of W119 creates a stop codon after residue D211, ending ORF1. The stop codon leads to ‘splicing’ in the annotated database, and therefore, the resulting protein sequence resembles GlyRS only at the N- and C- termini. The correct reading frame is restored downstream by the combination of a second inserted nucleotide and an intron that is not an integral multiple of codons in length. The annotated ‘intron’ between ORF1 and ORF2 includes all of what is often referred to as the Insertion Domain, ID or INS (60), a three stranded antiparallel β module preceded by a short α-helix (Figure 1C). Thus, derivation of the GlyCA architecture from its parent full-length GlyRS is entirely different from those of TrpAC, LeuAC or HisCA urzymes, all of which were designed to test the Rodin-Ohno hypothesis (20). In that important sense, GlyCA is derived from observation rather than by hypothesis.

Accurate prediction of intron-exon boundaries remains an open problem in bioinformatics (61). Existing methods combine all kinds of information sources, not just identification of 5′ donor and 3′ acceptor splice sites in the putative pre-mRNA sequence. The identification of the V. lagopus GlyRS-B ORFs 1 and 2 may be no more than coincidence, perhaps arising simply from sequencing errors.

Notwithstanding, the GlyCA construct characterized here proved to behave as a robust ancestral AARS construct. It is therefore, unquestionably, a new Class II AARS urzyme (see Supplementary Figure S2 and Supplementary Section C). We coined the term ‘urzyme’ (11) to describe putative ancestral genes that have high amino acid sequence identity with the corresponding full-length enzymes and retain all their functions, albeit at reduced levels. They help define the minimum structural requirements for catalytic activity. They are thus also ‘null mutants’ for extra domains that refined their functionality during subsequent adaptive evolutionary diversifications. Modular combinatorial studies of the evolution of catalysis and specificity (34,62) illustrate their utility in this regard. The GlyCA gene qualifies on all these counts.

Glycine activation rates by GlyCA and GlyCA1–2 are comparable to those observed for other Class I and II AARS urzymes.

GlyRS-B α-subunits retain amino acid activation activity in the absence of β-subunits (45), which are themselves essential determinants of tRNAGly affinity. That behavior appears to contrast with PheRS α2β2 systems, which are claimed to require both subunits (63,64). As we have noted elsewhere (6), failure to demonstrate amino acid activation may not necessarily imply inactivity.

The glycine system is the fourth AARS family for which we have been able to characterize the corresponding urzymes. Figure 10 compares the first order and steady state rate enhancements of the various glycyl, histidyl, tryptophanyl and leucyl AARS urzymes and, in the case of steady-state kinetics, two corresponding full-length AARS. Reference (12) discusses the estimation of the uncatalyzed rate in detail. Both histograms are sorted in order of increasing activity to the right.

Figure 10.


Activation free energies, ΔGkchem for first-order rates (A) and overall rate enhancements kcat/KM/knon (B) for amino acid activation by glycyl, histidyl, tryptophanyl and leucyl AARS urzymes. The former are in kcal/mole; the latter are dimensionless. Error bars are shown only for single turnover experiments, as they are unavailable for some of the steady-state rates. Class I catalysts are shown in shades of green, Class II catalysts in shades of blue. The darkest shades in B are full-length AARS. Values from this work are in bright blue.

The correlation between the two activity metrics is only approximate. It is instructive, however, that GlyCA, the smallest of the urzymes, with only 81 amino acids, is nevertheless substantially more active than the four other HisRS urzymes. It is even 2.8 times more active than the full-length A. aeolicus GlyRS-B α2 dimer ((45); Figure 10B). That rate enhancement is also ∼3 times greater than we previously observed for the Class II HisCA3 urzyme (9), and within 40% of that for LeuAC, the most active of the AARS urzymes characterized to date (6).

The single turnover rates of ATP consumption by GlyCA and GlyCA1–2 in Figure 10A are the same, within experimental error. Yet, the steady-state rate of glycine activation by GlyCA is 540 times that of GlyCA1–2 in Figure 10B. This curious phenomenon is discussed elsewhere in the context of Figures 6 and 7.

It is counterintuitive that both single turnover and steady-state rate accelerations of GlyCA exceed those of GlyCA1–2 and also that of the A. aeolicus α2-dimer. One might expect that removing the helix bundle that provides the dimer interface might compromise enzymatic activity rather than enhancing it. The authors describing the activity of the dimer note that it has a surprisingly reduced activity, relative to that of the α2β2 tetramer. It seems likely to us that this implies sophisticated communication between the two different subunits in the intact enzyme to coordinate catalysis of amino acid activation and tRNA acylation as well as specific recognition of both subunits. If that communication is mediated via the dimer interface, then it is possible that releasing the catalytic site from constraints imposed by the integrated behavior might activate it. Further experiments will be necessary for a fuller understanding. In any case, the primary and predicted tertiary structures of both excerpts and their robust catalytic properties validate the robustness of the urzyme as an experimental model.

Overall rate enhancements for tRNAGly acylation are 9.5 × 10⁴ for GlyCA and 5.1 × 10⁵ for GlyCA1–2.

Evidence summarized in Figures 2 and 3 represents the most extensive demonstration to date of the authenticity of urzyme catalysis.

We supplement the single turnover kinetic analysis (i.e. large burst size) and the two orders of magnitude increased KM with LC-ESI-MS/MS and, especially, zymographic evidence of in situ activity in a native gel to document comprehensively the authenticity of the observed catalytic activity. These factors, together with the unusual provenance of GlyCA, strongly underscore the legitimacy of using AARS urzymes as models for emergence and early evolution of the genetic coding table.

We recently used thermodynamic cycle analysis to show that the active site signatures in Class I LeuAC also contribute only modestly to transition-state stabilization unless their functionalities are coupled together by the relative motions of domains not present in Class I urzymes (34,62). The absence of Motif 3 from the GlyCA urzyme reinforces our previous finding (9,10) that the second arginine tweezer (65) is not required for catalysis of aminoacyl-tRNA synthesis. The ancestral requirements for enzymatic activity were surprisingly simple. The absence of Motif 3 from both GlyCA and GlyCA1–2 provides additional evidence that Motif 3 is dispensable for rudimentary functionality of ancestral AARS in implementing a rudimentary genetic code.

tRNAGly and Exon2 combine to increase the efficiency of ATP utilization

The presence of tRNAGly significantly alters the behavior of single turnover kinetics in another important way. tRNAGly and Exon2 introduce a profound change in the fraction of input ATP that is usefully converted to Gly-5′AMP synthesis. We refer to that ratio as M/D, adapting a shorthand from the bioenergetics literature in which T = ATP, D = ADP, and M = AMP. The results are shown in Figure 11. Both Exon2 and tRNAGly decrease the mean M/D ratio for GlyCA, but together, they increase the M/D ratio in GlyCA1–2 by nearly four-fold (P << 0.0001).

Figure 11.


tRNAGly increases the efficiency of ATP utilization in single turnover assays. In the table of regression coefficients β is the multiplier for the presence of each term in the regression formula and σ is its standard deviation. The t Ratio is equal to β/σ.

The M/D ratio may provide a novel metric of the efficiency of free energy usage by AARS and their urzymes. The nucleotide products of ATP consumption, AMP and ADP, occur in a 30:70 ratio. That M/D ratio for GlyCA (0.43) is similar to that for Class I LeuAC (0.34) and about 12% of that for full length LeuRS (3.5). The putative active site of LeuAC revealed vacant surfaces complementary to two ATP configurations reported to predominate in solution (7). That observation, together with a delayed increase in AMP production, led us to propose that ATP can successively phosphorylate the bound leucyl-5′AMP, restoring a transition-state configuration that is more active than the aminoacyl-5′AMP complex.
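The arithmetic behind the quoted M/D values can be checked with a short sketch. The function name is illustrative, and the inputs are only the numbers quoted in the paragraph above.

```python
def md_ratio(amp, adp):
    """Ratio of AMP to ADP produced (M/D); inputs share any common unit."""
    return amp / adp


glyca_md = md_ratio(30.0, 70.0)   # AMP:ADP of 30:70 quoted for GlyCA
leurs_md = 3.5                    # full-length LeuRS value quoted in the text

print(f"GlyCA M/D = {glyca_md:.2f}")
print(f"GlyCA relative to full-length LeuRS = {glyca_md / leurs_md:.0%}")
```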

An obvious possibility is that ADP is produced adventitiously by contaminating enzymes, notably in this case by GroEL. That seems not to be the case here. First, urzymes tend to consume more ATP per unit of amino acid activation than do full length AARS. Second, the value itself is variable, depending on the amino acid and on active-site mutations (34). In the case of the GlyCA urzymes described here, tRNAGly and Exon2 together increase the fractional conversion of ATP to AMP by four-fold. The strong dependence on the structure of the GlyRS urzyme and the presence of tRNA convincingly rule out adventitious contaminating enzymes as the source of ADP production. It tends to support a mechanism similar to that outlined for LeuAC (7).

Our database for this metric is, as yet, too limited to draw further conclusions. Nevertheless, we should bear in mind that evolutionary changes that increased such a metric could have been significantly favored. It could therefore have exerted significant selective pressure for evolutionary changes that are difficult to understand without it (62). That question has occupied us since we first demonstrated that the Class I anticodon-binding and CP1 domains in full-length TrpRS cannot individually enhance any functional metric of the TrpRS urzyme (56,62). That observation left entirely unanswered the question of how acquisition of either domain might have been selected, as it is unlikely that the two domains were acquired at the same moment in time. We can test that hypothesis by measuring the M/D ratio for a Class I construct containing the intact catalytic and editing domains to determine whether the presence of the CP1 insert enhances the M/D ratio.

Exon2 enhances tRNA binding by covering the A76 binding site

The striking improvement in the GlyCA1–2 acylation activity suggests that some aspect of Exon 2 enhances tRNAGly binding. Crystal structures of intact E. coli GlyRS-B (7YSE; (23)) and HisRS (1HTT; (66)) provide some insight into possible reasons for the enhanced affinity (Figure 12). The A76 binding pocket is minimal in GlyCA. However, in both GlyCA1–2 and HisCA1 (excerpted from 1HTT), distinct protein modules provide some additional binding surface. In HisRS (and all Class II AARS except GlyRSs and AlaRSs), a module known as the ‘small interface’ between motifs 1 and 2 covers the A76 binding site. A similar surface arises from the three-helix bundle of the dimer interface in GlyCA1–2.

Figure 12.

Comparison of A76 binding sites in GlyCA1–2, GlyCA and HisCA1. The urzyme in each case is blue. A76, identified in the center image, is represented by dots. GlyCA provides only a small fraction of the surface area for binding. In contrast, the A76 binding site in HisCA1 is covered by what is sometimes called the ‘small interface’ (wheat). In GlyCA1–2, the dimer interface plays a role similar to that of the small interface, which is entirely missing in GlyRS-B and AlaRSs.

GlyRS urzymes enrich the context and repertoire of early genetic coding systems

Discussion of the origin of genetic coding remained static for decades (67). Revisiting the chief models – error reduction, matching of amino acids to either triplet codons or anticodons, and ‘frozen accidents’ – was not helpful. The events that built the coding table have now begun to emerge from the relevant molecular biology (2,68–70). Our view is that the chief barrier to further understanding has been the lack of good experimental models for ancestral AARS•RNA cognate pairs. This work achieves a balanced set of AARS urzymes that may serve that purpose.

The relative modular simplicity of many of the AARS argues that they often emerged by resurrecting and adapting an earlier version rather than further modifying a highly specialized AARS (3). The TrpRS/TyrRS leaves of the Class I tree exemplify that process. They are very likely the most recent additions to the coding table and have the simplest modular architecture. We called that process ‘retrofunctionalization’ and argued that it reinforced Wong's notion that the AARS co-evolved with the rudimentary pathways for complex amino acid biosynthesis (70).

Much evidence suggests that the first AARS were assembled piecewise from quite small modules (10,13,25,26,34,56,62,71–73). Jacob (74) popularized the idea that Nature ‘tinkers’, for example, as it assembles bits of genetic information into primordial genes. Later authors have used the more colorful French bricolage to describe such assembly (25,75).

Our first experiments with ancestral Class II HisRS urzymes (10) showed that structural modules 6–20 residues long work together. Further experiments applied such network analysis to ancestral Class I TrpAC (3,6,34) and LeuAC (56,62). Figures 6 and 7 extend the analysis to the Class II GlyCA constructs. In this way, we begin to identify explicit functional contributions that likely provided selective advantage. That shows, in detail, how bricolage works.

The GlyRS-B α-subunit clearly acquired its multiple functional modules (Figure 1) at successive times. We suggested (3,13,72) that the most ancient module is probably motif 2, which contains the first arginine tweezer (65). There is a reasonable case that the second most ancient module is motif 1 (3). Residues 133–210 within the intron are not required for either amino acid activation or acyl transfer. Those modules probably function fully only in the context of the intact α2β2 tetramer. Thus, they too are more recent. The three-helix bundle encoded by Exon 2 is probably an even more recently assimilated module because it occurs only in GlyRS-B clades. It has quite unexpected effects on catalysis (Figure 6AC). RNA and amino acid substrate affinities of GlyCA change in opposite directions in the presence of the Exon 2 dimer interface module. Exon 2 increases affinity for tRNA but radically reduces glycine affinity. Theoretical analysis of the eukaryotic GlyRS catalytic domain (76) underscores this point by showing, among other things, that the fragments homologous to motifs 2 and 1 are shared by GlyRS subunits from widely separated clades on our most recent Class II AARS phylogenies (3).

Finally, our experimental work on AARS urzymes reveals quantitative aspects of the kind of code they appear capable of implementing. We have now characterized four phylogenetically diverse urzymes – two from Class I and two from Class II – that share similar characteristics. They accelerate the reaction over the uncatalyzed rate to similar extents (Figure 10). They prefer amino acids from the Class that they administer as full-length contemporary enzymes. GlyCA, for example, prefers Gly, Ser, Ala, His, Asp and Thr sufficiently that it activates Pro, Phe, Lys or Asn less than one time in 10 (Figure 8).

As we have noted (6,77–79), the coding of ancestral Class I AARS opposite and in-frame with ancestral Class II AARS provides structural bases for discriminating between both amino acid and tRNA acceptor stem substrates. Those rudimentary binding determinants act semi-independently as parallel filters, providing a basis for a rudimentary code. We also have noted (80) that the limited specificity of a coding alphabet capable of discriminating between four redundant coding letters representing four mutually contrasting sets of amino acids may represent a key barrier to enhancing the coding capacity of early genetic systems.

This work opens a new perspective from which we can now frame a substantially clearer set of questions about the origin of the genetic code. Each of the AARS urzymes we have characterized behaves remarkably consistently with the requirements for a redundant four-letter code. The questions that now appear approachable will help us devise experiments with which to address whether or not such a code could be necessary and sufficient to create a foundry for protein evolution.

Supplementary Material

gkae992_Supplemental_File

Acknowledgements

A major part of the LC-ESI-MS/MS analysis and data processing was conducted using the UNC Proteomics Core Facility, which is supported in part by P30 CA016086 Cancer Center Core Support Grant to the UNC Lineberger Comprehensive Cancer Center.

Author contributions: J.D. identified the gene as a possible Class II AARS urzyme and provided phylogenetic insight for Figure 1. S.K.P. expressed, purified and assayed GlyCA and GlyCA1–2. He performed all experimental measurements, including zymography, AST, amino acid activation, aminoacylation and sample preparation for the LC-MS-MS analysis. L.B. designed the protein expression vector. G.Q.T. assisted with the preparation of materials and methods for aminoacylation assays. P.W., J.D. and C.W.C. validated the analyses. C.W.C., S.K.P., P.W. and J.D. wrote the manuscript. All authors contributed to discussions throughout the course of the work, and all contributed to and approved the final figures and text.

Contributor Information

Sourav Kumar Patra, Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599-7260, USA.

Jordan Douglas, Department of Physics, The University of Auckland, Auckland 1042, New Zealand; Centre for Computational Evolution, University of Auckland, 1010, New Zealand.

Peter R Wills, Department of Physics, The University of Auckland, Auckland 1042, New Zealand.

Laurie Betts, Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599-7260, USA.

Tang Guo Qing, Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599-7260, USA.

Charles W Carter, Jr., Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599-7260, USA.

Data availability

All data will be provided upon request from the author.

Supplementary data

Supplementary Data are available at NAR Online.

Funding

Alfred P. Sloan Foundation Matter-to-Life Program [G-2021-16944]. Funding for open access charge: Alfred P. Sloan Foundation [G-2021-16944].

Conflict of interest statement. None declared.

References

  • 1. Gomez M.A.R., Ibba M.. Aminoacyl-tRNA synthetases. RNA. 2020; 26:910–936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Carter C.W. Jr., Wills P.R. The roots of genetic coding in aminoacyl-tRNA synthetase duality. Ann. Rev. Biochem. 2021; 90:349–373. [DOI] [PubMed] [Google Scholar]
  • 3. Douglas J., Bouckaert R., Carter C.W.J., Wills P.. Enzymic recognition of amino acids drove the evolution of primordial genetic codes. Nucl. Acids Res. 2024; 52:558–571. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Wills P.R. Origins of genetic coding: self-guided molecular self-organization. Entropy. 2023; 25:1281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Carter C.W. Jr., Wills P.R. Interdependence, reflexivity, fidelity, and impedance matching, and the evolution of genetic coding. Mol. Biol. Evol. 2018; 35:269–286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Tang G.Q., Hu H., Douglas J., Carter C.W. Jr.. Primordial aminoacyl-tRNA synthetases preferred tRNA minihelix substrates over full-length tRNA. Nucl. Acids Res. 2024; 52:7096–7111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Hobson J.J., Li Z., Carter C.W. Jr.. A leucyl-tRNA synthetase urzyme: authenticity of tRNA Synthetase urzyme catalytic activities and production of a non-canonical product. Int. J. Mol. Sci. 2022; 23:4229. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Carter C.W. Jr. Urzymology: experimental access to a key transition in the appearance of enzymes. J. Biol. Chem. 2014; 289:30213–30220. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Li L., Francklyn C., Carter C.W. Jr.. Aminoacylating urzymes challenge the RNA world hypothesis. J. Biol. Chem. 2013; 288:26856–26863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Li L., Weinreb V., Francklyn C., Carter C.W. Jr.. Histidyl-tRNA synthetase urzymes: class I and II aminoacyl-tRNA synthetase urzymes have comparable catalytic activities for cognate amino acid activation. J. Biol. Chem. 2011; 286:10387–10395. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Pham Y., Kuhlman B., Butterfoss G.L., Hu H., Weinreb V., Carter C.W. Jr.. Tryptophanyl-tRNA synthetase urzyme: a model to recapitulate molecular evolution and investigate intramolecular complementation. J. Biol. Chem. 2010; 285:38590–38601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Pham Y., Li L., Kim A., Erdogan O., Weinreb V., Butterfoss G., Kuhlman B., Carter C.W. Jr.. A minimal TrpRS catalytic domain supports sense/antisense ancestry of class I and II aminoacyl-tRNA synthetases. Mol. Cell. 2007; 25:851–862. [DOI] [PubMed] [Google Scholar]
  • 13. Martinez-Rodriguez L., Jimenez-Rodriguez M., Gonzalez-Rivera K., Williams T., Li L., Weinreb V., Chandrasekaran S.N., Collier M., Ambroggio X., Kuhlman B.et al.. Functional class I and II amino acid activating enzymes can be coded by opposite strands of the same gene. J. Biol. Chem. 2015; 290:19710–19725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Onodera K., Suganuma N., Takano H., Sugita Y., Shoji T., Minobe A., Yamaki N., Otsuka R., Mutsuro-Aoki H., Umehara T.et al.. Amino acid activation analysis of primitive aminoacyl-tRNA synthetases encoded by both strands of a single gene using the malachite green assay. Biosystems. 2021; 208:104481. [DOI] [PubMed] [Google Scholar]
  • 15. Illangasekare M., Yarus M.. A tiny RNA that catalyzes both aminoacyl-tRNA and peptidyl-RNA. RNA. 1999; 5:1482–1489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Niwa N., Yamagishi Y., Murakami H., Suga H.. A flexizyme that selectively charges amino acids activated by a water-friendly leaving group. Bioorg. Med. Chem. Lett. 2009; 19:3892–3894. [DOI] [PubMed] [Google Scholar]
  • 17. Carter C.W. Jr. What RNA world? why a peptide/RNA partnership merits renewed experimental attention. Life. 2015; 5:294–320. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Carter C.W. Jr., Li L., Weinreb V., Collier M., Gonzales-Rivera K., Jimenez-Rodriguez M., Erdogan O., Chandrasekharan S.N. The Rodin-Ohno hypothesis that two enzyme superfamilies descended from one ancestral gene: an unlikely scenario for the origins of translation that will not be dismissed. Biol. Direct. 2014; 9:11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Buehner M., Ford G.C., Moras D., Olsen K.W., Rossmann M.G.. D-glyceraldehyde 3-phosphate dehydrogenase: three dimensional structure and evolutionary significance. Proc. Nat. Acad. Sci. USA. 1973; 70:3052–3054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Rodin S.N., Ohno S.. Two types of aminoacyl-tRNA synthetases could be originally encoded by complementary strands of the same nucleic acid. Orig. Life Evol. Biosph. 1995; 25:565–589. [DOI] [PubMed] [Google Scholar]
  • 21. Rodin S.N., Rodin A.. Partitioning of aminoacyl-tRNA synthetases in two classes could have been encoded in a strand-symmetric RNA world. DNA Cell Biol. 2006; 25:617–626. [DOI] [PubMed] [Google Scholar]
  • 22. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A.et al.. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596:583–592. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Ju Y., Han L., Chen B., Luo Z., Gu Q., Xu J., Yang X.-L., Schimmel P., Zhou H.. X-shaped structure of bacterial heterotetrameric tRNA synthetase suggests cryptic prokaryote functions and a rationale for synthetase classifications. Nucl. Acids Res. 2021; 49:10106–10119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Uwer U., Willmitzer L., Altmann T.. Inactivation of a glycyl-tRNA synthetase leads to an arrest in plant embryo development. Plant Cell. 1998; 10:1277–1294. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Alvarez-Carreño C., Arciniega M., Ribas de Pouplana L., Petrov A.S., Hernández-González A., Dimas-Torres J.-U., Valencia-Sànchez M.I., Williams L.D., Torres-Larios A.. Common evolutionary origins of the bacterial glycyl tRNA synthetase and alanyl tRNA synthetase. Prot. Sci. 2024; 33:e4844. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Valencia-Sànchez M.I., Rodríguez-Hernàndez A., Ferreira R., Santamaría-Suàrez H.A., Arciniega M., Dock-Bregeon A.-C., Moras D., Beinsteiner B., Mertens H., Svergun D.et al.. Structural insights into the polyphyletic origins of glycyl tRNA synthetases. J. Biol. Chem. 2016; 291:14430–14446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Shiba K. Ibba M., Francklyn C., Cusack S.. The Aminoacyl-tRNA Synthetases. 2015; Austin: Landes Bioscience. [Google Scholar]
  • 28. Wagar E.A., Giese M.J., Yasin B., Pang M.. The glycyl-tRNA synthetase of Chlamydia trachomatis. J. Bact. 1995; 177:5179–5185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Han L., Luo Z., Ju Y., Chen B., Taotao Z., Wang J., Xu J., Gu Q., Yang X.-L., Schimmel P.et al.. The binding mode of orphan glycyl-tRNA synthetase with tRNA supports the synthetase classification and reveals large domain movements. Sci. Adv. 2023; 9:eadf1027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Nagato Y., Yamashita S., Ohashi A., Furukawa H., Takai K., Tomita K., Tomikawa C.. Mechanism of tRNA recognition by heterotetrameric glycyl-tRNA synthetase from lactic acid bacteria. J. Biochem. 2023; 174:291–303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Yu Z., Wu Z., Li Y., Hao Q., Cao X., Blaha G.M., Lin J., Lu G.. Structural basis of a two-step tRNA recognition mechanism for plastid glycyl-tRNA synthetase. Nucl. Acids Res. 2023; 51:4000–4011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Eriani G., Delarue M., Poch O., Gangloff J., Moras D.. Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature. 1990; 347:203–206. [DOI] [PubMed] [Google Scholar]
  • 33. Tan K., Zhou M.G., Zhang R., Anderson W.F., Joachimiak A.. The crystal structures of the α-subunit of the α2β2 tetrameric Glycyl-tRNA synthetase. J. Struct. Funct. Genom. 2012; 13:233–239. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Tang G.Q., Elder J.J.H., Douglas J., Carter C.W. Jr.. Domain acquisition by class I aminoacyl-tRNA synthetase urzymes coordinated the catalytic functions of HVGH and KMSKS motifs. Nucleic Acids Res. 2023; 51:8070–8084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Peng Y., Li H., Liu Z., Zhang C., Li K., Gong Y., Liying G., Su J., Guan X., Liu L.et al.. Chromosome-level genome assembly of the Arctic fox (Vulpes lagopus) using PacBio sequencing and Hi-C technology. Mol. Ecol. Res. 2021; 21:2093–2108. [DOI] [PubMed] [Google Scholar]
  • 36. Patra S.K., Sinha N., Molla F., Sengupta A., Chakraborty S., Roy S., Ghosh S.. In-vivo protein nitration facilitates Vibrio cholerae cell survival under anaerobic, nutrient deprived conditions. Arch. Biochem. Biophys. 2022; 728:109358. [DOI] [PubMed] [Google Scholar]
  • 37. Fersht A.R., Ashford J.S., Bruton C.J., Jakes R., Koch G.L.E., Hartley B.S.. Active site titration and aminoacyl adenylate binding stoichiometry of aminoacyl-tRNA synthetases. Biochem. 1975; 14:1–4. [DOI] [PubMed] [Google Scholar]
  • 38. Francklyn C.S., First E.A., Perona J.J., Hou Y.-M.. Methods for kinetic and thermodynamic analysis of aminoacyl-tRNA synthetases. Methods. 2008; 44:100–118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Schneider C.A., Rasband W.S., Eliceiri K.W.. NIH Image to ImageJ: 25 years of image analysis. Nature Meth. 2012; 9:671–675. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Patra S.K., Samaddar S., Sinha N., Ghosh S.. Reactive nitrogen species induced catalases promote a novel nitrosative stress tolerance mechanism in Vibrio cholerae. Nitric Oxide. 2019; 88:35–44. [DOI] [PubMed] [Google Scholar]
  • 41. Goody R.S. How not to do kinetics: examples involving GTPases and guanine nucleotide exchange factors. FEBS J. 2014; 281:593–600. [DOI] [PubMed] [Google Scholar]
  • 42. Gilman A.G. G Proteins: transducers of receptor-generated signals. Ann. Rev. Biochem. 1987; 56:615–659. [DOI] [PubMed] [Google Scholar]
  • 43. Johnson K.A. New standards for collecting and fitting steady state kinetic data. Beilstein J. Org. Chem. 2019; 15:16–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Box G.E.P., Hunter W.G., Hunter J.S.. Statistics for Experimenters. 1978; NY: Wiley Interscience. [Google Scholar]
  • 45. Valencia-Sánchez M.I., Rodríguez-Hernández A., Ferreira R., Santamaría-Suárez H.A., Arciniega M., Dock-Bregeon A.-C., Moras D., Beinsteiner B., Mertens H., Svergun D.et al.. Structural insights into the polyphyletic origins of glycyl tRNA synthetases. J. Biol. Chem. 2016; 291:14430–14446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Burbaum J.J., Schimmel P.. Amino acid binding by the Class I Aminoacyl-tRNA synthetases: role for a conserved proline in the signature sequence. Protein Sci. 1992; 1:575–581. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Burbaum J.J., Schimmel P.. Assembly of a class I tRNA synthetase from products of an artificially split gene. Biochem. 1991; 30:319–324. [DOI] [PubMed] [Google Scholar]
  • 48. Wang S., Peng J., Xu J.. Alignment of distantly related protein structures: algorithm, bound and implications to homology modeling. Bioinformatics. 2011; 27:2537–2545. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Pymol Schrödinger. 2010; NY: LLC. [Google Scholar]
  • 50. Orsburn B.C. Proteome discoverer—a community enhanced data processing suite for protein informatics. Proteomes. 2021; 9:9010015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Wilkesman J., Kurz L.. Zymography Principles. Zymography. 2017; 1626:3–10. [DOI] [PubMed] [Google Scholar]
  • 52. Cestari I., Stuart K.. A spectrophotometric assay for quantitative measurement of aminoacyl-tRNA synthetase activity. J. Biomol. Screen. 2013; 18:490–497. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Takénaka A., Moras D.. Correlation between equi-partition of aminoacyl-tRNA synthetases and amino-acid biosynthesis pathways. Nucl. Acids Res. 2020; 48:3277–3285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Fry D.C., Byler D.M., Sisu H., Brown E.M., Kuby S.A., Mildvan A.S. Solution structure of the 45-residue MgATP-binding peptide of adenylate kinase as examined by 2-D NMR, FTIR, and CD spectroscopy. Biochemistry. 1988; 27:3588–3598. [DOI] [PubMed] [Google Scholar]
  • 55. Fry D.C., Kuby S.A., Mildvan A.S. NMR studies of the MgATP binding site of adenylate kinase and of a 45-residue peptide fragment of the enzyme. Biochemistry. 1985; 24:4680–4694. [DOI] [PubMed] [Google Scholar]
  • 56. Weinreb V., Li L., Chandrasekaran S.N., Koehl P., Delarue M., Carter C.W. Jr.. Enhanced amino acid selection in fully-evolved tryptophanyl-tRNA synthetase, relative to its urzyme, requires domain movement sensed by the D1 switch, a remote, dynamic packing motif. J. Biol. Chem. 2014; 289:4367–4376. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Weinreb V., Li L., Carter C.W. Jr.. A master switch couples Mg²⁺-assisted catalysis to domain motion in B. stearothermophilus tryptophanyl-tRNA Synthetase. Structure. 2012; 20:128–138. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Williams C.J., Headd J.J., Moriarty N.W., Prisant M.G., Videau L.L., Deis L.N., Verma V., Keedy D.A., Hintze B.J., Chen V.B.et al.. MolProbity: more and better reference data for improved all-atom structure validation. Prot. Sci. 2018; 27:293–315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Hotopp J.C.D. Horizontal gene transfer between bacteria and animals. Tr. Genet. 2011; 27:157–163. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Wong F., Beuning P., Silvers C., Musier-Forsyth K. An isolated class II aminoacyl-tRNA synthetase insertion domain is functional in amino acid editing. J. Biol. Chem. 2003; 278:52857–52864. [DOI] [PubMed] [Google Scholar]
  • 61. Scalzitti N.-G., Anne Collet P.R., Poch O., Thompson J.D. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics. 2020; 21:293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Li L., Carter C.W. Jr.. Full implementation of the genetic code by tryptophanyl-tRNA synthetase requires intermodular coupling. J. Biol. Chem. 2013; 288:34736–34745. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Goldgur Y., Mosyak L., Reshetnikova L., Ankilova V., Lavrik O., Khodyreva S., Safro M.. The crystal structure of phenylalanyl-tRNA synthetase from Thermus thermophilus complexed with cognate tRNAPhe. Structure. 1997; 5:59–68. [DOI] [PubMed] [Google Scholar]
  • 64. Mosyak L., Reshetnikova L., Goldgur Y., Delarue M., Safro M.G.. Structure of phenylalanyl-tRNA synthetase from Thermus thermophilus. Nat. Struct. Biol. 1995; 2:537–547. [DOI] [PubMed] [Google Scholar]
  • 65. Kaiser F., Bittrich S., Salentin S., Leberecht C., Haupt V.J., Krautwurst S., Schroeder M., Labudde D.. Backbone brackets and arginine tweezers delineate class I and class II aminoacyl tRNA synthetases. PLoS Comput. Biol. 2018; 14:e1006101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Arnez J.G., Harris D.C., Mitschler A., Rees B., Francklyn C.S., Moras D.. Crystal structure of histidyl-tRNA synthetase from Escherichia coli complexed with histidyl-adenylate. EMBO J. 1995; 14:4143–4155. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Koonin E.V., Novozhilov A.S.. Origin and evolution of the universal genetic code. Annu. Rev. Genet. 2017; 51:45–62. [DOI] [PubMed] [Google Scholar]
  • 68. Kauffman S.A., Niles L.. Mixed anhydrides at the intersection between peptide and RNA autocatalytic sets: evolution of biological coding. J. Roy. Soc. Int. 2023; 13:20230009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Kondratyeva L.G., Dyachkova M.S., Galchenko A.V.. The origin of genetic code and translation in the framework of current concepts on the origin of life. Biochemistry (Moscow). 2022; 87:150–169. [DOI] [PubMed] [Google Scholar]
  • 70. Wong J.T.-F., Ng S.-K., Mat W.-K., Hu T., Xue H.. Coevolution theory of the genetic code at age forty: pathway to translation and synthetic life. Life. 2016; 6:12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Kovacs N.A., Petrov A.S., Lanier K.A., Williams L.D.. Frozen in time: the history of proteins. Mol. Biol. Evol. 2017; 34:1252–1260. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Carter C.W. Jr., Popinga A., Bouckaert R., Wills P.R. Multidimensional phylogenetic metrics identify class I aminoacyl-tRNA synthetase evolutionary mosaicity and inter-modular coupling. Int. J. Mol. Sci. 2022; 23:1520. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Pham Y., Li L., Kim A., Weinreb V., Butterfoss G., Kuhlman B., Carter C.W. Jr.. A minimal TrpRS catalytic domain supports sense/antisense ancestry of class I and II aminoacyl-tRNA synthetases. Mol. Cell. 2007; 25:851–862. [DOI] [PubMed] [Google Scholar]
  • 74. Jacob F. Evolution and tinkering. Science. 1977; 196:1161–1167. [DOI] [PubMed] [Google Scholar]
  • 75. Lavorgna G., Patthy L., Boncinelli E.. Were protein internal repeats formed by ‘bricolage’?. Tr. Genet. 2001; 17:120–123. [DOI] [PubMed] [Google Scholar]
  • 76. de Farias S.T., Antonino D., Rêgo T.G., José M.V.. Structural evolution of Glycyl-tRNA synthetases alpha subunit and its implication in the initial organization of the decoding system. Progr. Biophys. Mol. Biol. 2019; 142:43–50. [DOI] [PubMed] [Google Scholar]
  • 77. Carter C.W. Jr., Wills P.R. Class I and II aminoacyl-tRNA synthetase tRNA groove discrimination created the first synthetase•tRNA cognate pairs and was therefore essential to the origin of genetic coding. IUBMB Life. 2019; 71:1088–1098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Carter C.W. Jr., Wills P.R. Experimental solutions to problems defining the origin of codon-directed protein synthesis. Biosystems. 2019; 183:103979. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Carter C.W. Jr., Wills P.R. Hierarchical groove discrimination by Class I and II aminoacyl-tRNA synthetases reveals a palimpsest of the operational RNA code in the tRNA acceptor-stem bases. Nucleic Acids Res. 2018; 46:9667–9683. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Wills P.R., Carter C.W. Jr.. Impedance matching and the choice between alternative pathways for the origin of genetic coding. Int. J. Mol. Sci. 2020; 21:7392. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press