Molecular Biology and Evolution. 2017 Nov 8;35(2):404–416. doi: 10.1093/molbev/msx292

Reciprocal Nucleopeptides as the Ancestral Darwinian Self-Replicator

Eleanor F Banwell 1,#, Bernard M A G Piette 2,#, Anne Taormina 2, Jonathan G Heddle 1,3
PMCID: PMC5850689  PMID: 29126321

Abstract

Even the simplest organisms are too complex to have spontaneously arisen fully formed, yet precursors to first life must have emerged ab initio from their environment. A watershed event was the appearance of the first entity capable of evolution: the Initial Darwinian Ancestor. Here, we suggest that nucleopeptide reciprocal replicators could have carried out this important role and contend that this is the simplest way to explain extant replication systems in a mathematically consistent way. We propose short nucleic acid templates on which amino-acylated adapters assembled. Spatial localization drives peptide ligation from activated precursors to generate phosphodiester-bond-catalytic peptides. Comprising autocatalytic protein and nucleic acid sequences, this dynamical system links and unifies several previous hypotheses and provides a plausible model for the emergence of DNA and the operational code.

Keywords: Initial Darwinian Ancestor, abiogenesis, RNA world, protein world, nucleopeptide replicator, reciprocal replicator, polymerase, ribosome, evolution, early earth, hypercycle

Introduction

In contrast to our good understanding of more recent evolution, we still lack a coherent and robust theory that adequately explains the initial appearance of life on Earth (abiogenesis). In order to be complete, an abiogenic theory must describe a path from simple molecules to the Last Universal Common Ancestor (LUCA), requiring only a gradual increase in complexity.

The watershed event in abiogenesis was the emergence of the Initial Darwinian Ancestor (IDA): the first self-replicator (ignoring dead ends) and ancestral to all life on Earth (Yarus 2011). Following the insights of von Neumann, who proposed the kinematic model of self-replication (Kemeny 1955), the necessary features of such a replicator are: storage of the information for how to build a replicator; a processor to interpret that information and select parts; and an instance of the replicator.

In order to be viable, any proposal for the IDA’s structure must fit with spontaneous emergence from prebiotic geochemistry and principles of self-replication. Currently, the most dominant abiogenesis theory is the “RNA world,” which posits that the IDA was a self-replicating ribozyme, that is, an RNA-dependent RNA polymerase (Cech 2012). Although popular, this theory has problems (Kurland 2010). For example, while it is plausible that molecules with the necessary replication characteristics can exist, length requirements seem to make their spontaneous emergence from the primordial milieu unlikely, nor does the RNA world explain the appearance of the operational code (Noller 2012; Robertson and Joyce 2012). Furthermore, it invokes three exchanges of function between RNA and other molecules to explain the coupling of polynucleotide and protein biosynthesis, namely transfer of information storage capability to DNA and polymerase activity to protein as well as gain of peptide synthesis ability. This presents a situation in which no extant molecule continues in the role it initially held. Others have posited peptide and nucleopeptide worlds as solutions.

The peptide world theory proposes a spontaneously occurring self-replicating peptide with RNA synthesis, DNA and the operational code appearing later, and possible self-replicating mechanisms of peptides have been explored (Fox and Harada 1958; Lee et al. 1996). Nucleopeptide theories require that the replicator consist of both peptides and nucleic acids and may involve their covalent linkage or (as in our proposal) noncovalent conjugation. Covalently linked nucleopeptides include nucleobase-containing peptides such as PNA, which has been mooted as a possible precursor to the RNA world (Miller 1997), and possible RNA-interacting nucleo-ε-peptides have been synthesized (Roviello et al. 2009; Nelson et al. 2000). Both the peptide world and nucleopeptide theories consist of single molecular classes and therefore suffer the same exchange of function problems as the RNA-world theory. To the best of our knowledge, no single theory has emerged that parsimoniously answers the biggest questions.

Here, we build on several foregoing concepts to propose an alternative theory based around a nucleopeptide reciprocal replicator that uses its polynucleotide and peptide components according to their strengths, thus avoiding the need to explain later exchange of function and coupling. We advocate a view of the IDA resulting from a biochemical system which we describe as a dynamical system, that is, a system of equations describing the changes that occur over time in the self-replicator presented here, and we demonstrate that such an entity is both mathematically consistent and complies with all the logical requirements for life. While necessarily wide in view we hope that this work will provide a useful framework for further investigation of this fundamental question.

Model and Results

Solving the Chicken and Egg Problem

Given that any IDA must have been able to replicate in order to evolve, extant cellular replication machinery is an obvious source of clues to its identity. Common ancestry means that features shared by all life were part of LUCA. By examining the common replication components present in LUCA, and then extrapolating further back to their simplest form, it is possible to reach a pre-LUCA, irreducibly complex, core replicator (fig. 1).

Fig. 1.

Replication schemes. (a) This simplified cellular replication schematic is common to all life today and likely reflects the ancestral form present in LUCA. Shading by molecule type (purple for nucleic acid and orange for protein), reveals a reciprocal nucleopeptide replicator. Although the ribosome is a large nucleoprotein complex, the catalytic centre has been shown to be a ribozyme (Moore and Steitz 2003) and so it is shaded purple in this scheme. (b) Comparison of the method of action of the extant ribosome with the proposed primordial analogue (components are shaded like for like). Today, tRNA molecules (mid purple) loaded with amino acids (orange) bind the mRNA (dark purple) in the ribosome (light purple), which co-ordinates and catalyses the peptidyl-transferase reaction. Although the present day modus operandi is regulated via far more complex interactions than the primordial version, the two schemes are fundamentally similar. Mixed nucleic acid structures, one performing a dual function as primordial mRNA and primordial ribosome (p-Rib) and a second functioning as a primordial tRNA (p-tRNA), provide a system wherein the former structure templates amino acid-loaded molecules of the latter.

We see that in all cells, the required functions of a replicator are not carried out by a single molecule or even a single class of molecules, rather they are performed variously by nucleic acids (DNA, RNA) and proteins. When viewed by molecular class, the replicator has two components and is reciprocal in nature: polynucleotides rely on proteins for their polymerization and vice versa. The question of which arose first is a chicken and egg conundrum that has dogged the field since the replication mechanisms were first elucidated (Giri and Jain 2012). In this work, we suggest that, consistent with common ancestry and in contrast with the RNA world theory, the earliest replicator was a two—rather than a one—component system, composed of peptide and nucleic acids.

Assumptions of the Model

We postulate that, in a nucleopeptide reciprocal replicator, the use of each component according to its strengths could deliver a viable IDA more compatible with evolution to LUCA replication machinery. Although seemingly more complex than an individual replicating molecule, the resulting unified abiogenesis theory answers many hard questions and is ultimately more parsimonious. The model does not consider in detail the chemistry of how the building blocks that constitute the IDA (short peptides and nucleic acids) came about as these details are covered in the cited literature (see for example, Saladino et al. 2012; Patel et al. 2015; Da Silva et al. 2015; Leman et al. 2004; Liu et al. 2014; Martin et al. 2008). Rather, we concentrate on the important question of the mathematical validity of the IDA in terms of its ability to sustainably self-replicate, without which it would not be a valid system. In constructing our model, we make the following assumptions:

  • (i) The existence of random sequences of short strands of mixed nucleic acids (XNA) likely consisting of ribonucleotides, deoxyribonucleotides and possibly other building blocks able to polymerize with nucleotide chains, as well as the existence of random amino acids and short peptides produced abiotically.

For this first assumption we have supposed a pool of interacting amino acids, nucleotides and related small molecules as well as a supply of metal ions, other inorganic catalysts and energy. The precise understanding of the “metabolic” reactions in which these precursor building blocks were formed is in itself an extremely important question but is not considered here as a number of potential early earth conditions and reaction pathways resulting in these outcomes have already been proposed, including the formamide reaction (Saladino et al. 2012) and cyanosulfidic chemistries (Patel et al. 2015). Recent experimental models of alkaline hydrothermal vents have even succeeded in producing various organic molecules including ribose and deoxyribose (Herschy et al. 2014). Pools of pure molecules are unlikely; instead, mixtures would likely have comprised standard and nonstandard amino acids as well as XNAs with mixed backbone architectures, being, in their simplest forms, mixtures of deoxy- and ribonucleotides (Trevino et al. 2011; Pinheiro et al. 2012) with other building blocks being possible. For simplicity we sometimes refer to XNAs as “polynucleotides.” Such conditions would be conducive to the occasional spontaneous covalent attachment of nucleotides to each other to form longer polymer chains (Da Silva et al. 2015).

  • (ii) The existence of abiotically aminoacylated short XNA strands (primordial tRNAs (p-tRNAs))

The second assumption is potentially troubling as amino acid activation is slow and thermodynamically unfavorable. However, amino acylation has been investigated in some detail and has been shown to be possible abiotically including, in some cases, the abiotic production of activated amino acids (Illangasekare et al. 1995; Leman et al. 2004; Giel-Pietraszuk and Barciszewski 2006; Lehmann et al. 2007; Turk et al. 2010; Liu et al. 2014). A pool of activated amino acids allows us to presume a fast rate of charging of p-tRNAs meaning that we can assume that the rate of charged p-tRNA formation is proportional to the concentration of free amino acids. Taken together these data suggest that multiple small amino-acylated tRNA-like primordial XNAs could have arisen. Though likely being XNA in nature, we refer to them as p-tRNA, reflecting their function. A similar nomenclature applies to p-Rib and p-mRNA.

  • (iii) Conditions that allow a codon/anticodon interaction between two or more charged p-tRNA for sufficient time and appropriate geometry to allow peptide bond formation, that is, the functionality of a primordial ribosome (p-Rib)

Our proposed p-Rib is an extreme simplification of the functionality of both the present day ribosome and mRNA (fig. 1). Initially, the p-Rib need only have been a (close to) linear assembly template for the p-tRNAs to facilitate the peptidyl transferase reaction through an increase in local concentration. This mechanism is simple enough to emerge spontaneously and matches exactly the fundamental action of the extant ribosome (fig. 2). The idea that a p-Rib may have an internal template rather than separate mRNA molecules and that an RNA strand could act as a way to bring charged tRNAs together has previously been suggested (Schimmel and Henderson 1994; Wolf and Koonin 2007; Morgens 2013) and is known as an “entropy trap” (Sievers et al. 2004; Ruiz-Mirazo et al. 2014). The concept has been demonstrated to be experimentally viable (Tamura and Schimmel 2003) although in the latter case it is the primordial ribosomal rRNA strand itself that provides one of the two reacting amino acids.

Fig. 2.

Models of primitive polymerization reactions. An XNA strand can function like a primordial ribosome (p-Rib) whereby one strand (+ strand) can template the production of a primordial polymerase (p-Pol) as indicated by the solid arrow. The action of this p-Pol is represented by the double-headed dotted arrow whereby it acts on the p-Rib (+ strand) to catalyze synthesis of the complementary sequence (− strand) and also on the − strand to produce more of the + strand.

A functional operational system requires preferential charging of particular p-tRNAs to specific amino acids. Although there is evidence for such relationships in the stereochemical theory (Woese 1965; Yarus et al. 2009), so far unequivocal proof has been elusive (Yarus et al. 2005; Koonin and Novozhilov 2009). However, there is sufficient evidence to suggest at least a separation along grounds of hydrophobicity and charge using just a two-base codon (Knight and Landweber 2000; Biro et al. 2003; Rodin et al. 2011). Furthermore only a reduced set of amino acids (Angyan et al. 2014)—possibly as few as four (Ikehara 2002)—need to have been provided in this way. The “statistical protein” hypothesis proposes that such a weak separation may have been sufficient to produce populations of active peptides (Ikehara 2005; Vetsigian et al. 2006). Such “primordial polymerases” (p-Pol) need only have been small (see below) and spontaneous emergence of a template coding loosely for such a sequence seems plausible. The failure rate of such syntheses would be high but a p-Rib using the outlined primordial operational code to produce statistical p-Pol peptides could have been accurate enough to ensure its own survival.

  • (iv) The viability of a very short peptide sequence to function as an RNA-dependent RNA polymerase

Templated ligation is often proposed as a primordial self-replication mechanism, particularly for primitive replication of nucleic acid in RNA world type scenarios. However, these are associated with a number of problems as mentioned earlier. In addition, extant RNA/DNA synthesis proceeds via terminal elongation (Paul and Joyce 2004; Vidonne and Philp 2009). To be consistent with the mechanism present in LUCA and pre-LUCA, the p-Pol should, preferably, have used a similar process.

During templated ligation, a parent molecule binds and ligates short substrates that must then dissociate to allow further access, but the product has greater binding affinity than the substrates and dissociation is slow. This product inhibition results in parabolic growth and limits the usefulness of templated ligation for replication (Issac and Chmielewski 2002). Conversely, in 1D sliding (or more accurately jumping), the catalyst may dock anywhere along a linear substrate and then diffuse by “hops” randomly in either direction until it reaches the reaction site; a successful ligation reaction has little impact on binding affinity and leaves the catalyst proximal to the next site. For simplicity our model assumes a single binding event between p-Pol and p-Rib followed by multiple polymerization events. A p-Pol proceeding via 1D sliding could catalyze phosphodiester bond formation between nucleotides bound by Watson and Crick base-pairing to a complementary XNA strand. Because p-Pol activity would be independent of substrate length, a relatively small catalyst could have acted on XNAs of considerable size. From inspection of present day polymerases such a peptide may have included sequences such as DxDGD and/or GDD known to be conserved in their active sites and consisting of the amino acids thought to be amongst the very earliest in life (Iyer et al. 2003; Koonin 1991).

In our simple system any such p-Pol must be very short to have any realistic chance of being produced by the primitive components described. We must therefore ask if there is evidence that small (e.g. <11 amino acid) peptides can have such a catalytic activity. Catalytic activity in general has been demonstrated for molecules as small as dipeptides (Kochavi et al. 1997). For polymerase activity in particular, it is known that randomly produced tripeptides can bind tightly and specifically to nucleotides (Schneider et al. 2000; McCleskey et al. 2003). We suggest that a small peptide could arise with the ability to bind divalent metal ions, p-Rib and incoming nucleotides. It is interesting to note that small peptides can assemble into large and complex structures (Bromley et al. 2008; Fletcher et al. 2013) with potentially sophisticated functionality: di- and tripeptides can self-assemble into larger nanotubes and intriguingly it has even been suggested that these structures could have acted as primitive RNA polymerases (Carny and Gazit 2005).

In summary, the essence of the model is that on geological timescales, short linear polynucleotides may have been sufficient to template similar base-pairing interactions to those seen in the modern ribosome with small amino-acylated adapters. Given that the majority of ribosome activity stems from accurate substrate positioning, such templating could be sufficient to catalyze peptide bond formation and to deliver phosphodiester-bond-catalytic peptides. As backbone ligation reactions are unrelated to polynucleotide sequence, these generated primordial enzymes could have acted on a large subset of the available nucleic acid substrates, in turn producing more polynucleotide templates and resulting in an autocatalytic system.

Mathematical Model

The IDA described above is attractive both for its simplicity and continuity with the existing mixed (protein/nucleic acid) replicator system in extant cells. However, the question remains as to whether such a system is mathematically consistent, could avoid collapse and instead become self-sustaining. The number of parameters and variables needed to analyze the system in its full complexity is such that one is led to consider simplified models which nevertheless capture essential features of interest. Here we consider a simple model of RNA–protein self-replication.

Constituents

The main constituents of the simplest model of XNA-protein self-replication considered here (see also figs. 1b and 2) are a pool of free nucleotides and amino acids, polypeptide chains—including a family of polymerases—and polynucleotide chains as well as p-tRNAs loaded with single amino acids.

We introduce some notations. Generically, we consider polymer chains $\Pi$ made of $n$ types of building blocks labeled $1,\ldots,n$. In our models, the polymer chains are polypeptides and polynucleotides, and the building blocks are amino acids and codons respectively. With a slight abuse of language, we call the number of constituents (building blocks) of a polymer chain its length. So hereafter, “lengths” are dimensionless. The order in which these constituents appear in any chain is biologically significant, and we encode this information in finite ordered sequences of arbitrary length $L$ denoted $S\{L\}=(s_1,s_2,\ldots,s_L)$, whose elements $s_j$, $j=1,\ldots,L$, label the building blocks forming the chains, in the order indicated in the sequences. Each element $s_j$ in the sequence $S\{L\}$ is an integer in the set $\{1,\ldots,n\}$ which refers to the type of building block occupying position $j$ in the chain. There are therefore $n^L$ sequences of length $L$ if the model allows $n$ types of building blocks. For instance, the sequence $S\{5\}=(1,4,3,1,3)$ in a model with, say, $n = 4$ types of building blocks (amino acids or codons), corresponds to a polymer chain of length 5 whose first component is a type 1 building block, the second component is a type 4 and so on. Given a sequence $S\{L\}$, we introduce subsequences $S\{L,j\}=(s_1,s_2,\ldots,s_j)$ (resp. $\hat{S}\{L,j\}=(s_{L-j+1},s_{L-j+2},\ldots,s_L)$), $j=1,\ldots,L$, whose elements are the $j$ leftmost (resp. rightmost) elements of $S\{L\}$. In particular, $S\{L,L\}\equiv\hat{S}\{L,L\}\equiv S\{L\}$, $S\{L,1\}=s_1$ and $\hat{S}\{L,1\}=s_L$. We write

$$S\{L\}=\left(S\{L,L-\ell\},\ \hat{S}\{L,\ell\}\right),\qquad 0<\ell<L.$$
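The prefix and suffix notation can be made concrete with a short Python sketch. The helper names `prefix`, `suffix` and `all_sequences` are illustrative choices, not part of the model; sequences are represented as tuples of integers in 1..n.

```python
from itertools import product

def prefix(seq, j):
    """S{L,j}: the j leftmost elements of seq."""
    return tuple(seq[:j])

def suffix(seq, j):
    """S-hat{L,j}: the j rightmost elements of seq."""
    return tuple(seq[len(seq) - j:])

def all_sequences(n, L):
    """Every sequence of length L over building blocks 1..n (there are n**L)."""
    return list(product(range(1, n + 1), repeat=L))

S = (1, 4, 3, 1, 3)  # the example sequence S{5} from the text, with n = 4
L, ell = len(S), 2

# Decomposition S{L} = (S{L, L-ell}, S-hat{L, ell}) for 0 < ell < L
print(prefix(S, L - ell), suffix(S, ell))
print(prefix(S, L - ell) + suffix(S, ell) == S)

# Number of distinct sequences of length 5 with 4 building-block types
print(len(all_sequences(4, 5)))
```

Tuples are used so that concatenating a prefix and a suffix with `+` directly reproduces the decomposition written above.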

In what follows we sometimes refer to families of polymer chains differing only by their length and obtained by removing some rightmost building blocks from a chain of maximum length $L_{\max}$. Denoting by $\Pi_\ell^S$ a polymer chain of length $\ell$ and sequence $S\{\ell\}$ or subsequence $S\{L,\ell\}$, both having $\ell$ elements with $L>\ell$, the family of polymer chains obtained from a chain of maximal length $L_{\max}$ and sequence $S\{L_{\max}\}$ is given by $\{\Pi_\ell^S\}_{\ell=1,2,\ldots,L_{\max}}$.

In the specific case of XNA/polynucleotide chains entering our model, we use $\Pi=R$ and the sequences are generically labeled as $\alpha\{\ell\}$. Their elements correspond to types of codons, and the complementary codon sequences in the sense of nucleic acids complementarity are $\bar{\alpha}\{\ell\}$. Therefore, a large class of XNA strands of length $\ell$ and sequence $\alpha\{\ell\}$ are denoted by $R_\ell^\alpha$, and in particular, $R_1^{\alpha_1}$ is a codon of type $\alpha_1$. Besides the generic sequences $\alpha\{\ell\}$ introduced above, a sequence denoted $\pi\{L_{\max}\}$, together with its subsequences $\pi\{L_{\max},\ell\}$ and $\hat{\pi}\{L_{\max},\ell\}$ for $\ell=1,\ldots,L_{\max}$, play a specific role: they correspond to polynucleotide chains that template the polymerization of a family of primordial peptide polymerases (p-Pol) through a process described in the next subsection, see also figure 3. Using $\Pi=P$ to denote polypeptide chains, this family of polymerases derived from $P_{L_{\max}}$ of maximal length $L_{\max}$, is $\{P_\ell^\pi\}_{\ell=2,\ldots,L_{\max}}$. These polymerases are such that $P_\ell^\pi=P_{\ell-1}^\pi+P_1^{\pi_\ell}$, with $P_1^{\pi_\ell}$ an amino acid $\pi_\ell$. We use the notation $P^\pi$ for a generic polymerase in the family. Alongside these polymerases, generic polypeptide chains of length $\ell$ and sequence $\alpha\{\ell\}$ are labeled as $P_\ell^\alpha$. Proteins of length 1, $P_1^{\alpha_1}$, are single amino acids of type $\alpha_1$.

Fig. 3.

Mechanism (B): Polypeptide polymerization in our model. The square boxes represent the codons of a polynucleotide chain (here, of length $L = 4$) and the circles represent amino acids. The p-tRNA molecules are labeled $T_1,\ldots,T_4$.

RNA–Protein Replication Scenario

The scenario relies on three types of mechanisms:

  (A) The spontaneous polymerization of polynucleotide and polypeptide chains, assumed to occur at a very slow rate, and their depolymerization through being cleaved in two anywhere along the chains at a rate independent of where the cut occurs.

  (B) The nonspontaneous polypeptide polymerization occurring through a polynucleotide chain $R_L^S$ on which several p-tRNA molecules loaded with an amino acid dock and progressively build the polypeptide chain. More precisely, each codon of type $s$ of the polynucleotide chain binds with a p-tRNA, itself linked to an amino acid of type $s$. Note that we assume the same number $n$ of types of codons and amino acids. This leads to a chain of amino acids matching the codon sequence $S\{L\}$ of the polynucleotide chain. The process is illustrated in figure 3 for a polypeptide chain of length $L = 4$ and amino acid sequence $S\{4\}=(s_1,s_2,s_3,s_4)$.

  (C) The duplication of a polynucleotide chain $R_L^S$, of length $L\ge\pi_{\min}$, as a two-step process. In the first step, a polypeptide polymerase $P^\pi$, obtained by polymerization via Mechanism (B) using a polynucleotide $R_L^\pi$, scans the polynucleotide chain $R_L^S$ to generate its complementary polynucleotide chain $R_L^{\bar S}$. This is shown in figure 4. The resulting polynucleotide chain $R_L^{\bar S}$ is then used to generate a copy of the original polynucleotide chain $R_L^S$ via the same Mechanism (C).

Fig. 4.

First phase of Mechanism (C): Polymerization of the complementary polynucleotide chain $R_L^{\bar S}$ catalyzed by a primordial polymerase $P^\pi$.

The replicator crudely operates as follows:

  • Mechanism (A) provides a small pool of polymer chains; among them, one finds short strands of XNA with dual function (p-mRNA and p-Rib)

  • Mechanism (B) provides polypeptide chains, including the polymerases (p-Pol, called $P^\pi$ here), by using the XNA produced through Mechanism (A) and Mechanism (C)

  • $P^\pi$ are involved, through Mechanism (C), in the duplication of polynucleotides present in the environment, including the strands of XNA that participate in the very production of $P^\pi$

Reactions Driving the Replication and Physical Parameters

For simplicity, we consider the polymerization of polypeptide chains and the duplication of polynucleotide chains as single reactions where the reaction rates take into account all subprocesses as well as failure rates.

This leads to the following schematic reactions:

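Mechanism (A)
$$R_L^S + R_1^{s_{L+1}} \rightarrow R_{L+1}^S, \tag{1}$$
$$R_L^S \rightarrow R_{L-\ell}^S + R_\ell^{\hat S},\qquad \ell=1,\ldots,L-1, \tag{2}$$
$$P_L^S + P_1^{s_{L+1}} \rightarrow P_{L+1}^S, \tag{3}$$
$$P_L^S \rightarrow P_{L-\ell}^S + P_\ell^{\hat S},\qquad \ell=1,\ldots,L-1. \tag{4}$$
Mechanism (B)
$$R_L^S + L\times T_{RP} \rightarrow R_L^S + P_L^S. \tag{5}$$
Mechanism (C)
$$R_L^S + L\times R_1 \xrightarrow{\,P^\pi\,} R_L^S + R_L^{\bar S}, \tag{6}$$

where $T_{RP}$ denotes p-tRNA loaded with a single amino acid.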
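As an illustration only, the sequence bookkeeping behind Mechanisms (B) and (C) can be sketched in Python. The pairing table `PAIR`, which says which codon type is complementary to which, is a hypothetical choice made for this example; the text only implies that complementing a strand twice returns the original sequence. Strand orientation is ignored here, which is a further simplifying assumption.

```python
# Hypothetical pairing of n = 4 codon types: 1<->2 and 3<->4 (assumption).
PAIR = {1: 2, 2: 1, 3: 4, 4: 3}

def translate(template):
    """Mechanism (B): a codon of type s recruits an amino acid of type s,
    so the peptide sequence equals the template sequence."""
    return tuple(template)

def complement(template):
    """First phase of Mechanism (C): build the complementary chain codon by codon."""
    return tuple(PAIR[s] for s in template)

def duplicate(template):
    """Both phases of Mechanism (C): complement, then complement the result."""
    minus_strand = complement(template)
    return complement(minus_strand)

plus_strand = (1, 4, 3, 1)
print(translate(plus_strand))                 # peptide sequence read off the template
print(complement(plus_strand))                # complementary (minus) strand
print(duplicate(plus_strand) == plus_strand)  # two rounds restore the plus strand
```

Any other involution on the codon types would serve equally well in place of `PAIR`; the sketch tracks sequences only and says nothing about rates.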

The parameters for these reactions are (see the Supplementary Material online for more details on the estimation of the parameter values):

  • $K_R^+$: polymerization rate of polynucleotide chains (eq. 1); we have estimated the catalyzed XNA polymerization rate to be $4.2\times10^{-7}\ \mathrm{mol^{-1}\,m^{3}\,s^{-1}}$.

  • $K_R^-$: depolymerization rate of polynucleotide chains (hydrolysis) (eq. 2); taken to be $8\times10^{-9}\ \mathrm{s^{-1}}$.

  • $K_P^+$: polymerization rate of polypeptide chains (eq. 3); we have estimated it to be $2.8\times10^{-21}\ \mathrm{mol^{-1}\,m^{3}\,s^{-1}}$.

  • $K^-_{P,S,L}$: depolymerization rate of polypeptide chains of length $L$ and sequence $S$ (eq. 4); we have estimated it to be in the range $4\times10^{-11}\ \mathrm{s^{-1}}$ to $5.1\times10^{-6}\ \mathrm{s^{-1}}$.

  • $k^+_{P,L}$: polymerization rate of a polypeptide of length $L$ from the corresponding polynucleotide chain (eq. 5). It is reasonable to assume that $k^+_{P,L}=k^+_{P,1}/L$ and we have estimated $k^+_{P,1}$ to be $0.1\ \mathrm{mol^{-1}\,m^{3}\,s^{-1}}$.

  • $Z$: the rate at which a polymerase attaches to a polynucleotide chain (eq. 6), which we have estimated to be $10^{6}\ \mathrm{mol^{-1}\,m^{3}\,s^{-1}}$.

  • $h_R$: the rate of attachment of a free nucleotide to a polynucleotide chain attached to a p-Pol (eq. 6). We have estimated it to be $10^{6}\ \mathrm{mol^{-1}\,m^{3}\,s^{-1}}$.

  • $k_{\mathrm{step}}$: the rate at which a polymerase moves by one step on the polynucleotide (eq. 6). We have estimated it to be in the range $2\times10^{2}\ \mathrm{s^{-1}}$ to $4\times10^{5}\ \mathrm{s^{-1}}$.

We now argue that the three parameters $Z$, $h_R$ and $k_{\mathrm{step}}$ enter the dynamical system for the polymer concentrations in our model as two physical combinations denoted $K(L)$ and $P_b$ that we describe below.

First recall that we assume the existence of a pool of nucleotides, amino acids and p-tRNA. The amount of free nucleotides and amino acids is taken to be the difference between the total amount of these molecules and the total amount of the corresponding polymerized material, ensuring total conservation.

We denote the concentration of polypeptide and polynucleotide chains respectively by $P_L^\alpha, P_L^\pi, P_L^{\bar\pi}$ and $R_L^\alpha, R_L^\pi, R_L^{\bar\pi}$, all expressed in $\mathrm{mol\,m^{-3}}$. In particular, $P_1$ and $R_1$ are the concentrations of each type of free amino acids and nucleotides respectively, and we assume, for simplicity, that all types of amino acids/codons are equally available.

We also assume that the amount of loaded p-tRNA, $C_{\text{p-tRNA}}$, remains proportional to the amount of free amino acids and that the concentration of p-tRNA is larger than $P_1$ so that most amino acids are loaded on a p-tRNA. With these conventions, one has

$$C_{\text{p-tRNA}}=k_t P_1 \quad\text{with}\quad k_t\approx 1. \tag{7}$$

Total Reaction Rate K(L) of Polynucleotide Polymerization

If a complex reaction is the result of one event at rate K, and m other, identical, events at rate k, the average time to complete the reaction is the sum of the average times for each event. Hence the reaction rate is given by

$$\tilde{K}(K,k,m)=\left(\frac{1}{K}+\frac{m}{k}\right)^{-1}=\frac{Kk}{mK+k}. \tag{8}$$
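A minimal numerical check of this rate composition, written as a sketch in Python (the function name `combined_rate` is ours): the mean completion time is the sum of the mean waiting times of the individual events, and the effective rate is its inverse.

```python
def combined_rate(K, k, m):
    """Effective rate of one event at rate K followed by m events at rate k (eq. 8)."""
    mean_time = 1.0 / K + m / k   # mean waiting times add
    return 1.0 / mean_time

K, k, m = 2.0, 4.0, 2
print(combined_rate(K, k, m))     # inverse of the summed waiting times
print(K * k / (m * K + k))        # closed form on the right-hand side of eq. 8
```

Both lines print the same value, and the result is always smaller than the slowest single rate involved, as expected for sequential steps.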

One such complex reaction in our model is the polymerization of a polynucleotide chain of length $L$, say, from its complementary chain (second phase of Mechanism (C)). Polymerases are characterized by the polymerizing efficiency which, we assume, increases with their length $\ell$, up to $L_{\max}$. The first step in polymerization requires a polymerase to attach itself to the template polynucleotide. This is only possible if the template polynucleotide has a minimum length, which we assume to be $\pi_{\min}$. In the following, we assume that polymerases can polymerize polynucleotide chains of any length greater than or equal to $\pi_{\min}$. The corresponding reaction rate is given by $ZP_\ell^\pi$ for a polymerase of length $\ell\ge\pi_{\min}$.

The free nucleotides must then attach themselves to the polynucleotide–polymerase complex and the polymerase must move one step along the polynucleotide. The rate for each of these L steps is

$$k_R^+=\frac{k_{\mathrm{step}}\,h_R R_1}{k_{\mathrm{step}}+h_R R_1}, \tag{9}$$

and hence, the rate of polymerization for a polynucleotide of length $L$ and polymerase of length $\ell$ is $\tilde{K}(ZP_\ell^\pi,k_R^+,L)$. However, it is assumed that polymerases of several lengths are available and therefore, the total rate is given by

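$$K(L)=\begin{cases}\displaystyle\sum_{\ell=\pi_{\min}}^{L_{\max}}\tilde{K}(ZP_\ell^\pi,k_R^+,L)\,W_\ell, & L\ge\pi_{\min}\\[1ex] 0, & L<\pi_{\min},\end{cases} \tag{10}$$

where it is understood that $\pi_{\min}$ is the lower bound length for polymerase activity and $W_\ell$ is a quality factor given by

$$W_\ell=\begin{cases}\dfrac{\ell-\pi_{\min}+1}{\pi_{\max}-\pi_{\min}+1}, & \pi_{\min}\le\ell\le\pi_{\max}\\[1ex] 1, & \pi_{\max}<\ell\le L_{\max}.\end{cases} \tag{11}$$

Indeed, we expect long polymerases to be more efficient, so $W_\ell$ is taken to increase with $\ell$ in the range $\pi_{\min}\le\ell\le\pi_{\max}$, while polymerases of length $\ell>\pi_{\max}$ have the same level of activity as those with length $\ell=\pi_{\max}$, that is, $W_{\ell>\pi_{\max}}=1$.

To avoid proliferation of parameters in our simulations, we have taken $\pi_{\max}=L_{\max}$, where $L_{\max}$ is the maximal polynucleotide chain’s length.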
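The step rate, the quality factor and the total rate of eqs. 9 to 11 can be put together in a short Python sketch. Function names and the numerical inputs are illustrative assumptions rather than the parameter set used in the simulations; `P` is assumed to map each polymerase length to its concentration.

```python
def combined_rate(K, k, m):
    """Eq. 8: one event at rate K followed by m identical events at rate k."""
    return 1.0 / (1.0 / K + m / k)

def step_rate(k_step, h_R, R1):
    """Eq. 9: nucleotide attachment (rate h_R * R1) followed by one polymerase step."""
    return k_step * h_R * R1 / (k_step + h_R * R1)

def quality(l, pi_min, pi_max):
    """Eq. 11: rises linearly from pi_min to pi_max, then stays at 1."""
    if l < pi_min:
        raise ValueError("polymerases shorter than pi_min are inactive")
    return min(1.0, (l - pi_min + 1) / (pi_max - pi_min + 1))

def total_rate(L, P, Z, k_R, pi_min, pi_max):
    """Eq. 10: sum over the available polymerase lengths l >= pi_min."""
    if L < pi_min:
        return 0.0
    total = 0.0
    for l, conc in P.items():
        if l < pi_min or conc <= 0.0:
            continue  # inactive or absent polymerases contribute nothing
        total += combined_rate(Z * conc, k_R, L) * quality(l, pi_min, pi_max)
    return total

# Illustrative numbers only (assumptions, not estimates from the text).
k_R = step_rate(k_step=100.0, h_R=1e6, R1=1e-3)
print(total_rate(L=4, P={3: 1e-6, 4: 2e-6}, Z=1e6, k_R=k_R, pi_min=3, pi_max=5))
```

Skipping zero concentrations avoids a division by zero in `combined_rate` and matches the fact that an absent polymerase cannot contribute to the sum.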

Binding Probability Pb of a Polynucleotide and a Polymerase of Length L

First note that it takes $L$ times longer to synthesize a polypeptide chain of length $L$ from its corresponding polynucleotide chain than it takes for one amino acid to bind itself to the polynucleotide. The rate is thus given by $k^+_{P,L}P_1=(k^+_{P,1}/L)P_1$.

We now offer some considerations on depolymerization. We assume that if a polymer $\Pi_L^S$ depolymerizes, it does so by (potentially consecutive) cleavings. In the first step, $\Pi_L^S$ can cleave in $L-1$ different positions, resulting in two smaller chains $L_1$, $L_2$ with $L=L_1+L_2$ and $1\le L_{1,2}\le L-1$. This is the origin of the factor $(L-1)$ in the terms describing the depolymerization of polymer chains in the dynamical systems equations presented in the next subsection.
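The origin of the factor $(L-1)$ can be seen by enumerating the cuts explicitly. The following Python sketch (the helper name `cleavage_products` is ours) lists every pair of fragments produced by a single cleavage of a chain.

```python
def cleavage_products(chain):
    """Return every (left, right) pair obtained by cutting `chain` once."""
    return [(chain[:cut], chain[cut:]) for cut in range(1, len(chain))]

chain = (1, 4, 3, 1, 3)  # example sequence of length L = 5
for left, right in cleavage_products(chain):
    print(left, right)   # fragment lengths L1 and L2 always add up to L

print(len(cleavage_products(chain)))  # number of distinct cut positions, L - 1
```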

The concentration variations resulting from such depolymerizations must be carefully evaluated. A polymer $\Pi_L^S$ of length $L$ and sequence $S$, where $S$ stands for any of $\alpha$, $\pi$ or $\bar\pi$, can be obtained by cleaving a polymer $\Pi_\ell^{\tilde S}$ of length $\ell>L$ and sequence $\tilde S=(S,T)$ where $T$ is a sequence of length $\ell-L$. Similarly it can be obtained by cleaving $\Pi_\ell^{\tilde S'}$ of sequence $\tilde S'=(T',S)$ where $T'$ is also of length $\ell-L$. If the rate of cleaving, $K_\Pi^-$, is assumed to be independent of the polymer length, and since there are $n^{\ell-L}$ different sequences $T$ and $T'$, where $n$ is the number of amino acid or codon types, the rate of concentration variation of polymers of length $L$ resulting from the depolymerization of longer polymers is

$$\sum_{\ell=L+1}^{L_{\max}}n^{\ell-L}K_\Pi^-\,\Pi_\ell^{\tilde S}+\sum_{\ell=L+1}^{L_{\max}}n^{\ell-L}K_\Pi^-\,\Pi_\ell^{\tilde S'}. \tag{12}$$

Recall that we use the same notation for the concentration of a polymer of sequence $S$ and length $L$ and the polymer itself, namely $\Pi_L^S$, and $\Pi$ is supposed to be set to $\Pi=P$ or $\Pi=R$ in our model. As already stressed, we assume polymers have at most length $L_{\max}$. Finally, when the concentrations $\Pi_\ell^{\tilde S}$ and $\Pi_\ell^{\tilde S'}$ are equal, (eq. 12) can be rewritten as

$$2\sum_{\ell=L+1}^{L_{\max}}n^{\ell-L}K_\Pi^-\,\Pi_\ell^{\tilde S}. \tag{13}$$
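A small Python sketch of this gain term, under the simplifying assumption stated in the text that all sequences of a given length share the same concentration. The names `gain_from_longer` and `conc` are ours; `conc` is assumed to map each length to the concentration of one sequence of that length.

```python
def gain_from_longer(L, conc, n, K_minus):
    """Eq. 13: production rate of a given length-L polymer by cleavage of
    longer chains, assuming equal concentrations for the two parent families."""
    return 2.0 * sum(n ** (l - L) * K_minus * c for l, c in conc.items() if l > L)

# Illustrative values (assumptions): n = 2 building-block types, L_max = 4.
conc = {2: 1.0, 3: 0.5, 4: 0.25}
print(gain_from_longer(L=2, conc=conc, n=2, K_minus=1.0))
```

The weight `n ** (l - L)` counts the possible extensions T of length l - L, so longer parents contribute through more sequence variants even when each variant is rarer.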

The depolymerization of polymerase $P_L^\pi$ requires special treatment. When $P_L^\pi$ depolymerizes, it generates a polymerase $P_\ell^\pi$ with $\ell<L$. On the other hand, any $P_L^\pi$ can be obtained through depolymerization of one of $2n$ types of polymers of length $L+1$, one of which being $P_{L+1}^\pi$ and the remaining $2n-1$ being of type $P_{L+1}^\alpha$ with $\alpha\{L+1\}=(\pi\{L\},\alpha_{L+1})$, $\alpha_{L+1}\neq\pi_{L+1}$, or $\alpha\{L+1\}=(\alpha_1,\pi\{L\})$ with $\alpha_1$ any of the $n$ types of amino acids. More generally, they can be obtained from $P_{L+\ell}^\pi$ and $2n^{\ell}-1$ polymers of type $P_{L+\ell}^\alpha$ where $\ell\ge1$ and $\alpha\{L+\ell\}=(\pi\{L\},\alpha_{L+1},\ldots,\alpha_{L+\ell})$ with $\alpha_j\neq\pi_j$, $j=L+1,\ldots,L+\ell$, or $\alpha\{L+\ell\}=(\alpha_1,\ldots,\alpha_\ell,\pi\{L\})$ for any type $\alpha_j$, $j=1,\ldots,\ell$. The same is true for the corresponding polynucleotide chains.

When the polymerase is bound to a polynucleotide, it becomes more stable either through induced folding of a (partially) unfolded sequence, or through the inaccessibility of bound portions, or both. We thus define $F_\pi(\ell)$ as the depolymerization reduction coefficient for the bound polymerase of length $\ell$, with that reduction coefficient being 1 when no depolymerization occurs at all. We estimate it to be

$$F_\pi(\ell) = \begin{cases} 1 - e^{-(\ell - \ell_{\pi\min} + 1)/\lambda} & \ell \ge \ell_{\pi\min}, \\ 0 & \ell < \ell_{\pi\min}, \end{cases} \tag{14}$$

with $\lambda > 0$ a parameter controlling how much of the polymerase is stabilized. The term $(\ell - \ell_{\pi\min} + 1)/\lambda$ can be interpreted as a Boltzmann factor with a free energy expressed in units of $k_B T$. The hydrogen bond binding energy between RNA and a polypeptide is ∼16 kJ/mol (Dixit et al. 2000), so assuming that the number of such hydrogen bonds between the polymerase and the polynucleotide is $\ell - \ell_{\pi\min} + 1$, one has $\lambda \approx 0.15$.
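The estimate of λ can be checked with a few lines of arithmetic. The sketch below assumes a temperature of 298.15 K, which is not stated in the text, and takes λ as the thermal energy per mole divided by the hydrogen bond energy, so that each bond contributes 1/λ to the exponent. It also assumes that the reduction coefficient is one minus a decaying exponential above the minimum polymerase length and zero below it. The names `boltzmann_lambda` and `f_pi` are hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def boltzmann_lambda(bond_energy_j_per_mol, temperature_k):
    """Thermal energy RT expressed in units of one hydrogen bond energy."""
    return R_GAS * temperature_k / bond_energy_j_per_mol


def f_pi(length, l_pi_min, lam):
    """Assumed form of the depolymerization reduction coefficient.

    Returns 0 below the minimum polymerase length and approaches 1
    (full protection while bound) as the number of bonds grows.
    """
    if length < l_pi_min:
        return 0.0
    return 1.0 - math.exp(-(length - l_pi_min + 1) / lam)


# 16 kJ/mol per hydrogen bond (Dixit et al. 2000); T = 298.15 K is an assumption.
lam = boltzmann_lambda(16e3, 298.15)
print(round(lam, 3))
print(f_pi(7, 7, lam), f_pi(6, 7, lam))
```

With these assumptions λ comes out close to 0.15, and even a single bond already gives a reduction coefficient very near 1.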

The binding rate of a polymerase to a polynucleotide $R_M^\alpha$ of length $M$ and sequence $\alpha$ is $k_{b,M} = Z R_M^\alpha n^M$, where $n^M$ is the total number of polynucleotides of length $M$. The probability that a polymerase of length $L$ binds to a polynucleotide of length $M$ is therefore given by

$$\tilde P_{b,M} = \frac{k_{b,M}}{\sum_{m=2}^{L_{\max}} k_{b,m}}. \tag{15}$$

The total time the polymerase remains bound to a polynucleotide of length $M$ is estimated to be $M/k_R^+$. Therefore the probability $P_b$ for a polymerase to be bound is given by the average binding time divided by the sum of the average binding time and the average time needed to bind:

$$P_b = \frac{\sum_{M=2}^{L_{\max}} (M/k_R^+)\,\tilde P_{b,M}}{\sum_{M=2}^{L_{\max}} \left((M/k_R^+)\,\tilde P_{b,M}\right) + 1/\sum_{m=2}^{L_{\max}} k_{b,m}}. \tag{16}$$

As a result the polymerase depolymerization rate will be

$$K_{P,\alpha,L}^- = K_P^-, \qquad K_{P,\bar\pi,L}^- = K_P^-, \qquad K_{P,\pi,L}^- = K_P^- \left(1 - P_b F_\pi(L)\right). \tag{17}$$
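The chain of reasoning in (eqs. 15–17) can be sketched as follows, assuming that the bound fraction is the mean bound time divided by the sum of the mean bound time and the mean waiting time before binding, and that the bound polymerase cleaves at the free rate multiplied by one minus the product of the bound fraction and the reduction coefficient. The binding rates are passed in directly, so no particular expression for them is assumed. The function names and the numbers in the usage lines are hypothetical illustrations.

```python
def bound_fraction(k_b, k_r_plus):
    """Probability that a polymerase is bound to some polynucleotide.

    k_b maps each polynucleotide length M to the binding rate to chains of
    that length; M / k_r_plus is the estimated time spent on a chain of
    length M.
    """
    k_total = sum(k_b.values())
    if k_total == 0.0:
        return 0.0  # nothing to bind to
    # Average bound time, weighted by the probability of binding each length.
    mean_bound = sum((M / k_r_plus) * (rate / k_total) for M, rate in k_b.items())
    mean_wait = 1.0 / k_total  # average time needed to bind
    return mean_bound / (mean_bound + mean_wait)


def polymerase_cleavage_rate(K_P_minus, P_b, F_pi_L):
    """Effective depolymerization rate of the polymerase sequence."""
    return K_P_minus * (1.0 - P_b * F_pi_L)


# Illustrative rates only (assumptions, not fitted values).
P_b = bound_fraction({2: 1.0, 3: 1.0}, k_r_plus=1.0)
print(P_b, polymerase_cleavage_rate(4e-11, P_b, 0.9987))
```

In the two limits, a polymerase that is never bound cleaves at the free rate, and one that is always bound with a reduction coefficient of 1 does not cleave at all.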

Equations

For any chain of length $\ell$, our model considers the concentrations of polynucleotides and polypeptides corresponding to the polymerase sequence $\pi$, its complementary sequence $\bar\pi$ and the generic sequences $\alpha$. We assume that the concentrations of polynucleotides and polypeptides of a specific length, bar the polymerase and its complementary sequence, are identical. For the chains that share the first elements of their sequence with those of the polymerase (or its complementary chain), and differ in all other elements, this is only an approximation, but it is nevertheless justified, as the concentrations of these polymers only differ slightly from those of polymers with sequences of type $\alpha$, and their contribution to the variation of the polymerase concentration is expected to be small.

The variations in polymer concentrations as time evolves are governed in our model by a system of ordinary differential equations. In the equations, $L$ is the length of the polymer chains, spanning all values in the range $1 < L \le L_{\max}$, where $L_{\max}$ is the maximal length of polypeptide and polynucleotide chains. We thus have a system of $6 \times (L_{\max} - 1)$ equations. We recall that $n$ is the number of codon types, assumed to be equal to the number of amino acid types.

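$$
\begin{aligned}
\frac{dR_L^{\pi}}{dt} &= K_R^+ R_1 R_{L-1}^{\pi} - n K_R^+ R_1 R_L^{\pi} + \sum_{\ell=L+1}^{L_{\max}} \left[ K_R^- R_\ell^{\pi} + (2n^{\ell-L} - 1) K_R^- R_\ell^{\alpha} \right] - (L-1) K_R^- R_L^{\pi} + K(L) R_L^{\bar\pi}, \\
\frac{dR_L^{\bar\pi}}{dt} &= K_R^+ R_1 R_{L-1}^{\bar\pi} - n K_R^+ R_1 R_L^{\bar\pi} + \sum_{\ell=L+1}^{L_{\max}} \left[ K_R^- R_\ell^{\bar\pi} + (2n^{\ell-L} - 1) K_R^- R_\ell^{\alpha} \right] - (L-1) K_R^- R_L^{\bar\pi} + K(L) R_L^{\pi}, \\
\frac{dR_L^{\alpha}}{dt} &= K_R^+ R_1 R_{L-1}^{\alpha} - n K_R^+ R_1 R_L^{\alpha} + 2 \sum_{\ell=L+1}^{L_{\max}} n^{\ell-L} K_R^- R_\ell^{\alpha} - (L-1) K_R^- R_L^{\alpha} + K(L) R_L^{\alpha}, \\
\frac{dP_L^{\pi}}{dt} &= K_P^+ P_1 P_{L-1}^{\pi} - n K_P^+ P_1 P_L^{\pi} + \sum_{\ell=L+1}^{L_{\max}} \left[ K_P^- \left(1 - P_b F_\pi(L)\right) P_\ell^{\pi} + (2n^{\ell-L} - 1) K_P^- P_\ell^{\alpha} \right] - (L-1) K_P^- \left(1 - P_b F_\pi(L)\right) P_L^{\pi} + k_{P,L}^+ P_1 R_L^{\pi}, \\
\frac{dP_L^{\bar\pi}}{dt} &= K_P^+ P_1 P_{L-1}^{\bar\pi} - n K_P^+ P_1 P_L^{\bar\pi} + \sum_{\ell=L+1}^{L_{\max}} \left[ K_P^- P_\ell^{\bar\pi} + (2n^{\ell-L} - 1) K_P^- P_\ell^{\alpha} \right] - (L-1) K_P^- P_L^{\bar\pi} + k_{P,L}^+ P_1 R_L^{\bar\pi}, \\
\frac{dP_L^{\alpha}}{dt} &= K_P^+ P_1 P_{L-1}^{\alpha} - n K_P^+ P_1 P_L^{\alpha} + 2 \sum_{\ell=L+1}^{L_{\max}} n^{\ell-L} K_P^- P_\ell^{\alpha} - (L-1) K_P^- P_L^{\alpha} + k_{P,L}^+ P_1 R_L^{\alpha}.
\end{aligned} \tag{18}
$$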

Alongside the seven physical parameters $\{K_R^\pm, K_P^\pm, k_{P,L}^+, K(L), P_b\}$ appearing in the differential equations above, we need to consider two parameters yielding the “initial” concentrations of amino acid and nucleotide inside the system, namely $\rho_p \equiv P_1(t=0)$ and $\rho_r \equiv R_1(t=0)$. In the absence of actual data for these quantities, we explore a range of realistic values in the analysis of our model. The concentration of free amino acids and nucleotides at any one time is then given by $P_1(t) = \rho_p - \sum_{L=2}^{L_{\max}} \left[(n^L - 2) P_L^\alpha(t) + P_L^\pi(t) + P_L^{\bar\pi}(t)\right]$ and $R_1(t) = \rho_r - \sum_{L=2}^{L_{\max}} \left[(n^L - 2) R_L^\alpha(t) + R_L^\pi(t) + R_L^{\bar\pi}(t)\right]$ respectively, with $P_L^S(0) = R_L^S(0) = 0$ for any value of $L$ in the range $2 \le L \le L_{\max}$ and sequence $S = \alpha, \pi, \bar\pi$.
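The bookkeeping for the free monomer concentration can be sketched as below. The sketch follows the expression as stated in the text, in which each length $L$ contributes $n^L - 2$ generic sequences plus the polymerase sequence and its complement; the function name `free_monomer` and the concentrations in the usage lines are hypothetical.

```python
def free_monomer(rho0, generic, pol, comp, n):
    """Free monomer concentration from the initial value and polymer pools.

    generic, pol and comp map each chain length L >= 2 to the concentration
    of one generic chain, of the polymerase sequence and of its complement.
    There are n**L sequences of length L, two of which are pi and pi-bar.
    """
    bound = sum(
        (n ** L - 2) * generic[L] + pol[L] + comp[L]
        for L in generic
    )
    return rho0 - bound


# Illustrative concentrations only (assumptions, not model output).
p1 = free_monomer(1e-3, generic={2: 1e-6}, pol={2: 2e-6}, comp={2: 3e-6}, n=4)
print(p1)
```

At $t = 0$ all polymer pools are empty, so the function returns the initial concentration unchanged.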

Results

The system of equations (eq. 18) is nonlinear and too complex to solve analytically. We therefore analyze it numerically, starting from a system made entirely of free nucleotides, amino acids, as well as charged p-tRNA, and letting the system evolve until it settles into a steady configuration.

The main quantities of interest are the relative concentrations of the polymerase ($\rho_\pi$) and of the $\alpha$ peptide chains ($\rho_\alpha$). We have

$$\rho_\pi = \sum_{\ell=\ell_{\pi\min}}^{L_{\max}} P_\ell^\pi \quad \text{and} \quad \rho_\alpha = \sum_{\ell=\ell_{\pi\min}}^{L_{\max}} P_\ell^\alpha, \tag{19}$$

and evaluate the ratios

$$Q_1 = \frac{\rho_\pi}{\rho_\alpha} \quad \text{and} \quad Q_{2,\ell} = \frac{P_\ell^\pi}{P_\ell^\alpha}, \tag{20}$$

while monitoring the evolution of each quantity over time. $Q_1$ corresponds to the relative amount of polymerase of any length compared with other proteins (for a specific arbitrary sequence $\alpha$), while $Q_{2,\ell}$ corresponds to the relative amount of polymerase of length $\ell$ compared with an arbitrary protein of length $\ell$. Unit ratios indicate that the polymerase has not been selected at all, whereas large values of $Q_1$ or $Q_{2,\ell}$ indicate a good selection of the polymerase.
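The two selection ratios are straightforward to compute from a set of concentrations. The sketch below assumes that both sums run from the minimum polymerase length to the maximum chain length; the function name `selection_ratios` and the concentrations in the usage lines are hypothetical and purely illustrative.

```python
def selection_ratios(P_pi, P_alpha, l_pi_min, L_max):
    """Return (Q1, Q2) for polymerase versus generic peptide concentrations.

    P_pi and P_alpha map chain length to concentration. Q1 compares the
    summed concentrations; Q2 is a dict of per-length ratios.
    """
    lengths = range(l_pi_min, L_max + 1)
    rho_pi = sum(P_pi[l] for l in lengths)
    rho_alpha = sum(P_alpha[l] for l in lengths)
    Q1 = rho_pi / rho_alpha
    Q2 = {l: P_pi[l] / P_alpha[l] for l in lengths}
    return Q1, Q2


# Illustrative concentrations only (arbitrary units, not model output).
Q1, Q2 = selection_ratios({7: 8.0, 8: 2.0}, {7: 1.0, 8: 1.0}, l_pi_min=7, L_max=8)
print(Q1, Q2)
```

Equal concentrations of polymerase and generic chains give unit ratios, the case of no selection.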

The complexity of the system (eq. 18) also lies in the number of free parameters it involves. A systematic analysis of the high-dimensional parameter space is beyond the scope of this article, and we therefore concentrate on the analysis and description of results for a selection of parameter values that highlight potentially interesting behaviors of our model.

Recall that our model assumes that the number $n$ of different amino acids is equal to the number of codon types, and throughout our numerical work we have set $n = 4$. Note that the word “codon” here is used by extension. Indeed, there are four different nucleic acids in our model and the “biological” codons are made of two nucleic acids, bringing their number to sixteen. However, they split into four groups of four, each of which encodes one of the four amino acids. From a mathematical modeling point of view, this is completely equivalent. It is well accepted that early proteins were produced using a reduced set of amino acids (Angyan et al. 2014). The exact identity and number are unclear, though experimental work has shown that protein domains can be made using predominantly five amino acids (Riddle et al. 1997) whereas the helices of a four-alpha helix bundle were made using only four amino acids (Regan and DeGrado 1988). We have used mostly $\ell_{\pi\min} = 7$ and $\ell_{\pi\max} = L_{\max} = 10$, but have investigated other values as well (see the Supplementary Material online).

While these figures are somewhat arbitrary, an $\ell_{\pi\min}$ of 7 was chosen on the assumption that the functional p-Pol would have some form of stable structural motif, and this number corresponds to the typical minimum number of amino acids required to produce a stable, folded alpha helix structure (Manning et al. 1988). The choice of $L_{\max} = 10$ is based on the fact that while the polymer peptide chains could be significantly longer, they would need correspondingly long polynucleotide sequences to encode them, which becomes increasingly unlikely as lengths increase. Furthermore, we expected polymers of length 10 to have very low concentrations, a hypothesis confirmed by our simulations. We have nevertheless investigated larger values of $L_{\max}$ as well, and found little difference, as outlined below.

In a first step, guided by data on parameter values gleaned from the literature and gathered in the Supplementary Material online, we set

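$$
\begin{aligned}
K_R^+ &= 4.2\times10^{-7}\ \mathrm{mol^{-1}\,m^3\,s^{-1}}, & K_R^- &= 8\times10^{-9}\ \mathrm{s^{-1}}, \\
K_P^+ &= 2.8\times10^{-21}\ \mathrm{mol^{-1}\,m^3\,s^{-1}}, & K_P^- &= 4\times10^{-11}\ \mathrm{s^{-1}}, \\
k_{P,1}^+ &= 0.1\ \mathrm{mol^{-1}\,m^3\,s^{-1}}, & h_R &= 10^{6}\ \mathrm{mol^{-1}\,m^3\,s^{-1}}, \\
Z &= 10^{6}\ \mathrm{mol^{-1}\,m^3\,s^{-1}}, & \lambda &= 0.15, \\
k_{step} &= 4\times10^{-5}\ \mathrm{s^{-1}}.
\end{aligned} \tag{21}
$$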

We let the system evolve under a variety of initial concentrations of free amino acids and nucleotides, $\rho_p$ and $\rho_r$, in the range $10^{-5}$ to $0.1$ mol m$^{-3}$, and with all polymer concentrations set to 0. We monitored the concentration of all polymers, in particular the concentration of polymerase $\rho_\pi$ and its ratio to the concentration of $\alpha$ polypeptide chains, $Q_1$. In most cases we found that the nucleotides polymerized spontaneously (Mechanism (A)) in small amounts and this led, indirectly, to the polymerization of the polypeptides, including the polymerases (Mechanism (C)). The polymerases then induced further polymerization of the polynucleotides (Mechanism (B)) and the system slowly equilibrated.

The end result was an excess of polymerase of all lengths compared with $\alpha$ polypeptide chains, with $Q_1 = 786$ for all initial concentrations $\rho_p = \rho_r \ge 0.001$ mol m$^{-3}$ (fig. 5). Moreover, the total amount of polymerase reached, for initial concentration of free amino acids $\rho_p$, was a concentration of $4\times10^{-4} \times \rho_p$ (as illustrated by the bottom two rows in table 1). The concentration of polymerase of length 10, on the other hand, was very small, $P_{10}^\pi = 6.3\times10^{-14}$ mol m$^{-3}$, but $Q_{2,10} = 5.9\times10^{18}$ was very large, effectively showing that the only polypeptide chain of length $L_{\max} = 10$ was the polymerase.

Fig. 5.

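(a) Time evolution of the polymerase for initial concentration $\rho_r = \rho_p = 0.001$, $0.01$, and $0.1$ mol m$^{-3}$. (b) $Q_1$ for initial concentration $\rho_r = \rho_p = 0.01$ mol m$^{-3}$. Parameter values: $K_P^- = 4\times10^{-11}$ s$^{-1}$, $Z = 10^{6}$ mol$^{-1}$ m$^3$ s$^{-1}$, $\lambda = 0.15$.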

Table 1.

Effect of Initial Concentrations on Polymerase Production.

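| $\rho_p$ (mol m$^{-3}$) | $\rho_r$ (mol m$^{-3}$) | $\rho_\pi$ (mol m$^{-3}$) | $Q_1$ | Polymerase production |
| --- | --- | --- | --- | --- |
| $2\times10^{-4}$ | $2\times10^{-4}$ | $2.8\times10^{-19}$ | 1.0008 | Insignificant |
| $9\times10^{-4}$ | $9\times10^{-4}$ | $1.4\times10^{-14}$ | 12.4 | Insignificant |
| $10^{-3}$ | $10^{-3}$ | $3.9\times10^{-7}$ | 786 | Yes |
| $10^{-1}$ | $10^{-1}$ | $3.9\times10^{-5}$ | 786 | Yes |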

We found hardly any polymerization of the polymerase when $\rho_p = \rho_r = 0.0009$ mol m$^{-3}$, with $\rho_\pi \approx 1.4\times10^{-14}$ mol m$^{-3}$ and $Q_1 = 12.4$, whereas with $\rho_p = \rho_r = 0.001$ mol m$^{-3}$, we obtained $\rho_\pi \approx 3.9\times10^{-7}$ mol m$^{-3}$ and $Q_1 = 786$ (fig. 5a). This highlights a very sharp transition at a critical concentration $\rho_{p,c}$ above which polymerases are generated. We summarize the data in table 1.
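The sharpness of this transition, and the yield of roughly $4\times10^{-4} \times \rho_p$ quoted above, can be checked directly from the values in table 1. The short sketch below only restates those numbers; the variable names are hypothetical.

```python
# Rows of table 1: (initial concentration rho_p = rho_r, rho_pi, Q1),
# all concentrations in mol m^-3.
table1 = [
    (2e-4, 2.8e-19, 1.0008),
    (9e-4, 1.4e-14, 12.4),
    (1e-3, 3.9e-7, 786.0),
    (1e-1, 3.9e-5, 786.0),
]

# Fraction of the initial monomer concentration that ends up as polymerase.
yields = [rho_pi / rho0 for rho0, rho_pi, _ in table1]

# Jump in polymerase concentration across the critical concentration,
# i.e. between initial concentrations of 9e-4 and 1e-3 mol m^-3.
jump = table1[2][1] / table1[1][1]

print(yields[2], yields[3], jump)
```

Above the critical concentration the yield is the same for both rows, about $3.9\times10^{-4}$, while an increase of about 11% in the initial concentration across the threshold raises the polymerase concentration by more than seven orders of magnitude.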

We then fixed the initial concentration $\rho_p$ to four different values and varied $\rho_r$ to identify the critical initial concentration of nucleotides necessary for the production of polymerases. The results in table 2 show that the critical concentration $\rho_{r,c}$ is nearly constant and of the order of $10^{-3}$ mol m$^{-3}$ for a very wide range of amino acid initial concentrations.

Table 2.

Effect of Initial Peptide Concentration on Critical Concentration.

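| $\rho_p$ (mol m$^{-3}$) | $\rho_{r,c}$ (mol m$^{-3}$) |
| --- | --- |
| $10^{-4}$ | $2\times10^{-3}$ |
| $10^{-3}$ | $10^{-3}$ |
| $10^{-2}$ | $8\times10^{-4}$ |
| $10^{-1}$ | $7\times10^{-4}$ |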

Many of the parameters we have used were estimated or measured in conditions which, in all likelihood, were not identical to the ones existing when the polymerization we are modeling occurred. In a second step, we departed from the set of values (eq. 21) and found that in all cases investigated, varying these parameters modified the critical concentrations $\rho_{r,c}$ and $\rho_{p,c}$, but did not significantly affect the value of $Q_1$, while $Q_{2,10}$ remained extremely large.

More specifically, taking $K_P^- = 5.1\times10^{-6}$ s$^{-1}$ marginally increased the critical concentration to $\rho_{r,c} = \rho_{p,c} = 0.0011$ mol m$^{-3}$. Similarly, taking $k_{step} = 0.02$ s$^{-1}$ slightly increased the critical concentrations: $\rho_{r,c} = \rho_{p,c} = 0.0017$ mol m$^{-3}$. On the other hand, taking $Z = 10^{8}$ mol$^{-1}$ m$^3$ s$^{-1}$ led to a decrease of the critical concentrations: $\rho_{r,c} = \rho_{p,c} = 0.0005$ mol m$^{-3}$. Varying $h_R$ to values as small as 1 mol$^{-1}$ m$^3$ s$^{-1}$ did not change the critical concentrations.

In our model, we have considered the concentrations of free amino acids ($\rho_p \equiv P_1$) and charged p-tRNA to be identical: $k_t \equiv 1$ (see eq. 7). To consider other values of $k_t$ we only need to multiply the polymerization rate of a peptide ($k_{P,1}^+$) by $k_t$, as it is p-tRNAs that bind to XNA chains, not free amino acids. We have considered a large range of values for $k_{P,1}^+$ and found that for $k_{P,1}^+ = 10^{-5}$ mol$^{-1}$ m$^3$ s$^{-1}$, the critical concentrations had not changed significantly, while for $10^{-8}$ mol$^{-1}$ m$^3$ s$^{-1}$, they increased to $\rho_{r,c} = \rho_{p,c} = 0.002$ mol m$^{-3}$. This shows that taking much smaller values of $k_t$ has a very small impact on our results and that having a concentration of charged p-tRNA much smaller than that of free amino acids would only marginally increase the critical concentrations we have obtained using our original assumption.

The parameters to which the model is most sensitive are $K_R^+$ and $K_R^-$. We found that for $K_R^+ = 4\times10^{-8}$ mol$^{-1}$ m$^3$ s$^{-1}$, $\rho_{r,c} = \rho_{p,c} = 0.007$ mol m$^{-3}$ and for $K_R^+ = 4\times10^{-9}$ mol$^{-1}$ m$^3$ s$^{-1}$, $\rho_{r,c} = \rho_{p,c} = 0.05$ mol m$^{-3}$. Similarly, for $K_R^- = 10^{-7}$ s$^{-1}$ we found that $\rho_{r,c} = \rho_{p,c} = 0.01$ mol m$^{-3}$ and for $K_R^- = 10^{-6}$ s$^{-1}$ that $\rho_{r,c} = \rho_{p,c} \approx 0.05$ mol m$^{-3}$. This shows that the spontaneous polymerization of polynucleotides is essential to reach a minimum concentration of polynucleotides to kick-start the whole catalysis process and that the stability of the polynucleotides plays an important role.

To investigate this, we have run simulations with $K_R^+ = 4\times10^{-8}$ mol$^{-1}$ m$^3$ s$^{-1}$ for a fixed duration, $\tau_{pol}$, after which $K_R^+$ was set to 0. We found that if $\tau_{pol}$ was long enough, the polymerization of polypeptide and polynucleotide chains was identical to that obtained when $K_R^+$ was not modified. When $\tau_{pol}$ was too short, on the other hand, one was only left with short polypeptide and polynucleotide chains in an equilibrium controlled by the spontaneous polymerization and depolymerization parameters. The minimum value for $\tau_{pol}$ depends on the concentrations $\rho_r$ and $\rho_p$ and the results are given in table 3.

Table 3.

Effect of Initial Concentration of Free Nucleotides on Time for Production of Polymerase.

| $\rho_r = \rho_p$ (mol m$^{-3}$) | $\tau_{pol}$ (years) |
| --- | --- |
| 0.001 | 18,000 |
| 0.002 | 254 |
| 0.005 | 12.7 |
| 0.01 | 2.2 |
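To give a feel for how steeply the minimum duration falls with concentration, the sketch below computes the local log-log slope between consecutive rows of table 3. This is a reader's arithmetic check on the tabulated values, not an analysis from the model itself, and the variable names are hypothetical.

```python
import math

# Rows of table 3: (rho_r = rho_p in mol m^-3, tau_pol in years).
table3 = [(0.001, 18000.0), (0.002, 254.0), (0.005, 12.7), (0.01, 2.2)]

# Local exponent s between neighbouring rows, as if tau ~ rho**s locally.
slopes = [
    math.log(t2 / t1) / math.log(r2 / r1)
    for (r1, t1), (r2, t2) in zip(table3, table3[1:])
]

print([round(s, 2) for s in slopes])
```

The slopes are all negative and largest in magnitude between the first two rows, which is consistent with the required duration growing very rapidly as the concentration approaches its critical value.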

This shows that while $K_R^+$ is an important parameter in the process, what matters is to have a spontaneous generation of polynucleotides at the onset (Mechanism (A)). This then leads to the production of polypeptides, including polymerase (Mechanism (C)) and, once the concentration of polymerase is large enough, the catalyzed production of polynucleotides (Mechanism (B)) dominates the spontaneous polymerization.

We have also varied $K_R^-$ once the system had settled and we found that for $\rho_r = \rho_p = 0.01$ mol m$^{-3}$, $K_R^-$ could be increased up to $6\times10^{-7}$ s$^{-1}$ while still keeping a large amount of polymerase. Above that value, the polynucleotides are too unstable and one ends up again with mostly short polymer chains and $Q_1 \approx 1$.

We have also considered values of $L_{\max} > 10$ and found that the main difference is a slight increase of the critical concentrations. For example, for $L_{\max} = 11$, $12$ and $15$, $\rho_{r,c} = \rho_{p,c}$ are respectively equal to $0.001$, $0.0011$, and $0.0011$ mol m$^{-3}$. At given concentrations $Q_1$ and $\rho_\pi$ remain unchanged but $P_{L_{\max}}^\pi$ decreases approximately by a factor of 40 each time $L_{\max}$ is increased by 1 unit.

We have also taken $L_{\pi\min} = 4$, $5$, and $6$ and found that the critical concentrations were respectively $2\times10^{-5}$, $2\times10^{-4}$, and $4\times10^{-4}$ mol m$^{-3}$, whereas $\rho_\pi$ took the values of ∼0.012, $2.6\times10^{-3}$, and $3\times10^{-4}$ mol m$^{-3}$. $Q_1$ on the other hand remained constant.

A summary of the parameter values investigated outside the set (eq. 21) and the corresponding critical concentrations is given in table 4. Only one parameter was changed at a time (see the Supplementary Material online).

Table 4.

Effect of Various Parameters on Initial Critical Concentrations.

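| Modified parameter | $\rho_{r,c} = \rho_{p,c}$ (mol m$^{-3}$) |
| --- | --- |
| $K_P^- = 5.1\times10^{-6}$ s$^{-1}$ | $1.1\times10^{-3}$ |
| $k_{step} = 2\times10^{-2}$ s$^{-1}$ | $1.7\times10^{-3}$ |
| $Z = 10^{8}$ mol$^{-1}$ m$^3$ s$^{-1}$ | $5\times10^{-4}$ |
| $h_R = 1$ mol$^{-1}$ m$^3$ s$^{-1}$ | $10^{-3}$ |
| $k_{P,1}^+ = 10^{-5}$ mol$^{-1}$ m$^3$ s$^{-1}$ | $10^{-3}$ |
| $k_{P,1}^+ = 10^{-8}$ mol$^{-1}$ m$^3$ s$^{-1}$ | $2\times10^{-3}$ |
| $L_{\max} = 15$ | $1.1\times10^{-3}$ |
| $L_{\pi\min} = 6$ | $4\times10^{-4}$ |
| $L_{\pi\min} = 5$ | $2\times10^{-4}$ |
| $L_{\pi\min} = 4$ | $2\times10^{-5}$ |
| $K_R^+ = 4\times10^{-8}$ mol$^{-1}$ m$^3$ s$^{-1}$ | $7\times10^{-3}$ |
| $K_R^+ = 4\times10^{-9}$ mol$^{-1}$ m$^3$ s$^{-1}$ | $5\times10^{-2}$ |
| $K_R^- = 10^{-7}$ s$^{-1}$ | $10^{-2}$ |
| $K_R^- = 10^{-6}$ s$^{-1}$ | 0.19 |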

Discussion

We describe a theoretical nucleopeptidic reciprocal replicator comprising a polynucleotide that templates the assembly of small p-tRNA adapter molecules, most likely having mixed backbone architectures. These spontaneously arising p-tRNAs would have been bound to various classes of amino acids (possibly via weak stereochemical specificity), and a simple increase in local concentration mediated by binding to the p-Rib (in its most primitive version nothing much more than a mixed backbone architecture p-mRNA) could have driven polypeptide polymerization. Once a template arose that coded for a peptide able to catalyze phosphodiester bond formation, this p-Rib could have templated assembly of its own complementary strand (and vice versa) and the self-replication cycle would have been complete (see fig. 6 for a summary).

Fig. 6.

The nucleopeptide Initial Darwinian Ancestor. In this cartoon model, a short strand of XNA has the functionality of both a primordial p-mRNA and a p-Rib. Primitive XNA molecules loaded with amino acids (p-tRNA) bind to the p-Rib via codon–anticodon pairing. This allows adjacent amino acids to undergo peptide bond formation and a short peptide chain is produced. A certain peptide sequence is able to act as a primordial XNA-dependent XNA polymerase (p-Pol) able to copy both + and – p-Rib strands to eventually produce a copy of the p-Rib(+) strand.

Starting from a single peptide and single polynucleotide, the IDA would quickly have become a distribution of related sequences of peptides and XNAs. We can imagine that over time, different p-Ribs encoding different peptides with additional functionalities could have appeared as the system evolved and that these p-Ribs may have subsequently fused together into larger molecules.

If we imagine the IDA swiftly becoming a pool of molecules in which variety within the “species” is maintained by the poor copying fidelity of a statistical operational code, then should any mutation that stops replication arise, the other molecules in the pool would still function, ensuring continuity of the whole. Indeed this could have provided a selective pressure for superior replicators. While our model does not directly consider less than perfect copying fidelity, imperfect copying is not expected to have a major effect on our conclusions, as copies with decreased performance would not be maintained as a significant proportion of the population and copies with increased performance would simply take over the role of main replicator.

The primordial operational code may only have required two bases per p-tRNA to deliver statistical proteins, while the catalytic requirements of the p-Pol are loose enough that a seven-residue peptide is a plausible lower length limit. This reduces the minimum length of the posited spontaneously arising p-Rib to just 14 nucleotides (assuming no spaces between codons). This is an optimistic length estimate, but given the available time and with molecular co-evolution, inorganic catalysts and geological PCR, considerably longer molecules may have been possible (Baaske et al. 2007; Fishkis 2011). These p-Pols would act on p-Ribs and the crucial abiogenesis step would be the emergence of a 14-mer XNA that, in the context of the primordial operational code, happened to code for a peptide able to bind XNA and catalyze phosphodiester bond formation of base-paired nucleotides. Although the concentrations of various components are not known with certainty this does not seem an unreasonable proposition particularly given that functional peptides are known to occur in random sequences with surprising frequency (Keefe and Szostak 2001).
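The length argument above reduces to simple counting, sketched below. The sketch assumes, as in the numerical work, four nucleotide types and four amino acid types, with the sixteen two-base codons split evenly into four groups of four. The fraction computed at the end further assumes that all 14-mers are equally likely, which is an idealization introduced here for illustration. All variable names are hypothetical.

```python
N_BASES = 4            # nucleotide types
N_AMINO_ACIDS = 4      # amino acid types (n = 4 in the numerical work)
BASES_PER_CODON = 2    # two bases per p-tRNA in the primordial code
MIN_PEPTIDE_LEN = 7    # plausible lower length limit for the p-Pol

# Minimum template length, assuming no spaces between codons.
template_length = MIN_PEPTIDE_LEN * BASES_PER_CODON

# Size of the template and peptide sequence spaces.
n_templates = N_BASES ** template_length
n_peptides = N_AMINO_ACIDS ** MIN_PEPTIDE_LEN

# Each amino acid is encoded by 16 / 4 = 4 codons, so each peptide is
# encoded by 4**7 distinct templates.
codons_per_amino_acid = N_BASES ** BASES_PER_CODON // N_AMINO_ACIDS
templates_per_peptide = codons_per_amino_acid ** MIN_PEPTIDE_LEN

# Fraction of uniformly random 14-mers that encode one given peptide.
fraction = templates_per_peptide / n_templates

print(template_length, n_templates, n_peptides, templates_per_peptide, fraction)
```

Under these assumptions there are 268,435,456 possible 14-mers and 16,384 possible seven-residue peptides, so about one random 14-mer in 16,384 encodes any particular peptide.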

Our mathematical model showed that the most important parameters, apart from the concentration of loaded p-tRNA and polynucleotides, are the spontaneous polymerization and depolymerization of polynucleotides. It also shows that polynucleotides are first polymerized spontaneously and that these initial polynucleotides catalyze the production of the first polypeptides, including the polymerase. These polymerases can then generate further polynucleotides through catalysis. The stability gained by polymerases while being bound to polynucleotides ultimately leads to an increase of their relative concentration compared with the other polypeptides.

Overall, the hypothesis explains the coupling of polynucleotide and polypeptide polymerization, the operational code and mutations in the p-Pol sequence that could eventually result in increased specificities leading to primitive DNA polymerases and RNA polymerases. No extraordinary exchanges of function are required and each molecule is functionally similar to its present-day analogue. Like all new abiogenesis theories, this IDA requires in vitro confirmation; in particular, the steps required for the primordial operational code to arise ab initio warrant close attention.

The idea that the ancestral replicator may have consisted of both nucleic acid and peptide components (the “nucleopeptide world”) is in itself not new, but compared with the RNA world, has been somewhat neglected. We argue that molecular co-evolution of polynucleotides and peptides seems likely and cross-catalysis is known to be possible; for example, in vitro selection experiments delivered RNA with peptidyl transferase activity after just nine rounds of a single selection experiment (Zhang and Cech 1997; Fishkis 2011). Conversely, Levy and Ellington produced a 17-residue peptide that ligates a 35 base RNA (Levy and Ellington 2003).

Although nucleopeptide world research is relatively sparse, the data collected so far hint that cross-catalysis may be more efficient than autocatalysis by either peptides or nucleic acids. A self-replicating primordial system wherein RNA encoding for protein was replicated by a primordial RNA-dependent RNA polymerase which carried out the role of a replicative agent rather than as a transcriber of genes has previously been suggested (Leipe et al. 1999), although in this case no further development of the concept to produce a self-contained replicating system was pursued. The merits of a “two polymerase” system where RNA catalyses peptide polymerization and vice versa were succinctly explained by Kunin (2000), although possible mechanisms and validity were not considered in detail. The possibility of a two polymerase system is also mentioned by van der Gulik and Speijer as part of a wider review of the co-evolution of peptides and RNA (van der Gulik and Speijer 2015) but without a mathematical model.

Other origins of life hypotheses propose that the initial self-replicator did not consist of polynucleotides and/or peptides but was originally composed of different materials, most famously clay crystals (Cairns-Smith 1982). Such hypotheses are of interest but were not considered in this work as the IDA presented here does not require genetic takeover of one replication system by another and can be achieved using building blocks likely to have been present on the early earth and so appears more parsimonious. Our IDA hypothesis has tried to set out more rigorously the possible steps and processes whereby a nucleopeptide IDA could have arisen and could be tested experimentally.

Future experimental work that would support the nucleopeptide theory would be to provide evidence that the stereochemical hypothesis applies to the earliest occurring amino acids including those likely to have composed the active site of the p-Pol. Currently codon/anticodon binding to a number of amino acids has been shown (Yarus et al. 2005) but is absent for the four earliest amino acids (Wolf and Koonin 2007). This may be due to their small sizes though even here possible solutions have been proposed (Tamura 2015).

It is important to note that we do not propose that the RNA world did not or could not exist, nor does this work necessarily suggest that a self-replicating RNA polymerase did not exist (although our results suggest it to be unlikely), but rather that such a molecule did not directly lead to current living systems. Indeed the crucial role of RNA (more correctly, XNA) in our model is highlighted by the importance of KR+, the rate of polymerization of polynucleotide chains. We also do not dismiss any roles for ribozymes—for example it could well be that ribozymes were responsible for aminoacylation reactions (although this would inevitably raise the question of how such ribozymes were themselves replicated). Similarly (and with similar provisos), peptides alone could also have carried out supporting roles such as stabilizing long XNA sequences or catalyzing aminoacylation reactions. At its core however, we suggest that the ancestral replicator was nucleopeptidic with information storage function carried out by the XNA and polymerase function carried out by the peptide.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

We thank Jeremy Tame, Andy Bates, and Arnout Voet for critical reading of the manuscript and Arnout Voet and Jan Zaucha for many constructive and critical discussions. This work was supported by RIKEN Initiative Research Funding to J.G.H. and by funding from the Malopolska Centre of Biotechnology, awarded to J.G.H.

References

  1. Angyan AF, Ortutay C, Gaspari Z.. 2014. Are proposed early genetic codes capable of encoding viable proteins? J Mol Evol. 785:263–274. [DOI] [PubMed] [Google Scholar]
  2. Baaske P, Weinert FM, Duhr S, Lemke KH, Russell MJ, Braun D.. 2007. Extreme accumulation of nucleotides in simulated hydrothermal pore systems. Proc Natl Acad Sci USA. 10422:9346–9351. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Biro JC, Benyo B, Sansom C, Szlavecz A, Fordos G, Micsik T, Benyo Z.. 2003. A common periodic table of codons and amino acids. Biochem Biophys Res Commun. 3062:408–415. [DOI] [PubMed] [Google Scholar]
  4. Bromley EHC, Channon K, Moutevelis E, Woolfson DN.. 2008. Peptide and protein building blocks for synthetic biology: from programming biomolecules to self-organized biomolecular systems. ACS Chem Biol. 31:38–50. [DOI] [PubMed] [Google Scholar]
  5. Cairns-Smith AG. 1982. Genetic takeover and the mineral origins of life. Cambridge University Press. [Google Scholar]
  6. Carny O, Gazit E.. 2005. A model for the role of short self-assembled peptides in the very early stages of the origin of life. FASEB J. 199:1051–1055.http://dx.doi.org/10.1096/fj.04-3256hyp [DOI] [PubMed] [Google Scholar]
  7. Cech TR. 2012. The RNA worlds in context. Cold Spring Harb Perspect Biol. 47:a006742.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Da Silva L, Maurel MC, Deamer D.. 2015. Salt-promoted synthesis of RNA-like molecules in simulated hydrothermal conditions. J Mol Evol. 802:86–97.http://dx.doi.org/10.1007/s00239-014-9661-9 [DOI] [PubMed] [Google Scholar]
  9. Dixit SB, Arora N, Jayaram B.. 2000. How do hydrogen bonds contribute to protein–DNA recognition? J Biomol Struct Dyn. 17(Suppl 1):109–112. [DOI] [PubMed] [Google Scholar]
  10. Fishkis M. 2011. Emergence of self-reproduction in cooperative chemical evolution of prebiological molecules. Origins Life Evol B. 413:261–275.http://dx.doi.org/10.1007/s11084-010-9220-3 [DOI] [PubMed] [Google Scholar]
  11. Fletcher JM, Harniman RL, Barnes FR, Boyle AL, Collins A, Mantell J, Sharp TH, Antognozzi M, Booth PJ, Linden N.. 2013. Self-assembling cages from coiled-coil peptide modules. Science 3406132:595–599. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Fox SW, Harada K.. 1958. Thermal copolymerization of amino acids to a product resembling protein. Science 1283333:1214.http://dx.doi.org/10.1126/science.128.3333.1214 [DOI] [PubMed] [Google Scholar]
  13. Giel-Pietraszuk M, Barciszewski J.. 2006. Charging of tRNA with non-natural amino acids at high pressure. FEBS J. 27313:3014–3023. [DOI] [PubMed] [Google Scholar]
  14. Giri V, Jain S.. 2012. The origin of large molecules in primordial autocatalytic reaction networks. PLoS ONE. 71:e29546. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Herschy B, Whicher A, Camprubi E, Watson C, Dartnell L, Ward J, Evans JRG, Lane N.. 2014. An origin-of-life reactor to simulate alkaline hydrothermal vents. J Mol Evol. 79(5–6):213–227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Ikehara K. 2002. Origins of gene, genetic code, protein and life: comprehensive view of life systems from a GNC-SNS primitive genetic code hypothesis. J Biosci. 272:165–186.http://dx.doi.org/10.1007/BF02703773 [DOI] [PubMed] [Google Scholar]
  17. Ikehara K. 2005. Possible steps to the emergence of life: the [GADV]-protein world hypothesis. Chem Rec. 52:107–118.http://dx.doi.org/10.1002/tcr.20037 [DOI] [PubMed] [Google Scholar]
  18. Illangasekare M, Sanchez G, Nickles T, Yarus M.. 1995. Aminoacyl-RNA synthesis catalyzed by an RNA. Science 2675198:643–647. [DOI] [PubMed] [Google Scholar]
  19. Issac R, Chmielewski J.. 2002. Approaching exponential growth with a self-replicating peptide. J Am Chem Soc. 12424:6808–6809. [DOI] [PubMed] [Google Scholar]
  20. Iyer L, Koonin E, Aravind L.. 2003. Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC Struct Biol. 3:1.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Keefe AD, Szostak JW.. 2001. Functional proteins from a random-sequence library. Nature 4106829:715–718.http://dx.doi.org/10.1038/35070613 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Kemeny J. 1955. Man viewed as a machine. Sci Am. 1924:58–68. [Google Scholar]
  23. Knight RD, Landweber LF.. 2000. Guilt by association: the arginine case revisited. RNA 64:499–510.http://dx.doi.org/10.1017/S1355838200000145 [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Kochavi E, Bar-Nun A, Fleminger G.. 1997. Substrate-directed formation of small biocatalysts under prebiotic conditions. J Mol Evol. 454:342–351.http://dx.doi.org/10.1007/PL00006239 [DOI] [PubMed] [Google Scholar]
  25. Koonin EV, Novozhilov AS.. 2009. Origin and evolution of the genetic code: the universal enigma. IUBMB Life. 612:99–111.http://dx.doi.org/10.1002/iub.146 [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Koonin EV. 1991. The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. J Gen Virol. 729:2197–2206.http://dx.doi.org/10.1099/0022-1317-72-9-2197 [DOI] [PubMed] [Google Scholar]
  27. Kunin V. 2000. A system of two polymerases – a model for the origin of life. Origins Life Evol B. 305:459–466.http://dx.doi.org/10.1023/A:1006672126867 [DOI] [PubMed] [Google Scholar]
  28. Kurland CG. 2010. The RNA dreamtime. Bioessays 3210:866–871.http://dx.doi.org/10.1002/bies.201000058 [DOI] [PubMed] [Google Scholar]
  29. Lee DH, Granja JR, Martinez JA, Severin K, Ghadiri MR.. 1996. A self-replicating peptide. Nature 3826591:525–528. [DOI] [PubMed] [Google Scholar]
  30. Lehmann J, Reichel A, Buguin A, Libchaber A.. 2007. Efficiency of a self-aminoacylating ribozyme: effect of the length and base-composition of its 3′ extension. RNA 138:1191–1197. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Leipe DD, Aravind L, Koonin EV.. 1999. Did DNA replication evolve twice independently? Nucleic Acids Res. 2717:3389–3401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Leman L, Orgel L, Ghadiri MR.. 2004. Carbonyl sulfide-mediated prebiotic formation of peptides. Science 3065694:283–286.http://dx.doi.org/10.1126/science.1102722 [DOI] [PubMed] [Google Scholar]
  33. Levy M, Ellington AD.. 2003. Peptide-templated nucleic acid ligation. J Mol Evol. 565:607–615.http://dx.doi.org/10.1007/s00239-002-2429-7 [DOI] [PubMed] [Google Scholar]
  34. Liu Z, Beaufils D, Rossi JC, Pascal R.. 2014. Evolutionary importance of the intramolecular pathways of hydrolysis of phosphate ester mixed anhydrides with amino acids and peptides. Sci Rep. 4:7440.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Manning MC, Illangasekare M, Woody RW.. 1988. Circular dichroism studies of distorted alpha-helices, twisted beta-sheets, and beta turns. Biophys Chem. 31(1–2):77–86. [DOI] [PubMed] [Google Scholar]
  36. Martin W, Baross J, Kelley D, Russell MJ.. 2008. Hydrothermal vents and the origin of life. Nat Rev Microbiol. 611:805–814. [DOI] [PubMed] [Google Scholar]
  37. McCleskey SC, Griffin MJ, Schneider SE, McDevitt JT, Anslyn EV.. 2003. Differential receptors create patterns diagnostic for ATP and GTP. J Am Chem Soc. 1255:1114–1115. [DOI] [PubMed] [Google Scholar]
  38. Miller SL. 1997. Peptide nucleic acids and prebiotic chemistry. Nat Struct Biol. 43:167–169.http://dx.doi.org/10.1038/nsb0397-167 [DOI] [PubMed] [Google Scholar]
  39. Moore PB, Steitz TA.. 2003. After the ribosome structures: how does peptidyl transferase work? RNA 92:155–159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Morgens DW. 2013. The protein invasion: a broad review on the origin of the translational system. J Mol Evol. 77(4):185–196. http://dx.doi.org/10.1007/s00239-013-9592-x
  41. Nelson KE, Levy M, Miller SL. 2000. Peptide nucleic acids rather than RNA may have been the first genetic molecule. Proc Natl Acad Sci USA. 97(8):3868–3871. http://dx.doi.org/10.1073/pnas.97.8.3868
  42. Noller HF. 2012. Evolution of protein synthesis from an RNA world. Cold Spring Harb Perspect Biol. 4(4):a003681.
  43. Patel BH, Percivalle C, Ritson DJ, Duffy CD, Sutherland JD. 2015. Common origins of RNA, protein and lipid precursors in a cyanosulfidic protometabolism. Nat Chem. 7(4):301–307.
  44. Paul N, Joyce GF. 2004. Minimal self-replicating systems. Curr Opin Chem Biol. 8(6):634–639. http://dx.doi.org/10.1016/j.cbpa.2004.09.005
  45. Pinheiro VB, Taylor AI, Cozens C, Abramov M, Renders M, Zhang S, Chaput JC, Wengel J, Peak-Chew SY, McLaughlin SH, et al. 2012. Synthetic genetic polymers capable of heredity and evolution. Science 336(6079):341–344.
  46. Regan L, DeGrado WF. 1988. Characterization of a helical protein designed from first principles. Science 241(4868):976–978. http://dx.doi.org/10.1126/science.3043666
  47. Riddle DS, Santiago JV, Bray-Hall ST, Doshi N, Grantcharova VP, Yi Q, Baker D. 1997. Functional rapidly folding proteins from simplified amino acid sequences. Nat Struct Biol. 4(10):805–809.
  48. Robertson MP, Joyce GF. 2012. The origins of the RNA world. Cold Spring Harb Perspect Biol. 4(5):a003608.
  49. Rodin AS, Szathmary E, Rodin SN. 2011. On origin of genetic code and tRNA before translation. Biol Direct. 6:14.
  50. Roviello G, Musumeci D, Castiglione M, Bucci EM, Pedone C, Benedetti E. 2009. Solid phase synthesis and RNA-binding studies of a serum-resistant nucleo-epsilon-peptide. J Pept Sci. 15(3):155–160.
  51. Ruiz-Mirazo K, Briones C, de la Escosura A. 2014. Prebiotic systems chemistry: new perspectives for the origins of life. Chem Rev. 114(1):285–366.
  52. Saladino R, Botta G, Pino S, Costanzo G, Di Mauro E. 2012. Genetics first or metabolism first? The formamide clue. Chem Soc Rev. 41(16):5526–5565. http://dx.doi.org/10.1039/c2cs35066a
  53. Schimmel P, Henderson B. 1994. Possible role of aminoacyl-RNA complexes in noncoded peptide synthesis and origin of coded synthesis. Proc Natl Acad Sci USA. 91(24):11283–11286. http://dx.doi.org/10.1073/pnas.91.24.11283
  54. Schneider SE, O'Neil SN, Anslyn EV. 2000. Coupling rational design with libraries leads to the production of an ATP selective chemosensor. J Am Chem Soc. 122(3):542–543.
  55. Sievers A, Beringer M, Rodnina MV, Wolfenden R. 2004. The ribosome as an entropy trap. Proc Natl Acad Sci USA. 101(21):7897–7901.
  56. Tamura K. 2015. Beyond the frozen accident: glycine assignment in the genetic code. J Mol Evol. 81(3–4):69–71. http://dx.doi.org/10.1007/s00239-015-9694-8
  57. Tamura K, Schimmel P. 2003. Peptide synthesis with a template-like RNA guide and aminoacyl phosphate adaptors. Proc Natl Acad Sci USA. 100(15):8666–8669.
  58. Trevino SG, Zhang N, Elenko MP, Luptak A, Szostak JW. 2011. Evolution of functional nucleic acids in the presence of nonheritable backbone heterogeneity. Proc Natl Acad Sci USA. 108(33):13492–13497.
  59. Turk RM, Chumachenko NV, Yarus M. 2010. Multiple translational products from a five-nucleotide ribozyme. Proc Natl Acad Sci USA. 107(10):4585–4589.
  60. van der Gulik P, Speijer D. 2015. How amino acids and peptides shaped the RNA world. Life 5(1):230. http://dx.doi.org/10.3390/life5010230
  61. Vetsigian K, Woese C, Goldenfeld N. 2006. Collective evolution and the genetic code. Proc Natl Acad Sci USA. 103(28):10696–10701.
  62. Vidonne A, Philp D. 2009. Making molecules make themselves – the chemistry of artificial replicators. Eur J Org Chem. 2009(5):593–610.
  63. Woese CR. 1965. On the evolution of the genetic code. Proc Natl Acad Sci USA. 54(6):1546–1552. http://dx.doi.org/10.1073/pnas.54.6.1546
  64. Wolf YI, Koonin EV. 2007. On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization. Biol Direct. 2:14.
  65. Yarus M, Caporaso JG, Knight R. 2005. Origins of the genetic code: the escaped triplet theory. Annu Rev Biochem. 74:179–198. http://dx.doi.org/10.1146/annurev.biochem.74.082803.133119
  66. Yarus M, Widmann JJ, Knight R. 2009. RNA–amino acid binding: a stereochemical era for the genetic code. J Mol Evol. 69(5):406–429. http://dx.doi.org/10.1007/s00239-009-9270-1
  67. Yarus M. 2011. Getting past the RNA world: the Initial Darwinian Ancestor. Cold Spring Harb Perspect Biol. 3(4):a003590.
  68. Zhang BL, Cech TR. 1997. Peptide bond formation by in vitro selected ribozymes. Nature 390(6655):96–100. http://dx.doi.org/10.1038/36375

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press
