Abstract
The synthesis of complementary strands is the reaction underlying the replication of genetic information. It is likely that the earliest self‐replicating systems used RNA as genetic material. How RNA was copied in the absence of enzymes, and which sequences were most likely to have supported replication, remains unclear. Here we show that mixtures of dinucleotides with C and G as bases copy an RNA sequence of up to 12 nucleotides in dilute aqueous solution. Successful enzyme‐free copying occurred with in situ activation at 4 °C and pH 6.0. Dimers were incorporated in preference to monomers when both competed as reactants, and little misincorporation was detectable in mass spectra. Simulations using experimental rate constants confirmed that mixed C/G sequences are good candidates for successful replication with dimers. Because dimers are intermediates in the synthesis of longer strands, our results support evolutionary scenarios encompassing formation and copying of RNA strands in enzyme‐free fashion.
Keywords: Genetic Copying, Nucleotides, Origin of Life, RNA, Replication
When dinucleotides were allowed to act as building blocks for enzyme‐free genetic copying of RNA, sequences up to 12 bases in length were faithfully transmitted into the copy.
Nature uses semiconservative replication to pass genetic information from one generation to the next. The molecular basis of replication is the synthesis of complementary strands, directed by template strands. Genetic copying in extant biology relies on polymerase‐catalyzed formation of phosphodiester bonds via the nucleophilic attack of the 3′‐terminus of the growing strand on a nucleoside triphosphate engaging in Watson–Crick base pairing with the template. Genetic copying is best known for DNA, [1] but, to this day, RNA viruses use RNA‐directed RNA polymerization when replicating their genome, [2] and the earliest self‐replicating systems of evolution may have done the same. [3] These early systems probably did so without the help of enzymes, using activated nucleotides with organic leaving groups.[ 4 , 5 ]
While, conceptually, enzyme‐free copying of RNA is straightforward, even copying of sequences just long enough to encode a very short ribozyme [6] remains challenging experimentally. One reason for this is the sequence dependence of the enzyme‐free reaction. Early reports used homopolymers as templates, [7] with the most successful example being poly(C), on which oligoguanylates form. [8] Unfortunately, homopolymer sequences are not useful as genes. When mixed sequences were studied, sequences not dominated by cytidylic acid were found to be poor templates.[ 9 , 10 ] To this day, even extension of a primer by four nucleotides succeeds only in rare cases, [11] or for immobilized templates with repeated replenishing of the monomers. [12] A combination of preactivated monomers and trimers, furnished with the best organic leaving groups known, still did not give “reads” longer than seven nucleotides. [13] Also, preactivated trimers were found to be poor building blocks. [14] Enzyme‐free ligation of longer RNA strands is an unlikely alternative, as this reaction is known to be less efficient for RNA than for DNA. [15] Further, the best sequences remain to be identified. Systematic searches for sequences that favor copying have been proposed [16] and performed, [11] but have not yielded a system that undergoes copying over a stretch of at least 10 bases.
Naïvely, one may consider primer extension to be a simple bimolecular reaction. If so, increasing the concentration of the building blocks should increase the reaction rate and lead to high yields. This was found not to be the case. Rather, as the concentration increases, spent monomers, i.e., the hydrolysis products of activated monomers, increasingly inhibit incorporation by blocking the extension site. [17] This problem can be reduced by periodically replacing spent monomers [12] or by re‐activating free nucleotides in situ. [18] Further, a poor template effect, caused by weak base pairing, can be reduced by employing strands that provide additional stacking interactions to incoming building blocks. [19] Finally, misincorporations that cause stalling [20] may be avoided by relying on high‐fidelity copying with bases that pair well.
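The naive expectation can be made concrete with textbook pseudo‐first‐order kinetics. The sketch below assumes a constant second‐order rate constant and a building block in large excess over the primer/template complex; the rate constant is a hypothetical placeholder, and the model deliberately omits hydrolysis and inhibition by spent monomers, which is exactly what the experiments show it is missing.

```python
import math

# Hypothetical second-order rate constant (mM^-1 h^-1); a placeholder, not a
# value measured in this study.
K_HYPOTHETICAL = 0.01


def naive_extended_fraction(conc_mM: float, hours: float, k: float = K_HYPOTHETICAL) -> float:
    """Fraction of primer extended under a naive bimolecular model.

    With the building block in large excess, d[P]/dt = -k[M][P] becomes
    pseudo-first order with k_obs = k[M], so the extended fraction is
    1 - exp(-k[M] t). Hydrolysis of the activated building block and
    inhibition by spent monomers are ignored.
    """
    k_obs = k * conc_mM  # pseudo-first-order rate constant (1/h)
    return 1.0 - math.exp(-k_obs * hours)


for conc in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"{conc:4.1f} mM -> {naive_extended_fraction(conc, hours=72):.2f}")
```

In this model the yield rises monotonically with concentration; the inhibition by spent monomers described above is what breaks that trend in practice.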
These considerations led to our current study, which employs in situ activation, strongly pairing building blocks, and sequences related to those known to support enzyme‐free replication.[ 21 , 22 , 23 ] For prebiotic plausibility, we also chose conditions that support both strand formation and copying, [18] as depicted in Figure 1. Critically, we opted for both mono‐ and dinucleotides [24] to overcome the poor performance of known systems. Here we report that dimer‐containing mixtures lead to extension of a primer by up to 12 nucleotides in a row, and that the sequences identified have the potential to support replication, as suggested by the results of an in silico study.
Figure 1.

A) Phosphodiester formation as the molecular basis of strand formation and genetic copying. B) Putative steps of molecular evolution from mononucleotides to building blocks for copying, and copying itself.
Figure 2 shows the RNA sequences employed in our study. All assays were “single‐run” experiments, in which the RNA strands and unactivated mono‐ or dinucleotides were dissolved in a homogeneous aqueous condensation buffer and allowed to react at 4 °C without feeding fresh reagents or starting materials. The buffer contained 0.4 M 1‐ethyl‐3‐(3‐dimethylaminopropyl) carbodiimide (EDC) as condensation agent, an equimolar amount of 1‐ethylimidazole as organocatalyst, and 0.08 M magnesium chloride as the only other salt. Aliquots drawn after stated intervals were analyzed by MALDI‐TOF mass spectrometry under conditions that allow for quantitative detection. [25] Calibration experiments with synthetic RNA strands were performed for all major products to correct for differences in desorption and ionization, and the presence of the main extension products was confirmed by HPLC for a typical product mixture (see Chapter 4 of the Supporting Information).
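The actual calibration procedure is described in the Supporting Information. As a hedged illustration only, one common way to use such calibration data is to assume that each species' MALDI signal is proportional to its amount, with a species‐specific response factor measured from a synthetic standard; dividing each peak by its factor and normalizing then gives mole fractions. The species names, intensities, and response factors below are invented for illustration.

```python
def corrected_mole_fractions(intensities: dict[str, float],
                             response_factors: dict[str, float]) -> dict[str, float]:
    """Convert raw peak intensities into mole fractions.

    Assumes a linear response: intensity = response_factor * amount, with
    response factors taken from calibration against synthetic standards.
    """
    corrected = {name: intensities[name] / response_factors[name] for name in intensities}
    total = sum(corrected.values())
    return {name: value / total for name, value in corrected.items()}


# Hypothetical peak intensities (arbitrary units) and response factors
# relative to the primer; invented numbers, not data from this study.
intensities = {"primer": 60.0, "product": 20.0}
response_factors = {"primer": 1.0, "product": 0.5}
print(corrected_mole_fractions(intensities, response_factors))
```

Here the product's weaker ionization (factor 0.5) means its raw peak underestimates its share; after correction it accounts for 40 % of the strands rather than the 25 % suggested by raw intensities.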
Figure 2.
Genetic copying reactions studied, with regions to be copied in red and regions of products that are copies in blue. Conditions: 45 μM primer, 60 μM template, monomers and/or dimers 0.5–2 mM each, 0.4 M EDC, 0.4 M 1‐ethylimidazole, 0.08 M MgCl2, pH 6 and 4 °C.
The first experiments employed template 1 and primer 2, and the template region to be copied was ten nucleotides in length. When mononucleotides CMP and GMP were used as building blocks, little extension to products 3–12 was detected, with singly extended 3 being the main product (Figure 3A and Figure S7a in the Supporting Information). When dimers CG and GG were employed instead, extension products up to the full ten nucleotides of the templating region appeared in the MALDI spectrum (Figure 3B). The peak for compound 12 indicated an overall yield of 18 % for the five‐step sequence. The corresponding assay with a mixture containing both monomers and dimers yielded a similar result (Figure 3C), with minor peaks for products bearing an odd number of newly added nucleotides, which result from NMP incorporation.
Figure 3.

Genetic copying with the primer/template combinations shown in the upper part of each panel, as detected by MALDI‐TOF MS. Building blocks were 2 mM each, except for E (0.5 mM each). The building blocks were: A) C and G; B) CG and GG; C) C, G, CG, and GG; D) CG, GG, GC, and CC; E) A, C, G, U, AG, CG, GG, and UG; F) CG, GG, GC, and CC. Conditions: 45 μM primer, 60 μM template, 0.4 M EDC, 0.4 M 1‐EtIm, 0.08 M MgCl2, pH 6, 13 d (A–E) or 18 d (F) at 4 °C.
When a mixture of all four possible dinucleotides in a C/G‐based genetic system was employed, full‐length product was again detectable by MALDI mass spectrometry (Figure 3D). This was an important result, because such a mixture is more plausible if strands arise from statistical oligomerization. Moreover, self‐pairing between dimers could have suppressed template binding and thus prevented copying of the desired sequence.
Next, we asked whether the dimer‐based copying reaction tolerates a modest level of weakly pairing bases (A and U). For this, template 13 was employed, which features a templating region with all four canonical nucleobases, three of which are either A or U. Again, copying with mononucleotides was largely unsuccessful, with less than 30 % conversion to any products. Strands 3 and 4 gave the only significant peaks in a spectrum acquired after 13 d reaction time (Figure S7b, Supporting Information). When dimers AG, CG, GG and UG were present in the reaction mixture, extension products up to strand 19 were detectable in the spectrum after the same time span (Figure 3E), even though eight components were used and the concentration of dimers was only 500 μM each. While these results show that a few weakly pairing bases are tolerated, our earlier study of the individual incorporation of all possible dinucleotide sequences gave yields as low as 3 % for some dimers consisting of A and U only, and just 56 % for the most favorable of them (AU), [24] making it unlikely that A/U‐rich sequences will be copied successfully under the current assay conditions.
Besides base pairing strength, there is an effect of concentration on the yield of copying. Exploratory experiments with a model system, using just two equivalents of dimer CG and ten‐fold dilution of the reaction mixture, resulted in a roughly ten‐fold decrease in conversion to product, as shown in Figure S8 of the Supporting Information.
In the last assay of the experimental part of our study, we used template 22, featuring a templating stretch of a dozen bases. All four C/G‐containing dimers were added as building blocks, and the reaction was allowed to proceed for 18 d at 4 °C. Although longer products are increasingly difficult to separate from the template strand (and thus to detect), and the full‐length product has a calculated UV melting point of 94 °C at the Mg2+ concentration chosen, a clearly visible peak corresponding to 11 % overall conversion was detected for the desired product (23), extended by twelve bases in total (Figure 3F).
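Because each dimer incorporation is a separate step, the overall yields reported above can be translated into an average per‐step yield, y_step = Y^(1/n). This is a rough sketch assuming (a simplification not made in the text) that every step proceeds with the same yield; for product 23 it treats the 11 % overall conversion as an overall yield and assumes six dimer incorporations, since only dimers were supplied for the twelve‐base stretch.

```python
def mean_step_yield(overall_yield: float, steps: int) -> float:
    """Geometric-mean yield per step, assuming every step has the same yield."""
    if not 0.0 < overall_yield <= 1.0 or steps < 1:
        raise ValueError("overall_yield must be in (0, 1] and steps >= 1")
    return overall_yield ** (1.0 / steps)


# Product 12: 18 % overall after five dimer incorporations (ten nucleotides).
print(f"{mean_step_yield(0.18, 5):.2f}")  # 0.71
# Product 23: 11 % overall; six dimer incorporations assumed (twelve nucleotides).
print(f"{mean_step_yield(0.11, 6):.2f}")  # 0.69
```

Under this assumption both assays correspond to roughly 70 % per dimer incorporation; the individual steps very likely differ, since the rate constants depend on the dimer sequence.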
To gain insight into how exceptional the sequences employed are in their ability to support copying, we used a computational approach to explore how the template sequence affects enzyme‐free copying with dimers. For this, we first extracted effective rate constants for the ligations of CG, GG, GC, and CC from kinetic data for single‐step primer extension experiments with those dimers (see Supporting Information). Based on these rate constants, we inferred the times that would be required to copy each of the 1024 (2^10) possible ten‐base template sequences consisting only of C and G nucleotides. Although cyclization of dinucleotides can be a significant side reaction,[ 24 , 26 , 27 ] we did not correct for this phenomenon, as a previous study had shown similar levels of cyclization for all four C/G‐containing dinucleotides (≤50–70 %, even after 20 d). [24] Figure 4A shows the distribution of the calculated copying times, with the sequences plotted in “classes” containing the same number of G nucleotides.
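The exact kinetic treatment is given in the Supporting Information. The sketch below only illustrates one simple way such an enumeration could be set up, under stated assumptions: the copy is the reverse complement of the template and is assembled from consecutive dimers, each incorporation is an irreversible first‐order step whose effective rate depends only on the dimer's identity, and the expected copying time is therefore the sum of 1/k over the five incorporations. The rate constants in `K_EFF` are hypothetical placeholders, not the fitted values, and cyclization is ignored, as in the study.

```python
from collections import Counter
from itertools import product

# Hypothetical effective incorporation rate constants (1/day) per dimer.
# Placeholders only; NOT the values fitted in the Supporting Information.
K_EFF = {"GG": 1.0, "GC": 0.9, "CG": 0.6, "CC": 0.4}
COMPLEMENT = {"C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of a C/G sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def copying_time(template: str, k_eff: dict = K_EFF) -> float:
    """Expected time to copy a C/G template (written 5'->3') with dimers only.

    The copy (reverse complement) is split into consecutive dimers read 5'->3';
    for sequential irreversible first-order steps the mean total time is sum(1/k).
    """
    if len(template) % 2:
        raise ValueError("template length must be even for dimer-only copying")
    copy = reverse_complement(template)
    dimers = [copy[i:i + 2] for i in range(0, len(copy), 2)]
    return sum(1.0 / k_eff[d] for d in dimers)


# All 2**10 = 1024 decamer templates built from C and G.
templates = ["".join(bases) for bases in product("CG", repeat=10)]
times = {t: copying_time(t) for t in templates}

# Key: (number of G, rounded copying time); value: number of sequences,
# analogous to the circle areas in Figure 4A.
classes = Counter((t.count("G"), round(times[t], 6)) for t in templates)

fastest = min(times, key=times.get)
print(fastest, times[fastest])
```

With these placeholder rates the poly(C) template comes out fastest, consistent with the text, but the fitted constants from the Supporting Information would be needed before drawing any quantitative conclusion.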
Figure 4.

Computational exploration of enzyme‐free copying and replication with dimers. A) Copying times for all possible template sequences of length ten consisting only of C and G nucleotides. The timescales were calculated using effective extension rates extracted from kinetic data for the ligation of individual dimers (see Supporting Information). Sequences with the same G content and same predicted copying time are represented as filled circles, with areas proportional to the number of sequences contained (these numbers are also explicitly indicated). The circles containing the experimental template 1 and its reverse complement are marked in red. B) Replication times, i.e., the sums of the copying times of a sequence and its reverse complement, for all sequences from (A). The circle containing the experimental template 1 is marked in red.
As this exploratory plot shows, experimental template 1 ranks best among the decamer sequences containing three G nucleotides, together with nine other such sequences with the same predicted copying time. For these ten sequences, the calculated copying time is only about 20 % longer than for the fastest‐copying poly(C) sequence. Moreover, the copy of the experimental sequence, i.e., its reverse complement, also appears to be well suited for copying (in Figure 4A, the circles containing sequence 1 and its reverse complement are marked in red). To rank sequences according to their “replicability”, we also computed the replication time for each sequence, i.e., the sum of the copying times for the sequence and for its reverse complement (Figure 4B). Here, the replication time of the experimental template 1 (red circle) is within 45 % of that of the sequence (GC)5, which marks the global optimum for enzyme‐free replication with dimers. In contrast to the periodic optimal sequence (GC)5, which is not useful as a gene, the experimental sequence 1, together with its nine equivalently fast replicating sequences (all consisting of three GC and two CC dimers), provides an example of a prebiotically plausible set of sequences that can encode information and, at the same time, are kinetically well suited for enzyme‐free replication. As seen in Figure 4B, this set could be further extended without increasing replication times by including the five best‐replicating sequences containing four G nucleotides and the 20 second‐best replicating sequences containing five G nucleotides.
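The replication time defined above (copying time of a sequence plus that of its reverse complement) can be ranked over all 1024 decamers in a few lines. This is a sketch using the same hypothetical placeholder rate constants and additive first‐order model as an illustration, not the fitted values or the authors' code; with these particular placeholders the minimum happens to fall on (GC)5, but that agreement reflects the chosen numbers rather than a reproduction of Figure 4B.

```python
from itertools import product

# Hypothetical placeholder rate constants (1/day); NOT the fitted values.
K_EFF = {"GG": 1.0, "GC": 0.9, "CG": 0.6, "CC": 0.4}
COMPLEMENT = {"C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def copying_time(template: str) -> float:
    """Sum of 1/k over the dimers that make up the copy (reverse complement)."""
    copy = reverse_complement(template)
    return sum(1.0 / K_EFF[copy[i:i + 2]] for i in range(0, len(copy), 2))


def replication_time(seq: str) -> float:
    """Copying time of a sequence plus that of its reverse complement."""
    return copying_time(seq) + copying_time(reverse_complement(seq))


sequences = ["".join(bases) for bases in product("CG", repeat=10)]
ranked = sorted(sequences, key=replication_time)
best = ranked[0]
print(best, round(replication_time(best), 3))
```

Under this additive model, each dimer d of a sequence contributes 1/k(d) + 1/k(rc(d)) to the replication time, because its reverse complement rc(d) appears in the copy. A sequence repeating the single cheapest dimer therefore wins, which is why a periodic sequence can be optimal even though it carries no information.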
Taken together, our data provide a fascinating glimpse of which RNA sequences are most likely to undergo enzyme‐free replication. Because in situ activation was employed, which allows for both strand formation and copying, [18] a “holistic” scenario now seems realistic that starts from ribonucleotides and evolves into a primitive genetic system with traits that make it fit for replication. Several challenges remain, however, before replication can be demonstrated. Among them is the strand separation problem, caused by product strands pairing strongly with the template and thereby inhibiting the next round of genetic copying. Possible solutions are thermal cycling [28] or the use of organic salts to slow down re‐annealing of strands. [29] Another challenge that remains to be addressed is the regioselectivity of phosphodiester formation. Template‐directed reactions are known to be more regioselective than oligomerizations,[ 30 , 31 ] but a yet to be determined percentage of 2′‐5′‐linkages may still be formed in genetic copying with dimers, and future studies should tackle this issue.
Our results can also be discussed in light of the difficulties experienced in earlier studies. Several factors might have contributed to the successful copying of longer stretches of sequence under our conditions. The strongly pairing dimers may have formed a non‐covalent helix, in which each dimer aids the binding of a neighboring building block. As a consequence, the residence time of the building blocks probably became long enough to allow even sluggish reactions, such as the ligation of unmodified RNA strands, to proceed. The strongly binding dimers appear to displace monomers, which would otherwise cause significant levels of misincorporation. Overall, dimers appear to possess a favorable combination of binding strength and reactivity that allows for high‐fidelity copying of longer stretches of RNA.
In conclusion, the copying of RNA sequences up to 12 nucleotides in length was observed, starting from unactivated dinucleotides containing C and G as the only bases, with little misincorporation. To the best of our knowledge, this is the longest “read” of an RNA sequence capable of encoding genetic information achieved by enzyme‐ or ribozyme‐free copying to date. Apparently, dimers, i.e., the very first products of oligomerization reactions, can drive a process that has been difficult to demonstrate using mononucleotides or trimers. The results from the in silico exploration of sequence space indicate that the mixed sequences employed here are suitable for both copying and back‐copying, the two steps necessary for replication. We have reason to believe that even longer reads than the twelve‐nucleotide stretch of sequence copied here may be achievable using dimers, and we are actively pursuing research in this direction.
Conflict of interest
The authors declare no conflict of interest.
Supporting information
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
Supporting Information
Acknowledgements
The authors thank O. Doppleb for help with RNA syntheses. Supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project‐ID 364653263—TRR 235 (CRC 235). Open Access funding enabled and organized by Projekt DEAL.
G. Leveau, D. Pfeffer, B. Altaner, E. Kervio, F. Welsch, U. Gerland, C. Richert, Angew. Chem. Int. Ed. 2022, 61, e202203067; Angew. Chem. 2022, 134, e202203067.
Data Availability Statement
The data that support the findings of this study are available in the Supporting Information of this article.
References
- 1. Kornberg A., Baker T. A., DNA Replication, 2nd ed., University Science Books, Mill Valley, 2005.
- 2. Koonin E. V., Dolja V. V., Krupovic M., Varsani A., Wolf Y. I., Yutin N., Murilo Zerbini F., Kuhn J. H., Microbiol. Mol. Biol. Rev. 2020, 84, e00061.
- 3. Ruiz-Mirazo K., Briones C., de la Escosura A., Chem. Rev. 2014, 114, 285–366.
- 4. Kozlov I. A., Orgel L. E., Mol. Biol. 2000, 34, 781–789.
- 5. Benner S. A., Kim H.-J., Yang Z., Cold Spring Harbor Perspect. Biol. 2012, 4, a003541.
- 6. Turk R. M., Chumachenko N. V., Yarus M., Proc. Natl. Acad. Sci. USA 2010, 107, 4585–4589.
- 7. Sulston J., Lohrmann R., Orgel L. E., Miles H. T., Proc. Natl. Acad. Sci. USA 1968, 59, 726–733.
- 8. Joyce G. F., Visser G. M., van Boeckel C. A. A., van Boom J. H., Orgel L. E., van Westrenen J., Nature 1984, 310, 602–604.
- 9. Hill A. R., Orgel L. E., Wu T., Origins Life Evol. Biospheres 1993, 23, 285–290.
- 10. Zielinski M., Kozlov I. A., Orgel L. E., Helv. Chim. Acta 2000, 83, 1678–1684.
- 11. Duzdevich D., Carr C. E., Ding D., Zhang S. J., Walton T. S., Szostak J. W., Nucleic Acids Res. 2021, 49, 3681–3691.
- 12. Deck C., Jauker M., Richert C., Nat. Chem. 2011, 3, 603–608.
- 13. Li L., Prywes N., Tam C. P., O'Flaherty D. K., Lelyveld V. S., Izgu E. C., Pal A., Szostak J. W., J. Am. Chem. Soc. 2017, 139, 1810–1813.
- 14. Prywes N., Blain J. C., Del Frate F., Szostak J. W., eLife 2016, 5, e17756.
- 15. Dolinnaya N. G., Sokolova N. I., Ashirbekova D. T., Shabarova Z. A., Nucleic Acids Res. 1991, 19, 3067–3072.
- 16. Hey M., Hartel C., Göbel M. W., Helv. Chim. Acta 2003, 86, 844–854.
- 17. Kervio E., Claasen B., Steiner U. E., Richert C., Nucleic Acids Res. 2014, 42, 7409–7420.
- 18. Jauker M., Griesser H., Richert C., Angew. Chem. Int. Ed. 2015, 54, 14559–14563; Angew. Chem. 2015, 127, 14767–14771.
- 19. Vogel S. R., Deck C., Richert C., Chem. Commun. 2005, 4922–4924.
- 20. Leu K., Kervio E., Obermayer B., Turk-MacLeod R. M., Yuan C., Luevano J.-M. Jr., Chen E., Gerland U., Richert C., Chen I. A., J. Am. Chem. Soc. 2013, 135, 354–366.
- 21. Zielinski W. S., Orgel L. E., Nature 1987, 327, 346–347.
- 22. Sievers D., von Kiedrowski G., Nature 1994, 369, 221–224.
- 23. Hänle E., Richert C., Angew. Chem. Int. Ed. 2018, 57, 8911–8915; Angew. Chem. 2018, 130, 9049–9053.
- 24. Sosson M., Pfeffer D., Richert C., Nucleic Acids Res. 2019, 47, 3836–3845.
- 25. Sarracino D., Richert C., Bioorg. Med. Chem. Lett. 1996, 6, 2543–2548.
- 26. Smietana M., Kool E. T., Angew. Chem. Int. Ed. 2002, 41, 3704–3707; Angew. Chem. 2002, 114, 3856–3859.
- 27. Gaffney B. L., Veliath E., Zhao J., Jones R. A., Org. Lett. 2010, 12, 3269–3271.
- 28. Edeleva E., Salditt A., Stamp J., Schwintek P., Boekhoven J., Braun D., Chem. Sci. 2019, 10, 5807–5814.
- 29. He C., Gállego I., Laughlin B., Grover M. A., Hud N. V., Nat. Chem. 2017, 9, 318–324.
- 30. Motsch S., Tremmel P., Richert C., Nucleic Acids Res. 2020, 48, 1097–1107.
- 31. Motsch S., Pfeffer D., Richert C., ChemBioChem 2020, 21, 2013–2018.