Significance
An RNA enzyme with RNA polymerase activity was used to replicate and evolve an RNA enzyme with RNA-cleavage activity. The fidelity of the polymerase is sufficient to maintain heritable information over the course of evolution, giving rise to a succession of RNA-cleaving variants with progressively increasing fitness. The RNA-catalyzed evolution of functional RNAs is thought to have been central to the early history of life on Earth, and it is likewise central to the prospect of constructing RNA-based life in the laboratory.
Keywords: directed evolution, ribozyme, RNA polymerase, RNA replication
Abstract
An RNA polymerase ribozyme that was obtained by directed evolution can propagate a functional RNA through repeated rounds of replication and selection, thereby enabling Darwinian evolution. Earlier versions of the polymerase did not have sufficient copying fidelity to propagate functional information, but a new variant with improved fidelity can replicate the hammerhead ribozyme through reciprocal synthesis of both the hammerhead and its complement, with the products then being selected for RNA-cleavage activity. Two evolutionary lineages were carried out in parallel, using either the prior low-fidelity or the newer high-fidelity polymerase. The former lineage quickly lost hammerhead functionality as the population diverged toward random sequences, whereas the latter evolved new hammerhead variants with improved fitness compared to the starting RNA. The increase in fitness was attributable to specific mutations that improved the replicability of the hammerhead, counterbalanced by a small decrease in hammerhead activity. Deep sequencing analysis was used to follow the course of evolution, revealing the emergence of a succession of variants that progressively diverged from the starting hammerhead as fitness increased. This study demonstrates the critical importance of replication fidelity for maintaining heritable information in an RNA-based evolving system, such as is thought to have existed during the early history of life on Earth. Attempts to recreate RNA-based life in the laboratory must achieve further improvements in replication fidelity to enable the fully autonomous Darwinian evolution of RNA enzymes as complex as the polymerase itself.
Darwinian evolution depends on the selective propagation of heritable information. In biology, that information is represented by the sequence of nucleotide subunits within an RNA or DNA genome and is expressed through the set of RNAs and proteins encoded by that genome. During the early history of life on Earth, it is thought that RNA served as both the genetic material and the agent of expressed function in an era commonly referred to as the “RNA world” (1–3). At the outset, RNA replication is likely to have occurred as a non-enzymatic process by which RNA templates were copied to yield complementary strands, which in turn were copied to yield additional copies of the starting templates (4–7). Sequence variation would have arisen due to imperfect copying fidelity, and those variants that replicated most efficiently would have grown to dominate the population, until new variants with even greater fitness arose.
At some point during the early history of RNA-based evolution, it is thought that RNA evolved the ability to catalyze its own replication, acting as an RNA-dependent RNA polymerase (1–3). As the efficiency and accuracy of RNA replication improved, larger and more complex RNAs could be replicated, encompassing more sophisticated catalytic motifs and expanding the functional repertoire of the RNA world. Throughout the generations of RNA-based evolution, copying accuracy must have exceeded a critical threshold to maintain heritable information, and this threshold would have risen as the evolving RNAs increased in size and complexity (8, 9).
The mathematical relationship between replication accuracy and maximum genome length was first described by Eigen (8). Stated simply, the selective advantage enjoyed by the fittest individuals in the population must outweigh the loss of their progeny to copying errors: the product of their relative replication rate and the probability of producing an error-free copy must exceed one. The greater the number of conserved nucleotide subunits, the higher the copying fidelity must be to produce error-free copies and thus to ensure that genomic information can be maintained over successive generations.
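The condition can be written as σq^L > 1, where σ is the selective superiority of the fittest (master) sequence, q is the per-nucleotide copying fidelity, and L is the number of nucleotides that must be conserved. Solving for L gives a maximum conserved length L_max = ln σ / (−ln q), which is approximately ln σ / (1 − q) when q is close to 1. The Python sketch below evaluates this bound. The value σ = 10 is a hypothetical choice made only for illustration, and the two fidelities are the overall values for reciprocal synthesis of the hammerhead reported in the Results, treated here as uniform per-nucleotide accuracies.

```python
import math


def max_conserved_length(q: float, sigma: float) -> float:
    """Eigen error threshold: the largest L satisfying sigma * q**L > 1.

    q     -- per-nucleotide copying fidelity (0 < q < 1)
    sigma -- selective superiority of the master sequence (> 1)
    """
    if not (0.0 < q < 1.0):
        raise ValueError("q must lie strictly between 0 and 1")
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1 for a finite threshold")
    return math.log(sigma) / -math.log(q)


# Hypothetical selective superiority, chosen only for illustration.
SIGMA = 10.0
for name, q in [("52-2 (reciprocal)", 0.814), ("71-89 (reciprocal)", 0.891)]:
    print(f"{name}: L_max = {max_conserved_length(q, SIGMA):.1f} nt")
```

Under these assumptions the bound is roughly 11 conserved nucleotides for q = 0.814 and roughly 20 for q = 0.891, which illustrates how a modest gain in fidelity can move a motif with about a dozen strictly conserved positions from beyond the error threshold to within it. The result depends on the hypothetical σ.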
Although there are no known examples in biology of RNA enzymes with RNA-dependent RNA polymerase activity, considerable progress has been made in developing such enzymes in the laboratory using directed evolution methods (10–12). The most advanced of these polymerases, which are derived from the class I RNA ligase (13), can synthesize RNAs containing more than 100 nucleotide subunits (14–17). Although this improved polymerase activity enables the synthesis of a variety of functional RNAs, including aptamers and ribozymes, the fidelity of synthesis, especially for more complex RNAs, has remained low (14, 15, 17, 18). However, now that the polymerase can synthesize larger functional RNAs, directed evolution can select for polymerases with higher fidelity by requiring them to synthesize longer products without introducing deleterious mutations that would disrupt the function of those products.
During the most recent rounds of directed evolution, which are reported here, the polymerases were required to synthesize the class I ligase. This requirement placed unprecedented selection pressure on the accuracy of synthesis because even a few mistakes would result in an inactive product. As a consequence of these efforts, the fidelity of synthesis improved substantially, making it possible, for the first time, to use RNA enzymes to replicate and evolve other functional RNAs. As a demonstration of this capability, both the prior low-fidelity and the newer high-fidelity polymerase ribozymes were challenged to evolve the hammerhead ribozyme.
The hammerhead is a small, self-cleaving RNA that contains ~34 nucleotides, 12 of which are strictly conserved (19, 20). This size was found to place the hammerhead above the error threshold for the low-fidelity polymerase, but within the error threshold for the high-fidelity polymerase. In the present study, eight rounds of RNA-catalyzed evolution were carried out in parallel using the two polymerases. During each round, the hammerhead RNA was copied by the polymerase to yield its complement, which in turn was copied by the polymerase to generate new hammerhead ribozymes that were then required to catalyze an RNA-cleavage reaction. Thus, the fitness of the evolving hammerheads was determined by both their replicability and their catalytic activity. Following each round of evolution, the RNAs were reverse transcribed and PCR amplified, which provided an archive of materials that were analyzed by deep sequencing to determine the course of hammerhead evolution.
The low-fidelity polymerase was unable to maintain the essential features of the hammerhead motif, with heritable information becoming lost to entropic decay after only a few generations of evolution. In contrast, the high-fidelity polymerase not only sustained the evolving population of functional hammerheads, but also led to the emergence of novel variants with improved fitness compared to the starting molecule, progressively accruing advantageous mutations. These improvements in fitness reflect a compromise between replicability and catalytic activity, with the evolved hammerheads having somewhat reduced catalytic activity, but a greater increase in their ability to be copied by the polymerase.
These results are relevant to the RNA world hypothesis, according to which the higher the fidelity of the polymerase, the larger the functional RNAs it is able to synthesize; and the larger the functional RNAs it is able to synthesize, the more opportunity there is to evolve complex motifs, including polymerases with even greater fidelity (3). This process may also be the path to the laboratory evolution of RNA polymerases with sufficient activity and fidelity to catalyze the evolution of RNAs as large and complex as the polymerase itself.
Results
Evolution of Polymerases with Improved Activity and Fidelity.
A directed evolution procedure was developed that requires the polymerase ribozyme to synthesize a functional copy of its evolutionary ancestor, the class I RNA ligase (13). The class I ligase is a challenging target for the polymerase, which the previous most advanced variant, the 52-2 polymerase, can synthesize in only 2% yield after 24 h, with a fidelity of 84.1% per nucleotide (17). The class I ligase is a complex motif, containing 33 highly conserved and 41 partially conserved nucleotides (21). As a consequence, the ensemble of ligase ribozymes synthesized by the 52-2 polymerase is >1,000-fold less active than the ensemble synthesized by a protein polymerase (17). Therefore, by requiring the polymerase ribozyme to synthesize a functional ligase, one can impose stringent selection pressure that directs the evolution of polymerases with improved fidelity (Fig. 1A).
Fig. 1.

Directed evolution of polymerase and hammerhead ribozymes. (A) Scheme for selective amplification of polymerase ribozymes that synthesize a functional class I ligase ribozyme. 1) Attachment to the polymerase of an RNA primer (magenta), which binds to an RNA template (brown) that encodes the class I ligase; 2) extension of the primer by polymerization of NTPs (cyan); 3) reverse transcription of the polymerase; 4) attachment of one RNA substrate for ligation (orange) to the 3′ end of the polymerase cDNA and hybridization of that substrate to a 5′-biotinylated (green) RNA template that is linked to the other substrate for ligation (orange); 5) capture of the ligated products on streptavidin magnetic beads (gray); and 6) PCR amplification and transcription to generate progeny polymerases. (B) Sequence and secondary structure of the 71-89 RNA polymerase. Orange circles indicate mutations that arose during evolution leading to the 52-2 polymerase (17); red circles indicate mutations that arose from the 52-2 to the 71-89 polymerase; nucleotides in gray were added during polymerase evolution. (C) Sequence and secondary structure of the hammerhead bound to an RNA substrate. Stem elements I–III are labeled, strictly conserved nucleotides are shown in red, and the 5′ and 3′ primer-binding regions are shown in magenta and brown, respectively. The arrow indicates the substrate cleavage site. (D) Scheme for RNA-catalyzed selective amplification of hammerhead ribozymes that cleave an attached RNA substrate. 1) Extension of an RNA primer (brown) by polymerization of NTPs on an RNA template that encodes HHR– RNA; 2) strand separation and hybridization of a second RNA primer (magenta), with attached 3′-biotinylated RNA substrate; 3) extension of the second primer to generate HHR+ RNA; 4) biotin capture on streptavidin magnetic beads; 5) cleavage of the attached RNA substrate, releasing the hammerhead from the beads; 6) reverse transcription, PCR amplification, and forward transcription to begin the next round of selective amplification.
The population of RNA polymerases obtained after the prior 52 rounds of directed evolution (17, 18, 22), from which the 52-2 polymerase was isolated, was covalently linked to an RNA primer (Primer1) and challenged to extend that primer on a separate, 3′-biotinylated RNA template (Tem2), using NTP substrates, to yield the class I ligase (for sequences of oligonucleotides, see SI Appendix, Table S1). Following removal of the template RNA, the polymerase-linked extension products were captured on magnetic beads by hybridization to a bead-bound DNA primer (Rev2) that is complementary to the 3′ end of the polymerase. The DNA primer was extended by reverse transcriptase to inactivate the polymerase as an RNA-cDNA hybrid, thus preventing it from participating in the subsequent ligation reaction. One substrate for the ligase (S1) was attached to the cDNA and the other, which was 3′-biotinylated (S2), was provided free in solution. The ligation reaction was allowed to proceed for 30 min, after which the cDNA was released from the beads; any cDNAs that had been labeled by an active ligase were captured on streptavidin and then selectively copied to yield the opposing strand of DNA, which was PCR amplified. The primers for amplification were specific to the ligated products, adding further stringency to the selection process. The amplified DNA was then forward transcribed to yield a progeny population of polymerase ribozymes to begin the next round of directed evolution.
The evolution procedure was continued for 18 rounds, progressively decreasing the time allowed for polymerization from 20 to 1 h and reducing the concentration of Mg2+ from 200 to 83 mM. Mg2+ is a catalytic cofactor for the polymerase (23), and at high concentrations has been shown to reduce polymerase fidelity, presumably by stabilizing base mismatches (17). Following the final round of evolution (71 in total), the population was sequenced at depth across all of the rounds. A dominant sequence containing 10 mutations relative to the 52-2 polymerase, named 71-89 (Fig. 1B), was highly enriched over the later rounds of evolution and was chosen for comparison to the 52-2 polymerase in subsequent RNA-catalyzed evolution studies.
RNA-Catalyzed Evolution of Hammerhead Ribozymes.
The 52-2 polymerase and its predecessors can synthesize small functional ribozymes in a single-pass reaction, but are incapable of replicating those RNAs, which requires reciprocal synthesis of both the functional plus strand and the complementary minus strand. Polymerase fidelity plays a more critical role in RNA replication than in RNA synthesis because errors can be introduced during the synthesis of either strand, and any of them can disrupt function. In addition, there are idiosyncratic sequence-dependent effects in the polymerization reaction (17), such that the efficient and accurate synthesis of a plus strand does not ensure similarly efficient and accurate synthesis of the corresponding minus strand.
The 52-2 polymerase is capable of synthesizing the hammerhead ribozyme in the presence of either 200 or 50 mM Mg2+, but the higher concentration is required for good yield and the lower concentration is required for good fidelity (17). However, the polymerase struggles to synthesize the minus strand, even in the presence of 200 mM Mg2+. Furthermore, reciprocal synthesis of plus and minus strands requires the polymerase to synthesize a primer binding site within each strand, flanking the hammerhead motif, which is beyond the capability of the 52-2 polymerase. Thus, to compare the ability of the 52-2 and 71-89 polymerases to propagate functional RNAs, it was necessary to modify the hammerhead ribozyme so that it could be replicated by the 52-2 polymerase.
For synthesis of the minus strand (HHR–), which is the complement of the hammerhead, the primer was designed to include the 5′ substrate-binding arm and the last two nucleotides of the conserved “GAAA” portion of the motif. In addition, the central stem of the hammerhead (stem II) was modified so that it could be copied more readily, while still retaining hammerhead activity (Fig. 1C). This variant of the hammerhead, which will henceforth be referred to as “Seq0,” can be replicated by both the 52-2 and 71-89 polymerases. For 52-2, the reactions require 200 mM Mg2+ and generate HHR+ and HHR– in 18% and 23% yield, respectively, after 24 h. The 71-89 polymerase, which was evolved to operate with higher fidelity in the presence of 50 mM Mg2+, yields 11% HHR+ and 7% HHR– after 24 h.
The two polymerases differ substantially in their fidelity of synthesis. Beginning with the Seq0 hammerhead ribozyme as a template, the 52-2 polymerase synthesizes HHR– (in 200 mM Mg2+) with a single-pass fidelity of 85.6% per nucleotide. The resulting HHR– RNAs in turn direct the synthesis of new copies of HHR+ with 81.4% overall fidelity for the reciprocal synthesis of HHR– and HHR+. In contrast, the 71-89 polymerase (in 50 mM Mg2+) synthesizes HHR– with 90.9% single-pass fidelity, and carries out reciprocal synthesis of HHR– and HHR+ with 89.1% overall fidelity (SI Appendix, Tables S2 and S3). This substantial difference in copying fidelity has important consequences for the preservation of functional sequence information over the course of successive rounds of selective amplification.
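One way to see what these fidelity differences mean for the hammerhead is to estimate the chance that all 12 strictly conserved positions survive one round of reciprocal synthesis. The sketch below assumes that errors occur independently and with the same probability at every position, that the overall fidelities above apply per nucleotide, and that all 12 conserved positions are copied by the polymerase, even though the HHR– primer covers part of the conserved GAAA segment. It is an illustration under those assumptions, not an analysis from the study.

```python
def prob_positions_intact(q: float, k: int) -> float:
    """Probability that k specified positions are all copied correctly,
    assuming independent errors with the same per-nucleotide fidelity q."""
    return q ** k


def expected_errors(q: float, k: int) -> float:
    """Expected number of errors among k positions under the same assumption."""
    return k * (1.0 - q)


CONSERVED = 12  # strictly conserved hammerhead positions cited in the text
for name, q in [("52-2", 0.814), ("71-89", 0.891)]:
    print(f"{name}: P(intact) = {prob_positions_intact(q, CONSERVED):.3f}, "
          f"expected errors = {expected_errors(q, CONSERVED):.2f}")
```

With these inputs the probability is roughly 0.08 for the 52-2 polymerase and 0.25 for the 71-89 polymerase, about a threefold difference per round that compounds over successive rounds of replication.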
Two parallel lineages of RNA-catalyzed evolution were carried out, using either the 52-2 or 71-89 polymerase in the presence of either 200 or 50 mM Mg2+, respectively, to replicate the hammerhead RNA (Fig. 1D). Starting with Seq0 HHR+ RNA, the polymerases first were required to synthesize complementary HHR– RNAs. Then the HHR+ template was removed, the full-length HHR– products were gel purified, and the HHR– RNAs were used as templates to synthesize new copies of HHR+ RNAs.
The primer for HHR+ synthesis (Primer3) was 5′-biotinylated and linked to the substrate RNA for the hammerhead ribozyme. The HHR+ products were captured on streptavidin-coated beads, and then incubated for 30 min in the presence of 20 mM Mg2+ to allow functional hammerheads to cleave the attached substrate, thereby becoming released from the beads. The released hammerheads were reverse transcribed and PCR amplified to provide an archive of the selected materials. A portion of these materials was forward transcribed to yield a progeny population of ribozymes to begin the next round of RNA-catalyzed replication and selection based on catalytic function. This process was continued for eight rounds for each lineage, after which materials from each round were analyzed by deep sequencing to determine the course of hammerhead evolution.
Sequence Changes Over the Course of Evolution.
The two lineages of RNA-catalyzed evolution of the hammerhead ribozyme had very different outcomes, with RNA-cleavage activity becoming lost in the 52-2 lineage, while it was maintained in the 71-89 lineage. The catalytic activity of the two evolving populations of HHR+ RNAs was measured after each round in comparison to that of the Seq0 hammerhead, based on the ability to cleave a separate RNA substrate (Fig. 2 A and B). It was not feasible to measure catalytic activity in the same on-bead format as was used during evolution because the activity of the 52-2 lineage quickly fell to very low levels that could only be measured using purified materials.
Fig. 2.

RNA-catalyzed evolution of the hammerhead ribozyme. The 52-2 (orange) or 71-89 (blue) polymerase catalyzed replication of the hammerhead, with selection dependent on catalytic activity of the hammerhead. (A and B) Following each round, RNA-cleavage activity was determined for the population relative to that of Seq0 RNA (dark colored circles) and the fraction of RNA molecules consistent with the hammerhead motif was determined from the sequencing data (light colored circles). Values of the former are based on three replicates, with SE. (C and D) Violin plots indicate the distribution of mutations in the population after each round, relative to Seq0.
Over the course of evolution, the population replicated by the 52-2 polymerase exhibited progressively declining activity, falling to a barely detectable level by round 8. In contrast, for the population replicated by the 71-89 polymerase, catalytic activity was maintained throughout the eight rounds. The sequences of individual members of the two populations were matched to the biochemically defined sequence requirements for active hammerhead ribozymes (20). The same trend was observed, with the 52-2 population drifting away from sequences required to maintain catalytic activity, while the 71-89 population largely held fast to the functional motif (Fig. 2 A and B).
The two lineages showed distinct patterns in their accrual of sequence variation. The number of mutations in each RNA was calculated as its Levenshtein distance from Seq0, that is, the minimum total number of substitutions, insertions, and deletions separating the two sequences (Fig. 2 C and D). For the 52-2 lineage, more than half of the RNAs contained at least 5 mutations by round 3, and by round 6 most contained more than 10 mutations. For the 71-89 lineage, the number of mutations increased much more slowly, with an average of 3 mutations by round 3 and fewer than 5 mutations by round 8. Consistent with these observations, the final population from the 52-2 lineage could neither enrich nor even maintain specific RNA sequences, whereas that from the 71-89 lineage could enrich or deplete specific sequences, as expected for selection based on their differential fitness (SI Appendix, Fig. S1 A and B).
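The Levenshtein distance is usually computed by dynamic programming over prefixes of the two sequences. The following is a minimal Python sketch of that standard algorithm, applied to a set of reads to obtain per-read mutation counts of the kind summarized in the violin plots. The function names and toy sequences are hypothetical and do not come from the study's analysis pipeline.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row as short as possible
    previous = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def mutation_histogram(reads, reference):
    """Count how many reads lie at each edit distance from the reference."""
    return Counter(levenshtein(read, reference) for read in reads)


# Toy example with hypothetical sequences: one exact copy, one deletion,
# one substitution, and one insertion relative to the reference.
reference = "GCUGAUGA"
reads = ["GCUGAUGA", "GCUGAGA", "GCAGAUGA", "GCUGAUGAA"]
print(sorted(mutation_histogram(reads, reference).items()))
```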
Because the steps of reverse transcription and PCR amplification were carried out prior to each round of HHR– synthesis, it was important to determine whether enrichment or depletion of specific sequences might have been due to these protein-catalyzed steps rather than to the RNA-catalyzed processes. Thus, beginning with the round 7 population from each lineage, the HHR+ RNAs were directly reverse transcribed and PCR amplified before sequencing, without involving replication by the polymerase ribozyme or RNA-catalyzed cleavage of the substrate RNA. For both the 52-2 and 71-89 lineages, there was a roughly normal distribution of sequence changes, with a mean and SD of 0.75 ± 0.24 and 0.95 ± 0.32, respectively (SI Appendix, Fig. S1 C and D). There was no significant enrichment or depletion of particular sequences, demonstrating that the protein-catalyzed steps did not introduce significant selection bias.
Formal measures of population diversity confirm that the 52-2 lineage is unable to enrich selectively advantageous sequences. The Shannon population entropy is a measure of sequence diversity in a population, maximized at 1.0 when every RNA in the population has a unique sequence (24). The entropy value for the 52-2 lineage reached 0.94 by round 3 and was 0.97 at round 8, corresponding to a nearly random distribution of rare and distinct sequences. In contrast, the 71-89 lineage plateaued at an entropy value of ~0.6 by round 4 and maintained that value through round 8 (SI Appendix, Fig. S2A). For the 52-2 lineage, the average sequence difference between any two RNAs (25) was ~40% at round 3 and ~50% at round 8, whereas for the 71-89 lineage an average difference of ~20% was maintained throughout rounds 2 through 8 (SI Appendix, Fig. S2B). In summary, the population replicated by the 52-2 polymerase decayed to a state of highly divergent, low-abundance sequences, whereas the population replicated by the 71-89 polymerase became enriched with clusters of high-abundance sequences.
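Both diversity measures can be computed from read counts. In the sketch below, the Shannon entropy H = −Σ p_i ln p_i over distinct sequences is divided by ln N, where N is the number of reads, so that it equals 1.0 when every read is unique and 0 when all reads are identical. This normalization is an assumption chosen to match the stated range and may differ in detail from that of reference (24). The mean pairwise difference is the fraction of mismatched positions averaged over all pairs of reads; the sketch assumes pre-aligned reads of equal length.

```python
import math
from collections import Counter
from itertools import combinations


def normalized_entropy(reads):
    """Shannon entropy of the sequence distribution, scaled by ln(number of reads)."""
    counts = Counter(reads)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


def mean_pairwise_difference(reads):
    """Average fraction of differing positions over all pairs of (aligned) reads."""
    counts = Counter(reads)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    total = 0.0
    # Pairs of identical reads contribute zero, so only distinct sequences are compared,
    # weighted by the number of read pairs they represent.
    for (s, cs), (t, ct) in combinations(counts.items(), 2):
        if len(s) != len(t):
            raise ValueError("this sketch assumes aligned, equal-length reads")
        frac = sum(x != y for x, y in zip(s, t)) / len(s)
        total += cs * ct * frac
    return total / (n * (n - 1) / 2)
```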
Emergence of Novel Hammerhead Variants.
A simple clustering algorithm was used to identify individual sequences of high abundance, together with their closely related sequences, over the course of evolution (26). Separately, a neighbor-joining phylogeny was calculated for the most abundant sequences for each round of evolution (27). The results from these two approaches are largely congruent, with distinct clades or close paraphyletic groups in the phylogeny corresponding to distinct peaks of related sequences (Fig. 3A and SI Appendix, Fig. S3A).
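As an illustration of how high-abundance peaks and their neighbors can be grouped, the sketch below uses a generic greedy procedure: the most abundant unassigned sequence becomes a peak, and every unassigned sequence within a fixed Hamming radius of it joins its cluster. This is not necessarily the algorithm of reference (26); the radius, the abundance cutoff, the use of Hamming distance on aligned sequences, and the function names are all assumptions.

```python
from collections import Counter


def hamming(s, t):
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("sketch assumes aligned, equal-length sequences")
    return sum(a != b for a, b in zip(s, t))


def greedy_peak_clusters(reads, radius=2, min_peak_count=1):
    """Map each peak sequence to the list of sequences assigned to its cluster."""
    counts = Counter(reads)
    # Most abundant first; ties broken alphabetically so results are deterministic.
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    clusters = {}
    assigned = set()
    for peak in ranked:
        if counts[peak] < min_peak_count:
            break  # remaining sequences are too rare to seed a cluster
        if peak in assigned:
            continue
        members = [s for s in ranked
                   if s not in assigned and hamming(peak, s) <= radius]
        assigned.update(members)
        clusters[peak] = members
    return clusters


# Toy population with two well-separated groups (hypothetical sequences).
reads = ["AAAA"] * 5 + ["AAAU"] * 2 + ["UUUU"] * 3 + ["UUUA"]
print(greedy_peak_clusters(reads, radius=1))
```

Sequences left unassigned once the abundance cutoff is reached would form the unclustered background.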
Fig. 3.
Emergence of peak sequences and clusters over the course of evolution catalyzed by the 71-89 polymerase. (A) Phylogenetic tree of sequences that reached an abundance of ≥0.5% of the population by the third round or later. Seq0 and the most abundant peak sequences are labeled. (B) Abundance by round of the peak sequence (dark shading), sequences within the same cluster as the peak sequence that share a common ancestor with the peak (medium shading; bracketed in A), and the entire cluster surrounding the peak (light shading). These data are normalized to the highest abundance reached by the cluster, with the maximum percent abundance indicated by the number at the top right of the graph. (C) Sequence and secondary structure of Seq0 and the most abundant peak sequences, with mutations highlighted by colored circles.
For the 52-2 lineage, there was a cluster of sequences after round 1 centered about Seq0 and comprising ~50% of all RNAs in the population. During subsequent rounds, a succession of overlapping clusters arose and faded away, each containing one or two mutations relative to Seq0 (SI Appendix, Fig. S3). The fraction of total RNAs in these peaks fell below 20% by round 3 and below 10% by round 8. Any sequence cluster distinct from Seq0 that arose suffered the same fate, beginning as a small fraction of the total RNAs and fading by round 8.
For the 71-89 lineage, in contrast, the majority of sequences in the population belonged to one of multiple prominent clusters throughout the course of evolution. After round 1, there was a single large cluster centered about Seq0. During subsequent rounds, new sequence clusters emerged, centered about distinct peak sequences (Fig. 3A). The trajectories of the most abundant clusters were tracked over the eight rounds (Fig. 3B). The clusters were named after the peak sequence within each cluster.
The Seq0 cluster declined from the outset, initially replaced by a group of very closely related sequences that subsequently coalesced to form the Seq2 cluster, with a peak sequence containing two mutations relative to Seq0. By round 6, the Seq3 and Seq5 clusters emerged, containing five and three mutations, respectively, relative to Seq0. These two clusters remained prominent through round 8. Beginning at round 7, the Seq15 cluster expanded rapidly, increasing by more than 10-fold during the last two rounds of evolution. Seq15 contains six mutations relative to Seq0, including a single-nucleotide deletion that causes a frameshift within stem II. Finally, in round 8 the Seq35 cluster became notable, with a distinct constellation of five mutations relative to Seq0. In retrospect, Seq35 had gradually been increasing in abundance over the entire eight rounds, but did not constitute a significant fraction of the population until round 8. All of the peak sequences conform to the canonical hammerhead ribozyme motif (Fig. 3C), with no mutations at the most highly conserved nucleotide positions, two or more mutations within stem II, and zero or one mutations within stem I, which pairs with the 3′ portion of the RNA substrate.
Biochemical Properties of the Evolved Hammerhead Ribozymes.
The Seq0, Seq2, Seq3, Seq5, Seq15, and Seq35 ribozymes were prepared by in vitro transcription of synthetic DNA and tested individually for their ability to be replicated by the 71-89 polymerase. The synthetic HHR+ RNAs were copied by the polymerase to yield complementary HHR– RNAs, which in turn were copied by the polymerase to yield HHR+ RNAs. Separately, synthetic HHR+ RNAs were tested for catalytic activity in the RNA-cleavage reaction, conducted in the same format as during RNA-catalyzed evolution.
All five evolved hammerhead variants are replicated more efficiently than Seq0 RNA, but are less active in the RNA-cleavage reaction (Fig. 4A). For each of these variants, there is a two- to threefold improvement in the synthesis of HHR– and a two- to eightfold improvement in the synthesis of HHR+ RNA. RNA-cleavage activity is reduced by 1.5- to threefold. The net effect, multiplying the yields from plus- and minus-strand synthesis and the requisite RNA-cleavage activity of the plus strand, is that the evolved variants have an overall three- to eightfold improvement in fitness compared to Seq0 (Fig. 4B). Seq15 has the highest overall fitness, with a three- and sevenfold improvement in HHR– and HHR+ synthesis, respectively, counterbalanced by a twofold decrease in catalytic activity.
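The multiplicative estimate is a product of fold changes relative to Seq0: (HHR– yield ratio) × (HHR+ yield ratio) × (activity ratio). A minimal sketch follows; the dictionary keys and the measurement values in the usage lines are hypothetical and are not the data plotted in Fig. 4.

```python
def relative_fitness(variant, reference):
    """Multiplicative fitness of a variant relative to a reference sequence.

    Each argument maps 'minus' (HHR- yield), 'plus' (HHR+ yield), and
    'activity' (fraction of substrate cleaved) to a measured value.
    """
    fitness = 1.0
    for key in ("minus", "plus", "activity"):
        fitness *= variant[key] / reference[key]
    return fitness


# Hypothetical measurements, for illustration only.
reference = {"minus": 0.10, "plus": 0.10, "activity": 0.60}
variant = {"minus": 0.25, "plus": 0.40, "activity": 0.30}
print(f"relative fitness = {relative_fitness(variant, reference):.1f}")
```

In this made-up example, a 2.5-fold gain in HHR– synthesis, a fourfold gain in HHR+ synthesis, and a twofold loss of activity combine into a fivefold gain in fitness.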
Fig. 4.

Relative fitness of the most abundant peak variants compared to Seq0. (A) Measurements were made of HHR– synthesis (light blue bars), HHR+ synthesis (medium blue bars), and HHR+ catalytic activity (orange bars). Individual HHR+ RNAs that had been prepared by in vitro transcription were used as templates for polymerase-catalyzed synthesis of HHR–, then the resulting HHR– products were used as templates for polymerase-catalyzed synthesis of HHR+. Separately, HHR+ RNAs with attached RNA substrate were prepared synthetically and tested for RNA-cleavage activity under the same conditions as during RNA-catalyzed evolution. (B) Overall fitness was estimated by the multiplicative effects of HHR– synthesis, HHR+ synthesis, and HHR+ catalytic activity (light blue bars); also adjusting for differences in specific activity due to sequence-specific effects on copying fidelity (medium blue bars); both in comparison to fitness based on relative enrichment during a single competitive round of evolution involving Seq0 and the five peak sequences (dark blue bars). Values are based on three replicates, with SE.
The fitness of an evolved variant is not determined by the fitness of an individual reference sequence, but rather by the distribution of variants about that reference sequence that evolve together as a “quasispecies” (8, 9, 28). Because the 71-89 polymerase operates with 89.1% overall fidelity per round of replication (reciprocal synthesis of HHR– and HHR+), each round introduces an average of two mutations per sequence. Thus, a quasispecies distribution is inevitable, and a better measure of fitness is the net enrichment of progeny RNAs that derive from a parental quasispecies.
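The link between a per-nucleotide fidelity and the number of mutations per sequence can be made explicit. If each of N newly synthesized positions is copied correctly with probability q, independently of the others, the number of mutations per progeny follows a binomial distribution with mean N(1 − q). The sketch below back-calculates the N implied by a mean of two mutations at q = 0.891, and then, for a hypothetical N = 18 chosen only for illustration, computes the fraction of progeny that are exact copies. Independence and a uniform error rate are simplifying assumptions.

```python
import math


def implied_positions(mean_mutations: float, q: float) -> float:
    """Number of copied positions N such that N * (1 - q) equals the mean mutation count."""
    return mean_mutations / (1.0 - q)


def mutation_count_distribution(n_positions: int, q: float) -> list:
    """Binomial probabilities of 0..n_positions mutations per progeny sequence."""
    err = 1.0 - q
    return [math.comb(n_positions, k) * err**k * q**(n_positions - k)
            for k in range(n_positions + 1)]


print(f"implied N = {implied_positions(2.0, 0.891):.1f}")
dist = mutation_count_distribution(18, 0.891)  # hypothetical N = 18
print(f"P(exact copy) = {dist[0]:.3f}")
print(f"P(at least one mutation) = {1.0 - dist[0]:.3f}")
```

Under these assumptions only about one progeny in eight is an exact copy of its parent, so a distribution of closely related variants forms after a single round.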
With the current selection format, it is not feasible to track the direct descent of members within a complex population. Thus, to estimate the fitness of the quasispecies centered about a reference sequence, the Seq0 RNA and each of the five most prominent variants were carried through one round of RNA-catalyzed evolution, both separately and as an equimolar mixture of all six. Following HHR– synthesis, HHR+ synthesis, and selection for hammerhead activity, the population of RNAs was analyzed by deep sequencing. By comparing the sequence progression of each starting variant in isolation to that within the combined mixture, it was possible to determine the fraction of the progeny population that derives from each of the parental sequences.
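Once progeny reads have been attributed to their parental sequences, the competitive estimate reduces to a ratio of ratios: each parent's share of the output divided by its share of the input, normalized to the same quantity for Seq0. The sketch below implements only that final step. The attribution of progeny to parents (which the study derived by comparing isolated and mixed replications) is not modeled, and the variant names and read counts in the usage lines are hypothetical.

```python
def fractions(counts):
    """Convert read counts per parent into fractions of the total."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


def relative_enrichment(input_fraction, output_fraction, reference="Seq0"):
    """Output/input enrichment of each parent, normalized to the reference parent."""
    enrichment = {name: output_fraction[name] / input_fraction[name]
                  for name in input_fraction}
    ref = enrichment[reference]
    return {name: value / ref for name, value in enrichment.items()}


parents = ["Seq0", "variant_A", "variant_B"]  # hypothetical parent set
input_fraction = {name: 1 / len(parents) for name in parents}  # equimolar mixture
# Hypothetical progeny read counts already attributed to each parent.
progeny_reads = {"Seq0": 1000, "variant_A": 3000, "variant_B": 6000}
result = relative_enrichment(input_fraction, fractions(progeny_reads))
for name, value in result.items():
    print(f"{name}: {value:.1f}x")
```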
The enrichment in copy number of each variant relative to Seq0 provides a measure of fitness that is very similar to that calculated by multiplying the relative yields of plus- and minus-strand synthesis and RNA-cleavage activity (Fig. 4B). Only for Seq35 is the fitness estimated from replicability and catalytic activity substantially greater than that based on competitive enrichment. The most rapidly growing peak sequences in the later rounds of evolution, Seq3, Seq5, and Seq15, have the highest fitness by either measure, exceeding that of Seq0 by four- to ninefold.
It is possible that certain hammerhead sequences are prone to higher or lower fidelity of copying during RNA replication, or are more likely to give rise to disabling mutations. This possibility was investigated by examining the distribution of progeny sequences resulting from one round of replication and comparing those sequences to the sequence requirements for a functional hammerhead (20, 29). The specific activity of each variant was estimated by multiplying its observed activity by the fraction of functional progeny. The relative fitness of each variant compared to Seq0, based on combined replication yield and estimated specific activity, more closely matches that based on observed enrichment over one round of evolution, notably correcting the overestimate of fitness for Seq35 (Fig. 4B). In general, however, the further inclusion of sequence-specific effects changes the estimates of relative fitness by an average of only ~20% for the five peak sequences.
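The adjustment described here can be sketched as follows: count the fraction of one-round progeny that satisfy a functional criterion, then multiply the observed activity by that fraction. The criterion used in the sketch (identity at a chosen set of positions in aligned sequences) is a simplified stand-in for the hammerhead sequence requirements of references (20, 29), and the positions, sequences, counts, and function names are hypothetical.

```python
def conserved_positions_intact(reference, positions):
    """Return a predicate accepting aligned sequences that match the reference at the given positions."""
    def check(seq):
        return len(seq) == len(reference) and all(seq[i] == reference[i] for i in positions)
    return check


def fraction_functional(progeny_counts, is_functional):
    """Fraction of progeny reads that pass the functional criterion."""
    total = sum(progeny_counts.values())
    functional = sum(c for seq, c in progeny_counts.items() if is_functional(seq))
    return functional / total


def specific_activity(observed_activity, progeny_counts, is_functional):
    """Observed activity scaled by the fraction of functional progeny."""
    return observed_activity * fraction_functional(progeny_counts, is_functional)


# Hypothetical example: positions 0 and 1 stand in for the conserved core.
is_functional = conserved_positions_intact("GAAA", positions=[0, 1])
progeny = {"GAAA": 6, "GAUA": 2, "CAAA": 2}
print(specific_activity(0.5, progeny, is_functional))
```

The adjusted value would then replace the observed activity in the multiplicative fitness estimate.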
Visualizing the RNA-Catalyzed Evolution of RNA.
To better visualize the evolution of the two populations over time, the RNA sequences were mapped from the high-dimensional sequence space defined by the 27 variable nucleotide positions to a two-dimensional plane defined by Seq0 and reference sequences that emerged in the later rounds of each lineage. For the 52-2 lineage the reference was a highly divergent RNA that no longer conforms to the hammerhead motif; for the 71-89 lineage the references were Seq15 and Seq35, which were the most divergent of the peak sequences to emerge during the eight rounds of evolution. The density distribution of functional hammerheads within the two-dimensional plane can be represented as a contour map that encompasses all HHR+ sequences that had been observed in either lineage (Fig. 5).
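The Fig. 5 caption describes the plane as the first two principal components of variation relative to Seq0 and three other reference sequences. One plausible reading, assumed in the sketch below, is that each sequence is first described by its distances to the four references and that principal component analysis then reduces those coordinates to two. The Hamming distance on aligned sequences, the power-iteration PCA (used only to keep the example free of third-party libraries), and all names are assumptions; the contour map of hammerhead functionality is not modeled.

```python
import math
import random


def hamming(s, t):
    """Mismatches between two aligned, equal-length sequences (an assumption of this sketch)."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(s, t))


def _orthonormalize(vec, basis):
    """Remove components along an orthonormal basis, then normalize; None if nothing is left."""
    for c in basis:
        dot = sum(x * y for x, y in zip(vec, c))
        vec = [x - dot * y for x, y in zip(vec, c)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 1e-12 else None


def top_components(rows, k=2, iters=200, seed=0):
    """Leading k principal axes of the rows, found by power iteration with deflation."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = _orthonormalize([rng.random() + 0.1 for _ in range(dim)], comps)
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            w = _orthonormalize(w, comps)
            if w is None:  # no variance left in this direction; keep the current vector
                break
            v = w
        comps.append(v)
    return means, comps


def project_to_plane(seqs, references):
    """Describe each sequence by its distances to the references, then project onto PC1 and PC2."""
    rows = [[hamming(s, r) for r in references] for s in seqs]
    means, comps = top_components(rows, k=2)
    return [tuple(sum((row[j] - means[j]) * c[j] for j in range(len(row))) for c in comps)
            for row in rows]
```

In practice a linear-algebra library (for example, a singular value decomposition of the centered matrix in NumPy) would replace the hand-written power iteration.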
Fig. 5.

Scatterplots of the evolving populations of hammerhead ribozymes. For each round of evolution, catalyzed by either the (A) 52-2 or (B) 71-89 polymerase, 100,000 sampled RNA sequences were mapped onto a two-dimensional plane defined by the first (PC1) and second (PC2) principal components of variation relative to Seq0 and three other reference sequences. Even-numbered rounds are shown here, with all eight rounds shown in Movie S1. The position of Seq0 is indicated by a white circle and contour lines indicate diminishing levels of hammerhead functionality radiating out from Seq0, with the highest and lowest contours representing >90% and <10% functionality, respectively. Sequences present at >0.05% abundance are indicated by colored circles and individual sequences are shown as dots colored according to the corresponding cluster. Sequences that do not belong to any cluster are shown in gray.
Visualizing the population of RNAs as a scatterplot on a two-dimensional plane provides a time-lapsed view of how the population in each lineage explores sequence space over the course of evolution (Movie S1). For the 52-2 lineage, the distribution of sequences quickly diverges from the domain of functional hammerheads, with a progressively dwindling fraction of the RNAs clinging to the region of functionality (Fig. 5A). By round 8, nearly all of the sequences have drifted far from the region of functionality. In contrast, the 71-89 lineage proceeds through a succession of sequence clusters, corresponding to a quasispecies distribution centered about Seq2, then Seq3 and Seq5, then Seq15, and finally Seq35 (Fig. 5B). Throughout the evolution of the 71-89 lineage, sequence divergence of the population increases over time, but occurs primarily along regions of high-level hammerhead activity. Sequence diversity is shaped by selection, expanding into new regions of high functionality as the evolving population explores variants that are replicated efficiently while preserving hammerhead activity.
Discussion
The propagation of genetic information is critically dependent on the fidelity of the copying mechanism. For the replication of RNA, both the plus and minus strands must be copied with sufficient fidelity that the selective advantage of the fittest individuals exceeds the probability of producing an error copy of those individuals. The earlier 52-2 RNA polymerase has a single-pass fidelity of 85.6% per nucleotide and fidelity for reciprocal synthesis of 81.4% for the variant of the hammerhead ribozyme used in this study. Thus, the 52-2 polymerase fails to meet the fidelity criterion, even for this small functional RNA. However, a directed evolution campaign to develop higher-fidelity polymerases was successful, resulting in the 71-89 polymerase with a single-pass fidelity of 90.9% and fidelity for reciprocal synthesis of 89.1% (SI Appendix, Tables S2 and S3). The improved polymerase has lower fidelity for the synthesis of HHR– compared to HHR+, but the fidelity of reciprocal synthesis is sufficient to enable Darwinian evolution of the hammerhead’s catalytic function.
Whereas the hammerhead ribozymes in the 52-2 lineage quickly diverged toward random sequences, those in the 71-89 lineage evolved a succession of variants with increased fitness compared to the starting Seq0 hammerhead. The initial variants that emerged contained only two or three mutations relative to Seq0, whereas the later variants contained five or six mutations, which would have been more difficult to access in the early rounds of evolution. Sequence clustering and phylogenetic analysis of the 71-89 lineage revealed that the population came to be dominated by a succession of five distinct peak sequences and associated clusters of closely related sequences (Fig. 3A). Two sequence clusters emerged at the outset, centered about peak sequences Seq2 and Seq5, with two and three mutations, respectively (Fig. 3 B and C). The Seq2 cluster remained prevalent throughout the course of evolution and was soon accompanied by the Seq35 cluster, with five mutations, which slowly became more prevalent over successive rounds. The Seq5 cluster also remained prevalent throughout the course of evolution, later accompanied by the Seq3 and Seq15 clusters with five and six mutations, respectively, the latter of which only reached significant abundance in the last two rounds.
All of the most abundant evolved variants contain mutations that alter stem II of the hammerhead ribozyme. This stem is required for function, but the sequence of the stem can vary so long as the stem structure is maintained (20). The polymerase ribozyme, despite its ability to operate on a broad range of sequences, struggles to copy highly structured templates and has reduced copying efficiency even for moderately structured templates (17). Stem II within HHR+ RNA and its complement within HHR– RNA impose a modest obstacle to replication, making it selectively advantageous for variants to emerge that destabilize the stem and its complement, but not in a way that prevents the stem from forming. Each of the major evolved variants contains such mutations, destabilizing base pairs within stem II or altering the closing UNCG tetraloop through nucleotide substitutions or deletions. Most of the peak sequences also contain a single mutation within stem I that likely has similar benefit for RNA replication. Only the 3′ portion of stem I is free to mutate because the 5′ portion constitutes part of the attached RNA substrate (Fig. 1 C and D).
The fitness of each peak sequence was determined experimentally, which revealed that each is copied more readily than Seq0 for the synthesis of both HHR– and HHR+ RNA (Fig. 4A). However, each peak sequence has a reduced level of RNA-cleavage activity compared to Seq0. In all cases, the improved replicability substantially outweighs the reduced activity (Fig. 4B). Seq15 and Seq3, which were the two fastest-growing sequence clusters over the course of evolution, have the highest and second highest relative fitness, respectively, based on the combined parameters of replicability and specific catalytic activity. These variants rank in the same order when fitness is determined based on relative enrichment of progeny RNAs in a competition experiment beginning with an equimolar mixture of Seq0 and all five peak variants. Accordingly, the progeny of Seq15 and Seq3 each comprise ~30% of the population following a single round of evolution, compared to only 4% for Seq0.
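When a competition starts from an equimolar mixture, relative fitness can be read off as each variant's progeny share divided by its starting share, normalized to Seq0. The minimal sketch below uses the approximate shares quoted above (~30% for Seq15 and Seq3, 4% for Seq0, six input RNAs). The function name is hypothetical, and the remaining three variants are omitted only because their progeny shares are not stated in this paragraph.

```python
def competitive_fitness(start_share, progeny_share, reference="Seq0"):
    """Enrichment (progeny share / starting share), normalized to the reference."""
    enrichment = {k: progeny_share[k] / start_share[k] for k in progeny_share}
    return {k: e / enrichment[reference] for k, e in enrichment.items()}


n_inputs = 6  # Seq0 plus the five peak variants, mixed in equimolar amounts
start = {name: 1 / n_inputs for name in ("Seq0", "Seq2", "Seq3", "Seq5", "Seq15", "Seq35")}
progeny = {"Seq0": 0.04, "Seq3": 0.30, "Seq15": 0.30}  # approximate shares from the text

print(competitive_fitness(start, progeny))
```

With equal starting shares the calculation reduces to a ratio of progeny shares, roughly 7.5 for both Seq15 and Seq3, which lies within the four- to ninefold range reported above. Because the ~30% figures are rounded, this arithmetic cannot by itself resolve the ranking of Seq15 over Seq3.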
Given the high mutation rates in both evolutionary lineages, the object of selection is not a single peak sequence, but rather a constellation of variants that include the peak sequence and closely related sequences that evolve together as a quasispecies (8, 9, 28). The structure of a quasispecies is determined both by the available pathways for mutational interconversion among its members and by the underlying fitness distribution of those members. The hammerhead ribozyme has been extensively studied, including with regard to the detailed sequence requirements for its catalytic activity (20, 29). Thus one can predict with good accuracy the distribution of functional hammerheads in an evolving population. The population of RNAs can be represented as a two-dimensional scatterplot, superimposed on a contour map that denotes gradated levels of congruence with the functional hammerhead motif (Fig. 5 and Movie S1).
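The motif criteria listed in Materials and Methods can be expressed as simple sequence tests. The partial sketch below assumes that the conserved core can be matched with a regular expression over the variable region, and that the nucleotides at the base of stems I and II have already been located; their positions are not specified here, so the pair tests take the two nucleotides directly. For stem II, "R:Y or Y:Y" is read with the 5′ nucleotide first, so the 3′ partner must be a pyrimidine. All names are hypothetical.

```python
import re

# 5'-CUGANGA ... GAAA-3': conserved core, with stem II between the two segments.
CORE = re.compile(r"CUGA[ACGU]GA[ACGU]*GAAA")
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
PURINES, PYRIMIDINES = set("AG"), set("CU")


def has_core(seq):
    """True if the strictly conserved core nucleotides occur in order."""
    return CORE.search(seq) is not None


def stem1_base_ok(n5, n3):
    """Base of stem I must be a Watson-Crick pair (G:U wobble excluded)."""
    return (n5, n3) in WATSON_CRICK


def stem2_base_ok(n5, n3):
    """Base of stem II: R:Y or Y:Y, i.e., any 5' nucleotide with a 3' pyrimidine."""
    return (n5 in PURINES or n5 in PYRIMIDINES) and n3 in PYRIMIDINES


def matches_motif(seq, stem1_pair, stem2_pair):
    """Combine the core test with both base-pair tests."""
    return has_core(seq) and stem1_base_ok(*stem1_pair) and stem2_base_ok(*stem2_pair)


print(matches_motif("CUGAUGAGGCCGAAA", ("A", "U"), ("G", "C")))
```

A classifier of this kind is what allows the fraction of motif-consistent sequences to be tallied across a population and mapped as a functionality contour.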
Over successive rounds of evolution, new quasispecies emerged as clouds of sequences centered on a small number of high-abundance sequences. Due to the poor fidelity of the 52-2 polymerase, the structure of a quasispecies cannot be maintained around the starting sequence, nor established around any new peak sequence that starts to emerge in that lineage. Instead, the population drifts toward random sequences, forsaking the domain of hammerhead functionality. In contrast, the higher-fidelity 71-89 polymerase enables an evolutionary progression of successive peak sequences and corresponding quasispecies that move progressively further from the starting sequence, while remaining within the central portion of the contour map for hammerhead functionality.
Even though the 52-2 lineage lost hammerhead functionality, it could still be propagated (Fig. 2A). This behavior partly reflects the small but dwindling portion of the population that retains catalytic activity, but is also due to imperfect stringency of selection. Catalytically inactive RNAs can still be released from the streptavidin-coated beads due to uncatalyzed cleavage of the RNA substrate during the 30-min incubation in the presence of 20 mM MgCl2. In addition, there likely is imperfect exclusion of a small amount of uncleaved material. These mechanisms can be significant relative to the small amount of functional hammerheads produced by the low-fidelity 52-2 polymerase but have little consequence compared to the strong signal of hammerhead-catalyzed cleavage that is maintained throughout the 71-89 lineage (Fig. 2B).
The 71-89 polymerase contains 10 mutations relative to the 52-2 polymerase (Fig. 1B), which result in an increase in replication fidelity from 81.4% to 89.1% per nucleotide for a complete cycle of HHR– synthesis followed by HHR+ synthesis (SI Appendix, Table S3). Much of the improved fidelity can be attributed to the ability of the 71-89 polymerase to operate in the presence of 50 mM Mg2+, whereas the 52-2 polymerase requires 200 mM Mg2+ to replicate RNAs as complex as the hammerhead (17). A complete cycle of replication requires reciprocal synthesis of 27 nucleotides, which corresponds to a maximum permissible error rate of ~1/27 = 3.7% per nucleotide. The 71-89 polymerase, with a fidelity of 89.1% for reciprocal synthesis of HHR– and HHR+ (an error rate of 10.9% per nucleotide), would seem to exceed this threshold. However, experimentally determined fitness landscapes for other RNA-cleaving ribozymes suggest that the maximum permissible error rate can be at least sevenfold higher than simply the inverse of sequence length (30). This relaxation derives in part from a high-stringency selection process that enables superior sequences to prevail against their competitors, thereby easing the error threshold by a factor of the natural log of the superiority of the most advantageous variants (8). In addition, the cooperative nature of RNA folding ensures that many mutations within the catalytic motif have little or no deleterious effect. The 27-nucleotide region of the hammerhead that was subject to reciprocal copying contains only 10 nucleotides that are completely intolerant of mutation (20, 29).
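The arithmetic behind this argument can be made explicit. The minimal sketch below assumes that errors occur independently at a uniform per-nucleotide fidelity q, so that the fraction of error-free copies of L positions is q^L, and uses Eigen's threshold μ_max ≈ ln(σ)/L for a master sequence with superiority σ (with σ = e this reduces to the 1/L rule used above). Treating the 10 mutation-intolerant positions as an effective length is a simplification introduced here for illustration, and the function names are hypothetical.

```python
import math


def error_free_fraction(fidelity, length):
    """Fraction of copies with no error at `length` positions (independent errors)."""
    return fidelity ** length


def max_error_rate(length, superiority=math.e):
    """Eigen's error threshold, mu_max ~ ln(sigma) / L; sigma = e gives 1/L."""
    return math.log(superiority) / length


L_FULL, L_INTOLERANT = 27, 10  # reciprocally copied region; mutation-intolerant positions

for name, q in (("52-2", 0.814), ("71-89", 0.891)):  # fidelity for reciprocal synthesis
    print(
        name,
        f"all 27 positions intact: {error_free_fraction(q, L_FULL):.3f}",
        f"10 intolerant positions intact: {error_free_fraction(q, L_INTOLERANT):.3f}",
    )
print(f"naive 1/L threshold for L = 27: {max_error_rate(L_FULL):.3f}")
```

Under these assumptions only a few percent of 71-89 products are error-free over all 27 positions, but roughly a third preserve the 10 intolerant positions, compared with about an eighth for the 52-2 polymerase. This illustrates why mutational tolerance within the motif matters as much as the raw error rate.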
The 71-89 polymerase contains 182 nucleotides (Fig. 1B), which would correspond to a maximum permissible error rate of ~1/182 = 0.5% per nucleotide for a complete cycle of plus- and minus-strand synthesis. However, as with the hammerhead, the error threshold would be much more permissive if selection stringency is high and the cooperative folded structure of the polymerase renders it tolerant to mutation at a significant fraction of the nucleotide positions. Under suitable selection conditions, a replication fidelity of 97 to 98% would likely be sufficient to enable the RNA-catalyzed evolution of the polymerase ribozyme itself. Although it is difficult to predict the future path of evolution, both in nature and in directed evolution experiments, such an increase in fidelity does not seem unattainable.
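The same arithmetic offers a quick check on the proposed target. The minimal sketch below expresses a per-nucleotide fidelity q as the fold relaxation of the naive 1/L threshold that it would require, (1 − q) × L, for the 182-nucleotide polymerase. The function name is hypothetical.

```python
def fold_relaxation_needed(fidelity, length):
    """How many times the naive 1/L error threshold must be relaxed for this fidelity."""
    return (1.0 - fidelity) * length


POLYMERASE_LENGTH = 182  # nucleotides in the 71-89 polymerase

print(f"naive threshold: {1 / POLYMERASE_LENGTH:.2%} per nucleotide")
for q in (0.97, 0.98):
    print(f"fidelity {q:.0%}: relaxation needed {fold_relaxation_needed(q, POLYMERASE_LENGTH):.1f}-fold")
```

In these terms, a 97 to 98% fidelity would require relaxing the naive threshold for the polymerase by roughly 5.5- to 3.6-fold.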
Materials and Methods
Materials.
The sequences of all oligonucleotides used in this study are listed in SI Appendix, Table S1. Synthetic oligonucleotides were either purchased from IDT or prepared by solid-phase synthesis using an Expedite 8909 DNA/RNA synthesizer, with reagents and phosphoramidites from either Glen Research or ChemGenes. All other RNAs, including RNA templates and polymerase ribozymes, were prepared by in vitro transcription (SI Appendix, Methods) and purified by denaturing polyacrylamide gel electrophoresis (PAGE). Assays of polymerase or hammerhead ribozyme activity used fluorescently labeled RNA primers or substrates, respectively. The reaction products were separated by PAGE, imaged using an Amersham Typhoon RGB laser scanner, quantified using ImageQuant TL software, and plotted using GraphPad Prism. Hot Start Taq, OneTaq, Q5 high-fidelity DNA polymerase, T4 RNA ligase, T4 RNA ligase 2, K227Q T4 RNA ligase 2, and Universal miRNA Cloning Linker were from New England Biolabs. SuperScript IV reverse transcriptase, Turbo DNase, MyOne C1 streptavidin magnetic beads, and Pierce high-capacity streptavidin agarose beads were from Thermo Fisher Scientific. Nucleoside 5′-triphosphates were from Chem-Impex International; pCp-biotin, γ-(2-azidoethyl)-ATP, and sulfo-cyanine5-azide were from Jena Bioscience; and all other chemical reagents were from Sigma-Aldrich.
Directed Evolution of Polymerase Ribozymes.
Beginning with a population of polymerase ribozymes obtained after 52 rounds of directed evolution (17), error-prone PCR was performed (31) and a new 5′-terminal region was added during PCR using primer Fwd1, which contained an AvaII restriction site. In vitro transcription of polymerase-encoding DNA was carried out in the presence of γ-(2-azidoethyl)-ATP, which enabled attachment via click chemistry (SI Appendix, Methods) of a 5′-hexynylated RNA primer (Primer1). Following PAGE purification, 50 nM polymerase-primer conjugates were mixed with 100 nM 3′-biotinylated RNA template (Tem2) and 75 nM cofactor oligodeoxynucleotide (Tem4), which were annealed by heating at 80 °C for 30 s, then cooling to 17 °C. The cofactor oligo, which binds to the new 5′ end of the polymerase, was found to increase polymerase activity and was included in all polymerase-catalyzed reactions.
The annealed materials were added to a reaction mixture containing 4 mM of each NTP, various concentrations of MgCl2, 50 mM Tris-HCl (pH 8.3), and 0.05% Tween-20, which were incubated at 25 °C for various times. The reaction was quenched by adding an equal volume of 250 mM EDTA, 500 mM NaCl, 5 mM Tris-HCl (pH 8.0), and 0.025% Tween-20, then mixed with 5 µg streptavidin magnetic beads per pmol of biotinylated template and incubated with agitation at 23 °C for 1 h. The beads had been pre-blocked by incubating with 1 mg/mL tRNA. The beads were washed twice with urea buffer [8 M urea, 1 mM EDTA, and 10 mM Tris-HCl (pH 8.0)], and the polymerase-primer conjugates were eluted from the RNA template in NaOH buffer (25 mM NaOH, 1 mM EDTA, and 0.05% Tween-20), then neutralized in 100 mM Tris-HCl (pH 7.5), ethanol precipitated, and the full-length products were purified by PAGE.
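The amount of streptavidin beads scales with the amount of biotinylated template. The minimal sketch below shows the unit conversion, assuming a hypothetical 50 µL reaction at the 100 nM template concentration given above and a bead stock of 10 mg/mL. The stock concentration is an assumption and should be checked against the supplier's value; the function names are hypothetical.

```python
def pmol(conc_nM, volume_uL):
    """Amount in pmol: nM x uL gives fmol, so divide by 1000."""
    return conc_nM * volume_uL / 1000.0


def bead_mass_ug(template_pmol, ug_per_pmol=5.0):
    """5 ug streptavidin magnetic beads per pmol of biotinylated template."""
    return template_pmol * ug_per_pmol


def bead_volume_uL(mass_ug, stock_mg_per_mL=10.0):
    """Volume of bead suspension; mg/mL is numerically equal to ug/uL."""
    return mass_ug / stock_mg_per_mL


template = pmol(100, 50)  # hypothetical 50 uL reaction with 100 nM template
mass = bead_mass_ug(template)
print(template, mass, bead_volume_uL(mass))
```

Quenching with an equal volume dilutes the template but does not change its amount, so the bead quantity is calculated from the template amount in the original reaction.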
The purified products were captured on magnetic beads that had been derivatized with DNA oligo Rev2, which is complementary to the 3′ end of the polymerase (SI Appendix, Methods). The bead-bound polymerase was reverse transcribed in situ. Then, in a concerted reaction using both AvaII restriction enzyme and T4 DNA ligase, the 5′ substrate for the class I ligase (S1) was installed at the 3′ end of the polymerase cDNA. The 3′ substrate (S2), which was 3′-biotinylated, was added to the solution and the bead-bound ligase ribozymes were challenged to join the two substrates in the presence of 60 mM MgCl2, 200 mM KCl, 0.6 mM EDTA, and 50 mM Tris-HCl (pH 8.3) at 25 °C for 30 min.
The ligated products and corresponding polymerase cDNAs were cleaved off the beads using EcoRV (SI Appendix, Methods). The released molecules were captured on streptavidin magnetic beads, washed three times with NaOH buffer and twice with urea buffer, eluted in formamide buffer [95% formamide, 100 nM EDTA, and 1 mM Tris-HCl (pH 8.0)] by heating at 95 °C for 5 min, then ethanol precipitated. The cDNAs were reverse transcribed using primer Fwd2, which is specific for substrate S2, then PCR amplified using primers Rev1 and Fwd3, the latter of which is specific for the ligation junction between S1 and S2. Nested PCR with primers Fwd4 and Rev1 provided materials to begin the next round of evolution. New mutations were introduced by error-prone PCR (31) after rounds 53 through 58, 60, 61, 63, and 67 through 70.
Directed Evolution of Hammerhead Ribozymes.
Evolution was initiated with the Seq0 HHR+ template RNA (HHR+(0)), which was prepared by in vitro transcription (SI Appendix, Methods). During each round of evolution, 100 nM HHR+ template RNA was mixed with 100 nM polymerase ribozyme, 100 nM RNA Primer2 containing a photocleavable biotin moiety, and, when using the 71-89 polymerase, 200 nM cofactor oligo Tem4. The RNAs were annealed, then incubated for 24 h under standard polymerase conditions, either with 200 mM MgCl2 for the 52-2 lineage or with 50 mM MgCl2 and 8% PEG8000 for the 71-89 lineage. The reaction was quenched with EDTA and the products were incubated with 10 μL streptavidin agarose beads per 200 pmol of RNA primer, which had been pre-blocked with tRNA. The beads bearing the HHR– RNA were washed three times with NaOH buffer and twice with urea buffer, then blocked again with 1 mg/mL tRNA, and the extension products were eluted by exposure to 350 nm UV irradiation in the presence of 25 mM NaCl, 1 mM EDTA, and 10 mM Tris-HCl (pH 8.0) at 23 °C for 30 min. The full-length products were purified by PAGE.
One hundred nanomolar purified HHR– RNA was mixed with 100 nM RNA polymerase ribozyme, 80 nM RNA Primer3 containing a biotinylated hammerhead substrate RNA at its 5′ end, and 1 μM blocking oligodeoxynucleotide Tem9, which masked the hammerhead substrate during the polymerization reaction. The RNAs were annealed and reacted as described above for HHR– synthesis. The reactions were quenched with EDTA and the products were captured on streptavidin magnetic beads that had been pre-blocked with tRNA, then washed three times with NaOH buffer and twice with urea buffer. The washed beads were suspended in 20 mM MgCl2, 50 mM Tris-HCl (pH 8.0), and 0.05% Tween-20 and incubated at 25 °C for 30 min to elute functional hammerhead ribozymes. The supernatant was passed through a 0.2-μm filter to remove residual magnetic beads, and the RNAs were reverse transcribed and PCR amplified. New copies of HHR+ RNA were transcribed from the amplified PCR products. A mock round 8 of evolution was performed using the HHR+ RNA from round 7 as input and carrying out reverse transcription and PCR amplification, but omitting the steps of RNA-catalyzed amplification and hammerhead cleavage.
Biochemical Characterization of Evolved Hammerhead Ribozymes.
The catalytic activity of the population of hammerhead ribozymes obtained after each round of evolution was assayed using 0.2 µM HHR+ RNA prepared by in vitro transcription and 0.24 µM of a separate 5′-fluorescently labeled RNA substrate (S3), which were incubated under the same conditions and for the same amount of time as used to select functional hammerheads. The products were separated by PAGE to determine the yield of cleaved products.
The activity of individual hammerhead ribozymes was determined by their ability to cleave an attached RNA substrate under the same conditions as during RNA-catalyzed evolution. The hammerhead-substrate constructs were prepared by chemical synthesis, incorporating a Cy5 label (SI Appendix, Methods). The constructs were captured on streptavidin magnetic beads, washed, and allowed to undergo substrate cleavage for 30 min. The cleaved RNA was collected in the supernatant, then the uncleaved RNA was eluted separately using formamide, and both were analyzed by PAGE to determine the yield of cleaved products.
HHR+ RNAs that had been prepared by in vitro transcription were used as templates for the polymerase-catalyzed synthesis of HHR– by extension of RNA Primer2 under the same conditions as during RNA-catalyzed evolution. HHR– RNAs that were prepared using the 71-89 polymerase were similarly used as templates for the polymerase-catalyzed synthesis of HHR+ by extension of RNA Primer4. The extension products of both reactions were analyzed by PAGE to determine the yield of full-length products.
Sequencing of Evolved Hammerhead Ribozymes.
The PCR products from each round of RNA-catalyzed evolution were subject to deep sequencing, together with corresponding HHR– RNAs and HHR+ RNAs that had not undergone hammerhead cleavage and had been reverse transcribed and PCR amplified. All three sets of cDNAs were prepared for sequencing by nested PCR amplification and were sequenced by the Salk Next Generation Sequencing Core on an Illumina NextSeq2000 with a 300-cycle paired-end run. The sequence reads were trimmed, filtered, and merged using a previously described analytical pipeline (17) (SI Appendix, Methods).
Sequences of HHR– and pre-cleaved HHR+ RNAs from the first round of evolution were processed separately to determine the fidelity of a half- and full-cycle of RNA-catalyzed replication, respectively. The sequences were aligned to that of Seq0 using bowtie2 v2.4.2 (32), and the frequency of substitutions, insertions, and deletions was determined for each of the 27 nucleotide positions that were free to vary (Fig. 1C).
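Once reads are aligned, per-position error frequencies can be tallied directly. The minimal sketch below assumes that each read has already been aligned to Seq0 (for example, by parsing bowtie2 output, which is not shown) and expanded into a pair of equal-length gapped strings. The function name and the convention of counting an insertion against the preceding reference position are assumptions, not the study's pipeline.

```python
def error_profile(aligned_pairs, n_positions):
    """Per-position error frequencies and overall per-nucleotide fidelity.

    aligned_pairs: list of (reference, read) gapped strings of equal length,
    with '-' marking gaps. Insertions are attributed to the preceding reference
    position (or to position 0 if they precede the first reference nucleotide).
    """
    subs = [0] * n_positions
    dels = [0] * n_positions
    ins = [0] * n_positions
    for ref, read in aligned_pairs:
        pos = -1  # index of the most recent reference nucleotide
        for r, q in zip(ref, read):
            if r != "-":
                pos += 1
                if q == "-":
                    dels[pos] += 1
                elif q != r:
                    subs[pos] += 1
            elif q != "-":
                ins[max(pos, 0)] += 1
    n_reads = len(aligned_pairs)
    per_position = [(s + d + i) / n_reads for s, d, i in zip(subs, dels, ins)]
    fidelity = 1.0 - sum(per_position) / n_positions
    return per_position, fidelity


pairs = [
    ("ACGU", "ACGU"),    # error-free copy
    ("ACGU", "AAGU"),    # substitution at position 1
    ("ACGU", "AC-U"),    # deletion at position 2
    ("AC-GU", "ACAGU"),  # insertion after position 1
]
print(error_profile(pairs, 4))
```

Applied separately to HHR– reads and to pre-cleaved HHR+ reads, such a tally yields the half-cycle and full-cycle fidelities, respectively.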
Sequences of pre-cleaved and cleaved HHR+ RNAs were processed using a custom Python script to determine the frequency of each distinct sequence in each round of evolution (SI Appendix, Methods). Sequences that included the strictly conserved hammerhead nucleotides 5′-CUGANGA… GAAA-3′, together with a Watson–Crick base pair at the base of stem I and either an R:Y or Y:Y pair at the base of stem II, were identified as matching the biochemically defined hammerhead motif (20, 29). A peak-finding algorithm (26) identified local peak sequences in each population (above a minimum threshold of 0.1%) and joined sequences within two mutations of the most abundant nearby peak in a greedy fashion. For each sequence, the Levenshtein distance was calculated to Seq0 and three reference sequences.
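The distance and clustering steps can be illustrated in a few lines. The minimal sketch below computes Levenshtein distance by dynamic programming and applies a simplified greedy assignment: sequences are visited in order of decreasing frequency, each joins the most abundant existing peak within two edits, and otherwise it founds a new peak if it exceeds 0.1% abundance or is left unclustered. This approximates, but is not necessarily identical to, the peak-finding algorithm of ref. 26. Function names and sequences are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def greedy_clusters(freqs, min_peak=0.001, radius=2):
    """Assign each sequence to the most abundant peak within `radius` edits."""
    peaks, assignment = [], {}
    for seq, f in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        # peaks are stored in decreasing abundance, so the first hit is the most abundant
        home = next((p for p in peaks if levenshtein(seq, p) <= radius), None)
        if home is not None:
            assignment[seq] = home
        elif f >= min_peak:
            peaks.append(seq)
            assignment[seq] = seq
        else:
            assignment[seq] = None  # unclustered (plotted in gray)
    return peaks, assignment


freqs = {"ACGUACGU": 0.5, "ACGUACGA": 0.2, "UUUUACGU": 0.1,
         "UUUUACGA": 0.0005, "GGGGGGGG": 0.0001}
print(greedy_clusters(freqs))
```

The greedy order means that a rare sequence lying between two peaks is always absorbed by the more abundant one.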
Statistical properties of the sequences in the evolving populations were determined using custom scripts in R. The average distance between sequences in each population of cleaved HHR+ RNAs was determined by randomly sampling 100,000 pairs of sequences from the list of distinct sequences, based on the frequency of the sequence in a given round, and averaging the Levenshtein distance between each pair divided by the number of variable nucleotides in the first member of the pair. The normalized Shannon population entropy, which was also determined from a sample of 100,000 sequences, is defined as the sum over all distinct sequences: ∑ Fs × ln(Fs)/ln(1/N), where Fs is the frequency of each distinct sequence in the sample and N is the total number of sequences in the population. Sub-sampling of sequences was carried out to ensure that entropy values were determined at the same sequencing depth (33).
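Both population statistics can be computed from a frequency table. The minimal sketch below assumes that sequences have already been trimmed to the variable region, so the length of the first member of each pair equals its number of variable nucleotides. Pairs are drawn with Python's `random.choices`, weighted by frequency, and the entropy follows the definition above, requiring a sample of N > 1. Function names are hypothetical, and sub-sampling to a common depth is left to the caller.

```python
import math
import random


def levenshtein(a, b):
    """Edit distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def mean_normalized_distance(freqs, n_pairs=100_000, seed=0):
    """Average distance over frequency-weighted random pairs / length of first member."""
    rng = random.Random(seed)
    seqs = list(freqs)
    draws = rng.choices(seqs, weights=[freqs[s] for s in seqs], k=2 * n_pairs)
    total = sum(levenshtein(a, b) / len(a) for a, b in zip(draws[::2], draws[1::2]))
    return total / n_pairs


def normalized_entropy(counts):
    """Sn = sum_s Fs ln(Fs) / ln(1/N) over distinct sequences in a sample of N."""
    n = sum(counts.values())
    h = sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)
    return h / math.log(1 / n)


print(normalized_entropy({"A": 1, "C": 1, "G": 1, "U": 1}))
print(mean_normalized_distance({"AAAA": 0.5, "UUUU": 0.5}, n_pairs=20_000))
```

The entropy is 0 for a clonal sample and 1 when every sampled sequence is distinct, which is why comparisons require a common sampling depth.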
Relative changes in frequency between round 7 and either round 8 or a mock round 8 were calculated for each sequence that was present in round 7 at >0.01% frequency, corresponding to a sequencing depth of at least 50 reads. Phylogenetic trees rooted to Seq0 were generated using the neighbor-joining algorithm of Saitou and Nei (27), encompassing all sequences that reached a maximum frequency >0.1% for the 52-2 lineage and >0.5% for the 71-89 lineage during rounds 3 to 8 of evolution.
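The frequency-change step amounts to a filtered ratio. The minimal sketch below assumes that "relative change" means the fold change in frequency from round 7 to round 8 (or mock round 8), with sequences absent from the later round assigned a value of 0. Both the function name and this definition are assumptions, and the frequencies are hypothetical.

```python
def frequency_changes(freq_r7, freq_r8, min_freq=1e-4):
    """Fold change r8/r7 for sequences present above min_freq (0.01%) in round 7."""
    return {seq: freq_r8.get(seq, 0.0) / f for seq, f in freq_r7.items() if f > min_freq}


r7 = {"s1": 0.002, "s2": 0.0005, "s3": 0.00005}  # s3 falls below the 0.01% cutoff
r8 = {"s1": 0.004, "s2": 0.0005}
mock8 = {"s1": 0.0021, "s2": 0.0004}

print(frequency_changes(r7, r8))
print(frequency_changes(r7, mock8))
```

Computing the same quantity for the real and mock round 8 allows changes due to selection to be compared with those arising from reverse transcription and PCR alone.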
Yield of RNA Progeny After a Single Round of Evolution.
A single mock round of hammerhead evolution was performed using the 71-89 polymerase and the Seq0, Seq2, Seq3, Seq5, Seq15, and Seq35 HHR+ template RNAs, both individually and as an equimolar mixture. The resulting HHR+ RNAs were reverse transcribed, PCR amplified, and sequenced. The frequencies of cleaved HHR+ RNA sequences in the mixed population were fit to a multiple linear regression of the frequencies of sequences in each population that was propagated individually. Regression coefficients for the individually propagated populations were used to assign the fractional contribution of corresponding starting RNAs to progeny RNAs in the mixed population. The predicted abundance of progeny for a given starting RNA was calculated by multiplying the frequency of each RNA in the mixed starting population by a relative fitness value that was determined biochemically (Fig. 4). The predicted fraction of functional hammerheads among the replication products was estimated by the fraction of pre-cleaved HHR+ RNA sequences from each individually propagated population that are consistent with the hammerhead motif.
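The deconvolution and prediction steps can be sketched with NumPy. The minimal example below assumes ordinary least squares via `numpy.linalg.lstsq`, with negative coefficients clipped to zero and the remainder rescaled to sum to 1; a non-negative least-squares solver would be a more principled choice. Function names and the toy frequency tables are hypothetical.

```python
import numpy as np


def mixture_contributions(mixed_freqs, individual_freqs):
    """Regress the mixed population on the individually propagated populations.

    mixed_freqs: (n_sequences,) frequencies in the mixed population.
    individual_freqs: (n_sequences, n_inputs); column i holds the frequencies
    observed when input RNA i was propagated on its own.
    Returns coefficients rescaled to fractional contributions summing to 1.
    """
    coef, *_ = np.linalg.lstsq(individual_freqs, mixed_freqs, rcond=None)
    coef = np.clip(coef, 0.0, None)  # crude guard against small negative fits
    return coef / coef.sum()


def predicted_progeny_shares(start_freqs, relative_fitness):
    """Starting frequency x relative fitness, renormalized to shares."""
    weights = np.asarray(start_freqs, float) * np.asarray(relative_fitness, float)
    return weights / weights.sum()


# Toy example: two inputs whose progeny overlap on some sequences.
individual = np.array([[0.9, 0.1], [0.1, 0.8], [0.0, 0.1]])
mixed = individual @ np.array([0.3, 0.7])

print(mixture_contributions(mixed, individual))
print(predicted_progeny_shares([0.5, 0.5], [1.0, 3.0]))
```

Regressing on whole progeny distributions, rather than on exact parental sequences, credits each input with the mutant progeny it produces, which would otherwise be difficult to assign in a mixed population.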
Scatterplot Analysis of Evolving Population.
A two-dimensional map of the evolving populations in sequence space was constructed based on the Levenshtein distance from each distinct sequence in the population to four reference sequences. The first two principal components of variation between reference sequences reflect: for PC1, proximity to functional hammerhead sequences versus sequences that have diverged from the hammerhead motif; and for PC2, proximity to Seq15 versus Seq35 (SI Appendix, Methods). Based on their distances to the four reference sequences, the positions of all distinct cleaved HHR+ sequences were projected onto the plane defined by PC1 and PC2 and plotted using a custom script in R (SI Appendix, Methods). The density of hammerhead functionality across the two-dimensional plane was estimated as the local average fraction of all sequences in the cleaved HHR+ RNA populations that match the functional hammerhead motif.
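The projection can be sketched as follows. This minimal example assumes that principal component analysis is computed by singular value decomposition of the mean-centered 4 × 4 matrix of distances among the reference sequences, and that every other sequence is projected through the same centering and components. Principal-component signs are arbitrary, so an axis may need to be flipped to match the orientation described above. Function names and sequences are hypothetical.

```python
import numpy as np


def levenshtein(a, b):
    """Edit distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def distance_profile(seq, refs):
    """Vector of edit distances from one sequence to each reference."""
    return np.array([levenshtein(seq, r) for r in refs], dtype=float)


def fit_reference_pca(refs, n_components=2):
    """PCA of the references' own distance profiles (4 x 4 for four references)."""
    profiles = np.vstack([distance_profile(r, refs) for r in refs])
    center = profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(profiles - center, full_matrices=False)
    return center, vt[:n_components]


def project(seqs, refs, center, components):
    """Map each sequence to (PC1, PC2) through its distance profile."""
    profiles = np.vstack([distance_profile(s, refs) for s in seqs])
    return (profiles - center) @ components.T


refs = ["ACGUACGUAC", "ACGUACGUAA", "UUGUACGUAC", "GGGGCCCCAA"]
center, components = fit_reference_pca(refs)
print(project(refs + ["ACGUACGUAU"], refs, center, components))
```

Because each sequence is reduced to its distances from only four references, sequences that differ from one another but lie at equal distances from every reference project to the same point. The map therefore summarizes population movement rather than resolving individual sequences.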
Supplementary Material
Appendix 01 (PDF)
Animated scatterplots of the evolving populations of hammerhead ribozymes. For either the 52-2 (left) or 71-89 (right) lineage, the population of RNA sequences was mapped onto a two-dimensional plane defined by the starting sequence (Seq0) and three reference sequences. The position of Seq0 is indicated by a white circle and contour lines indicate diminishing levels of hammerhead functionality radiating out from Seq0, with the highest and lowest contours representing >90% and <10% functionality, respectively. Sequences present at >0.05% abundance are indicated by colored circles and individual sequences are shown as dots colored according to the corresponding cluster. Sequences that do not belong to any cluster are shown in gray.
Acknowledgments
We thank Wesley Cochrane for guidance on methods for oligonucleotide derivatization of magnetic beads. This work was supported by NASA grant 80NSSC22K0973 and Simons Foundation Grant 287624.
Author contributions
N.P., D.P.H., and G.F.J. designed research; N.P. and D.P.H. performed research; N.P., D.P.H., and G.F.J. analyzed data; and D.P.H. and G.F.J. wrote the paper.
Competing interests
The authors declare no competing interest.
Footnotes
This article is a PNAS Direct Submission.
Contributor Information
David P. Horning, Email: dhorning@salk.edu.
Gerald F. Joyce, Email: gjoyce@salk.edu.
Data, Materials, and Software Availability
Sequencing data, bioinformatics pipeline, and software script data have been deposited in Dryad Digital Repository (https://doi.org/10.5061/dryad.rxwdbrvgs) (34).
Supporting Information
References
- 1. Crick F. H. C., The origin of the genetic code. J. Mol. Biol. 38, 367–379 (1968).
- 2. Gilbert W., The RNA world. Nature 319, 618 (1986).
- 3. Joyce G. F., Szostak J. W., Protocells and RNA self-replication. Cold Spring Harb. Perspect. Biol. 10, a034801 (2018).
- 4. Inoue T., Orgel L. E., A nonenzymatic RNA polymerase model. Science 219, 859–862 (1983).
- 5. Orgel L. E., Molecular replication. Nature 358, 203–209 (1992).
- 6. Szostak J. W., The eightfold path to non-enzymatic RNA replication. J. Syst. Chem. 3, 2 (2012).
- 7. Prywes N., Blain J. C., Del Frate F., Szostak J. W., Nonenzymatic copying of RNA templates containing all four letters is catalyzed by activated oligonucleotides. eLife 5, e17756 (2016).
- 8. Eigen M., Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58, 465–523 (1971).
- 9. Eigen M., McCaskill J., Schuster P., Molecular quasi-species. J. Phys. Chem. 92, 6881–6891 (1988).
- 10. Ekland E. H., Bartel D. P., RNA-catalysed RNA polymerization using nucleoside triphosphates. Nature 383, 373–376 (1996).
- 11. Johnston W. K., Unrau P. J., Lawrence M. S., Glasner M. E., Bartel D. P., RNA-catalyzed RNA polymerization: Accurate and general RNA-templated primer extension. Science 292, 1319–1325 (2001).
- 12. Zaher H. S., Unrau P. J., Selection of an improved RNA polymerase ribozyme with superior extension and fidelity. RNA 13, 1017–1026 (2007).
- 13. Ekland E. H., Szostak J. W., Bartel D. P., Structurally complex and highly active RNA ligases derived from random RNA sequences. Science 269, 364–370 (1995).
- 14. Wochner A., Attwater J., Coulson A., Holliger P., Ribozyme-catalyzed transcription of an active ribozyme. Science 332, 209–212 (2011).
- 15. Attwater J., Raguram A., Morgunov A. S., Gianni E., Holliger P., Ribozyme-catalysed RNA synthesis using triplet building blocks. eLife 7, e35255 (2018).
- 16. Cojocaru R., Unrau P. J., Processive RNA polymerization and promoter recognition in an RNA world. Science 371, 1225–1232 (2021).
- 17. Portillo X., Huang Y.-T., Breaker R. R., Horning D. P., Joyce G. F., Witnessing the structural evolution of an RNA enzyme. eLife 10, e71557 (2021).
- 18. Tjhung K. F., Shokhirev M. N., Horning D. P., Joyce G. F., An RNA polymerase ribozyme that synthesizes its own ancestor. Proc. Natl. Acad. Sci. U.S.A. 117, 2906–2913 (2020).
- 19. Forster A. C., Symons R. H., Self-cleavage of virusoid RNA is performed by the proposed 55-nucleotide active site. Cell 50, 9–16 (1987).
- 20. Ruffner D. E., Stormo G. D., Uhlenbeck O. C., Sequence requirements of the hammerhead RNA self-cleavage reaction. Biochemistry 29, 10695–10702 (1990).
- 21. Ekland E. H., Bartel D. P., The secondary structure and sequence optimization of an RNA ligase ribozyme. Nucleic Acids Res. 23, 3231–3238 (1995).
- 22. Horning D. P., Joyce G. F., Amplification of RNA by an RNA polymerase ribozyme. Proc. Natl. Acad. Sci. U.S.A. 113, 9786–9791 (2016).
- 23. Glasner M. E., Bergman N. H., Bartel D. P., Metal ion requirements for structure and catalysis of an RNA ligase ribozyme. Biochemistry 41, 8103–8112 (2002).
- 24. Volkenstein M. V., “Informational aspects of evolution” in Physical Approaches to Biological Evolution, A. Kübler, Ed. (Springer, 2012), pp. 335–366.
- 25. Nei M., Molecular Evolutionary Genetics (Columbia University Press, 2019).
- 26. Janzen E., et al., Emergent properties as by-products of prebiotic evolution of aminoacylation ribozymes. Nat. Commun. 13, 3631 (2022).
- 27. Saitou N., Nei M., The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
- 28. Domingo E., Schuster P., “What is a quasispecies? Historical origins and current scope” in Quasispecies: From Theory to Experimental Systems, Domingo E., Schuster P., Eds. (Springer, 2016), pp. 1–22.
- 29. Blount K. F., Uhlenbeck O. C., The structure-function dilemma of the hammerhead ribozyme. Annu. Rev. Biophys. Biomol. Struct. 34, 415–440 (2005).
- 30. Kun A., Santos M., Szathmáry E., Real ribozymes suggest a relaxed error threshold. Nat. Genet. 37, 1008–1011 (2005).
- 31. Cadwell R. C., Joyce G. F., Randomization of genes by PCR mutagenesis. Genome Res. 2, 28–33 (1992).
- 32. Langmead B., Salzberg S., Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
- 33. Gregori J., et al., Viral quasispecies complexity measures. Virology 493, 227–237 (2016).
- 34. Papastavrou N., Horning D. P., Joyce G. F., Deep sequencing datasets from: RNA-Catalyzed Evolution of Catalytic RNA. Dryad. 10.5061/dryad.rxwdbrvgs. Deposited 7 December 2023.