Biophysical Journal. 2019 Jul 10;117(3):520–532. doi: 10.1016/j.bpj.2019.06.037

A Polymer Physics Framework for the Entropy of Arbitrary Pseudoknots

Ofer Kimchi 1,, Tristan Cragnolini 2, Michael P Brenner 3,4, Lucy J Colwell 2,∗∗
PMCID: PMC6697467  PMID: 31353036

Abstract

The accurate prediction of RNA secondary structure from primary sequence has had enormous impact on research from the past 40 years. Although many algorithms are available to make these predictions, the inclusion of non-nested loops, termed pseudoknots, still poses challenges arising from two main factors: 1) no physical model exists to estimate the loop entropies of complex intramolecular pseudoknots, and 2) their NP-complete enumeration has impeded their study. Here, we address both challenges. First, we develop a polymer physics model that can address arbitrarily complex pseudoknots using only two parameters corresponding to concrete physical quantities—over an order of magnitude fewer than the sparsest state-of-the-art phenomenological methods. Second, by coupling this model to exhaustive enumeration of the set of possible structures, we compute the entire free energy landscape of secondary structures resulting from a primary RNA sequence. We demonstrate that for RNA structures of ∼80 nucleotides, with minimal heuristics, the complete enumeration of possible secondary structures can be accomplished quickly despite the NP-complete nature of the problem. We further show that despite our loop entropy model’s parametric sparsity, it performs better than or on par with previously published methods in predicting both pseudoknotted and non-pseudoknotted structures on a benchmark data set of RNA structures of ≤80 nucleotides. We suggest ways in which the accuracy of the model can be further improved.

Significance

The functions and properties of RNA molecules are closely tied to the set of structures they can fold into and their free energies. However, complex structures termed pseudoknots are not well predicted by current tools despite their prevalence. Here, we describe a method to analytically calculate the entropies of arbitrarily complex pseudoknots using only two parameters corresponding to concrete physical quantities. This approach represents an order-of-magnitude reduction in parameters compared to even the sparsest state-of-the-art tools. We employ this method alongside an exhaustive enumeration of the set of possible structures to predict the entire free energy landscape of short RNA molecules, given their sequence. Finally, we show that despite its parametric sparsity, our algorithm outperforms current state-of-the-art methods in pseudoknot prediction.

Introduction

RNA molecules play physiological roles that extend far beyond translation. In human cells, most RNA molecules are not translated (1). Noncoding RNAs interact functionally with messenger RNA (2), DNA (3), and proteins (4) and can be as large as thousands of nucleotides (nts) (5, 6). However, a substantial fraction are <40 nts in length, including microRNAs and small interfering RNAs, which serve as regulators for the translation of messenger RNA (2, 7), and piwi-interacting RNAs, which form RNA-protein complexes to regulate the germlines of mammals (8). The in vitro evolution of RNA, especially through systematic evolution of ligands by exponential enrichment (9, 10, 11), has led to an explosion of applications for short RNA molecules because of their ability to tightly and specifically bind to a remarkable range of target ligands (12).

Overwhelmingly, the properties of short noncoding RNA molecules are tied to their structures (13, 14, 15). Such structures are formed because of the energetic favorability of bonds between complementary nts (primarily A to U, C to G, and the wobble pair G to U). However, these bonds impose an entropic cost. Therefore, the conformations most frequently adopted balance the energetic gain of maximal basepairing with the entropic cost of structural constraints. In equilibrium, the RNA adopts each possible structure with Boltzmann weighted probabilities.

Because of the relevance of RNA structure to function (16, 17), current research aims to predict the minimum free energy (MFE) structures given the sequence. Algorithms typically predict “secondary structure,” a list of the basepairings (18, 19). The early Pipas-McMahon RNA structure prediction algorithm sought to completely enumerate and evaluate the free energy of all possible secondary structures, thereby constructing the entire energy landscape (20). More recent algorithms have made progress in making similar enumerations less computationally intensive (21), the most successful of which are the TT2NE algorithm and its stochastic version, McGenus (22, 23). The complete landscape enumeration approach including all secondary structures has so far been limited to short (<30 nt) RNA molecules (24, 25), and the field has instead almost entirely been dominated by dynamic programming approaches (26, 27, 28, 29, 30). Such algorithms efficiently consider an enormous number of structures without explicitly generating them by iteratively finding the optimal structure for subsequences (18).

Despite the substantial success of dynamic programming, these algorithms have difficulty predicting RNA secondary structures that include pseudoknots (i.e., structural elements with at least two non-nested basepairs) (see Fig. S1 A for an example) that make up roughly 1.4% of basepairs (18) and are overrepresented in functionally important regions of RNA (31). Pseudoknots are disallowed from the most popular RNA structure prediction algorithms (32, 33) because of computational cost; indeed, enumerating all pseudoknotted structures a given RNA molecule can fold into has been shown to be NP-complete (34, 35, 36). Significant advances have been made with heuristics, which do not guarantee finding the MFE structure (23, 37, 38, 39, 40, 41, 42, 43), and by disallowing all but a limited class of pseudoknots (44, 45, 46, 47, 48, 49, 50, 51).

A further major challenge for predicting pseudoknotted structures is the relative lack of experimental data or physical models to estimate their entropies (52, 53). An important caveat is the simple “H-type” pseudoknot for which both experimental data (54, 55, 56, 57) and physical models (37, 50, 51, 58, 59, 60) are available. However, for more complex single-molecule pseudoknots, even those which can be enumerated by current dynamic programming algorithms (47), entropy estimates have been limited to phenomenological extensions of the non-pseudoknotted and H-type pseudoknot models (43, 44, 61), and few experimental studies are available (62). A recent strategy uses machine learning of large experimental data sets (50, 63); although these approaches can be useful, they come with the disadvantages of compounding possible experimental errors and often using an enormous number of parameters, which can impact generalizability. A sketch of a theoretical description of simple pseudoknot entropies based on polymer physics was developed by Isambert and Siggia (37, 60); however, their derivations have not been published. Given the relative lack of experimental data to validate current simple phenomenological approaches on complex pseudoknots, the lack of a physical model for such structures is a pressing concern.

In this study, we develop a physical model to calculate the entropies of arbitrarily complex pseudoknots. We combine our model with complete enumeration of the secondary structure landscape, demonstrating that we can exactly solve for the probabilities of the RNA folding into each of the possible structures, including those with pseudoknots (Fig. 1). We demonstrate that this approach is feasible, not only for short RNA molecules of ∼25 nts that have been examined in previous studies (25) but even for biologically relevant RNA sequences of ∼80 nts in length.

Figure 1.


Schematic overview of the algorithm. Given an RNA sequence, the algorithm first enumerates all potential stems (sequences of consecutive basepairs) that can form. It then searches for all possible combinations of stems such that no nt is paired with more than one other, thus forming all possible secondary structures. For each structure, it calculates the free energy, which is comprised of a bond free energy term and a loop entropy term. In this work, we describe a polymer physics model to calculate this loop entropy term for arbitrarily complex pseudoknotted structures using only two parameters. The histogram of free energies for the sequence shown is plotted with an arrow pointing to the minimum free energy (MFE). Given the entire free energy landscape, the algorithm calculates the probability of any arbitrary secondary structure of forming in equilibrium. Finally, we coarse grain over similar structures described by the same topology, arriving at a probability distribution for every possible topology forming in equilibrium. To see this figure in color, go online.

Our approach combines a method based on the work of Isambert and Siggia with a, to our knowledge, novel graph-theoretical depiction of the RNA, allowing us to calculate the entropy of any arbitrary RNA structure. We demonstrate the generality of our formalism using the H-type and kissing hairpin pseudoknots as examples. Despite this generality, our loop entropy model uses only two parameters corresponding to experimentally derived physical quantities: the persistence length of single-stranded RNA, and the volume within which two RNA nts are considered bound. This represents an enormous parameter reduction compared to state-of-the-art algorithms; for example, the phenomenological Dirks-Pierce model has 11 parameters for the loop entropy of pseudoknots and ∼18 parameters for non-pseudoknotted loops (63).

We test our model predictions on molecules from the RNA STRAND (64), PseudoBase++ (65), and CompaRNA (66) databases and find good agreement with experimental results. We find that a significant heterogeneity in pseudoknot types exists even for sequences ≤80 nts in length, based on the polymer model representing their entropies. This heterogeneity is found to result in systematic errors of heuristic models’ estimates of the entropies of complex pseudoknots, motivating the generality of the entropic model derived here, which can correct such errors. Although we fit our entropy model only to data from non-pseudoknotted structures, we find that our model performs as well or better than previously published methods in predicting pseudoknots while performing on par with current methods in the prediction of non-pseudoknotted structures. Given the success of the model alongside its parametric sparsity, future work should build upon it to include further biological considerations neglected in the current treatment, and we give suggestions for where such improvements can be made.

Methods

Calculating free energies

The probability of the RNA sequence folding into a given equilibrium structure σ is given by the Boltzmann factor:

$$p(\sigma) = \frac{\exp(-\beta G_\sigma)}{Z}, \tag{1}$$

where β = 1/(kBT) (T is the temperature, and kB is Boltzmann’s constant), and the partition function, Z, is defined such that the probability distribution is normalized: $\sum_\sigma p(\sigma) = 1$. Here Gσ, the Gibbs free energy of structure σ, is a function of the enthalpy Hσ and entropy Sσ of the structure:

$$\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ, \tag{2}$$

where we drop the subscripts for notational convenience and introduce Δ to signify that free energies are measured with respect to the free chain. The superscripts implying standard conditions will be dropped from here on.
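As an illustration of Eq. 1, the following minimal Python sketch converts a list of structure free energies into Boltzmann-weighted probabilities. The function name, the example free energies, and the choice of kcal/mol units at 310.15 K are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def boltzmann_probabilities(free_energies, temperature=310.15):
    """Return p(sigma) = exp(-beta * G_sigma) / Z for each free energy (Eq. 1)."""
    beta = 1.0 / (K_B * temperature)
    g_min = min(free_energies)
    # Shifting all energies by the minimum leaves the probabilities unchanged
    # (the shift cancels in the ratio) but avoids overflow in exp().
    weights = [math.exp(-beta * (g - g_min)) for g in free_energies]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]


# Hypothetical free energies (kcal/mol) of three structures of one sequence.
probs = boltzmann_probabilities([0.0, -1.5, -3.0])
print(probs)
```

The structure with the lowest free energy receives the largest probability, and the probabilities sum to one by construction.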

We separate the free energy calculation into two independent components: the free energy of consecutive basepairs (stems) and the free energy of loops. We make the simplifying assumption that ΔH is determined solely by the basepairs in the structure, ignoring higher order corrections, such that ΔH = ΔHstems. For the entropy, we make no such assumption, and ΔS = ΔSstems + ΔSloops, where the entropy of stems represents the entropy lost by basepaired nts, and the entropy of loops represents the entropy lost by the constraints those basepairs place on the rest of the molecule. To calculate the terms ΔHstems and ΔSstems, we consider nearest-neighbor interactions among basepairs following the Nearest Neighbor Database (67), assuming (with few exceptions tabulated in the database) independence of the free energy contributions of each stem. See further details in Supporting Materials and Methods.

Calculating loop entropies

The goal of this and the next section is to build up a theoretical framework to estimate the loop entropies of arbitrarily complex RNA pseudoknots. This calculation has a significant effect on the prediction results. In fact, the magnitude of the loop entropy is on average equal to that of the overall free energy at physiological temperatures (see Fig. S7). This is as expected intuitively; the difficulty in RNA structure prediction lies precisely in predicting the balance between the energy gain from basepair constraints and the entropy gain from unpaired nts.

Because the following calculation is somewhat involved, we will begin by clarifying explicitly the nature of the loop entropy. A free RNA chain has a large number of conformations available to it, which we will call Ω. The loop entropy is the quantification of the reduction in conformations available to the RNA molecule upon introducing constraints on the structure, such as that certain nts are paired (68).

Ω depends on the length x of the RNA, such that Ω(x1)Ω(x2) = Ω(x1 + x2); in other words, we assume (for the free chain) independence of the various subsections of the RNA. This is in principle only true in the limit x1, x2 ≫ b, where b ≈ 2.4 nts is the Kuhn length of single-stranded RNA, and further neglects self-avoidance of the RNA molecule. Throughout, we will consider regions of single-stranded RNA long enough such that x ≫ b but short enough such that we assume self-avoidance has negligible probability. We discuss how to systematically consider shorter RNA loops in Supporting Materials and Methods and will make some notes regarding self-avoidance later in this section. We will also make the approximation that Ω is independent of sequence.

When a loop is formed in RNA, that loop constrains the number of conformations available to the RNA. For example, an RNA molecule that has its first nt bonded to its last only has available to it a fraction of the conformations available to the free chain: namely, all those that have the first and last nts close enough to bind. We are interested not in absolute values of the entropy S, but in ΔS, where the free chain is our reference state with $\Delta S_{\text{loops}}^{\text{free}} = 0$. The entropy of a structured RNA of length x with ωstruct conformations available to it is given by ΔSloops = kB log(ωstruct/Ω(x)) < 0, where we have written the difference of logs as the log of the ratio. We can simplify this formula by writing ωstruct = Ω(x) × p, where p is the fraction of conformations available to the free chain that are consistent with the structure being considered. We therefore have

$$\Delta S_{\text{loops}} = k_B \log p. \tag{3}$$

It is worth reiterating that the entropy of stems themselves was already taken into account in the term ΔSstems and that ΔSloops only measures the entropy lost because of loop closures (69). To avoid overcounting the entropy lost because of the constraints placed on basepaired nts, stems do not directly contribute to ΔSloops. Therefore, a stem comprised of l basepairs (or 2l nts) should be treated—for the purposes of the ΔSloops calculation—as if it has Ω(2l) available conformations; that it in reality has far fewer has already been quantitatively accounted for in the ΔSstems term. Because of this, factors of Ω cancel out entirely in calculations of ΔSloops.

We now turn to polymer physics to quantitatively describe how loop closure constraints affect p, the fraction of configurational space available to the molecule. We model a single-stranded region comprised of x unpaired nts as a random walk of (x + 1)/b steps, where, as before, b is the Kuhn length of single-stranded RNA. We denote by Ps(R)dR the probability of a random walk of length s to have end-to-end vector R:

$$P_s(\mathbf{R}) = \left(\frac{3}{2\pi s b}\right)^{3/2} \exp\left(-\frac{3R^2}{2sb}\right). \tag{4}$$

We have assumed s ≫ b to arrive at the Gaussian formula above through the central limit theorem. The mean of the Gaussian is zero by symmetry. To find the variance, we first consider a single step of length b in three dimensions, which has variance in the î, ĵ, and k̂ coordinates of b²/3 by symmetry. For a random walk of N = s/b steps, by independence of subsequent steps, the total variance is equal to Nb²/3 = sb/3, leading to Eq. 4.
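The variance argument above can be checked numerically. The sketch below takes Eq. 4 to be a three-dimensional Gaussian with variance sb/3 per Cartesian coordinate and integrates it over spherical shells; the function names, the midpoint-rule integrator, and the example loop length are illustrative assumptions.

```python
import math


def p_end_to_end(r, s, b=2.4):
    """Eq. 4 evaluated at end-to-end distance r (nts) for a walk of contour length s."""
    norm = (3.0 / (2.0 * math.pi * s * b)) ** 1.5
    return norm * math.exp(-3.0 * r * r / (2.0 * s * b))


def radial_moment(s, power, b=2.4, n=20000):
    """Integrate |R|**power * P_s(R) over all space with a midpoint rule in the radius."""
    r_max = 10.0 * math.sqrt(s * b)  # far into the Gaussian tail
    dr = r_max / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        shell_volume = 4.0 * math.pi * r * r * dr
        total += (r ** power) * p_end_to_end(r, s, b) * shell_volume
    return total


s = 10.0  # hypothetical contour length in nts
print(radial_moment(s, 0))  # normalization, expected to be close to 1
print(radial_moment(s, 2))  # mean-square end-to-end distance, expected close to s * b
```

Because each of the three coordinates contributes a variance of sb/3, the mean-square end-to-end distance should equal sb.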

Eq. 4 is accurate for non-self-avoiding random walks; self-avoiding random walks cannot be treated analytically in this way. However, for sufficiently short walks, the probability of self-interaction is low. As described in Supporting Materials and Methods, we can systematically consider higher order corrections to Eq. 4 while maintaining its Gaussian nature. Although the assumption s ≫ b does not always hold in the problems considered, we ultimately find very good agreement between experiment and results obtained using Eq. 4, and we find that the corrections to Eq. 4 described in Supporting Materials and Methods are negligible.

For a structure with n single-stranded regions of lengths si (1 ≤ i ≤ n), the fraction of conformations consistent with the structure is given by the following:

$$p = \int' \prod_i P_{s_i}(\mathbf{R}_i)\, d\mathbf{R}_i, \tag{5}$$

where Ri is the end-to-end distance vector of the ith single-stranded region, and the primed integral is taken only over those Ri consistent with the overall structure. We will describe how to address these integrals via a Feynman diagram-like approach in the next section.

To demonstrate how Eqs. 3, 4, and 5 are applied, we first consider the simple hairpin loop. We will call its entropy ΔSclosed-net-0, neglecting the subscript of “loops” from here on. The notation follows (37, 60), and the subscript references the number of stems enclosed by the loop (zero in this case; see Fig. S2 for other examples). Following Jacobson and Stockmayer (70), we allow that basepairing can occur as long as the two nts are within a small volume vs of one another. We assume that the bond length rs is small enough that for all |R| ≤ rs, Ps(R) ≈ Ps(0). Therefore, p = vsPs(0), and Eqs. 3, 4, and 5 yield

$$\Delta S_{\text{closed-net-0}} = k_B\left[\log(v_s) + \frac{3}{2}\log\left(\frac{3}{2\pi s b}\right)\right]. \tag{6}$$

We emphasize that within our model, this formula is applicable to hairpin loops, bulge loops, internal loops, and multiloops. We discuss in a later section how our model can be extended to break this equivalency.

We estimate vs by fitting experimental measurements of the entropy of hairpin loops of variable lengths to Eq. 6. Although Eq. 6 implies that the entropy of a hairpin should increase monotonically as a function of its length, the experimental measurements are nonmonotonic, and their nonmonotonicity exceeds the error bars (71). This nonmonotonicity may be due to enthalpic effects (72), which were neglected in our analysis following (30). Nevertheless, Fig. 2 shows that Eq. 6 gives a reasonable fit to the experimental data with vs = 0.0201 ± 0.0036 nts³. A more precise definition of vs might include a dependence on the closing basepairs of the hairpin loop; we expect that the penalties placed on specific closing basepairs and first mismatches in (30, 71) play a similar role, though such penalties were not included here. If one ignores all angular dependences of bond formation, our estimate of vs leads to a naive underestimate of the length of a hydrogen bond of 0.56 Å, which nonetheless is well within an order of magnitude of the true length of hydrogen bonds.
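To make Eq. 6 concrete, the sketch below evaluates the closed-net-0 loop entropy in units of kB, using the values b ≈ 2.4 nts and vs = 0.0201 nts³ quoted in the text. The function name and the example loop lengths are illustrative assumptions, and the argument s is taken to be the contour length of the loop in nts.

```python
import math

B_KUHN = 2.4  # Kuhn length b of single-stranded RNA in nts (value quoted in the text)
V_S = 0.0201  # binding volume v_s in nts^3 (fit value quoted in the text)


def closed_net_0_entropy(s, v_s=V_S, b=B_KUHN):
    """Eq. 6: loop entropy Delta S / k_B of a closed-net-0 loop of contour length s (nts)."""
    return math.log(v_s) + 1.5 * math.log(3.0 / (2.0 * math.pi * s * b))


# Hypothetical loop lengths; a longer loop loses more entropy upon closure.
for s in (4, 8, 16):
    print(s, closed_net_0_entropy(s))
```

Within this model, doubling the loop length changes ΔS/kB by −(3/2) log 2, independent of vs; the binding volume only sets a constant offset.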

Figure 2.


vs estimated from experimental data. Experimental estimates for the free energy of hairpin loops of length s from Table 1 of (71) were converted to entropy estimates (blue points and error bars) by assuming ΔH = 0 as in (30). These data were fit to Eq. 6, yielding an estimate of vs = 0.0201 ± 0.0036 nts³. To see this figure in color, go online.

Because we find b using previous experimental results and fit vs based on data from non-pseudoknotted structures, our model is in truth a zero-parameter model when it comes to pseudoknots. No data from pseudoknotted structures were used to fit our model.

Pseudoknot loop entropies: RNA Feynman diagrams

Our goal in this section is to find Eq. 5 for arbitrary pseudoknots. In Eq. 5, the Ps(R) terms are given by the single-stranded segments, whereas stems appear through the constraint on the integral. The persistence length of double-stranded RNA is extremely long (∼200 nts (73)) compared to both single-stranded RNA and the length of any stem we will actually consider. Therefore, we will model stems as rigid rods with a fixed end-to-end distance given by the length of the stem. In other words, a stem in which nts i through i + k are bound to j through j − k constrains nts i and i + k (as well as j and j − k) to be a fixed distance apart. As we will see, such constraints end up only affecting the value of the integral for pseudoknotted structures, as exemplified by Fig. 3 c.

Figure 3.


RNA Feynman diagrams. (a) An instance of the canonical H-type pseudoknot is shown (the first panel). Bold lines represent the RNA backbone; thin lines represent hydrogen bonds. The loop entropy of this structure can be calculated by first assuming sequence independence of the loop entropy (second panel) and then converting the structure to a graph (third panel). The nodes of the graph represent the first and last basepairs of each stem, and two types of edges represent single- and double-stranded RNA. The graph directly represents the integral in Eq. 7, reprinted in the fourth panel. The nodes are integrated over three-dimensional space, subject to constraints specified by the rigid double-stranded edges (blue), which correspond to delta functions. The integrand is given by the flexible single-stranded edges (red), which correspond to a Ps(R) term. (b) The intramolecular kissing hairpin pseudoknot (first panel) is converted to a graph (second panel), representing the integrals necessary to compute its loop entropy (third panel). Although for this structure, the integrals are in general not analytically solvable, we numerically solve them (see Supporting Materials and Methods) as well as solve them analytically for the case s1/s3 = s2/s4 (Eq. 8). (c) The process of calculating the loop entropy of an RNA structure by converting it to a graph representing the entropy in integral form can be applied to any arbitrary structure. Separable integrals are represented by graphs which can be disconnected by the removal of any one edge. Thus, once appropriate factors of vs are included (one for each stem in the original structure), the loop entropy of the example structure in question is simple to calculate and is given by four closed-nets-0 (originating from the three hairpins and multiloop). The four closed-net-0 loops contribute multiplicatively to the exponential of the loop entropy, meaning additively to the loop entropy itself. 
For non-pseudoknotted structures, all double-stranded edges (blue) can be removed in this way. To see this figure in color, go online.

To calculate the entropy of a pseudoknot of arbitrary complexity, we invent a, to our knowledge, novel graph formulation inspired by Feynman diagrams from quantum field theory. We build on previous work by Rivas and Eddy (44) and later by Orland and Zee (74) who developed innovative graphical decomposition methods for RNA structures for the purposes of pseudoknot enumeration; here, we use a related diagrammatic approach for the entropy calculation instead. First, the RNA structure being considered is translated into a graph. Nodes are used to represent the two end points of a stem, and two types of edges represent single- and double-stranded RNA.

Defined in this way, the graph of the RNA structure directly represents the integrals necessary to compute its entropy. The positions of the nodes, ri, are integrated over all of space, while the constraints of the structure are included in the integrand: a double-stranded edge of length l between nodes i and j leads to a term $v_s\,\delta(|\mathbf{r}_i - \mathbf{r}_j| - l)/(4\pi l^2)$ (because of the rigid rod approximation of the stem), and a single-stranded edge of length s between these nodes leads to a term $P_s(\mathbf{r}_i - \mathbf{r}_j)$ in the integrand (as in Eq. 5). Note that two bonded nts in isolation are considered a stem of length l → 0.

As a concrete example, we consider the canonical H-type pseudoknot, an instance of which is shown in Fig. 3 a (first panel). The loop entropy is sequence independent (second panel) and can be calculated by translating the structure into a graph (third panel) in which each node represents the edge of a stem, blue edges represent regions of double-stranded RNA of length li, and red edges represent regions of single-stranded RNA of length si. For the example in Fig. 3 a, s3 = 6 nts, and l1 = 3 nts. We set the origin of our coordinate system to node 0 and call the distance vector between node i and the origin ri. Integrating over the possible placements of nodes 1–3 (while including the constraints of the structure in the integrand as described previously) we obtain the following Gaussian integral formulation of the entropy:

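$$e^{\Delta S_{\text{H-type}}/k_B} = v_s^2 \int d\mathbf{r}_1\, d\mathbf{r}_2\, d\mathbf{r}_3\; \frac{\delta(|\mathbf{r}_1| - l_1)}{4\pi l_1^2}\, \frac{\delta(|\mathbf{r}_3 - \mathbf{r}_2| - l_2)}{4\pi l_2^2}\, P_{s_1}(\mathbf{r}_3 - \mathbf{r}_1)\, P_{s_2}(\mathbf{r}_2 - \mathbf{r}_1)\, P_{s_3}(\mathbf{r}_2), \tag{7}$$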

where using the assumption s ≫ b, we allow the integrals to extend over all of space. A more comprehensive derivation of this formula, including the origin of the vs terms, can be found in Supporting Materials and Methods. This integral can be calculated analytically (Supporting Materials and Methods; (37)).

A complex pseudoknot involved in biological processes ranging from viral replication to antisense regulation is the intramolecular kissing hairpin pseudoknot (Fig. 3 b; (43, 75, 76, 77, 78, 79)). Despite its biological prevalence, its entropy cannot be estimated using existing formalisms, necessitating the use of simple heuristic energy models (43). Our formalism on the other hand can readily address this pseudoknot by translating the structure to integrals as in Eq. 7. Although the integrals representing the entropy of the kissing hairpin are not in general analytically solvable, they are for the special case of s1/s3 = s2/s4. Rescaling the s to be s/γ with γ = 3/(2b), we define the variables sc = s5(s1 + s2)(s3 + s4) and sd = s1s2(s3 + s4) + s3s4(s1 + s2) along with

$$s_A = \frac{s_5 (s_c + s_d)}{s_c}; \qquad s_B = \frac{s_3 s_4 (s_1 + s_2)^2}{s_d}; \qquad s_v = s_c + s_d$$

to arrive at

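$$e^{\Delta S_{\text{KH}}/k_B} = \left(\frac{v_s}{\sqrt{s_v}}\right)^{3} \frac{s_A}{2\pi^{9/2}\, l_1 l_3}\, \exp\left(-\frac{l_1^2 + l_3^2}{s_A} - \frac{l_2^2}{s_B}\right) \sinh\left(\frac{2 l_1 l_3}{s_A}\right), \tag{8}$$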

where sinh is the hyperbolic sine function.

The complete derivation of Eq. 8, along with a derivation of the numerically solvable general case, can be found in Supporting Materials and Methods. We have also provided an eight-dimensional table of the results of the numerical integration for different combinations of the s and l as Supporting Materials and Methods (Table S2).
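The closed form for the special case s1/s3 = s2/s4 is straightforward to evaluate. The sketch below reads Eq. 8 as exp(ΔS_KH/kB) = (vs/√sv)³ · sA/(2π^(9/2) l1 l3) · exp(−(l1² + l3²)/sA − l2²/sB) · sinh(2 l1 l3/sA), with all s rescaled by γ = 3/(2b). The function name, the example lengths, and the use of nts as the unit for both loop and stem lengths are assumptions for illustration.

```python
import math


def kissing_hairpin_weight(s, l, v_s=0.0201, b=2.4):
    """Eq. 8: exp(Delta S_KH / k_B) for loops s = (s1..s5) and stems l = (l1, l2, l3)."""
    gamma = 3.0 / (2.0 * b)
    s1, s2, s3, s4, s5 = (x / gamma for x in s)  # rescale every s to s / gamma
    l1, l2, l3 = l
    # The closed form only holds in the special case s1/s3 == s2/s4.
    if not math.isclose(s1 * s4, s2 * s3):
        raise ValueError("Eq. 8 requires s1/s3 == s2/s4")
    s_c = s5 * (s1 + s2) * (s3 + s4)
    s_d = s1 * s2 * (s3 + s4) + s3 * s4 * (s1 + s2)
    s_v = s_c + s_d
    s_a = s5 * s_v / s_c
    s_b = s3 * s4 * (s1 + s2) ** 2 / s_d
    prefactor = (v_s / math.sqrt(s_v)) ** 3 * s_a / (2.0 * math.pi ** 4.5 * l1 * l3)
    gauss = math.exp(-(l1 ** 2 + l3 ** 2) / s_a - l2 ** 2 / s_b)
    return prefactor * gauss * math.sinh(2.0 * l1 * l3 / s_a)


# Hypothetical kissing hairpin satisfying s1/s3 = s2/s4 = 1/2.
weight = kissing_hairpin_weight(s=(4, 6, 8, 12, 5), l=(3, 4, 5))
print(math.log(weight))  # loop entropy in units of k_B
```

For loop lengths outside the special case, the function raises an error rather than extrapolating; the text handles the general case by numerical integration.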

We note that the intermolecular kissing hairpin complex, for which physical models have previously been developed (80), is simpler than the intramolecular structure in the context of our formalism, and its entropy calculation is shown in Fig. S2.

Our Feynman diagram-like graphical formalism allows intuitive manipulation of the integrals. Graphs that can be disconnected by the removal of any one edge correspond to separable integrals and thus to distinct motifs in the RNA structure. The decomposition of a structure into its component graphs is depicted in Fig. 3 c for a classical cloverleaf RNA (a second example, this one of a pseudoknotted RNA, is provided in Supporting Materials and Methods, Section 10). The RNA in question decomposes into four instances of closed-net-0 (originating from the three hairpins and multiloop) and one instance of an open-net-0, or free chain (which by definition does not affect the entropy). For non-pseudoknotted structures, once appropriate factors of vs are included in the integrals (one for each double-stranded edge of the graph), all double-stranded edges can be removed through this graphical decomposition process. As shown in the figure, nodes that can be removed without changing the topology can be removed in the graph decomposition process. This is made possible by the property of Ps(r) that $\int P_x(\mathbf{r}_1)\, P_y(\mathbf{r}_2 - \mathbf{r}_1)\, d\mathbf{r}_1 = P_{x+y}(\mathbf{r}_2)$ (see Supporting Materials and Methods for further discussion).
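The node-removal property, that two consecutive single-stranded edges of lengths x and y act as one edge of length x + y, can be verified numerically. Because the Gaussian of Eq. 4 factorizes into three identical one-dimensional Gaussians with variance sb/3, it suffices to check a single Cartesian component. The function names and the midpoint-rule integration below are illustrative assumptions.

```python
import math


def gaussian_1d(x, s, b=2.4):
    """One Cartesian component of Eq. 4: zero mean, variance s * b / 3."""
    var = s * b / 3.0
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def convolve_at(r2, x_len, y_len, b=2.4, n=20000, half_width=80.0):
    """Midpoint-rule estimate of the integral of P_x(r1) * P_y(r2 - r1) over r1."""
    dr = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        r1 = -half_width + (k + 0.5) * dr
        total += gaussian_1d(r1, x_len, b) * gaussian_1d(r2 - r1, y_len, b) * dr
    return total


# Hypothetical edge lengths of 6 and 9 nts merged into one edge of 15 nts.
print(convolve_at(1.5, 6.0, 9.0), gaussian_1d(1.5, 15.0))
```

The two printed values should agree closely, reflecting the additivity of variances under convolution.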

In Fig. S2, we display all possible graphs of up to two stems and their respective RNA structures. As in Fig. 3, single-stranded edges are displayed in red, and double-stranded edges are displayed in blue. For each graph, the integral formulation of its entropy is displayed in the figure alongside the expression it evaluates to. RNA sequences, even those of length ≤80 nts, form a wide array of pseudoknots more complex than those discussed in that figure, such as H-type pseudoknots with internal loops. Heuristics for treating such pseudoknots make systematic errors that our model can correct. See Supporting Materials and Methods for further discussion.

In Supporting Materials and Methods, we provide a full sample calculation for the free energy of a pseudoknotted structure.

Comparison of methodology to other physics-based pseudoknot entropy models

Although our model is able to address arbitrarily complex pseudoknots, prior physical models have been developed to address H-type pseudoknots in particular. The parametric sparsity of the model described above necessitates a neglect of several biological considerations, which have been considered by these previous models. Here, we will discuss how the framework developed above can be modified to include several factors considered in such models. The rationale for building atop our framework is provided in the next section. We demonstrate that despite the loop entropy model’s apparent physical simplicity—it uses an order of magnitude fewer parameters than current tools while being general enough to apply to arbitrarily complex pseudoknots—it performs on par with state-of-the-art prediction software and therefore appears to succinctly capture the essential physics at play (see Results and Discussion).

An early model for the loop entropy of pseudoknots was developed by Gultyaev et al. (81). That model was based in large part on Jacobson and Stockmayer’s derivation of the loop entropy of hairpins, which is rederived (Eq. 6) and then significantly extended by our formalism. To account for excluded volume, Gultyaev et al. replaced the factor of 3/2 in Eq. 4 with 1.75 (82). Such a change does not accurately account for excluded volume for the case of pseudoknots; we therefore did not make this replacement in our own article (in an effort for self-consistency), though it can easily be made. A more systematic treatment of how to include self-avoidance for the case of complex pseudoknots is still lacking.

The first pseudoknot models such as Gultyaev’s did not consider interhelix loops for the H-type pseudoknot (i.e., they only considered those structures for which s2 = 1 in the language of Fig. 3 a). The approximation made in our own work is in fact the opposite limit (that of s2 ≫ b), and our results should be most appropriate for long single-stranded regions. More precise treatment of short loops would forgo the simple ideal chain approximation of Eq. 4 in favor of the worm-like chain approximation. Although it would preclude analytic solutions of the integrals, numeric integration can easily be employed to make an effective look-up table as we demonstrated for the intramolecular kissing hairpin pseudoknot.

A similar complication is dealt with in Cao and Chen’s Vfold model, which considers bond geometries explicitly using the diamond lattice (50). Although the enumeration procedure employed on the lattice is not computationally feasible for very large or complex pseudoknots, it is expected to capture the atomistic geometries more precisely than our own continuous three-dimensional space theory. Modifications can still be made within our framework, most directly by integrating only over a specific range of angles determined by the geometry. Such geometric considerations may also affect our treatment of non-pseudoknotted structures and, in particular, our equivalent treatment of hairpin, internal, bulge, and multiloops (83).

Perhaps most importantly, our model neglects the twists of the RNA helix. These twists may play a role in the nonmonotonicity of the experimental data in Fig. 2 and are likely significant. Isambert’s KineFold model claims to effectively consider such twists by modification of the value of the double-stranded stem lengths l inputted to the pseudoknot formulae (60); however, as for the pseudoknot formulae themselves, the derivations of these modifications have not been published, and no physical basis for them was given. Finally, although we do not distinguish between the major and minor grooves of the RNA, accounting for the different grooves can explain asymmetries in physiological H-type pseudoknots (58). Aalberts and Nandagopal demonstrated that with the addition of a single experimentally measured parameter, Ps(R) can be modified to account for this factor (84, 85).

Enumerating RNA structures

In this section, we describe the process by which we exhaustively enumerate the secondary structures, including pseudoknots, into which an arbitrary given sequence can fold. This process was developed by Pipas and McMahon (20). The Pipas-McMahon algorithm first enumerates all possible secondary structures for a given sequence (sans pseudoknots) and then evaluates the free energy for each to construct the entire free energy landscape for non-pseudoknotted structures. A major shortcoming is the significant computer time required for long sequences. However, the exponential increase in computer power over the past 40 years, coupled with increased appreciation for the physiological and engineering relevance of short RNA strands, suggests revisiting this approach. This process is also employed by the TT2NE algorithm, with the caveat that rather than stems, that algorithm uses helipoints—defined as sets of stems separated by a bulge loop of size one or a 1 × 1 internal loop—as the backbone of the enumeration procedure, thus coarse graining over many similar structures (22).

We first number the nts in the RNA sequence from 1 to N from the 5′ end. We define an N × N symmetric matrix B, which describes which nts can bind to each other: Bi,j = 1 if nts i and j can bind to form a basepair (i.e., they belong to the set {(A,U), (C,G), (G,U)}) and 0 otherwise.
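As a rough illustration of this step, the binding matrix could be built as in the following Python sketch. The function name `binding_matrix`, the nested-list representation, and the 0-based indexing (the text numbers nts from 1) are assumptions made for this example and are not taken from the authors' repository.

```python
# Pairs allowed by the text: (A,U), (C,G), (G,U), in either order.
ALLOWED_PAIRS = {"AU", "UA", "CG", "GC", "GU", "UG"}


def binding_matrix(seq):
    """Return the N x N symmetric 0/1 matrix B for an RNA sequence (0-based)."""
    n = len(seq)
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] + seq[j] in ALLOWED_PAIRS:
                B[i][j] = B[j][i] = 1  # keep B symmetric
    return B
```

For example, `binding_matrix("GCAU")` marks G-C, G-U, C-G, and A-U as able to pair; any distance constraint (such as the minimal hairpin length) is left to the stem search.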

Next, we search for all possible stems (strings of consecutive basepairs) that could form. We define a parameter m to be the minimal allowed stem length (m ≥ 1; we set m = 1 throughout unless otherwise specified). We also impose the physical constraint that hairpins (the single-stranded region connecting the two strands at one end of a stem) have a minimal length of three nts. We include not only the longest possible stems that can form but also all contiguous subsets of those stems (86, 87). We denote the number of stems found by Nstems.
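The stem search can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it considers only antiparallel stems, represents a stem as a tuple of 0-based (i, j) pairs, and interprets the hairpin constraint as requiring at least three unpaired nts between the two nts of every basepair. The name `find_stems` and its parameters are hypothetical.

```python
ALLOWED_PAIRS = {"AU", "UA", "CG", "GC", "GU", "UG"}


def find_stems(seq, m=1, min_hairpin=3):
    """Return all antiparallel stems of length >= m.

    Each stem is a tuple of (i, j) basepairs with i < j (0-based).
    Every contiguous subset of a longer stem is returned as its own stem.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_hairpin + 1, n):
            k = 0  # number of stacked pairs found so far, starting at (i, j)
            while ((j - k) - (i + k) > min_hairpin
                   and seq[i + k] + seq[j - k] in ALLOWED_PAIRS):
                k += 1
                if k >= m:
                    # Record the run of k pairs beginning at (i, j).
                    stems.append(tuple((i + t, j - t) for t in range(k)))
    return stems
```

Because every starting pair (i, j) is tried and every run length from m upward is recorded, sub-stems of a maximal stem appear in the output alongside the maximal stem itself.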

We next define the Nstems × Nstems symmetric compatibility matrix C, where Cp,q = 1 if a structure could be made with both stems p and q (so that Cq,q = 1 for all q). We impose the constraint that each nt may be paired with, at most, one other nt by setting Cp,q = 0 if stems p and q share at least one nt.
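A minimal sketch of the compatibility matrix is given below. It encodes only the constraint stated in the text (two stems are incompatible if they share a nt); any further exclusions, such as disallowed pseudoknot types, are not modeled. The stem representation (tuples of (i, j) pairs) and the name `compatibility_matrix` are assumptions for this example.

```python
def compatibility_matrix(stems):
    """Return the symmetric 0/1 matrix C with C[p][q] = 1 if stems p and q
    share no nucleotide. The diagonal is set to 1."""
    # Set of nucleotide indices used by each stem.
    nts = [frozenset(x for pair in stem for x in pair) for stem in stems]
    n = len(stems)
    C = [[0] * n for _ in range(n)]
    for p in range(n):
        C[p][p] = 1
        for q in range(p + 1, n):
            if nts[p].isdisjoint(nts[q]):
                C[p][q] = C[q][p] = 1
    return C
```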

Finally, we explicitly enumerate the remaining possible secondary structures by identifying all compatible combinations of stems. Starting from a single stem s1, we consider stems s2 where 1 ≤ s1 < s2 ≤ Nstems and add the first stem for which Cs1,s2 = 1. Then, we repeat the process, adding the first stem s3 > s2 compatible with both s1 and s2 and so forth, continuing until we can add no more stems. We add the resulting structure, composed of M stems, to the list of possible structures, remove the last stem added (to obtain the structure composed of stems s1, s2, …, sM–1), and continue the process. This algorithm returns all possible secondary structures resulting from the primary sequence.
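The enumeration can be sketched as a depth-first search over increasing stem indices. The version below is an illustration under an assumption: it records every mutually compatible combination of stems, including combinations that could still be extended, and it omits the fully unfolded structure. Whether the authors' code records exactly this set is not stated in this section. The name `enumerate_structures` is hypothetical.

```python
def enumerate_structures(C):
    """Return all non-empty sets of mutually compatible stems.

    C is a symmetric 0/1 compatibility matrix (list of lists).
    Each structure is a tuple of stem indices in increasing order.
    """
    n = len(C)
    structures = []

    def extend(current, start):
        for s in range(start, n):
            # A stem may be added only if compatible with all stems chosen so far.
            if all(C[s][t] for t in current):
                current.append(s)
                structures.append(tuple(current))
                extend(current, s + 1)  # only consider later stems
                current.pop()           # remove the last stem and continue

    extend([], 0)
    return structures
```

Restricting each extension to stems with a larger index than the last one added ensures that every combination is generated once, in the spirit of the ordering s1 < s2 < s3 described above.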

The algorithm described here was implemented in MATLAB (The MathWorks, Natick, MA), and all code is available on the GitHub repository (https://github.com/ofer-kimchi/RNA-FE-Landscape). The repository also includes a Python version of the code.

Once we completely enumerate the possible secondary structures, we calculate the probabilities that the RNA will fold into each of them by calculating their free energies as described in the previous sections.
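Given a free energy for each enumerated structure, the folding probabilities follow from standard Boltzmann weighting. The sketch below assumes free energies in kcal/mol and a temperature of 310.15 K (37°C); both the units and the default temperature are assumptions for this example, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_probabilities(free_energies, temperature_kelvin=310.15):
    """Return the equilibrium probability of each structure.

    free_energies: list of free energies in kcal/mol, one per structure.
    """
    kT = R_KCAL * temperature_kelvin
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / kT) for g in free_energies]
    Z = sum(weights)            # partition function (up to the shift)
    return [w / Z for w in weights]
```

The structure with the lowest free energy (the MFE structure) receives the highest probability, and the probabilities sum to one over the enumerated ensemble.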

Results and Discussion

We use experimentally determined structures to compare the predictions of our model with other current methods; results are shown in Fig. 4. For sequences of length ≤80 nts from the RNA STRAND (64), PseudoBase++ (65), and CompaRNA (66) databases (186 non-pseudoknotted structures with 58 different topologies; 235 pseudoknotted structures with 52 different topologies), which had a sequence dissimilarity ≥0.2 (using Jukes-Cantor), we measured the number of basepairs correctly predicted by our algorithm’s MFE structure compared to 14 other current algorithms. Seven of these cannot predict pseudoknots and serve as useful benchmarks for the non-pseudoknotted results (detailed methods in Supporting Materials and Methods; we have included the entire benchmark data set in Table S1). We also tested whether our algorithm’s predictions are dependent on the accuracy of our loop entropy model by setting all loop entropies to zero (dark green). The poor performance of our algorithm in this case compared to the case in which loop entropies are considered demonstrates the success of the loop entropy model.

Figure 4.

Summary statistics for comparison to other prediction tools. To assess the relative success of our algorithm, we compare its performance in predicting experimentally determined RNA structures to that of 14 other current prediction tools: RNAFold (33, 118), ViennaRNA (Andronescu parameters) (119), Mfold (32), CONTRAfold (120), PPfold (121), CentroidFold (122), Context Fold (123), HotKnots (Dirks-Pierce parameters), HotKnots (Rivas-Eddy parameters), HotKnots (Cao-Chen parameters) (63), ProbKnot (40), PKNOTS (44), RNAPKplex (33, 118), and iterated loop matching (ILM) (38). We measure sensitivity, PPV, the fraction of topologies predicted correctly by the MFE structure, the average per-base topology accuracy (defined in the main text), and the fraction of MFE structures containing a pseudoknot. We separate the results into sequences that experimentally form pseudoknots and those that do not. Error bars show the standard error. Despite the fact that our algorithm requires only two parameters to describe the entropy of any arbitrary secondary structure (at least an order of magnitude–and often several–fewer than the other algorithms tested against) and that the parameters were trained on non-pseudoknotted structures, our algorithm outperforms the other algorithms tested in predicting pseudoknotted structures and performs on par with them in predicting non-pseudoknotted structures. We also demonstrate that our algorithm’s success is dependent on the accuracy of our loop entropy model because setting all loop entropies to zero (dark green) leads to poor performance (see main text for further discussion). To see this figure in color, go online.

Although the entropy model presented here can give an integral expression for arbitrarily complex pseudoknots, the integral may need to be solved numerically for sufficiently complex structures. For this large-scale comparison, we disallowed pseudoknots more complex than those displayed in Fig. S2, and our algorithm therefore did not require any numerical integration. Fig. S6 demonstrates that even without this practical constraint, the complete enumeration of secondary structures including all possible pseudoknots is nonprohibitive. We similarly disallowed parallel stems, which can be stable in neutral and acidic pH conditions (88). We also set the minimal stem length for each sequence (m) to the minimal value it could take such that the total number of possible stems is less than Nstems^max = 150. These choices were all made to speed up computation time; each sequence took between several seconds and an hour to run. Details of the computation time of our algorithm can be found in Figs. S4–S6.
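The choice of m described above amounts to a simple search, sketched below. The callable `count_stems`, which is assumed to return the number of possible stems for a given minimal stem length, and the cap `m_max` are hypothetical and introduced only for this example.

```python
def choose_min_stem_length(count_stems, n_stems_max=150, m_max=20):
    """Return the smallest m >= 1 for which count_stems(m) < n_stems_max.

    count_stems: callable mapping a minimal stem length m to a stem count.
    m_max: hypothetical upper limit on m, to guarantee termination.
    """
    for m in range(1, m_max + 1):
        if count_stems(m) < n_stems_max:
            return m
    raise ValueError("no m up to m_max keeps the stem count below n_stems_max")
```

In practice `count_stems` would wrap a stem search on the sequence of interest; raising the threshold from 150 to 200 permits smaller values of m at the cost of longer run times.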

Although these practical constraints were chosen to speed up the computation time, they also led to errors in the algorithm’s predictions. Of the tested pseudoknots, 64 were topologically more complex than any of those presented in Fig. S2. Furthermore, 33 of the non-pseudoknotted sequences tested (and eight of the pseudoknotted) include basepairs outside of those allowed by the algorithm (AU, GC, and GU). Removing such structures from our comparison analysis leads to our algorithm performing even better compared to current tools (see Fig. S3).

Further errors were due to our choice of m, which was not optimized and was too high compared to the length of the shortest stem in the experimental structure for 58 non-pseudoknotted cases and 54 pseudoknotted cases. By changing Nstems^max from 150 to 200, these numbers decreased to 46 for both pseudoknotted and non-pseudoknotted sequences, but the results for Nstems^max = 200 were practically identical to the results of Fig. 4 (see full results in Table S1). For Nstems^max = 200, the computation time was increased significantly (to several hours in the worst cases, though the majority of the computation time is spent on the Feynman diagram decomposition process, which has not been optimized in the current code). In addition to these sources of error, the nearest-neighbor parameters may need to be re-examined to be used most effectively with the loop entropy model presented here.

We considered the basepairs present in the experimental structure and in each algorithm’s MFE structure. Basepairs present in both were labeled as true positives (TP), those present only in the predicted structure were labeled as false positives (FP), and those present in the experimental structure but not the predicted MFE structure were labeled as false negatives (FN). The sensitivity (TP/(TP + FN)) and the positive predictive value (PPV; TP/(TP + FP)) of our algorithm were measured to be 0.80 and 0.75 for the non-pseudoknotted cases and 0.75 and 0.76 for the pseudoknotted cases, respectively. Our algorithm performed better than or as well as all other prediction tools tested for the prediction of pseudoknots and on par with other tools in the prediction of non-pseudoknotted sequences. The full results can be found in Table S1.
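The two basepair-level metrics can be computed as in the following sketch. Representing a structure as a collection of (i, j) basepairs and returning `nan` when a metric is undefined are choices made for this example only; the function name is hypothetical.

```python
def sensitivity_ppv(experimental_pairs, predicted_pairs):
    """Return (sensitivity, PPV) for a predicted structure.

    Each argument is an iterable of (i, j) basepairs; order within a pair
    is ignored.
    """
    exp = {tuple(sorted(p)) for p in experimental_pairs}
    pred = {tuple(sorted(p)) for p in predicted_pairs}
    tp = len(exp & pred)   # pairs in both structures
    fp = len(pred - exp)   # pairs only in the prediction
    fn = len(exp - pred)   # pairs only in the experimental structure
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```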

Although sensitivity and PPV are the most common metrics used to establish the success of an RNA prediction algorithm (89), we sought to develop a test that measures success on the scale of the full RNA rather than on the scale of individual basepairs. To this end, we measured how frequently each algorithm was able to correctly predict the topology of the experimentally measured structure, in which the topology of a structure is defined by its graph. We found for our algorithm that the experimental topology is within the top 1, 5, and 10 topologies at frequencies of 49, 65, and 70% for non-pseudoknotted structures, and 34, 59, and 62% for pseudoknotted, demonstrating a sharp increase between top 1 and top 5 and a plateau between top 5 and top 10.
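A top-k topology test of this kind can be sketched as follows. The sketch assumes that each enumerated structure carries a topology label and a probability, and that a topology's probability is the sum over its structures; this aggregation, the label strings, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def topology_rank(topology_labels, probabilities, experimental_label):
    """Return the 1-based rank of the experimental topology, or None if it
    was not enumerated.

    topology_labels: one hashable label per enumerated structure.
    probabilities: the matching structure probabilities.
    """
    totals = defaultdict(float)
    for label, p in zip(topology_labels, probabilities):
        totals[label] += p  # sum probability over structures sharing a topology
    if experimental_label not in totals:
        return None
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked.index(experimental_label) + 1
```

A sequence then counts toward the top 1, top 5, or top 10 frequency if the returned rank is at most 1, 5, or 10, respectively.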

Considering whether an algorithm correctly predicts the full topology can lead to errors arising from small variations in structure. For example, the opening of a single bond on the edge of a stem can lead to a different topology as we have defined it, if that stem includes one of the ends of the molecule. To arrive at a per-base measure of topology, we consider for each bond along the RNA backbone to which of the minimal graphs of Fig. S2 it belongs. For example, the bond between the second and third nts of Fig. 3 a belongs to a stem of an open-net-2a graph. We then measure for each sequence the fraction of correct per-base topology predictions made by each algorithm’s predicted MFE structure. We find that our algorithm averages a 76% per-base topology prediction accuracy for non-pseudoknotted sequences and a 49% accuracy for pseudoknotted sequences.

Finally, we compare how frequently each algorithm predicts an MFE structure containing a pseudoknot. Our algorithm correctly predicted a pseudoknot in 174/235 of the pseudoknotted cases, far more than any other algorithm tested. However, it also erroneously predicted a pseudoknot in 35/186 of the non-pseudoknotted cases.

For each of these metrics, the success of our algorithm is dependent on the loop entropy model. If we set all loop entropies to zero, our algorithm’s predictive power plummets (see Fig. 4, dark green). This is especially true for the prediction of non-pseudoknotted structures because removing the loop entropy term leads the algorithm to erroneously predict that 88% of these would form pseudoknots.

Our algorithm also provides the probability of folding into a pseudoknotted structure for each sequence. These data for the 421 sequences tested are presented in Fig. 5. Each data point represents a different sequence and the total probability calculated of that sequence folding into a pseudoknotted structure. For figure clarity, a lower bound of pseudoknot probability was set at 2 × 10^−10.
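The total pseudoknot probability can be sketched as a sum over the enumerated ensemble. The example below uses the usual crossing-pair criterion (pairs (i, j) and (k, l) with i < k < j < l) to flag a pseudoknot; the data layout, the function names, and the optional `floor` argument (mirroring the lower bound used for plotting) are assumptions for this illustration.

```python
def has_pseudoknot(pairs):
    """Return True if any two basepairs cross, i.e. i < k < j < l."""
    ps = sorted(tuple(sorted(p)) for p in pairs)
    for a in range(len(ps)):
        i, j = ps[a]
        for b in range(a + 1, len(ps)):
            k, l = ps[b]
            if i < k < j < l:
                return True
    return False


def pseudoknot_probability(structures, probabilities, floor=0.0):
    """Sum the probabilities of all pseudoknotted structures.

    structures: list of basepair lists; probabilities: matching list.
    floor: optional lower bound, e.g. 2e-10 for plotting on a log scale.
    """
    total = sum(p for s, p in zip(structures, probabilities) if has_pseudoknot(s))
    return max(total, floor)
```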

Figure 5.

Probability of folding into a pseudoknot. The predicted probability of each of the 421 sequences tested folding into a pseudoknot is presented. Of these sequences, 186 were experimentally found not to form pseudoknots (blue) and 235 were found to form pseudoknots (red). Our algorithm successfully predicts pseudoknots forming in the latter category far more frequently than in the former. For figure clarity, a lower bound of pseudoknot probability was set at 2 × 10^−10. To see this figure in color, go online.

The algorithm’s predictions for the six longest RNA sequences less than 89 nts in length from the PseudoBase++ database are presented in Fig. 6. We considered only those sequences whose structure was directly supported by experiments and which could be decomposed into the minimal topologies shown in Fig. S2. We display the experimental structure (green background) alongside the MFE predicted structure (light blue background) and the top six predicted topologies (out of several hundred, depending on the sequence; dark blue) in which the experimental topology is highlighted (purple). RNA secondary structure was plotted using the PseudoViewer package (90). Our results demonstrate successful predictions even for long pseudoknotted sequences, especially in terms of the predicted topology. Detailed methods are provided in Supporting Materials and Methods.

Figure 6.

Comparison to experiments for long sequences. Six long sequences were chosen from the PseudoBase++ database as described in the main text. The sequences are fragments derived from the following (starting from the top left and moving across): tobacco mosaic virus (124, 125, 126), Bacillus subtilis (127), tobacco mild green mosaic virus (125, 128), Bacillus subtilis (129), Giardiavirus (130), and Visna-Maedi virus (131). The experimental structures are supported by (numbering the sequences in the same order) sequence comparison (1–4,6), structure probing (1,3,5,6), mutagenesis (2,4–6), three-dimensional modeling (1), and NMR (6). We show the experimental structure (green background) and the MFE-predicted structure (light blue background) plotted using the PseudoViewer software (90). We also display the top six topologies (out of several hundred, depending on the particular sequence) and their respective predicted probabilities, with the topology corresponding to the experimental structure highlighted in purple. Overall, our results demonstrate successful predictions even for these long pseudoknotted sequences, especially in terms of the predicted topology. To see this figure in color, go online.

Conclusions

The accurate prediction of the ensemble of secondary structures explored by an RNA or DNA molecule has played a major role in shaping modern molecular biology and DNA nanotechnology over the past several decades. In this work, we showed that the modern ubiquity of extremely powerful computers can be used alongside novel polymer physics techniques to completely enumerate and solve for the free energy landscape of an RNA molecule including complex pseudoknots. This exponential time algorithm can be used to tackle even relatively long (∼80 nts) RNA sequences and, aside from the enumeration procedure (which is relatively fast compared to the free energy calculation for long sequences; see Figs. S4 and S6), is easily parallelizable.

Remarkably, the entropy model discussed in this work requires only two parameters—orders of magnitude fewer than other current algorithms—corresponding to clearly measurable physical quantities. Despite this and despite the fact that all parameters used in our model were derived using experiments on non-pseudoknotted RNA, our algorithm is more successful in predicting pseudoknotted structures than any of the other algorithms tested and on par with all predictors tested in predicting non-pseudoknotted structures on a benchmark data set of sequences of length ≤80 nts. The success of our algorithm is particularly notable given that the entropy model developed in this work can be used to address any RNA secondary structure, regardless of complexity. Given these results, we expect that more accurate entropy models can be formulated by building atop the framework presented here and have highlighted several avenues for improvement.

Although we have not done so in this work, we expect that our results can be further improved by optimizing the nearest-neighbor parameters, given the entropy model presented here.

The algorithm presented here can also be easily generalized to probe multiple interacting strands (see discussion in Supporting Materials and Methods). The sequences considered can be any combination of DNA and RNA; their identities affect the nearest-neighbor parameters of the model that have been previously tabulated (91) and to a lesser extent, the two entropy parameters (b and vs).

Our finding that the integral formulation of the entropy of arbitrarily complex RNA secondary structures can be represented graphically is reminiscent of Feynman diagrams in quantum field theory. The topologies defined by these graphs can also serve as useful biological constructs to group similar RNA structures together. The depiction of RNA structure as a graph has played an important role in the prediction of RNA secondary structure (22, 74, 92, 93) as well as in the search for novel RNAs (94, 95) and the description of similarity between RNA structures (96, 97, 98, 99), which is especially useful in the study of the effects of mutations (100, 101). A common approach among these graphical depictions of RNA has been to represent loops (e.g., hairpins, internal loops, etc.) as vertices and stems as edges (94, 98, 99). However, this depiction of RNA does not always distinguish between pseudoknotted and non-pseudoknotted structures (94). Our approach has a similar coarse-graining effect of grouping similar structures as the same graph but explicitly distinguishes between different topologies of secondary structure and may therefore be useful in the contexts described previously. Although our approach is in many ways similar to the planar digraphs of (94), it is able to address the ambiguity present in those graphs, particularly with regard to parallel stems (see Fig. 2 of (94)).

We expect that the complete free energy landscape prediction described in this work will be useful in understanding the kinetics of RNA and DNA structure transitions, including the interactions of multiple strands (24, 25, 102, 103, 104, 105, 106, 107, 108). In addition to the complete energy and entropy landscapes, a complete kinetics model only needs a definition of the transition state matrix. Such a matrix can be derived from the energy and entropy landscapes directly. For example, by defining neighboring states as secondary structures differing by the opening or closing of a single basepair, the transition rate of opening a basepair is expected to be exponential in the energy difference of the two states, whereas the rate of closing a basepair is exponential in the entropy difference (24, 102, 109). Even for transitions between two non-pseudoknotted structures, pseudoknots often play a significant role in the transition pathway (108, 110, 111, 112, 113). Predicting the kinetics of structure transitions using this framework and determining whether such kinetics can be accurately predicted for RNA molecules of the lengths considered here, using only secondary structure considerations, will be a subject for future work.
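One way to make the single-basepair rate description concrete is sketched below. This is an illustrative assumption rather than the kinetics model of the cited works: "energy" is interpreted as the enthalpy change, the prefactor `k0` is a hypothetical attempt frequency, and units of kcal/mol and kcal/(mol K) are assumed. With these choices the two rates satisfy detailed balance with respect to the free energy difference ΔG = ΔH − TΔS.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def basepair_rates(delta_h_close, delta_s_close, temperature_kelvin=310.15, k0=1.0):
    """Return (k_close, k_open) for a move that closes a single basepair.

    delta_h_close: enthalpy change on closing, kcal/mol (typically negative).
    delta_s_close: entropy change on closing, kcal/(mol K) (typically negative).
    k0: hypothetical attempt frequency setting the overall time scale.
    """
    # Closing is limited by the entropy lost on forming the pair.
    k_close = k0 * math.exp(delta_s_close / R_KCAL)
    # Opening is limited by the enthalpy needed to break the pair.
    k_open = k0 * math.exp(delta_h_close / (R_KCAL * temperature_kelvin))
    return k_close, k_open
```

The ratio k_close/k_open equals exp(−ΔG/RT), so a transition matrix assembled from such rates relaxes to the Boltzmann distribution over the enumerated structures.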

Author Contributions

All the authors designed research. O.K. carried out theoretical calculations and wrote the code. O.K. and T.C. analyzed data. All the authors wrote the article.

Acknowledgments

We thank Elena Rivas, Yohai Bar Sinai, and Carl Goodrich for fruitful discussions.

This work was supported by the National Science Foundation through the Harvard Materials Research Science and Engineering Center (grant numbers DMR-1420570, DMREF grant DMR-123869) and Office of Naval Research (grant N00014-17-1-3029). This research was conducted with Government support under and awarded by Department of Defense, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate Fellowship 32 CFR 168a (O.K.). M.P.B. is an investigator of the Simons Foundation.

Editor: Tamar Schlick.

Footnotes

Supporting Material can be found online at https://doi.org/10.1016/j.bpj.2019.06.037.

Contributor Information

Ofer Kimchi, Email: okimchi@g.harvard.edu.

Lucy J. Colwell, Email: ljc37@cam.ac.uk.

Supporting Citations

References (114, 115, 116, 117) appear in the Supporting Material.

Supporting Material

Document S1. Supporting Materials and Methods and Figs. S1–S11
Table S1. Benchmark Dataset and Prediction Results
Table S2. Intramolecular Kissing Hairpin Pseudoknot Entropy
Document S2. Article plus Supporting Material

References

  • 1. Kapranov P., Cheng J., Gingeras T.R. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488. doi: 10.1126/science.1138341.
  • 2. Okada Y., Muramatsu T., Tanaka T. Significant impact of miRNA-target gene networks on genetics of human complex traits. Sci. Rep. 2016;6:22223. doi: 10.1038/srep22223.
  • 3. Sridhar B., Rivas-Astroza M., Zhong S. Systematic mapping of RNA-chromatin interactions in vivo. Curr. Biol. 2017;27:602–609. doi: 10.1016/j.cub.2017.01.011.
  • 4. Butter F., Scheibe M., Mann M. Unbiased RNA-protein interaction screen by quantitative proteomics. Proc. Natl. Acad. Sci. USA. 2009;106:10626–10631. doi: 10.1073/pnas.0812099106.
  • 5. Seemann S.E., Sunkin S.M., Gorodkin J. Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain. BMC Genomics. 2012;13:214. doi: 10.1186/1471-2164-13-214.
  • 6. Mercer T.R., Dinger M.E., Mattick J.S. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 2009;10:155–159. doi: 10.1038/nrg2521.
  • 7. McManus M.T., Sharp P.A. Gene silencing in mammals by small interfering RNAs. Nat. Rev. Genet. 2002;3:737–747. doi: 10.1038/nrg908.
  • 8. Juliano C., Wang J., Lin H. Uniting germline and stem cells: the function of Piwi proteins and the piRNA pathway in diverse organisms. Annu. Rev. Genet. 2011;45:447–469. doi: 10.1146/annurev-genet-110410-132541.
  • 9. Ellington A.D., Szostak J.W. In vitro selection of RNA molecules that bind specific ligands. Nature. 1990;346:818–822. doi: 10.1038/346818a0.
  • 10. Tuerk C., Gold L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science. 1990;249:505–510. doi: 10.1126/science.2200121.
  • 11. Robertson D.L., Joyce G.F. Selection in vitro of an RNA enzyme that specifically cleaves single-stranded DNA. Nature. 1990;344:467–468. doi: 10.1038/344467a0.
  • 12. Olea C., Joyce G.F. Real-time detection of a self-replicating RNA Enzyme. Molecules. 2016;21:1–12. doi: 10.3390/molecules21101310.
  • 13. Nowakowski J., Tinoco I. RNA structure and stability. Semin. Virol. 1997;8:153–165.
  • 14. Batey R.T., Rambo R.P., Doudna J.A. Tertiary motifs in RNA structure and folding. Angew. Chem. Int. Ed. Engl. 1999;38:2326–2343. doi: 10.1002/(sici)1521-3773(19990816)38:16<2326::aid-anie2326>3.0.co;2-3.
  • 15. Montange R.K., Batey R.T. Riboswitches: emerging themes in RNA structure and function. Annu. Rev. Biophys. 2008;37:117–133. doi: 10.1146/annurev.biophys.37.032807.130000.
  • 16. Ilyinskii P.O., Schmidt T., Shneider A.M. Importance of mRNA secondary structural elements for the expression of influenza virus genes. OMICS. 2009;13:421–430. doi: 10.1089/omi.2009.0036.
  • 17. Poot R.A., Tsareva N.V., van Duin J. RNA folding kinetics regulates translation of phage MS2 maturation gene. Proc. Natl. Acad. Sci. USA. 1997;94:10110–10115. doi: 10.1073/pnas.94.19.10110.
  • 18. Mathews D.H., Turner D.H. Prediction of RNA secondary structure by free energy minimization. Curr. Opin. Struct. Biol. 2006;16:270–278. doi: 10.1016/j.sbi.2006.05.010.
  • 19. Hofacker I.L. 4. Energy-directed RNA structure prediction. In: Gorodkin J., Ruzzo W.L., editors. RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Humana Press; 2014. pp. 71–84.
  • 20. Pipas J.M., McMahon J.E. Method for predicting RNA secondary structure. Proc. Natl. Acad. Sci. USA. 1975;72:2017–2021. doi: 10.1073/pnas.72.6.2017.
  • 21. Bleckley S., Stone J.W., Schroeder S.J. Crumple: a method for complete enumeration of all possible pseudoknot-free RNA secondary structures. PLoS One. 2012;7:e52414. doi: 10.1371/journal.pone.0052414.
  • 22. Bon M., Orland H. TT2NE: a novel algorithm to predict RNA secondary structures with pseudoknots. Nucleic Acids Res. 2011;39:e93. doi: 10.1093/nar/gkr240.
  • 23. Bon M., Micheletti C., Orland H. McGenus: a Monte Carlo algorithm to predict RNA secondary structures with pseudoknots. Nucleic Acids Res. 2013;41:1895–1900. doi: 10.1093/nar/gks1204.
  • 24. Zhang W., Chen S.J. RNA hairpin-folding kinetics. Proc. Natl. Acad. Sci. USA. 2002;99:1931–1936. doi: 10.1073/pnas.032443099.
  • 25. Cao S., Chen S.J. Biphasic folding kinetics of RNA pseudoknots and telomerase RNA activity. J. Mol. Biol. 2007;367:909–924. doi: 10.1016/j.jmb.2007.01.006.
  • 26. Waterman M.S. Secondary structure of single-stranded nucleic acids. In: Rota G.C., editor. Studies in Foundations and Combinatorics, Advances in Mathematics Supplementary Studies. Volume 1. Academic Press; 1978. pp. 167–212.
  • 27. Waterman M.S., Smith T.F. Rapid dynamic programming algorithms for RNA secondary structure. Adv. Appl. Math. 1986;7:455–464.
  • 28. Nussinov R., Pieczenik G., Kleitman D.J. Algorithms for loop matchings. SIAM J. Appl. Math. 1978;35:68–82.
  • 29. Zuker M., Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981;9:133–148. doi: 10.1093/nar/9.1.133.
  • 30. Serra M.J., Turner D.H. Predicting thermodynamic properties of RNA. Methods Enzymol. 1995;259:242–261. doi: 10.1016/0076-6879(95)59047-1.
  • 31. Hajdin C.E., Bellaousov S., Weeks K.M. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl. Acad. Sci. USA. 2013;110:5498–5503. doi: 10.1073/pnas.1219988110.
  • 32. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415. doi: 10.1093/nar/gkg595.
  • 33. Hofacker I.L., Fontana W., Schuster P. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 1994;125:167–188.
  • 34. Lyngsø R.B., Pedersen C.N.S. Proceedings of the fourth annual international Conference on Computational Molecular Biology. ACM; 2000. Pseudoknots in RNA secondary structures; pp. 201–209.
  • 35. Lyngsø R.B., Pedersen C.N. RNA pseudoknot prediction in energy-based models. J. Comput. Biol. 2000;7:409–427. doi: 10.1089/106652700750050862.
  • 36. Liu B., Mathews D.H., Turner D.H. RNA pseudoknots: folding and finding. F1000 Biol. Rep. 2010;2:8. doi: 10.3410/B2-8.
  • 37. Isambert H., Siggia E.D. Modeling RNA folding paths with pseudoknots: application to hepatitis delta virus ribozyme. Proc. Natl. Acad. Sci. USA. 2000;97:6515–6520. doi: 10.1073/pnas.110533697.
  • 38. Ruan J., Stormo G.D., Zhang W. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics. 2004;20:58–66. doi: 10.1093/bioinformatics/btg373.
  • 39. Ren J., Rastegari B., Hoos H.H. HotKnots: heuristic prediction of RNA secondary structures including pseudoknots. RNA. 2005;11:1494–1504. doi: 10.1261/rna.7284905.
  • 40. Bellaousov S., Mathews D.H. ProbKnot: fast prediction of RNA secondary structure including pseudoknots. RNA. 2010;16:1870–1880. doi: 10.1261/rna.2125310.
  • 41. Sato K., Kato Y., Asai K. IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics. 2011;27:i85–i93. doi: 10.1093/bioinformatics/btr215.
  • 42. Jabbari H., Condon A., Zhao S. Novel and efficient RNA secondary structure prediction using hierarchical folding. J. Comput. Biol. 2008;15:139–163. doi: 10.1089/cmb.2007.0198.
  • 43. Sperschneider J., Datta A., Wise M.J. Heuristic RNA pseudoknot prediction including intramolecular kissing hairpins. RNA. 2011;17:27–38. doi: 10.1261/rna.2394511.
  • 44. Rivas E., Eddy S.R. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 1999;285:2053–2068. doi: 10.1006/jmbi.1998.2436.
  • 45. Uemura Y., Hasegawa A., Yokomori T. Tree adjoining grammars for RNA structure prediction. Theor. Comput. Sci. 1999;210:277–303.
  • 46. Akutsu T. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Appl. Math. 2000;104:45–62.
  • 47. Condon A., Davy B., Tarrant F. Classifying RNA pseudoknotted structures. Theor. Comput. Sci. 2004;320:35–50.
  • 48. Dirks R.M., Pierce N.A. A partition function algorithm for nucleic acid secondary structure including pseudoknots. J. Comput. Chem. 2003;24:1664–1677. doi: 10.1002/jcc.10296.
  • 49. Reeder J., Giegerich R. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics. 2004;5:104. doi: 10.1186/1471-2105-5-104.
  • 50. Cao S., Chen S.J. Predicting RNA pseudoknot folding thermodynamics. Nucleic Acids Res. 2006;34:2634–2652. doi: 10.1093/nar/gkl346.
  • 51. Cao S., Chen S.J. Predicting structures and stabilities for H-type pseudoknots with interhelix loops. RNA. 2009;15:696–706. doi: 10.1261/rna.1429009.
  • 52. Tinoco I., Jr., Bustamante C. How RNA folds. J. Mol. Biol. 1999;293:271–281. doi: 10.1006/jmbi.1999.3001.
  • 53. van Batenburg F.H., Gultyaev A.P., Oliehoek J. PseudoBase: a database with RNA pseudoknots. Nucleic Acids Res. 2000;28:201–204. doi: 10.1093/nar/28.1.201.
  • 54. Wyatt J.R., Puglisi J.D., Tinoco I., Jr. RNA pseudoknots. Stability and loop size requirements. J. Mol. Biol. 1990;214:455–470. doi: 10.1016/0022-2836(90)90193-P.
  • 55. Gluick T.C., Draper D.E. Thermodynamics of folding a pseudoknotted mRNA fragment. J. Mol. Biol. 1994;241:246–262. doi: 10.1006/jmbi.1994.1493.
  • 56. Liu B., Shankar N., Turner D.H. Fluorescence competition assay measurements of free energy changes for RNA pseudoknots. Biochemistry. 2010;49:623–634. doi: 10.1021/bi901541j.
  • 57. Qiu H., Kaluarachchi K., Giedroc D.P. Thermodynamics of folding of the RNA pseudoknot of the T4 gene 32 autoregulatory messenger RNA. Biochemistry. 1996;35:4176–4186. doi: 10.1021/bi9527348.
  • 58. Aalberts D.P., Hodas N.O. Asymmetry in RNA pseudoknots: observation and theory. Nucleic Acids Res. 2005;33:2210–2214. doi: 10.1093/nar/gki508.
  • 59.Lucas A., Dill K.A. Statistical mechanics of pseudoknot polymers. J. Chem. Phys. 2003;119:2414–2421. [Google Scholar]
  • 60.Xayaphoummine A., Bucher T., Isambert H. Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res. 2005;33:W605–W610. doi: 10.1093/nar/gki447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Chen H.L., Condon A., Jabbari H. An O(n(5)) algorithm for MFE prediction of kissing hairpins and 4-chains in nucleic acids. J. Comput. Biol. 2009;16:803–815. doi: 10.1089/cmb.2008.0219. [DOI] [PubMed] [Google Scholar]
  • 62.Gregorian R.S., Jr., Crothers D.M. Determinants of RNA hairpin loop-loop complex stability. J. Mol. Biol. 1995;248:968–984. doi: 10.1006/jmbi.1995.0275. [DOI] [PubMed] [Google Scholar]
  • 63.Andronescu M.S., Pop C., Condon A.E. Improved free energy parameters for RNA pseudoknotted secondary structure prediction. RNA. 2010;16:26–42. doi: 10.1261/rna.1689910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Andronescu M., Bereg V., Condon A. RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinformatics. 2008;9:340. doi: 10.1186/1471-2105-9-340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Taufer M., Licon A., Leung M.Y. PseudoBase++: an extension of PseudoBase for easy searching, formatting and visualization of pseudoknots. Nucleic Acids Res. 2009;37:D127–D135. doi: 10.1093/nar/gkn806. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Puton T., Kozlowski L.P., Bujnicki J.M. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Res. 2013;41:4307–4323. doi: 10.1093/nar/gkt101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Turner D.H., Mathews D.H. NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2010;38:D280–D282. doi: 10.1093/nar/gkp892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Chirikjian G.S. Modeling loop entropy. Methods Enzymol. 2011;487:99–132. doi: 10.1016/B978-0-12-381270-4.00004-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Turner D.H. 8. Conformational changes. In: Bloomfield V.A., Crothers D.M., Tinoco I., editors. Nucleic Acids: Structures, Properties, and Functions. University Science Books; 2000. pp. 271–291. [Google Scholar]
  • 70.Jacobson H., Stockmayer W.H. Intramolecular reaction in polycondensations. I. The theory of linear systems. J. Chem. Phys. 1950;18:1600–1606. [Google Scholar]
  • 71.Mathews D.H., Disney M.D., Turner D.H. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. USA. 2004;101:7287–7292. doi: 10.1073/pnas.0401799101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Lu Z.J., Turner D.H., Mathews D.H. A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Res. 2006;34:4912–4924. doi: 10.1093/nar/gkl472. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Abels J.A., Moreno-Herrero F., Dekker N.H. Single-molecule measurements of the persistence length of double-stranded RNA. Biophys. J. 2005;88:2737–2744. doi: 10.1529/biophysj.104.052811. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Orland H., Zee A. RNA folding and large N matrix theory. Nucl. Phys. B. 2002;620:456–476. [Google Scholar]
  • 75.Gago S., De la Peña M., Flores R. A kissing-loop interaction in a hammerhead viroid RNA critical for its in vitro folding and in vivo viability. RNA. 2005;11:1073–1083. doi: 10.1261/rna.2230605. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Chang K.Y., Tinoco I., Jr. The structure of an RNA “kissing” hairpin complex of the HIV TAR hairpin loop and its complement. J. Mol. Biol. 1997;269:52–66. doi: 10.1006/jmbi.1997.1021. [DOI] [PubMed] [Google Scholar]
  • 77.Melchers W.J., Hoenderop J.G., Galama J.M. Kissing of the two predominant hairpin loops in the coxsackie B virus 3′ untranslated region is the essential structural feature of the origin of replication required for negative-strand RNA synthesis. J. Virol. 1997;71:686–696. doi: 10.1128/jvi.71.1.686-696.1997. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Verheije M.H., Olsthoorn R.C., Meulenberg J.J. Kissing interaction between 3′ noncoding and coding sequences is essential for porcine arterivirus RNA replication. J. Virol. 2002;76:1521–1526. doi: 10.1128/JVI.76.3.1521-1526.2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Friebe P., Boudet J., Bartenschlager R. Kissing-loop interaction in the 3′ end of the hepatitis C virus genome essential for RNA replication. J. Virol. 2005;79:380–392. doi: 10.1128/JVI.79.1.380-392.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Cao S., Chen S.J. Structure and stability of RNA/RNA kissing complex: with application to HIV dimerization initiation signal. RNA. 2011;17:2130–2143. doi: 10.1261/rna.026658.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Gultyaev A.P., van Batenburg F.H., Pleij C.W. An approximation of loop free energy values of RNA H-pseudoknots. RNA. 1999;5:609–617. doi: 10.1017/s135583829998189x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Fisher M.E. Effect of excluded volume on phase transitions in biopolymers. J. Chem. Phys. 1966;45:1469–1473. [Google Scholar]
  • 83.Zhang J., Lin M., Liang J. Discrete state model and accurate estimation of loop entropy of RNA secondary structures. J. Chem. Phys. 2008;128:125107. doi: 10.1063/1.2895050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Aalberts D.P., Nandagopal N. A two-length-scale polymer theory for RNA loop free energies and helix stacking. RNA. 2010;16:1350–1355. doi: 10.1261/rna.1831710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Aalberts D.P. Loop entropy assists tertiary order: loopy stabilization of stacking motifs. Entropy (Basel) 2011;13:1958–1966. doi: 10.3390/e13111958. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Studnicka G.M., Rahn G.M., Salser W.A. Computer method for predicting the secondary structure of single-stranded RNA. Nucleic Acids Res. 1978;5:3365–3387. doi: 10.1093/nar/5.9.3365. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Zuker M., Sankoff D. RNA secondary structures and their prediction. Bull. Math. Biol. 1984;46:591–621. [Google Scholar]
  • 88.Parvathy V.R., Bhaumik S.R., Miles H.T. NMR structure of a parallel-stranded DNA duplex at atomic resolution. Nucleic Acids Res. 2002;30:1500–1511. doi: 10.1093/nar/30.7.1500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Lu Z.J., Gloor J.W., Mathews D.H. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA. 2009;15:1805–1813. doi: 10.1261/rna.1643609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Byun Y., Han K. PseudoViewer3: generating planar drawings of large-scale RNA structures with pseudoknots. Bioinformatics. 2009;25:1435–1437. doi: 10.1093/bioinformatics/btp252. [DOI] [PubMed] [Google Scholar]
  • 91.SantaLucia J., Jr., Hicks D. The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct. 2004;33:415–440. doi: 10.1146/annurev.biophys.32.110601.141800. [DOI] [PubMed] [Google Scholar]
  • 92.Koessler D.R., Knisley D.J., Haynes T. A predictive model for secondary RNA structure using graph theory and a neural network. BMC Bioinformatics. 2010;11(Suppl 6):S21. doi: 10.1186/1471-2105-11-S6-S21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Zhao J., Malmberg R.L., Cai L. Rapid ab initio prediction of RNA pseudoknots via graph tree decomposition. J. Math. Biol. 2008;56:145–159. doi: 10.1007/s00285-007-0124-4. [DOI] [PubMed] [Google Scholar]
  • 94.Gan H.H., Pasquali S., Schlick T. Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic Acids Res. 2003;31:2926–2943. doi: 10.1093/nar/gkg365. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Laing C., Schlick T. Computational approaches to RNA structure prediction, analysis, and design. Curr. Opin. Struct. Biol. 2011;21:306–318. doi: 10.1016/j.sbi.2011.03.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Haslinger C., Stadler P.F. RNA structures with pseudo-knots: graph-theoretical, combinatorial, and statistical properties. Bull. Math. Biol. 1999;61:437–467. doi: 10.1006/bulm.1998.0085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Bermúdez C.I., Daza E.E., Andrade E. Characterization and comparison of Escherichia coli transfer RNAs by graph theory based on secondary structure. J. Theor. Biol. 1999;197:193–205. doi: 10.1006/jtbi.1998.0866. [DOI] [PubMed] [Google Scholar]
  • 98.Benedetti G., Morosetti S. A graph-topological approach to recognition of pattern and similarity in RNA secondary structures. Biophys. Chem. 1996;59:179–184. doi: 10.1016/0301-4622(95)00119-0. [DOI] [PubMed] [Google Scholar]
  • 99.Le S.Y., Nussinov R., Maizel J.V. Tree graphs of RNA secondary structures and their comparisons. Comput. Biomed. Res. 1989;22:461–473. doi: 10.1016/0010-4809(89)90039-6. [DOI] [PubMed] [Google Scholar]
  • 100.Fontana W., Schuster P. Continuity in evolution: on the nature of transitions. Science. 1998;280:1451–1455. doi: 10.1126/science.280.5368.1451. [DOI] [PubMed] [Google Scholar]
  • 101.Ancel L.W., Fontana W. Plasticity, evolvability, and modularity in RNA. J. Exp. Zool. 2000;288:242–283. doi: 10.1002/1097-010x(20001015)288:3<242::aid-jez5>3.0.co;2-o. [DOI] [PubMed] [Google Scholar]
  • 102.Zhang W., Chen S.J. Exploring the complex folding kinetics of RNA hairpins: I. General folding kinetics analysis. Biophys. J. 2006;90:765–777. doi: 10.1529/biophysj.105.062935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Flamm C., Fontana W., Schuster P. RNA folding at elementary step resolution. RNA. 2000;6:325–338. doi: 10.1017/s1355838200992161. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Flamm C., Hofacker I.L., Wolfinger M.T. Barrier trees of degenerate landscapes. Z. Phys. Chem. 2002;216:155–173. [Google Scholar]
  • 105.Thachuk C., Manuch J., Condon A. An algorithm for the energy barrier problem without pseudoknots and temporary arcs. Pac. Symp. Biocomput. 2010 doi: 10.1142/9789814295291_0013. [DOI] [PubMed] [Google Scholar]
  • 106.Dotu I., Lorenz W.A., Clote P. Computing folding pathways between RNA secondary structures. Nucleic Acids Res. 2010;38:1711–1722. doi: 10.1093/nar/gkp1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Kucharík M., Hofacker I.L., Qin J. Basin Hopping Graph: a computational framework to characterize RNA folding landscapes. Bioinformatics. 2014;30:2009–2017. doi: 10.1093/bioinformatics/btu156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Kucharík M., Hofacker I.L., Qin J. Pseudoknots in RNA folding landscapes. Bioinformatics. 2016;32:187–194. doi: 10.1093/bioinformatics/btv572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Zhao P., Zhang W.B., Chen S.J. Predicting secondary structural folding kinetics for nucleic acids. Biophys. J. 2010;98:1617–1625. doi: 10.1016/j.bpj.2009.12.4319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Xu X., Chen S.J. Kinetic mechanism of conformational switch between bistable RNA hairpins. J. Am. Chem. Soc. 2012;134:12499–12507. doi: 10.1021/ja3013819. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Isambert H. The jerky and knotty dynamics of RNA. Methods. 2009;49:189–196. doi: 10.1016/j.ymeth.2009.06.005. [DOI] [PubMed] [Google Scholar]
  • 112.Fürtig B., Wenter P., Schwalbe H. Conformational dynamics of bistable RNAs studied by time-resolved NMR spectroscopy. J. Am. Chem. Soc. 2007;129:16222–16229. doi: 10.1021/ja076739r. [DOI] [PubMed] [Google Scholar]
  • 113.Höbartner C., Ebert M.O., Micura R. RNA two-state conformation equilibria and the effect of nucleobase methylation. Angew. Chem. Int. Ed. 2002;41:605–609. [Google Scholar]
  • 114.Mammen M., Shakhnovich E.I., Whitesides G.M. Estimating the entropic cost of self-assembly of multiparticle hydrogen-bonded aggregates based on the cyanuric acid-melamine lattice. J. Org. Chem. 1998;63:3821–3830. [Google Scholar]
  • 115.Zhou H.X., Gilson M.K. Theory of free energy and entropy in noncovalent binding. Chem. Rev. 2009;109:4092–4107. doi: 10.1021/cr800551w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Sugimoto N., Nakano S., Sasaki M. Thermodynamic parameters to predict stability of RNA/DNA hybrid duplexes. Biochemistry. 1995;34:11211–11216. doi: 10.1021/bi00035a029. [DOI] [PubMed] [Google Scholar]
  • 117.Watkins N.E., Jr., Kennelly W.J., Santalucia J., Jr. Thermodynamic contributions of single internal rA·dA, rC·dC, rG·dG and rU·dT mismatches in RNA/DNA duplexes. Nucleic Acids Res. 2011;39:1894–1902. doi: 10.1093/nar/gkq905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Lorenz R., Bernhart S.H., Hofacker I.L. ViennaRNA package 2.0. Algorithms Mol. Biol. 2011;6:26. doi: 10.1186/1748-7188-6-26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Andronescu M., Condon A., Murphy K.P. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics. 2007;23:i19–i28. doi: 10.1093/bioinformatics/btm223. [DOI] [PubMed] [Google Scholar]
  • 120.Do C.B., Woods D.A., Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics. 2006;22:e90–e98. doi: 10.1093/bioinformatics/btl246. [DOI] [PubMed] [Google Scholar]
  • 121.Sükösd Z., Knudsen B., Pedersen C.N. PPfold 3.0: fast RNA secondary structure prediction using phylogeny and auxiliary data. Bioinformatics. 2012;28:2691–2692. doi: 10.1093/bioinformatics/bts488. [DOI] [PubMed] [Google Scholar]
  • 122.Sato K., Hamada M., Mituyama T. CENTROIDFOLD: a web server for RNA secondary structure prediction. Nucleic Acids Res. 2009;37:W277–W280. doi: 10.1093/nar/gkp367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Zakov S., Goldberg Y., Ziv-Ukelson M. Rich parameterization improves RNA structure prediction. J. Comput. Biol. 2011;18:1525–1542. doi: 10.1089/cmb.2011.0184. [DOI] [PubMed] [Google Scholar]
  • 124.Rietveld K., Linschooten K., Bosch L. The three-dimensional folding of the tRNA-like structure of tobacco mosaic virus RNA. A new building principle applied twice. EMBO J. 1984;3:2613–2619. doi: 10.1002/j.1460-2075.1984.tb02182.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Mans R.M., Pleij C.W., Bosch L. tRNA-like structures. Structure, function and evolutionary significance. Eur. J. Biochem. 1991;201:303–324. doi: 10.1111/j.1432-1033.1991.tb16288.x. [DOI] [PubMed] [Google Scholar]
  • 126.Felden B., Florentz C., Westhof E. A central pseudoknotted three-way junction imposes tRNA-like mimicry and the orientation of three 5′ upstream pseudoknots in the 3′ terminus of tobacco mosaic virus RNA. RNA. 1996;2:201–212. [PMC free article] [PubMed] [Google Scholar]
  • 127.Soukup G.A. Core requirements for glmS ribozyme self-cleavage reveal a putative pseudoknot structure. Nucleic Acids Res. 2006;34:968–975. doi: 10.1093/nar/gkj497. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.García-Arenal F. Sequence and structure at the genome 3′ end of the U2-strain of tobacco mosaic virus, a histidine-accepting tobamovirus. Virology. 1988;167:201–206. doi: 10.1016/0042-6822(88)90070-0. [DOI] [PubMed] [Google Scholar]
  • 129.Wilkinson S.R., Been M.D. A pseudoknot in the 3′ non-core region of the glmS ribozyme enhances self-cleavage activity. RNA. 2005;11:1788–1794. doi: 10.1261/rna.2203605. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Garlapati S., Wang C.C. Identification of an essential pseudoknot in the putative downstream internal ribosome entry site in giardiavirus transcript. RNA. 2002;8:601–611. doi: 10.1017/s135583820202071x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Pennell S., Manktelow E., Brierley I. The stimulatory RNA of the Visna-Maedi retrovirus ribosomal frameshifting signal is an unusual pseudoknot with an interstem element. RNA. 2008;14:1366–1377. doi: 10.1261/rna.1042108. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Supporting Materials and Methods and Figs. S1–S11: mmc1.pdf (2.4 MB)
Table S1. Benchmark Dataset and Prediction Results: mmc2.xlsx (617.5 KB)
Table S2. Intramolecular Kissing Hairpin Pseudoknot Entropy: mmc3.zip (2 MB)
Document S2. Article plus Supporting Material: mmc4.pdf (4.2 MB)

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society
