Skip to main content
Nucleic Acids Research logoLink to Nucleic Acids Research
. 2005 Apr 14;33(7):2210–2214. doi: 10.1093/nar/gki508

Asymmetry in RNA pseudoknots: observation and theory

Daniel P Aalberts 1,*, Nathan O Hodas 1
PMCID: PMC1079967  PMID: 15831794

Abstract

RNA can fold into a topological structure called a pseudoknot, composed of non-nested double-stranded stems connected by single-stranded loops. Our examination of the PseudoBase database of pseudoknotted RNA structures reveals asymmetries in the stem and loop lengths and provocative composition differences between the loops. By taking into account differences between major and minor grooves of the RNA double helix, we explain much of the asymmetry with a simple polymer physics model and statistical mechanical theory, with only one adjustable parameter.

INTRODUCTION

Accurately predicting how biological macromolecules fold is one of the great challenges of our day because ‘structure is function’. Encoded in the primary amino acid sequence of proteins are α-helix and β-sheet secondary structures which assemble into final native folds through additional tertiary contacts. The protein folding problem is notoriously difficult because local secondary and non-local tertiary contacts both contribute significantly to the stability of the final fold. In RNA, however, because base-pairing interactions are stronger and more specific typically than tertiary contacts, it is secondary structure which most influences the final fold.

Listing which bases are paired to which other bases uniquely describes the secondary structure of RNA. Base pairs can be annotated with left and right parenthesis pairs; blocks of base pairs, with a letter. In the vast majority of cases, RNA adopts ‘nested’ secondary structures composed of consecutive helices separated by bulges or by hairpin turns, such as the AABB (((())))[[[]]] or the ABBA ((([[[[]]]]))) base-pairing patterns. Folding algorithms like mfold (1,2) or ViennaRNA (3) restrict themselves to nested structures to benefit from the algorithmic efficiency of dynamic programming. These algorithms ignore the more unusual non-nested structures of pseudoknot folds, such as the ABAB ((([[[[[)))]]]]] pattern, depicted in Figure 1.

Figure 1.

Figure 1

(a) An ABAB-pseudoknot is depicted in planar representation. The structure is composed of two double-helical stems (with s1 and s2 base pairs) and the three single-stranded loops of lengths L1, L2 and L3 nucleotides. (b) The 3D fold of the same knot is depicted. The x, y, z axes point left, out, up. Coaxial stacking interactions between stems 1 and 2 can stabilize the structure, particularly if L2 = 0. Note that loop 1 lies on the major groove side, while loop 3 lies on the minor groove side.

Pseudoknots have attracted attention as important functional structures of viruses and auto-catalytic RNAs. This class of structures is more highly constrained by non-local base pairs and exhibits particular 3D geometries.

The general pseudoknot problem has been proven to be NP-complete (4) because of non-local contacts. A number of pseudoknot algorithms have been developed recently (413), which search for only a subset of pseudoknot structures (14,15). Algorithms which lack such basic biochemical elements as GU wobble base pairs or basic polymer theory are of questionable value. Furthermore, none of these approaches are tested against the ensemble of known pseudoknots.

In the following section, we begin by describing the statistics of the pseudoknots in PseudoBase (16). To our knowledge, Ref. (16) is the only online database (http://wwwbio.leidenuniv.nl/~batenburg/pkb.html) focused on pseudoknots. The statistics illuminate key physical characteristics of pseudoknots: (i) the simplest pseudoknots are the most abundant, (ii) these pseudoknots have asymmetric loop and stem lengths and (iii) their loop compositions differ. The asymmetries in the ensemble of pseudoknots have not been characterized previously.

To self-consistently explain the source of these asymmetries, we proceed to develop a polymer physics model and statistical mechanical theory in Section 3. We argue that including the asymmetry of the major versus the minor groove is essential.

CHARACTERIZING PSEUDOBASE AND PSEUDOKNOT ASYMMETRY

PseudoBase is a gold mine of information, allowing us to dig deeply into the properties of pseudoknots. As of January 2005, there are 245 pseudoknots in PseudoBase. After removing duplicate sequences (PKB6 and 9, 25 and 26 and 29, 39 and 40 and 41, 19 and 27, 33 and 34), there are 238 unique pseudoknots. Of these, 230 (97%) are the simple ABAB-pseudoknot variety shown in Figure 1. This most common type of pseudoknot is involved in a number of essential biological processes including RNA self-splicing, translation control and viral frameshifting. It is perhaps not surprising that as the complexity of the knot increases, its likelihood of occurring decreases.

In PseudoBase, there are also six ABACBC kissing hairpin structures (PKB150, 163, 169, 171, 173 and 178), and two more exotic structures (PKB71 and 75). The pseudoknot loops occasionally contain an additional self-contained hairpin loop (e.g. ABACCB) but such nested structures do not change the degree of non-nestedness (e.g. ABAB-class), and in these cases PseudoBase only catalogs the A and B stems.

ABAB-pseudoknots are asymmetric. The distribution of stem lengths s1 and s2 are markedly different, as shown in Figure 2a. Excessively long stems are not required for pseudoknot formation; s1 peaks at 3 bp and s2 favors 5 or 6 bp.

Figure 2.

Figure 2

The statistics of ABAB-pseudoknots in PseudoBase (obs) with L2 = 0 is compared with our theory (thy). (a) Stems favor different numbers of base pairs s1 and s2. (b) Loop lengths L1 and L3 are also asymmetric.

Loop 2 is often very short (172 of the 230 unique ABAB-pseudoknots, or 75% have L2 = 0; 195 of 230, or 85% have L2 ≤ 1) resulting in favorable coaxial helix stacking interactions which stabilize the pseudoknot. The Turner rules (17) permit helix stacking for L2 ≤ 1. In Section 3 we will present a theory for the ABAB class with stacked stems.

In Figure 2b, we also see differences in the distributions of L1 and L3 sizes, including multiple peaks. These features may arise because of differences in tertiary interactions between loops and stems.

We observe striking composition biases in the loops of ABAB-pseudoknots. As Table 1 shows, loop 1 is uracil rich while loop 3 tends to be adenine rich, particularly the end of loop 3 which is across from stem 1. These observations are consistent with reports of tertiary contacts (with one to four hydrogen bonds) between loop adenines and the minor grooves of helices, known as A-minor interactions (1823). Adenine-rich loop 3 is on the minor groove side of stem 1. On the other hand, uracil-rich loop 1 is a more flexible loop (24), and interacts less with the major groove of stem 2 (see Figure 1b).

Table 1.

The overall base composition of loops 1 and 3 differs

A C G U
Loop 1 27.0 15.6 17.9 39.5
Loop 3 46.1 14.3 11.1 28.5
Loop 3 (last) 63.9 11.4 4.4 20.3
Loop 3 (first) 35.1 9.4 11.4 44.1
Stem 1 18.0 27.7 32.1 22.1
Stem 2 19.6 28.2 30.5 21.8

Loop 3 has a high percentage of adenines which makes it prone to A-minor stacking interactions with stem 1. Loop 1 has a high percentage of uracils, making it a more flexible loop and more interaction neutral. The adenines in loop 3 are strongly biased toward the 3′ end of the loop. The large fraction of uracils at the start (5′ side) of loop 3 enhances loop flexibility in the turn.

The asymmetries in the populations of stem and loop lengths have not been explained by previous pseudoknot algorithms and models (413). The algorithms in Refs (58,13) are all symmetric with respect to stem and loop lengths (i.e. transforming an ABAB-into a BABA-pseudoknot by interchanging stems 1 ↔ 2 and loops 1 ↔ 3). The phenomenological estimates of Gultyaev and co-workers (25) do provide different free energies for loops 1 and 3 but result from ad hoc assumptions rather than polymer physics.

We assert that the differences in stem and loop sizes arise primarily from major/minor groove asymmetries and use this fact to reproduce the population of pseudoknots observed in PseudoBase.

ABAB PSEUDOKNOT MODEL

The dominant contributions to the free energy of ABAB-pseudoknots are (i) base-pairing of stems and (ii) entropy of the loops. The overall free energy of the complex is then

ΔG=ΔGs1+ΔGs2TS(s1,s2,L1,L2,L3), 1

where ΔGsj is the free energy of helix j and S(s1, s2, L1, L2, L3) is the entropy of the loops.

Stems, RNA duplex

Step one is to describe the base-paired stems. The cartesian coordinates of complementary bases in double-helical A-form RNA are approximately:

rW={rcos(2πsNt),rsin(2πsNt),hs},rC={rcos(2πsNt+φ),rsin(2πsNt+φ),hs+Hoff}. 2

Here s indexes both the nucleotide on the Watson strand and its complement on the Crick strand. The coordinates of the 4′ carbon from the six double-helical RNA structures which appear in the Protein Data Bank (26) (1AL5, 1RNA, 1RRR, 1RXB, 1SDR and 433D) were incorporated in a least-squares fit to obtain values for the model parameters: the number of base pairs per helical turn Nt = 11.2 ± 0.3, the radius of the 4′ carbon r = 9.9 ± 0.2 Å, the height per stack is h = 2.7 ± 0.2 Å, the phase angle between complementary strands φ = 1.6 ± 0.1 rad = 93 ± 4° and the vertical offset between complementary strands Hoff = −4.2 ± 1.4 Å.

Consider the typical ABAB-pseudoknot, with L2 = 0 and helices 1 and 2 stacked. In this configuration loop 1 must traverse the distance from the junction between the helices to the other end of stem 2 across the major groove. This distance is

DL1=|rW(s)rC(s+s2)|,=2(1cosθs2)r2+(hs2+Hoff)2, 3

where θs2=(2πs2/Nt+φ), is the phase angle between the strands.

The other loop must traverse the distance from the junction between the helices to the other end of stem 1 across the minor groove. This distance is

DL3=|rC(s)rW(s+s1)|,=2(1cosθs1)r2+(hs1Hoff)2, 4

with θs1=(2πs1/Ntφ). The sign difference in θ and in the Hoff term arises from the major/minor groove asymmetry. In Figure 3, we show how the distances differ in the two cases.

Figure 3.

Figure 3

The distances DL1 and DL3 across the major or minor groove as a function of the number of bases s in the associated stem. The differences are due to the geometries of major- and minor grooves.

Loops

Step two is to estimate the loop entropy. In the standard Gaussian approximation, a chain of N links of length a has end-to-end separation distance between D and D + d with probability

pG(D,N)=4πD2d(32πNa2)3/2exp{3D22Na2}, 5

where a = 6.2 Å and d = 0.1 Å is our model's one free parameter. Other polymer physics models, such as the worm-like chain, self-avoiding chain (13), freely-jointed chain models, could also be used in place of Equation (5). The entropic contribution of loop 1 to Equation (1) can be obtained from Equation (5), taking D=DL1 and N = (L1 + 1) links for L1 nucleotides. To be explicit, the total entropy is S=Rln[pG(DL1,L1+1)pG(DL3,L3+1)].

ABAB probability

The probability of an ABAB-pseudoknot with lengths {s1, s2, L1, L2 = 0, L3} is the product of a degeneracy factor for the ABAB pattern and the likelihood of that pattern resulting in a pseudoknot.

The degeneracy of the ABAB pattern is 4s1+s2+L1+L3, out of all patterns 42s1+2s2+L1+L3, because of the required complementarity. For the sake of simplicity, we ignore bulge loops in stems (which occur in ∼30% of structures) at this stage.

To estimate the free energy of the stems, we compose random strings with s1 + s2 consecutive complementary base pairs bookended with mismatch pairs, then calculate their binding free energy using bindigo (27), finding:

Gstem(s1+s2)=(2.14kcal/mol)(s1+s24.88).

For the loop entropy, we use the Gaussian approximation, Equation (5), assuming the loops must traverse the distances given by Equations (3) and (4). Thus,

pABAB=exp{βGstem(s1+s2)}×pG(DL1,L1+1)pG(DL3,L3+1), 6

is the Boltzmann factor for ABAB-pseudoknots, with β−1 = RT37° = 0.62 kcal/mol.

We estimate the free energy of the optimal nested fold of an ensemble of N=2s1+2s2+L1+L3 randomly selected nucleotides using mfold (2), finding

pnest=exp{β(0.286 kcal/mol)(N17)}, 7

for the Boltzmann factor for nested folds.

Combining the degeneracy factors and the Boltzmann likelihoods, the probability of a pseudoknot is thus

pψ=14s1+s2pABABpABAB+pnest+1. 8

The 1 in the denominator includes the Boltzmann factor for an open polymer configuration (Gopen = 0).

To compare Equation (8) with the histograms of Figure 2, we simply sum the other degrees of freedom. For example, to obtain the s1 distribution, we compute

s2,L1,L3pψ, 9

and analogously for the other sub-ensembles. The agreement of theory and observation is excellent. Studying the properties of the ensemble can reveal insights into the folding problem that individual cases may not.

ABAB-pseudoknots form because of their low energy, with about three-quarters of nucleotides base paired, versus about half of bases paired in nested structures. However, because pseudoknots require many base pairs constrained to the ABAB pattern, they remain unlikely in sequence space.

CONCLUSIONS

Pseudoknots are rare compared with conventional nested secondary structures but their structure gives them biological importance. Of the pseudoknots that appear, the ABAB-type are by far the most common. The structures of these ABAB-pseudoknots are asymmetric. We have argued that this asymmetry is due to structural differences between the major and minor groove. Our simple model is consistent with the observed asymmetry of s1 and s2. The statistical mechanical theory Equation (8) provides remarkable agreement with experiment as seen in Figure 2. This suggests that PseudoBase is a representative sample of ABAB-pseudoknot characteristics in nature and that we can now compute pseudoknot abundances in aggregate. Using free energies specific to a given sequence, we can also use the Boltzmann factors to calculate the likelihood of forming a particular pseudoknot.

Models which ignore major/minor groove asymmetry will predict the same free energies for an ABAB-pseudoknot and its BABA counterpart. For example, the symmetry of the theory in Ref. (8) arises because those authors effectively take Hoff = 0 and φ = π rad = 180°, in disagreement with the actual A-form structural asymmetry. We predict that the differences between loops 1 and 3 will destabilize many of the BABA version pseudoknots due to the differences between Equations (3) and (4), the decrease in A-minor interactions and the increased rigidity of the major groove loop.

The rarity of more complicated folds makes comparisons with observed distributions infeasible. Nevertheless, other pseudoknot types like kissing hairpins can be treated with methods similar to those presented. In addition, our simple theory could be extended to permit the possibility of secondary strucure within the loops (e.g. an ABACCB structure) and to permit flexibility between the stems when L2 > 0.

Our interest in this paper has been to estimate properties of the ensemble of ABAB-pseudoknots and compare those with observed pseudoknots. To study a particular pseudoknot, values specific to its sequence should be used in place of the general Gstem and Gnest average values given.

Acknowledgments

The authors thank Jesse Dill for noticing the duplications in PseudoBase and assisting with Table 1. This work was supported by National Institutes of Health grant GM068485. Funding to pay the Open Access publication charges for this article was provided by Williams College.

Conflict of interest statement. None declared.

REFERENCES

  • 1.Zuker M., Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981;9:133–148. doi: 10.1093/nar/9.1.133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415. doi: 10.1093/nar/gkg595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Hofacker I.L., Fontana W., Standler P.F., Bonhoeffer L.S., Tacker M., Schuster P. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 1994;125:167–188. [Google Scholar]
  • 4.Lyngsø R.B., Pedersen C.N. RNA pseudoknot prediction in energy-based models. J. Comput. Biol. 2000;7:409–427. doi: 10.1089/106652700750050862. [DOI] [PubMed] [Google Scholar]
  • 5.Rivas E., Eddy S.R. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 1999;285:2053–2068. doi: 10.1006/jmbi.1998.2436. [DOI] [PubMed] [Google Scholar]
  • 6.Rivas E., Eddy S.R. The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics. 1999;16:334–340. doi: 10.1093/bioinformatics/16.4.334. [DOI] [PubMed] [Google Scholar]
  • 7.Orland H., Zee A. RNA folding and large N matrix theory. Nucl. Phys. B. 2002;620:456–476. [Google Scholar]
  • 8.Isambert H., Siggia E.D. Modeling RNA folding paths with pseudoknots: application to hepatitis delta virus ribozyme. Proc. Natl Acad. Sci. USA. 2000;97:6515–6520. doi: 10.1073/pnas.110533697. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Akutsu T. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Applied Mathematics. 2000;104:45–62. [Google Scholar]
  • 10.Dirks R.M., Pierce N.A. A partition function algorithm for nucleic acid secondary structure including pseudoknots. J. Comput. Chem. 2003;24:1664–1677. doi: 10.1002/jcc.10296. [DOI] [PubMed] [Google Scholar]
  • 11.Lyngsø R.B., Pedersen C.N. Pseudoknots in RNA secondary structures. Proceedings of the Fourth Annual International Conference on Computational Molecular Biology (RECOMB '00); 8–11 April; Tokyo, Japan. 2000. pp. 201–209. [Google Scholar]
  • 12.Uemura Y., Hasegawa A., Kobayashi S., Yokomori T. Tree adjoining grammars for RNA structure prediction. Theor. Comp. Sci. 1999;210:277–303. [Google Scholar]
  • 13.Lucas A., Dill K.A. Statistical mechanics of pseudoknot polymers. J. Chem. Phys. 2003;119:2414–2421. [Google Scholar]
  • 14.Haslinger C. Predition algorithms for restricted RNA pseudoknots. 2001. Ph.D. dissertation, Universität Wien, Vienna, Austria.
  • 15.Condon A., Davy B., Rastegari B., Zhao S., Tarrant F. Classifying RNA pseudoknotted structures. Theor. Comp. Sci. 2004;320:35–50. [Google Scholar]
  • 16.van Batenburg F.H.D., Gultyaev A.P., Pleij C.W.A., Ng J., Oliehoek J. Pseudobase: a database with RNA pseudoknots. Nucleic Acids Res. 2000;28:201–204. doi: 10.1093/nar/28.1.201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Mathews D.H., Sabina J., Zuker M., Turner D.H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 1999;288:911–930. doi: 10.1006/jmbi.1999.2700. [DOI] [PubMed] [Google Scholar]
  • 18.Michiels P.J.A., Versleijen A.A.M., Verlaan P.W., Pleij C.W.A., Hilbers C.W., Heus H.A. Solution structure of the pseudoknot of SRV-1 RNA, involved in ribosomal frameshifting. J. Mol. Biol. 2001;310:1109–1123. doi: 10.1006/jmbi.2001.4823. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Nixon P.L., Giedroc D.P. Energetics of a strongly pH dependent RNA tertiary structure in a frameshifting pseudoknot. J. Mol. Biol. 2000;296:659–671. doi: 10.1006/jmbi.1999.3464. [DOI] [PubMed] [Google Scholar]
  • 20.Egli M., Minasov G., Su L., Rich A. Metal ions and flexibility in a viral RNA pseudoknot at atomic resolution. Proc. Natl Acad. Sci. USA. 2002;99:4302–4307. doi: 10.1073/pnas.062055599. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Battle D.J., Doudna J.A. Specificity of RNA–RNA helix recognition. Proc. Natl Acad. Sci. USA. 2002;99:11676–11681. doi: 10.1073/pnas.182221799. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Strobel S.A. Biochemical identification of A-minor motifs within RNA tertiary structure by interference analysis. Biochem. Soc. Trans. 2002;30:1126–1131. doi: 10.1042/bst0301126. [DOI] [PubMed] [Google Scholar]
  • 23.Nissen P., Ippolito J.A., Ban N., Moore P.B., Steitz T.A. RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc. Natl Acad. Sci. USA. 2001;98:4899–4903. doi: 10.1073/pnas.081082398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Aalberts D.P., Parman J.M., Goddard N.L. Single-strand stacking free energy from DNA beacon kinetics. Biophys J. 2003;84:3212–3217. doi: 10.1016/S0006-3495(03)70045-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Gultyaev A.P., van Batenburg F.H.D., Pleij C.W.A. An approximation of loop free energy values of RNA H-pseudoknots. RNA. 1999;5:609–617. doi: 10.1017/s135583829998189x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Hodas N.O., Aalberts D.P. Efficient computation of optimal oligo–RNA binding. Nucleic Acids Res. 2004;32:6636–6642. doi: 10.1093/nar/gkh1008. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES