Abstract
Using a statistical mechanical treatment, we study RNA folding energy landscapes. We first validate the theory by showing that, for the RNA molecules we tested, all of which have only secondary structures, this treatment (i) predicts about the same native structures as the Zuker method and (ii) qualitatively predicts the melting curve peaks and shoulders seen in experiments. We then predict thermodynamic folding intermediates. For one hairpin sequence, unfolding is a simple unzipping process. For another sequence, however, unfolding is more complex: it involves multiple stable intermediates and a rezipping into a completely non-native conformation before unfolding. The principle that emerges, for which there is growing experimental support, is that although protein folding tends to involve highly cooperative two-state thermodynamic transitions without detectable intermediates, the folding of RNA secondary structures may involve rugged landscapes, often with more complex intermediate states.
At the center of computational biology is the folding problem for proteins and RNA molecules: to predict, from the monomer sequence, the conformation having the global minimum energy. This problem is not yet solved. But even once it is solved, it will give us only a small fraction of the information we would like to have about biomolecule folding. We would also like to know how the folding process takes place: the folding routes, the folding thermodynamics and cooperativity, and the intermediate states, transition states, and conformational transitions. Understanding these properties requires knowledge of more than just the single native conformation. It requires the full energy landscape (1–5): the free energies of all of the chain conformations as a function of the microscopic degrees of freedom of the molecule.
There are two practical reasons that it is important to know energy landscapes. First, knowledge of landscapes will be of benefit in designing faster and more robust computer methods for predicting native structures (6, 7). Second, a goal of computational biology is not just to predict native structures, per se, but to predict function. Ligand binding to proteins and RNA molecules, and catalytic mechanisms, are often more dependent on the conformations that are fluctuations away from the native structure than on the native conformation itself (8). To predict the fluctuations, we need energy landscapes.
Recent theoretical and experimental advances are beginning to go beyond native structures to shed light on full RNA folding energy landscapes (9–19). So far, however, such landscapes could not be predicted from monomer sequences. There is one class of biomolecule conformations, RNA secondary structures, for which folding algorithms are fairly successful. A popular method for predicting the native secondary structures of RNA molecules has been developed by Zuker and others (20–25). In this paper, we describe a method that goes beyond predicting such single points on landscapes: it predicts the full energy landscape for RNA secondary structures as a function of monomer sequence.
Predicting realistic energy landscapes for both proteins and RNA has been challenging because of severe computational limitations. All-atom simulations sample conformational space too sparsely (by hundreds of orders of magnitude) to characterize the full landscape. The other main approach has been lattice toy models (1–3) and other simplified treatments (26, 27), which can characterize the energy landscape completely, but only by sacrificing both atomic detail and the ability to treat realistically long sequences. Lattice models that have been used to explore complete energy landscapes are usually limited to chain lengths of ∼16–20 monomers in two dimensions. These folding landscapes so far have been mainly fictional, in the sense that they apply only to toy models and not to real monomer sequences such as lysozyme or ribonuclease.
We report here an approach, based on a polymer theory method that has recently been described in detail elsewhere (28, 29), in which we can retain a relatively high degree of realism and at the same time completely characterize the full energy landscape for RNA secondary structures having chain lengths up to at least 100–200 bases.
Overview of the Model.
The details of the model are given elsewhere (28, 29), but the basic idea is simple. RNA secondary structures consist of stretches of helix, in which two single-stranded segments run antiparallel to each other, separated by loops, bulges, and turns of various kinds and lengths, which we call intervening regions. Structures may involve any degree of branching. We aim to compute the partition function, which is the sum of Boltzmann factors over all of the ways, including all branching patterns, in which the chain can be arranged into helices and intervening regions. Each Boltzmann factor accounts for the base pairing and stacking free energies of that particular configuration. The partition function is obtained by a matrix multiplication method of the type used for the one-dimensional Ising model (30) or for the Zimm-Bragg theory of helix-coil transitions (31). The full chain partition function, Q(T), is a sum over all sets of intrachain contacts, in which the conformational count for each set is itself a product of partition functions for short stretches of the chain (29):

$$Q(T) = \sum_{c} \omega_c \, e^{-\Delta G_c / kT}$$
where the sum is over all possible arrangements of contacts c, ωc is the number of conformations having the contact arrangement c, and the Boltzmann factor accounts for all of the contact interactions in that arrangement, whose total free energy is ΔGc. The central idea is that ωc is computed as a product of partition functions of the component loops and helices (28):
where Ur = row(1, 1, 1, 1) and Uc = col(1, 1, 1, 1); Y is a 4 × 4 matrix of zeros and ones that ensures that the “outlet” of one stretch of chain (say, of type m = 1, 2, 3, 4) has the right configuration to couple to the “inlet” of the next stretch (say, of type n = 1, 2, 3, 4); and S is a 4 × 4 matrix that counts the conformations having a coupling of type “nm” (28).
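To make the matrix bookkeeping concrete, here is a minimal Python sketch that assumes ωc is the ordered product Ur · S1 · Y · S2 · Y ⋯ Sm · Uc over the m stretches of one contact arrangement. That ordering, the function names, and the toy matrices are assumptions for illustration only; refs. 28 and 29 give the actual construction, and the real S matrices come from lattice enumerations.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product of a (p x q) and b (q x r)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def conformation_count(stretches: List[Matrix], coupling: Matrix) -> float:
    """Hypothetical omega_c = U_r . S_1 . Y . S_2 . Y ... S_m . U_c.

    stretches: the S matrices (4 x 4) for consecutive stretches of chain.
    coupling:  the 0/1 matrix Y that lets an outlet couple only to a
               compatible inlet of the next stretch.
    """
    n = len(coupling)
    result: Matrix = [[1.0] * n]  # U_r = row(1, 1, 1, 1)
    for i, s in enumerate(stretches):
        if i > 0:
            result = matmul(result, coupling)  # join stretch i-1 to stretch i
        result = matmul(result, s)
    u_c: Matrix = [[1.0] for _ in range(n)]  # U_c = col(1, 1, 1, 1)
    return matmul(result, u_c)[0][0]


# Toy example: identity coupling (outlet type m may only feed inlet type m).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
s1 = [[2.0 if i == j == 0 else 0.0 for j in range(4)] for i in range(4)]
s2 = [[3.0 if i == j == 0 else 0.0 for j in range(4)] for i in range(4)]
print(conformation_count([s1, s2], identity))  # 6.0
```

Setting Y to all zeros gives zero conformations for any chain of two or more stretches, because no stretch can then couple to the next; this is the filtering role that the 0/1 entries of Y play.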
These piecewise partition function matrices, S and Y, are computed in advance by lattice model enumerations. This approach does not rely on the simplifying assumptions traditionally made about chain entropies, for example, that loop entropies are additive, or that loops are random flights that can be treated by Jacobson-Stockmayer statistics. Tests against exact enumerations show that the polymer theory captures the partition function with high accuracy (28, 29).
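As an illustration of what a lattice enumeration involves (not the authors' code), the Python sketch below counts, by brute force, the self-avoiding walks on the square lattice and the subset whose two ends sit on neighboring sites, the simplest stand-in for a loop closed by a base pair. The real S and Y entries carry many more constraints (inlet and outlet types, excluded volume of flanking helices) that this sketch does not model.

```python
from typing import List, Set, Tuple

Site = Tuple[int, int]
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def count_walks(n_steps: int) -> Tuple[int, int]:
    """Count n-step self-avoiding walks on the square lattice starting at the origin.

    Returns (all walks, walks whose last site neighbors the first site).
    """
    total = 0
    closable = 0

    def extend(path: List[Site], visited: Set[Site]) -> None:
        nonlocal total, closable
        x, y = path[-1]
        if len(path) - 1 == n_steps:
            total += 1
            if abs(x) + abs(y) == 1:  # ends adjacent: could close a loop with a contact
                closable += 1
            return
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in visited:  # excluded volume: never revisit a site
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return total, closable


for n in range(2, 7):
    print(n, count_walks(n))
```

Exhaustive enumeration grows exponentially with chain length, which is why the method enumerates only short stretches in advance and lets the matrix products assemble the full chain.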
To use the method to predict experimental results requires a model for the energetics of base pairing interactions. We use the “Turner rules” for the enthalpies and entropies of the base stacking interactions in double helices (including noncanonical pairs) (32). We assume the stacking enthalpies and entropies are temperature-independent, and we assume pairing energies are zero when nucleotides are not stacked.
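A minimal Python sketch of the energetics assumption above: each stacked pair contributes ΔG(T) = ΔH − TΔS with temperature-independent ΔH and ΔS, unstacked pairs contribute nothing, and a helix's Boltzmann factor is exp(−ΣΔG/RT), RT being the per-mole form of kT. The (ΔH, ΔS) values are placeholders, not Turner parameters, and ΔS is taken in kcal/(mol·K), so tables listing entropies in cal/(mol·K) would need to be divided by 1000.

```python
import math
from typing import Iterable, Tuple

R = 1.987e-3  # gas constant, kcal/(mol K); RT is the per-mole form of kT


def stack_free_energy(dh: float, ds: float, T: float) -> float:
    """Free energy of one stacked pair, dG = dH - T*dS, with dH and dS independent of T."""
    return dh - T * ds


def helix_boltzmann_factor(stacks: Iterable[Tuple[float, float]], T: float) -> float:
    """exp(-sum(dG)/RT) over the stacks of one helix; no stacks gives a factor of 1."""
    dg = sum(stack_free_energy(dh, ds, T) for dh, ds in stacks)
    return math.exp(-dg / (R * T))


# Placeholder (dH in kcal/mol, dS in kcal/(mol K)) values, not real Turner parameters.
toy_helix = [(-10.0, -0.027), (-8.0, -0.020)]
print(stack_free_energy(-10.0, -0.027, 310.0))  # about -1.63 kcal/mol
print(helix_boltzmann_factor(toy_helix, 310.0) > 1.0)  # True: favorable at 310 K
```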
Apart from these energies, the model has only one parameter. Our method requires counts of the conformations under the many different constraints that could arise from all of the possible base-pairing arrangements, and we obtain those counts from square lattice simulations. But the lattice model is two-dimensional and has only one degree of freedom per monomer, whereas real polynucleotides are three-dimensional and have seven continuous dihedral degrees of freedom per monomer. However, Fig. 1 shows that a single multiplicative factor, μ, which is independent of chain length, converts the loop conformation counts obtained from the lattice simulations to those obtained from experimental measurements of RNA loop entropies.
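To see why a single length-independent μ is convenient, the following Python sketch assumes, as the wording above suggests, that μ multiplies each loop's lattice count once, so that the loop entropy becomes S = R ln(μ ω_lattice). The function name and the counts are hypothetical. Under that assumption, μ shifts every loop entropy by the same constant, R ln μ, and leaves entropy differences between loops of different sizes unchanged.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol K)


def loop_entropy(lattice_count: float, mu: float) -> float:
    """Hypothetical loop entropy S = R ln(mu * omega_lattice), mu independent of loop length."""
    return R * math.log(mu * lattice_count)


# Hypothetical lattice counts for a shorter and a longer loop.
short_loop, long_loop = 8.0, 284.0
for mu in (1.0, 10.0, 100.0):
    # The difference stays R ln(284/8) whatever the value of mu.
    print(mu, loop_entropy(long_loop, mu) - loop_entropy(short_loop, mu))
```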
Tests of the Theoretical Method.
There are some basic tests an energy landscape model should pass. First, it must at least be a good folding algorithm; it should predict the point of global minimum, the native structure, on the energy landscape. In this regard, for the RNA molecules we tested, which are given in Fig. 2, our model has about the same level of accuracy as existing methods for predicting native RNA secondary structures (33, 34).
Second, an energy landscape model must predict more than just native structures: it must capture the rest of the conformational space and the thermodynamic weights of those conformations, and it must be able to predict denaturation thermodynamics. Fig. 2 shows the predicted denaturation profiles of six small RNA molecules that lack tertiary structure and whose melting thermodynamics are known (33–35). Some of the data shown are heat capacities, which give a more direct test of the theory; the rest are absorbance measurements, which are related but not identical to the heat capacity curves we have computed. We include the absorbance data because so few direct heat capacity measurements are available and because absorbance measurements tend to show the same general peaks and shoulders as heat capacity measurements. We did not attempt to model tRNAs here, for two reasons: (i) tRNAs contain pseudoknots, which are not treated in the current implementation of the theory, and (ii) tRNAs have modified bases, for which there are no Turner rule energy parameters.
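To show how a heat capacity curve follows from thermodynamic weights, here is a minimal Python sketch for a discrete set of states, each with a temperature-independent enthalpy H and entropy S, consistent with the energetics assumptions above. It uses the fluctuation formula C(T) = (⟨H²⟩ − ⟨H⟩²)/(RT²), which equals d⟨H⟩/dT when H and S do not depend on T. The two-state example and its numbers are hypothetical; the model in this paper instead sums over a vastly larger set of conformations through the matrix method.

```python
import math
from typing import List, Tuple

R = 1.987e-3  # gas constant, kcal/(mol K)


def heat_capacity(states: List[Tuple[float, float]], T: float) -> float:
    """C(T) = (<H^2> - <H>^2) / (R T^2) for states given as (H, S) pairs.

    H in kcal/mol and S in kcal/(mol K), both temperature-independent.
    """
    g = [h - T * s for h, s in states]  # free energy of each state
    g_min = min(g)  # shift before exponentiating to avoid overflow
    w = [math.exp(-(gi - g_min) / (R * T)) for gi in g]
    z = sum(w)
    h_mean = sum(wi * h for wi, (h, _) in zip(w, states)) / z
    h2_mean = sum(wi * h * h for wi, (h, _) in zip(w, states)) / z
    return (h2_mean - h_mean ** 2) / (R * T * T)


# Hypothetical two-state molecule: folded (reference) and unfolded, Tm = dH/dS = 330 K.
two_state = [(0.0, 0.0), (50.0, 50.0 / 330.0)]
temps = [300.0 + 0.1 * i for i in range(601)]  # 300 K to 360 K
peak = max(temps, key=lambda t: heat_capacity(two_state, t))
print(round(peak, 1))  # close to 330 K
```

For this toy model the peak falls within a fraction of a kelvin of Tm; with more states, the same formula produces the additional peaks and shoulders discussed in the text.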
The theoretical predictions are not within the experimental errors of the measurements, but they do appear to capture the peaks and shoulders qualitatively. We think this model is a reasonable first approximation, for the following reasons. First, nucleic acid interactions depend strongly on salt concentration, but the only energies available to us, the Turner rules, came from measurements in 1 M salt. This may account for the overstability predicted by the model in most cases. Second, all of the quantities used in this model have been obtained from independent experiments, so no nonphysical parameter is used to fit the theory to the data. As with many statistical mechanical models, the relevant scale for judging model errors is kT, with T ≈ 300 K, so errors of 10–20 degrees in melting points (roughly 3–7% of T) are small on this scale. The qualitative features of the denaturation profiles appear to be better predicted here than by more heavily parameterized models (36). Third, because the model is physical, it is clear how to improve it: we could include tertiary contacts if appropriate energy parameters were available, include coaxial base stacking (37, 38), terminal unpaired nucleotides, and terminal mismatch energies, and perhaps include electrostatics. The present treatment illustrates both the successes and the limitations of this level of approximation.
Energy Landscapes and the Folding Pathways.
The main purpose of a model such as this is to relate microscopic conformations to macroscopic properties, such as the melting thermodynamics. In this section, we describe the structures that are predicted for a series of equilibrium melting experiments. As far as we know, no experiments are yet available to test these predictions. We describe the results in terms of energy landscapes. Because a full high-dimensional energy landscape is neither visualizable nor illuminating, Fig. 3 instead shows a “reduced” energy landscape, the free energy $F(n, nn) = -kT \ln Q(n, nn, T)$, where Q(n, nn, T) is the partition function summed over all of the conformations that contain n native contacts and nn non-native contacts. F(n, nn) can be treated as a projection of the full landscape onto the (n, nn) plane, assuming that all of the other degrees of freedom are in thermal equilibrium and hence can be averaged out. A contact is called “native” if that particular hydrogen-bonded base pair exists in the native structure, and “non-native” otherwise.
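The projection onto the (n, nn) plane can be sketched in a few lines of Python, assuming a hypothetical explicit list of conformations, each tagged with its native contact count n, non-native contact count nn, and free energy G in kcal/mol (so RT stands in for the per-mole kT). In the paper, Q(n, nn, T) comes from the matrix method rather than from such a list. The log-sum-exp step keeps the sums finite even when populations span many orders of magnitude.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

R = 1.987e-3  # gas constant, kcal/(mol K)


def reduced_landscape(
    conformations: Iterable[Tuple[int, int, float]], T: float
) -> Dict[Tuple[int, int], float]:
    """F(n, nn) = -RT ln Q(n, nn, T) from (n_native, n_nonnative, G) tuples."""
    beta = 1.0 / (R * T)
    log_weights: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for n, nn, g in conformations:
        log_weights[(n, nn)].append(-beta * g)  # log of the Boltzmann factor
    landscape = {}
    for key, logs in log_weights.items():
        top = max(logs)
        log_q = top + math.log(sum(math.exp(x - top) for x in logs))  # log-sum-exp
        landscape[key] = -log_q / beta
    return landscape


toy = [(3, 0, -5.0), (3, 0, -5.0), (1, 2, -2.0), (0, 0, 0.0)]
for key, f in sorted(reduced_landscape(toy, 300.0).items()):
    print(key, round(f, 3))
```

Two conformations in the same (n, nn) bin lower that bin's free energy by RT ln 2 relative to either one alone, which is how conformational multiplicity enters the reduced landscape.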
The main purpose of this particular kind of plot is to explore a question that is motivated by a longstanding issue in protein folding: whether folding intermediates are “on-pathway” or “off-pathway.” This question can be asked of either the kinetics or the thermodynamics; here we focus on the thermodynamics. The distinction is as follows. For kinetics, we would need to define a reaction coordinate, and we would then ask for the time sequence of conformations that follows a jump to either folding or unfolding conditions.
For thermodynamics, we need only define an order parameter, not a reaction coordinate. An order parameter is just some structural or energetic measure of the “nativeness” of the molecule; unlike a reaction coordinate, it need not also measure the “kinetic closeness” of one conformation to another. We focus on a series of equilibrium experiments, each performed at a different temperature. At each temperature, we compute the equilibrium ensemble of conformations and whatever average structural property is of interest. In the present work, we have not considered reaction coordinates or any other aspect of kinetics. It is often assumed that the kinetic time series of conformations would closely mimic the equilibrium series (39), but we do not take up that question here.
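The equilibrium series described above amounts to repeating one Boltzmann average at each temperature. A minimal Python sketch, assuming a hypothetical list of states, each with a native contact count n, a non-native count nn, and temperature-independent H and S; the two-state toy is invented for illustration and uses the native contact count as the order parameter.

```python
import math
from typing import Callable, Dict, List

R = 1.987e-3  # gas constant, kcal/(mol K)

State = Dict[str, float]


def ensemble_average(states: List[State], T: float, prop: Callable[[State], float]) -> float:
    """Boltzmann-weighted average of prop over states with keys 'H' and 'S'."""
    logw = [-(s["H"] - T * s["S"]) / (R * T) for s in states]
    top = max(logw)  # shift to avoid overflow
    w = [math.exp(x - top) for x in logw]
    return sum(wi * prop(s) for wi, s in zip(w, states)) / sum(w)


# Hypothetical states: a native hairpin with 10 native contacts and an unfolded chain.
states: List[State] = [
    {"n": 10.0, "nn": 0.0, "H": -40.0, "S": -0.12},
    {"n": 0.0, "nn": 0.0, "H": 0.0, "S": 0.0},
]
for T in (280.0, 333.3, 380.0):
    print(T, round(ensemble_average(states, T, lambda s: s["n"]), 3))
```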
We define an on-pathway intermediate as a stable state having few non-native contacts. In contrast, an off-pathway intermediate can have many non-native contacts, but will have few native contacts.
Landscapes provide insights into cooperativity in biomolecule folding. The shape of the landscape can be characterized by the minima, which correspond to stable intermediates, and the maxima or barriers dividing them. As T changes, the shape of the landscape changes because of the competition between enthalpy and entropy.
We find stable intermediate states for both the mutant α operon mRNA 59-nucleotide fragment shown in Figs. 2c and 3A and the Escherichia coli 23S rRNA 59-nt fragment shown in Figs. 2d and 3B. But the landscapes of the two molecules (Fig. 3 A and B) are clearly different. The intermediates of the mutant α operon mRNA fragment are on-pathway in the sense that the folding process involves the sequential formation of the native helices, as a function of temperature. No non-native helices are formed. The thermodynamic “pathway” is indicated by partial degrees of “unzipping” of the native helices from the middle (see Fig. 3A). The cooperativity can be described by a “stuck zipper” model, where the zipping gets stuck in states X and Y (defined in Fig. 3A), and there is little population of other partially zipped states. Because state X and state Y are separated by a free energy barrier, the transitions between the stuck states are “two-state,” resembling first-order phase transitions such as boiling and freezing in macroscopic systems.
On the other hand, the E. coli 23S rRNA fragment shows a much more complex transition than is predicted for the α mRNA fragment (see Fig. 3B). For example, at T = 30°C, two almost equally stable conformations (N and Z) coexist. The unfolding of N involves two routes: one set of intermediates is on-pathway and the other is off-pathway. Fig. 3B shows that heating the native RNA leads not just to a partial unzipping (the on-path intermediate), but also to a highly helical state having even more contacts than the native structure but involving weaker interactions. Off-pathway intermediates have been previously predicted in other RNA models (40), where a “molten” phase was found to exist before the main unfolding occurs.
Experiments show complex folding cooperativity in RNA tertiary structures. Mg²⁺ appears to stabilize tertiary structures (41, 42). Folding pathways have been found to involve multiple cooperative (two-state) or noncooperative transitions, depending on the salt concentrations (43–47).
Density of States.
Fig. 4 shows the predicted density of states g(E), which is the number of conformations at each energy level. The power of the present theoretical method is that it can give the density of states over the whole landscape, a quantity that is not readily obtainable by sampling and trajectory-based simulations, such as Monte Carlo or molecular dynamics. Even the rarest of states can be explored; Fig. 3 shows the probabilities of states over a range of more than 100 orders of magnitude in population.
The density of states reveals information about the shape of the energy landscape. The left side of Fig. 4 describes the lowest energy states; the small values of g(E) there indicate that there are very few low-energy states. The right side indicates that there are very many high-energy states; these are the denatured conformations. Taken together, these features imply a generally funnel-shaped landscape. At a more subtle level, if the left side of the figure were concave upward, the native state would be separated by an energy “gap” from the denatured states. Such a concave shape would indicate a paucity of low-energy “trap” conformations and a native energy well much deeper than the other energy wells. If anything, however, Fig. 4 is slightly concave downward on the left side, implying many low-energy non-native states, which indicates that these RNA folding energy landscapes are quite bumpy and rugged.
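The concavity test described above can be made quantitative with finite differences. A minimal Python sketch, assuming ln g(E) is tabulated on a uniform energy grid; the function name, the default of three points, and the synthetic curves are illustrative assumptions, not the analysis behind Fig. 4.

```python
from typing import Sequence


def low_energy_curvature(energies: Sequence[float], ln_g: Sequence[float], k: int = 3) -> float:
    """Mean second finite difference of ln g(E) at the k lowest interior grid points.

    Assumes a uniform energy grid. A positive value means ln g(E) is concave
    upward at low energy; a negative value means it is concave downward.
    """
    pts = sorted(zip(energies, ln_g))  # order by energy, lowest first
    h = pts[1][0] - pts[0][0]
    last = min(k, len(pts) - 2)
    d2 = [
        (pts[i + 1][1] - 2.0 * pts[i][1] + pts[i - 1][1]) / h ** 2
        for i in range(1, last + 1)
    ]
    return sum(d2) / len(d2)


grid = [float(e) for e in range(-10, 1)]  # E from -10 to 0 (arbitrary units)
print(low_energy_curvature(grid, [0.5 * e * e for e in grid]))   # positive: concave upward
print(low_energy_curvature(grid, [-0.5 * e * e for e in grid]))  # negative: concave downward
```

A strictly linear ln g(E) gives zero curvature, so on real data the sign is only meaningful when it clearly exceeds the noise in the tabulated values.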
In this regard, the model predicts that RNA folding may be quite different from protein folding. Proteins tend to have energy gaps and a single native state that is much more stable than alternative conformations. Here, the prediction is that RNA molecules, at least those having predominantly secondary structures, may have many low-energy states and bumpy landscapes. This appears to be consistent with experiments (9, 11, 15, 16, 42) and indicates that kinetics and metastability may play a greater role in the properties of RNAs than of proteins (1–5). Perhaps evolution has not optimized RNA secondary structures to fold uniquely, to be very stable, or to fold quickly.
Higgs has also predicted multiple low-energy states for some RNA sequences (48, 49). His density of states, however, is not comparable to ours, because it counts only secondary structures rather than conformations. His observation of a Gaussian density of states is therefore not in conflict with our non-Gaussian one. Some protein models also predict non-Gaussian densities of states (50). The density of states shown in Fig. 4 is remarkably linear on a logarithmic scale. Because $1/T = \partial S/\partial E = k\,\partial \ln g(E)/\partial E$, the slope of our density of states curve gives the temperature of the main melting transition. For example, for the mutant α operon mRNA 59-nt fragment, the slope is 0.68 mol/kcal, corresponding to a transition temperature $(k \cdot \mathrm{slope})^{-1} - 273 \simeq 65\,^{\circ}\mathrm{C}$.
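The slope-to-temperature conversion can be sketched in Python, assuming ln g(E) uses the natural logarithm and E is in kcal/mol, so that k is replaced by the per-mole gas constant R ≈ 1.987 × 10⁻³ kcal/(mol·K). The least-squares fit and the synthetic data are illustrative assumptions, not the procedure used for Fig. 4.

```python
from typing import Sequence, Tuple

R = 1.987e-3  # gas constant, kcal/(mol K)


def transition_temperature(energies: Sequence[float], ln_g: Sequence[float]) -> Tuple[float, float]:
    """Fit ln g(E) = a + slope * E by least squares; return (slope, T = 1/(R * slope) in K)."""
    n = len(energies)
    mean_e = sum(energies) / n
    mean_l = sum(ln_g) / n
    cov = sum((e - mean_e) * (l - mean_l) for e, l in zip(energies, ln_g))
    var = sum((e - mean_e) ** 2 for e in energies)
    slope = cov / var  # mol/kcal
    return slope, 1.0 / (R * slope)


# Synthetic, exactly linear ln g(E) constructed for a 330 K transition.
t_true = 330.0
energies = [float(e) for e in range(-40, 1)]
ln_g = [5.0 + e / (R * t_true) for e in energies]
slope, t_m = transition_temperature(energies, ln_g)
print(round(slope, 3), round(t_m - 273.15, 2))
```

If ln g(E) were tabulated with base-10 logarithms instead, the fitted slope would first have to be multiplied by ln 10 before applying T = 1/(R · slope).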
We have described a method for predicting energy landscapes for RNA molecules. It predicts that RNA secondary structures have a wide variety of cooperative behaviors, including one-state and two-state transitions, stable intermediate states, some of which are on-path (monotonically becoming more native-like) and some of which are off-path (dominated by non-native contacts). Our approach is presently limited to the prediction of equilibrium folding for the secondary structures of RNAs and of single-stranded DNA molecules. But because it treats increasingly complex chain conformations by a systematic hierarchy of equations, the present approach may ultimately be useful also for more complex tertiary structures in proteins and RNAs.
Acknowledgments
We thank M. Brenowitz, H. S. Chan, D. E. Draper, A. Frankel, T. C. Gluick, E. Tostesen, J. R. Williamson, and S. A. Woodson for stimulating discussions and D. Heap, P. Monohon, and J. Schreurs for technical help with the figures. We thank the National Institutes of Health for support, through Grant GM34993.
Footnotes
This paper was submitted directly (Track II) to the PNAS office.
References
1. Dill K A, Chan H S. Nat Struct Biol. 1997;4:10–19. doi: 10.1038/nsb0197-10.
2. Onuchic J N, Luthey-Schulten Z, Wolynes P G. Annu Rev Phys Chem. 1997;48:545–600. doi: 10.1146/annurev.physchem.48.1.545.
3. Dill K A, Bromberg S, Yue K, Fiebig K M, Yee D P, Thomas P D, Chan H S. Protein Sci. 1995;4:561–602. doi: 10.1002/pro.5560040401.
4. Socci N D, Onuchic J N, Wolynes P G. Proteins. 1998;32:136–158.
5. Bryngelson J D, Wolynes P G. Proc Natl Acad Sci USA. 1987;84:7524–7528. doi: 10.1073/pnas.84.21.7524.
6. Foreman K W, Phillips A T, Rosen J B, Dill K A. J Comput Chem. 1999;20:1589–1594.
7. Klepeis J L, Floudas C A. J Chem Phys. 1999;110:7491–7512.
8. Repsilber D, Wiese S, Rachen M, Schroder A W, Riesner D, Steger G. RNA. 1999;5:574–584. doi: 10.1017/s1355838299982018.
9. Pan J, Thirumalai D, Woodson S A. J Mol Biol. 1997;273:7–13. doi: 10.1006/jmbi.1997.1311.
10. Sclavi B, Sullivan M, Chance M R, Brenowitz M, Woodson S A. Science. 1998;279:1940–1943. doi: 10.1126/science.279.5358.1940.
11. Treiber D K, Rook M S, Zarrinkar P P, Williamson J R. Science. 1998;279:1943–1946. doi: 10.1126/science.279.5358.1943.
12. Batey R T, Doudna J A. Nat Struct Biol. 1998;5:337–340. doi: 10.1038/nsb0598-337.
13. Pan T, Sosnick T R. Nat Struct Biol. 1997;4:931–938. doi: 10.1038/nsb1197-931.
14. Herschlag D. J Biol Chem. 1995;270:20871–20874. doi: 10.1074/jbc.270.36.20871.
15. Pan J, Woodson S A. J Mol Biol. 1998;280:597–609. doi: 10.1006/jmbi.1998.1901.
16. Rook M S, Treiber D K, Williamson J R. J Mol Biol. 1998;281:609–620. doi: 10.1006/jmbi.1998.1960.
17. Narlikar G J, Herschlag D. Nat Struct Biol. 1996;3:701–710. doi: 10.1038/nsb0896-701.
18. Wu M, Tinoco I Jr. Proc Natl Acad Sci USA. 1998;95:11555–11560. doi: 10.1073/pnas.95.20.11555.
19. Thirumalai D. Proc Natl Acad Sci USA. 1998;95:11506–11508. doi: 10.1073/pnas.95.20.11506.
20. Zuker M. Science. 1989;244:48–52. doi: 10.1126/science.2468181.
21. Turner D H, Sugimoto N, Freier S M. Annu Rev Biophys Biophys Chem. 1988;17:167–192. doi: 10.1146/annurev.bb.17.060188.001123.
22. Jaeger J A, SantaLucia J Jr, Tinoco I Jr. Annu Rev Biochem. 1993;62:255–287. doi: 10.1146/annurev.bi.62.070193.001351.
23. Jaeger J A, Zuker M, Turner D H. Biochemistry. 1990;29:10147–10158. doi: 10.1021/bi00496a002.
24. Klaff P, Mundt S M, Steger G. RNA. 1997;3:1468–1479.
25. Mathews D H, Sabina J, Zuker M, Turner D H. J Mol Biol. 1999;288:911–940. doi: 10.1006/jmbi.1999.2700.
26. Cule D, Hwa T. Phys Rev Lett. 1997;79:2375–2378.
27. Thirumalai D, Woodson S A. Acc Chem Res. 1996;29:433–439.
28. Chen S-J, Dill K A. J Chem Phys. 1995;103:5802–5813.
29. Chen S-J, Dill K A. J Chem Phys. 1998;109:4602–4616.
30. Wannier G H. Statistical Physics. New York: Wiley; 1966.
31. Zimm B H, Bragg J K. J Chem Phys. 1959;31:526–535.
32. Serra M J, Turner D H. Methods Enzymol. 1995;259:242–261. doi: 10.1016/0076-6879(95)59047-1.
33. Gluick T C, Draper D E. J Mol Biol. 1994;241:246–262. doi: 10.1006/jmbi.1994.1493.
34. Laing L G, Gluick T C, Draper D E. J Mol Biol. 1994;237:577–587. doi: 10.1006/jmbi.1994.1256.
35. Matveev S V, Filimonov V V, Privalov P L. Mol Biol. 1982;16:990–1000.
36. McCaskill J S. Biopolymers. 1990;29:1105. doi: 10.1002/bip.360290621.
37. Xia T, SantaLucia J Jr, Burkard M E, Kierzek R, Schroeder S J, Jiao X, Cox C, Turner D H. Biochemistry. 1998;37:14719–14735. doi: 10.1021/bi9809425.
38. Walter A E, Turner D H, Kim J, Lyttle M H, Müller P, Mathews D H, Zuker M. Proc Natl Acad Sci USA. 1994;91:9218–9222. doi: 10.1073/pnas.91.20.9218.
39. Bryngelson J D, Onuchic J N, Socci N D, Wolynes P G. Proteins. 1995;21:167–195. doi: 10.1002/prot.340210302.
40. Bundschuh R, Hwa T. Phys Rev Lett. 1999;83:1479–1482.
41. Cech T. In: RNA World. Gesteland R F, Atkins J F, editors. Plainview, NY: Cold Spring Harbor Lab. Press; 1993. pp. 239–269.
42. Tinoco I Jr, Bustamante C. J Mol Biol. 1999;293:271–281. doi: 10.1006/jmbi.1999.3001.
43. Gluick T C, Wills N M, Gesteland R F, Draper D E. Biochemistry. 1997;36:16173–16186. doi: 10.1021/bi971362v.
44. Zarrinkar P P, Williamson J R. Science. 1994;265:918–924. doi: 10.1126/science.8052848.
45. Pan J, Thirumalai D, Woodson S A. Proc Natl Acad Sci USA. 1999;96:6149–6154. doi: 10.1073/pnas.96.11.6149.
46. Li Y, Turner D H. Biochemistry. 1997;36:11131–11139. doi: 10.1021/bi971034v.
47. Misra V K, Draper D E. Biopolymers. 1998;48:113–135. doi: 10.1002/(SICI)1097-0282(1998)48:2<113::AID-BIP3>3.0.CO;2-Y.
48. Higgs P G. J Phys I France. 1993;3:43–59.
49. Higgs P G. J Chem Soc Faraday Trans. 1995;91:2531–2540.
50. Hao M-H, Scheraga H A. J Chem Phys. 1997;107:8089–8102.