Abstract
The overall structure of the transition-state and intermediate ensembles observed experimentally for dihydrofolate reductase and interleukin-1β can be obtained by using simplified models that have almost no energetic frustration. The predictive power of these models suggests that, even for these very large proteins with completely different folding mechanisms and functions, real protein sequences are sufficiently well designed, and much of the structural heterogeneity observed in the intermediates and the transition-state ensembles is determined by topological effects.
Explaining how proteins self-assemble into well-defined structures is a long-standing challenge. Energy landscape theory and the funnel concept (1–7) have provided the theoretical framework necessary for improving our understanding of this problem: efficient folding sequences minimize frustration. Frustration may arise from the inability to satisfy all native interactions and from strong non-native contacts, which can create conformational traps. The difficulty of minimizing energetic frustration by sequence design, however, also depends on the choice of folding motif. Some folding motifs are easier to design than others (8, 9), suggesting the possibility that evolution not only selected sequences with sufficiently small energetic frustration but also selected more easily designable native structures. To address this difference in foldability, we have introduced the concept of “topological frustration” (10–13): even when sequences have been designed with minimal energetic frustration, variations in the degree of nativeness of contacts in the transition-state ensemble (TSE) are observed because of asymmetries imposed by the chosen final structure.
Recent theoretical and experimental evidence (14–18) suggests that proteins, especially small fast-folding (submillisecond) proteins, have sequences with a level of energetic frustration sufficiently reduced that the global characteristics of the heterogeneity observed in the TSE are strongly influenced by the native-state topology. We have shown (13) that the overall structure of the TSE for chymotrypsin inhibitor 2 and for the Src-Homology 3 (SH3) domain of the src tyrosine–protein kinase can be obtained with simplified models constructed by using sequences that have almost no energetic frustration (Gō-like potentials). These models drastically reduce the energetic frustration and energetic heterogeneity for native contacts, leaving the topology as the primary source of the residual frustration. Topological effects, however, go beyond affecting the structure of the TSE. The overall structure of the populated intermediate-state ensembles during the folding of proteins such as barnase, ribonuclease H, and CheY has also been determined successfully by using a similar model (13). It is interesting to note that, because they concern totally unfrustrated sequences, these models may not reproduce the precise energetics of the real proteins, such as the value of the barrier heights and the stability of the intermediates; nevertheless, they are able to determine the general structure of these ensembles. Therefore, the fact that these almost energetically unfrustrated models reproduce most of the major features of the TSE of these proteins indicates that real protein sequences are sufficiently well designed (i.e., with reduced energetic frustration) that much of the heterogeneity observed in the TSEs and intermediates has a strong topological dependence.
Do these conclusions hold for larger and slower-folding proteins with more complex folding kinetics than two-state folders such as chymotrypsin inhibitor 2 and SH3? The success obtained with barnase, ribonuclease H, and CheY intermediates already provides some encouragement: topology appears to be important in determining on-pathway folding intermediates. In this paper, this approach is extended to a pair of larger proteins: dihydrofolate reductase (DHFR) and interleukin-1β (IL-1β). The synoptic analysis of these two proteins is particularly interesting because they have a comparable size (slightly over 150 amino acids) but different native structures, folding mechanisms, and functions: DHFR is a two-domain α/β enzyme that maintains pools of tetrahydrofolate used in nucleotide metabolism, whereas IL-1β is a single-domain all-β cytokine that has no catalytic activity of its own but elicits a biological response by binding to its receptor.
Numerical Procedures.
The energetically unfrustrated models of DHFR and IL-1β are constructed by using a Gō-like Hamiltonian (19, 20). A Gō-like potential takes into account only native interactions, and each of these interactions enters the energy balance with the same weight. Residues in the proteins are represented as single beads centered at their Cα positions. Adjacent beads are strung together into a polymer chain by means of bond and angle interactions, whereas the geometry of the native state is encoded in the dihedral angle potential and a nonlocal bead–bead potential.
A detailed description of this energy function can be found elsewhere (13). The local (torsion) and nonlocal terms have been adjusted so that the stabilization energy residing in the tertiary contacts is approximately twice as large as the torsional contribution. This balance among the energy terms is optimal for the folding of our Gō-like protein models (4). Solvent mediation and side-chain effects are already included in these effective energy functions. Therefore, entropy changes are associated with the configurational entropy of the chain. The native contact map of a protein is derived with the CSU software (available at http://www.weizmann.ac.il/sgedg/csu/), based on the approach developed in ref. 21. Native contacts between pairs of residues (i, j) with j ≤ i + 3 are discarded from the native map, because any three or four consecutive residues already interact through the angle and dihedral terms. A contact between two residues (i, j) is considered formed if the distance between the Cαs is shorter than γ times their native distance σij. It has been shown (11) that the results do not depend strongly on the choice made for the cutoff factor γ. In this work, we used γ = 1.2.
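The contact-formation criterion just described can be illustrated with a short Python sketch. This is not the code used for the simulations: the function name `formed_contacts`, the array layout, and the six-bead toy coordinates are assumptions made for illustration. Only the rule itself (a native contact is formed when the Cα–Cα distance is shorter than γ times the native distance σij, with γ = 1.2) is taken from the text. The native pair list is assumed to have been filtered already.

```python
import numpy as np

GAMMA = 1.2  # cutoff factor quoted in the text


def formed_contacts(coords, native_pairs, native_dist, gamma=GAMMA):
    """Return a boolean array marking which native contacts are formed.

    coords       : (n_residues, 3) C-alpha coordinates of one conformation
    native_pairs : (n_contacts, 2) residue index pairs (i, j) from the native map
    native_dist  : (n_contacts,) native C-alpha distances sigma_ij
    """
    coords = np.asarray(coords, dtype=float)
    pairs = np.asarray(native_pairs, dtype=int)
    sigma = np.asarray(native_dist, dtype=float)
    # Distance between the two beads of every native pair in this conformation.
    d = np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)
    return d < gamma * sigma


# Toy data (hypothetical): six beads and three "native" contacts.
native = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0],
                   [0, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float)
pairs = np.array([[0, 4], [0, 5], [1, 5]])
sigma = np.linalg.norm(native[pairs[:, 0]] - native[pairs[:, 1]], axis=1)

in_native = formed_contacts(native, pairs, sigma)          # every contact formed
in_expanded = formed_contacts(3.0 * native, pairs, sigma)  # uniformly expanded chain
```

In this toy case every contact is formed in the native arrangement, and none is formed once all distances are tripled, because each distance then exceeds 1.2 σij.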
For both protein models (DHFR and IL-1β), folding and unfolding simulations have been performed at several temperatures around the folding temperature. The results from the different simulations have been combined by using the WHAM algorithm (22). The folding simulations were started from several very different unfolded structures obtained from high-temperature unfolding simulations. To have appropriate statistics, we made sure that for every transition-state ensemble or intermediate, we sampled about 500 uncorrelated conformations (thermally weighted). For smaller proteins such as SH3 and chymotrypsin inhibitor 2 (which have about 1/3 of the tertiary contacts of DHFR and IL-1β), we have determined that about 200 uncorrelated conformations in the transition-state ensemble are necessary to obtain an error of ±0.05 on the estimates of contact probabilities (or Φ values) (13).
Comparing Simulations and Experiments for DHFR and IL-1β.
Not only do DHFR and IL-1β have dissimilar native folds, but also the nature of the intermediate states populated during the folding event is remarkably different. (The 162 residues of DHFR arrange themselves in 8 β-strands and 4 α-helices, grouped together in the folded state as detailed in Fig. 2d, whereas IL-1β is a 153-residue, all-β protein composed of 12 β-strands packed together as shown in Fig. 4 c and d.) To explore the connection between the protein topology and the nature of the intermediates, we used an energetically minimally frustrated Cα model for these two proteins, with a potential energy function defined by considering only the native local and nonlocal interactions as being attractive (see Numerical Procedures for details). This is a very simplified potential that retains only information about the native fold; energetic frustration is almost fully removed. Notice that, although the real amino acid sequence is not included in this model, the chosen potential is like a “perfect” sequence for the target structure, without the energetic frustration of real sequences (because this potential includes attractive native tertiary contacts, it implicitly incorporates hydrophobic interactions). Therefore, this model provides the perfect computational tool to investigate how much of the structural heterogeneity observed during the folding mechanism could be inferred from the knowledge of the native structure alone, without contributions from energetic frustration.
Earlier work suggests that proteins (at least small fast-folding proteins) have energetic frustration sufficiently reduced that they have a funnel-like energy landscape with a solvent-averaged potential strongly correlated with the degree of nativeness (but with some roughness because of the residual frustration). In this situation, the folding dynamics can be described as the diffusion of an ensemble of protein configurations over a low-dimensional free-energy surface, defined in terms of the reaction coordinate Q, where Q represents the fraction of native contacts formed in a conformation (Q = 0 at the fully unfolded state, and Q = 1 at the folded state) (10–13, 23). The ensemble of intermediates observed in this free-energy profile is expected to mimic the real kinetic intermediates.
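As an illustration of how a free-energy profile along Q can be read off from sampled conformations, the following Python sketch histograms Q values and converts populations to F(Q) = −kT ln P(Q). It is a single-temperature estimate under stated assumptions, not the multi-temperature WHAM analysis used in this work. The function name `free_energy_profile`, the bin count, the reduced units (kT = 1), and the synthetic two-basin sample are all hypothetical.

```python
import numpy as np


def free_energy_profile(q_samples, n_bins=20, kT=1.0):
    """Estimate F(Q) = -kT ln P(Q) from Q values sampled at one temperature.

    Returns bin centers and F shifted so that its lowest finite value is zero.
    Empty bins get F = +inf (no sampled conformations).
    """
    hist, edges = np.histogram(q_samples, bins=n_bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):  # log(0) -> -inf for empty bins
        free_energy = -kT * np.log(prob)
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return centers, free_energy


# Synthetic example (hypothetical data): an unfolded-like basin near Q = 0.2
# and a folded-like basin near Q = 0.9, with no samples in between.
rng = np.random.default_rng(0)
q = np.concatenate([rng.normal(0.2, 0.05, 5000), rng.normal(0.9, 0.03, 5000)])
q = np.clip(q, 0.0, 1.0)
centers, profile = free_energy_profile(q)
```

Minima of the resulting profile correspond to populated ensembles (unfolded, intermediate, or native) and maxima to barriers. Reading off barrier positions in this way does not depend on the absolute barrier heights.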
Fig. 1 shows a comparison between the folding mechanisms obtained from our simulations for the minimally frustrated analogues of DHFR (Fig. 1 a and c) and IL-1β (Fig. 1 b and d). The different nature of the folding intermediates of the two proteins and their native ensembles emerging from these data is in substantial agreement with the experimental observations, with the adenine-binding domain of DHFR being folded in the main intermediate in the simulation and the central β strands of IL-1β being formed early in this single-domain protein. The absolute values of the free-energy barriers resulting from simulations may not necessarily agree with the experimental ones, because we are dealing with unfrustrated designed sequences. Thus, quantitative predictions that depend on barrier heights and stability of the intermediate ensembles (e.g., folding time, rate-determining barriers, and lifetime of intermediates) are not possible for this kind of model. However, we show that topology is sufficient to detect correctly the positions of the transition and intermediate states. A more detailed description follows.
DHFR.
The folding process emerging from the dynamics of the Gō-like analogue of DHFR (as summarized in Fig. 1 a and c) is interestingly peculiar and consistent with the experimentally proposed folding mechanism (24) (see Fig. 3d). Refolding initiates by a barrierless collapse to a quasistable species (Q = 0.2), which corresponds to the formation of a burst-phase intermediate, IBP, with little stability but some protection from H-exchange across the central β sheet (25). This initial collapse is followed by production of the main intermediate IHF (highly fluorescent), which is described in the mechanism of Fig. 3d as the collection of intermediates I1–I4. I1–I4 are structurally similar to each other but differentiated experimentally by the rate at which they proceed toward the native protein. Finally, after a second barrier is overcome, the protein visits an ensemble of native structures with different energies. The experimentally determined folding mechanism of DHFR shows transient kinetic control in the formation of native conformers (N4 dominant). This is later overridden by thermodynamic considerations (N2 dominant) at final equilibrium (24). This latter finding is consistent with the nature of the folding ensemble determined by the simulations. As shown in Fig. 3b, a set of structures close to the native state (Q around 0.7–0.8) is transiently populated in addition to the fully folded state (Q = 1). Because the main intermediate IHF has been characterized recently by experimental studies (24, 29), we take our analysis a step further by comparing the average structure of the IHF ensemble from our simulations to the one experimentally determined. For this purpose, we compute the formation probability Qij(Q) for each native DHFR contact involving residues (i, j) at different stages of the folding process by averaging the number of times the contact occurs over the set of structures present in a selected range of Q. As detailed in Fig. 2, the central result from this analysis is that the main intermediate IHF is characterized by a largely different degree of formation in different parts of the protein: domain 1 (i.e., interactions among strands 2–5 and helices 2–3) appears to be formed with probability greater than 0.7, whereas domain 2 (i.e., interactions among strands 6–8, helix 1 and helix 4) is almost nonexistent.
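The contact formation probability Qij(Q) described above amounts to a conditional average over the conformations that fall in a chosen window of Q. A minimal Python sketch follows, assuming that each sampled conformation has already been reduced to a boolean vector of formed native contacts. The function name `contact_probabilities`, the input layout, and the four-frame toy data are hypothetical, and the average is unweighted, which presumes that all frames come from a single temperature.

```python
import numpy as np


def contact_probabilities(contact_matrix, q_lo, q_hi):
    """Per-contact formation probability among frames with q_lo <= Q <= q_hi.

    contact_matrix : (n_frames, n_contacts) boolean array; entry [t, k] is True
                     when native contact k is formed in frame t.
    """
    c = np.asarray(contact_matrix, dtype=bool)
    q = c.mean(axis=1)                 # fraction of native contacts per frame
    selected = (q >= q_lo) & (q <= q_hi)
    if not selected.any():
        raise ValueError("no frames fall in the requested Q window")
    return c[selected].mean(axis=0)    # average occupancy of each contact


# Toy trajectory (hypothetical): four frames, four native contacts.
frames = np.array([[1, 1, 0, 0],   # Q = 0.50
                   [1, 0, 0, 0],   # Q = 0.25
                   [1, 1, 1, 1],   # Q = 1.00
                   [1, 0, 1, 0]],  # Q = 0.50
                  dtype=bool)
p_mid = contact_probabilities(frames, 0.4, 0.6)
```

In the toy data, only the first and last frames fall in the window 0.4 ≤ Q ≤ 0.6, so the first contact has probability 1, the second and third 0.5, and the fourth 0. Grouping contacts by structural element then gives a per-domain picture of the kind summarized for IHF.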
The formation of domain 1 and domain 2 during the folding event is better understood from Fig. 3 a and c, where the rms distance of the parts of the protein constituting each domain from the corresponding native structures is shown for a typical folding simulation. Indeed, the two domains fold in a noticeably different way: in the stable intermediate IHF, domain 1 is closer than 5 Å (rms) to that found in the native structure, whereas domain 2 is highly variable (rms distance greater than 15 Å from its native structure). Still, in agreement with hydrogen exchange studies (25), some protection is expected across domains from our simulations, and complete protection from exchange is expected only after the formation of the fully folded protein. A combination of fluorescence, CD, mutagenic, and new drug-binding studies on DHFR indeed demonstrates that domain 1 is largely folded with specific tertiary contacts formed and that this collection of intermediates is obligatory in the folding route (29).
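A per-domain rms distance of this kind can be sketched in Python as follows. The text does not specify how the structures were superimposed, so this example assumes that each domain is optimally superimposed on its own native coordinates with the Kabsch algorithm before the rms distance is computed. The function name `domain_rmsd`, the residue index ranges, and the random test coordinates are hypothetical.

```python
import numpy as np


def domain_rmsd(coords, native, residues):
    """RMS distance of a subset of beads from the native structure
    after optimal rigid-body superposition (Kabsch algorithm)."""
    idx = np.asarray(list(residues), dtype=int)
    mobile = np.asarray(coords, dtype=float)[idx]
    ref = np.asarray(native, dtype=float)[idx]
    # Remove the translation by centering both point sets.
    mobile = mobile - mobile.mean(axis=0)
    ref = ref - ref.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(mobile.T @ ref)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = mobile @ rotation.T
    return float(np.sqrt(((aligned - ref) ** 2).sum(axis=1).mean()))


# Toy test (hypothetical data): a rotated and translated copy of a random
# 10-bead "native" structure, with the last five beads scrambled.
rng = np.random.default_rng(0)
native = rng.normal(size=(10, 3))
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
coords = native @ rot_z.T + np.array([1.0, 2.0, 3.0])
coords[5:] += rng.normal(scale=5.0, size=(5, 3))

rmsd_domain1 = domain_rmsd(coords, native, range(0, 5))   # intact part
rmsd_domain2 = domain_rmsd(coords, native, range(5, 10))  # scrambled part
```

In this toy case the intact half superimposes on the native coordinates to within numerical precision, whereas the scrambled half does not. This mirrors the situation in which one domain is native-like while the other remains disordered.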
IL-1β.
Supported by some recent experiments, Heidary et al. (30) have proposed a kinetic mechanism for the folding of IL-1β that requires the presence of a well-defined on-pathway intermediate species. The structural details of this species were determined from NMR and hydrogen exchange techniques (30, 31). We have compared these experimental data with our simulations for the IL-1β Gō-like analogue (Fig. 1 b and d). The folding picture emerging from these numerical studies differs substantially from that observed for DHFR (see Fig. 1 a and c). An intermediate state is populated for Q around 0.55, followed by a rate-limiting barrier (around Q = 0.7), after which the system proceeds to the well-defined native state.
Is the theoretical intermediate similar to the one observed experimentally? By using the same procedure as for DHFR, a comparison between the average structure of the IL-1β intermediate ensemble and the one emerging from experimental studies is shown in Fig. 4. These results indicate that in the calculated intermediate, residues 40–105 (strands 4–8) are folded into a native-like topology, but interactions between strands 5 and 8 are not fully completed. Experimental results confirm that strands 6–8 are well folded in the intermediate state and that strands 4–5 are partially formed. However, results of experiments and theory differ in the region of residues 110–125, where hydrogen exchange shows early protection, and theory predicts late contact formation. This region contains four aromatic groups, Phe-112, Phe-117, Tyr-120, and Trp-121, which may be sequestered from solvent because of clustering of these residues and removal from unfavorable solvent interactions. This effect would not be fully accounted for by our model, where all native interactions are considered as energetically equivalent, and large stabilizing interactions are not differentiated. Thus, energetics may favor early formation of the structure corresponding to residues 110–125, whereas topology considerations favor the formation of strands 4–8.
Conclusions
Theoretical and experimental studies of protein folding at times appear to be at odds. Theoretical analysis of simple model systems oftentimes predicts a large number of routes to the native protein, whereas experimental work on larger systems indicates that folding proceeds through a limited number of intermediate species. Although these two descriptions may appear inconsistent, they are not. The large number of routes may or may not lead to the production of on-route kinetic intermediate ensembles, depending on the result of the competition between configurational entropy and the effective folding energy. In this study, we show that productive intermediate species are produced by using simplified protein models with funnel-like landscapes based on purely topological considerations, and the results are in good agreement with the available experimental data. The fact that these simplified minimally frustrated models for DHFR and IL-1β can predict the overall features of the folding intermediates and transition states experimentally measured for these two proteins, with completely different folding mechanisms and functions, supports our general picture that real proteins have a substantially reduced level of energetic frustration, and a large component of the observed heterogeneity during the folding event is determined topologically. Such observations lead us to propose that the success in designing sequences that fold to a particular shape is constrained by topological effects (32). More challenging are the consequences of this conclusion: do these topological constraints have to be tolerated only during the folding event, or are they actually used by biology to help function? Here we speculate only in the context of these two examples, but this question should be addressed more generally in the future.
Acknowledgments
This work was supported by the National Science Foundation (Grant no. 96–03839), the La Jolla Interfaces in Science program (sponsored by the Burroughs Wellcome Fund), and the National Institutes of Health (Grant no. GM54038). We thank Angel García for many fruitful discussions. One of us (C.C.) expresses her gratitude to Giovanni Fossati for his suggestions and helpful discussions.
Abbreviations
- TSE: transition-state ensemble
- DHFR: dihydrofolate reductase
Footnotes
This paper was submitted directly (Track II) to the PNAS office.
Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.100547897.
Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.100547897
References
- 1. Leopold P E, Montal M, Onuchic J N. Proc Natl Acad Sci USA. 1992;89:8721–8725. doi: 10.1073/pnas.89.18.8721.
- 2. Onuchic J N, Luthey-Schulten Z, Wolynes P G. Annu Rev Phys Chem. 1997;48:545–600. doi: 10.1146/annurev.physchem.48.1.545.
- 3. Dill K A, Chan H S. Nat Struct Biol. 1997;4:10–19. doi: 10.1038/nsb0197-10.
- 4. Nymeyer H, García A E, Onuchic J N. Proc Natl Acad Sci USA. 1998;95:5921–5928. doi: 10.1073/pnas.95.11.5921.
- 5. Klimov D K, Thirumalai D. Proteins Struct Funct Genet. 1996;26:411–441. doi: 10.1002/(SICI)1097-0134(199612)26:4<411::AID-PROT4>3.0.CO;2-E.
- 6. Mirny L A, Abkevich V, Shakhnovich E I. Folding Des. 1996;1:103–116. doi: 10.1016/S1359-0278(96)00019-3.
- 7. Shea J E, Nochomovitz Y D, Guo Z Y, Brooks C L. J Chem Phys. 1998;109:2895–2903.
- 8. Li H, Helling R, Tang C, Wingreen N. Science. 1996;273:666–669. doi: 10.1126/science.273.5275.666.
- 9. Nelson E D, Onuchic J N. Proc Natl Acad Sci USA. 1998;95:10682–10686. doi: 10.1073/pnas.95.18.10682.
- 10. Nymeyer H, Socci N, Onuchic J. Proc Natl Acad Sci USA. 2000;97:634–639. doi: 10.1073/pnas.97.2.634.
- 11. Onuchic J, Nymeyer H, García A, Chahine J, Socci N. Adv Protein Chem. 2000;53:87–152. doi: 10.1016/s0065-3233(00)53003-4.
- 12. Shea J, Onuchic J, Brooks C., III Proc Natl Acad Sci USA. 1999;96:12512–12517. doi: 10.1073/pnas.96.22.12512.
- 13. Clementi, C., Nymeyer, H. & Onuchic, J. (2000) J. Mol. Biol., in press.
- 14. Plaxco K W, Simons K T, Baker D. J Mol Biol. 1998;277:985–994. doi: 10.1006/jmbi.1998.1645.
- 15. Alm E, Baker D. Proc Natl Acad Sci USA. 1999;96:11305–11310. doi: 10.1073/pnas.96.20.11305.
- 16. Muñoz V, Eaton W A. Proc Natl Acad Sci USA. 1999;96:11311–11316. doi: 10.1073/pnas.96.20.11311.
- 17. Galzitskaya O V, Finkelstein A V. Proc Natl Acad Sci USA. 1999;96:11299–11304. doi: 10.1073/pnas.96.20.11299.
- 18. Micheletti C, Banavar J R, Maritan A, Seno F. Phys Rev Lett. 1999;82:3372–3375.
- 19. Ueda Y, Taketomi H, Gō N. Int J Peptide Res. 1975;7:445–459.
- 20. Ueda Y, Taketomi H, Gō N. Biopolymers. 1978;17:1531–1548.
- 21. Sobolev V, Wade R, Vriend G, Edelman M. Proteins. 1996;25:120–129. doi: 10.1002/(SICI)1097-0134(199605)25:1<120::AID-PROT10>3.0.CO;2-M.
- 22. Swendsen R H. Physica A. 1993;194:53–62.
- 23. Onuchic J, Socci N, Luthey-Schulten Z, Wolynes P. Folding Des. 1996;1:441–450. doi: 10.1016/S1359-0278(96)00060-0.
- 24. Jennings P A, Finn B E, Jones B E, Matthews C R. Biochemistry. 1993;32:3783–3789. doi: 10.1021/bi00065a034.
- 25. Jones B, Matthews C. Protein Sci. 1995;4:167–177. doi: 10.1002/pro.5560040204.
- 26. Jones B, Jennings P, Pierre R, Matthews C. Biochemistry. 1994;33:15250–15258. doi: 10.1021/bi00255a005.
- 27. Kuwajima K, Garvey E, Finn B, Matthews C, Sugai S. Biochemistry. 1991;30:7693–7703. doi: 10.1021/bi00245a005.
- 28. Jones B, Beechem J, Matthews C. Biochemistry. 1995;34:1867–1877. doi: 10.1021/bi00006a007.
- 29. Heidary D K, O'Neill J C, Jr, Roy M, Jennings P A. Proc Natl Acad Sci USA. 2000;97:5866–5870. doi: 10.1073/pnas.100547697.
- 30. Heidary D, Gross L, Roy M, Jennings P. Nat Struct Biol. 1997;4:725–731. doi: 10.1038/nsb0997-725.
- 31. Varley P, Gronenborn A M, Christensen H, Wingfield P T, Pain R H, Clore G M. Science. 1993;260:1110–1113. doi: 10.1126/science.8493553.
- 32. Plotkin, S. & Onuchic, J. (2000) Proc. Natl. Acad. Sci. USA, in press.