The dark mystery of protein folding has been greatly illuminated by the last decade's work. This enlightenment has come from the simultaneous flowering of new theoretical approaches (1–3), powerful computational studies (4, 5), and a fresh wave of experiments using protein engineering (6) and ultrafast kinetics (7, 8). How a disordered chain molecule finds its organized, folded state is no longer a paradox, but a scientific problem requiring a close comparison of theories, computation and experiments. A beautiful experimental study of the mechanism of folding of the B domain of streptococcal protein A appears in this issue of PNAS (9). This study joins such landmarks as the elucidation of the folding mechanism of CI2 and barnase by the same group (10). So what's the news? The excitement comes because this protein has been the poster child of computational folders. It is attractive to computationalists because of its small size, 60 residues, at the extreme limit exhibiting cooperative protein-like thermodynamics. The protein also folds fast, so computer studies have the best chance of success. Numerous simulations were carried out (11–23) without extensive laboratory kinetic studies at the residue level to influence the work. It is thus interesting to see how well the computationalists have been able to predict the folding mechanism without much biasing data.
Sato et al. (9) liken the comparison of simulations with their kinetic experiment to the biennial exercise Critical Assessment of Structure Prediction (CASP) (24). They, like most participants in CASP, refer to it as a competition, although the organizers decry that appellation. As the leader of a participating group, I can testify to CASP's painfulness, but, despite its undignified similarity to a sporting event, good science comes from CASP. Likewise, the present results do indeed provoke serious thought.
When compared with the old view of a single obligate folding pathway (25), the computational studies are in remarkable agreement with each other and with experiment. All these studies show a diffuse folding mechanism in which a fairly diverse ensemble of structures is guided to the native structure by trading stabilization free energy for configurational entropy (1). The energy landscape resembles a funnel in which more native-like structures are stabilized over kinetic traps that could arise from conflicting energetic signals (“frustration”). In a funnel landscape, although trapping is not a problem, a kinetic bottleneck arises because stabilization energy and entropy do not smoothly compensate each other as the protein descends in the funnel. The ensemble of structures at the bottleneck, called the transition state ensemble (TSE), is probed by examining the effect of changing individual amino acids on folding rates. In a funnel-like landscape, these changes can be converted to φ values that measure whether structures in the TSE more closely resemble the native structure or the denatured ensemble. If φ = 1 for a given residue, in the TSE that residue will be in a native-like conformation and environment, whereas if φ = 0, the residue will be configured in a range of structures like the denatured state (6). The fractional φ values attest to the ensemble nature of the TSE and semiquantitatively reflect the fraction of the structure formed at any site (26). Ideally, these φ values can be explicitly predicted and then compared point by point with experiment. Unfortunately, none of the simulations were described in such detail, and the comparison with the laboratory results must be made holistically. The resulting comparison is cruder than what is possible for structure prediction.
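The conversion from rate changes to φ values can be sketched in a few lines. The sketch below assumes the standard protein-engineering definition (as in ref. 6), φ = ΔΔG(‡–D)/ΔΔG(N–D), with ΔΔG(‡–D) = RT ln(k_wt/k_mut) obtained from the wild-type and mutant folding rates. The function names, the default temperature, and the rate constants in the usage lines are illustrative assumptions, not values taken from Sato et al. (9).

```python
import math

# Gas constant in kcal/(mol*K); energies below are therefore in kcal/mol.
R_KCAL = 1.987e-3


def ddg_activation(k_wt: float, k_mut: float, temp_k: float = 298.15) -> float:
    """Change in folding activation free energy on mutation, RT ln(k_wt / k_mut).

    Positive when the mutation slows folding.
    """
    if k_wt <= 0 or k_mut <= 0:
        raise ValueError("rate constants must be positive")
    return R_KCAL * temp_k * math.log(k_wt / k_mut)


def phi_value(k_wt: float, k_mut: float, ddg_stability: float,
              temp_k: float = 298.15) -> float:
    """phi = ddG(TS - D) / ddG(N - D).

    ddg_stability is the loss of native-state stability on mutation
    (kcal/mol, positive when the mutant is less stable).
    """
    if abs(ddg_stability) < 1e-9:
        raise ValueError("ddg_stability too small for a meaningful phi value")
    return ddg_activation(k_wt, k_mut, temp_k) / ddg_stability


# Hypothetical inputs (assumptions for illustration only).
k_wild_type = 1.0e5   # folding rate of wild type, 1/s
k_mutant = 4.0e4      # folding rate of a destabilized mutant, 1/s
ddg_n_d = 1.2         # measured loss of stability, kcal/mol

phi = phi_value(k_wild_type, k_mutant, ddg_n_d)
print(f"phi = {phi:.2f}")
```

A mutation that leaves the folding rate unchanged gives φ = 0, and one whose entire destabilization appears in the activation free energy gives φ = 1. Mutations with a very small ΔΔG(N–D) are rejected here because the ratio becomes unreliable.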
Globally Good Predictions
Overall, the degree of structure formation in the TSE found by the Fersht group has been well predicted. The average of the experimental φ values probing tertiary structure is ≈0.45, whereas the φ values probing secondary structure average ≈0.4, suggesting that only 40% of the native helical structure is intact. Near-UV CD suggests that up to 10% of the native tertiary structure may be present in the denatured state, so a maximum of 55% of the tertiary structure may be in the TSE. If the measured helical fraction of the denatured state (20%) is all native, then perhaps a maximum of 60% of the native secondary structure is found in the TSE. The experimental TSE appears to be less structured than early theoretical estimates for funnel landscapes of a generic helical protein. Making no use of specific structure or sequence information, by mapping the behavior of small lattice simulations to real proteins, Onuchic et al. (OWLS) (27) developed a “spherical cow model” suggesting that the TSE of a small helical protein would be collapsed and have 60% of the native contacts and nearly 80% of the native helicity. The earliest simulations of protein A were aimed at benchmarking structure prediction (11) and did not quantitatively describe the TSE. The pioneering simulation that made use of the tools of energy landscape theory to monitor the TSE was done by Brooks and colleagues (12, 13) on an all-atom model. They concluded that in this protein 30% of the native tertiary contacts and only 50–70% of the secondary structure are formed in the TSE; both percentages are lower than the generic funnel estimates. The experimental TSE appears to be more structured at the tertiary level but a bit less structured at the secondary level. The most recent all-atom simulations of Garcia and Onuchic (14) yielded a TSE with ≈45% of the tertiary contacts made and 40% of the secondary structure formed, in close agreement with experiment.
Another recent all-atom simulation yielded a TSE more like the generic OWLS prediction, with 63% of native contacts made and ≈80% of the secondary structure formed (15).
An innovative, large time-step pathsampling study (19) agrees with these studies in exhibiting concomitant tertiary and secondary structure formation, as does a simulation in which simplified, all-atom potentials tuned to fold protein A were used (20).
Several all-atom studies make use of a fundamental energy landscape limit: a perfect funnel landscape in which only native contacts are counted. One such study by Linhananta and Zhou (17) actually suggests the existence of a high free-energy intermediate ensemble with two TSEs. In the transition region, this study predicted that 50% of the tertiary contacts would be made (very close to experiment) and >60% of the secondary structure would be formed, near the upper experimental limit.
Using a minimalist model encoding hydrophobicity and hydrogen bonding, Favrin et al. (22) found a TSE with significant tertiary structure, as in the all-atom simulations. They also suggested secondary structure forms only after the molecule collapses. A minimalist model with a perfect funnel (21) concurred and also highlighted the significance of forming contacts in the hinge regions (28).
At the grossest level, most simulations agree with experiment in the existence of strong coupling of secondary and tertiary structure, both only partly formed in the TSE. Thus, for this protein, the framework hypothesis (25) that secondary structure drives folding and occurs first can be laid to rest. Its offspring, the diffusion–collision picture (23), has many adjustable parameters and is more adaptive and less easily falsified. Nevertheless, it is not easy to reconcile these results for the TSE with the diffusion–collision expectations: the experimental TSE is delocalized with both partial tertiary structure and frayed secondary structure formation and an important role for a turn. Also, the TSE has 85% of its surface area buried, not the largely free-floating chains envisioned in the diffusion–collision picture.
Devilish Details
At a finer level, the simulation TSEs fall into two classes, and only some agree with experiment. Brooks and colleagues (12, 13) and Garcia and Onuchic (14) suggest that helices H1 and H2 are mostly involved in the structures formed in the TSE, in close agreement with experimental findings. Other groups (15–23) highlight the role of helix H3, which has been found experimentally to be the most stable in isolation. The dominant TSE configurations in the latter simulations involved contacts between H2 and H3. Certainly, there is little reason for alarm. All of the simulations show that trajectories from both families should be sampled, as expected for a funnel landscape. A rather small error in the free energies favoring either set of routes could shift the balance. Indeed, the approximate symmetry of the molecule makes predicting this balance tricky. Experiments on the other homologous domains of protein A or different experimental conditions favoring collapse [as in the salt-induced detour (29)] could reveal these alternate routes. Theorists studying minimalist models have emphasized how mechanism may shift throughout the folding-phase diagram (16, 30, 31).
The detailed shortcomings of the simulations, however, may hold some deeper lessons. Something strange seems to be going on with helix H3. Despite the high stability measured for this helix as an isolated fragment (32), Deisenhofer's early x-ray structure failed to reveal electron density for it when the domain was bound to its target immunoglobulin (33). NMR of the isolated domain shows helix H3 as intact, but NMR also reveals modest differences in the packing of H1 and H2 in the homologous Z domain of protein A (34, 35). These can be taken to be signs of frustration in the domain: there is a modest inconsistency of energetic biases for secondary and tertiary structure. Several of the simulations note that, in contrast to H1 and H2, the stability of H3 is not significantly increased by making native contacts. Apparently this domain did not evolve under sufficient pressure from folding theorists!
It is possible that evolution “deliberately” introduced some frustration in the folding of the isolated domain. Protein A is a virulence factor: it functions by binding to many antibodies in the host. Achieving this binding may require the domain to have considerable flexibility. Biotechnologists (36), trying to make convenient fish hooks for protein purification, have created another homologue, known as Z-SPA-1, which folds upon binding to the Z domain. There is significant rearrangement at the binding interface, referred to as “induced fit,” largely involving side-chain reorientation (37).
More experimental work is needed to see whether significant frustration is actually present in the B domain. Is the isolated helix H3 as structured as we are led to believe? Is this helix as weakly stable in the intact molecule as simulations suggest? Nevertheless, I am tempted to suggest that protein A is frustrated, somewhat, because it evolved to do more than just fold, which is why the protein can still be judged to win this round.
See companion article on page 6952.
References
- 1. Bryngelson, J. D., Onuchic, J., Socci, N. & Wolynes, P. G. (1995) Proteins 21, 167-195.
- 2. Plotkin, S. & Onuchic, J. N. (2002) Q. Rev. Biophys. 35, 205-286.
- 3. Pande, V. S., Grosberg, A. & Tanaka, T. (2000) Rev. Mod. Phys. 72, 259-314.
- 4. Dobson, C. M., Sali, A. & Karplus, M. (1998) Angew. Chem. Int. Ed. Engl. 37, 868-893.
- 5. Shea, J. E. & Brooks, C. (2001) Annu. Rev. Phys. Chem. 52, 499-535.
- 6. Fersht, A. (1999) Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding (Freeman & Company, New York).
- 7. Eaton, W. A., Munoz, V., Hagen, S. J., Jas, G. S., Lapidus, L. J., Henry, E. R. & Hofrichter, J. (2000) Annu. Rev. Biophys. Biomol. Struct. 29, 327-359.
- 8. Snow, C. D., Nguyen, N., Pande, V. S. & Gruebele, M. (2002) Nature 420, 102-106.
- 9. Sato, S., Religa, T. L., Daggett, V. & Fersht, A. R. (2004) Proc. Natl. Acad. Sci. USA 101, 6952-6956.
- 10. Daggett, V. & Fersht, A. (2003) Nat. Rev. Mol. Cell Biol. 4, 497-502.
- 11. Kolinski, A. & Skolnick, J. (1994) Proteins 18, 353-366.
- 12. Boczko, E. M. & Brooks, C. L. (1995) Science 269, 393-396.
- 13. Guo, Z., Brooks, C. L. & Boczko, E. M. (1997) Proc. Natl. Acad. Sci. USA 94, 10161-10166.
- 14. Garcia, A. E. & Onuchic, J. N. (2003) Proc. Natl. Acad. Sci. USA 100, 13898-13903.
- 15. Alonso, D. O. & Daggett, V. (2000) Proc. Natl. Acad. Sci. USA 97, 133-138.
- 16. Zhou, Y. & Karplus, M. (1999) Nature 401, 400-403.
- 17. Linhananta, A. & Zhou, Y. (2002) J. Chem. Phys. 117, 8983-8995.
- 18. Jang, S., Kim, E., Shin, S. & Pak, Y. (2003) J. Am. Chem. Soc. 125, 14841-14846.
- 19. Ghosh, A., Elber, R. & Scheraga, H. A. (2002) Proc. Natl. Acad. Sci. USA 99, 10394-10398.
- 20. Kussell, E., Shimada, J. & Shakhnovich, E. I. (2002) Proc. Natl. Acad. Sci. USA 99, 5343-5348.
- 21. Berriz, G. F. & Shakhnovich, E. I. (2001) J. Mol. Biol. 310, 673-685.
- 22. Favrin, G., Irback, A. & Wallin, S. (2002) Proteins 47, 99-105.
- 23. Islam, S. A., Karplus, M. & Weaver, D. (2002) J. Mol. Biol. 318, 199-215.
- 24. Moult, J., Fidelis, K., Zemla, A. & Hubbard, T. (2003) Proteins 53, 334-339.
- 25. Kim, P. S. & Baldwin, R. L. (1982) Annu. Rev. Biochem. 51, 459-489.
- 26. Onuchic, J. N., Socci, N. D., Luthey-Schulten, Z. & Wolynes, P. G. (1996) Folding Des. 1, 441-450.
- 27. Onuchic, J. N., Wolynes, P. G., Luthey-Schulten, Z. & Socci, N. D. (1995) Proc. Natl. Acad. Sci. USA 92, 3626-3630.
- 28. Saven, J. G. & Wolynes, P. G. (1996) J. Mol. Biol. 257, 199-216.
- 29. Otzen, D. E. & Oliveberg, M. (1999) Proc. Natl. Acad. Sci. USA 96, 11746-11751.
- 30. Socci, N. D., Onuchic, J. N. & Wolynes, P. G. (1998) Proteins 32, 136-158.
- 31. Hardin, C., Luthey-Schulten, Z. & Wolynes, P. G. (1999) Proteins 34, 281-294.
- 32. Bai, Y., Karimi, A., Dyson, H. J. & Wright, P. (1997) Protein Sci. 6, 1449-1457.
- 33. Deisenhofer, J. (1981) Biochemistry 20, 2361-2370.
- 34. Tashiro, M., Tejero, R., Zimmerman, D. E., Celda, B., Nilsson, B. & Montelione, G. T. (1997) J. Mol. Biol. 272, 573-590.
- 35. Gouda, H., Torigoe, H., Saito, A., Sato, M., Arata, Y. & Shimada, I. (1992) Biochemistry 31, 9665-9672.
- 36. Nilsson, B., Moks, T., Jansson, B., Abrahamsen, L., Elmbled, A., Holmgren, E., Henrickson, C., Jones, J. A. & Uhlen, M. (1987) Protein Eng. 1, 107-113.
- 37. Wahlberg, E., Lendel, C., Helgstrand, M., Allard, P., Dinchas-Renquist, V., Hedquist, A., Berglund, H., Nygren, P.-A. & Hard, T. (2003) Proc. Natl. Acad. Sci. USA 100, 3185-3190.