Abstract
We demonstrate the ability to determine a protein’s folding pathway and structure simultaneously using a properly formulated model without prior knowledge of the native structure. Our model employs a natural coordinate system for describing proteins and a search strategy inspired by the observation that real proteins fold in a sequential fashion by incrementally stabilizing native-like substructures or "foldons". Comparable folding pathways and structures are obtained for the twelve proteins recently studied using atomistic molecular dynamics simulations [K. Lindorff-Larsen, S. Piana, R.O. Dror, D. E. Shaw, Science 334, 517 (2011)], with our calculations running several orders of magnitude faster. We find that native-like propensities in the unfolded state do not necessarily determine the order of structure formation, a departure from a major conclusion of the MD study. Instead, our results support a more expansive view wherein intrinsic local structural propensities may be enhanced or overridden in the folding process by environmental context. The success of our search strategy validates it as an expedient mechanism for folding both in silico and in vivo.
The discovery that a protein’s structure is determined by its amino acid sequence has motivated efforts to replicate the folding process in silico. A successful algorithm for describing folding should enable predicting both the pathway and structure, two intertwined issues that generally have been treated separately. All-atom molecular dynamics (MD) simulations can address both issues simultaneously as demonstrated by a recent success in folding a dozen small proteins 1. Although remarkable, the simulations require very specialized hardware and extensive amounts of computing time. Our goal is to develop an alternate approach that identifies basic folding principles and then integrates them into a rapid, accurate, and physically revealing algorithm.
Our algorithm, termed TerItFix, is motivated by the manner in which real proteins fold. Growing evidence suggests that proteins fold along a limited number of low-energy pathways 2–8, with the order of events guided by a process termed sequential stabilization (SS). Here, nascent native-like substructures serve as templates for the formation of additional structure through the stepwise addition of cooperative folding subunits or “foldons” 9–15. We explicitly implement SS by using the information gleaned from earlier rounds of folding simulations to guide folding in subsequent rounds. The biasing is intended to assist the polypeptide up and over the major free energy barrier between the unfolded and native states in a manner that replicates the authentic folding process 16, 17.
Our initial folding round involves ~500 separate Monte Carlo simulated annealing (MCSA) trajectories that begin from a realistic denatured state ensemble (DSE)18, rather than from a state containing, for example, biases from homology-based secondary structure predictions. The best 25% (lowest energy) structures are used to identify the preferred local and nonlocal interactions for each residue in the form of a consensus secondary structure and average inter-residue contacts and hydrogen bonds. This information from a given round is used in the next round of ~500 trajectories to restrict the sampling of backbone (φ,ψ) dihedral angles and energetically bias the formation of the tertiary contacts and hydrogen bonds. The iterative process incrementally generates additional secondary and tertiary structure and hydrogen bonds as the rounds proceed, producing a series of events that may correspond to the genuine folding pathway.
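The selection and consensus step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TerItFix implementation: the `Trajectory` record, the function names, the representation of contacts as residue-index pairs, and the linear form of the bias energy are all hypothetical. Only the use of the lowest-energy 25% of structures and the idea of averaging contacts over them come from the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Final state of one MCSA trajectory (hypothetical record)."""
    energy: float
    contacts: frozenset  # (i, j) residue-index pairs in contact, with i < j


def select_lowest_energy(trajectories, fraction=0.25):
    """Return the lowest-energy fraction of the trajectories (at least one)."""
    n_keep = max(1, int(len(trajectories) * fraction))
    return sorted(trajectories, key=lambda t: t.energy)[:n_keep]


def contact_frequencies(selected):
    """Fraction of the selected structures in which each contact appears."""
    counts = Counter()
    for traj in selected:
        counts.update(traj.contacts)
    return {pair: n / len(selected) for pair, n in counts.items()}


def contact_bias_energy(contacts, frequencies, weight=1.0):
    """Energy bonus for forming contacts seen in the previous round.

    The linear form and the `weight` parameter are assumptions made for
    illustration; contacts are favored (biased), never enforced.
    """
    return -weight * sum(frequencies.get(pair, 0.0) for pair in contacts)
```

In a next round, a term such as `contact_bias_energy` would be added to the energy of each trial conformation, so that contacts common among the previous round's best structures are favored without being imposed.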
We use a representation containing all backbone atoms plus the Cβ carbons, a move set involving smart (φ,ψ) dihedral angle distributions, and a combination of single (φ,ψ) pivots and local crankshaft moves17. Angles are selected from a PDB-based coil library, contingent on the chemical identity of the flanking residues. As secondary structure information is deduced from prior rounds, angle selection is correspondingly biased. Three energy functions capture the chemical properties of the different amino acids 16, 17, 19, 20. The first function includes a pairwise additive, distance, orientation, and secondary structure dependent statistical potential designed to promote the formation of chain topologies with hydrophobic cores. The other two statistical potentials are multi-body terms designed to capture the properties of side chain burial and hydrogen bonding.
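A single-pivot Monte Carlo step of the kind mentioned above might look like the sketch below. The text does not state the acceptance rule, so the standard Metropolis criterion is assumed here; the `library` argument is a hypothetical stand-in for the coil-library (φ,ψ) distributions, and crankshaft moves are omitted.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis rule (assumed, not taken from the source):
    always accept downhill moves; accept uphill moves with probability
    exp(-delta_e / temperature). `temperature` must be positive."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def pivot_move(angles, library, rng):
    """Return a copy of `angles` with one residue's (phi, psi) resampled.

    `angles` is a list of (phi, psi) tuples, one per residue.
    `library[i]` is the list of candidate (phi, psi) pairs for residue i,
    a hypothetical stand-in for the neighbor-dependent coil library.
    """
    i = rng.randrange(len(angles))
    trial = list(angles)
    trial[i] = rng.choice(library[i])
    return trial
```

Biasing the angle selection with information from prior rounds would then amount to changing the contents of `library[i]` for each residue, while the move and acceptance logic stay the same.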
Figure 1 displays the most native-like structures obtained from the TerItFix simulations for the twelve proteins studied by Lindorff-Larsen, et al.1 The calculations for each protein take around 600 CPU hours on an Intel 2.6 GHz "Sandy Bridge" Xeon E5-2670 processor. Using the same processor running NAMD, a single 10 μs MD trajectory would take around 3,000,000 CPU hours per protein. TerItFix produces an average root-mean-square deviation (RMSD) from the native structure of 2.96 ± 1.33 Å for the centroids of the largest clusters, compared to 2.07 ± 1.31 Å for the all-atom MD simulations. TerItFix generates centroids with lower RMSDs for half of the proteins. By a significant 1.5 Å margin, TerItFix’s worst result is for NTL9, whereas this protein produces MD’s best result (5.0 versus 0.5 Å for the cluster centroids). The crystal structure of NTL9 appears with an extra 12 residue helix which produces a helix-swapped dimer. Without this extra helix, the structure has an unusual, exposed hydrophobic face that probably contributes to TerItFix’s difficulty in obtaining a good prediction.
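For concreteness, the two cost estimates quoted in this paragraph (about 600 CPU hours for a TerItFix calculation and about 3,000,000 CPU hours for a single 10 μs MD trajectory on the same processor) imply the ratio below. This is simple arithmetic on the rounded figures in the text, not an additional benchmark.

```latex
\[
\frac{3{,}000{,}000\ \text{CPU h}}{600\ \text{CPU h}} = 5000,
\qquad
\log_{10} 5000 \approx 3.7
\]
```

On these estimates, the cost per protein differs by between three and four orders of magnitude.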
Unlike most free modeling algorithms for predicting structure, TerItFix does not use fragments or invoke any prior assumptions (or predictions from machine learning) about the protein’s secondary structure. An additional feature distinguishing TerItFix from prior approaches is the generation of a de facto folding pathway that is determined from the progressive appearance of structure in the multiple rounds of MCSA simulations. As described elsewhere, TerItFix is sensitive enough to identify the non-native interactions that lead to intermediates23 and non-native strand registry24, results consistent with experimental studies.
Calculations for the majority of the twelve fast-folding proteins converge within the first 2-3 rounds of TerItFix. The folding pathway is depicted for each residue by plotting the fraction of structures from the end of each round for which the residue lies within 2 Å of the native structure when the structures are aligned to the native structure by the TM-score, a global metric used to assess the quality of structures 25.
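The per-residue measure described above can be sketched as follows. The sketch assumes that every structure has already been superimposed on the native structure (the text uses the TM-score alignment for this step, which is not reproduced here) and that each residue is represented by a single atom position, for example Cα. Both the function name and the one-atom-per-residue choice are assumptions for illustration.

```python
import math


def native_fraction_per_residue(models, native, cutoff=2.0):
    """For each residue, the fraction of models in which that residue lies
    within `cutoff` angstroms of its native position.

    `native` and every entry of `models` are lists of (x, y, z) tuples,
    one per residue, with the models already aligned to the native.
    """
    n_res = len(native)
    counts = [0] * n_res
    for model in models:
        if len(model) != n_res:
            raise ValueError("model and native must have the same length")
        for i, (p, q) in enumerate(zip(model, native)):
            if math.dist(p, q) <= cutoff:
                counts[i] += 1
    return [c / len(models) for c in counts]
```

Evaluating this function on the structures from the end of each round, and plotting the result against residue number, gives one curve per round; regions whose fraction rises in early rounds are those that form structure first.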
The TerItFix (Figure 2) and all-atom MD pathways (Figure 3 in Ref. 1) exhibit a similar order of structure formation. The same regions of the sequence tend to develop structure early in both classes of simulations, further validating the ability of TerItFix to identify gross aspects of the folding pathways. Our definition of structure formation is based on a global alignment to the native structure, whereas the definition used by Lindorff-Larsen, et al. employs a local (five residue) metric of similarity to the native structure. Consequently, their depiction of the pathway is primarily sensitive to secondary rather than tertiary structure formation. The two methods also yield similar pathways when analyzed with the same local metric (Fig. S1).
The order in which residues become native-like in the all-atom MD simulations correlates with their propensity to form local native-like secondary structure in the DSE 1. The DSE of the MD simulations contains a high, and potentially excessive, amount of secondary structure in an overly collapsed DSE 7, 26, 27, particularly for λ repressor where denaturation is accompanied by the unfolding of the helices according to multiple methods 17, 28–31. In contrast, TerItFix simulations begin from an unstructured DSE that reproduces the experimentally observed NMR residual dipolar coupling patterns and dimensions of expanded chains in the DSEs 18. Figure S2 provides a sample of five random initial structures from TerItFix’s DSE for each of the twelve proteins. This difference likely accounts for the early portions of the pathways produced by TerItFix having less helical structure, especially for λ repressor, Protein B and villin.
In spite of this disparity, the similarities between the methods are notable. The collapsed environment in the all-atom DSE present in the MD simulations likely promotes secondary structure in a manner similar to what TerItFix produces as a consequence of templating onto existing structure as the temperature is lowered in the MCSA. The folding behavior is guided by similar environmental cues in both methods, a feature that may account for the general agreement between the two methods.
Our finding that TerItFix produces similar folding pathways despite starting with a much less structured DSE cautions against overemphasizing the importance of the unfolded state propensities in determining folding pathways. TerItFix’s success provides support to a more expansive view where intrinsic local structural propensities (which often favor non-native polyproline II conformations18, 32) may be overridden by environmental context as stable motifs sequentially interact and stabilize the formation of additional structure in an incremental fashion. For example, an otherwise unstable amphipathic helix or hairpin is stabilized in the presence of hydrophobic surfaces, whether they arise semi-randomly or through specific interactions. We believe this view more accurately reflects folding behavior 33 than one that emphasizes the formation of local native-like structure in the DSE.
The TerItFix algorithm contains all six of the physical interactions recently proposed by Dill 34 as necessary for a successful algorithm, namely, hydrogen bonds, van der Waals interactions, backbone angle preferences, electrostatic interactions, hydrophobic interactions and chain entropy. In addition, we include a backbone desolvation term to reflect the observation that buried hydrogen bond donors and acceptors essentially always form hydrogen bonds in native structures 35. Accordingly, the term we introduce to recognize this seventh feature penalizes buried amide nitrogens and carbonyl oxygens with unsatisfied hydrogen bonds. This burial term also serves to inhibit an unphysical, early, non-specific hydrophobic collapse 26, 27, 36.
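The logic of the backbone desolvation term can be illustrated with the sketch below. It is not the functional form used in TerItFix: the `PolarAtom` record, the neighbor-count definition of burial, the threshold, and the constant penalty are all placeholder assumptions. Only the rule itself, that buried amide nitrogens and carbonyl oxygens without a hydrogen bond are penalized, comes from the text.

```python
from dataclasses import dataclass


@dataclass
class PolarAtom:
    """Backbone amide N or carbonyl O (hypothetical record)."""
    residue: int
    kind: str            # "N" or "O"
    neighbor_count: int  # nearby heavy atoms; a crude, assumed burial measure
    h_bonded: bool       # True if the atom participates in a hydrogen bond


def desolvation_penalty(atoms, burial_threshold=20, penalty=1.0):
    """Sum a fixed penalty over buried polar atoms lacking a hydrogen bond.

    `burial_threshold` and `penalty` are placeholder values chosen only
    for illustration.
    """
    total = 0.0
    for atom in atoms:
        buried = atom.neighbor_count >= burial_threshold
        if buried and not atom.h_bonded:
            total += penalty
    return total
```

Because exposed atoms and hydrogen-bonded atoms contribute nothing, a term of this shape discourages compaction that buries backbone polar groups before they have found hydrogen-bonding partners.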
However, these seven energetic terms alone are woefully insufficient to locate the “needle in a haystack” native structure within the vastness of conformational space. Early folding steps in the cooperative folding of proteins are uphill in free energy, and even productive conformations unfold faster than they form on the reactant side of the kinetic barrier. Hence, an explicit search strategy is essential to guide the uphill exploration through conformational space. Accordingly, we reinforce the process of sequential stabilization by biasing (rather than enforcing) the backbone sampling, hydrogen bonding and tertiary contacts in an iterative manner intended to mimic how real proteins traverse the energy surface on the route(s) to the native state. The success of this approach provides strong support for the contention that sequential stabilization provides an expedient mechanism for folding proteins both in vitro and in silico. The natural process of stepwise assembly in protein folding might also explain successes of previous protein modeling methods based on a combination of building blocks37 and evolutionary algorithms to increase native content38.
We produce encouraging results for both native structures and folding pathways for a variety of proteins without utilizing homology-based information. The overall simplicity of the Cβ-level model (lacking side chain rotameric states) decreases computational requirements by orders of magnitude and provides a direct route to apply and validate our understanding of the fundamental principles relevant to protein folding, principles also expected to be crucial for predicting protein recognition and conformational changes.
Acknowledgments
We thank the Freed/Sosnick groups and S. Piana for helpful discussions and sharing of results, M. Wilde (ANL) for computing assistance, and the ANL/UC CI for computing resources. This work was supported by grants from the NIH (GM55694) (computational work and comparisons with experiment) and the U.S. Department of Energy, Office of Basic Energy Sciences, Division of Materials Sciences and Engineering under Award DE-SC0008631 (theory and algorithm development). Computations were also performed on the Midway and Beagle clusters at the University of Chicago.
Footnotes
Supplementary Materials:
Materials and Methods
Figure S1
Figure S2
References
- 1. Lindorff-Larsen K, Piana S, Dror RO, et al. Science. 2011;334:517. doi: 10.1126/science.1208351.
- 2. Martinez JC, Serrano L. Nat. Struct. Biol. 1999;6:1010. doi: 10.1038/14896.
- 3. Baxa M, Freed KF, Sosnick TR. J. Mol. Biol. 2008;381:1362. doi: 10.1016/j.jmb.2008.06.067.
- 4. Viguera AR, Serrano L. Nat. Struct. Biol. 1997;4:939. doi: 10.1038/nsb1197-939.
- 5. Grantcharova VP, Riddle DS, Baker D. Proc. Natl. Acad. Sci. U S A. 2000;97:7084. doi: 10.1073/pnas.97.13.7084.
- 6. Shandiz AT, Baxa MC, Sosnick TR. Protein Sci. 2012;21:819. doi: 10.1002/pro.2065.
- 7. Sosnick TR, Barrick D. Curr. Opin. Struct. Biol. 2011;21:12. doi: 10.1016/j.sbi.2010.11.002.
- 8. Sosnick TR, Hinshaw JR. Science. 2011;334:464. doi: 10.1126/science.1214018.
- 9. Chamberlain AK, Handel TM, Marqusee S. Nat. Struct. Biol. 1996;3:782. doi: 10.1038/nsb0996-782.
- 10. Feng H, Vu ND, Bai Y. J. Mol. Biol. 2004;343:1477. doi: 10.1016/j.jmb.2004.08.099.
- 11. Bedard S, Mayne LC, Peterson RW, et al. J. Mol. Biol. 2008;376:1142. doi: 10.1016/j.jmb.2007.12.020.
- 12. Zheng Z, Sosnick TR. J. Mol. Biol. 2010;397:777. doi: 10.1016/j.jmb.2010.01.056.
- 13. Maity H, Maity M, Krishna MM, et al. Proc. Natl. Acad. Sci. U S A. 2005;102:4741. doi: 10.1073/pnas.0501043102.
- 14. Bai Y, Sosnick TR, Mayne L, et al. Science. 1995;269:192. doi: 10.1126/science.7618079.
- 15. Krantz BA, Dothager RS, Sosnick TR. J. Mol. Biol. 2004;337:463. doi: 10.1016/j.jmb.2004.01.018.
- 16. DeBartolo J, Colubri A, Jha AK, et al. Proc. Natl. Acad. Sci. U S A. 2009;106:3734. doi: 10.1073/pnas.0811363106.
- 17. Adhikari A, Freed KF, Sosnick TR. Proc. Natl. Acad. Sci. U S A. 2012;109:17442. doi: 10.1073/pnas.1209000109.
- 18. Jha AK, Colubri A, Freed KF, et al. Proc. Natl. Acad. Sci. U S A. 2005;102:13099. doi: 10.1073/pnas.0506078102.
- 19. Fitzgerald JE, Jha AK, Colubri A, et al. Protein Sci. 2007;16:2123. doi: 10.1110/ps.072939707.
- 20. Adhikari AN, Peng J, Wilde M, et al. Protein Sci. 2012;21:107. doi: 10.1002/pro.767.
- 21. Adhikari AN, Freed KF, Sosnick TR. Proc. Natl. Acad. Sci. U S A. 2012. doi: 10.1073/pnas.1209000109. Accepted.
- 22. Jha AK, Colubri A, Freed KF, et al. Proc. Natl. Acad. Sci. U S A. 2005;102:13099. doi: 10.1073/pnas.0506078102.
- 23. Morton VL, Friel CT, Allen LR, et al. J. Mol. Biol. 2007;371:554. doi: 10.1016/j.jmb.2007.05.010.
- 24. Yoo TY, Adhikari A, Xia Z, et al. J. Mol. Biol. 2012;420:220. doi: 10.1016/j.jmb.2012.04.013.
- 25. Zhang Y, Skolnick J. Proteins. 2004;57:702. doi: 10.1002/prot.20264.
- 26. Jacob J, Krantz B, Dothager RS, et al. J. Mol. Biol. 2004;338:369. doi: 10.1016/j.jmb.2004.02.065.
- 27. Yoo TY, Meisburger SP, Hinshaw J, et al. J. Mol. Biol. 2012;418:226. doi: 10.1016/j.jmb.2012.01.016.
- 28. Chugha P, Sage HJ, Oas TG. Protein Sci. 2006;15:533. doi: 10.1110/ps.051856406.
- 29. Krantz BA, Srivastava AK, Nauli S, et al. Nat. Struct. Biol. 2002;9:458. doi: 10.1038/nsb794.
- 30. Huang GS, Oas TG. Biochemistry. 1995;34:3884. doi: 10.1021/bi00012a003.
- 31. Burton RE, Huang GS, Daugherty MA, et al. Nat. Struct. Biol. 1997;4:305. doi: 10.1038/nsb0497-305.
- 32. Jha AK, Colubri A, Zaman MH, et al. Biochemistry. 2005;44:9691. doi: 10.1021/bi0474822.
- 33. Meisner WK, Sosnick TR. Proc. Natl. Acad. Sci. U S A. 2004;101:13478. doi: 10.1073/pnas.0404057101.
- 34. Dill KA, MacCallum JL. Science. 2012;338:1042. doi: 10.1126/science.1219021.
- 35. Fleming PJ, Rose GD. Protein Sci. 2005;14:1911. doi: 10.1110/ps.051454805.
- 36. Sosnick TR, Mayne L, Englander SW. Proteins. 1996;24:413. doi: 10.1002/(SICI)1097-0134(199604)24:4<413::AID-PROT1>3.0.CO;2-F.
- 37. Azoitei ML, Correia BE, Ban YE, et al. Science. 2011;334:373. doi: 10.1126/science.1209368.
- 38. Schug A, Wenzel W. Biophys. J. 2006;90:4273. doi: 10.1529/biophysj.105.070409.