Proceedings of the National Academy of Sciences of the United States of America. 2010 Nov 1;107(46):19867–19872. doi: 10.1073/pnas.1006428107

Recovering physical potentials from a model protein databank

J W Mullinax, W G Noid
PMCID: PMC2993375  PMID: 21041685

Abstract

Knowledge-based approaches frequently employ empirical relations to determine effective potentials for coarse-grained protein models directly from protein databank structures. Although these approaches have enjoyed considerable success and widespread popularity in computational protein science, their fundamental basis has been widely questioned. It is well established that conventional knowledge-based approaches do not correctly treat many-body correlations between amino acids. Moreover, the physical significance of potentials determined by using structural statistics from different proteins has remained obscure. In the present work, we address both of these concerns by introducing and demonstrating a theory for calculating transferable potentials directly from a databank of protein structures. This approach assumes that the databank structures correspond to representative configurations sampled from equilibrium solution ensembles for different proteins. Given this assumption, this physics-based theory exactly treats many-body structural correlations and directly determines the transferable potentials that provide a variationally optimized approximation to the free energy landscape for each protein. We illustrate this approach by first constructing a databank of protein structures using a model potential and then quantitatively recovering this potential from the structure databank. The proposed framework will clarify the assumptions and physical significance of knowledge-based potentials, allow for their systematic improvement, and provide new insight into many-body correlations and cooperativity in folded proteins.

Keywords: protein structure prediction, coarse-grained models, inverse problems, Yvon–Born–Green theory


Accurate potentials are essential for quantitative models of protein structure, dynamics, and function. Although atomistic force fields provide an accurate description of protein structure and fluctuations on nanosecond time scales (1), atomically detailed models remain prohibitively expensive for investigating processes that evolve on microsecond time scales or longer. In contrast, low-resolution coarse-grained (CG) models provide a highly efficient alternative for characterizing protein dynamics on time scales that are inaccessible to atomistic models (2). Indeed, since the seminal work of Levitt and Warshel (3), CG protein models have provided a powerful tool for protein structure prediction (4), for studying protein self-assembly (5), for characterizing folding dynamics (6–8), and for investigating functional fluctuations (9). Consequently, the development of transferable CG potentials that accurately model protein structure would represent a significant advance for many areas of computational protein science.

Following the pioneering work of Tanaka and Scheraga (10), many investigators (11–15) have employed the structural correlations observed within the Protein Data Bank (PDB) (16) to determine effective “knowledge-based” statistical potentials (KBPs). In particular, quasi-chemical (17) and Boltzmann-motivated (18) approaches have played a central role in a vast array of computational studies. These approaches frequently employ empirical relations to determine interaction potentials as a correction to a reference potential by comparing the structural correlations observed in PDB structures to those that would be expected for the reference potential (12), although the identity of the appropriate reference potential has remained somewhat unclear (19, 20). These KBPs have been employed in successful protein design methods (21), in simulations of protein folding (22, 23), binding (24), and aggregation (25), in protein structure prediction (15), and in fold recognition (14).

Nevertheless, despite their considerable success and widespread popularity, knowledge-based approaches have been sharply criticized (26, 27), and their fundamental basis and physical significance have remained controversial (28, 29). The two most common criticisms of knowledge-based approaches address (i) the approximate treatment of many-body correlations and, in particular, chain connectivity (26, 27, 30–33) and (ii) the treatment of structural statistics compiled from different proteins (27). It is well known that, for any condensed phase system, simple excluded volume (packing) effects generate nontrivial many-body correlations that complicate the relationship between interparticle interactions and structural correlations (34). However, quasi-chemical and Boltzmann-based approaches only approximately address these many-body correlations. Thomas and Dill dramatically demonstrated the shortcomings of this approximation by constructing a model PDB of two-dimensional lattice proteins and demonstrating that knowledge-based approaches did not recover the potential used to generate the model PDB (26).

The second common criticism of knowledge-based approaches addresses the treatment of structural statistics compiled from the PDB (26, 27). Conventional statistical thermodynamics relates a potential to the structural fluctuations sampled by a single system at equilibrium. However, knowledge-based methods consider structural statistics sampled from different proteins. Consequently, these statistics cannot be analyzed by using canonical statistical thermodynamics (27).

Very recently, we have proposed solutions to both of these challenges. In refs. 35 and 36, we have introduced a rigorous theory that exactly treats many-body correlations and quantitatively recovers interaction potentials directly from a collection of structures for a single protein. In ref. 37, we have introduced a formal theory for combining statistics from distinct systems to simultaneously optimize both the accuracy and transferability of CG potentials.

In the present work, we synthesize these advances in a coherent framework. Our central hypothesis is that the databank structures for each amino acid sequence approximate representative samples from an equilibrium solution ensemble for that protein. Given this assumption, this approach provides a first-principles variational method for quantitatively determining transferable physics-based potentials that accurately model multiple proteins directly (i.e., without iteration) from three-dimensional off-lattice structures for those proteins. The resulting effective nonbonded potentials are not pair potentials of mean force but instead incorporate both energetic and entropic effects associated with coarse graining so that, when combined, they optimally represent the free energy landscapes. Importantly, this framework does not assume any particular form for the protein potential (e.g., contact or pair additive potentials), does not require any particular resolution for the model, and can simultaneously determine a complete intramolecular potential for off-lattice simulations. The present work also validates this approach by demonstrating that, under appropriate conditions, this approach quantitatively recovers the underlying potentials from a model PDB.

Results

Model Databank.

A databank of structures was generated by using a simple CG protein model adapted from earlier models by Thirumalai and co-workers (38, 39) and Head-Gordon and co-workers (40, 41). Each amino acid was modeled with a single interaction site that was one of three types: hydrophobic (B), hydrophilic (L), or neutral (N). Hydrophobic sites interact with attractive Lennard-Jones potentials that stabilize the hydrophobic core, whereas the nonbonded interactions between other site pairs are completely repulsive. Bond, angle, and dihedral potentials maintain chain connectivity and stabilize secondary structures. This potential was chosen for convenience and does not reflect basic limitations in the present framework, which readily applies for more complex potentials.

The actual PDB contains well over 50,000 structures for many different proteins and protein complexes. Although it would be exceedingly difficult to use a model potential to generate a small number of realistic off-lattice equilibrium protein structures for each of thousands of different protein sequences, it is comparatively easy to generate a large number of protein structures for a small number of sequences. Consequently, our model PDB includes thousands of structures for five distinct sequences that fold to α, β, and mixed α/β structures. Fig. 1 presents the lowest energy structure and heat capacity for each sequence. The model PDB was constructed by sampling structures from independent canonical simulations of each sequence at a reduced temperature T = 0.291, which was below the folding temperature of each sequence.

Fig. 1.


Minimum energy structure and heat capacity for each protein sequence included in the model databank. A–E correspond to the following sequences, in which each number is a repeat count (originally a subscript) for the preceding bead or parenthesized group: (A) B9N3(LB)4N3B9N3(LB)5N3B10; (B) B9N3(LB)4N3B9N3(LB)5L; (C) ((L2BL2B2)2L2N3)3(L2BL2B2)2L2; (D) ((L2BL2B2)2L2N3)2(L2BL2B2)2L2; (E) (LB)4BN3B3LB5N3(L2B)2BL2BN(BL)3B2N3B2(BL)4. Molecular images were made with VMD (84).

Structural Correlations.

Fig. 2 presents structural distributions calculated for the model PDB. Fig. 2 A–C compares the intramolecular distributions for the model PDB (solid curves) with those expected of Boltzmann distributions for independent degrees of freedom (dashed curves). Whereas the bond stretch distributions follow simple Boltzmann distributions, the angle and dihedral distributions demonstrate coupling to other degrees of freedom. Fig. 2 D–F presents the pair (radial) distribution functions (12, 34) for the B-B, B-L, and L-L pairs calculated from the model PDB. The well-defined peaks in the B-B distribution correspond to the tight packing of amino acids in the hydrophobic core of each protein. In contrast, the B-L and L-L distributions demonstrate distinct long-range structure, indicating an effective long-range attraction between these sites. However, this apparent long-range attraction does not arise from direct B-L or L-L interactions, which are identical and purely repulsive. Rather, the effective attraction results from coupling to the other degrees of freedom. In particular, the chain connectivity and hydrophobic attractions drive the hydrophilic sites to the protein surface and generate the observed long-range correlations. Fig. 2 clearly demonstrates the importance of considering many-body correlations in calculating interaction potentials from a databank of structures. Direct Boltzmann inversion of the B-L and L-L pair distributions would not recover the original potentials but would instead determine distinct potentials with artificial long-range attraction.
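The failure mode of direct Boltzmann inversion can be made concrete with a minimal sketch, assuming the usual relation u(r) = −kB T ln g(r). The function name and the toy distribution below are hypothetical and are not taken from the model PDB; the sketch only shows that any region where g(r) exceeds 1 is mapped to an apparent attraction, regardless of the underlying interaction.

```python
import numpy as np

def boltzmann_invert(g, kT):
    """Return -kT * ln g(r) on a grid (direct Boltzmann inversion).

    Bins with g == 0 were never sampled, so they are reported as +inf.
    """
    g = np.asarray(g, dtype=float)
    u = np.full(g.shape, np.inf)
    sampled = g > 0.0
    u[sampled] = -kT * np.log(g[sampled])
    return u

# Toy (made-up) pair distribution: an excluded core below r = 1 and
# long-range structure with g(r) > 1, loosely mimicking the B-L and L-L curves.
r = np.linspace(0.5, 5.0, 10)
g = np.where(r < 1.0, 0.0, 1.0 + 0.3 * np.exp(-(r - 3.0) ** 2))

# kT is set to the reduced temperature quoted for the model PDB.
u = boltzmann_invert(g, kT=0.291)
```

In this toy case every sampled bin yields a negative (attractive) value, even though nothing in the construction specifies an attractive pair interaction.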

Fig. 2.


Distribution functions calculated from the model protein databank. A–C correspond to distributions for bond angles, turn dihedrals, and helix dihedrals, respectively. D–F correspond to pair (radial) distributions for B-B, B-L, and L-L pairs, respectively.

Calculated Potentials.

Transferable potentials were calculated from the model PDB by employing structural information to project the many-body mean force field for each sequence onto a “basis vector” for each term in the potential (35, 36). The potentials were not assumed to follow any particular functional form. Instead, the complete potential was determined by employing fine grids and calculating the potentials at each of 1,810 grid points. Fig. 3 compares the calculated potentials (dashed red curves) with the potentials employed in generating the model PDB (solid black curves). With the exception of the high-energy regions of the dihedral potential, which were not adequately sampled below the folding temperature, the calculation recovered the original potentials with quantitative accuracy. A very large number of configurations (10^5 configurations per sequence) was employed to determine the 1,810 parameters used to represent the potentials on a fine grid. However, the SI Appendix demonstrates that these potentials can be accurately determined from many fewer configurations if standard functional forms are used and that, if the bonded interactions are treated as a known reference state (42), then only 5,000 total configurations are necessary to calculate the nonbonded potentials.
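The projection onto a grid basis can be illustrated with a minimal sketch. Note the substitution: the calculation described above evaluates the projection from structural information alone (35, 36), whereas this toy example uses explicit forces and ordinary least squares, which is only the generic projection step. The force law, sample count, and grid are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_force(r):
    # Hypothetical pair force used only to generate toy data
    # (harmonic spring about r = 1); it is not the model PDB potential.
    return -2.0 * (r - 1.0)

# Toy "configurations": one pair distance and one force per sample.
r = rng.uniform(0.5, 1.5, size=20000)
f = true_force(r)

# Piecewise-constant basis: one unknown coefficient per grid bin.
edges = np.linspace(0.5, 1.5, 21)
n_bins = len(edges) - 1
bins = np.clip(np.digitize(r, edges) - 1, 0, n_bins - 1)
A = np.zeros((r.size, n_bins))
A[np.arange(r.size), bins] = 1.0

# Normal equations G phi = b for the least-squares projection.
G = A.T @ A
b = A.T @ f
phi = np.linalg.solve(G, b)

centers = 0.5 * (edges[:-1] + edges[1:])
```

Here `phi` holds one force value per grid bin and can be compared against `true_force(centers)`. No functional form is imposed, so the number of unknowns equals the number of grid points, which is why a fine grid requires many configurations.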

Fig. 3.


Comparison of original and calculated potentials corresponding to the distributions presented in Fig. 2. A–C correspond to potentials for bond angles, turn dihedrals, and helix dihedrals, respectively. D presents the hydrophobic (B-B) pair potential, whereas E presents the potential for all other nonbonded pairs.

Discussion

Since the pioneering work of Tanaka and Scheraga (10), KBPs have played a central role in many areas of computational protein science (11–15). However, despite their considerable success and widespread popularity, the validity and significance of KBPs have been sharply criticized (26–28). It is well established that KBPs only approximately treat the many-body structural correlations in densely packed protein structures (26) and that KBPs do not correctly treat the correlations between amino acid sequence and protein structure (20). Furthermore, KBPs often assume Boltzmann statistics for the distribution of individual degrees of freedom in different proteins, even though Boltzmann statistics are justified only for the equilibrium ensemble of configurations sampled by a single protein (27).

The most central criticism of KBPs, though, addresses the relationship between protein structures in solution and in the PDB (43). Because PDB structures have been determined from crystals or NMR spectra under varying conditions (e.g., various temperature, pH, ionic strength, cofactors, detergents, etc.), the physical and biological significance of PDB structures must be carefully considered (44). Nevertheless, considerable evidence suggests that PDB structures provide a reasonable first approximation to the ensemble of structures adopted by proteins in solution. In an important early study, Richards and co-workers demonstrated that crystallized ribonuclease retained significant enzymatic activity (45, 46). In addition, previous studies have demonstrated that the heterogeneity among crystal structures for a single protein reflects the structural heterogeneity present in equilibrium solution ensembles (47–51).

Moreover, considerable evidence suggests that KBPs reflect the essential physical forces that generate equilibrium protein structures in solution. Mohanty et al. have demonstrated that knowledge-based potentials are statistically correlated with physics-based atomically detailed potentials (52). Furthermore, KBPs have enjoyed surprising success in protein structure prediction (15, 53), fold recognition (54, 55), ligand docking (56, 57), protein design (21, 58), and even in MD simulations of protein folding (22, 23) and the coupled folding–binding transitions of intrinsically disordered proteins (24). In particular, recent KBPs that employ physics-motivated reference states (59, 60) have proven to be both particularly successful in protein structure prediction (61) and more transferable for other applications (62, 63). These results suggest that KBPs might be systematically improved by incorporating statistical physics approaches to treat many-body structural correlations and the correlations between protein sequence and structure.

Consequently, the present work represents a significant advance toward establishing a rigorous bridge between physics-based and knowledge-based CG protein potentials. Our recent generalization (35) of the venerable Yvon–Born–Green (YBG) theory (34) exactly treats the structural correlations between different degrees of freedom within a protein. In addition, the extended ensemble framework (37) exactly addresses the correlations between amino acid sequence and protein structure. Importantly, this framework does not assume a Boltzmann distribution for the statistics of individual degrees of freedom sampled from the PDB but rather adopts the somewhat weaker assumption that the structures present within the PDB for each individual protein correspond to representative samples from an equilibrium solution ensemble for that protein. By combining these advances, the present framework provides a first-principles method for calculating variationally optimized physics-based potentials for CG protein models directly (i.e., noniteratively) from equilibrium structures sampled for multiple proteins.

The present work also provides an important proof of principle for this approach. We employed a simple off-lattice protein model to sample the canonical ensemble of configurations for each of five different CG sequences that adopted various folds. The sampled structures defined a toy model for the PDB. We quantitatively recovered the original interaction potentials directly from structural correlations by solving the generalized-YBG equation (35) for this “extended” canonical ensemble (37). Several earlier studies have directly recovered contact potentials (64, 65) or iteratively determined model potentials (32, 66–68). In contrast, we quantitatively recovered a complete molecular potential directly from a databank of off-lattice protein structures.

These results demonstrate that, given sufficient structural data and a sufficiently flexible basis set for the potential, the present method will determine CG potentials that quantitatively describe the many-body potential of mean force (i.e., the configuration-dependent free energy landscape) for each protein in the databank. The resulting models will then quantitatively reproduce the many-body structural correlations and configuration-dependent free energy differences of each sequence (69). In general, there may not be a set of transferable potentials that quantitatively describes the free energy landscape for each protein. In this case, the present framework determines the transferable potentials that provide an optimal approximation to the many-body potential of mean force for each protein. Notably, by directly calculating the reference state and incorporating correlations between interactions in the CG model, the present work addresses two additional challenges for determining protein potentials: (i) identifying the correct reference state (12, 19, 70) and (ii) assigning appropriate weights for various contributions in previous KBPs (71–73).

In principle, the present method is limited only by the quality and quantity of equilibrium structural data. The present calculations employed a large number of structures for a few sequences to accurately determine all of the underlying potentials. However, the SI Appendix demonstrated that the structural requirements of this method can be reduced by orders of magnitude when using appropriate functional forms (71) or an appropriate reference state (42). Future work will systematically investigate these aspects of protein potentials. Future calculations using the PDB may use the extended ensemble framework (37) to improve sampling by including statistics from homologous sequences.

Finally, we note that several groups have developed iterative optimization methods for determining effective CG potentials that stabilize PDB structures as the native states for each protein (67, 71, 74–77). At least in principle, the present framework should become equivalent to these optimization methods in the limit of zero temperature. The relationship between these approaches is the subject of ongoing research.

Conclusions

The present work introduces and demonstrates a statistical physics approach for determining accurate and transferable interaction potentials for CG protein models directly from equilibrium structures. This approach directly treats many-body structural correlations, addresses the relevant sequence–structure correlations, and provides a rigorous variational method for determining the transferable potentials that provide an optimal representation of the free energy landscape for multiple proteins directly from structural information. In principle, this approach is limited only by the quantity and quality of available structural data. If the structures in the PDB are representative samples from equilibrium solution ensembles, then the present approach will provide a rigorous framework for systematically improving empirical KBPs derived from PDB statistics.

Materials

The present model is adapted from the earlier models of Thirumalai and co-workers (38, 39) and Head-Gordon and co-workers (40, 41). Each amino acid was represented by one of three bead types: either hydrophobic (B), hydrophilic (L), or neutral (N). Nonbonded interactions between hydrophobic sites that are separated by more than two bonds were modeled with attractive Lennard-Jones potentials. All other nonbonded interactions were modeled with purely repulsive potentials. Bond stretches and bond angles were modeled with identical harmonic potentials. Dihedral angle interactions were modeled with one of three functional forms, which stabilize helices, turns, and beta strands, respectively. These dihedral potentials were assigned by sequence and not structure. The SI Appendix describes the potential in greater detail.
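The type-dependent nonbonded rule described above can be sketched as follows. The functional forms and parameters here are assumptions chosen for illustration (a standard 12-6 Lennard-Jones form and a bare r^-12 repulsion in reduced units); the actual forms and coefficients are those given in the SI Appendix, which is not reproduced here. All function names are hypothetical.

```python
def lj_attractive(r, eps=1.0, sigma=1.0):
    # Assumed 12-6 Lennard-Jones form with well depth eps.
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def purely_repulsive(r, eps=1.0, sigma=1.0):
    # Assumed repulsive form: only the r^-12 term is kept.
    return 4.0 * eps * (sigma / r) ** 12

def nonbonded_energy(type_i, type_j, r):
    """Pair energy for bead types 'B', 'L', or 'N' at separation r.

    Only hydrophobic (B-B) pairs are attractive; every other pair is
    repulsive. The caller is assumed to have already excluded pairs
    separated by two or fewer bonds.
    """
    if type_i == "B" and type_j == "B":
        return lj_attractive(r)
    return purely_repulsive(r)
```

Because the B-L and L-L branches return the same repulsive function, any difference between their pair distributions must come from coupling to other degrees of freedom rather than from the direct pair interaction.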

All simulations were performed by using the stochastic dynamics algorithm in GROMACS 3.3.3 (78, 79). Lengths are reported in units of equilibrium bond length [a]; energies are in units of the well depth for hydrophobic attraction [ϵ]; and temperatures are in units of [ϵ/kB]. The lowest-energy (native) structure for each sequence was identified by sampling 100 conformations per sequence from high-temperature (T = 1.5) simulations and then annealing each structure to T = 0. The heat capacity was calculated for each sequence by employing independent molecular dynamics (MD) simulations to determine the average energy at different temperatures between T = 0 and T = 1.0 and numerically differentiating the average energy.
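The heat capacity estimate amounts to a finite-difference derivative of the average energy with respect to temperature. A minimal sketch follows, assuming the average energies are already tabulated on a temperature grid; the function name and the toy energy curve are hypothetical, and the differencing scheme actually used is not stated in the text.

```python
import numpy as np

def heat_capacity(temperatures, mean_energies):
    """Estimate C(T) = d<E>/dT by finite differences on a temperature grid."""
    T = np.asarray(temperatures, dtype=float)
    E = np.asarray(mean_energies, dtype=float)
    # np.gradient uses central differences in the interior of the grid
    # and one-sided differences at the two end points.
    return np.gradient(E, T)

# Toy (made-up) energy curve with a sharp transition near T = 0.5,
# standing in for average energies from independent simulations.
T = np.linspace(0.0, 1.0, 101)
E = np.tanh((T - 0.5) / 0.05)
C = heat_capacity(T, E)
T_peak = T[np.argmax(C)]
```

The position of the maximum in `C` gives a simple estimate of the transition temperature for the toy curve.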

Configurations for the model PDB were sampled from independent equilibrium simulations of each sequence at T = 0.291. The distributions presented in Fig. 2 were calculated by first averaging over structures for each sequence and then evaluating a (uniformly weighted) average over sequences.
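The two-stage average can be sketched as below: each sequence first contributes a normalized histogram, and the histograms are then averaged with equal weight. The function name, binning, and toy data are hypothetical.

```python
import numpy as np

def databank_distribution(samples_by_sequence, edges):
    """Average per-sequence normalized histograms with uniform weights."""
    per_sequence = []
    for samples in samples_by_sequence.values():
        # Stage 1: normalized distribution for one sequence.
        hist, _ = np.histogram(samples, bins=edges, density=True)
        per_sequence.append(hist)
    # Stage 2: uniformly weighted average over sequences.
    return np.mean(per_sequence, axis=0)

# Toy data with very unequal sample counts per sequence.
toy = {
    "seq_A": np.full(1000, 0.5),  # every sample falls in the first bin
    "seq_B": np.full(10, 1.5),    # every sample falls in the second bin
}
edges = np.array([0.0, 1.0, 2.0])
p = databank_distribution(toy, edges)
```

With this weighting each sequence counts equally even though `seq_A` has 100 times more samples; pooling all samples into one histogram would instead be dominated by `seq_A`.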

The structures for each sequence determine a protein-dependent many-body potential of mean force (PMF), i.e., a configuration-dependent (or restricted) free energy landscape, that also defines a mean force field for each sequence (69). The PMF is the appropriate potential for a CG model that quantitatively reproduces the distribution of structures for a particular protein (80). The present approach employs an extension of the multiscale coarse-graining (MS-CG) variational principle (81, 82) to determine the transferable interaction potentials that provide an optimal approximation to the PMF for each protein (37). This approximation is obtained by projecting the mean force field onto the set of force field basis vectors (80, 83) associated with a set of transferable interaction potentials (37). However, in contrast to the MS-CG method, which employs atomistic force information (81, 82), the present approach uses only structural information by employing a recent generalization of the YBG integral equation (35, 36). Because a set of transferable interaction potentials was used in constructing the model PDB and because the force field calculation employed a set of basis vectors that spanned these interaction potentials, the calculation determined the original potentials with quantitative accuracy. More details are provided in the SI Appendix.
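The projection described above can be summarized schematically as a linear least-squares problem. The notation below is an assumption introduced for illustration and is not copied from refs. 35–37 or the SI Appendix: f_I is the mean force on site I, 𝒢_{I;D} is the force field basis vector for potential parameter D, φ_D is the corresponding coefficient, N_γ is the number of sites in protein γ, and p_γ is the weight assigned to protein γ in the extended ensemble.

```latex
% Schematic variational functional (notation assumed for illustration)
\chi^2[\phi] \;=\; \sum_{\gamma} p_\gamma
  \left\langle \frac{1}{3N_\gamma} \sum_{I=1}^{N_\gamma}
  \Bigl|\, \mathbf{f}_I - \sum_{D} \phi_D\, \mathcal{G}_{I;D} \Bigr|^2
  \right\rangle_{\gamma}

% Setting \partial\chi^2/\partial\phi_D = 0 gives the normal equations
\sum_{D'} G_{DD'}\,\phi_{D'} \;=\; b_D ,
\qquad
G_{DD'} = \sum_{\gamma} p_\gamma
  \left\langle \frac{1}{3N_\gamma} \sum_{I} \mathcal{G}_{I;D}\cdot\mathcal{G}_{I;D'} \right\rangle_{\gamma},
\qquad
b_D = \sum_{\gamma} p_\gamma
  \left\langle \frac{1}{3N_\gamma} \sum_{I} \mathbf{f}_I\cdot\mathcal{G}_{I;D} \right\rangle_{\gamma}
```

In this schematic form, G depends only on structural correlations among the basis vectors. The generalized-YBG approach (35, 36) additionally evaluates the projected mean force b from structural information, so that no force data are required.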

Supplementary Material

Supporting Information

ACKNOWLEDGMENTS.

The authors gratefully acknowledge Jayanth Banavar for many helpful conversations and also Pennsylvania State University for funding. Fig. 1 was made with VMD. VMD is developed with National Institutes of Health support by the Theoretical and Computational Biophysics group at the Beckman Institute, University of Illinois at Urbana-Champaign.

Footnotes

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1006428107/-/DCSupplemental.

References

  • 1.Karplus M, McCammon JA. Molecular dynamics simulations of biomolecules. Nat Struct Mol Biol. 2002;9:646–652. doi: 10.1038/nsb0902-646. [DOI] [PubMed] [Google Scholar]
  • 2.Tozzini V. Coarse-grained models for proteins. Curr Opin Struct Biol. 2005;15:144–150. doi: 10.1016/j.sbi.2005.02.005. [DOI] [PubMed] [Google Scholar]
  • 3.Levitt M, Warshel A. Computer simulation of protein folding. Nature. 1975;253:694–698. doi: 10.1038/253694a0. [DOI] [PubMed] [Google Scholar]
  • 4.Zhou H, Pandit SB, Skolnick J. Performance of the Pro-sp3-TASSER server in CASP8. Proteins. 2009;77:123–127. doi: 10.1002/prot.22501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Nguyen HD, Reddy VS, Brooks CL., III Deciphering the kinetic mechanism of spontaneous self-assembly of icosahedral capsids. Nano Lett. 2007;7:338–344. doi: 10.1021/nl062449h. [DOI] [PubMed] [Google Scholar]
  • 6.Honeycutt JD, Thirumalai D. Metastability of the folded states of globular proteins. Proc Natl Acad Sci USA. 1990;87:3526–3529. doi: 10.1073/pnas.87.9.3526. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Leopold PE, Montal M, Onuchic JN. Protein folding funnels—a kinetic approach to the sequence structure relationship. Proc Natl Acad Sci USA. 1992;89:8721–8725. doi: 10.1073/pnas.89.18.8721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Dill KA, Chan HS. From Levinthal to pathways to funnels. Nat Struct Biol. 1997;4:10–19. doi: 10.1038/nsb0197-10. [DOI] [PubMed] [Google Scholar]
  • 9.Bahar I, Rader A. Coarse-grained normal mode analysis in structural biology. Curr Opin Struct Biol. 2005;15:586–592. doi: 10.1016/j.sbi.2005.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Tanaka S, Scheraga HA. Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins. Macromolecules. 1976;9:945–950. doi: 10.1021/ma60054a013. [DOI] [PubMed] [Google Scholar]
  • 11.Sippl MJ. Knowledge-based potentials for proteins. Curr Opin Struct Biol. 1995;5:229–235. doi: 10.1016/0959-440x(95)80081-6. [DOI] [PubMed] [Google Scholar]
  • 12.Jernigan RL, Bahar I. Structure-derived potential and protein simulations. Curr Opin Struct Biol. 1996;6:195–209. doi: 10.1016/s0959-440x(96)80075-3. [DOI] [PubMed] [Google Scholar]
  • 13.Lazaridis T, Karplus M. Effective energy functions for protein structure prediction. Curr Opin Struct Biol. 2000;10:139–145. doi: 10.1016/s0959-440x(00)00063-4. [DOI] [PubMed] [Google Scholar]
  • 14.Buchete NV, Straub JE, Thirumalai D. Development of novel statistical potentials for protein fold recognition. Curr Opin Struct Biol. 2004;14:225–232. doi: 10.1016/j.sbi.2004.03.002. [DOI] [PubMed] [Google Scholar]
  • 15.Skolnick J. In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol. 2006;16:166–171. doi: 10.1016/j.sbi.2006.02.004. [DOI] [PubMed] [Google Scholar]
  • 16.Berman H, Henrick K, Nakamura H. Announcing the worldwide protein data bank. Nat Struct Mol Biol. 2003;10:980. doi: 10.1038/nsb1203-980. [DOI] [PubMed] [Google Scholar]
  • 17.Miyazawa S, Jernigan RL. Estimation of effective interresidue contact energies from protein crystal structures: Quasichemical approximation. Macromolecules. 1985;18:534–552. [Google Scholar]
  • 18.Sippl MJ. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990;213:859–883. doi: 10.1016/s0022-2836(05)80269-4. [DOI] [PubMed] [Google Scholar]
  • 19.Betancourt MR, Thirumalai D. Pair potentials for protein folding: Choice of reference states and sensitivity of predicted native states to variations in the interaction schemes. Protein Sci. 1999;8:361–369. doi: 10.1110/ps.8.2.361. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Skolnick J, Kolinski A, Ortiz A. Derivation of protein-specific pair potentials based on weak sequence fragment similarity. Proteins. 2000;38:3–16. [PubMed] [Google Scholar]
  • 21.Boas FE, Harbury PB. Potential energy functions for protein design. Curr Opin Struct Biol. 2007;17:199–204. doi: 10.1016/j.sbi.2007.03.006. [DOI] [PubMed] [Google Scholar]
  • 22.Wilson C, Doniach S. A computer-model to dynamically simulate protein folding—studies with crambin. Proteins. 1989;6:193–209. doi: 10.1002/prot.340060208. [DOI] [PubMed] [Google Scholar]
  • 23.Lee J, Liwo A, Scheraga HA. Energy-based de novo protein folding by conformational space annealing and an off-lattice united-residue force field: Application to the 10-55 fragment of staphylococcal protein A and to apo calbindin D9K. Proc Natl Acad Sci USA. 1999;96:2025–2030. doi: 10.1073/pnas.96.5.2025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Turjanski AG, Gutkind JS, Best RB, Hummer G. Binding-induced folding of a natively unstructured transcription factor. PLoS Comput Biol. 2008;4:e1000060. doi: 10.1371/journal.pcbi.1000060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Bereau T, Deserno M. Generic coarse-grained model for protein folding and aggregation. J Chem Phys. 2009;130:235106. doi: 10.1063/1.3152842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Thomas PD, Dill KA. Statistical potentials extracted from protein structures: How accurate are they? J Mol Biol. 1996;257:457–469. doi: 10.1006/jmbi.1996.0175. [DOI] [PubMed] [Google Scholar]
  • 27.Ben-Naim A. Statistical potentials extracted from protein structures: Are these meaningful potentials? J Chem Phys. 1997;107:3698–3706. [Google Scholar]
  • 28.Moult J. Comparison of database potentials and molecular mechanics force fields. Curr Opin Struct Biol. 1997;7:194–199. doi: 10.1016/s0959-440x(97)80025-5. [DOI] [PubMed] [Google Scholar]
  • 29.Koppensteiner WA, Sippl MJ. Knowledge-based potentials—back to the roots. Biochemistry (Moscow) 1998;63:247–252. [PubMed] [Google Scholar]
  • 30.Skolnick J, Jaroszewski L, Kolinski A, Godzik A. Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Sci. 1997;6:676–688. doi: 10.1002/pro.5560060317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Dehouck Y, Gilis D, Rooman M. A new generation of statistical potentials for proteins. Biophys J. 2006;90:4010–4017. doi: 10.1529/biophysj.105.079434. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins. 2009;76:72–85. doi: 10.1002/prot.22320. [DOI] [PubMed] [Google Scholar]
  • 33.Májek P, Elber R. A coarse-grained potential for fold recognition and molecular dynamics simulations of proteins. Proteins. 2009;76:822–836. doi: 10.1002/prot.22388. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Hansen JP, McDonald IR. Theory of Simple Liquids. 2 Ed. New York: Academic; 1990. [Google Scholar]
  • 35.Mullinax JW, Noid WG. A generalized Yvon-Born-Green theory for molecular systems. Phys Rev Lett. 2009;103:198104. doi: 10.1103/PhysRevLett.103.198104.
  • 36.Mullinax JW, Noid WG. A generalized Yvon-Born-Green theory for determining coarse-grained interaction potentials. J Phys Chem C. 2010;114:5661–5674.
  • 37.Mullinax JW, Noid WG. Extended ensemble approach for deriving transferable coarse-grained potentials. J Chem Phys. 2009;131:104110.
  • 38.Honeycutt JD, Thirumalai D. The nature of folded states of globular proteins. Biopolymers. 1992;32:695–709. doi: 10.1002/bip.360320610.
  • 39.Guo Z, Thirumalai D. Kinetics and thermodynamics of folding of a de novo designed four-helix bundle protein. J Mol Biol. 1996;263:323–343. doi: 10.1006/jmbi.1996.0578.
  • 40.Sorenson JM, Head-Gordon T. Matching simulation and experiment: A new simplified model for simulating protein folding. J Comput Biol. 2000;7:469–481. doi: 10.1089/106652700750050899.
  • 41.Brown S, Fawzi NJ, Head-Gordon T. Coarse-grained sequences for protein folding and design. Proc Natl Acad Sci USA. 2003;100:10712–10717. doi: 10.1073/pnas.1931882100.
  • 42.Mullinax JW, Noid WG. Reference state for the generalized Yvon-Born-Green theory: Application for coarse-grained model of hydrophobic hydration. J Chem Phys. 2010;133:124107. doi: 10.1063/1.3481574.
  • 43.Burgi HB, Dunitz JD. Can statistical-analysis of structural parameters from different crystal environments lead to quantitative energy relationships. Acta Crystallogr, Sect B: Struct Sci. 1988;44:445–448.
  • 44.Mozzarelli A, Rossi GL. Protein function in the crystal. Annu Rev Biophys Biomol Struct. 1996;25:343–365. doi: 10.1146/annurev.bb.25.060196.002015.
  • 45.Doscher MS, Richards FM. The activity of an enzyme in the crystalline state: Ribonuclease S. J Biol Chem. 1963;238:2399–2406.
  • 46.Kenkare UW, Richards FM. The histidyl residues in ribonuclease-S photooxidation in solution and in single crystals: The iodination of histidine-12. J Biol Chem. 1966;241:3197–3206.
  • 47.Bryant SH, Lawrence CE. The frequency of ion-pair substructures in proteins is quantitatively related to electrostatic potential—a statistical-model for nonbonded interactions. Proteins. 1991;9:108–119. doi: 10.1002/prot.340090205.
  • 48.Kossiakoff AA, Randal M, Guenot J, Eigenbrot C. Variability of conformations at crystal contacts in BPTI represent true low-energy structures—correspondence among lattice packing and molecular-dynamics structures. Proteins. 1992;14:65–74. doi: 10.1002/prot.340140108.
  • 49.Butterfoss GL, Hermans J. Boltzmann-type distribution of side-chain conformation in proteins. Protein Sci. 2003;12:2719–2731. doi: 10.1110/ps.03273303.
  • 50.Best RB, Lindorff-Larsen K, DePristo MA, Vendruscolo M. Relation between native ensembles and experimental structures of proteins. Proc Natl Acad Sci USA. 2006;103:10901–10906. doi: 10.1073/pnas.0511156103.
  • 51.Lange OF, et al. Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution. Science. 2008;320:1471–1475. doi: 10.1126/science.1157092.
  • 52.Mohanty D, Dominy BN, Kolinski A, Brooks CL, Skolnick J. Correlation between knowledge-based and detailed atomic potentials: Application to the unfolding of the GCN4 leucine zipper. Proteins. 1999;35:447–452.
  • 53.Sun SJ. Reduced representation model of protein-structure prediction—statistical potential and genetic algorithms. Protein Sci. 1993;2:762–785. doi: 10.1002/pro.5560020508.
  • 54.Bryant SH, Lawrence CE. An empirical energy function for threading protein-sequence through the folding motif. Proteins. 1993;16:92–112. doi: 10.1002/prot.340160110.
  • 55.Kocher JPA, Rooman MJ, Wodak SJ. Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. J Mol Biol. 1994;235:1598–1613. doi: 10.1006/jmbi.1994.1109.
  • 56.DeWitte RS, Shakhnovich EI. SMoG: De novo design method based on simple, fast, and accurate free energy estimate. 1. Methodology and supporting evidence. J Am Chem Soc. 1996;118:11733–11744.
  • 57.Gohlke H, Hendlich M, Klebe G. Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol. 2000;295:337–356. doi: 10.1006/jmbi.1999.3371.
  • 58.Poole AM, Ranganathan R. Knowledge-based potentials in protein design. Curr Opin Struct Biol. 2006;16:508–513. doi: 10.1016/j.sbi.2006.06.013.
  • 59.Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006;15:2507–2524. doi: 10.1110/ps.062416606.
  • 60.Zhou H, Zhou Y. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection. Protein Sci. 2002;11:2714–2726. doi: 10.1110/ps.0217002.
  • 61.Eramian D, et al. A composite score for predicting errors in protein structure models. Protein Sci. 2006;15:1653–1666. doi: 10.1110/ps.062095806.
  • 62.Liu S, Zhang C, Zhou HY, Zhou YQ. A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins. 2004;56:93–101. doi: 10.1002/prot.20019.
  • 63.Zhang C, Liu S, Zhou HY, Zhou YQ. The dependence of all-atom statistical potentials on structural training database. Biophys J. 2004;86:3349–3358. doi: 10.1529/biophysj.103.035998.
  • 64.Zhang L, Skolnick J. How do potentials derived from structural databases relate to “true” potentials? Protein Sci. 1998;7:112–122. doi: 10.1002/pro.5560070112.
  • 65.Zhang C. Extracting contact energies from protein structures: A study using a simplified model. Proteins. 1998;31:299–308.
  • 66.Thomas P, Dill K. An iterative method for extracting energy-like quantities from protein structures. Proc Natl Acad Sci USA. 1996;93:11628–11633. doi: 10.1073/pnas.93.21.11628.
  • 67.Mirny L, Shakhnovich E. How to derive a protein folding potential? A new approach to an old problem. J Mol Biol. 1996;264:1164–1179. doi: 10.1006/jmbi.1996.0704.
  • 68.Seno F, Trovato A, Banavar JR, Maritan A. Maximum entropy method for deducing amino acid interactions in proteins. Phys Rev Lett. 2008;100:078102. doi: 10.1103/PhysRevLett.100.078102.
  • 69.Liwo A, Czaplewski C, Pillardy J, Scheraga HA. Cumulant-based expressions for the multibody terms for the correlation between local and electrostatic interactions in the united-residue force field. J Chem Phys. 2001;115:2323–2347.
  • 70.Godzik A, Kolinski A, Skolnick J. Are proteins ideal mixtures of amino-acids—analysis of energy parameter sets. Protein Sci. 1995;4:2107–2117. doi: 10.1002/pro.5560041016.
  • 71.Liwo A, et al. A united-residue force field for off-lattice protein-structure simulations. 1. Functional forms and parameters of long-range side-chain interaction potentials from protein crystal data. J Comput Chem. 1997;18:849–873.
  • 72.Kortemme T, Morozov AV, Baker D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol. 2003;326:1239–1259. doi: 10.1016/s0022-2836(03)00021-4.
  • 73.Lu M, Dousis AD, Ma J. OPUS-Rota: A fast and accurate method for side-chain modeling. Protein Sci. 2008;17:1576–1585. doi: 10.1110/ps.035022.108.
  • 74.Maiorov VN, Crippen GM. Contact potential that recognizes the correct folding of globular-proteins. J Mol Biol. 1992;227:876–888. doi: 10.1016/0022-2836(92)90228-c.
  • 75.Goldstein RA, Luthey-Schulten ZA, Wolynes PG. Optimal protein-folding codes from spin-glass theory. Proc Natl Acad Sci USA. 1992;89:4918–4922. doi: 10.1073/pnas.89.11.4918.
  • 76.Hao MH, Scheraga HA. How optimization of potential functions affects protein folding. Proc Natl Acad Sci USA. 1996;93:4984–4989. doi: 10.1073/pnas.93.10.4984.
  • 77.Seno F, Micheletti C, Maritan A, Banavar J. Variational approach to protein design and extraction of interaction potentials. Phys Rev Lett. 1998;81:2172–2175.
  • 78.Lindahl E, Hess B, van der Spoel D. GROMACS 3.0: A package for molecular simulation and trajectory analysis. J Mol Model. 2001;7:306–317.
  • 79.van der Spoel D, et al. GROMACS: Fast, flexible, and free. J Comput Chem. 2005;26:1701–1718. doi: 10.1002/jcc.20291.
  • 80.Noid WG, et al. The multiscale coarse-graining method. I. A rigorous bridge between atomistic and coarse-grained models. J Chem Phys. 2008;128:244114. doi: 10.1063/1.2938860.
  • 81.Izvekov S, Voth GA. A multiscale coarse-graining method for biomolecular systems. J Phys Chem B. 2005;109:2469–2473. doi: 10.1021/jp044629q.
  • 82.Izvekov S, Voth GA. Multiscale coarse graining of liquid-state systems. J Chem Phys. 2005;123:134105. doi: 10.1063/1.2038787.
  • 83.Noid WG, et al. The multiscale coarse-graining method. II. Numerical implementation for molecular coarse-grained models. J Chem Phys. 2008;128:244115. doi: 10.1063/1.2938857.
  • 84.Humphrey W, Dalke A, Schulten K. VMD: Visual molecular dynamics. J Mol Graph. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences