Proceedings of the National Academy of Sciences of the United States of America. 2014 Aug 11;111(34):12408–12413. doi: 10.1073/pnas.1413575111

Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection

Faruck Morcos a, Nicholas P Schafer a,b, Ryan R Cheng a, José N Onuchic a,b,c,d, Peter G Wolynes a,b,c,d,1
PMCID: PMC4151759  PMID: 25114242

Significance

Natural protein sequences, being the result of random mutation coupled with natural selection, have remarkable properties that are not typical of unselected random sequences, including the ability to robustly fold to an organized structure that is needed to function. We estimate the selection temperature, the effective temperature at which sequences were selected by evolution, for eight protein families and compare these values with experimental data for folding temperatures of proteins in each family. The selection temperature measures the importance of maintaining the stability and structural specificity of the folded state on the evolutionary process. For all families, the selection temperature is below physiological temperature, indicating that maintaining the structural integrity of the folded state is an important constraint on evolution.

Keywords: energy landscape theory, information theory, selection temperature, funneled landscapes, elastic effects

Abstract

The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature Tsel at which these foldable sequences have been selected in sequence space by evolution. Tsel quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for Tsel are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.


The physics and natural history of proteins are inextricably intertwined (1, 2). The cooperative manner in which proteins find their way to a folded structure is a result of natural selection and is not typical of random polymers (3, 4). Likewise, the requirement that most proteins must fold in order to function is a strong constraint on their phylogeny. The random mutations that proteins have unavoidably undergone throughout their evolution amount to countless physicochemical experiments on folding landscapes. Thus, the evolutionary patterns of proteins found through comparative sequence analysis can be used to understand protein structure and energetics. In this paper, we compare the information content of the correlated changes that have occurred in protein sequences sharing common ancestry with energies from a transferable energy function, in order to quantify how strongly the need to maintain foldability has influenced molecular evolution.

Funneled Folding Landscapes from Evolution in Sequence Space

The key to our analysis is the principle of minimal frustration (3, 5), which states that, for quick and robust folding, the energy landscape of a protein must be dominated by interactions found in the native conformation. This native conformation is, therefore, separated by an energy gap from other compact structures that might otherwise act as kinetic traps (6, 7). Such kinetic traps might appear on the folding landscape during evolution if a random mutation were to stabilize a conformation distinct from the functional one, rendering the protein nonviable. In this way, evolution and physical dynamics are coupled. A funneled, minimally frustrated landscape can be achieved if the protein sequence evolves to stabilize the native state without increasing the ruggedness of the landscape.

If folding were the only physicochemical constraint on evolution, the ensemble of naturally observed sequences would correspond to the set of sequences whose solvent-averaged free energy in the native conformation lies below a threshold set by the expected ground-state energy of a random sequence. Because sequence space is vast, the usual arguments for the equivalence of the microcanonical and canonical ensembles in statistical mechanics suggest that this evolutionary ensemble, characterized by a threshold energy, is equivalent to a canonical distribution of sequences characterized by a Boltzmann probability, $P \propto e^{-\Delta E/k_B T_{\mathrm{sel}}}$. This Boltzmann-like probability contains the energy gap ΔE between the folded configuration and the compact misfolded configurations, together with an appropriate selection temperature (Tsel) (4, 8–10) that quantifies how strong the folding constraints have been during evolution. Tsel is the apparent temperature at which sequences were selected by evolution for a particular protein family or fold. It does not correspond to a critical temperature in the laboratory but can nonetheless be usefully compared with other measurable temperatures, such as the glass transition temperature and the folding temperature. Of course, other constraints on molecular evolution exist, including maintaining the ability of a protein to bind appropriate partners (11, 12), to catalyze appropriate reactions, as for the serine proteases with their famous catalytic triad (13, 14), to undergo allosteric changes (15), and to avoid aggregation (16). All of these factors potentially enter a quantitative statistical theory of molecular evolutionary outcomes.

Under the quasiequilibrium selection hypothesis based on folding energy alone, given the physical free energy function E, the probability that any given sequence attains a given fold can, in principle, be computed. For a single structural family, finding this probability essentially corresponds to simulating a Potts model in which the Potts variables, representing the possible amino acid types, are placed on the sites of a representative average native structure, because the 3D native structure is largely conserved within a family (9, 17). Various averages over the allowed natural sequence variation can also be found using mean field theory (9). Thus, from a sufficiently reliable physical energy function, one should be able to predict the distribution of amino acids and the local evolutionary sequence entropy at any given position in the protein. The covariance of the amino acids at different positions in the sequence would also follow from this Potts model (Fig. 1).
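The canonical selection ensemble described above can be illustrated with a toy Metropolis Monte Carlo simulation in sequence space. This is a minimal sketch under stated assumptions, not the procedure used in this work: the site energies `e1` and pair energies `e2` are hypothetical random numbers standing in for a physical energy function evaluated on a fixed native structure, energies are in kcal/mol, and single-site mutations are accepted with the Boltzmann weight $e^{-\Delta E/k_B T_{\mathrm{sel}}}$.

```python
import numpy as np

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def sequence_energy(seq, e1, e2):
    """Toy energy of a sequence threaded on a fixed structure (kcal/mol).

    e1[i, a]       : hypothetical energy of amino acid type a at site i
    e2[i, j, a, b] : hypothetical pair energy; only entries with i < j are used
    """
    L = len(seq)
    total = e1[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            total += e2[i, j, seq[i], seq[j]]
    return float(total)


def sample_sequences(e1, e2, t_sel, n_steps, rng):
    """Metropolis sampling of sequences with weight exp(-E / (k_B * t_sel))."""
    L, q = e1.shape
    seq = rng.integers(q, size=L)                  # random starting sequence
    energy = sequence_energy(seq, e1, e2)
    beta = 1.0 / (K_B * t_sel)
    for _ in range(n_steps):
        trial = seq.copy()
        trial[rng.integers(L)] = rng.integers(q)   # propose a single-site mutation
        trial_energy = sequence_energy(trial, e1, e2)
        d_e = trial_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if d_e <= 0 or rng.random() < np.exp(-beta * d_e):
            seq, energy = trial, trial_energy
    return seq, energy


rng = np.random.default_rng(0)
L, q = 20, 4                                        # toy sizes: 20 sites, 4 residue types
e1 = rng.normal(0.0, 0.5, size=(L, q))
e2 = rng.normal(0.0, 0.5, size=(L, L, q, q))
seq_cold, e_cold = sample_sequences(e1, e2, t_sel=100.0, n_steps=5000, rng=rng)
seq_hot, e_hot = sample_sequences(e1, e2, t_sel=2000.0, n_steps=5000, rng=rng)
```

Lowering `t_sel` should typically concentrate the sampled sequences at lower energies, which is the sense in which a low selection temperature corresponds to strong selection for a large folding energy gap.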

Fig. 1.


Shown is the structure of the repressor protein CI (32) (UniProt identifier: RPC1_BP434). Side chains from two additional sequences from the protein family are superimposed on the template structure, and the multiple sequence alignment of the three proteins is also shown. Direct coupling analysis (DCA) quantifies the statistical couplings between coevolving residues; these couplings tend to be large when the residues are in contact and weak, but potentially significant, when the residues are spatially distant. Two examples of pairs of coevolving residues are shown: one in which the residues are in direct contact (pink arrows) and one in which they are not (yellow arrows). These pairs of residues have the same color in the 3D structure diagram as in the alignment.

By comparing such predictions starting from a physical energy function with the same quantities inferred from observed phylogenetic sequence comparisons, the strength of the selection, as quantified by Tsel, can be found. A statistically more robust way of making the quantitative connection between the physical and evolutionary landscapes is to use inverse statistical mechanics algorithms to infer an information theoretic energy function for the sequence distribution specifically for a single structural family. Such an inferred energy function has been used to predict the common backbone structure of a protein family starting from multiple sequences (18, 19). The energy gaps between random globules and native states from the information theoretic single-family Hamiltonian can then be compared with the gaps for the same native structure that are found from the physical, transferable, free energy function that is ordinarily used for structure prediction.
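One common inverse statistical mechanics approximation for such a family-specific Potts model is mean-field DCA, in which the couplings are estimated as the negative inverse of the connected correlation matrix of the alignment. The sketch below assumes that variant purely for illustration and is not necessarily the inference procedure used in this study; it omits sequence reweighting, uses a simple pseudocount, and the function name `mean_field_dca` and the toy alignment are hypothetical.

```python
import numpy as np


def mean_field_dca(msa, q, pseudocount=0.5):
    """Toy mean-field DCA. msa: (M, L) integer array with states 0..q-1.

    Returns fields h (L, q) and couplings J (L, L, q, q) such that
    log P(seq) = sum_i h[i, a_i] + sum_{i<j} J[i, j, a_i, a_j] - log Z.
    The last state q-1 is the gauge reference (its fields and couplings are zero).
    """
    M, L = msa.shape
    onehot = np.zeros((M, L, q))
    onehot[np.arange(M)[:, None], np.arange(L)[None, :], msa] = 1.0

    # Empirical single-site and pair frequencies (no sequence reweighting)
    fi = onehot.mean(axis=0)
    fij = np.einsum("mia,mjb->iajb", onehot, onehot) / M

    # Pseudocount: mix the empirical statistics with a uniform distribution
    lam = pseudocount
    fi = (1 - lam) * fi + lam / q
    fij = (1 - lam) * fij + lam / q**2
    for i in range(L):
        fij[i, :, i, :] = np.diag(fi[i])   # keep diagonal blocks consistent

    # Connected correlations, dropping the gauge state q-1
    C = fij - np.einsum("ia,jb->iajb", fi, fi)
    n = L * (q - 1)
    C = C[:, : q - 1, :, : q - 1].reshape(n, n)

    # Mean-field couplings: J = -C^{-1}, padded with zeros for the gauge state
    J_red = -np.linalg.inv(C).reshape(L, q - 1, L, q - 1)
    J = np.zeros((L, L, q, q))
    J[:, :, : q - 1, : q - 1] = J_red.transpose(0, 2, 1, 3)
    for i in range(L):
        J[i, i] = 0.0                      # no self-couplings

    # Mean-field fields: h_i(a) = ln(f_i(a)/f_i(q)) - sum_{j != i, b} J_ij(a, b) f_j(b)
    h = np.log(fi / fi[:, -1:]) - np.einsum("ijab,jb->ia", J, fi)
    return h, J


rng = np.random.default_rng(1)
toy_msa = rng.integers(4, size=(200, 8))  # hypothetical alignment: 200 sequences, 8 sites
h, J = mean_field_dca(toy_msa, q=4)
```

The pseudocount keeps the correlation matrix invertible; with real alignments, the residue alphabet would be the 20 amino acids plus a gap state.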

If the transferable energy function used is sufficiently close to the actual free energy function under which the proteins have evolved, the gap comparison allows us to quantify the strength of the folding constraint. In some cases, we can also estimate Tsel in a different way by computing the mutational stability changes predicted by the information theoretic energy function and comparing the predicted changes directly with measured values. This alternate approach provides a consistency check on the results from the transferable predictive energy landscape, which can be used more widely.

Simple statistical models of protein energy landscapes have been very helpful in understanding folding (3, 20, 21). Similar statistical ideas can also be used to describe protein evolution under the assumption that the thermodynamic stability of the native state is the dominant constraint (22). To make this paper self-contained, we review in SI Text the corresponding statistical analyses of folding and evolution based on uncorrelated energy landscapes. (The energies of structurally related states of proteins are, in fact, correlated. The effects of adding pairwise correlations on the energy landscapes of random heteropolymers have been examined in ref. 23, where the resulting change in the estimated glass transition temperature was found to be small.) These analyses quantitatively connect the problem of evolving in sequence space on geological timescales with that of folding in configuration space on laboratory timescales, yielding a relation between two physicochemical quantities, the folding temperature Tf and the glass transition temperature Tg, and the evolutionary effective temperature Tsel (22):

$$\frac{2}{T_f\,T_{\mathrm{sel}}} = \frac{1}{T_g^{2}} + \frac{1}{T_f^{2}} \qquad [1]$$
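For reference, Eq. 1 can be rearranged to give Tg from Tf and Tsel, and rewritten in terms of the dimensionless ratios x = Tf/Tg and s = Tsel/Tg. This is straightforward algebra on Eq. 1 alone; the choice of the larger root follows from requiring Tf ≥ Tg (the two roots multiply to 1).

```latex
% Solving Eq. 1 for T_g:
\frac{1}{T_g^{2}} = \frac{2}{T_f T_{\mathrm{sel}}} - \frac{1}{T_f^{2}}
\quad\Longrightarrow\quad
T_g = \left(\frac{2}{T_f T_{\mathrm{sel}}} - \frac{1}{T_f^{2}}\right)^{-1/2},
\qquad \text{real only if } T_{\mathrm{sel}} < 2T_f .

% Multiplying Eq. 1 by T_f T_g, with x = T_f/T_g and s = T_sel/T_g:
\frac{2}{s} = x + \frac{1}{x}
\quad\Longrightarrow\quad
x^{2} - \frac{2}{s}\,x + 1 = 0
\quad\Longrightarrow\quad
\frac{T_f}{T_g} = \frac{1}{s} + \sqrt{\frac{1}{s^{2}} - 1}.

% A real solution with T_f/T_g > 1 requires s = T_sel/T_g < 1;
% T_sel = T_f gives T_g = T_f, i.e. T_f/T_g = 1 (a rugged landscape).
```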

The experimental folding temperatures, together with the selection temperatures obtained by comparing the physical and information theoretic Hamiltonians, allow us to obtain Tg in absolute units as well as the dimensionless ratio Tf/Tg. The physical model used in this study assumes that the effective interactions between amino acid residues are temperature-independent, an approximation that breaks down because of solvent effects (24). It has therefore been suggested that these temperatures might be usefully interpreted as effective interaction strengths (25). Because intraprotein interactions are temperature-dependent, the Tg values given here should be understood as measures of landscape ruggedness, related to the trap/decoy energy, rather than as precise determinations of experimental glass transition temperatures. Likewise, the dimensionless ratio of the folding temperature Tf to the glass temperature Tg measures how funneled the landscape is, with high values corresponding to nearly ideal funnels. The evolutionarily inferred ratios turn out to be fairly close to the values inferred earlier from purely physical arguments, which set up correspondences between three-letter code lattice models and real proteins by using experimental information about residual structure and dynamics in the molten globules of helical proteins (26). The coevolution-based analysis suggests that protein landscapes are actually somewhat more funneled than was originally inferred. In some cases, the ratio approaches the higher estimates of Tf/Tg reached by two distinct sets of physical arguments: one, by Kaya and Chan (27), is based on the observed high cooperativity of calorimetric folding transitions, and the other, by Clementi and Plotkin (28), is based on matching observed folding kinetics.

Results

We studied eight different protein families (defined by Pfam) (29). Each of these contains more than 4,500 sequences. All have at least one experimentally determined structure. The protein lengths range from 60 to 286 aa. Each of the families represents a distinct tertiary structure. A list of the specific proteins considered and their respective families is provided in Table S1.

For each family of proteins, we use direct coupling analysis (DCA) to infer a global statistical model for the sequences in that family. DCA takes as input a multiple sequence alignment of sequences belonging to a single protein family. Using a maximum entropy approach, DCA infers an effective energy function, consisting of single-site fields and pairwise couplings, that approximately reproduces the single-site and pairwise amino acid frequencies observed in the input alignment. This energy function can also be used to estimate the probability (PDCA) that an arbitrary sequence (not necessarily present in the input alignment) is part of the family. From this probability, a unitless energy can be defined by HDCA = log(PDCA). For the corresponding physical energy function, we use an estimate from a successful structure prediction model, E = HAWSEM; to be precise, we use the energy of the sequence in its native structure according to the associative memory, water-mediated, structure, and energy model (AWSEM) (30). Details of how these quantities are computed are given in SI Text. We use the probabilities and energies of random sequences having the composition of natural proteins as a reference state to obtain Ē; the selection temperature is therefore obtained as a ratio of energy gaps from the two Hamiltonian values:

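$$T_{\mathrm{sel}} = \frac{-H_{\mathrm{AWSEM}}^{\mathrm{nat}} + H_{\mathrm{AWSEM}}^{\mathrm{mg}}}{k_B \log\left(P_{\mathrm{DCA}}^{\mathrm{nat}}/P_{\mathrm{DCA}}^{\mathrm{mg}}\right)} = \frac{-H_{\mathrm{AWSEM}}^{\mathrm{nat}} + H_{\mathrm{AWSEM}}^{\mathrm{mg}}}{k_B\left(H_{\mathrm{DCA}}^{\mathrm{nat}} - H_{\mathrm{DCA}}^{\mathrm{mg}}\right)} \qquad [2]$$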

The superscript nat indicates that a quantity is evaluated for native sequences, whereas the superscript mg indicates that it is evaluated for random (molten globule) sequences; thus $H_{\mathrm{DCA}}^{\mathrm{nat}} - H_{\mathrm{DCA}}^{\mathrm{mg}} = \log\left(P_{\mathrm{DCA}}^{\mathrm{nat}}/P_{\mathrm{DCA}}^{\mathrm{mg}}\right)$. We then perform a linear least squares fit to the combined set of native and random ordered pairs (HDCA, HAWSEM); by Eq. 2, the slope of the fitted line is $-k_B T_{\mathrm{sel}}$, which yields Tsel. This formulation gives a single value of Tsel for each protein family. The result of this analysis for the PDZ family [protein tyrosine phosphatase; Protein Data Bank (PDB) ID code 1GM1] (31) is shown in Fig. 2A, where HAWSEM is plotted vs. HDCA for 26,099 sequences in the PDZ family as well as for an equal number of random sequences with amino acid compositions typical of natural proteins.

Fig. 2.


(A) Correlation of HAWSEM and HDCA. The points corresponding to sequences in the PDZ family are shown in blue, and those corresponding to an equal number of molten globule sequences are shown in red. The centers of the distributions are well separated along both coordinates, indicating that both models are able to distinguish native sequences from molten globule sequences. The correlation coefficient between the two models is R = 0.924, indicating that, for these sets of sequences, the models are very well correlated. The slope of the best fit line is −0.25, which corresponds to a selection temperature of 124 K. (B) Correlation between AWSEM and DCA Hamiltonian values for thermally occupied structures with different values of the fraction of native contacts formed, Q (indicated by the color bar), from a molecular dynamics simulation of the repressor protein CI (PDB ID code 1R69) (32) using the AWSEM potential. The two Hamiltonian values are highly correlated when evaluated over structures with a wide range of Q values.

The global correlation between the two landscapes, one obtained from a transferable energy function useful for structure prediction (AWSEM) and the other inferred from coevolutionary information for each family (DCA), is high (R = 0.924 for the PDZ family, and an average R of about 0.9 across all eight protein families). For the PDZ family, the slope of the best fit line corresponds, by Eq. 2, to a selection temperature of 124 K, well below the folding temperatures of most proteins. For the seven other protein families, the estimated selection temperatures are also well below physiological temperature. These values indicate that the landscapes have evolved to be quite funneled and that specificity of structure, and not mere stability, plays an important part in selection. If Tsel were equal to the folding temperature, Eq. 1 would give Tg = Tf; the landscapes would then be rugged, and we would be forced to conclude that energy gap selection played no role in evolution. We also compared the two Hamiltonian values as a function of the fraction of native contacts, Q, for partially folded structures sampled in an AWSEM folding simulation of the repressor CI protein (PDB ID code 1R69) (32). Fig. 2B shows that the two landscapes are also highly correlated over these thermal ensembles (R = 0.76).

When mutational stability data are available, Tsel can be found without using the transferable energy function, by comparing the mutational stability changes predicted by DCA with experimental data for single-site mutations (ΔΔG). If we assume that the changes in the entropy and energy of the molten globule states are negligible for a single-point mutation, and that there is no residual structure in the denatured state, then setting the energy change Δ(E − Ē) equal to ΔΔGexp implies that $T_{\mathrm{sel}} = -\Delta\Delta G/(k_B\,\Delta H_{\mathrm{DCA}})$, where $\Delta H_{\mathrm{DCA}} = H_{\mathrm{DCA}}(\mathrm{mutant}) - H_{\mathrm{DCA}}(\mathrm{WT})$. We can perform this calculation for the PDZ family, for which experimental data exist, and find Tsel = 116 K (Fig. S1). This estimate agrees well with the value of 124 K obtained from the comparison with the transferable energy function, which is the approach we use for the other families.
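The mutational route can also be sketched. Assuming a Potts-form DCA model with fields `h` and couplings `J` (here hypothetical random values), ΔHDCA for a point mutation is a difference of unnormalized scores, so the intractable log Z cancels. Tsel is then obtained from a least-squares fit of ΔΔG = −kB Tsel ΔHDCA through the origin. The through-origin fit, the sign convention (ΔΔG in kcal/mol, positive for destabilizing mutations, matching Δ(E − Ē)), and the helper names are assumptions of this sketch, and the ΔΔG values below are synthetic.

```python
import numpy as np

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def h_dca_score(seq, h, J):
    """Unnormalized H_DCA: sum_i h[i, a_i] + sum_{i<j} J[i, j, a_i, a_j] (log Z omitted)."""
    L = len(seq)
    total = sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            total += J[i, j, seq[i], seq[j]]
    return float(total)


def delta_h_dca(wt, pos, new_aa, h, J):
    """Delta H_DCA = H_DCA(mutant) - H_DCA(WT); log Z cancels in the difference."""
    mutant = list(wt)
    mutant[pos] = new_aa
    return h_dca_score(mutant, h, J) - h_dca_score(wt, h, J)


def t_sel_from_ddg(ddg, dh):
    """Least-squares slope of ddG vs Delta H_DCA through the origin; T_sel = -slope / k_B."""
    ddg = np.asarray(ddg, dtype=float)
    dh = np.asarray(dh, dtype=float)
    slope = np.dot(dh, ddg) / np.dot(dh, dh)
    return -slope / K_B


rng = np.random.default_rng(3)
L, q = 30, 21                                   # hypothetical: 30 sites, 20 amino acids + gap
h = rng.normal(0.0, 1.0, size=(L, q))
J = rng.normal(0.0, 0.1, size=(L, L, q, q))
wt = rng.integers(q, size=L)
mutations = [(5, 3), (12, 7), (20, 0), (27, 15)]  # (position, new amino acid index)
dh = [delta_h_dca(wt, pos, aa, h, J) for pos, aa in mutations]
ddg = [-K_B * 116.0 * x for x in dh]            # synthetic, noise-free ddG values
t_sel = t_sel_from_ddg(ddg, dh)
```

With real data, the ΔΔG values would come from experiment and carry noise, so the fitted Tsel would carry a corresponding uncertainty.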

Using the estimate of Tsel from the coevolutionary analysis of each family, along with the experimental Tf for a member of that family, Eq. 1 yields estimates of Tg and thus also of Tf/Tg for typical family members. Fig. 3 summarizes the calculated Tsel and Tg values for all of the protein families studied here. Fig. S2 shows how the estimated value of Tsel depends on the distance threshold used to determine which pairwise interactions are summed in obtaining HDCA, and Fig. S3 shows how the mean energy of a DCA residue–residue coupling depends on pairwise distance. Fig. 4, Upper displays the Tf/Tg ratios for all protein families, calculated using Eq. 1 with a distance threshold of 16 Å for the DCA energy. Fig. S4 shows how the ratio Tf/Tg depends on this distance threshold. The resulting Tf/Tg values are in the range of previous purely physicochemical estimates. Those earlier estimates were based on generic considerations for all proteins, whereas the present approach yields Tf/Tg values for individual protein families. Another quantity that can be used to quantify the degree of evolutionary sequence optimization is the ratio Tsel/Tg, which should be less than one for a funneled landscape. Pande et al. (21, 22) estimated Tsel/Tg by noting that the elements of the Miyazawa–Jernigan interaction matrix, being based on a quasichemical approximation, could be interpreted as pairwise interaction energies for pairs of amino acid types scaled by the selection temperature Tsel for all natural proteins considered as a single group (33). Combining this observation with the entropy of the disordered collapsed globule inferred by Luthey-Schulten et al. (34) from the theory of secondary structure formation in globules, energy landscape theory arguments then give their estimate of Tsel/Tg ∼0.85 for the set of all natural proteins (21). This estimate for Tsel/Tg leads to an estimate of Tf/Tg = 1.6, quite close to the value obtained in the work by Onuchic et al. (26) on purely physical grounds without using sequence information. Both the estimates by Onuchic et al. (26) and Pande et al. (21, 22) for Tf/Tg turn out to be on the low end of the values found here for individual families.
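Given Tsel from the coevolutionary analysis and an experimental Tf, Eq. 1 can be solved for Tg as in the following minimal Python sketch. The function names are hypothetical; the Tsel of 124 K is the PDZ value quoted above, while the Tf of 330 K is an illustrative placeholder rather than the measured folding temperature of any family member.

```python
import math


def glass_temperature(t_f, t_sel):
    """Solve Eq. 1, 2/(T_f T_sel) = 1/T_g^2 + 1/T_f^2, for T_g (temperatures in K)."""
    inv_tg_sq = 2.0 / (t_f * t_sel) - 1.0 / t_f**2
    if inv_tg_sq <= 0.0:
        raise ValueError("Eq. 1 has a real T_g only when T_sel < 2 * T_f")
    return 1.0 / math.sqrt(inv_tg_sq)


def landscape_ratios(t_f, t_sel):
    """Return (T_g, T_f/T_g, T_sel/T_g); funneled landscapes have T_f/T_g > 1."""
    t_g = glass_temperature(t_f, t_sel)
    return t_g, t_f / t_g, t_sel / t_g


t_g, f_ratio, s_ratio = landscape_ratios(t_f=330.0, t_sel=124.0)
print(f"T_g = {t_g:.1f} K, Tf/Tg = {f_ratio:.2f}, Tsel/Tg = {s_ratio:.2f}")
```

With these inputs the ratios fall in the funneled regime (Tf/Tg above 1 and Tsel/Tg below 1); setting `t_sel` equal to `t_f` returns Tg = Tf, the rugged limit.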

Fig. 3.


Tsel, Tg, and Tf values in Kelvin for all protein families included in this study (denoted by the PDB ID codes of the representative structures used) are plotted vs. protein length. The names of the proteins and a list of references for the experimentally obtained Tf values are given in Table S1. The value of Tsel obtained by comparing stability changes predicted using DCA with experimental ΔΔG values directly is also given for the one family for which data are available (PDZ). In all cases, the experimental folding temperatures are above physiological temperature (∼310 K), whereas the glass transition and selection temperatures are well below physiological temperature, indicating that selection of protein sequences by evolution leads to funneled folding landscapes for natural proteins.

Fig. 4.


(Upper) Tf/Tg ratios for all protein families studied. (Lower) Tsel/Tg ratios for all protein families studied. The families are denoted by the PDB ID codes of the representative structures used, and the names of the proteins are given in Table S1. Tf/Tg quantifies the degree of funneling of a folding landscape, with higher values corresponding to more ideal funnels. The estimated Tf/Tg ratios for all natural protein families studied here lie above the threshold for a landscape to be considered funneled, Tf/Tg = 1, which is plotted as a green dashed horizontal line. Several of the estimates cluster around the value Tf/Tg = 2.5 estimated by Clementi and Plotkin (28). Tsel/Tg quantifies the degree of evolutionary optimization, with lower values corresponding to more highly optimized sequences. Most of the Tsel/Tg ratios for individual families are below the generic estimate of Tsel/Tg = 0.85 given by Pande et al. (21).

The energy gap between the folded and unfolded states, together with the folding temperature Tf, also allows an estimate of the entropy of the unfolded state through the first-order transition condition ΔF(Tf) = ΔE(Tf) − TfΔS(Tf) = 0, which gives ΔS(Tf) = ΔE(Tf)/Tf. The resulting entropy per residue S(Tf)/N for each family is given in Table S2. Most of these entropy values fall in the range of 0.7–1.1 kB per residue, consistent with, but slightly larger than, the entropy estimates for the collapsed state by Luthey-Schulten et al. (34) that were used to give the original physical estimate of Tf/Tg ∼ 1.66. These coevolution-based estimates of the entropy of the unfolded state agree quite well with the earlier numbers for the two all-α-helical proteins [repressor protein CI (PDB ID code 1R69) (32) and DnaB helicase (PDB ID code 1JWE) (35)] but tend to be somewhat higher for families whose structures contain β-secondary structure elements.
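As a worked example of this step, the condition ΔF(Tf) = 0 gives ΔS = ΔE/Tf, so the entropy per residue in units of kB is ΔE/(N kB Tf). The sketch below assumes that ΔE is the folding energy gap in kcal/mol (positive, unfolded minus folded), uses a hypothetical function name, and takes illustrative input values, not the values behind Table S2.

```python
K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def unfolded_entropy_per_residue(delta_e, n_residues, t_f):
    """S(T_f)/N in units of k_B from Delta F(T_f) = Delta E - T_f * Delta S = 0.

    delta_e    : energy gap E_unfolded - E_folded at T_f, in kcal/mol (> 0)
    n_residues : chain length N
    t_f        : folding temperature in K
    """
    delta_s = delta_e / t_f                 # entropy change in kcal/(mol K)
    return delta_s / (n_residues * K_B)     # dimensionless, per residue


# Hypothetical inputs: a 50 kcal/mol gap for an 80-residue protein with T_f = 340 K
s_per_res = unfolded_entropy_per_residue(delta_e=50.0, n_residues=80, t_f=340.0)
```

These illustrative inputs give roughly 0.93 kB per residue, which happens to fall within the 0.7–1.1 kB range reported for the families studied here.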

Discussion

The early attempt by Onuchic et al. (26) to quantify the funneled nature of the landscape set up a correspondence between the thermodynamics and dynamics of optimized two- and three-letter lattice model proteins and those of natural proteins. The Tf/Tg ratios found from coevolutionary analysis are higher than those first estimates. This difference suggests that evolution uses a (somewhat) more complex code than the three-letter code that gave Tf/Tg ∼ 1.6. Clementi and Plotkin (28) arrived at another purely physics-based estimate of Tf/Tg by asking how strongly a structure-based folding model with a perfectly funneled landscape could be perturbed by the addition of nonnative interactions while still recapitulating experimental kinetics, which are usually consistent with nearly perfectly funneled landscapes and are known to be well predicted by the idealized pure funnel limit (27). By tuning the strength of the nonnative interactions and calculating the corresponding folding and glass transition temperatures for this worst tolerable case, Clementi and Plotkin (28) determined that a degree of frustration corresponding to Tf/Tg ∼ 2.5 would be a lower limit for consistency with laboratory observations of the folding kinetics of real proteins. Another estimate of Tf/Tg uses the fact that theory and simulations agree that the degree of cooperativity in equilibrium folding depends on Tf/Tg. Building on this agreement and using experimental input about the sharpness of thermal unfolding, Kaya and Chan (27) estimated that the ratio Tf/Tg is probably greater than six for calorimetrically two-state proteins. Our estimates, based on an optimized physical energy function and an information theoretic model for the global sequence probability derived from multiple sequence alignments, fall in the middle of the range of these previous physically based estimates.

DCA is a global statistical model for the sequences of a given protein family that allows for pairwise interactions between all residues in the protein, not just between pairs in physical contact in the native state. The correlation between experimental ΔΔG values and those predicted by DCA is best when interactions between residues separated by up to 16 Å in the native state are included (Fig. S5). This distance is beyond the range of the mediated contacts used in AWSEM (9.5 Å). One possible explanation for this dependence on apparently long-range interactions is that DCA is imperfect at finding the true direct interactions, because it relies on statistical mechanical approximations rather than an exact solution of the sequence Potts model, which is currently computationally intractable. At the same time, we must entertain the possibility that these distant interactions are real and not artifacts.
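The distance threshold enters only through which couplings are summed. The following is a minimal sketch, assuming one representative coordinate per residue (the choice of atom is an assumption here) and a Potts-form model with fields `h` and couplings `J`; the function names, coordinates, and parameters are hypothetical.

```python
import numpy as np


def pair_mask(coords, cutoff):
    """Boolean (L, L) mask of residue pairs closer than `cutoff` (in Å) in the native structure."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    mask = dist < cutoff
    np.fill_diagonal(mask, False)
    return mask


def h_dca_with_cutoff(seq, h, J, mask):
    """Unnormalized H_DCA keeping only couplings J[i, j] for pairs with mask[i, j] True."""
    L = len(seq)
    total = h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            if mask[i, j]:
                total += J[i, j, seq[i], seq[j]]
    return float(total)


rng = np.random.default_rng(4)
L, q = 40, 21
coords = rng.uniform(0.0, 30.0, size=(L, 3))    # hypothetical residue coordinates in Å
h = rng.normal(0.0, 1.0, size=(L, q))
J = rng.normal(0.0, 0.1, size=(L, L, q, q))
seq = rng.integers(q, size=L)

h_short = h_dca_with_cutoff(seq, h, J, pair_mask(coords, 9.5))   # AWSEM-like contact range
h_long = h_dca_with_cutoff(seq, h, J, pair_mask(coords, 16.0))   # threshold used for Fig. 4
```

Scanning the cutoff in this way and recomputing ΔHDCA for each mutation is one simple way to reproduce the kind of threshold dependence described for Figs. S2, S4, and S5.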

Several studies note that current force field-based methods for predicting ΔΔG upon mutation using fixed backbones perform only moderately well. We found that ΔHAWSEM, like other fixed backbone methods, correlates reasonably well, but not perfectly, with a large database of experimental ΔΔG data (Fig. S6). DCA is a fold-specific model of the energy and is therefore poised to detect forms of energy that are particular to the symmetry-broken native state, which, much as a crystal responds to interstitials, can respond collectively to site mutations. Elastic effects arising from harmonic deviations of the structure of a particular protein from the mean family structure may thus be important. If so, predicting the effect of mutations on the relative stability of the folded and unfolded states from any fixed backbone structure will be inadequate. The limitations of the fixed backbone approximation in predicting the natural covariation of amino acids have recently been noted (36). The long-range interactions inferred by DCA may be relics of these elastic effects. If so, accounting for such elastic effects may be crucial for correctly predicting the effects of mutation on protein stability, even when using highly accurate coarse-grained potentials. It is also possible that DCA captures mutational changes of residual structure in the denatured state, a possibility neglected by the assumed complete mixing approximation for the unfolded compact states. All of these effects could contribute to the high correlation between DCA and experimental ΔΔG data; the correlations of both DCA (R = 0.84) and AWSEM (R = 0.73) with experimental ΔΔG data for the PDZ family are compared in Fig. S1.

We have shown that genomic data, accurate coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. These estimates invariably indicate that the energy landscapes of natural foldable proteins are highly funneled. The degree of funneling found by these methods is consistent with previous estimates based on general physicochemical considerations. Comparing the details of the physical and information theoretic models has already suggested ways of improving the prediction of mutational effects on the stability of protein sequence/structure pairs. Knowing the degree of funneling of natural proteins will be helpful to protein design practitioners who wish to mimic natural proteins (37). Additional application, development, and comparison of physical and information theoretic models of protein energy landscapes will greatly enhance our understanding of these critical biological macromolecules and the part that folding physics has played in their evolutionary history.

Supplementary Material

Supporting Information

Acknowledgments

This research was supported by National Science Foundation INSPIRE Award MCB-1241332, National Institutes of Health Grant R01 GM44557, and the Center for Theoretical Biological Physics sponsored by National Science Foundation Grants PHY-1427654 and MCB-1214457 and the Cancer Prevention and Research Institute of Texas. Additional support was also provided by the D. R. Bullard-Welch Chair at Rice University.

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1413575111/-/DCSupplemental.

References

  • 1.Bornberg-Bauer E, Chan HS. Modeling evolutionary landscapes: Mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci USA. 1999;96(19):10689–10694. doi: 10.1073/pnas.96.19.10689. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Zeldovich KB, Shakhnovich EI. Understanding protein evolution: From protein physics to Darwinian selection. Annu Rev Phys Chem. 2008;59(2008):105–127. doi: 10.1146/annurev.physchem.58.032806.104449. [DOI] [PubMed] [Google Scholar]
  • 3.Bryngelson JD, Wolynes PG. Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci USA. 1987;84(21):7524–7528. doi: 10.1073/pnas.84.21.7524. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Ramanathan S, Shakhnovich E. Statistical mechanics of proteins with evolutionary selected sequences. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 1994;50(2):1303–1312. doi: 10.1103/physreve.50.1303. [DOI] [PubMed] [Google Scholar]
  • 5.Wolynes PG, Onuchic JN, Thirumalai D. Navigating the folding routes. Science. 1995;267(5204):1619–1620. doi: 10.1126/science.7886447. [DOI] [PubMed] [Google Scholar]
  • 6.Mirny LA, Abkevich V, Shakhnovich EI. Universality and diversity of the protein folding scenarios: A comprehensive analysis with the aid of a lattice model. Fold Des. 1996;1(2):103–116. doi: 10.1016/S1359-0278(96)00019-3. [DOI] [PubMed] [Google Scholar]
  • 7.Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opin Struct Biol. 2004;14(1):70–75. doi: 10.1016/j.sbi.2004.01.009. [DOI] [PubMed] [Google Scholar]
  • 8.Finkelstein AV, Badretdinov AY, Gutin AM. Why do protein architectures have boltzmann-like statistics? Proteins: Struct Funct Bioinform. 1995;23(2):142–150. doi: 10.1002/prot.340230204. [DOI] [PubMed] [Google Scholar]
  • 9. Saven JG, Wolynes PG. Statistical mechanics of the combinatorial synthesis and analysis of folding macromolecules. J Phys Chem B. 1997;101(41):8375–8389.
  • 10. Meyerguz L, Grasso C, Kleinberg J, Elber R. Computational analysis of sequence selection mechanisms. Structure. 2004;12(4):547–557. doi: 10.1016/j.str.2004.02.018.
  • 11. Mintseris J, Weng Z. Structure, function, and evolution of transient and obligate protein–protein interactions. Proc Natl Acad Sci USA. 2005;102(31):10930–10935. doi: 10.1073/pnas.0502667102.
  • 12. Lovell SC, Robertson DL. An integrated view of molecular coevolution in protein–protein interactions. Mol Biol Evol. 2010;27(11):2567–2575. doi: 10.1093/molbev/msq144.
  • 13. Fersht A. Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding. New York: Macmillan; 1999.
  • 14. Yomo T, Saito S, Sasai M. Gradual development of protein-like global structures through functional selection. Nat Struct Mol Biol. 1999;6(8):743–746. doi: 10.1038/11512.
  • 15. Süel GM, Lockless SW, Wall MA, Ranganathan R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Mol Biol. 2003;10(1):59–69. doi: 10.1038/nsb881.
  • 16. Monsellier E, Chiti F. Prevention of amyloid-like aggregation as a driving force of protein evolution. EMBO Rep. 2007;8(8):737–742. doi: 10.1038/sj.embor.7401034.
  • 17. Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J. 1986;5(4):823–826. doi: 10.1002/j.1460-2075.1986.tb04288.x.
  • 18. Sulkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN. Genomics-aided structure prediction. Proc Natl Acad Sci USA. 2012;109(26):10340–10345. doi: 10.1073/pnas.1207864109.
  • 19. Marks DS, Hopf TA, Sander C. Protein structure prediction from sequence variation. Nat Biotechnol. 2012;30(11):1072–1080. doi: 10.1038/nbt.2419.
  • 20. Plotkin SS, Wang J, Wolynes PG. Statistical mechanics of a correlated energy landscape model for protein folding funnels. J Chem Phys. 1997;106(7):2932–2948.
  • 21. Pande VS, Grosberg AY, Tanaka T. Heteropolymer freezing and design: Towards physical models of protein folding. Rev Mod Phys. 2000;72(1):259–314.
  • 22. Pande VS, Grosberg AY, Tanaka T. Statistical mechanics of simple models of protein folding and design. Biophys J. 1997;73(6):3192–3210. doi: 10.1016/S0006-3495(97)78345-0.
  • 23. Plotkin SS, Wang J, Wolynes PG. Correlated energy landscape model for finite, random heteropolymers. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 1996;53(6):6271–6296. doi: 10.1103/physreve.53.6271.
  • 24. Shimizu S, Chan HS. Temperature dependence of hydrophobic interactions: A mean force perspective, effects of water density, and nonadditivity of thermodynamic signatures. J Chem Phys. 2000;113(11):4683–4700.
  • 25. Chan HS, Zhang Z, Wallin S, Liu Z. Cooperativity, local-nonlocal coupling, and nonnative interactions: Principles of protein folding from coarse-grained models. Annu Rev Phys Chem. 2011;62:301–326. doi: 10.1146/annurev-physchem-032210-103405.
  • 26. Onuchic JN, Wolynes PG, Luthey-Schulten Z, Socci ND. Toward an outline of the topography of a realistic protein-folding funnel. Proc Natl Acad Sci USA. 1995;92(8):3626–3630. doi: 10.1073/pnas.92.8.3626.
  • 27. Kaya H, Chan HS. Polymer principles of protein calorimetric two-state cooperativity. Proteins. 2000;40(4):637–661. doi: 10.1002/1097-0134(20000901)40:4<637::aid-prot80>3.0.co;2-4.
  • 28. Clementi C, Plotkin SS. The effects of nonnative interactions on protein folding rates: Theory and simulation. Protein Sci. 2004;13(7):1750–1766. doi: 10.1110/ps.03580104.
  • 29. Finn RD, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38(Database issue):D211–D222. doi: 10.1093/nar/gkp985.
  • 30. Davtyan A, et al. AWSEM-MD: Protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J Phys Chem B. 2012;116(29):8494–8503. doi: 10.1021/jp212541y.
  • 31. Walma T, et al. Structure, dynamics and binding characteristics of the second PDZ domain of PTP-BL. J Mol Biol. 2002;316(5):1101–1110. doi: 10.1006/jmbi.2002.5402.
  • 32. Mondragón A, Subbiah S, Almo SC, Drottar M, Harrison SC. Structure of the amino-terminal domain of phage 434 repressor at 2.0 Å resolution. J Mol Biol. 1989;205(1):189–200. doi: 10.1016/0022-2836(89)90375-6.
  • 33. Miyazawa S, Jernigan RL. Estimation of effective interresidue contact energies from protein crystal structures: Quasi-chemical approximation. Macromolecules. 1985;18(3):534–552.
  • 34. Luthey-Schulten Z, Ramirez BE, Wolynes PG. Helix-coil, liquid crystal, and spin glass transitions of a collapsed heteropolymer. J Phys Chem. 1995;99(7):2177–2185.
  • 35. Weigelt J, Brown SE, Miles CS, Dixon NE, Otting G. NMR structure of the N-terminal domain of E. coli DnaB helicase: Implications for structure rearrangements in the helicase hexamer. Structure. 1999;7(6):681–690. doi: 10.1016/s0969-2126(99)80089-6.
  • 36. Ollikainen N, Kortemme T. Computational protein design quantifies structural constraints on amino acid covariation. PLoS Comput Biol. 2013;9(11):e1003313. doi: 10.1371/journal.pcbi.1003313.
  • 37. Shakhnovich EI, Gutin AM. A new approach to the design of stable proteins. Protein Eng. 1993;6(8):793–800. doi: 10.1093/protein/6.8.793.

Associated Data

Supplementary Materials

Supporting Information

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences