Protein Science: A Publication of the Protein Society. 2015 Aug 10;25(1):111–122. doi: 10.1002/pro.2758

Constructing sequence‐dependent protein models using coevolutionary information

Ryan R Cheng 1, Mohit Raghunathan 1,2, Jeffrey K Noel 1,2, José N Onuchic 1,2
Editors: Carol B Post, Charles L Brooks III
PMCID: PMC4815312  PMID: 26223372

Abstract

Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid sites within the multiple sequence alignment of a protein family. Here, we use the maximum entropy‐based approach called mean field Direct Coupling Analysis (mfDCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family. We use the inferred pairwise statistical couplings to generate the sequence‐dependent heterogeneous interaction energies of a structure‐based model (SBM) where only native contacts are considered. Considering the ribosomal S6 protein and its circular permutants as well as the SH3 protein, we demonstrate that these models quantitatively agree with experimental data on folding mechanisms. This work serves as a new framework for generating coevolutionary data‐enriched models that can potentially be used to engineer key functional motions and novel interactions in protein systems.

Keywords: coarse‐grained protein models, coevolutionary information, statistical inference, computational biophysics


Abbreviations

SBM: structure‐based models

DCA: Direct Coupling Analysis

mfDCA: mean field Direct Coupling Analysis

MSA: multiple sequence alignment

WHAM: weighted histogram analysis method

FEP: free energy perturbation

TSE: transition state ensemble

Introduction

Early work applying the statistical mechanics of spin‐glasses to proteins laid the foundation for the theory of protein folding.1 Subsequent advances led to the development of the energy landscape theory of protein folding2, 3 and the modern view that proteins fold as the ensemble of accessible structures is funneled by the underlying energy landscape into a unique native structure. A consequence of this theory is the view that naturally selected proteins are minimally frustrated.

This theoretical picture of protein folding has led to the development of idealized, minimally frustrated protein models called structure‐based models (SBM)4, 5, 6 for studying protein folding and function.7 Consistent with earlier models,8, 9 SBMs encode structural information into the Hamiltonian of a protein model by conceptualizing that the dominant interactions consist only of the interactions found in the native structure. Furthermore, SBMs typically assume that all native contacts are stabilizing as well as equal in their strength (homogeneous). While examples exist where non‐native interactions may play a significant role in folding,10 SBMs are able to capture the highly cooperative nature of the folding transition and are especially successful for studying the folding of proteins where topological effects and entropic barriers dictate the folding mechanisms.6 However, these models cannot systematically capture the effect of stabilizing mutations while destabilizing mutations can only be addressed through coarse approximations. One could mimic the effect of a destabilizing mutation to a particular residue by deleting the stabilizing contacts that it forms in the native state, but this coarse approximation is unable to distinguish between varying degrees of destabilization.

In principle, all‐atom representations of proteins with explicit solvent interactions can capture the effects of mutational stabilization or destabilization (sequence level effects) but are too computationally expensive for exploring the underlying energy landscapes and sampling multiple folding and unfolding transitions. Coarse‐grained optimized transferable potentials11, 12, 13, 14 are one prominent approach for exploring sequence‐dependent effects in proteins. Likewise, it has been shown that experimental mutational changes in stability15 or the native‐basin fluctuations in an all‐atom implicit solvent simulation16 can be used as constraints to obtain coarse‐grained protein models with heterogeneous interaction energies. Here, we adopt a novel approach using only sequence and structural data, where we consider the feasibility of constructing a sequence‐dependent coarse‐grained protein model by using sequence data to supplement an SBM.

Recent developments in global statistical inference using maximum entropy modeling have led to a number of advances, particularly in the areas of protein structure prediction (see reviews17, 18). Maximum entropy modeling has also been applied to a diverse range of topics such as drug resistance,19, 20 evolutionary fitness,21 neural networks,22 self‐driven particles,23 and bacterial signaling systems.24, 25, 26, 27, 28 The development of Direct Coupling Analysis (DCA)27, 29, 30 has advanced the study of coevolutionary data by inferring the underlying Potts model Hamiltonian that governs the correlated mutations in a protein family. In particular, the mean field approach (mfDCA)29 makes use of an analytical approximation to perform the inference procedure through a single computationally inexpensive step. It has been demonstrated that the inferred couplings from DCA are highly correlated with experimental mutational changes in protein stability31, 32, 33 and physical protein–protein interaction mechanisms,26 suggesting a direct relationship between the statistical couplings and the pairwise interaction energies of a realistic protein model. Furthermore, the inferred coevolutionary information has allowed for the quantification of the degree that evolved proteins are minimally frustrated,32 consistent with earlier theoretical estimates34, 35 (see review36). Motivated by these findings, our goal is to construct an SBM where the statistical couplings from mfDCA are used to describe the strength of native contact interactions.

Earlier work involved the enrichment of SBMs with coevolutionary data that encodes the functional conformations of a protein37 or complex.38 These studies identified the strongest co‐evolving pairs of sites in a protein with a metric called Direct Information29 and incorporated them into the Hamiltonian of an SBM as homogeneous, stabilizing contacts. Here, we supplement an SBM that is coarse‐grained on the Cα level (i.e., one bead per residue) and adopt the strength of our native contacts from the inferred Potts model couplings of mfDCA, i.e., Jij(Ai, Aj), which depend on positions i and j in a protein and the amino acids at those positions, Ai and Aj, respectively. We require that the resulting contact energies, εij, summed over all native contacts, equal the total energetic stabilization of native contacts in a homogeneous SBM (i.e., $\sum_{(i,j)} \varepsilon_{ij} = N\varepsilon$, where N is the number of native contacts and ε is the mean contact energy strength in the SBM). A natural way of incorporating these heterogeneous couplings into an SBM is by linearly mixing them with the interaction energies of a homogeneous SBM with a mixing parameter χ. Our mixing condition interpolates between χ = 0 (fully homogeneous) and χ = 1 (fully heterogeneous), allowing χ to control the standard deviation of the energetic heterogeneity while enforcing a constant mean strength of native contact.39 Additional details of our model are discussed in the Materials and Methods section.

We focus on two well‐studied protein systems: Ribosomal S6, for which energetic heterogeneity plays a significant role in its folding mechanism,15, 40, 41 and SH3, for which energetic heterogeneity is secondary to geometry in dictating the folding mechanism.6, 42 We construct DCA‐enriched SBMs and explore them using molecular dynamics simulations that sample many folding and unfolding transitions at the folding temperature, Tf. We compare our simulation results with experimental data characterizing the folding mechanisms, namely the so‐called Φ‐value analysis,43, 44 which characterizes the transition state ensemble through mutational changes in stability. We find that increasing the weight of heterogeneous interactions (increasing χ) tends to improve the quantitative agreement of our models with experimental data on folding mechanisms. However, increasing χ also coincides with a loss of co‐operativity as well as the disappearance of the free energy barrier separating unfolded and folded states, which is consistent with earlier work on SBMs with heterogeneous contacts39 and theory.45 The general feature of reduced co‐operativity in Cα‐based SBMs has previously been observed even for SBMs with homogeneous contact strengths,46 which can potentially be recovered through the incorporation of, for example, barriers associated with the removal of water to bring hydrophobic residues together.47, 48, 49 For simplicity, we did not consider desolvation barriers and chose to focus on supplementing a traditional SBM, and hence, we were not able to explore models approaching χ = 1 and focus on models constructed in the vicinity of χ = 0.5 as a matter of practicality. Despite its simplicity, the class of SBMs that we introduce serves as a potential framework for the engineering of proteins. By building a global statistical model from large collections of sequence data, one could identify mutations that strengthen or weaken desired interactions in a protein model.

Materials and Methods

Aligned sequences for protein families

We obtained the multiple sequence alignments (MSA) from Pfam50 (version 27) for the protein families that were studied: Ribosomal S6 (PF01250) and SH3 (PF00018). All residue inserts were removed from the data sets such that the aforementioned families have fixed lengths of L = 92 and L = 48, respectively.
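The insert-removal step can be illustrated with a minimal Python sketch. It assumes the common Pfam/HMMER alignment convention in which insert states are written as lowercase letters and '.', while match-state gaps are written as '-'. The function names are hypothetical and this is not the authors' preprocessing script.

```python
def strip_inserts(aligned_seq: str) -> str:
    """Drop insert-state characters from one aligned sequence.

    Assumption: inserts are lowercase residues or '.', and match-state
    gaps are '-', as in Pfam/HMMER-style alignments.
    """
    return "".join(c for c in aligned_seq if not (c.islower() or c == "."))


def clean_alignment(aligned_seqs):
    """Strip inserts from every sequence and check for a fixed length L."""
    cleaned = [strip_inserts(s) for s in aligned_seqs]
    lengths = {len(s) for s in cleaned}
    if len(lengths) > 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    return cleaned
```

After this step every sequence in a family should have the same length L, which is what the downstream frequency counts rely on.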

Direct coupling analysis (DCA)

We infer the underlying Potts model Hamiltonian that governs the correlated mutations in a particular protein family,

$$\mathrm{sequence} = (A_1, A_2, \ldots, A_L), \qquad H(\mathrm{sequence}) = -\sum_{1 \le i < j \le L} J_{ij}(A_i, A_j) - \sum_{i=1}^{L} h_i(A_i) \quad (1)$$

where L is the length of the MSA, Ai is the amino acid at site i for a sequence in the MSA, Jij(Ai, Aj) is the pairwise statistical coupling between sites i and j in the MSA with amino acids Ai and Aj, respectively, and hi(Ai) is the local field for site i. We perform the inference using the maximum entropy‐based approach called mean field Direct Coupling Analysis (mfDCA),29 for which it has been shown that the pairwise couplings can be approximated from the inverse of the connected correlation matrix, C:

$$J_{ij}(A_i, A_j) \approx -C^{-1}_{ij}(A_i, A_j), \qquad C_{ij}(A_i, A_j) = P_{ij}(A_i, A_j) - P_i(A_i)\,P_j(A_j) \quad (2)$$

where Pij(Ai, Aj) and Pi(Ai) are the pairwise and single‐site amino acid distributions in the MSA. The local fields are obtained from self‐consistent equations.29 While higher order corrections to Eq. (2) are known, such as the Thouless–Anderson–Palmer solution,51 the inference method of Eq. (2) was used due to its simplicity and its quantitative agreement with mutational changes in protein stability.31, 32, 33
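The mean field inference step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation. The pseudocount regularization and the removal of the last amino acid state (which makes the correlation matrix invertible) are assumptions taken from common mfDCA practice, and sequence reweighting is omitted for brevity. The sign follows the convention in which favorable couplings are positive, so the couplings are the negative of the inverse correlation matrix.

```python
import numpy as np


def mfdca_couplings(msa, q=21, pseudocount=0.5):
    """Mean-field couplings from an integer-encoded MSA of shape (M, L).

    Entries of `msa` are states 0..q-1. Returns J with shape (L, L, q, q)
    in the gauge where couplings involving the last state are zero.
    `pseudocount` is an assumed regularization, not a value from the text.
    """
    msa = np.asarray(msa)
    n_seq, length = msa.shape
    onehot = np.eye(q)[msa]                              # (M, L, q)
    f_i = onehot.mean(axis=0)                            # single-site frequencies
    f_ij = np.einsum("mia,mjb->ijab", onehot, onehot) / n_seq

    # Regularize frequencies toward the uniform distribution.
    f_i = (1.0 - pseudocount) * f_i + pseudocount / q
    f_ij = (1.0 - pseudocount) * f_ij + pseudocount / q**2
    for i in range(length):                              # same-site blocks are diagonal
        f_ij[i, i] = np.diag(f_i[i])

    # Connected correlations, with the last state removed.
    corr = f_ij - f_i[:, None, :, None] * f_i[None, :, None, :]
    corr = corr[:, :, : q - 1, : q - 1]
    dim = length * (q - 1)
    corr_mat = corr.transpose(0, 2, 1, 3).reshape(dim, dim)

    # Mean-field approximation: J = -C^{-1}.
    j_mat = -np.linalg.inv(corr_mat)
    couplings = np.zeros((length, length, q, q))
    couplings[:, :, : q - 1, : q - 1] = (
        j_mat.reshape(length, q - 1, length, q - 1).transpose(0, 2, 1, 3)
    )
    return couplings
```

The couplings returned here are in a gauge set by the dropped state; converting them to another gauge is a separate step.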

Solutions to this type of inference problem exhibit a gauge freedom associated with being able to add energy to the inferred fields that can be subtracted from the couplings to maintain a constant H as well as shifting the reference energy of the model by adding a constant to H. Here, we fix the gauge by adopting the zero‐sum condition:

$$\sum_{A_i=1}^{q} J_{ij}(A_i, A_j) = \sum_{A_j=1}^{q} J_{ij}(A_i, A_j) = \sum_{A_i=1}^{q} h_i(A_i) = 0 \quad (3)$$

where the summations are over all choices of amino acids at sites i or j, respectively, which are indexed from 1 to q = 21 for the 20 amino acids and the MSA gap. The zero‐sum condition was adopted because it ensures that the distribution of H for the ensemble of random sequences has a mean of 0. Hence, Eq. (1) under our gauge choice is set with respect to the energy of random sequences. Furthermore, we only use the inferred couplings from the Potts model to construct our data‐enriched SBM and we do not use the inferred local fields. It should be noted that this gauge choice does not mean that the statistical couplings, Jij(Ai, Aj), average to zero over the naturally occurring pairs of amino acids at positions i and j. Coevolving amino acid combinations in naturally occurring sequences will have a positive statistical coupling while combinations of amino acids not realized in nature at those positions will be penalized (negative). As a result of using the zero‐sum condition, realistic protein sequences will always have a net stabilizing (positive) sum of statistical couplings (see following subsection for more details on how the statistical couplings are used to construct a protein model).
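As an illustration of the zero‐sum condition on the couplings, the following sketch shifts each q × q coupling block so that its row and column sums vanish. It assumes the couplings are stored as an array of shape (L, L, q, q). The terms removed from the couplings would be absorbed into the local fields, which are not tracked here because only the couplings are used in the model. The function name is hypothetical.

```python
import numpy as np


def to_zero_sum_gauge(couplings):
    """Return couplings whose sums over either amino acid index are zero.

    couplings: array of shape (L, L, q, q). Only the coupling part of the
    gauge transformation is shown; the matching shift of the fields is omitted.
    """
    couplings = np.asarray(couplings, dtype=float)
    mean_over_a = couplings.mean(axis=2, keepdims=True)    # average over A_i
    mean_over_b = couplings.mean(axis=3, keepdims=True)    # average over A_j
    mean_over_ab = couplings.mean(axis=(2, 3), keepdims=True)
    return couplings - mean_over_a - mean_over_b + mean_over_ab
```

Applying the function twice gives the same result as applying it once, which is a quick consistency check on the gauge choice.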

Coevolutionary data‐enriched structure‐based models

We obtained a representative structure from PDB52 of the native state conformation of ribosomal S6 (PDB: 1RIS53) and SH3 (PDB: 1FMK54), respectively. Structures of the circular permutants of S6 were modeled from the wild‐type S6 structure. This was done by cutting one of the five loop regions of the 1RIS structure (i.e., between residues 13–14, 33–34, 54–55, 68–69, and 81–82) and connecting the N and C termini to create the permutants P13–14, P33–34, P54–55, P68–69, and P81–82, respectively. Using the native structures, we constructed protein models consisting of one bead per residue by treating all the Cα atoms as beads and using their positions to encode the native topology in the angles, dihedrals, and bonded terms, as in a traditional Cα‐based SBM6 from the SMOG server (http://smog-server.org).5
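The change in chain connectivity for a circular permutant can be pictured with a small sketch that reorders residue numbers. It is a simplified illustration: it ignores any linker residues or coordinate adjustments that modeling the joined termini might require, and the function name and the example chain length are hypothetical.

```python
def circular_permutant_order(n_residues: int, cut_after: int) -> list:
    """Wild-type residue numbers (1-based) in the order of the permutant chain.

    The chain is cut between residues `cut_after` and `cut_after + 1`, and the
    original C terminus is joined to the original N terminus.
    """
    if not 1 <= cut_after < n_residues:
        raise ValueError("cut_after must lie inside the chain")
    return list(range(cut_after + 1, n_residues + 1)) + list(range(1, cut_after + 1))


# Example: a cut between residues 13 and 14, as for permutant P13-14,
# on a chain whose length (here 20) is chosen only for illustration.
order = circular_permutant_order(20, 13)
```

The native contact list itself is unchanged by the permutation; only the bonded connectivity along the chain differs.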

The nonbonded native contacts were determined from the representative structure by considering all residue–residue heavy atom contacts within a 6 Å cutoff distance. The strength of the native contact between positions i and j in a protein is determined by linearly interpolating between the homogeneous and heterogeneous models that are always enforced to have a mean strength of ε:

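$$\varepsilon_{ij}(A_i, A_j) = (1 - \chi)\,\varepsilon + \chi\,\varepsilon\,\frac{J_{ij}(A_i, A_j)}{\bar{J}} \quad (4A)$$

$$\bar{J} = \frac{1}{N} \sum_{(i,j)\,\in\,\text{native contacts}} J_{ij}(A_i, A_j) \quad (4B)$$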

where ε is the strength of a contact in the homogeneous SBM, χ is the parameter that interpolates between χ = 0 (homogeneous) and χ = 1 (heterogeneous), Jij(Ai, Aj) is the pairwise statistical coupling between positions i and j in the MSA [Eq. (2)], $\bar{J}$ is the average of the sequence‐specific statistical couplings over all residue pairs that are native contacts in a representative structure, and N is the number of native contacts. Note that positions i and j in Eq. (4) correspond to positions in the protein structure, whereas i and j in Eqs. (1), (2), (3) correspond to positions in the MSA. Hence, a mapping from MSA positions to protein positions must be performed when going from Eqs. (1), (2), (3) to Eq. (4). Equation (4) guarantees that the sum of the nonbonded interaction strengths is equal to a fixed value, i.e., $\sum_{(i,j)} \varepsilon_{ij} = N\varepsilon$, where N is the number of native contacts. Furthermore, this mixing condition interpolates between an SBM with fully homogeneous native contacts with strength ε at χ = 0 and a heterogeneous SBM with weights determined by the DCA couplings at χ = 1. When εij [Eq. (4)] is positive, the distance dependence of the contact is described using a Gaussian contact potential55 of the form:

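$$U(r_{ij}) = \varepsilon_{ij}\left[\left(1 + \frac{1}{\varepsilon_{ij}}\left(\frac{\sigma_{\mathrm{excl}}}{r_{ij}}\right)^{12}\right)\left(1 - \exp\left(-\frac{(r_{ij} - r_{0}^{ij})^{2}}{2\sigma^{2}}\right)\right) - 1\right] \quad (5)$$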

where σexcl = 4 Å, rij is the distance between Cα beads that are native contacts, r0ij is the native distance between Cα beads that are native contacts, and σ = 0.5 Å. However, when εij [Eq. (4)] is negative, a destabilizing tanh function is used:

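$$U(r_{ij}) = \frac{1}{2}\,|\varepsilon_{ij}|\left(\tanh\left(\frac{r_{0}^{ij} - r_{ij} + \sigma_{\tanh}}{\sigma_{\tanh}}\right) + 1\right) + \left(\frac{\sigma_{\mathrm{excl}}}{r_{ij}}\right)^{12} \quad (6)$$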

where σtanh = 0.5 Å, σexcl = 4 Å, and r0ij is the native distance between Cα beads that are native contacts. A representative plot of Eq. (5) and Eq. (6) is shown in Supporting Information, Figure S1.
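The mixing rule of Eq. (4) and the two contact potentials of Eqs. (5) and (6) can be put together in a short sketch. The function names are hypothetical, energies are in units of ε, and distances are in Å. The handling of a contact with εij exactly equal to zero is not specified in the text, so sending it to the tanh branch is an assumption.

```python
import numpy as np


def contact_strengths(j_native, chi, eps=1.0):
    """Eq. (4): mix homogeneous and coupling-derived contact strengths.

    j_native holds J_ij(A_i, A_j) for each native contact. The mean coupling
    is assumed to be nonzero (positive for natural sequences, per the text).
    """
    j_native = np.asarray(j_native, dtype=float)
    return (1.0 - chi) * eps + chi * eps * j_native / j_native.mean()


def gaussian_contact(r, eps_ij, r0, sigma=0.5, sigma_excl=4.0):
    """Eq. (5): attractive Gaussian well with excluded volume (eps_ij > 0)."""
    excl = (sigma_excl / r) ** 12
    well = 1.0 - np.exp(-((r - r0) ** 2) / (2.0 * sigma**2))
    return eps_ij * ((1.0 + excl / eps_ij) * well - 1.0)


def tanh_contact(r, eps_ij, r0, sigma_tanh=0.5, sigma_excl=4.0):
    """Eq. (6): repulsive tanh shoulder with excluded volume (eps_ij < 0)."""
    step = np.tanh((r0 - r + sigma_tanh) / sigma_tanh) + 1.0
    return 0.5 * abs(eps_ij) * step + (sigma_excl / r) ** 12


def contact_energy(r, eps_ij, r0):
    """Choose the functional form from the sign of eps_ij."""
    if eps_ij > 0:
        return gaussian_contact(r, eps_ij, r0)
    # Assumption: eps_ij == 0 falls through to pure excluded volume.
    return tanh_contact(r, eps_ij, r0)
```

Because the strengths are a linear function of χ, the mean strength stays at ε for every χ and the standard deviation grows in proportion to χ, which is the behavior the mixing condition is meant to enforce.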

In principle, the incorporation of the local fields into the SBM would offer additional energetic stabilization related to the generic collapse of the protein chain. One could uniformly distribute the fields for residues i and j into the statistical couplings for all native contacts formed by residues i and j—i.e., Jij*(a,b)=Jij(a,b)+hi(a)/Ni+hj(b)/Nj, where Ni and Nj are the number of native contacts formed by i and j, respectively. However, such a model is not currently well understood and we chose to exclude the local fields from our coevolution‐enriched model. Hence, we focused solely on augmenting an SBM with the inferred pairwise statistical couplings, which can naturally be incorporated into an SBM through the pairwise, nonbonded interactions.

It should be noted that for our model of SH3, the first three residues (β1) and the last six residues (α1–β5 loop and β5) were not aligned to the MSA of SH3. Native contact energies involving these regions were obtained using Eq. (4) by using the average coupling strength inferred for β‐sheet contacts in those regions in lieu of Jij.
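The heavy‐atom cutoff criterion used to define native contacts earlier in this subsection can be sketched as follows. The input format (one coordinate array per residue) and the function name are hypothetical. The `min_separation` argument is also an assumption: the text does not state which sequence separations are excluded, so the default keeps every pair i < j.

```python
import numpy as np


def native_contacts(heavy_atoms, cutoff=6.0, min_separation=1):
    """Residue pairs (i, j), 0-based, with any heavy-atom pair within `cutoff` Å.

    heavy_atoms: list of arrays, one per residue, each of shape (n_atoms, 3).
    """
    contacts = []
    n_res = len(heavy_atoms)
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            # All pairwise distances between atoms of residue i and residue j.
            diff = heavy_atoms[i][:, None, :] - heavy_atoms[j][None, :, :]
            if np.linalg.norm(diff, axis=-1).min() <= cutoff:
                contacts.append((i, j))
    return contacts
```

In practice a larger `min_separation` would likely be chosen so that residues already coupled through bonded terms are not double counted, but that choice is left to the reader here.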

Simulations and computation of Φ‐values

The GROMACS software package56 was used to simulate our data‐enriched SBMs. We used the stochastic dynamics integrator with an inverse friction constant of 1.0 reduced time units to run many constant temperature simulations at and in the vicinity of the folding temperature, Tf. A table of folding temperatures for all the models that were constructed can be found in Supporting Information, Table S1. Typical simulations were run for $\sim 2.5 \times 10^{5}$ reduced time units to sample many folding and unfolding transitions. The weighted histogram analysis method57 (WHAM) was used to combine trajectories at different temperatures in order to compute an optimal density of states and relevant thermodynamic quantities. The unit of energy in our model, ε, is set by kBT = 1 = ε.

Energetic effects upon mutation are naturally incorporated into our model since the strength of a native contact [Eq. (4)] can be obtained for any amino acid combination. This is possible because the statistical couplings [Eq. (2)] are obtained from the global statistical model of the sequence data. Thus, a mutation to site i would change all the strengths (e.g., stabilize or destabilize) of native contacts involving site i, as described by Eq. (4).

Simulation Φ‐values were computed using free energy perturbation (FEP).58

Assuming that a mutation would only act as a small perturbation on the density of states of a model, we used our existing simulation trajectories for our models of the wild‐type proteins to compute the thermally averaged mutational changes in energy over the unfolded (U), folded (N), and transition state (TS) ensembles of our wild‐type models. Thus, to compare with experimental Φ‐values, we compute an FEP Φ‐value as

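$$\Phi_i(\chi = X) = \frac{\Delta G_{\mathrm{TS}} - \Delta G_{\mathrm{U}}}{\Delta G_{\mathrm{N}} - \Delta G_{\mathrm{U}}} \quad (7a)$$

$$\beta\,\Delta G_{\mathrm{U,\,TS,\,or\,N}} = -\ln \left\langle \exp(-\beta\,\Delta E) \right\rangle^{\mathrm{w.t.}}_{\mathrm{U,\,TS,\,or\,N}} \quad (7b)$$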

where 0 ≤ X ≤ 1, $\beta = (k_B T)^{-1}$, ΔE is the mutational change in the potential energy, and ΔGU, TS, or N is the mutational change in free energy that is computed over the U, N, or TS ensemble of the wild‐type simulations. To compute Φi(χ=0) (homogeneous case), we interpreted the destabilizing mutations as being the deletion of all native contacts involved with residue i. We interpret the Φ‐values predicted by our models by taking the average over all positions explored by experiment that are on the same secondary structural element, i.e., Φ(element)(χ=X) and Φ(element)(experiment) for χ‐dependent models and experiment, respectively. While we find that our data‐enriched models improve the agreement between our simulation Φ‐values and those from experiment even at the residue level, we find that the predictive capability of our methodology is best suited for capturing the folding mechanism at the resolution of secondary structural elements. By averaging over structural elements, our simulation Φ‐values are less sensitive to deviations associated with the accuracy of our model. It should be noted, however, that systematic improvements would potentially offer further agreement at the residue level (see Discussion). Further, we focused on simulation Φ‐values within the range of 0–1 because Φ < 0 or Φ > 1 are not well understood in the context of native contact‐based SBMs, and hence, are not included in the average.
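The free energy perturbation estimate of Eq. (7) can be sketched as below. The inputs are assumed to be arrays of the mutational energy change ΔE evaluated on wild‐type frames already assigned to the unfolded, transition state, and folded ensembles; the function names are hypothetical. The exponential average is computed with a log‐sum‐exp shift purely for numerical stability, which is an implementation choice and not something stated in the text.

```python
import numpy as np


def fep_delta_g(delta_e, beta=1.0):
    """Eq. (7b): Delta G = -(1/beta) * ln < exp(-beta * Delta E) >."""
    x = -beta * np.asarray(delta_e, dtype=float)
    x_max = x.max()
    # Shift by the maximum before exponentiating to avoid overflow.
    return -(x_max + np.log(np.mean(np.exp(x - x_max)))) / beta


def phi_value(de_unfolded, de_transition, de_folded, beta=1.0):
    """Eq. (7a): Phi = (dG_TS - dG_U) / (dG_N - dG_U)."""
    dg_u = fep_delta_g(de_unfolded, beta)
    dg_ts = fep_delta_g(de_transition, beta)
    dg_n = fep_delta_g(de_folded, beta)
    return (dg_ts - dg_u) / (dg_n - dg_u)


def element_average(phis):
    """Average Phi over one structural element, keeping only 0 <= Phi <= 1."""
    phis = np.asarray(phis, dtype=float)
    kept = phis[(phis >= 0.0) & (phis <= 1.0)]
    return kept.mean() if kept.size else float("nan")
```

When ΔE is constant within each ensemble, the free energy change reduces to ΔE itself, so Φ becomes a simple ratio of energy differences; this limit is a convenient sanity check.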

Results

Ribosomal S6 and circular permutants

The protein folding mechanism of ribosomal S6 has been extensively characterized by experiment40, 41, 59, 60 and theory and simulation.15, 46, 48, 61, 62, 63 This includes the extensive characterization of the circular permutants by experiment.40, 41, 60 The circular permutants are constructed to alter the connectivity of the S6 protein while maintaining the overall fold (see coevolutionary data‐enriched structure‐based models for more details). This procedure changes the loop entropy associated with folding and, hence, can change folding mechanisms. A common theme of these studies is to highlight the significant role that energetic heterogeneity plays in the folding of S6.15, 40, 41, 63 Specifically, it is proposed that long‐loop‐length native contacts have evolved to be stronger on average than short‐loop‐length native contacts in order to promote cooperativity. Due to the breadth of literature and the important role of energetic heterogeneity, studies of the folding mechanism of S6 and its permutants offer a stringent test to the predictions of our statistical‐data‐driven models.

By inferring statistical couplings for the protein family, we constructed heterogeneous SBMs for wild‐type S6 at different values of the mixing parameter χ. The distribution of inferred couplings is shown in Figure 1(A), where favorable couplings are positive. Enforcing Eq. (4), we guarantee that the distribution of energetic strengths for the native contacts of our model has a constant mean (i.e., +1ε), which is shown in Figure 1(B) for χ = 0 and χ = 1. It should further be noted that the distributions in Figure 1(B) for χ = 0 and χ = 1 have standard deviations of 0 and 1.82, respectively, and the standard deviation for intermediate values of χ can be obtained via linear interpolation. Shown in Figure 1(C) are the contact energy strengths for χ = 0 and χ = 0.5 projected onto the native contact map.

Figure 1.

Native contact energies for S6 inferred from DCA couplings. (A) The distribution of inferred couplings for all native contacts of S6—i.e., 310 contacts. (B) The distribution of native contact energetic strengths for S6 at χ = 0 and χ = 1. Due to Eq. (4), both distributions are enforced to have the same mean strength—i.e., +1ε. (C) Contact energy strengths projected onto the native contact map of S6 for χ = 0 and χ = 0.5.

χ > 0 relatively strengthens the β1–β3 and β1–β4 contacts in our model, which is consistent with a model that was optimized to reproduce experimental mutational changes in stability.15 Furthermore, increased heterogeneity tended to add destabilizing contacts primarily into the β2–β3 contacts, the β2–β3 loop contacts with β3, and the α2–α2 contacts. Destabilization of β2–β3 loop contacts is consistent with a simulation study where the β2–β3 loop contacts are deleted from an SBM, improving the agreement of the SBM with experiment.62 Further increasing χ beyond χ = 0.5 simply increases the standard deviation of the contact energies via linear interpolation given by Eq. (4), e.g., strengthening the stabilizing and destabilizing contacts.

By characterizing the thermodynamics of our models, we obtained F(Q)/kBT at Tf, which is the free energy at the folding temperature as a function of the fraction of native contacts formed, Q, for each model defined by χ. Consistent with earlier work,39, 45 we find that incorporating more energetic heterogeneity generally decreases the free energy barrier to folding due to the loss of co‐operativity (Supporting Information, Figure S2). Shown in Figure 2 is F(Q)/kBT at Tf for χ = 0 and χ = 0.5. The thermodynamic order of folding (i.e., mean order of folding) of the various structural groups is consistent with experiment for both χ = 0 and χ = 0.5.40 Namely, we observe for all of our models that foldon 1 consisting of α1–β1–β3 packing contacts (Fig. 2: dashed and purple) forms before foldon 2 consisting of α2–β1–β4 packing contacts (Fig. 2: dashed and grey). However, the incorporation of heterogeneity deemphasizes the role of the β2–β3 contacts (Fig. 2: solid red) in the formation of the transition state ensemble, which is in direct contrast to the homogeneous model (χ = 0) where the β2–β3 contacts quickly become structured after leaving the unfolded state ensemble.

Figure 2.

Thermodynamic order of folding for models of S6. (A) A cartoon depiction of the structural arrangement of the S6 protein. The labeled rectangles represent β‐sheets, circles represent α‐helices, and black lines represent loop regions. The colored bars represent native contacts between β‐sheets while the colored circles represent native contacts within an α‐helical structural motif. (B) The dashed ellipses represent tertiary structural contacts. The purple, grey, and orange ellipses highlight the packing contacts between α1–β3–β1 (foldon 1), α2–β1–β4 (foldon 2), and α1–α2 (helical contacts), respectively. (C) The free energy at the folding temperature is plotted as a function of Q, the fraction of native contacts formed, for χ = 0 and χ = 0.5. The fraction of native contacts formed by a particular structural element, Qgroup, is also plotted as a function of Q for each of the secondary structural elements in (A) and tertiary structural elements in (B), with matching colors and dashing. The shaded regions reflect our definitions of the unfolded, transition state, and folded ensembles, i.e., +0.3 kBT from the minimum of the free energy for the unfolded and folded states and −0.3 kBT from the top of the transition state barrier, respectively.
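The ensemble definitions given in the caption above can be sketched as masks over a one‐dimensional free energy profile. This sketch reads the definitions as "within 0.3 kBT of the basin minimum" and "within 0.3 kBT of the barrier top", which is an interpretation of the caption. The `q_mid` argument, used only to separate the two basins when searching for their minima, is a hypothetical parameter.

```python
import numpy as np


def ensemble_masks(q_grid, free_energy, q_mid=0.5, window=0.3):
    """Boolean masks (unfolded, transition, folded) over the Q grid.

    free_energy is F(Q) in units of kBT on the grid q_grid, assumed to be a
    two-state profile with one basin on each side of q_mid.
    """
    q = np.asarray(q_grid, dtype=float)
    f = np.asarray(free_energy, dtype=float)
    left = np.where(q < q_mid)[0]
    right = np.where(q >= q_mid)[0]
    i_u = left[np.argmin(f[left])]              # unfolded basin minimum
    i_n = right[np.argmin(f[right])]            # folded basin minimum
    between = np.arange(i_u, i_n + 1)
    i_ts = between[np.argmax(f[between])]       # top of the barrier

    idx = np.arange(len(q))
    unfolded = (idx < i_ts) & (f <= f[i_u] + window)
    folded = (idx > i_ts) & (f <= f[i_n] + window)
    transition = (idx >= i_u) & (idx <= i_n) & (f >= f[i_ts] - window)
    return unfolded, transition, folded
```

Simulation frames can then be assigned to an ensemble by looking up the mask value at each frame's Q bin.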

We next compare our models with experimental data characterizing the transition state ensemble (TSE) of S6 and its circular permutants through Φ‐value analysis.40 Φ‐values are obtained for single residue mutants and attempt to quantify the degree of formation of a residue's native contacts in the TSE. Circular permutants of S6 are constructed by connecting the N and C termini with a linker and cutting one of the structural loop regions, thereby changing the connectivity of the protein. While the permutants have essentially the same native state structure as the S6 protein, changing the connectivity of the protein changes the entropic barriers associated with folding and, thus, changes the folding mechanism. This large set of Φ‐values provides a stringent test of whether our incorporation of energetic heterogeneity can capture folding mechanism better than traditional SBMs (χ = 0).

Plotting Φ(element)(χ=0) and Φ(element)(χ=0.5) vs Φ(element)(experiment) shows that the incorporation of energetic heterogeneity greatly improves the agreement between simulation and experiment (Fig. 3). For χ = 0, the data exhibit a Pearson correlation of 0.21 with experiment, with a slope of 0.26 and an intercept of 0.20. On the other hand, for χ = 0.5 the data exhibit a Pearson correlation of 0.64 with experiment, with a slope of 0.71 and an intercept of 0.01. We were not able to explore whether the agreement with experiment improves systematically at higher values of χ, since χ > 0.5 resulted in the free energy barrier falling below 1 kBT for all of the permutant models, making the TSE hard to define (Supporting Information, Figure S2). While Φ(element) averages the residue‐level Φ‐values over a secondary structure element, it should be noted that χ = 0.5 also greatly improves the agreement between the residue‐level Φ‐values (Supporting Information, Figure S3). This further demonstrates that the models capture coarse features of the folding mechanism of S6 and its permutants.
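The summary statistics quoted above (Pearson correlation, slope, and intercept) can be computed with a few NumPy calls. The sketch is generic in its two inputs because which quantity is regressed on which is an assumption on the reader's side, and the numbers in the usage example are made up for illustration and are not data from this study.

```python
import numpy as np


def correlation_and_fit(x, y):
    """Pearson correlation of x and y, plus slope and intercept of y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pearson_r = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, 1)      # least-squares straight line
    return pearson_r, slope, intercept


# Illustrative, made-up values only.
x_demo = [0.0, 0.5, 1.0]
y_demo = [0.1, 1.1, 2.1]
r_demo, slope_demo, intercept_demo = correlation_and_fit(x_demo, y_demo)
```

A perfect prediction would give a correlation of 1, a slope of 1, and an intercept of 0, which is why all three numbers are reported together.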

Figure 3.

Comparison of averaged Φ‐values for S6 and circular permutants. Φ‐values averaged over secondary structural element compared between experiment and simulation for χ = 0 (homogeneous) and χ = 0.5. The solid black line represents the diagonal through zero (i.e., Φelement(χ)=Φelement(experiment)). The colored circles represent S6 (black), P13–14 (red), P33–34 (green), P54–55 (blue), P68–69 (orange), and P81–82 (brown). For χ = 0, the data exhibited a Pearson correlation of 0.21 and a fit (dashed, grey line) with an intercept of 0.20 and slope of 0.26. For χ = 0.5, the data exhibited a Pearson correlation of 0.64 and a fit (dashed, grey line) with an intercept of 0.01 and slope of 0.71. The Φ‐values for the α2 structural element in P13–14 were not included in the comparison because their experimentally determined Φ‐values were anomalously large and excluded in the experimental analysis.40

SH3

Extensive experimental42, 64, 65, 66 and computational6, 63, 64, 66, 67 work has focused on the folding mechanism of SH3. It is generally understood that SH3 folds via a polarized transition state that is dictated by its native geometry since structural homologs of SH3 that differ in their sequence have been found to fold with the same mechanism.42 Furthermore, homogeneous SBMs quantitatively capture the folding mechanism of SH3.6, 67 Thus, SH3 provides a consistency check for our data‐driven models to see if the incorporation of energetic heterogeneity results in an SBM that remains consistent with experimental data.

Similar to S6, we inferred the statistical couplings for the SH3 family using mfDCA and constructed models of SH3 with different values of χ (Fig. 4). The distribution of inferred couplings for all native contacts is shown in Figure 4(A), where attractive couplings are positive. Figure 4(B) shows the distributions of native contact energy strengths from Eq. (4) for χ = 0 and χ = 1, which have standard deviations of 0 and 1.4, respectively. The strongest stabilizing contacts in the heterogeneous models are found among the early forming contacts determined from experiments [green circles in Fig. 4(C)].

Figure 4.

Native contact energies for SH3. (A) The distribution of inferred couplings for all 149 native contacts of SH3. (B) The distribution of native contact energetic strengths for SH3 at χ = 0 and χ = 1, which are enforced by Eq. (4) to have the same mean strength of +1ε. (C) Contact energy strengths projected onto the native contact map for SH3 at χ = 0.5 and χ = 0.75. The contacts located in the translucent, gray region are not covered in the MSA for the SH3 family, and hence, were not part of the inferred global statistical model. For this reason, they were given the average energetic strength of a β‐sheet in a model (see Coevolutionary data‐enriched structure‐based models). The contacts circled in green and red are the early‐forming and late‐forming contacts, respectively (as described in Ref. 67).

We characterized the thermodynamics of our models for different values of χ by computing F(Q, Qpath)/kBT at Tf (Fig. 5), which is the free energy as a function of the fraction of native contacts, Q, and the difference in the folding progress between early forming contacts and late forming contacts in the folding mechanism of SH3, Qpath.67 A positive value of Qpath corresponds to the experimentally observed mechanism in which the formation of the central three β‐sheets is preferred, while a negative value of Qpath reflects an off‐pathway, inverted mechanism in which contacts between the two terminal β‐sheets form first. It was previously shown that at χ = 0, a positive Qpath is observed in the TSE.67 We find that as χ increases, the location of the saddle point on the 2D free energy surface moves to even higher values of Qpath. Thus, the folding pathway with statistically inferred heterogeneous contact energies further coincides with the experimental pathway.
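One simple way to picture the path coordinate described above is as the fraction of early forming contacts that are formed minus the fraction of late forming contacts that are formed. The sketch below takes that reading as an assumption; the exact definition and normalization used in Ref. 67 may differ. The criterion for deciding whether a contact is formed is not shown, and the function and argument names are hypothetical.

```python
import numpy as np


def q_path(formed, early_idx, late_idx):
    """Difference in folding progress between early and late contact groups.

    formed: boolean array of shape (..., n_contacts), True where a native
    contact is formed in a frame. early_idx and late_idx are lists of
    contact indices for the two groups.
    """
    formed = np.asarray(formed, dtype=bool)
    q_early = formed[..., early_idx].mean(axis=-1)
    q_late = formed[..., late_idx].mean(axis=-1)
    return q_early - q_late
```

With this sign convention, frames in which the early group is more complete than the late group give positive values, matching the description of the experimentally observed mechanism.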

Figure 5.

SH3 free energy surface. Free energy surface at the folding temperature projected onto Q and Q path for χ = 0, 0.25, 0.5, and 0.75. The blue circles represent the average folding path taken for each model, while the enveloping white lines represent ±1 standard deviation of the average path.
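A two‐dimensional free energy surface of this kind can be estimated from sampled conformations as F/kBT = −ln P(Q, Q path). The sketch below is a minimal illustration, assuming unweighted samples collected at a single temperature and assuming Q path is the fraction of early‐forming contacts formed minus the fraction of late‐forming contacts formed; the exact definition is given in Ref. 67, and reweighting to Tf (e.g., with WHAM) is not shown. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def free_energy_surface(q, q_path, bins=20):
    """Estimate F/kBT = -ln P(Q, Q_path) on a regular grid.

    q      : fraction of native contacts per sampled frame, in [0, 1]
    q_path : ASSUMED to be Q_early - Q_late per frame, in [-1, 1]
    Bins that were never visited get F = +inf.
    """
    hist, q_edges, qpath_edges = np.histogram2d(
        q, q_path, bins=bins, range=[[0.0, 1.0], [-1.0, 1.0]]
    )
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):  # log(0) -> -inf for empty bins
        f = -np.log(prob)
    f -= f[np.isfinite(f)].min()  # shift so the global minimum is 0
    return f, q_edges, qpath_edges


# Synthetic stand-in for simulation output.
rng = np.random.default_rng(0)
q_early = rng.uniform(0.0, 1.0, size=5000)
q_late = rng.uniform(0.0, 1.0, size=5000)
q_total = 0.5 * (q_early + q_late)

surface, q_edges, qpath_edges = free_energy_surface(q_total, q_early - q_late)
print(surface.shape)
```

With real trajectories, the transition state region would appear as a saddle on this surface, and its position along the Q path axis is the quantity compared across values of χ.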

Next, we compared our models against the more detailed picture of the TSE provided by Φ‐value analysis of SH3.64 Comparing Φ(element)(χ=0), Φ(element)(χ=0.25), Φ(element)(χ=0.5), and Φ(element)(χ=0.75) vs Φ(element)(experiment) shows that the incorporation of energetic heterogeneity leads to subtle improvements in the agreement with experiment compared to the homogeneous model (χ = 0) (Fig. 6). The Φ‐values in the homogeneous model are already well correlated with the experimental data, and increasing χ from 0 to 0.75 only raises the Pearson correlation from 0.94 to 0.97. Heterogeneity generally moves the slope and intercept of the fit toward 1 and 0, respectively, but again the improvements are subtle. In particular, the slopes of the fits for χ = 0, 0.25, 0.5, and 0.75 were 1.98, 1.54, 1.48, and 1.60, respectively, demonstrating the tendency of our SH3 models to over‐predict experimental Φ‐values that are small and under‐predict experimental Φ‐values that are large. Models with χ > 0.75 were not explored because their free energy barriers to folding at Tf were <1 kBT (Supporting Information, Figure S4). Finally, the subtle improvement upon introducing heterogeneity can also be seen in the residue‐level Φ‐values (Supporting Information, Figure S5), which further indicates that introducing heterogeneity into the SH3 model does not break its ability to capture the coarse features of the folding mechanism.

Figure 6.

SH3 Φ‐value comparison. Φ‐values averaged over the secondary structural element compared between experiment and simulation for χ = 0.25, 0.5, and 0.75, which are shown in red, green, and blue, respectively. The averaged Φ‐values for χ = 0 compared with experiment are overlaid on each plot as hollow circles. The solid, black line represents the diagonal through zero (i.e., Φelement(χ)=Φelement(experiment)). The dashed, black line represents the fit for the homogeneous model (χ = 0), while the colored, dashed lines represent the fits for the heterogeneous models. The data for χ = 0, 0.25, 0.5, and 0.75 exhibit a Pearson correlation, an intercept of fit, and a slope of fit of (0.94, −0.31, 1.98), (0.96, −0.18, 1.54), (0.96, −0.14, 1.48), and (0.97, −0.27, 1.60), respectively.
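The three statistics quoted for each model (Pearson correlation, intercept of fit, slope of fit) can be obtained from paired Φ‐values with an ordinary least‐squares line. The sketch below is a minimal illustration, not the authors' analysis script. This section does not state which quantity is regressed on which, so the choice of x and y is left to the caller, and the numbers in the usage example are made up.

```python
import numpy as np


def fit_statistics(x, y):
    """Return (pearson_r, intercept, slope) for a least-squares line y = slope*x + intercept.

    Which of simulation or experiment plays the role of x is an
    assumption left to the caller.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 polynomial fit
    pearson_r = np.corrcoef(x, y)[0, 1]
    return pearson_r, intercept, slope


# Hypothetical element-averaged Phi-values (not data from the paper).
phi_x = np.array([0.10, 0.25, 0.40, 0.55, 0.70])
phi_y = 2.0 * phi_x - 0.3  # exactly linear, for illustration only

r, intercept, slope = fit_statistics(phi_x, phi_y)
print(f"r={r:.2f}, intercept={intercept:.2f}, slope={slope:.2f}")
```

A perfect model would give a correlation of 1, an intercept of 0, and a slope of 1, which is why the text tracks how the slope and intercept move toward those values as χ increases.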

Discussion

The presented folding results provide strong evidence that coevolutionary data can be used to enrich simple protein models. These models are consistent with experimental measurements that probe the protein folding mechanism. This agreement further highlights the fundamental relationship36, 68 between the evolved sequences of proteins and the biophysics of their folding and function.

Owing to advances in both statistical inference methodologies17 and the collection of sequence data,69 predictive, data‐driven models constructed from familial protein sequences could be useful in several respects. For protein engineering, the global statistical model of coevolution can be used to select mutations that strengthen or weaken desired interactions in a protein model, which could in turn enhance or suppress functional motions of proteins. In the context of protein–protein interactions, one could construct an interaction model of proteins that have evolved together, building on an existing information‐based framework.24 For example, a model of coevolved proteins can be constructed for cases where energetic heterogeneity plays an important role in binding and unbinding. The explicit incorporation of conformational entropy, energetic heterogeneity, and thermal fluctuations could be used to study the dynamics of functional protein–protein interfaces as well as to identify soft, transient interfaces that are currently unknown.

While the simple model that we presented appears to reproduce coarse features of the folding mechanisms for S6 and SH3, many systematic refinements could improve its predictive capabilities. For example, nonadditive contacts49 or desolvation barriers47, 48 could be used to describe the native contacts in an SBM and thereby recover the cooperativity that is lost when additional energetic heterogeneity is introduced into the native contact strengths. The general observation that Cα‐based SBMs (Go models) with heterogeneous contact strengths lack sufficient folding cooperativity has been reported in studies39, 70 using the Miyazawa–Jernigan transferable potential.71 We observe the same lack of cooperativity when using heterogeneous contact energies from our inferred global statistical model. Recovering the missing cooperativity would allow us to go beyond the current practical limitations (e.g., χ = 0.5) and explore larger values of χ to find its optimal value, which in principle should be χ = 1.

Further refinements could improve the accuracy of our statistical inference procedure, including the use of more accurate, albeit more computationally expensive, algorithms30 or methods that more accurately reduce phylogenetic bias.72 Finally, one could consider incorporating non‐native contacts to address the predictions of Φ < 0 or Φ > 1. Such non‐native contacts potentially have evolved to guide the folding mechanism66 and, in principle, would also manifest as sequence correlations in an MSA.

Supporting information

Acknowledgments

We would like to thank Peter Wolynes, Ryan Hayes, Alex Kluber, and Faruck Morcos for helpful comments. We would also like to thank Heiko Lammert for providing the structures of the S6 permutants used in this study as well as helpful comments.

References

  • 1. Bryngelson JD, Wolynes PG (1987) Spin‐glasses and the statistical‐mechanics of protein folding. Proc Natl Acad Sci USA 84:7524–7528.
  • 2. Onuchic JN, Luthey‐Schulten Z, Wolynes PG (1997) Theory of protein folding: the energy landscape perspective. Ann Rev Phys Chem 48:545–600.
  • 3. Onuchic JN, Wolynes PG (2004) Theory of protein folding. Curr Opin Struct Biol 14:70–75.
  • 4. Whitford PC, Noel JK, Gosavi S, Schug A, Sanbonmatsu KY, Onuchic JN (2009) An all‐atom structure‐based potential for proteins: bridging minimal models with all‐atom empirical force fields. Proteins 75:430–441.
  • 5. Noel JK, Whitford PC, Sanbonmatsu KY, Onuchic JN (2010) SMOG@ctbp: simplified deployment of structure‐based models in GROMACS. Nucleic Acids Res 38:W657–W661.
  • 6. Clementi C, Nymeyer H, Onuchic JN (2000) Topological and energetic factors: what determines the structural details of the transition state ensemble and “en‐route” intermediates for protein folding? An investigation for small globular proteins. J Mol Biol 298:937–953.
  • 7. Whitford PC, Sanbonmatsu KY, Onuchic JN (2012) Biomolecular dynamics: order–disorder transitions and energy landscapes. Rep Prog Phys 75:076601.
  • 8. Tirion MM (1996) Large amplitude elastic motions in proteins from a single‐parameter, atomic analysis. Phys Rev Lett 77:1905–1908.
  • 9. Go N (1983) Protein folding as a stochastic‐process. J Statistical Phys 30:413–423.
  • 10. Chen T, Song JH, Chan HS (2015) Theoretical perspectives on nonnative interactions and intrinsic disorder in protein folding and binding. Curr Opin Struct Biol 30:32–42.
  • 11. Davtyan A, Schafer NP, Zheng WH, Clementi C, Wolynes PG, Papoian GA (2012) AWSEM‐MD: protein structure prediction using coarse‐grained physical potentials and bioinformatically based local structure biasing. J Phys Chem B 116:8494–8503.
  • 12. Schafer NP, Kim BL, Zheng WH, Wolynes PG (2014) Learning to fold proteins using energy landscape theory. Israel J Chem 54:1311–1337.
  • 13. Kim BL, Schafer NP, Wolynes PG (2014) Predictive energy landscapes for folding alpha‐helical transmembrane proteins. Proc Natl Acad Sci USA 111:11031–11036.
  • 14. Zheng W, Schafer NP, Davtyan A, Papoian GA, Wolynes PG (2012) Predictive energy landscapes for protein–protein association. Proc Natl Acad Sci USA 109:19244–19249.
  • 15. Matysiak S, Clementi C (2004) Optimal combination of theory and experiment for the characterization of the protein folding landscape of S6: how far can a minimalist model go? J Mol Biol 343:235–248.
  • 16. Li WF, Wolynes PG, Takada S (2011) Frustration, specific sequence dependence, and nonlinearity in large‐amplitude fluctuations of allosteric proteins. Proc Natl Acad Sci USA 108:3504–3509.
  • 17. de Juan D, Pazos F, Valencia A (2013) Emerging methods in protein co‐evolution. Nature Rev Genet 14:249–261.
  • 18. Marks DS, Hopf TA, Sander C (2012) Protein structure prediction from sequence variation. Nature Biotech 30:1072–1080.
  • 19. Haq O, Andrec M, Morozov AV, Levy RM (2012) Correlated electrostatic mutations provide a reservoir of stability in HIV protease. PLoS Comput Biol 8.
  • 20. Haq O, Levy RM, Morozov AV, Andrec M (2009) Pairwise and higher‐order correlations among drug‐resistance mutations in HIV‐1 subtype B protease. BMC Bioinform 10:
  • 21. Ferguson AL, Mann JK, Omarjee S, Ndung'u T, Walker BD, Chakraborty AK (2013) Translating HIV sequences into quantitative fitness landscapes predicts viral vulnerabilities for rational immunogen design. Immunity 38:606–617.
  • 22. Schneidman E, Berry MJ, Segev R, Bialek W (2006) Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440:1007–1012.
  • 23. Bialek W, Cavagna A, Giardina I, Mora T, Silvestri E, Viale M, Walczak AM (2012) Statistical mechanics for natural flocks of birds. Proc Natl Acad Sci USA 109:4786–4791.
  • 24. Cheng RR, Morcos F, Levine H, Onuchic JN (2014) Toward rationally redesigning bacterial two‐component signaling systems using coevolutionary information. Proc Natl Acad Sci USA 111:E563–E571.
  • 25. Dago AE, Schug A, Procaccini A, Hoch JA, Weigt M, Szurmant H (2012) Structural basis of histidine kinase autophosphorylation deduced by integrating genomics, molecular dynamics, and mutagenesis. Proc Natl Acad Sci USA 109:E1733–E1742.
  • 26. Procaccini A, Lunt B, Szurmant H, Hwa T, Weigt M (2011) Dissecting the specificity of protein–protein interaction in bacterial two‐component signaling: orphans and crosstalks. PLoS One 6:
  • 27. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T (2009) Identification of direct residue contacts in protein–protein interaction by message passing. Proc Natl Acad Sci USA 106:67–72.
  • 28. Schug A, Weigt M, Onuchic JN, Hwa T, Szurmant H (2009) High‐resolution protein complexes from integrating genomic information with molecular simulation. Proc Natl Acad Sci USA 106:22124–22129.
  • 29. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M (2011) Direct‐coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA 108:E1293–E1301.
  • 30. Ekeberg M, Lovkvist C, Lan YH, Weigt M, Aurell E (2013) Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E 87:
  • 31. Lui S, Tiana G (2013) The network of stabilizing contacts in proteins studied by coevolutionary data. J Chem Phys 139:
  • 32. Morcos F, Schafer NP, Cheng RR, Onuchic JN, Wolynes PG (2014) Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc Natl Acad Sci USA 111:12408–12413.
  • 33. Contini A, Tiana G (2015) A many‐body term improves the accuracy of effective potentials based on protein coevolutionary data. J Chem Phys 143:025103.
  • 34. Clementi C, Plotkin SS (2004) The effects of nonnative interactions on protein folding rates: theory and simulation. Protein Sci 13:1750–1766.
  • 35. Onuchic JN, Wolynes PG, Luthey‐Schulten Z, Socci ND (1995) Toward an outline of the topography of a realistic protein‐folding funnel. Proc Natl Acad Sci USA 92:3626–3630.
  • 36. Wolynes PG (2014) Evolution, energy landscapes and the paradoxes of protein folding. Biochimie http://dx.doi.org/10.1016/j.biochi.2014.12.007.
  • 37. Morcos F, Jana B, Hwa T, Onuchic JN (2013) Coevolutionary signals across protein lineages help capture multiple protein conformations. Proc Natl Acad Sci USA 110:20533–20538.
  • 38. Jana B, Morcos F, Onuchic JN (2014) From structure to function: the convergence of structure based models and co‐evolutionary information. Phys Chem Chem Phys 16:6496–6507.
  • 39. Cho SS, Levy Y, Wolynes PG (2009) Quantitative criteria for native energetic heterogeneity influences in the prediction of protein folding kinetics. Proc Natl Acad Sci USA 106:434–439.
  • 40. Haglund E, Lindberg MO, Oliveberg M (2008) Changes of protein folding pathways by circular permutation—overlapping nuclei promote global cooperativity. J Biol Chem 283:27904–27915.
  • 41. Lindberg M, Tangrot J, Oliveberg M (2002) Complete change of the protein folding transition state upon circular permutation. Nature Struct Biol 9:818–822.
  • 42. Martinez JC, Serrano L (1999) The folding transition state between SH3 domains is conformationally restricted and evolutionarily conserved. Nature Struct Biol 6:1010–1016.
  • 43. Fersht AR, Matouschek A, Serrano L (1992) The folding of an enzyme. 1. Theory of protein engineering analysis of stability and pathway of protein folding. J Mol Biol 224:771–782.
  • 44. Matouschek A, Kellis JT, Serrano L, Fersht AR (1989) Mapping the transition‐state and pathway of protein folding by protein engineering. Nature 340:122–126.
  • 45. Plotkin SS, Onuchic JN (2002) Structural and energetic heterogeneity in protein folding. I. Theory. J Chem Phys 116:5263–5283.
  • 46. Stoycheva AD, Brooks CL, Onuchic JN (2004) Gatekeepers in the ribosomal protein S6: thermodynamics, kinetics, and folding pathways revealed by a minimalist protein model. J Mol Biol 340:571–585.
  • 47. Cheung MS, Garcia AE, Onuchic JN (2002) Protein folding mediated by solvation: water expulsion and formation of the hydrophobic core occur after the structural collapse. Proc Natl Acad Sci USA 99:685–690.
  • 48. Zhang ZQ, Chan HS (2010) Competition between native topology and nonnative interactions in simple and complex folding kinetics of natural and designed proteins. Proc Natl Acad Sci USA 107:2920–2925.
  • 49. Eastwood MP, Wolynes PG (2001) Role of explicitly cooperative interactions in protein folding funnels: a simulation study. J Chem Phys 114:4702–4716.
  • 50. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, et al. (2014) Pfam: the protein families database. Nucleic Acids Res 42:D222–D230.
  • 51. Thouless DJ, Anderson PW, Palmer RG (1977) Solution of “Solvable model of a spin glass”. Philosoph Mag 35:593–601.
  • 52. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242.
  • 53. Lindahl M, Svensson LA, Liljas A, Sedelnikova SE, Eliseikina IA, Fomenkova NP, Nevskaya N, Nikonov SV, Garber MB, Muranova TA, et al. (1994) Crystal structure of the ribosomal protein S6 from Thermus thermophilus. Embo J 13:1249–1254.
  • 54. Xu WQ, Harrison SC, Eck MJ (1997) Three‐dimensional structure of the tyrosine kinase c‐Src. Nature 385:595–602.
  • 55. Lammert H, Schug A, Onuchic JN (2009) Robustness and generalization of structure‐based models for protein folding and function. Proteins 77:881–891.
  • 56. Van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJC (2005) Gromacs: fast, flexible, and free. J Comput Chem 26:1701–1718.
  • 57. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM (1992) The weighted histogram analysis method for free‐energy calculations on biomolecules. 1. The method. J Comput Chem 13:1011–1021.
  • 58. Zwanzig RW (1954) High‐temperature equation of state by a perturbation method. 1. Nonpolar gases. J Chem Phys 22:1420–1426.
  • 59. Otzen DE, Oliveberg M (2002) Conformational plasticity in folding of the split beta‐alpha‐beta protein S6: evidence for burst‐phase disruption of the native state. J Mol Biol 317:613–627.
  • 60. Lindberg MO, Haglund E, Hubner IA, Shakhnovich EI, Oliveberg M (2006) Identification of the minimal protein‐folding nucleus through loop‐entropy perturbations. Proc Natl Acad Sci USA 103:4083–4088.
  • 61. Hubner IA, Oliveberg M, Shakhnovich EI (2004) Simulation, experiment, and evolution: understanding nucleation in protein S6 folding. Proc Natl Acad Sci USA 101:8354–8359.
  • 62. Lammert H, Noel JK, Haglund E, Schug A, Onuchic JN (2015) Constructing a folding model for protein S6 guided by dynamics deduced from NMR structures. J Chem Phys, submitted.
  • 63. Chen J, Wang J, Wang W (2004) Transition states for folding of circular‐permuted proteins. Proteins 57:153–171.
  • 64. Riddle DS, Grantcharova VP, Santiago JV, Alm E, Ruczinski I, Baker D (1999) Experiment and theory highlight role of native state topology in SH3 folding. Nature Struct Biol 6:1016–1024.
  • 65. Grantcharova VP, Riddle DS, Baker D (2000) Long‐range order in the src SH3 folding transition state. Proc Natl Acad Sci USA 97:7084–7089.
  • 66. Zarrine‐Afsar A, Wallin S, Neculai AM, Neudecker P, Howell PL, Davidson AR, Chan HS (2008) Theoretical and experimental demonstration of the importance of specific nonnative interactions in protein folding. Proc Natl Acad Sci USA 105:9999–10004.
  • 67. Lammert H, Noel JK, Onuchic JN (2012) The dominant folding route minimizes backbone distortion in SH3. PLoS Comput Biol 8:e1002776.
  • 68. Sikosek T, Chan HS (2014) Biophysics of protein evolution and evolutionary protein biophysics. J Royal Soc Interface 11:20140419.
  • 69. UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res 43:D204–D212.
  • 70. Chen T, Chan HS (2015) Native contact density and nonnative hydrophobic effects in the folding of bacterial immunity proteins. PLoS Comput Biol 11:e1004260.
  • 71. Miyazawa S, Jernigan RL (1996) Residue‐residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 256:623–644.
  • 72. Obermayer B, Levine E (2014) Inverse Ising inference with correlated samples. New J Phys 16:123017.

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
