Proceedings of the National Academy of Sciences of the United States of America. 2005 Jul 11;102(29):10171–10175. doi: 10.1073/pnas.0504171102

Φ values in protein-folding kinetics have energetic and structural components

Claudia Merlo*, Ken A Dill, Thomas R Weikl*
PMCID: PMC1177393  PMID: 16009941

Abstract

Φ values are experimental measures of how the kinetics of protein folding is changed by single-site mutations. Φ values measure energetic quantities, but they are often interpreted in terms of the structures of the transition-state ensemble. Here, we describe a simple analytical model of the folding kinetics in terms of the formation of protein substructures. The model shows that Φ values have both structural and energetic components. It also provides a natural and general interpretation of “nonclassical” Φ values (i.e., <0 or >1). The model reproduces the Φ values for 20 single-residue mutations in the α-helix of the protein CI2, including several nonclassical Φ values, in good agreement with experiments.

Keywords: transition-state ensemble, mutational analysis, statistical mechanics


The folding kinetics of small single-domain proteins has been widely studied by single-site mutagenesis (1-16). The central quantity in these studies, the Φ value, is defined as (17, 18)

$$\Phi = \frac{RT \ln(k_{\mathrm{wt}}/k_{\mathrm{mut}})}{\Delta G_N}$$ [1]

Here, kwt and kmut are the folding rates of the wild type and mutant protein, and ΔGN is the change of the protein stability upon mutation. The stability GN of a protein is the free-energy difference between the native (N) and the denatured (D) state.
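As a minimal numerical sketch of this definition, the snippet below evaluates Φ from a pair of folding rates and a stability change. It assumes the form Φ = RT ln(kwt/kmut)/ΔGN, with ΔGN counted as positive for a destabilizing mutation and given in kcal/mol. The function name `phi_value` and the rates are hypothetical and chosen only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def phi_value(k_wt, k_mut, ddG_N, temperature=298.0):
    """Phi = RT ln(k_wt / k_mut) / ddG_N, with ddG_N in kcal/mol (assumed convention)."""
    return R_KCAL * temperature * math.log(k_wt / k_mut) / ddG_N


# Hypothetical example: the mutant folds three times more slowly than the
# wild type and is destabilized by 1.0 kcal/mol.
phi = phi_value(k_wt=30.0, k_mut=10.0, ddG_N=1.0)
print(f"Phi = {phi:.2f}")
```

Because ΔGN sits in the denominator, small stability changes amplify any uncertainty in the measured rates.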

There are several theoretical studies of Φ values and transition states. The thermal unfolding kinetics of CI2 has been extensively studied in molecular dynamics simulations (19-24). Here, the transition state is defined as a “small ensemble of structures populated immediately prior to the onset of a large structural change” (20) in the unfolding trajectories. Other groups have considered statistical mechanical or Go-type models (25-36). In some of these models, transition states are identified as free-energy maxima along a folding reaction coordinate or as free-energy saddle points if two or more degrees of freedom are used for the reaction coordinate. More recent approaches define the transition state ensemble (TSE) from experimental Φ values by using these Φ values as restraints in simulations (37-39). Each of these definitions of transition state, although plausible, is nevertheless based on one or more ad hoc premises.

In classical transition-state theory, the folding rate is proportional to exp[-GT/RT], where GT = Gtransition state - Gdenatured state is the free-energy difference between the TSE and the denatured state. Possible changes in the prefactor of this proportionality relation upon mutation are usually neglected. Thus, Φ = ΔGT/ΔGN. In this way, Φ values measure the energetic consequences of mutations on the TSE relative to the native state.

A central question is whether Φ values also give structural information about the TSE (18, 40, 41). In the traditional interpretation, Φ = 1 is taken to indicate that the mutated residue has native-like structure in its TSE, whereas Φ = 0 is taken to indicate that the mutated residue is not structured in the TSE. Typically, experiments give Φ values that are fractional, with values between 0 and 1, apparently indicating partial native-like structural character of the residue in the TSE.

However, there are three problems with this traditional structural interpretation. First, Φ values are sometimes “nonclassical”; they can be <0 or >1. In the traditional view, such values are impossible, implying a transition state that is more denatured than D or more native than N; hence, there is some controversy about how such Φ values should be interpreted. Second, a given sequence position can have very different Φ values, depending on which amino acid is substituted there, leading to the question of whether such energetic changes always have a simple structural interpretation.

Third, there is a problem of continuity: two residues that are neighbors in the chain are sometimes observed to have very different Φ values. A structural interpretation of this observation would be that there can be sharp boundaries between native-like and non-native-like structure in the TSE, which seems implausible. For example, the protein CI2 consists of an α-helix packed against a four-stranded β-sheet (see Fig. 1). In the α-helix of CI2, 20 single-residue mutations have been studied, giving Φ values ranging over the full spectrum from -0.35 to 1.23. Although helix formation is usually regarded as fast and cooperative, these results would seem to imply that this helix does not form as a single cooperative unit: in the TSE, parts are folded and parts are not. It is not clear whether these are problems of experimental errors or problems in the traditional model that is used to interpret Φ values.

Fig. 1.

The native structure of CI2 consists of a four-stranded β-sheet packed against an α-helix (Protein Data Bank ID code 1COA).

Is there a more physical way to interpret the formation of protein substructures that comprise the TSE of protein folding? Here, we develop a model. First, we consider the simplest subdivision of the protein: into one α-helical substructure and one β-sheet substructure. Because of its simplicity, the model can be solved analytically and exactly. Then, we generalize this model to apply to CI2. Despite its simplicity, this model reproduces the experimental Φ values in the α-helix of CI2 with a correlation coefficient of 0.85, including some of the nonclassical Φ values. A key conclusion is that it is not sufficient to interpret Φ values solely in terms of structures. However, a Φ value can be decomposed into structural and energetic components.

Model and Methods

The Dynamics. Our approach has two aspects: (i) the model, which expresses the relative free energies of the various substructures of the protein as it folds, and (ii) the dynamics of the model. We first describe our treatment of the dynamics. To simplify the notation, we define here the free energy Gn of each partially folded state n = 1, 2, 3,..., and the dimensionless free energy gn = Gn/RT, with respect to the fully denatured state in which none of the substructures is formed. Thus, the denatured state is the reference, defined as having zero free energy. The transition rate from any state m to state n is given by the following expression:

$$w_{nm} = \frac{1}{t_o}\,\frac{1}{1 + \exp(g_n - g_m)}$$ [2]

provided the states n and m are connected by a single step in which only one substructure folds or unfolds (36). For other transitions, the transition rates are zero. Here, t_o is a reference time scale.§
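The single-step rates can be sketched in a few lines. The snippet assumes the Glauber-type form w_nm = (1/t_o)/(1 + exp(g_n - g_m)) for a step from state m to state n, which is consistent with the detailed-balance condition stated in the footnote. The function name `rate` and the two free energies are illustrative assumptions.

```python
import math


def rate(g_from, g_to, t0=1.0):
    """Assumed Glauber-type rate for a single step from g_from to g_to."""
    return 1.0 / (t0 * (1.0 + math.exp(g_to - g_from)))


# Hypothetical dimensionless free energies of two connected states m and n.
g_m, g_n = 0.0, 3.0
w_up = rate(g_m, g_n)    # m -> n, uphill in free energy
w_down = rate(g_n, g_m)  # n -> m, downhill in free energy

# Detailed balance: w(m -> n) exp(-g_m) equals w(n -> m) exp(-g_n).
lhs = w_up * math.exp(-g_m)
rhs = w_down * math.exp(-g_n)
print(lhs, rhs)
```

With this choice, the forward and backward rates of any step add up to 1/t_o, so uphill steps are slow and downhill steps saturate at the reference rate.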

The folding kinetics is described by the following master equation:

$$\frac{d\mathbf{P}(t)}{dt} = -\mathbf{W}\,\mathbf{P}(t)$$ [3]

The elements of the vector P(t) are the probabilities Pn(t) that the protein is in state n at time t, and the matrix elements of W are given by Wnm = -wnm for n ≠ m and Wnn = ∑m≠n wmn. The general solution of the master equation has the form

$$\mathbf{P}(t) = \sum_{\lambda} c_{\lambda}\,\mathbf{Y}_{\lambda}\,\exp(-\lambda t)$$ [4]

Here, λ are the eigenvalues and Yλ are the eigenvectors of the matrix W. The prefactors cλ depend on the initial conditions at time t = 0. The eigenvalues represent relaxation rates. It can be shown that one eigenvalue is zero, corresponding to the equilibrium distribution, whereas all other eigenvalues are positive (42). For t → ∞, the probability vector P(t) tends to c_o Y_o, where Y_o is the eigenvector with eigenvalue 0.
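The relaxation toward equilibrium can be illustrated numerically. The sketch below builds the matrix W for a hypothetical three-state chain 0-1-2, integrates dP/dt = -W P with a simple explicit Euler scheme (a stand-in for the eigenvector expansion), and compares the long-time limit with the Boltzmann distribution. The Glauber-type rate form, the state free energies, and all function names are assumptions made for illustration.

```python
import math


def build_W(g, edges, t0=1.0):
    """Rate matrix with W[n][m] = -w_nm (n != m) and W[m][m] = total rate out of m."""
    n = len(g)
    W = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        for src, dst in ((a, b), (b, a)):
            # Assumed Glauber-type rate for the step src -> dst.
            w = 1.0 / (t0 * (1.0 + math.exp(g[dst] - g[src])))
            W[dst][src] -= w
            W[src][src] += w
    return W


def evolve(W, P, t_end, dt=0.01):
    """Explicit Euler integration of dP/dt = -W P."""
    n = len(P)
    for _ in range(int(round(t_end / dt))):
        dP = [-sum(W[i][j] * P[j] for j in range(n)) for i in range(n)]
        P = [P[i] + dt * dP[i] for i in range(n)]
    return P


# Hypothetical chain: reference state 0, barrier state 1, stable state 2.
g = [0.0, 2.0, -3.0]
W = build_W(g, edges=[(0, 1), (1, 2)])
P_final = evolve(W, [1.0, 0.0, 0.0], t_end=300.0)

Z = sum(math.exp(-x) for x in g)
P_eq = [math.exp(-x) / Z for x in g]
print(P_final, P_eq)
```

Each column of W sums to zero, so the total probability is conserved at every integration step.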

The Model: Two Substructures. The dynamics described above is applicable to any model of the protein, its substructures, and their relative free energies. Here, we first apply the dynamics to the simplest possible model of the substructures of CI2. The model has the following four states: (i) the denatured state, D, in which neither the helix nor the sheet is formed; (ii) a partially folded state, α, in which only the helix is formed; (iii) a partially folded state, β, in which only the β-sheet is formed; and (iv) the native state, N, in which both the helix and sheet are formed and packed against each other.

In this simple four-state model, the energy landscape is characterized by the dimensionless free-energy differences gα, gβ, and gN of the states α, β, and N, each taken with respect to the denatured state D, which is defined as having zero free energy.

The folding kinetics of this model can be solved exactly by determining the eigenvalues λ and eigenvectors Yλ of the matrix W. Because this model has four states, W is a 4 × 4 matrix. In units of 1/t_o, the eigenvalues are given by λ = 0, 1 - q, 1 + q, and 2, where

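$$q = \frac{1 - e^{g_N - g_\alpha - g_\beta}}{\sqrt{(1 + e^{-g_\alpha})(1 + e^{-g_\beta})(1 + e^{g_N - g_\alpha})(1 + e^{g_N - g_\beta})}}$$ [5]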

Because we have -1 < q < 1, the three nonzero eigenvalues are positive and describe the relaxation to the equilibrium state of the model (see Eq. 4). The equilibrium state simply is c_o Y_o, where Y_o is the eigenvector with eigenvalue 0.
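The quoted eigenvalues can be checked numerically. The sketch below assumes Glauber-type rates w_nm = 1/(1 + exp(g_n - g_m)) in units of 1/t_o and the expression q = (1 - exp(gN - gα - gβ)) / sqrt[(1 + exp(-gα))(1 + exp(-gβ))(1 + exp(gN - gα))(1 + exp(gN - gβ))]. It evaluates the characteristic polynomial det(W - λ·1) at λ = 0, 1 - q, 1 + q, and 2. The free-energy values and function names are hypothetical.

```python
import math


def four_state_W(g_alpha, g_beta, g_N):
    """Rate matrix for the states D, alpha, beta, N (indices 0..3), in units of 1/t_o."""
    g = [0.0, g_alpha, g_beta, g_N]
    edges = [(0, 1), (0, 2), (1, 3), (2, 3)]  # D-alpha, D-beta, alpha-N, beta-N
    W = [[0.0] * 4 for _ in range(4)]
    for a, b in edges:
        for src, dst in ((a, b), (b, a)):
            w = 1.0 / (1.0 + math.exp(g[dst] - g[src]))  # assumed Glauber-type rate
            W[dst][src] -= w
            W[src][src] += w
    return W


def q_factor(g_alpha, g_beta, g_N):
    """Assumed closed form of the factor q that sets the eigenvalues 1 - q and 1 + q."""
    num = 1.0 - math.exp(g_N - g_alpha - g_beta)
    den = math.sqrt((1 + math.exp(-g_alpha)) * (1 + math.exp(-g_beta))
                    * (1 + math.exp(g_N - g_alpha)) * (1 + math.exp(g_N - g_beta)))
    return num / den


def det(M):
    """Determinant by Laplace expansion along the first row (adequate for 4 x 4)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * x * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j, x in enumerate(M[0]))


def char_poly(W, lam):
    """det(W - lam * identity); vanishes when lam is an eigenvalue of W."""
    n = len(W)
    return det([[W[i][j] - (lam if i == j else 0.0) for j in range(n)]
                for i in range(n)])


# Hypothetical landscape: two barrier states and a stable native state.
g_alpha, g_beta, g_N = 3.0, 4.0, -5.0
W = four_state_W(g_alpha, g_beta, g_N)
q = q_factor(g_alpha, g_beta, g_N)
residuals = [char_poly(W, lam) for lam in (0.0, 1.0 - q, 1.0 + q, 2.0)]
print(q, residuals)
```

Residuals at the level of floating-point round-off indicate that the four values are eigenvalues of W for this parameter set.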

This model exhibits two-state folding kinetics under two conditions. First, the native state must be stable; the free-energy gN of the native state must be significantly smaller than the free energies of the other three states. Under such folding conditions, the equilibrium native state will be more populated than the other three states. Second, the intermediate states α and β must have positive free energies, relative to D, so that the system will have a kinetic barrier, which is required to achieve single-exponential dynamics.

Under these two conditions, the three Boltzmann weights exp(gN - gα - gβ), exp(gN - gα), and exp(gN - gβ) in Eq. 5 are much less than 1 and also much less than exp(-gα) and exp(-gβ). Therefore, these three Boltzmann weights can be neglected. We set them equal to zero. The factor q in Eq. 5 then simplifies to the following expression:

$$q \simeq \frac{1}{\sqrt{(1 + e^{-g_\alpha})(1 + e^{-g_\beta})}}$$ [6]

For large barrier energies gα and gβ, we have exp(-gα) ≪ 1 and exp(-gβ) ≪ 1 and, therefore, (1 + exp(-gα))(1 + exp(-gβ)) ≅ 1 + exp(-gα) + exp(-gβ). If we next use the expansion (1 + x)^(-1/2) ≅ 1 - x/2 with x = exp(-gα) + exp(-gβ) ≪ 1, the smallest nonzero relaxation rate, or folding rate, k ≡ 1 - q is given by

$$k \simeq \frac{1}{2}\left(e^{-g_\alpha} + e^{-g_\beta}\right)$$ [7]

The folding rate k is much smaller than the other two relaxation rates 1 + q and 2. In that case, these two fast relaxations constitute an initial “burst phase,” and the model otherwise gives two-state single-exponential folding behavior with the slowest rate k (see Eq. 4). The folding rate k simply is the sum of the rates for the two possible folding routes: one folding route in which α forms first and the other in which β forms first. The factor 1/2 in the equation above arises because a molecule, after reaching one of the barrier states α or β, either falls back to D or falls forward to N, with almost equal probability.
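The quality of this approximation can be gauged with a short calculation. The sketch compares the exact slowest rate 1 - q with the two-route estimate k ≅ (exp(-gα) + exp(-gβ))/2, assuming the expression for q used in the eigenvalue check above the approximation (with Glauber-type rates). The barrier heights and native-state free energy are hypothetical values in units of RT.

```python
import math


def q_exact(g_alpha, g_beta, g_N):
    """Assumed closed form of q for the four-state model."""
    num = 1.0 - math.exp(g_N - g_alpha - g_beta)
    den = math.sqrt((1 + math.exp(-g_alpha)) * (1 + math.exp(-g_beta))
                    * (1 + math.exp(g_N - g_alpha)) * (1 + math.exp(g_N - g_beta)))
    return num / den


def k_two_routes(g_alpha, g_beta):
    """Approximate folding rate: half the sum of the two barrier Boltzmann weights."""
    return 0.5 * (math.exp(-g_alpha) + math.exp(-g_beta))


# Hypothetical folding conditions: high barriers and a stable native state.
g_alpha, g_beta, g_N = 5.0, 6.0, -10.0
k_exact = 1.0 - q_exact(g_alpha, g_beta, g_N)
k_approx = k_two_routes(g_alpha, g_beta)
rel_error = abs(k_approx - k_exact) / k_exact
print(k_exact, k_approx, rel_error)
```

For these values, the folding rate is also far below the two fast relaxation rates 1 + q and 2, which is the separation of time scales behind single-exponential kinetics.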

By using this model, we next explore the effects of mutations. Consider a mutation within the α-helix. The free energy of the helix will change from gα to gα + Δgα, and the free energy of the native state will change from gN to gN + ΔgN. In contrast, gβ is not affected by the mutation. The folding rate of the mutant will be kmut = k(gα + Δgα, gβ), with k given by Eq. 7. For small perturbations Δgα, we have ln kwt - ln kmut ≅ -(∂ ln k/∂gα) Δgα. For mutations in the α-helix, the Φ value defined in Eq. 1 has the following general form:

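$$\Phi = \chi_\alpha\,\frac{\Delta g_\alpha}{\Delta g_N}$$ [8]

with

$$\chi_\alpha \equiv -\frac{\partial \ln k}{\partial g_\alpha} = \frac{e^{-g_\alpha}}{e^{-g_\alpha} + e^{-g_\beta}}$$ [9]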

Hence, the Φ value is a product of the two following terms: a structural factor χα and an energetic factor Δgα/ΔgN. The term χα describes the fractional structure formation of the α-helix within the TSE. In this example, the TSE consists of the two barrier states α and β on the two parallel folding routes. χα ranges between 0 and 1. We have χα = 1 for gα ≪ gβ when the state α dominates the TSE, and χα = 0 when β dominates the TSE.

Whereas χα gives structural information, the second term, Δgα/ΔgN, can take on either negative or positive values. Thus, this term accounts for nonclassical Φ values <0 or >1. In the simplest case, ΔgN = Δgα + Δgαβ. Here, Δgαβ is the free-energy change for a tertiary contact between the α-helix and the β-sheet, for example. In that case, negative Φ values arise when Δgαβ is larger in magnitude and opposite in sign to that of Δgα. That is, a negative Φ value is predicted when a helical mutation also has a counteracting and larger effect on a tertiary contact. Correspondingly, Φ > 1 occurs when the following two conditions are met: (i) Δgαβ is opposite in sign but smaller in magnitude than Δgα, and (ii) χα is sufficiently large. This explanation of nonclassical Φ values may also rationalize why more Φ values are <0 than >1 (42). If Δgα and Δgαβ have a similar magnitude, it should be more difficult to satisfy the latter two conditions than the former condition.
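The three regimes can be made concrete with a small numerical sketch. It assumes Φ = χα Δgα/ΔgN with χα = exp(-gα)/(exp(-gα) + exp(-gβ)), the approximate folding rate k = (exp(-gα) + exp(-gβ))/2, and ΔgN = Δgα + Δgαβ. All numerical values and function names are hypothetical.

```python
import math


def folding_rate(g_alpha, g_beta):
    """Approximate two-route folding rate in units of 1/t_o."""
    return 0.5 * (math.exp(-g_alpha) + math.exp(-g_beta))


def chi_alpha(g_alpha, g_beta):
    """Structural factor: weight of the helix-first barrier state in the TSE."""
    wa, wb = math.exp(-g_alpha), math.exp(-g_beta)
    return wa / (wa + wb)


def phi(g_alpha, g_beta, d_g_alpha, d_g_alpha_beta):
    """Phi for a helix mutation, assuming d_g_N = d_g_alpha + d_g_alpha_beta."""
    d_g_N = d_g_alpha + d_g_alpha_beta
    return chi_alpha(g_alpha, g_beta) * d_g_alpha / d_g_N


g_alpha, g_beta = 4.0, 7.0  # hypothetical barriers: the helix-first route dominates

phi_classical = phi(g_alpha, g_beta, 0.5, 0.5)   # tertiary effect has the same sign
phi_negative = phi(g_alpha, g_beta, 0.5, -1.0)   # opposite sign, larger magnitude
phi_above_one = phi(g_alpha, g_beta, 0.5, -0.2)  # opposite sign, smaller magnitude

# Finite-difference check that chi_alpha equals -d ln k / d g_alpha.
h = 1e-5
chi_numeric = -(math.log(folding_rate(g_alpha + h, g_beta))
                - math.log(folding_rate(g_alpha - h, g_beta))) / (2 * h)
print(phi_classical, phi_negative, phi_above_one, chi_numeric)
```

In all three cases the structural factor χα is identical; only the energetic factor Δgα/ΔgN moves Φ inside or outside the classical range.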

However, our model is rather general and captures also that nonclassical Φ values can arise from shifts in the free energy of the denatured state. For example, if a mutation only lowers the free energy of the denatured state, we have Δgα > 0 and ΔgN < 0, which gives a negative Φ value according to Eq. 8. In contrast, the traditional structural interpretation of Φ values fails if mutations shift the free energy of the denatured state (48).

In this simple example, a mutation in the α-helix affects only a single structural element formed in the TSE: the α-helix. In general, mutations may affect several structural elements of the TSE. A generalization of Eq. 8 is then Φ = (∑iχi Δgi)/ΔgN with χi = -(∂ ln k)/(∂ gi), provided that the free energies gi of the structural elements are additive.

Mutations in the α-Helix of CI2. To model the folding kinetics of CI2, we must consider at least four substructural units: the α-helix and the three strand pairings β2β3, β3β4, and β1β4. These substructures correspond to contact clusters on the native contact map of CI2 (see Fig. 2). Therefore, the model energy landscape of CI2 is more complex than the landscape of the simple four-state model given above. However, under two assumptions, Eq. 8 also holds for the helix of CI2. These assumptions are as follows: (i) the helix is either fully formed or not formed in each of the states of the TSE, and (ii) the helix does not form tertiary contacts in the TSE. Under these assumptions, the free-energy contribution of the helix to a state of the TSE (in which the helix is formed) simply is gα, and then χα has the same interpretation as described above.

Fig. 2.

Contact matrix of CI2. Each black dot represents a contact between two amino acids in the native structure, with a distance of <6 Å between the Cα or Cβ atoms of the amino acids. The four large clusters of contacts correspond to the main structural elements of CI2: the α-helix and the β-strand pairings β2β3, β3β4, and β1β4. The few “isolated” contacts represent either turns or tertiary interactions of α-helix and β-sheet.

To test Eq. 8, we consider the 20 single-residue mutations in the CI2 helix (2). We estimate the change in intrinsic helix stability Δgα from helicities predicted by the program agadir (44-46) (see Table 1). The experimentally measured change in folding rate for these mutations, ln(kwt/kmut), correlates with Δgα with a coefficient r = 0.83, and the experimentally determined Φ values correlate with Δgα/ΔgN with r = 0.85 (see Fig. 3). According to Eq. 8, the change in log k is proportional to Δgα, and the Φ values are proportional to Δgα/ΔgN, both with proportionality constant χα. From the two linear fits shown in Fig. 3, we obtain the estimate χα = 0.88 ± 0.12. We have estimated the errors for χα by using a jackknife method in which up to two data points are deleted randomly from the data set (see legend to Fig. 3). This estimate for χα indicates that the helix is almost fully formed in the TSE. In agreement with this interpretation, molecular dynamics unfolding simulations indicate that a fraction of 0.91 ± 0.14 of the helical residues are structured in the TSE (21).

Table 1. Data for single-residue mutations in the α-helix of CI2.

Mutation RT ln(kwt/kmut) ΔGN Φexp Δgα Δgα/ΔGN
S12G 0.23 0.8 0.29 0.28 0.35
S12A 0.38 0.89 0.43 0.14 0.16
E14Q 0.36 0.29 1.23 0.54 1.86
E14D 0.10 0.52 0.2 0.08 0.15
E14N 0.53 0.7 0.75 0.54 0.77
E15Q 0.25 0.47 0.53 0.56 1.19
E15D 0.16 0.74 0.22 0.13 0.18
E15N 0.57 1.07 0.53 0.57 0.53
A16G 1.15 1.09 1.06 0.82 0.75
K17A 0.14 0.49 0.28 0.04 0.08
K17G 0.87 2.32 0.38 0.80 0.34
K18G 0.68 0.99 0.7 0.75 0.76
V19A −0.13 0.49 −0.26 −0.41 −0.84
I20V 0.52 1.3 0.4 0.14 0.11
L21A 0.33 1.33 0.25 −0.01 −0.01
L21G 0.48 1.38 0.35 0.26 0.19
Q22G 0.07 0.6 0.12 0.04 0.07
D23A −0.23 0.96 −0.25 −0.41 −0.43
K24A −0.23 0.65 −0.35 0.11 0.17
K24G 0.31 3.19 0.1 0.12 0.04

Experimental data for folding rates kwt and kmut of wild type and mutants, stability changes ΔGN, and Φ values are from Itzhaki et al. (2). The change Δgα in the “intrinsic helix stability” gα is estimated from helicities Pα predicted by agadir (44-46). The wild-type sequence of the 13-residue helix is SVEEAKKVILQDK. Helicities have been calculated at the experimental temperature 298 K, pH 6.25, and ionic strength 0.03 M, with acetylated N terminus and amidated C terminus of the peptide to avoid terminal charges. The energetic quantities RT ln(kwt/kmut), ΔGN, and Δgα are given in kcal/mol.

Fig. 3.

Correlation analysis for mutations in the CI2 helix. (Upper) ln(kwt/kmut) versus Δgα, which is estimated as a logarithmic ratio of the helicities Pα predicted by agadir (see Table 1). The correlation coefficient r is 0.83, and the slope of the fitted line through the origin is 0.98. The slope of this line is an estimate for the parameter χα of Eq. 8. For subsets of the data generated by deleting up to two data points, the correlation coefficient r varies from 0.77 to 0.93, and the linear slope varies from 0.87 to 1.07. (Lower) Φexp versus Δgα/ΔGN. The correlation coefficient r is 0.85, and the slope of the fitted line through the origin is 0.71. For data subsets generated by deleting up to two data points, r varies from 0.79 to 0.90, and the slope varies from 0.64 to 0.90.
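The correlation analysis can be retraced from the tabulated values. The sketch below uses the rounded entries of Table 1 and computes the Pearson coefficient and the slope of a least-squares line through the origin for both panels. For the lower panel, it also reports the range of slopes obtained when up to two points are deleted. Enumerating all such subsets is an assumption made here in place of the random deletion described in the text, so the ranges need not coincide exactly with the quoted ones. All function and variable names are illustrative.

```python
import math
from itertools import combinations

# Table 1 columns: RT ln(k_wt/k_mut), Phi_exp, dg_alpha, dg_alpha/dG_N
TABLE = {
    "S12G": (0.23, 0.29, 0.28, 0.35), "S12A": (0.38, 0.43, 0.14, 0.16),
    "E14Q": (0.36, 1.23, 0.54, 1.86), "E14D": (0.10, 0.20, 0.08, 0.15),
    "E14N": (0.53, 0.75, 0.54, 0.77), "E15Q": (0.25, 0.53, 0.56, 1.19),
    "E15D": (0.16, 0.22, 0.13, 0.18), "E15N": (0.57, 0.53, 0.57, 0.53),
    "A16G": (1.15, 1.06, 0.82, 0.75), "K17A": (0.14, 0.28, 0.04, 0.08),
    "K17G": (0.87, 0.38, 0.80, 0.34), "K18G": (0.68, 0.70, 0.75, 0.76),
    "V19A": (-0.13, -0.26, -0.41, -0.84), "I20V": (0.52, 0.40, 0.14, 0.11),
    "L21A": (0.33, 0.25, -0.01, -0.01), "L21G": (0.48, 0.35, 0.26, 0.19),
    "Q22G": (0.07, 0.12, 0.04, 0.07), "D23A": (-0.23, -0.25, -0.41, -0.43),
    "K24A": (-0.23, -0.35, 0.11, 0.17), "K24G": (0.31, 0.10, 0.12, 0.04),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_through_origin(x, y):
    """Least-squares slope of the line y = slope * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slope_range(x, y, max_deleted=2):
    """Smallest and largest slope over all subsets with up to max_deleted points removed."""
    idx = range(len(x))
    slopes = []
    for d in range(max_deleted + 1):
        for drop in combinations(idx, d):
            keep = [i for i in idx if i not in drop]
            slopes.append(slope_through_origin([x[i] for i in keep],
                                               [y[i] for i in keep]))
    return min(slopes), max(slopes)


rows = list(TABLE.values())
rt_ln_k, phi_exp, dg_alpha, dg_ratio = (list(col) for col in zip(*rows))

r_upper = pearson(dg_alpha, rt_ln_k)
slope_upper = slope_through_origin(dg_alpha, rt_ln_k)
r_lower = pearson(dg_ratio, phi_exp)
slope_lower = slope_through_origin(dg_ratio, phi_exp)
slope_lower_min, slope_lower_max = slope_range(dg_ratio, phi_exp)

print(f"upper: r = {r_upper:.2f}, slope = {slope_upper:.2f}")
print(f"lower: r = {r_lower:.2f}, slope = {slope_lower:.2f}, "
      f"slope range = {slope_lower_min:.2f} to {slope_lower_max:.2f}")
```

Both slopes serve as estimates of χα in this reading, and the spread of the leave-out slopes gives a rough sense of how strongly individual mutations such as E14Q or A16G influence the fit.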

Discussion

Our model gives a physical explanation for nonclassical Φ values, but an alternative explanation is in terms of experimental errors. Sánchez and Kiefhaber (47) have observed that mutations with nonclassical Φ values often have relatively small changes ΔgN in stability. Because ΔgN appears in the denominator of the expression for Φ, it means that nonclassical Φ values can arise when a mutation has little effect on the protein stability. Sánchez and Kiefhaber argue that unavoidable experimental errors may be responsible for the nonclassical Φ values, and that Φ values for mutations with ΔgN < 1.7 kcal/mol are unreliable. Others have argued that this error threshold should be considerably smaller, ≈0.6 kcal/mol (16, 48). The analysis of Sánchez and Kiefhaber is based on the assumption that different mutations at a given residue position should lead to the same “true” Φ value for this residue position. Our model gives a different interpretation. In our model, different mutations at a given position can affect the energy landscape in different ways. For example, we believe E14Q in the CI2 helix may affect the helicity significantly, whereas E14D does not (see Table 1).

Our model can explain isolated nonclassical Φ values, such as the four in the α-helix of CI2 (see Table 1). They are “isolated” insofar as they are interspersed among classical Φ values within a local region of the protein. There are other cases in which nonclassical Φ values are clustered together within a given region of the protein. In the second α-helix of ACBP for example, seven Φ values are clearly negative, whereas the other six Φ values are close to 0. Previously, clustered nonclassical Φ values have been explained in terms of parallel flow processes on slightly more complex energy landscapes than we have considered here (49). That is, mutations that destabilize a particular substructure can cause a back flow on the energy landscape into faster flow channels, leading to an increase in the folding rate and negative Φ values.

Here, we considered the α-helix of CI2 to illustrate our structural interpretation of Φ values. One reason is that the helix is very well characterized (i.e., a large number of Φ values is available). Another reason is that these Φ values cover a wide range of possible values, from -0.35 to 1.23. Two other well characterized helices are the α-helices of protein L (9) and G (10). There are 15 single-residue mutations that have been considered in the protein L helix. One of the Φ values is -0.39, whereas the others span a rather narrow range from -0.05 to 0.28 (9). Similarly, one of nine Φ values for the helix of protein G is -0.81, whereas the others range from 0.05 to 0.55. In both cases, our model reproduces the clearly negative, nonclassical Φ value, which leads to relatively high correlation coefficients of 0.58 and 0.81 between the experimental and theoretical Φ value distributions. But because the other Φ values lie in a rather narrow range, the statistical uncertainties from experimental and modeling errors are high, and χα cannot be determined reliably.

Summary

The Φ values give information about the routes of protein folding. The central question is: What information do they give? Previous modeling has been limited in certain ways. First, some models treat only topological aspects of folding and, therefore, cannot explain how single-site mutations can have the large effects on folding rates that are often observed. Second, current models usually make some plausible, but ad hoc, assumption about folding routes, transition states, and reaction coordinates. Protein folding is sufficiently different from simpler reactions that some of these assumptions are not likely to be valid. In particular, Φ values are often assumed to reflect only structural information about transition states. Here, we present a more rigorous approach for interpreting Φ values, and we show that Φ values have both structural and energetic components. We show that our approach gives a consistent interpretation of mutational experiments on the CI2 helix.

Abbreviation: TSE, transition-state ensemble.

Footnotes

§ The transition rates obey detailed balance w_nm P_m^e = w_mn P_n^e, where P_n^e ∝ exp(-g_n) is the equilibrium weight for the state n. Detailed balance ensures that the system ultimately reaches thermal equilibrium.

These two assumptions are clearly simplifying. Based on unfolding simulations, Daggett et al. (21) argue for a crucial tertiary interaction between the residues Ala-16 of the α-helix and Ile-49 of the β-sheet in the TSE of CI2. In contrast, Lazaridis and Karplus (23) found that “the number of contacts made by the Ala side chain [in the TSE]... depend[s] primarily on the presence of the helix and not on interactions with β-strands.”
