Proceedings of the National Academy of Sciences of the United States of America. 2008 Jan 29;105(5):1489–1493. doi: 10.1073/pnas.0705707105

Folding domain B of protein A on a dynamically partitioned free energy landscape

Erik D Nelson 1,*, Nick V Grishin 1
PMCID: PMC2234171  PMID: 18230738

Abstract

The B domain of staphylococcal protein A (BdpA) is a small helical protein that has been studied intensively in kinetics experiments and detailed computer simulations that include explicit water. The simulations indicate that BdpA needs to reorganize in crossing the transition barrier to facilitate folding its C-terminal helix (H3) onto the nucleus formed from helices H1 and H2. This process suggests frustration between two partially ordered forms of the protein, but recent φ value measurements indicate that the transition structure is relatively constant over a broad range of temperatures. Here we develop a simplistic model to investigate the folding transition in which properties of the free energy landscape can be quantitatively compared with experimental data. The model is a continuation of the Muñoz–Eaton model to include the intermittency of contacts between structured parts of the protein, and the results compare variations in the landscape with denaturant and temperature to φ value measurements and chevron plots of the kinetic rates. The topography of the model landscape (in particular, the feature of frustration) is consistent with detailed simulations even though variations in the φ values are close to measured values. The transition barrier is smaller than indicated by the chevron data, but it agrees in order of magnitude with a similar α-carbon type of model. Discrepancies with the chevron plots are investigated from the point of view of solvent effects, and an approach is suggested to account for solvent participation in the model.

Keywords: folding landscape, frustration, protein folding


The routes a protein travels on the way to its native ensemble are the result of a sensitive exchange between energy and entropy coupling the protein to its solvent environment. Often the basic topography of the folding landscape (1) can be understood just from what would be necessary kinetically for a protein-like molecule to organize itself into a particular native shape (2–4), but the detailed way in which a protein interacts with solvent can qualitatively affect the way it folds (5, 6). The conformation of a protein is coupled to large-scale fluctuations in solvent density (7–9), and for small globular proteins investigated in kinetics experiments, most of the change in solvent exposure (10) can occur before the protein is half folded. The rate at which a protein explores its conformational space is slaved to local solvent motions (11, 12), and together these effects can have a significant influence on the time scale for folding.

Recently, methods have become available to simulate proteins with explicit solvent on a time scale long enough to explore folding (13–15). These simulations provide the most precise picture of protein kinetics but, as a result, are less practical for describing variations that occur across different experiments and solvent conditions (16–18). Typically, much simpler α-carbon type models have been used to address these problems, and when guided by experiment they can provide an accurate description of the interplay among protein folding routes (3, 4). A parallel approach established by Muñoz and Eaton (19–21) is to partition the free energy landscape of the protein into a network of small ensembles that can be described by polymer physics methods. This approach has the advantage that solvent effects might be included on a practical time scale, but it is usually less accurate than explicit chain models because the partitioning methods introduce dynamical constraints that conflict with, or oversimplify, the kinetics of the protein (22, 23). Here we develop a method to solve this problem.

In the Muñoz–Eaton model, the configuration space of a protein is in effect partitioned in terms of a local order parameter describing the proximity of chemical bond angles in the residues to their native values. The partitioning is accomplished by labeling the residues in a protein configuration as folded (native like) or unfolded (free, or ensemble like) depending on their order parameter values, and then grouping together configurations that have the same pattern of labels. Configurations with unfolded loops (24, 25) are neglected in the model so that folded segments always represent independent nuclei, and consequently the ensemble free energies are simple to describe. The model can be solved exactly by special techniques (26, 27), and it provides a convenient way to construct basic inferences from experiments (28, 29). Nevertheless, the terms that are being neglected (unfolded loops and intermittency of native contacts between folded segments (30, 31)) are precisely those that coordinate the expulsion of solvent from the protein (2, 13, 32), and it is of interest to describe these terms from the point of view of the original model.

We accomplish this here by continuing the geometric approach of Muñoz and Eaton, including separate order variables to describe the state of contact between the folded segments. The resulting system of constraints resembles the local-order coupling model for explicit chains (33). To test the model, we use it to describe the folding landscape of the B domain of staphylococcal protein A (BdpA), which allows an extensive comparison to thermodynamics and kinetics measurements. BdpA (Fig. 1) is one of the most widely studied proteins of the past decade (34). Folding kinetics and single-molecule FRET measurements indicate that BdpA folds cooperatively over a broad range of solvent conditions (18), yet the system is still small enough to simulate by first principles methods on a computer (13–15). The folding nucleus (16), which forms around the first two helices, includes most of the binding interface from the complex with an IgG Fc fragment (35); H3 is unstructured in the complex, but structured in the transition state in all-atom simulations; however, in crossing the transition barrier, further rearrangement of the helices is required to consolidate contacts with H3 before the protein can fold. These features suggest that BdpA is weakly frustrated by extra “functional” requirements (binding preferences) of Staphylococcus aureus (34). But recent measurements to detect temperature changes in the transition structure (17) indicate that whatever frustration exists, it is not strong enough to divide the landscape into kinetically separate routes (29). By fitting our model to thermodynamic stability data (17, 29) we are able to bridge between the kinetics measurements and the detailed simulations to describe the folding landscape under various solvent conditions.

Fig. 1.

Native structures of BdpA. (a) NMR structure of wild-type BdpA colored according to φ values calculated later in this work. φ values in the range 0 ≤ φ ≤ 0.8 correspond to blue, green, yellow, orange, and red. (b) Crystal structure from the complex of BdpA with the IgG Fc fragment colored as in a.

Model

We consider systems that can be described by a Gō (native only) interaction model in which nonnative contact is repulsive at short distances just to exclude the volume occupied by the protein (3, 4). Let q_i = 0 or 1 label the local structure of a residue i (unfolded or folded, respectively) in a partially unfolded protein as in the Muñoz–Eaton model. Each pattern of labels {q_i} describes an ensemble of protein configurations Γ with structurally similar folded segments α_k. To describe the intermittency of contacts between the α_k, we partition Γ into smaller ensembles Γ_n ∈ Γ according to whether the degree of native organization between pairs of folded segments α_k and α_l exceeds some threshold value. This is accomplished by providing the folded segments with separate labels (superscripts), α_k → α_k^a, so that native contact between two segments α_k^a and α_l^b is established only if a = b. Leaving to later the question of what actually decides the threshold values for native order, the ensembles Γ_n can each be depicted by a Feynman-like diagram (36) formed by the following procedure [see supporting information (SI) Fig. 5]: First, a line is drawn to represent the unfolded protein, and labeled circles are placed at points along its length to represent the α_k^a. Circles with the same label are then joined together into nuclei by bending the unfolded lines to form the diagram (note that two circles can have the same label only if contacts exist between them in the native state). Each Γ contains an ensemble Γ_1 corresponding to the fully collapsed (natively connected) configuration of the α_k^a, and each partially collapsed diagram Γ_{n≠1} corresponds to a topologically different way of “pulling apart” the folded segments in Γ_1. For small proteins like BdpA, there is no need to consider diagrams with more than four folded segments, so the number of diagrams in Γ is always ≤15.
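The bound of 15 matches the number of ways to group four labeled segments into nuclei (the Bell number B(4) = 15), if each diagram is read as a set partition of the folded segments. The following is a minimal sketch under that reading; the segment names are hypothetical, and the native-contact restriction on shared labels is not applied, so the count is an upper bound rather than the authors' enumeration.

```python
from typing import Iterator, List


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every grouping of `items` into non-empty blocks (candidate nuclei)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or let it form a block of its own.
        yield [[first]] + partition


# Hypothetical labels for four folded segments.
segments = ["a1", "a2", "a3", "a4"]
diagrams = list(set_partitions(segments))
print(len(diagrams))  # 15
```

In this picture the single-block partition plays the role of the fully collapsed diagram, and the partition into four singletons is the fully separated one.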

To describe the free energy of a diagram, the energy is then defined as E(Γ_n) = −ε Σ_{i<j} Q_ij(Γ_n), where ε is the energy per contact, Q_ij(Γ_n) = 0 or Q_ij^N, and Q_ij^N is the native contact matrix. Q_ij(Γ_n) = Q_ij^N only if residues i and j are located in the same nucleus.
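As an illustration of this rule, the sketch below sums native contacts only over residue pairs that share a nucleus. The data layout (a dictionary of contact weights and a per-residue nucleus label, with None for unfolded residues) is an assumption made for the example, not something specified in the text.

```python
from typing import Dict, Optional, Sequence, Tuple


def diagram_energy(
    native_contacts: Dict[Tuple[int, int], float],
    nucleus_of: Sequence[Optional[int]],
    epsilon: float,
) -> float:
    """E = -epsilon * sum of Q_ij^N over pairs (i, j) lying in the same nucleus.

    native_contacts maps (i, j) with i < j to Q_ij^N (hypothetical layout).
    nucleus_of[i] is the nucleus label of residue i, or None if i is unfolded.
    """
    total = 0.0
    for (i, j), q_native in native_contacts.items():
        same_nucleus = (
            nucleus_of[i] is not None and nucleus_of[i] == nucleus_of[j]
        )
        if same_nucleus:
            total += q_native
    return -epsilon * total


# Toy example: six residues, three native contacts.
contacts = {(0, 3): 2.0, (1, 4): 1.0, (2, 5): 3.0}
labels = [0, 0, None, 0, 1, None]  # residue 4 sits in a different nucleus
print(diagram_energy(contacts, labels, epsilon=1.0))  # -2.0
```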

To describe the entropy, first note that besides Γ_1, each Γ contains an ensemble Γ_0 in which the α_k^a are completely pulled apart (all labels are different). The entropy Δs of unfolded residues in Γ_0 can be estimated from the entropy of unfolded proteins (37), and the entropy cost ΔS(Γ_n) to then sequentially close Γ_0 into any of the diagrams Γ_n ∈ Γ will be estimated in terms of the cost for closing a single loop,

[Equation 1; image zpq00508-9074-m01.jpg]

where p(x, x′)Δv is the fraction of unrestricted chain configurations emanating from a point x that terminate at the point x′ within a small volume Δv about the size of a residue (38) and k is Boltzmann's constant. The entropy of a diagram is then S = (N − q)Δs − ΔS, where N is the number of residues, q is the number of folded residues, and ΔS is a sum of terms described by Eq. 1.

Random loop insertions in proteins are well modeled as Gaussian chains with excluded volume (39), and the persistence length appears close to the size of a backbone unit (40, 41). Therefore, we neglect relatively small excluded volume effects here and use the simplest possible Gaussian approximation,

[Equation image zpq00508-9074-m02.jpg]

giving

[Equation image zpq00508-9074-m03.jpg]

where l is the number of links, r is the distance between end residues, a = 3.8 Å, and for simplicity we let Δv = (4/3)πa^3, which is about the volume of a large branched residue (38).
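The equation images are not reproduced in this text, so the following sketch should be read as an assumption rather than the authors' exact expressions. It takes the loop-closure cost to be −k ln[p Δv] and uses the textbook Gaussian end-to-end density p = (3/(2πla^2))^(3/2) exp(−3r^2/(2la^2)), which yields the ln(l) and r^2/(la^2) terms mentioned in Methods. Entropies are returned in units of k.

```python
import math

A_LINK = 3.8  # link length a in angstroms, as quoted in the text


def loop_closure_entropy(l: float, r: float, a: float = A_LINK) -> float:
    """Entropy cost (units of k) to close a loop of l links over distance r.

    Assumes a Gaussian chain without excluded volume and a capture volume
    dv = (4/3) * pi * a**3; both forms are assumptions of this sketch.
    """
    dv = 4.0 / 3.0 * math.pi * a**3
    density = (3.0 / (2.0 * math.pi * l * a * a)) ** 1.5 * math.exp(
        -3.0 * r * r / (2.0 * l * a * a)
    )
    return -math.log(density * dv)


# The cost grows with loop length and with end-to-end extension.
print(loop_closure_entropy(l=10, r=0.0))
print(loop_closure_entropy(l=10, r=10.0))
print(loop_closure_entropy(l=20, r=0.0))
```

For very short loops the Gaussian form gives p Δv > 1 and hence a negative cost, which is a known limit of the approximation rather than a physical result.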

Typically, the structure of a diagram makes it obvious how to approximate ΔS in terms of Δs_loop(l, r). But we also have to consider situations where, for example, an α_k^a appears along the unfolded part of a loop. This event would lead to a kind of effective loop consisting of two unfolded segments of length l_1 and l_2 joined across a physical distance r′ by the folded segment. To describe this diagram, we first interpret the folded segment as if it were an unfolded part x of a loop of total length l = l_1 + x + l_2, where

[Equation image zpq00508-9074-m04.jpg]

and δ ∈ [0, 1]. The variable part of the entropy cost is then multiplied by

[Equation image zpq00508-9074-m05.jpg]

to account for the entropy of segment x. The few more complex diagrams that require this approximation contribute minimally to the results (SI Text and SI Fig. 6). The contact matrix and parameters entering into these expressions are described in Methods.

Results

The system can now be depicted as a kind of foam, or network (1) of ensembles with free energy F(Γ_n) = E(Γ_n) − TS(Γ_n), where T is the temperature. To quantitatively compare the model to the data, the melting free energy of the model is mapped to the measured free energy in terms of a stability parameter ξ. To accomplish this, the energy is written as ε/T = ξNΔs/Q_N, where Q_N = Σ_{i<j} Q_ij^N. The free energy profile is F(q) = −kT ln Z(q) (Fig. 2a), where Z(q) = Σ exp(−F(Γ_n)/kT) and the sum includes the ensembles Γ_n with q folded residues. The ends of the profile balance when ξ = 1.
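A minimal sketch of how these definitions fit together is given below, in units where k = 1. The function names and the representation of an ensemble as a (q, F) pair are hypothetical; the sketch only illustrates the mapping ε/T = ξNΔs/Q_N, the entropy S = (N − q)Δs − ΔS, and the sum over ensembles that share the same q.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ensemble_free_energy(
    contacts: float, q: int, loop_cost: float,
    n_res: int, q_native: float, xi: float,
    ds: float = 1.0, temp: float = 1.0,
) -> float:
    """F = E - T*S with E = -eps*contacts and S = (N - q)*ds - loop_cost."""
    eps = xi * n_res * ds * temp / q_native  # from eps/T = xi*N*ds/Q_N
    energy = -eps * contacts
    entropy = (n_res - q) * ds - loop_cost
    return energy - temp * entropy


def free_energy_profile(
    ensembles: Iterable[Tuple[int, float]], kT: float = 1.0
) -> Dict[int, float]:
    """F(q) = -kT ln sum_n exp(-F_n/kT) over ensembles with q folded residues."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for q, f in ensembles:
        grouped[q].append(f)
    profile = {}
    for q, fs in grouped.items():
        f_min = min(fs)  # shift by the minimum for numerical stability
        z_scaled = sum(math.exp(-(f - f_min) / kT) for f in fs)
        profile[q] = f_min - kT * math.log(z_scaled)
    return profile


# End points of a toy 60-residue chain with Q_N = 100 balance at xi = 1.
f_unfolded = ensemble_free_energy(0, 0, 0.0, n_res=60, q_native=100, xi=1.0)
f_native = ensemble_free_energy(100, 60, 0.0, n_res=60, q_native=100, xi=1.0)
print(f_unfolded, f_native)
```

With no folded residues the free energy is −TNΔs, and with all contacts formed it is −ξTNΔs, so the two coincide at ξ = 1, as stated above.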

Fig. 2.

ξ dependence of landscape parameters. (a) Free energy profile across the melting transition, ξ_m ≃ 0.946. (b) Melting free energy ΔF_{D-N} (□) and transition state barrier ΔF_{‡-D} (○). (c) Mapping between stability values and denaturant concentration at 25°C. The dotted line indicates the melting concentration c_m ≃ 3.49 M GdmCl. Data points occur at integer multiples of Δξ = 0.01 (Δc ≃ 0.186) about ξ ≃ 0.943. (d) Distribution of states P(q) corresponding to a. The stability range corresponds to the FRET data in figure 6 of ref. 18. (e) Location of the transition q^‡(ξ). The data points were calculated by fitting the barrier to a quadratic. (f) φ values for the degree of helix formation as in figure 5 of ref. 17 (the color scheme is the same, and the plot describes a similar range of stability). φ_i(ξ) variations for tertiary structure are slightly larger.

The profile can be quantified by its critical points in the denatured (D), transition (‡), and native (N) ensembles: The transition barrier is defined as ΔF_{‡-D} = F(q^‡) − F(q_D), and the melting free energy is ΔF_{D-N} = kT ln(Z_D/Z_N), where Z_D = Σ_{q≤q^‡} Z(q) and Z_N = Σ_{q>q^‡} Z(q) are the partition functions for denatured and native ensembles (Fig. 2b). ξ is mapped to solvent conditions by equating ΔF_{D-N}(ξ) to the melting free energy measured at T = 25°C (17). Fig. 2c plots this relationship and connects the melting point ξ_m to the melting concentration of denaturant. The predicted melting temperature in 2 M guanidinium chloride (GdmCl) is T_m ≃ 50°C, very close to the value T_m ≃ 52°C measured by Sato and Fersht (17), and the melting transition closely resembles single-molecule FRET data (18) (Fig. 2d). The folding barrier (∼2kT at midpoint) has the same order of magnitude as the barrier for an α-carbon model of the protein with Gō interactions (42).
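The quantities in this paragraph can be read off a tabulated profile F(q) as sketched below. The rule used to locate q^‡ (the interior point that rises highest above the shallower of the two basins on either side) is an assumption of this example; the figure caption indicates that the authors instead fit the barrier to a quadratic. The melting free energy follows the expression as printed, kT ln(Z_D/Z_N).

```python
import math
from typing import Dict, Tuple


def transition_summary(
    profile: Dict[int, float], kT: float = 1.0
) -> Tuple[int, float, float]:
    """Return (q_ts, barrier, melting free energy) from a profile F(q)."""
    qs = sorted(profile)
    if len(qs) < 3:
        raise ValueError("need at least three points to locate a barrier")
    q_ts, best = qs[1], -math.inf
    for idx in range(1, len(qs) - 1):
        left_min = min(profile[q] for q in qs[:idx])
        right_min = min(profile[q] for q in qs[idx + 1:])
        # Height above the shallower of the two basins this point separates.
        score = profile[qs[idx]] - max(left_min, right_min)
        if score > best:
            q_ts, best = qs[idx], score
    f_denatured = min(profile[q] for q in qs if q < q_ts)
    barrier = profile[q_ts] - f_denatured
    z_d = sum(math.exp(-profile[q] / kT) for q in qs if q <= q_ts)
    z_n = sum(math.exp(-profile[q] / kT) for q in qs if q > q_ts)
    melting = kT * math.log(z_d / z_n)
    return q_ts, barrier, melting


# Toy symmetric double-well profile (values in units of kT).
toy = {0: 0.0, 1: 1.0, 2: 2.0, 3: 1.0, 4: 0.0}
print(transition_summary(toy))
```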

The actual size of the barrier can be estimated from chevron plots of folding/unfolding rates versus denaturant (18). Assuming two-state (exponential) kinetics, the slope of the folding arm of the chevron predicts a change of ∼7kT between 0 M GdmCl and melting, which is not inconsistent with estimates from current kinetic models that include solvent effects explicitly (11, 43). In the kinetic model used by Fersht and coworkers, the slope of the chevron, m_{‡-D}, is proportional to the change in solvent-accessible surface area (SASA) between denatured and transition states (18). The fractional change in SASA can be calculated as m_{‡-D}/m_{D-N} = ∂ΔG_{‡-D}/∂ΔG_{D-N}, where m_{D-N} is the change in SASA between denatured and native states and G is the Gibbs free energy (10). The result is about m_{‡-D}/m_{D-N} ∼ 0.7–0.8 for BdpA. Although our model does not include solvent, we can calculate m_{‡-D}/m_{D-N} from the ratio of slopes ΔΔF_{‡-D}/ΔΔF_{D-N}. The result is m_{‡-D}/m_{D-N} ≃ 0.45, which is about equal to the fraction of contacts formed at q^‡. Although the model and measured transitions occur in roughly the same regions of the profile (range of native order), a smaller fraction of the surface area is lost in the transition. Because the model is mapped to ΔF_{D-N}, this value indicates that the slope of ΔF_{‡-D} is a bit smaller than in the chevron plots. The difference in scale of activation barriers is very similar to what results when small cooperative (three-particle) terms are added to the Gō model (44). These terms can be related to solvent participation in terms of the n-cluster model of phase separation (9, 45), which, along with the results above, suggests that the coupling between the solvent and the structure of the protein (32, 46, 47) is described inaccurately here. It should be possible to account for this coupling better by defining the strength of pair interactions to depend on the amount of local (in space) native order, and the inclusion of these terms already tends to improve protein φ values in explicit chain models (44).

In the Gō model, φ values are usually calculated as φ_i ≃ 〈q_i〉/Q_i^N, where q_i = Σ_j Q_ij(Γ_n) and the angle brackets denote the thermal average at q^‡ (48). Fig. 2f plots φ_i(ξ) for comparison with the φ_i(T) measurements in ref. 17, and Fig. 2e plots the location of the transition, q^‡(ξ). Variations in φ_i(ξ) in Fig. 2f roughly agree with those in φ_i(T) across the measured temperature range (17), and the uniform increase of curves in Fig. 2f with decreasing ξ reflects the slight shift in the transition state toward the native ensemble. Fig. 3 plots the local order, 〈q_i〉, and the contact order, 〈Q_ij〉/Q_ij^N, along the folding profile, and Fig. 4 describes the landscape and φ values at 2 M GdmCl for different realizations of Q_ij^N described in Methods. The folding pathway described by Fig. 3 is consistent with the replica exchange (all-atom) simulations of Garcia and Onuchic except for H3 helical content [apparently, this is related to the absence of denaturant (17) and high melting temperature (13) of the simulations]. The model pathway appears to describe early formation of the binding interface H1:H2 followed by reorganization across the transition state to consolidate long-range contacts with H3.
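The φ value estimate can be sketched as a Boltzmann-weighted average over the ensembles at the transition. In the example below each ensemble is represented by its free energy and a per-residue list of formed contacts; this layout and the function name are hypothetical and serve only to illustrate the ratio of the averaged contact count to the native contact count.

```python
import math
from typing import List, Sequence, Tuple


def phi_values(
    ts_ensembles: Sequence[Tuple[float, Sequence[float]]],
    native_totals: Sequence[float],
    kT: float = 1.0,
) -> List[float]:
    """phi_i = <q_i> / Q_i^N, averaged over the ensembles at the transition.

    ts_ensembles holds (F_n, per-residue contact counts) for each ensemble.
    native_totals holds Q_i^N; residues with Q_i^N = 0 are reported as 0.0.
    """
    f_min = min(f for f, _ in ts_ensembles)
    weights = [math.exp(-(f - f_min) / kT) for f, _ in ts_ensembles]
    z = sum(weights)
    phis = []
    for i, q_native in enumerate(native_totals):
        avg = sum(w * c[i] for w, (_, c) in zip(weights, ts_ensembles)) / z
        phis.append(avg / q_native if q_native > 0 else 0.0)
    return phis


# Two equally weighted toy ensembles over two residues.
toy_ensembles = [(0.0, [2.0, 0.0]), (0.0, [0.0, 4.0])]
print(phi_values(toy_ensembles, native_totals=[2.0, 4.0]))  # [0.5, 0.5]
```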

Fig. 3.

Degree of local order, 〈q_i〉, for helix and turn residues (a) and degree of interhelical contact order, 〈Q_ij〉/Q_ij^N, across helices H1:H2, H3:H2, and H1:H3 near the melting transition (b). Keys identify the data for residue index i or index pair i:j. Helix H2 and the C-terminal part of H1 form early, but panel a indicates that some reorganization occurs in H1 and the turns as the transition is crossed at q = 32. In b, H1:H2 contacts form early, then H3:H2 and finally H1:H3 contacts form during the reorganization process described in a.

Fig. 4.

Two-dimensional projections of the folding network. (a) Calculated φ values for the wild-type (blue line) and perturbed (black line) contact matrices are shown together with the wild-type melting values (green line). Measured φ values for surface (open symbols) and core (filled symbols) residues are from ref. 16. The level of agreement between the perturbed and measured φ values is about the same as in ref. 29. (b and c) Free energy landscapes for two contact models described in Methods. The landscapes are a projection of the folding network onto the coordinates q_N = Σ_{i≤N/2} q_i and q_C = Σ_{i>N/2} q_i, where N = 60. (b) The wild-type NMR structure. (c) The same structure weighted by the function (1 + 5/r_ij)/2 for ξ corresponding roughly to 2 M GdmCl at 25°C (levels on the landscapes are separated by ∼0.1kT).

Discussion

Recently, Itoh and Sasai (29) provided a thorough description of BdpA folding by using an exact Muñoz–Eaton model. Here we have followed their approach of fitting ΔF_{D-N} to stability measurements to allow a clear comparison of results. Their model predicts that the symmetry of the transition state is broken as ξ is brought through the melting point. However, this process is accompanied by large changes in φ values that are not observed in the φ_i(T) measurements of Sato and Fersht (17). Actually, all of the models above appear to be describing a similar effect, namely weak frustration between two forms of the protein (22, 34). What distinguishes our model is basically that (i) the contact matrix is calculated to reflect the change in exposed surface area (32), and (ii) the largest contribution to φ_i consists of the diagrams Γ_{n≠0} that allow unfolded loops.

The discrepancies in ΔF‡-D seen here and in the α-carbon model (42) seem to indicate missing solvent/cooperative effects; however, the all-atom model folds across a barrier of only ∼1–2kT at midpoint. This situation can be explained somewhat by the conditions of the simulation (13), but it still seems to allow for barrierless folding (28, 49) at high stability, and a calculation of the diffusion constant along the lines of ref. 43 would be helpful here. At the same time, it would be interesting to explore different extensions of the n-cluster model (couplings between Qij and space-local order) as a means to fit the system to chevron data, and it seems likely that this approach would improve the level of agreement with other measured properties of the free energy landscape.

Methods

To calibrate the model, we applied it to several small proteins where the landscape had been described by (i) all-atom simulations or (ii) otherwise accurate models where φ values were available (13, 14, 50–52). Figs. 2–4 were calculated by using δ = 0, Δs = 1k (29), and segments of two residues. The results were insensitive to δ and weakly dependent on the ln(l) part of the entropy cost [adjusting this term to account for excluded volume (36) affects mainly the size of the barrier], but neglecting the extension term r^2/(la^2) (25) can lead to more qualitative changes in the landscape (see SI Text and SI Fig. 7).

The contact matrix is calculated to reflect the change in exposed surface area on folding (32). A contact is registered in Q_ij^N between a pair of atoms in residues i and j if the atoms are within a distance r_c in the NMR structure of the protein and |i − j| > 1. The contact radii are selected to allow a water molecule between atoms (25): r_c = 6 Å for pairs of heavy atoms, 5 Å for heavy and hydrogen atoms, and 4 Å for hydrogens [including hydrogens (53) leads to more cooperative folding]. The unperturbed contact matrix is calculated from the wild-type NMR structure for comparison with ref. 29. In Fig. 4c we weight Q_ij^N with the function (1 + 5/r_ij)/2 to improve the φ values at 2 M GdmCl. This model is similar in all respects to the model in Figs. 2–4b except for the details of the transition. The better resolved NMR structure of the Y15W mutant (16) leads to more of an imbalance between H1 and H3 φ values, but the folded region of the landscape then contains two native-like ensembles similar to what is observed in the all-atom simulations (13). Weighting these contacts by inverse Cα distance did not influence the balance between H1 and H3.
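The contact criterion can be sketched as follows. The Atom record and the choice to count every qualifying atom pair (rather than record a binary contact per residue pair) are assumptions of this example; the cutoffs and the |i − j| > 1 rule are those quoted above, and the (1 + 5/r_ij)/2 weighting is not applied.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Atom(NamedTuple):
    residue: int                      # residue index i
    is_hydrogen: bool
    xyz: Tuple[float, float, float]   # coordinates in angstroms


# Cutoff r_c indexed by the number of hydrogens in the pair (0, 1, or 2).
CUTOFFS = (6.0, 5.0, 4.0)


def native_contacts(atoms: List[Atom]) -> Dict[Tuple[int, int], int]:
    """Count atom pairs within r_c for each residue pair with |i - j| > 1."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for m, a in enumerate(atoms):
        for b in atoms[m + 1:]:
            i, j = sorted((a.residue, b.residue))
            if j - i <= 1:
                continue  # skip same-residue and nearest-neighbor pairs
            r_c = CUTOFFS[int(a.is_hydrogen) + int(b.is_hydrogen)]
            if math.dist(a.xyz, b.xyz) <= r_c:
                counts[(i, j)] += 1
    return dict(counts)


# Toy structure: one heavy-heavy pair in range, one heavy-hydrogen pair not.
toy_atoms = [
    Atom(0, False, (0.0, 0.0, 0.0)),
    Atom(1, False, (1.0, 0.0, 0.0)),   # neighbor residue, always excluded
    Atom(2, False, (5.5, 0.0, 0.0)),   # 5.5 A from residue 0: within 6 A
    Atom(2, True, (5.5, 0.0, 0.0)),    # 5.5 A from residue 0: beyond 5 A
]
print(native_contacts(toy_atoms))  # {(0, 2): 1}
```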

ACKNOWLEDGMENTS.

We thank Javier Sancho for helpful comments contributing to this work.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0705707105/DC1.
