Abstract
Previously, we employed a Maxwell counting distance constraint model (McDCM) to describe α-helix formation in polypeptides. Unlike classical helix-coil transition theories, the folding mechanism derives from nonadditivity in conformational entropy caused by rigidification of molecular structure as intramolecular cross-linking interactions form along the backbone. For example, when a hydrogen bond forms within a flexible region, both energy and conformational entropy decrease. However, no conformational entropy is lost when the region is already rigid because atomic motions are not constrained further. Unlike classical zipper models, the same mechanism also describes a coil-to-β-hairpin transition. Special topological features of the helix and hairpin structures allow the McDCM to be solved exactly. Taking full advantage of the fact that Maxwell constraint counting is a mean field approximation applied to the distribution of cross-linking interactions, we present an exact transfer matrix method that does not require any special topological feature. Upon application of the model to proteins, cooperativity within the folding transition is yet again appropriately described. Notwithstanding other contributing factors such as the hydrophobic effect, this simple model identifies a universal mechanism for cooperativity within polypeptide and protein-folding transitions, and it elucidates scaling laws describing hydrogen-bond patterns observed in secondary structure. In particular, the native state should have roughly twice as many constraints as there are degrees of freedom in the coil state to ensure high fidelity in two-state folding cooperativity, which is empirically observed.
Introduction
The past decade has seen an explosion in experimental (1–5) and computational (6,7) studies on protein dynamics and thermodynamic populations for better understanding of functional mechanisms. From these studies, it is clear entropy plays a key mechanistic role for cooperative structural transitions in proteins regarding allosteric events (8). Moreover, entropy can also modulate cooperativity in ligand binding by shifting the thermodynamic stability of a complex through the accessible conformational ensemble (9). It is known that changes in entropy and rigidification of a protein are both modulated by osmolytes and other solvation effects (10). Unfortunately, fundamental relationships among entropy, thermodynamic stability, molecular rigidity, and cooperativity remain poorly understood (11–14). In this report, the thermodynamic nature of folding cooperativity is addressed quantitatively.
Identification of the relevant degrees of freedom (DOF) and constraints within a polypeptide chain is fundamental to a complete understanding of the origin of cooperativity. A completely unfolded chain constrained only by quenched covalent bonding defines a flexible coil state. The number of DOF decreases as the chain folds into a compact structure because new constraints form due to cross-linking hydrogen bonds (H-bonds) and packing. Levinthal's paradox explains why folding cooperativity is necessary for proteins to fold on observed timescales, rather than timescales that would span longer than the age of the universe in its absence (15). Folding cooperativity implies the progress of a chain segment to fold into specific native structure depends on the folding progress of other segments.
Specific microscopic mechanisms are commonly invoked to model the folding process under native state conditions. The transition of a polypeptide from coil to helix is modeled as a coupling between neighboring amino acids that biases the conformation toward the α-helix state through a nucleation/propagation process (16,17). Folding proteins that are soluble in aqueous solution requires a hydrophobic collapse (18,19), and this mechanism is commonly thought to be the primary thermodynamic driving force. However, the hydrophobic effect also applies to the stability of the nonspecific molten globule collapsed state, meaning it cannot be the sole driving force. Folding processes require specific arrangement of atomic interactions. Consequently, additional mechanisms must be invoked to describe the self-organization of structure, such as diffusion-collision (20,21) or a posited hierarchical sequence of events that proceeds from coil to the onset of secondary structure to tertiary structure (22–24).
A fundamental question is whether different microscopic processes for self-organization share an underlying mechanism that serves as the origin of folding cooperativity. We present a simple model suggesting that the origin of folding cooperativity, although manifested as different folding processes that could dramatically affect kinetics (25), is a thermodynamic driving force caused by the link between nonadditivity in conformational entropy and flexibility/rigidity within molecular structure. An all-atom-based distance constraint model (DCM), employed within a mean field approximation, captures the essential features of structural phase transitions observed in proteins and polypeptides.
Based on profound insight of Maxwell (26), mechanical stability of a network with uniformly distributed constraints can be characterized by constraint counting, requiring no detailed knowledge of the network. In modern terms, Maxwell constraint counting is a mean field approximation that treats the density of constraints within a network as uniform, where local constraint density fluctuations are ignored. Applied to proteins, Maxwell counting qualitatively describes the unfolding process as rigidity is lost (27,28).
Although the connection between the rigidity transition and two-state behavior was previously made using an athermal model (28), the Maxwell counting DCM (McDCM) differs fundamentally because the quantitative link between independent constraints and conformational entropy (29) allows a statistical mechanics treatment for the calculation of thermodynamic stability and thermodynamic response functions, such as heat capacity. Recently (27), we demonstrated that the McDCM describes cooperativity within the α-helix/coil transition well, indicating that the Maxwell counting approximation is not too severe to capture essential features of two-state folding.
As atomic interactions form within the molecular structure, constraints are added to a network. Under the Maxwell counting approximation, constraints added anywhere to the network remove DOF, with an associated drop in conformational entropy, until the network becomes rigid. Thereafter, additional constraints are redundant and pay no entropic price. As a result, Fig. 1 shows the expected competition between two free energy basins that emerge at intermediate temperatures corresponding to the folded and unfolded states. An important aspect of Fig. 1 is that it directly relates the microscopic interactions that form within a protein to its thermodynamic properties. In this report, we demonstrate that the McDCM appropriately describes cooperativity within both polypeptide and protein-folding transitions. Moreover, our results uncover scaling laws that maintain the universal applicability of the approach.
Distance Constraint Model
Based on a free energy decomposition scheme combined with constraint theory (29), a distance constraint model (DCM) is a statistical mechanical model that explicitly accounts for nonadditivity in conformational entropy by modeling atomic interactions as a network of distance constraints. Fluctuating interactions are each associated with a component energy and entropy. Although total energy of a given network is the sum of constituent energy components, the conformational entropy is nonadditive (30,31) due to correlated motions.
Additive models neglect correlations in atomic motion that extend throughout the protein and therefore overestimate conformational entropy. In response, the DCM relates nonadditivity of conformational entropy to rigid and flexible regions within molecular structure depending on the numbers and types of interactions that form. For example, cross-linking disulfide bonds and H-bonds critically affect the formation of flexible and rigid regions. Of particular importance are interactions that break and form due to thermodynamic fluctuations. A reduction in conformational entropy does not occur when atomic interactions form within rigid regions because atomic motions are not further restricted. The DCM provides a fast estimate for the total conformational entropy of the macromolecule (described below) without recourse to computationally expensive simulations that sample the available phase space.
Because atomic interactions form and break due to thermal fluctuations, an ensemble of mechanical frameworks must be considered. Each framework, F, is composed of A atoms and C distance constraints. In the simplest case, an energy of formation, ɛk, and maximal entropy cost, Rγk, are assigned to the kth distance constraint (for 1 ≤ k ≤ C). Here, R is the universal gas constant and γk is a dimensionless pure entropy. The energy of a framework, E(F), is obtained by summing over all C energy contributions. The entropy of a framework is given by
$$S(F) = S_o - S_c(F)$$
where So is the reference entropy of a polypeptide chain in the coil state, and Sc(F) is the entropic cost of adding constraints. The conformational entropy that the chain loses is Sc(F) = ln (Ωc), where Ωc is the excluded phase space because of constraints placed on atomic motions. In the case of one constraint per interaction, it is convenient to define ηk(F) = 1 or 0 when the kth constraint is present or not, respectively. Whenever a constraint is present, it can be independent, νk(F) = 1, or redundant, νk(F) = 0. Thus, (0,0), (1,0), and (1,1) are valid combinations for (ηk, νk). Quenched constraints that model covalent bonds are critical to the properties of rigidity, but their energy and entropy contributions are constant across all terms in the partition function, and therefore factor out. Consequently, the free energy
G(F), for framework F relative to the coil state, is of interest, and it is expressed in terms of fluctuating distance constraints only, given by
$$G(F) = \sum_{k} \left[ -\varepsilon_k\,\eta_k(F) + RT\,\gamma_k\,\nu_k(F) \right] \qquad (1)$$
The first term in Eq. 1 is linear over all distance constraints. The second term in Eq. 1 is nonlinear because the kth distance constraint does not reduce conformational entropy when it is placed in a rigid region (i.e., νk(F) = 0). The formation of a rigid region will depend upon all other distance constraints within the network through a nonlocal collective effect. The details regarding size, shape, and location of rigid and flexible regions depend on the number of constraints, C(F), and how they are distributed within framework F.
All possible arrangements of constraints define the ensemble of mechanical frameworks. Independent and redundant constraints are identified within a given framework using graph-rigidity algorithms (32,33). Summing over the entropy contributions from independent distance constraints (i.e., only for the terms with νk(F) = 1) yields an upper bound estimate for S(F). However, in graph-rigidity algorithms, the order that constraints are placed in the network affects which constraint is identified as independent or redundant. As such, many different upper-bound estimates can be obtained. The best estimate that can be made is to determine the lowest possible upper-bound estimate.
The DCM obtains the lowest possible upper-bound estimate for conformational entropy by first sort-ordering the constraints in terms of their maximal entropy cost, γk, from greatest to smallest. Second, in the graph-rigidity algorithm, distance constraints are placed one at a time in the network with the preferential ordering that γk ≥ γk+1. Details can be found in Jacobs et al. (29). Note that although the mathematical process of preferentially placing constraints in the network is required to estimate a thermodynamic quantity, the kinetics of constraints forming is irrelevant. Because conformational entropy is a thermodynamic state function, all that matters is which constraints are present. The complex issue of accounting for nonadditivity in conformational entropy, S(F), is rendered to a sorting process involving all constraints within the network, and summing only over the greatest entropy loss contributions from Ci independent constraints (Ci ≤ C). From constraint theory (34), the number of internal DOF, J, in a system of A atoms in d dimensions with C constraints, of which Cr constraints are redundant, is given by
$$J = dA - \frac{d(d+1)}{2} - C + C_r \qquad (2)$$
Note that Ci = C – Cr. Graph-rigidity algorithms calculate Ci based on recursively adding constraints to a network one at a time (32,33), which conveniently couples to the presorting process. Consequently, S(F) decreases as the number of DOF decreases.
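As a small worked illustration of this constraint counting, the sketch below assumes the standard form J = dA − d(d+1)/2 − (C − Cr), in which the d(d+1)/2 rigid-body motions are removed from the dA coordinates. The function name and the four-atom example are hypothetical and are not taken from the source.

```python
def internal_dof(num_atoms: int, num_constraints: int,
                 num_redundant: int, dim: int = 3) -> int:
    """Internal DOF from constraint counting (assumed form of Eq. 2).

    J = d*A - d(d+1)/2 - (C - Cr), where only the independent
    constraints, Ci = C - Cr, remove degrees of freedom.
    """
    if not 0 <= num_redundant <= num_constraints:
        raise ValueError("need 0 <= Cr <= C")
    rigid_body_motions = dim * (dim + 1) // 2      # 6 in three dimensions
    independent = num_constraints - num_redundant  # Ci = C - Cr
    return dim * num_atoms - rigid_body_motions - independent


# Four atoms in 3D start with 3*4 - 6 = 6 internal DOF.
flexible = internal_dof(4, 3, 0)    # three independent constraints
just_rigid = internal_dof(4, 6, 0)  # six independent constraints
overbraced = internal_dof(4, 7, 1)  # a seventh constraint must be redundant
print(flexible, just_rigid, overbraced)
```

The third call shows the point made in the text: once the network is rigid, an extra constraint is redundant and leaves J unchanged.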
Using standard formulas from statistical mechanics, the probability for framework F is tied to its Boltzmann weight normalized by the partition function, Q, given by
$$P(F) = \frac{e^{-\beta G(F)}}{Q}, \qquad Q = \sum_{F} e^{-\beta G(F)} \qquad (3)$$
where β = 1/RT. Calculating Q is a formidable task because of the astronomical number of graph-rigidity calculations required over all accessible mechanical frameworks. Nevertheless, the most probable state (or competitive states across a first-order transition) dominates the sum. Hence, only a tiny fraction of frameworks contribute appreciably to the partition function and characterize physical observables. For this reason, a template structure is used to define a set of Nf fluctuating interactions that are allowed to break and form independently, creating a subensemble of 2^Nf accessible frameworks. The native structure is appropriate to use as a template for describing protein stability and the folding/unfolding process (25). Using this native state ensemble, efficient computational methods were developed to calculate Q accurately (29,35). With phenomenological parameters, the DCM robustly reproduces excess heat capacity curves for protein folding (36). In addition, myriad applications involving studies of protein stability, flexibility, and their relationships have led to successful comparison with experiments (37–42).
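To make the normalization by Q concrete, the following minimal sketch computes Boltzmann probabilities for a handful of frameworks from their free energies. The three free energy values are hypothetical, the gas constant is entered in kcal/(mol K), and the shift by the minimum free energy is a numerical convenience added here, not part of the model.

```python
import math

R_KCAL = 1.987e-3  # universal gas constant in kcal/(mol K)


def framework_probabilities(free_energies, temperature):
    """Return P(F) = exp(-G(F)/RT) / Q for a list of G(F) in kcal/mol."""
    beta = 1.0 / (R_KCAL * temperature)
    g_min = min(free_energies)
    # Shifting by the minimum leaves the probabilities unchanged and
    # keeps every exponent <= 0, which avoids overflow.
    weights = [math.exp(-beta * (g - g_min)) for g in free_energies]
    q_shifted = sum(weights)
    return [w / q_shifted for w in weights]


# Hypothetical free energies (kcal/mol) of three frameworks relative to the coil.
probs = framework_probabilities([0.0, -1.0, 0.5], temperature=300.0)
print(probs)
```

Because only a few frameworks carry appreciable weight, the same function applied to a long list would return probabilities that are negligible for all but the lowest free energies.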
Despite the demonstrated utility of the DCM, an important question is whether the lowest upper-bound estimate for conformational entropy provides sufficient accuracy to predict protein stability. This concern is warranted because the model requires fitting parameters, similar to how the Lifson and Roig model (16) is applied to the helix-coil transition. However, this concern is mitigated by the robustness of the fitting parameters across a diverse set of systems. Moreover, by comparing to an exact geometry-based rigidity calculation (43), it was shown that the key approximation, the lowest upper-bound estimate for conformational entropy, introduces systematic errors that are compensated by transferable parameters. In short, the DCM captures essential mechanisms of cooperativity, and it is not simply a matter of curve fitting.
Maxwell Counting
In previous works the approach described above was shown to be quantitatively robust in modeling folding cooperativity across a variety of polypeptide and protein length scales, including: the α-helix/coil transition (27,29,44), the β-turn/coil transition (45), and folding in globular protein structures (35,36). However, due to varying scope, these investigations decomposed molecular structure into mechanical frameworks differently, and used different methods to calculate the partition function. For example, a transfer matrix method was developed for the α-helix/coil transition due to its one-dimensional character (29), whereas a hybrid method combining mean-field theory with Monte Carlo sampling was developed for three-dimensional protein structures (35).
The process of characterizing network rigidity is greatly simplified within the McDCM because its mean-field character allows J to be determined by Maxwell counting (26) by replacing Eq. 2 with the formula
$$J = \max\!\left[\, dA - \frac{d(d+1)}{2} - C,\; 0 \,\right] \qquad (4)$$
where d = 3 reflects three-dimensional space. The meaning of Eq. 4 is that all constraints are assumed to be independent until the entire molecular structure is globally rigid, at which point all further constraints are redundant. Consequently, the number of independent DOF within the network is a decreasing linear function in the number of constraints, C, present in the network until the number of internal DOF is zero, as shown in Fig. 1 A. We define the Maxwell level, M, as the minimum value of C that rigidifies the network corresponding to when J = 0 and Cr = 0. The network is flexible when C < M, just rigid at C = M, and overconstrained when C > M. When the network is overconstrained, the number of redundant constraints is given as Cr = C – M. All the constraints that are recursively added one at a time to the network before the Maxwell level is reached are independent, whereas the constraints added thereafter are redundant.
Ignoring local constraint density fluctuations, Maxwell counting approximates the structure as globally flexible or globally rigid, rendering the graph-rigidity calculation into a counting exercise. Combining the preferential rank ordering of entropies, and the Maxwell level as a global criterion for when constraints are independent (flexible state) or redundant (rigid state), the concept of an entropy spectrum defined by accessible {γk}, where the indexing is such that γk ≥ γk+1 for all interactions proves convenient (27). The McDCM is translated into a process that fills entropy levels within a spectrum from bottom to top as constraints are added. With the entropy spectrum and constraint filling procedure, a transfer matrix method to calculate Q for arbitrary molecular geometries becomes possible.
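The counting exercise described above can be sketched in a few lines. The code assumes that the entropy cost of a framework, in units of R, is the sum of the largest γ values among the constraints present, truncated at the Maxwell level. Function names and the numerical γ values are illustrative only.

```python
def maxwell_counts(num_constraints: int, maxwell_level: int):
    """Return (independent, redundant) constraint counts under Maxwell counting."""
    independent = min(num_constraints, maxwell_level)
    return independent, num_constraints - independent


def entropy_cost(gammas_present, maxwell_level: int) -> float:
    """Conformational entropy cost in units of R.

    Constraints are rank ordered so that gamma_k >= gamma_(k+1); only the
    first M of them are independent and pay an entropic price.
    """
    ranked = sorted(gammas_present, reverse=True)
    return sum(ranked[:maxwell_level])


# Three constraints present, Maxwell level M = 2 (hypothetical gammas).
gammas = [0.4, 1.2, 0.9]
print(maxwell_counts(len(gammas), 2))   # one constraint is redundant
print(entropy_cost(gammas, 2))          # only 1.2 and 0.9 are counted
```

The constraint with γ = 0.4 is ranked last, so it is the one treated as redundant and contributes energy but no entropy cost.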
Transfer Matrix Method
The partition function for a molecular system with Nf number of fluctuating interactions can be calculated exactly using a transfer matrix method within the McDCM. To this end, we define the statistical weights, Bk, for independent constraints, and bk, for redundant constraints, based on the Boltzmann factors given by
$$B_k = e^{\beta\varepsilon_k - \gamma_k}, \qquad b_k = e^{\beta\varepsilon_k} \qquad (5)$$
where redundant constraints do not have an entropic penalty. We also introduce the generating function
$$f_{gen} = \prod_{k=1}^{N_f} \left( 1 + B_k \right) \qquad (6)$$
where 1 represents no constraint. For one distance constraint per interaction, fgen expands into 2^Nf terms, each of which is a product of C statistical weights where 0 ≤ C ≤ Nf. For a network with Nf ≤ M, all constraints are independent, and Q = fgen. For an overconstrained network with Nf > M, any term with C > M will contain redundant constraints, causing Q ≠ fgen.
To obtain Q, nonadditivity in entropy is accounted for by replacing Bk → bk for all the constraints added beyond the Maxwell level. Fig. 2 illustrates the procedure for a network with three fluctuating interactions (i.e., Nf = 3), and thus Q will consist of eight terms. The number of Bk → bk substitutions in fgen increases exponentially as a function of Cr. The transfer matrix method facilitates exact summation over all terms in the partition function in ∼(Nf)^2 operations (vs. ∼2^Nf).
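The substitution rule can be checked by direct enumeration for a small network. The sketch below sums all 2^Nf frameworks for Nf = 3 and M = 2 and compares the result with the generating function. It assumes weights of the form B = exp(βɛ − γ) and b = exp(βɛ), with ɛ entered as a positive stabilization energy. That sign convention and the numerical values are assumptions of this example.

```python
import math
from itertools import combinations


def weights(beta_eps, gamma):
    """Return (B, b): weights for an independent and a redundant constraint."""
    return math.exp(beta_eps - gamma), math.exp(beta_eps)


def brute_force_q(params, maxwell_level):
    """Sum over all 2^Nf frameworks; params is a list of (beta*eps, gamma)."""
    ranked = sorted(params, key=lambda p: p[1], reverse=True)
    total = 0.0
    for size in range(len(ranked) + 1):
        # combinations() keeps the rank order, so `position` is the number
        # of constraints already placed when the next one is added.
        for subset in combinations(ranked, size):
            term = 1.0
            for position, (beta_eps, gamma) in enumerate(subset):
                B, b = weights(beta_eps, gamma)
                term *= B if position < maxwell_level else b
            total += term
    return total


def generating_function(params):
    """f_gen = prod_k (1 + B_k), valid when every constraint is independent."""
    f = 1.0
    for beta_eps, gamma in params:
        f *= 1.0 + weights(beta_eps, gamma)[0]
    return f


params = [(1.0, 1.5), (1.0, 1.0), (1.0, 0.5)]  # hypothetical (beta*eps, gamma)
print(generating_function(params), brute_force_q(params, maxwell_level=2))
```

With M = 2 the two results differ only through the single term containing all three constraints, where the lowest-ranked weight B3 is replaced by b3. With M = 3 they coincide.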
Three possible cases can occur when the kth constraint is considered: the constraint is 1) not present, or, when it is present, either 2) independent or 3) redundant. To generate all three cases, a transfer matrix, Tk, is constructed as
$$T_k = \mathbf{1} + W_k \qquad (7)$$
where 1 is the unit matrix. The matrix, Wk, is sparse with nonzero statistical weights that are offset below the diagonal. It is useful to work within a subspace defined by fluctuating constraints of dimension Df. For one distance constraint per interaction, the dimension of the subspace is Df = Nf + 1. In the example shown in Fig. 2, Df = 4, M = 2, and the macrostate vector is generated using the matrices
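$$W_k = \begin{pmatrix} 0 & 0 & 0 & 0 \\ B_k & 0 & 0 & 0 \\ 0 & B_k & 0 & 0 \\ 0 & 0 & b_k & 0 \end{pmatrix} \qquad (8)$$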
where k = 1, 2, 3 labels the constraints with γ1 > γ2 > γ3. The Bk weights defined in Eq. 5 appear along the offset diagonal in Wk until M is reached; thereafter the constraints are redundant with bk weights. Subsequently, the partition function for the example in Fig. 2 is given as
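$$Q = \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} T_3\, T_2\, T_1 \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \qquad (9)$$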
where the multiplication order of the transfer matrix for kth constraint inherits its entropy-rank ordering. When a distance constraint is added to a network with C constraints, the new partition function, Q(C + 1), is related to the current partition function, Q(C), through matrix multiplication that accounts for all possible ways that C can increment by 1.
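The three-constraint example can be reproduced with explicit 4 × 4 matrices. This sketch assumes the matrix layout described above: identity plus a sparse matrix whose first subdiagonal holds Bk below the Maxwell level and bk at or above it. It uses plain Python lists and hypothetical numerical weights.

```python
def w_matrix(B, b, dim, maxwell_level):
    """Sparse W_k: weights on the first subdiagonal only."""
    W = [[0.0] * dim for _ in range(dim)]
    for present in range(dim - 1):  # constraints already in the network
        W[present + 1][present] = B if present < maxwell_level else b
    return W


def t_matrix(B, b, dim, maxwell_level):
    """T_k = 1 + W_k: the constraint is either absent or added."""
    T = w_matrix(B, b, dim, maxwell_level)
    for i in range(dim):
        T[i][i] += 1.0
    return T


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def macrostate_vector(weight_pairs, maxwell_level):
    """weight_pairs: (B_k, b_k) listed in entropy-rank order, largest gamma first."""
    dim = len(weight_pairs) + 1
    state = [1.0] + [0.0] * (dim - 1)  # start with zero constraints present
    for B, b in weight_pairs:          # T_1 acts first, then T_2, ...
        state = mat_vec(t_matrix(B, b, dim, maxwell_level), state)
    return state


pairs = [(0.5, 2.0), (0.7, 3.0), (0.9, 4.0)]  # hypothetical (B_k, b_k)
state = macrostate_vector(pairs, maxwell_level=2)
Q = sum(state)                                # left vector of ones
print(state, Q)
```

Each entry of `state` collects the terms with a given number of constraints, so the last entry is the single overconstrained term B1 B2 b3.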
In general, an interaction may consist of more than one distance constraint. In this case, the transfer matrix represents an interaction rather than a single distance constraint. Then, when the ith interaction forms, associated with ci distance constraints, the transfer matrix, Ti, is employed to relate the partition function Q(C + ci) relative to the current partition function Q(C). For a general molecular structure with Nf fluctuating interactions, the partition function is given as
$$Q = \langle l |\, T_{N_f} \cdots T_2\, T_1 \,| r \rangle \qquad (10)$$
where all entries in the left row vector, 〈l|, are one, and all entries in the right column vector, |r〉, are zero except the first, which is one. The ith interaction has a formation energy, ɛi, and each independent distance constraint contributes the same maximal entropy loss, γi. The product of the Nf transfer matrices Ti is ordered such that γi > γi+1. To fill the appropriate levels of the entropy spectrum by constraint filling, the matrix elements of Wi are given as
$$(W_i)_{k',k} = \delta_{k',\,k+c_i}\; e^{\beta\varepsilon_i}\; e^{-\gamma_i \min\left[\,c_i,\ \max(M-k,\,0)\,\right]} \qquad (11)$$
where δk′,k is the Kronecker delta and the index k = 0, 1, …, Df − 1 is the number of distance constraints, C, already present in the network. Again Wi is sparse, where all matrix elements are zero except those that increase the number of constraints in the system by ci when the interaction forms. If C ≤ M − ci, then all the distance constraints are independent. When the number of constraints within the network is within ci constraints of the Maxwell level, one or more of the added constraints can be redundant. The second exponential term in Eq. 11 accounts for all the possible ways the total number of distance constraints can exceed the Maxwell level. The dimension of the state space (length of the macrostate vector) is given by
$$D_f = 1 + \sum_{i=1}^{N_f} c_i$$
The calculation of Q for a 1000-residue protein (i.e., Df ∼ 10^4) takes just a fraction of a second.
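Because each transfer matrix is the identity plus one sparse offset band, the matrix product can be applied as a vector update costing about Nf × Df operations. The sketch below is one way to do this for interactions that carry several distance constraints, checked against direct enumeration on a tiny system. It assumes each interaction has weight exp(βɛ − γ n), where n is the number of its constraints that fall below the Maxwell level. The parameter values and function names are illustrative.

```python
import math
from itertools import combinations


def interaction_weight(beta_eps, gamma, n_constraints, present, maxwell_level):
    """Weight for adding one interaction when `present` constraints exist."""
    independent = min(n_constraints, max(maxwell_level - present, 0))
    return math.exp(beta_eps - gamma * independent)


def macrostate_vector(interactions, maxwell_level):
    """interactions: list of (beta*eps, gamma, c). Returns Q(C), C = 0..sum(c)."""
    ranked = sorted(interactions, key=lambda x: x[1], reverse=True)
    dim = 1 + sum(c for _, _, c in ranked)
    q = [1.0] + [0.0] * (dim - 1)
    for beta_eps, gamma, c in ranked:
        new_q = q[:]                    # identity part: interaction absent
        for present in range(dim - c):  # offset band: interaction forms
            if q[present]:
                new_q[present + c] += q[present] * interaction_weight(
                    beta_eps, gamma, c, present, maxwell_level)
        q = new_q
    return q


def brute_force_q(interactions, maxwell_level):
    """Direct sum over all 2^Nf frameworks, for validation on small systems."""
    ranked = sorted(interactions, key=lambda x: x[1], reverse=True)
    total = 0.0
    for size in range(len(ranked) + 1):
        for subset in combinations(ranked, size):
            present, term = 0, 1.0
            for beta_eps, gamma, c in subset:
                term *= interaction_weight(beta_eps, gamma, c, present, maxwell_level)
                present += c
            total += term
    return total


# Two H-bond-like interactions (3 constraints) and three torsion-like ones (1 constraint).
toy = [(3.5, 1.15, 3)] * 2 + [(1.2, 0.59, 1)] * 3
q_vec = macrostate_vector(toy, maxwell_level=3)
print(sum(q_vec), brute_force_q(toy, maxwell_level=3))
```

The enumeration visits 32 frameworks here, whereas the vector update touches at most 5 × 10 entries. That gap is what makes large proteins tractable.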
Results and Discussion
Here, the McDCM uses effective parameters to keep it identical to the model employed in Vorov et al. (27) so that transferability of parameters between polypeptides and proteins can be assessed. Namely, 1) native torsions and 2) H-bonds are modeled. In the McDCM, H-bonds include salt bridges. Packing is indirectly modeled through the locking of torsion angles into a native conformation that reduces the available phase space. Native torsions are modeled as one distance constraint with degenerate parameters (ɛnt, γnt). H-bonds are modeled as three distance constraints with degenerate parameters (ɛhb, γhb). The physical regime corresponds to γhb > γnt. These four effective parameters, {ɛnt, γnt, ɛhb, γhb}, represent the smallest number of fitting parameters found to fit excess heat capacity curves robustly. The coupling between these two modes of competing interactions suggests there may be a scaling law that controls their relative numbers in structural motifs as a function of the number of residues.
Scaling laws
The free chain reference state has the maximum number of DOF. Formation of native torsions or H-bonds will lower energy, and reduce conformational entropy as the number of DOF decrease. Assuming all native torsion interactions are present, but no H-bonds are formed, the structure will be marginally rigid. Adding native H-bonds to this state will overconstrain the network. The balance between the number of constraints and DOF determines, respectively, the parameters Df and M. For polypeptides, the number of native (backbone) H-bonds is related to the number of residues, Nr, by Nhb = Nr − 4, and the number of backbone ϕ and ψ torsions is given by Nnt = 2Nr. This gives Df ≈ 5Nr and M ≈ 0.40 Df. For the α-helix geometry, side-chain interactions are not included because they do not substantially participate in cross-linking interactions with limited side-chain packing.
Conversely, side-chain interactions must be considered in proteins due to the native structure's tight atomic packing. From sequence alone, M is readily determined as the number of free dihedral angles assuming no cross-linking, M = Nnt, which now includes side-chain dihedrals. It is found that Nnt ∼ 4.8 Nr for each of the four proteins analyzed here, and generally across globular proteins. This scaling estimate is then used as input to the McDCM. Similarly, the number of H-bonds within the native state scales as Nhb ≈ 1/3 M, where the proportionality constant of 1/3 appears in globular protein structures. These relationships lead to Df ≈ 9 Nr and M ≈ 0.53 Df. That is, the Maxwell level, compared to the total number of possible native distance constraints, is approximately half-filling in both proteins and polypeptides.
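The arithmetic behind these scaling relations can be sketched as follows. The code assumes that the subspace dimension is one plus the total number of fluctuating distance constraints, with one constraint per native torsion and three per H-bond, and that M equals the number of native torsions. The rounding to integers and the function names are choices made for this illustration.

```python
def polypeptide_counts(n_res: int):
    """Alpha-helix counts: (native torsions, backbone H-bonds)."""
    return 2 * n_res, n_res - 4


def protein_counts(n_res: int):
    """Globular-protein estimates from the empirical scaling relations."""
    n_nt = round(4.8 * n_res)  # native torsions, including side chains
    n_hb = round(n_nt / 3)     # native H-bonds, about one third of M
    return n_nt, n_hb


def maxwell_ratio(n_nt: int, n_hb: int, constraints_per_hb: int = 3) -> float:
    """Ratio M / Df with M = Nnt and Df = 1 + Nnt + 3 Nhb (assumed definition)."""
    d_f = 1 + n_nt + constraints_per_hb * n_hb
    return n_nt / d_f


print(maxwell_ratio(*polypeptide_counts(1000)))  # long helix
for n_res in (56, 76, 108, 130):                 # sizes of the four proteins studied
    n_nt, n_hb = protein_counts(n_res)
    print(n_res, n_nt, n_hb, round(maxwell_ratio(n_nt, n_hb), 2))
```

Under these assumptions the helix ratio approaches 0.40 for long chains and the protein ratio comes out near one half, in line with the half-filling argument. The exact protein value depends on how Df is rounded relative to the ≈ 9 Nr estimate quoted in the text.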
McDCM parameters
The McDCM incorporates the scaling relations found above to set the subspace dimension and the Maxwell level. The only information about the protein that determines the transfer matrix is the number of residues, and the specific type of structural motif (i.e., α-helix or globular protein). The McDCM is applied to four single-domain proteins. Model parameters are determined by fitting to differential scanning calorimetry Cp data (46–50) shown in Fig. 3 A. The McDCM reproduces the Cp curves markedly well only when the fitting parameters have physically realistic values, although the parameters are not expected to be transferable. Nevertheless, parameter values are consistent across the four considered proteins (compare to Table 1 and Fig. S1 in the Supporting Material). Including our prior results on four polypeptides that undergo the α-helix to coil transition (27), the three parameters {ɛnt, γnt, γhb} are seen to be qualitatively conserved.
Table 1.
Protein | Nres | Nnt | Nhb | ɛhb | ɛnt | γhb | γnt |
---|---|---|---|---|---|---|---|
Protein G | 56 | 269 | 90 | 2.10 | 0.70 | 1.15 | 0.59 |
Ubiquitin | 76 | 365 | 122 | 2.30 | 0.78 | 1.05 | 0.49 |
Thioredoxin | 108 | 518 | 173 | 2.15 | 0.42 | 1.20 | 0.42 |
Lysozyme | 130 | 624 | 208 | 2.17 | 0.56 | 1.25 | 0.40 |
Percent deviation | | | | 3.9 | 25.8 | 7.3 | 18.1 |
Energy parameters have units of kcal/mol, and entropy parameters are dimensionless. The last row gives the percent deviation in the model parameters.
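The last row of the table can be reproduced from the four fitted values in each column. The sketch below assumes that "percent deviation" means the sample standard deviation divided by the mean. That definition is inferred here because it matches the tabulated numbers, and the dictionary keys are hypothetical labels.

```python
from statistics import mean, stdev

# Fitted parameters for protein G, ubiquitin, thioredoxin, and lysozyme.
TABLE_1 = {
    "eps_hb": [2.10, 2.30, 2.15, 2.17],    # kcal/mol
    "eps_nt": [0.70, 0.78, 0.42, 0.56],    # kcal/mol
    "gamma_hb": [1.15, 1.05, 1.20, 1.25],  # dimensionless
    "gamma_nt": [0.59, 0.49, 0.42, 0.40],  # dimensionless
}


def percent_deviation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


deviations = {name: percent_deviation(v) for name, v in TABLE_1.items()}
for name, value in deviations.items():
    print(f"{name}: {value:.1f}%")
```

On this measure the H-bond energy varies least across the four proteins and the native-torsion energy varies most.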
As expected, the least transferable parameter is ɛhb, because it reflects an overall increase in residue solvent accessibility in a helical polypeptide compared to a protein. This effective H-bond energy simultaneously accounts for the stability gained upon formation of an intramolecular H-bond, ɛi, and the cost of breaking competing interactions with solvent, u. As such, the net H-bond energy is given by ɛhb = ɛi − u. The decrease in ɛhb between polypeptides and proteins is explained by differences within u. In polypeptides and protein positions near the solvent interface, the stability gained from an intramolecular H-bond is similar to the energetic cost of breaking solvent H-bonds (51), meaning ɛi ∼ u. However, within the core, compensating H-bonds to solvent are unlikely to occur. Consequently, the average solvent H-bond energy is larger in protein structures, uprot > upoly, as the double-sided arrow in Fig. S1 shows.
Two-state behavior from nonadditivity
Although the fit to experimental Cp curves parameterize the model (compare to Fig. 3 A), this result is not a demonstration of folding cooperativity. Rather, two-state behavior in the free energy landscape is required to show coexistence of the folded and unfolded states near the melting temperature, Tm. As an exemplar case, the free energy of protein G is shown in Fig. 3 B as a function of number of constraints formed, where G(C) = −RT ln [Q(C)] and Q(C) is an entry in the macrostate vector component as Fig. 2 B depicts. At low temperature, the folded state is stabilized by an increase in the number of constraints by lowering energy. At high temperature, the entropy cost of forming interactions surpasses the energetic loss, leading to an unfolded conformational state. One caveat is that if the cross links are too weak they will not reduce atomic motions appreciably, and folding cooperativity can be substantially reduced or eliminated. Thus, the strength of an atomic interaction is important, where (ɛi, γi) represents (depth, width) of a potential energy well. Due to the narrow range of physically realistic parameter values, the most interesting aspects of folding cooperativity derive from properties of constraint networks.
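A free energy profile of this kind can be sketched by propagating the macrostate vector in log space, which avoids overflow for hundreds of interactions. The example uses the protein G parameters from Table 1 and takes 269 native torsions, 90 H-bonds, and M equal to the number of native torsions. It also assumes weights exp(βɛ − γ n) with positive ɛ and a gas constant in kcal/(mol K). These choices, and the two temperatures, are assumptions of the sketch rather than the published implementation.

```python
import math

R_KCAL = 1.987e-3  # kcal/(mol K)


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def free_energy_profile(interactions, maxwell_level, temperature):
    """interactions: list of (eps, gamma, c). Returns G(C) = -RT ln Q(C)."""
    rt = R_KCAL * temperature
    ranked = sorted(interactions, key=lambda x: x[1], reverse=True)
    dim = 1 + sum(c for _, _, c in ranked)
    log_q = [0.0] + [-math.inf] * (dim - 1)
    for eps, gamma, c in ranked:
        new_log_q = log_q[:]  # interaction absent
        for present in range(dim - c):
            if log_q[present] == -math.inf:
                continue
            independent = min(c, max(maxwell_level - present, 0))
            log_w = eps / rt - gamma * independent
            new_log_q[present + c] = log_add(new_log_q[present + c],
                                             log_q[present] + log_w)
        log_q = new_log_q
    return [math.inf if lq == -math.inf else -rt * lq for lq in log_q]


def most_probable_c(profile):
    """Number of constraints at the global free energy minimum."""
    return min(range(len(profile)), key=profile.__getitem__)


# Protein G: 90 H-bonds (3 constraints each) and 269 native torsions (1 each).
protein_g = [(2.10, 1.15, 3)] * 90 + [(0.70, 0.59, 1)] * 269
M = 269
cold = most_probable_c(free_energy_profile(protein_g, M, temperature=100.0))
hot = most_probable_c(free_energy_profile(protein_g, M, temperature=1000.0))
print(cold, hot, M)
```

At the low temperature the minimum lies above the Maxwell level (overconstrained, folded-like), and at the high temperature it lies below it (flexible, unfolded-like). Scanning intermediate temperatures with the same function is how one would look for coexisting basins.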
Nonadditivity in conformational entropy occurs because of two options in constraint topology. Either the network is globally flexible with all independent constraints, or it is globally rigid when the number of constraints exceeds the Maxwell level. As previously shown, the McDCM cannot exhibit two-state behavior when there is an insufficient number of cross links for a rigidity transition to take place (27). The empirical scaling laws described above ensure a sufficient number of constraints to facilitate a rigidity transition from a globally floppy to rigid network. In particular, the ratio M/Df ≈ 1/2 indicates that proteins and polypeptides are well balanced in the number of native distance constraints for cooperative behavior. As a mathematical result, assuming M is fixed, with Nnt and Nhb as independent variables, the transfer matrix yields maximum cooperativity at M/Df = 1/2, and as shown previously for the helix-coil transition (27), folding cooperativity is lost under the extreme conditions where M/Df → 0 or M/Df → 1. Therefore, the degree of nonadditivity is linked to the relationship between M and Nf. However, M and Nf cannot be independently varied, because the number of fluctuating constraints, Nf, depends on the size of the polypeptide or protein. Recall that
$$D_f = 1 + N_{nt} + 3N_{hb}$$
where Nnt is fixed based on the size of the protein, and M ≡ Nnt. Adjusting M/Df can be achieved by changing the number of native H-bonds. As a native structure supports more H-bonds, the ratio will decrease. However, even in the limit that M/Df → 0, not all constraints will be redundant, because at least M constraints will always be independent.
Interestingly, M/Df ≈ 0.40 for the α-helix and M/Df ≈ 0.53 for globular proteins are markedly close to the value of 0.5 for maximum cooperativity. As M/Df deviates from empirically observed values, the McDCM cannot reproduce Cp with physically realistic parameters, if at all. These results suggest that H-bond patterns within proteins and polypeptides have evolved to favor a high degree of cooperativity through structural characteristics that support an optimal balance between constraints and DOF.
For the same size protein (fixed M), it is interesting to consider what happens when the number of H-bonds in the native structure deviates from the observed scaling law. As discussed above, the ratio M/Df will depend on the number of native H-bonds. Using the fixed parameters in Table 1, Fig. 4 demonstrates that the free energy barrier between the native and unfolded basins decreases as M/Df increases (lower number of H-bonds), and eventually the two basins gradually merge together until no transition occurs. Of course, the Cp curves do not match experiment when we deviate from the empirically determined ratio. Conversely, as more native H-bonds are added (lower M/Df), the free energy barrier increases as the two basins are separated further. At some point, an intermediate state forms (compare to insets in Fig. 4), where multiphasic behavior can appear by reducing the ratio by as little as 15%. However, upon initial onset of the intermediate state, the barrier flattens and the transition quantitatively responds as a two-state folding process. This result is consistent with the observation that single-domain globular proteins are typically two-state folders.
There are other important features in the free energy landscapes. Unfolded proteins are not predicted to be random coils, meaning C ≠ 0 within the unfolded basin. For example, protein G in the unfolded state at Tm has ∼220 fewer DOF than the reference coil state. Similarly, proteins are predicted to have many native interactions broken in the folded state. Both ΔG and ΔG‡ values that occur at intermediate temperatures are appropriately small compared to the overall scale. These results underscore the importance of fluctuations in the number of constraints, and suggest that nonadditivity in conformational entropy upon molecular rigidification is a universal mechanism affecting folding cooperativity. Although solvation effects contribute to the folding process (i.e., hydrophobic collapse), hydrophobic collapse is a mechanism that does not depend on the formation of specific intramolecular interactions characteristic of atomic structure. Conversely, molecular rigidification explains two-state behavior in both polypeptide and protein systems related to the dramatic loss in conformational entropy.
The McDCM in perspective
The McDCM captures the essential physics of folding cooperativity and, owing to its simplicity, takes less than a second of CPU time for a 1000-residue protein. Improvements can be made by using energy/entropy parameters that depend on local environments in the template structure, using more than one template structure, and/or generalizing the transfer matrix to couple constraint density to solvent exposure. In related published works (35–42), the more accurate minimal DCM (mDCM), which employs graph-rigidity algorithms, has been used extensively. Before closing, results from the McDCM are compared to those from the mDCM, and the need for an extended DCM is discussed.
The most severe approximation in the McDCM is that distance constraints are uniformly distributed within molecular structure. For this discussion, assume favorable interactions readily lower energy and conformational entropy. At low constraint density, the drop in energy from favorable interactions cannot overcome the concomitant drop in conformational entropy. At high constraint density, a large decrease in energy occurs without a concomitant drop in conformational entropy because the added constraints are redundant. This means that nucleation of a rigid structure occurs when there is a sufficient number of redundant constraints. Separation of regions of low and high constraint density promotes folding cooperativity as an emergent property caused by enthalpy-entropy compensation among molecular interactions that are coupled through network rigidity. In proteins, secondary structure is rich in constraints, whereas loops have lower than average constraint density. The same enthalpy-entropy compensation mechanism applies in the heterogeneously distributed regions that are flexible or rigid, leading to altered properties of the transition state (36). What role do variation in atomic-interaction effectiveness and fluctuations in spatial constraint density play?
The McDCM predicts that the rigidity transition defined by the Maxwell level, M, coincides with the thermodynamic transition state, and that GTS = −RT ln [Q(M)] is at the cusp that separates two free energy wells, as shown in Fig. 3 B for protein. Rader et al. (28) suggested that globular proteins exhibit this coincidence. The McDCM reveals that this coincidence occurs whenever there is nearly perfect two-state folding cooperativity. More generally, the rigidity transition and transition state do not coincide, as we reported previously (35,36,40), because the size, shape, and locations of rigid clusters fluctuate within a protein as atomic interactions break and form due to thermal fluctuations. Fluctuations in rigid cluster size reach a maximum at the rigidity transition, similar to the peak in heat capacity that is used to define the melting temperature. The mDCM shows that heterogeneity in the effectiveness of atomic interactions and spatial variation in rigid and flexible regions affect the height and location of the transition state barrier. In the mDCM, the rigidity transition can take place before or after the transition state as more distance constraints are added. Interestingly, the folded basin can occur before the rigidity transition (40), indicating that the mDCM allows a protein in its native state to have a flexible folding core. This latter effect has been observed experimentally (12).
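A simplified worked example may help show why a cusp can appear at the Maxwell level. It is an illustration only and not the model solved in this work. Apart from M, the symbols are assumptions introduced here: N is the maximum number of native constraints, ε < 0 is an energy per constraint, δ > 0 is an entropy lost per independent constraint, and all constraints are treated as identical. With n constraints present and Maxwell counting, only the first M constraints reduce conformational entropy:

```latex
\begin{align}
\frac{G(n)}{RT} &= \frac{n\,\varepsilon}{RT} + \frac{\delta}{R}\,\min(n, M) - \ln\binom{N}{n}, \\[4pt]
\frac{d}{dn}\,\frac{G(n)}{RT} &\approx
\begin{cases}
\dfrac{\varepsilon}{RT} + \dfrac{\delta}{R} - \ln\dfrac{N-n}{n}, & n < M, \\[10pt]
\dfrac{\varepsilon}{RT} - \ln\dfrac{N-n}{n}, & n > M.
\end{cases}
\end{align}
% The slope drops discontinuously by \delta/R at n = M, which produces a cusp.
% The cusp is a local maximum separating two wells when the slope is
% positive just below M and negative just above M:
\begin{equation}
\frac{\varepsilon}{RT} \;<\; \ln\frac{N-M}{M} \;<\; \frac{\varepsilon}{RT} + \frac{\delta}{R}.
\end{equation}
```

Under these assumptions the logarithmic term vanishes when N = 2M, and the condition reduces to ε < 0 < ε + Tδ: each constraint must lower the energy, yet be entropically unfavorable for as long as it is independent.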
A drawback of the McDCM is that detailed information about the rigid cluster decomposition and identification of regions of correlated motion is lost due to the mean field approximation. Nevertheless, it demonstrates that a critical aspect of modeling folding cooperativity is to account for nonadditivity in conformational entropy that links to flexibility and rigidity properties within molecular structure. Whereas the hydrophobic effect (52) in aqueous solution thermodynamically drives globular compaction, rigidification of molecular structure is a universal mechanism that thermodynamically drives self-organization in the folding process. Consequently, modeling only solvation effects yields an incomplete picture of folding cooperativity because it does not discriminate between specific and nonspecific (molten globule) collapse (53).
Due to elective simplifications, the McDCM and mDCM are also incomplete models because they implicitly model solvation effects using nontransferable phenomenological parameters. Although the McDCM is an oversimplified version of the mDCM, it is important because it highlights an essential mechanical mechanism generally ignored in free energy decomposition schemes. Going forward, network rigidity will be coupled with solvation effects using a more sophisticated DCM that explicitly accounts for solvation, pH, hydrophobic effects, and rigidity (unpublished).
Conclusions
A transfer matrix method was developed to calculate the partition function of a DCM within a mean field approximation where molecular structure is described by uniform flexibility or rigidity characteristics. This approximation replaces the graph-rigidity algorithm by Maxwell constraint counting. Based on this simplification, an entropy spectrum is defined by the native state, where entropy levels are occupied or unoccupied because distance constraints representing native interactions are present or not, respectively. This Maxwell counting DCM, referred to as McDCM, underscores the link between nonadditivity in conformational entropy and flexibility and rigidity in molecular structure. Moreover, this effect is unique in its ability to appropriately model cooperativity across length and complexity scales, as illustrated by the helix-coil and protein folding transitions, thus suggesting an important physical significance. For example, scaling laws are found that connect H-bond patterns observed in secondary structure to folding cooperativity. A ratio near 1:2 for the DOF in a coil state to maximum distance constraints in the native state yields high-fidelity two-state folding cooperativity. We suggest that maintaining this balance is an evolutionary constraint that leads to the ubiquity of cooperativity.
Acknowledgments
We thank Dr. Hui Wang for insightful comments regarding this manuscript.
This work is supported by National Institutes of Health grant No. R01 GM070382.
Contributor Information
Dennis R. Livesay, Email: drlivesa@uncc.edu.
Donald J. Jacobs, Email: djacobs1@uncc.edu.
References
- 1. Bartlett A.I., Radford S.E. An expanding arsenal of experimental methods yields an explosion of insights into protein folding mechanisms. Nat. Struct. Mol. Biol. 2009;16:582–588. doi: 10.1038/nsmb.1592.
- 2. Baldwin A.J., Kay L.E. NMR spectroscopy brings invisible protein states into focus. Nat. Chem. Biol. 2009;5:808–814. doi: 10.1038/nchembio.238.
- 3. Grzesiek S., Sass H.J. From biomolecular structure to functional understanding: new NMR developments narrow the gap. Curr. Opin. Struct. Biol. 2009;19:585–595. doi: 10.1016/j.sbi.2009.07.015.
- 4. Schuler B., Eaton W.A. Protein folding studied by single-molecule FRET. Curr. Opin. Struct. Biol. 2008;18:16–26. doi: 10.1016/j.sbi.2007.12.003.
- 5. Neylon C. Small angle neutron and x-ray scattering in structural biology: recent examples from the literature. Eur. Biophys. J. 2008;37:531–541. doi: 10.1007/s00249-008-0259-2.
- 6. Klepeis J.L., Lindorff-Larsen K., Shaw D.E. Long-timescale molecular dynamics simulations of protein structure and function. Curr. Opin. Struct. Biol. 2009;19:120–127. doi: 10.1016/j.sbi.2009.03.004.
- 7. Sherwood P., Brooks B.R., Sansom M.S. Multiscale methods for macromolecular simulations. Curr. Opin. Struct. Biol. 2008;18:630–640. doi: 10.1016/j.sbi.2008.07.003.
- 8. Itoh K., Sasai M. Entropic mechanism of large fluctuation in allosteric transition. Proc. Natl. Acad. Sci. USA. 2010;107:7775–7780. doi: 10.1073/pnas.0912978107.
- 9. Kale S., Jordan F. Conformational ensemble modulates cooperativity in the rate-determining catalytic step in the E1 component of the Escherichia coli pyruvate dehydrogenase multienzyme complex. J. Biol. Chem. 2009;284:33122–33129. doi: 10.1074/jbc.M109.065508.
- 10. Pais T.M., Lamosa P., Santos H. Relationship between protein stabilization and protein rigidification induced by mannosylglycerate. J. Mol. Biol. 2009;394:237–250. doi: 10.1016/j.jmb.2009.09.012.
- 11. Kamerzell T.J., Middaugh C.R. The complex inter-relationships between protein flexibility and stability. J. Pharm. Sci. 2008;97:3494–3517. doi: 10.1002/jps.21269.
- 12. LeMaster D.M., Tang J., Hernández G. Enhanced thermal stability achieved without increased conformational rigidity at physiological temperatures: spatial propagation of differential flexibility in rubredoxin hybrids. Proteins. 2005;61:608–616. doi: 10.1002/prot.20594.
- 13. Liu F., Maynard C., Gruebele M. A natural missing link between activated and downhill protein folding scenarios. Phys. Chem. Chem. Phys. 2010;12:3542–3549. doi: 10.1039/b925033f.
- 14. Whitty A. Cooperativity and biological complexity. Nat. Chem. Biol. 2008;4:435–439. doi: 10.1038/nchembio0808-435.
- 15. Levinthal C. Are there pathways to protein folding? J. Chim. Phys. 1968;65:44–45.
- 16. Lifson S., Roig A. On the theory of helix-coil transitions in polypeptides. J. Chem. Phys. 1961;34:1963–1974.
- 17. Zimm B.H., Bragg J.K. Theory of the phase transition between helix and random coil in polypeptide chains. J. Chem. Phys. 1959;31:526–535.
- 18. Miranker A.D., Dobson C.M. Collapse and cooperativity in protein folding. Curr. Opin. Struct. Biol. 1996;6:31–42. doi: 10.1016/s0959-440x(96)80092-3.
- 19. Sadqi M., Lapidus L.J., Muñoz V. How fast is protein hydrophobic collapse? Proc. Natl. Acad. Sci. USA. 2003;100:12117–12122. doi: 10.1073/pnas.2033863100.
- 20. Karplus M., Weaver D.L. Protein folding dynamics: the diffusion-collision model and experimental data. Protein Sci. 1994;3:650–668. doi: 10.1002/pro.5560030413.
- 21. Weikl T.R., Palassini M., Dill K.A. Cooperativity in two-state protein folding kinetics. Protein Sci. 2004;13:822–829. doi: 10.1110/ps.03403604.
- 22. Daggett V., Fersht A.R. Is there a unifying mechanism for protein folding? Trends Biochem. Sci. 2003;28:18–25. doi: 10.1016/s0968-0004(02)00012-9.
- 23. Dill K.A., Ozkan S.B., Voelz V.A. The protein folding problem: when will it be solved? Curr. Opin. Struct. Biol. 2007;17:342–346. doi: 10.1016/j.sbi.2007.06.001.
- 24. Uzawa T., Nishimura C., Wright P.E. Hierarchical folding mechanism of apomyoglobin revealed by ultra-fast H/D exchange coupled with 2D NMR. Proc. Natl. Acad. Sci. USA. 2008;105:13859–13864. doi: 10.1073/pnas.0804033105.
- 25. Portman J.J. Cooperativity and protein folding rates. Curr. Opin. Struct. Biol. 2010;20:11–15. doi: 10.1016/j.sbi.2009.12.013.
- 26. Maxwell J.C. On the calculation of the equilibrium and stiffness of frames. Philos. Mag. 1864;27:294–299.
- 27. Vorov O.K., Livesay D.R., Jacobs D.J. Helix/coil nucleation: a local response to global demands. Biophys. J. 2009;97:3000–3009. doi: 10.1016/j.bpj.2009.09.013.
- 28. Rader A.J., Hespenheide B.M., Thorpe M.F. Protein unfolding: rigidity lost. Proc. Natl. Acad. Sci. USA. 2002;99:3540–3545. doi: 10.1073/pnas.062492699.
- 29. Jacobs D.J., Dallakyan S., Heckathorne A. Network rigidity at finite temperature: relationships between thermodynamic stability, the nonadditivity of entropy, and cooperativity in molecular systems. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2003;68:061109. doi: 10.1103/PhysRevE.68.061109.
- 30. Dill K.A. Additivity principles in biochemistry. J. Biol. Chem. 1997;272:701–704. doi: 10.1074/jbc.272.2.701.
- 31. Mark A.E., van Gunsteren W.F. Decomposition of the free energy of a system in terms of specific interactions. Implications for theoretical and experimental studies. J. Mol. Biol. 1994;240:167–176. doi: 10.1006/jmbi.1994.1430.
- 32. Jacobs D.J., Rader A.J., Thorpe M.F. Protein flexibility predictions using graph theory. Proteins. 2001;44:150–165. doi: 10.1002/prot.1081.
- 33. Jacobs D.J., Thorpe M.F. Generic rigidity percolation: the pebble game. Phys. Rev. Lett. 1995;75:4051–4054. doi: 10.1103/PhysRevLett.75.4051.
- 34. Thorpe M.F., Duxbury P.M. Kluwer Academic/Plenum Publishers; New York: 1999. Rigidity Theory and Applications.
- 35. Jacobs D.J., Dallakyan S. Elucidating protein thermodynamics from the three-dimensional structure of the native state using network rigidity. Biophys. J. 2005;88:903–915. doi: 10.1529/biophysj.104.048496.
- 36. Livesay D.R., Dallakyan S., Jacobs D.J. A flexible approach for understanding protein stability. FEBS Lett. 2004;576:468–476. doi: 10.1016/j.febslet.2004.09.057.
- 37. Livesay D.R., Jacobs D.J. Conserved quantitative stability/flexibility relationships (QSFR) in an orthologous RNase H pair. Proteins. 2006;62:130–143. doi: 10.1002/prot.20745.
- 38. Jacobs D.J., Livesay D.R., Tasayco M.L. Elucidating quantitative stability/flexibility relationships within thioredoxin and its fragments using a distance constraint model. J. Mol. Biol. 2006;358:882–904. doi: 10.1016/j.jmb.2006.02.015.
- 39. Livesay D.R., Huynh D.H., Jacobs D.J. Hydrogen bond networks determine emergent mechanical and thermodynamic properties across a protein family. Chem. Cent. J. 2008;2:17. doi: 10.1186/1752-153X-2-17.
- 40. Mottonen J.M., Xu M., Livesay D.R. Unifying mechanical and thermodynamic descriptions across the thioredoxin protein family. Proteins. 2009;75:610–627. doi: 10.1002/prot.22273.
- 41. Mottonen J.M., Jacobs D.J., Livesay D.R. Allosteric response is both conserved and variable across three CheY orthologs. Biophys. J. 2010;99:2245–2254. doi: 10.1016/j.bpj.2010.07.043.
- 42. Verma D., Jacobs D.J., Livesay D.R. Predicting the melting point of human C-type lysozyme mutants. Curr. Protein Pept. Sci. 2010;11:562–572. doi: 10.2174/138920310794109210.
- 43. Vorov O.K., Livesay D.R., Jacobs D.J. Conformational entropy of an ideal cross-linking polymer chain. Entropy (Basel) 2008;10:285–308. doi: 10.3390/e10030285.
- 44. Jacobs D.J., Wood G.G. Understanding the α-helix to coil transition in polypeptides using network rigidity: predicting heat and cold denaturation in mixed solvent conditions. Biopolymers. 2004;75:1–31. doi: 10.1002/bip.20102.
- 45. Jacobs D.J., Fairchild M.J. Progress in Biopolymer Research. Nova Science Publishers; Hauppauge, NY: 2007. Thermodynamics of a β-hairpin to coil transition: application of free energy decomposition and constraint theory; pp. 45–76.
- 46. Richardson J.M., Makhatadze G.I. Temperature dependence of the thermodynamics of helix-coil transition. J. Mol. Biol. 2004;335:1029–1037. doi: 10.1016/j.jmb.2003.11.027.
- 47. Takano K., Ogasahara K., Yutani K. Contribution of hydrophobic residues to the stability of human lysozyme: calorimetric studies and x-ray structural analysis of the five isoleucine to valine mutants. J. Mol. Biol. 1995;254:62–76. doi: 10.1006/jmbi.1995.0599.
- 48. Honda S., Kobayashi N., Uedaira H. Fragment reconstitution of a small protein: folding energetics of the reconstituted immunoglobulin binding domain B1 of streptococcal protein G. Biochemistry. 1999;38:1203–1213. doi: 10.1021/bi982271g.
- 49. Georgescu R.E., Garcia-Mira M.M., Sanchez-Ruiz J.M. Heat capacity analysis of oxidized Escherichia coli thioredoxin fragments (1–73, 74–108) and their noncovalent complex. Evidence for the burial of apolar surface in protein unfolded states. Eur. J. Biochem. 2001;268:1477–1485. doi: 10.1046/j.1432-1327.2001.02014.x.
- 50. Wintrode P.L., Makhatadze G.I., Privalov P.L. Thermodynamics of ubiquitin unfolding. Proteins. 1994;18:246–253. doi: 10.1002/prot.340180305.
- 51. Efimov A.V., Brazhnikov E.V. Relationship between intramolecular hydrogen bonding and solvent accessibility of side-chain donors and acceptors in proteins. FEBS Lett. 2003;554:389–393. doi: 10.1016/s0014-5793(03)01189-x.
- 52. Chan H.S., Bromberg S., Dill K.A. Models of cooperativity in protein folding. Philos. Trans. R. Soc. Lond. B Biol. Sci. 1995;348:61–70. doi: 10.1098/rstb.1995.0046.
- 53. Bryngelson J.D., Onuchic J.N., Wolynes P.G. Funnels, pathways, and the energy landscape of protein folding: a synthesis. Proteins. 1995;21:167–195. doi: 10.1002/prot.340210302.