Author manuscript; available in PMC: 2012 Nov 2.
Published in final edited form as: J Am Chem Soc. 2011 Oct 6;133(43):17463–17472. doi: 10.1021/ja207506z

Prediction of Native State Hydrogen Exchange from Perfectly Funneled Energy Landscapes

Patricio O Craig, Joachim Lätzer, Patrick Weinkam, Ryan MB Hoffman, Diego U Ferreiro, Elizabeth A Komives, Peter G Wolynes
PMCID: PMC3203634  NIHMSID: NIHMS325921  PMID: 21913704

Abstract

Simulations based on perfectly funneled energy landscapes often capture many of the kinetic features of protein folding. We examined whether simulations based on funneled energy functions can also describe fluctuations in native state protein ensembles. We quantitatively compared the site-specific local stability determined from structure-based folding simulations with hydrogen exchange (HX) protection factors measured experimentally for ubiquitin, CI2, and Staphylococcal nuclease. We evaluated different structural definitions of the open and closed states, based on the number of native contacts for each residue, the hydrogen bonding state, or a combination of both criteria. The predicted exchange patterns agree with the experiments under native conditions, indicating that protein topology indeed has a dominant effect on the exchange kinetics. Insights into the simplest mechanistic interpretation of the amide exchange process were thus obtained.

Keywords: Structure-based models, Hydrogen exchange, Energy landscapes, Protein dynamics, Folding

Introduction

The energy landscape theory of protein folding and the minimum frustration principle establish a powerful framework for understanding the dynamic behavior of macromolecules and the processes in which they are involved1,2. Simulations based on perfectly funneled landscapes using coarse grained models (often referred to as Gō models) have been able to capture many of the fundamental aspects of the folding process3-6. The structures of transition state ensembles 7-9, folding intermediates 7,10, and the mechanisms of dimerization 11,12 and domain swapping 13 have all been well predicted by models where frustration has been entirely removed and topological information about the native state is the sole input. Although these results validate the global accuracy of funneled landscapes, it remains unclear whether such simple models can capture the finer details of dynamic motions in proteins. One test of such a simple landscape description is provided by the quantitative comparison between simulation results and detailed experimental parameters measured at the single residue level. Phi value analysis and hydrogen exchange measurements are some of the experimental data that provide this type of information at different depths on the folding funnel.

Hydrogen/deuterium exchange of backbone amides (HX) is well known to depend on both the structural and dynamic properties of proteins. Amide protons of the polypeptide backbone that are involved in hydrogen-bonds and/or are sequestered within the protein structure have solvent exchange rates that are orders of magnitude slower than exchange rates in an unfolded polypeptide14-16. By analyzing the HX rate of individual amide protons, detailed structural and dynamic information can be obtained for many locations along the polypeptide chain. In the local unfolding model of HX, the amides alternate between closed (exchange incompetent) and open (exchange competent) states depending on fluctuations of their local environment 17-19.

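$$[\mathrm{Closed}_H] \underset{k_{cl}}{\overset{k_{op}}{\rightleftharpoons}} [\mathrm{Open}_H] \xrightarrow{k_{ch}} [\mathrm{Open}_D] \qquad (1)$$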

According to this model, under steady state conditions (kcl ≫ kop and/or kcl ≫ kch) the experimentally observable H/D exchange rate (kex) is expressed as a function of the opening and closing rates (kop and kcl, respectively) and the intrinsic rate of exchange (kch) of the amide when it is unprotected in an ideal, standard unfolded and thus solvent exposed state20:

$$k_{ex} = \frac{k_{op}\, k_{ch}}{k_{op} + k_{cl} + k_{ch}} \qquad (2)$$

The protection factor (Pf) is the ratio kch/kex and is a measurement of the decrease in the exchange rate of a residue in the protein structure compared to its intrinsic rate in an unfolded state. By rearrangement of Eq. 2 we obtain:

$$P_f = 1 + \frac{k_{ch} + k_{cl}}{k_{op}} \qquad (3)$$

In the EX1 limit (high pH, high temperature, or low stability) kcl ≪ kch, and Eq. 3 reduces to:

$$P_f = 1 + \frac{k_{ch}}{k_{op}} \qquad (4)$$

While in the EX2 limit (low pH and temperature) kcl ≫ kch and then:

$$P_f = 1 + \frac{k_{cl}}{k_{op}} \qquad (5)$$

In particular, when stable conformations are analyzed and kop ≪ kcl, Eq. 4 and 5 are further reduced to Eq. 6 and 7 for the EX1 and EX2 limits, respectively:

$$k_{ex} = k_{op} \qquad (6)$$
$$P_f = \frac{k_{cl}}{k_{op}} = \frac{1}{K_{op}} \qquad (7)$$

Assuming an EX2 mechanism and a stable conformation, protection factors directly scale with the inverse of the equilibrium constant of the unfolding transition required for exchange to occur 15,21, providing therefore information about the free energy change associated with this process.

$$\Delta G_{HX} = -k_B T \ln K_{op} = k_B T \ln P_f$$
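The relations in Eqs. 2, 3, 5 and 7 can be checked numerically. The following is a minimal sketch, not part of the original analysis; the rate constants are hypothetical values chosen only so that the system sits in the EX2 regime with a stable closed state (kop ≪ kch ≪ kcl).

```python
import math

def exchange_rate(k_op, k_cl, k_ch):
    """Observed exchange rate k_ex from Eq. 2."""
    return k_op * k_ch / (k_op + k_cl + k_ch)

def protection_factor(k_op, k_cl, k_ch):
    """Protection factor Pf = k_ch / k_ex, written as in Eq. 3."""
    return 1.0 + (k_ch + k_cl) / k_op

def delta_g_hx(pf):
    """Free energy of opening in units of kBT (EX2, stable closed state)."""
    return math.log(pf)

# Hypothetical rates in s^-1 (illustrative assumptions, not measured values)
k_op, k_cl, k_ch = 1.0, 1.0e4, 10.0

pf_full = protection_factor(k_op, k_cl, k_ch)  # general expression, Eq. 3
pf_ex2 = k_cl / k_op                           # limiting form, Eq. 7
print(pf_full, pf_ex2, delta_g_hx(pf_full))
```

With these rates the general expression and the EX2 limit differ by about 0.1%, which illustrates why ln Pf can be read directly as a free energy of opening in units of kBT.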

The structural fluctuations involved in the opening reaction range in size from local through subglobal to global unfolding transitions. Protection factors less than those predicted from the global stability arise from partial unfolding or local fluctuations of the backbone 15,21. In principle, all possible states are accessible under any experimental condition, but their relative populations, given by the Boltzmann distribution, depend on the free energy of each state.

In the past, there have been attempts to correlate hydrogen exchange protection factors with the fluctuations observed in molecular dynamics simulations22. However, most of these analyses used fully atomistic models, which are limited to the analysis of small fluctuations around the native state. Such analyses yield little or no information about the large scale unfolding transitions to partially or totally unfolded states associated with the exchange process, which prevents a thorough evaluation of the structural features of the open states. Coarse grained structure-based models have an advantage over all-atom molecular mechanics simulations because they allow intensive sampling of the full range of conformational fluctuations.

In this work we compare the conformational dynamics of protein structures simulated using perfectly funneled structure-based models with the HX protection factors of the backbone amide hydrogens measured under native conditions. The comparison gives insight into the structure of the open (exchange competent) state at the residue level in whole proteins.

Results and Discussion

Analysis of the protection factor from simulations

With the aim of obtaining a detailed interpretation of the backbone amide HX process measured under native conditions, we carried out simulations of a set of three proteins for which NMR-based HX data are available for both fast and slowly exchanging amides: human ubiquitin, chymotrypsin inhibitor 2 (CI2), and Staphylococcal nuclease (SN). The simulations used a coarse grained structure-based model with a perfectly funneled landscape based on a homogeneous contact potential and a moderate amount of nonadditivity (for more details see Methods). We first focused on ubiquitin to develop the analysis. Ubiquitin is a small cytoplasmic protein (76 residues, MW: 8565 Da) that has been extensively studied by HX under native conditions, both by D2O dilution experiments followed by HSQC at pH 3.5 (ref. 23), 5.8 (ref. 24), and 6.2 (ref. 25), which measure rates of the slowly exchanging amide protons, and by phase-modulated clean chemical exchange (CLEANEX-PM) in the 5.5-10.0 pH range (ref. 26), which measures rates of the rapidly exchanging amides. These data were combined as described in the Methods section.

An initial short simulation was carried out to determine the folding temperature (Tf) at which the free energies of the folded and unfolded state basins are equal. Then a longer simulation with umbrella sampling at the Tf (0.90 in reduced temperature units) was used to obtain a good sampling of the configurational space. The weighted histogram analysis method (WHAM)27 was used to obtain the proper Boltzmann distribution and free energy of all the sampled states as a function of temperature. Figure 1 shows the temperature dependence of the free energy of ubiquitin as a function of the global order parameter Qw, which describes the foldedness of the protein structure and ranges between 0 for the totally unfolded state and 1 for the reference model structure. For an accurate comparison of the simulations and the experimental data, the simulations were extrapolated to a temperature somewhat below the folding temperature, at which the simulated global stability matched the experimental one. Based on a measured stability of 11-12 kBT reported for ubiquitin in conditions similar to those used in the HX measurements (11 kBT at pH 5, ref. 28, and 12 kBT at pH 7, ref. 29), we extrapolated the simulations to the reduced temperature 0.868, at which the difference in free energy between the native and unfolded state basins best matches the experimental conditions.
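The idea of extrapolating a sampled ensemble to a nearby temperature can be illustrated with a simple reweighting sketch. This is not WHAM and ignores the umbrella bias: it assumes unbiased samples drawn at a single temperature, with energies in units of ε and temperatures in reduced units. The function name and the data layout are hypothetical.

```python
import math
from collections import defaultdict

def reweighted_profile(q_values, energies, t_sim, t_target, n_bins=10):
    """Free energy profile F(Q)/kBT at t_target from samples drawn at t_sim.

    Each sample is reweighted by exp(-(1/t_target - 1/t_sim) * E) and the
    weights are accumulated in bins of Q (Q is assumed to lie in [0, 1]).
    """
    d_beta = 1.0 / t_target - 1.0 / t_sim
    e_ref = min(energies)  # constant shift, only for numerical stability
    binned = defaultdict(float)
    for q, e in zip(q_values, energies):
        b = min(int(q * n_bins), n_bins - 1)
        binned[b] += math.exp(-d_beta * (e - e_ref))
    total = sum(binned.values())
    return {b: -math.log(w / total) for b, w in sorted(binned.items())}

# Toy two-state ensemble: equal populations at the sampling temperature
q_toy = [0.15] * 50 + [0.95] * 50
e_toy = [0.0] * 50 + [-100.0] * 50
profile = reweighted_profile(q_toy, e_toy, t_sim=0.90, t_target=0.868)
print(profile[1] - profile[9])  # stability of the folded basin in kBT
```

Lowering the temperature favors the low-energy folded basin, so the free energy gap between the two basins grows from zero at the sampling temperature to a positive value at the target temperature. In practice the target temperature would be tuned until this gap matches the measured stability.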

Figure 1.

Temperature dependence of the free energy profile of ubiquitin as a function of the global reaction coordinate Qw. Simulations were carried out at the Tf (reduced temperature: 0.9, green line) and extrapolated to different temperatures. The red curve is the one for which the difference in free energy between the folded (Qw∼0.90) and unfolded (Qw∼0.10) state basins (∼11.5 kBT) equals the global stability of the protein under the conditions used in the HX experiments. The structures at the top illustrate the configuration of the protein ensemble at different values of Qw (from left to right: one representative structure at Qw 0.10, three structures at Qw 0.45, six structures at Qw 0.70, and six structures at Qw 0.90). The structures are colored blue to green from the N-terminus to the C-terminus.

After this temperature calibration, the protection factors were obtained from the simulations by evaluation of the probabilities of the open and closed conformations for each residue assuming an EX2 mechanism (see Methods) using a variety of structural definitions of local order.
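A minimal sketch of this step is given below. It assumes that each snapshot already carries a Boltzmann weight at the calibrated temperature and a per-residue open/closed flag, and it uses the EX2 form of Eq. 5, Pf = 1 + P(closed)/P(open) = 1/P(open). The function name and data layout are hypothetical and are not taken from the Methods.

```python
import math

def ln_protection_factors(open_flags, weights):
    """Per-residue ln Pf from a weighted ensemble of snapshots.

    open_flags[s][i] is True if residue i is open in snapshot s;
    weights[s] is the (unnormalized) Boltzmann weight of snapshot s.
    """
    n_res = len(open_flags[0])
    total = sum(weights)
    result = []
    for i in range(n_res):
        p_open = sum(w for flags, w in zip(open_flags, weights) if flags[i]) / total
        if p_open == 0.0:
            result.append(math.inf)  # residue never sampled in the open state
        else:
            result.append(-math.log(p_open))  # ln(1 + Pc/Po) = -ln Po
    return result

# Toy ensemble: 4 equally weighted snapshots, 2 residues
flags = [[True, False], [False, False], [False, False], [False, False]]
ln_pf = ln_protection_factors(flags, [1.0, 1.0, 1.0, 1.0])
print(ln_pf)
```

A residue that is never observed open yields an infinite estimate, which is one reason why extensive sampling of rare opening events matters for this kind of analysis.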

Criteria for defining the open and closed conformations

The simulations provided a large ensemble of structures for evaluation of local order parameters. Presently, there is no consensus about the appropriate definition of the open or closed state. Nevertheless, it is clear that the accessibility of residues in the protein structure and their H-bonding state are most important. We therefore used these parameters to structurally define the exchange competent and incompetent states. The accessibility of each residue was assessed by the number of native contacts (Qi), and the H-bonded state was evaluated by assessing the increase in the distance between H-bonded residues in the native state compared to the distance in each structure in the ensemble. The ability of these criteria, either singly or jointly, to reproduce experimental data was evaluated.

The accessibility criterion was defined using a cut-off of 6.5 Å between Cβ atoms for a native contact, and simulations were evaluated using a definition of the exchange competent state as having at most 0, 1, 2, 3, or 4 remaining native contacts per residue. We then evaluated how well each of these accessibility criteria could recapitulate the experimental results (Figure 2). The protection predicted for each residue increased as the criterion used to define the open state became more stringent. The comparison between predicted and experimental data was assessed both by the correlation coefficient (Figure 2B) and by the average of the residuals (Figure 2C). For ubiquitin, the best agreement was obtained when either 0 or at most 1 contact remained in the open state. This result suggests that significant local destabilization has to occur, moving the Cβ of practically all the locally interacting residues more than 6.5 Å away, to allow exchange of the amide proton. The fact that similar results were then obtained for the other proteins (see below) suggested that this criterion was robust for evaluation of different proteins.
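The accessibility criterion can be sketched as follows. The 6.5 Å Cβ cut-off and the threshold of at most one remaining contact come from the text; the minimum sequence separation `min_sep` and the function names are assumptions made only for this illustration.

```python
import math

def native_contact_pairs(native_cb, cutoff=6.5, min_sep=2):
    """Residue pairs whose C-beta atoms are within the cut-off in the native structure.

    min_sep (minimum sequence separation) is a hypothetical parameter.
    """
    pairs = []
    n = len(native_cb)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(native_cb[i], native_cb[j]) <= cutoff:
                pairs.append((i, j))
    return pairs

def remaining_contacts(snapshot_cb, pairs, cutoff=6.5):
    """Number of native contacts each residue still makes in a snapshot."""
    counts = [0] * len(snapshot_cb)
    for i, j in pairs:
        if math.dist(snapshot_cb[i], snapshot_cb[j]) <= cutoff:
            counts[i] += 1
            counts[j] += 1
    return counts

def is_open_by_contacts(counts, max_contacts=1):
    """A residue is exchange competent if at most max_contacts contacts remain."""
    return [c <= max_contacts for c in counts]

# Toy coordinates in angstroms (not a real protein)
native = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
unfolded = [(20, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
pairs = native_contact_pairs(native)
counts = remaining_contacts(unfolded, pairs)
print(counts, is_open_by_contacts(counts, max_contacts=0))
```

Varying `max_contacts` from 0 to 4 reproduces the family of definitions compared in Figure 2.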

Figure 2.

Comparison between the experimental and predicted protection factors for ubiquitin (accessibility criteria). Panel A shows the experimental ln Pf (green line) and the predicted ln Pf (red range lines) calculated allowing an increasing number of contacts from 0 to 4 in the open state. Proline residues are labeled with a black asterisk. Panel B shows the correlation coefficient of the experimental and predicted values as a function of the number of contacts in the open state. Panel C shows the average residuals of all residues.

For completeness, we also evaluated other threshold rules for discriminating between open and closed conformations. Specifically, we evaluated rules in which the open state was defined as having less than half of the total native contacts, or more than a single native contact lost, or a number of native contacts lower than that of the most probable contact-state. However, the performance of each of these criteria was worse than the optimum definitions discussed above.

We next developed a criterion for whether a residue was H-bonded. Given the limitations of the coarse-grained model, residues H-bonded in the native structure were considered to have preserved their H-bond during the simulations if the pairwise distance between their Cβs in a given snapshot did not increase by more than a specified amount. Figure 3 shows the comparison between the experimental and predicted HX patterns of ubiquitin calculated using different H-bonding criteria, defined as the amount of Cβ displacement, for structurally defining the open state.

Figure 3.

Comparison between the experimental and predicted protection factors for ubiquitin (H-bonding criteria). Panel A shows the experimental ln Pf (green line) and the predicted ln Pf (red range lines) calculated for increasing displacements from the native distance (0.2 to 6 Å). Proline residues are labeled with a black asterisk. Panel B shows the correlation coefficient between the experimental and predicted values as a function of the displacement used to define the open state. Panel C shows the average residuals of all residues.

The magnitude of the predicted protection factor varied significantly with increasing distance from the native Cβ-Cβ distance. The optimal H-bonding criterion was defined as an increase of > 2.0 Å from the Cβ distance between the H-bonded interacting pairs. This result agrees with previous studies that estimated that the distance between H-bonded residues must increase by 2-3 Å for a successful exchange event 30,31.

Comparison of the predicted and experimental protection factors for Ubiquitin, CI2, and SN

Two other proteins for which NMR-based HX data are available for both slow and fast exchanging backbone amide hydrogens (CI2, refs. 26,32, and SN, refs. 33,34) were analyzed using the optimum accessibility and H-bonding definitions established for ubiquitin. The optimal accessibility criterion allowed up to 1 contact in the open state, and the criterion for a broken H-bond was a Cβ−Cβ displacement of > 2.0 Å from the distance in the native state. In addition, we evaluated a combined criterion in which a residue was considered to be in the closed state in a particular conformation if it had either more than 1 native contact or maintained an H-bond according to the above criterion; otherwise it was considered to be in the open state. The comparative performance of the different predictions was evaluated by the average of the residuals between predicted and experimental ln Pf values (Figure 4). In general, the performance of predictions using the accessibility and H-bonding criteria individually was similar for each protein. However, in all cases the combined criteria performed somewhat better (Figure 4). The average residual ranged between 2 and 3 kBT for the three proteins. These results buttress the idea that both the accessibility and H-bonding properties are important in determining the exchange kinetics30,35-39.
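The H-bonding and combined criteria can be written compactly. In this sketch the mapping `partner` from a residue to its native H-bond partner is a hypothetical data layout, and the number of remaining native contacts is assumed to have been computed separately for each snapshot; the 2.0 Å and 1-contact thresholds are the ones quoted above.

```python
import math

def hbond_kept(snapshot_cb, native_cb, partner, max_increase=2.0):
    """For each natively H-bonded residue, report whether the bond is kept.

    The bond counts as kept if the C-beta to C-beta distance has not grown
    by more than max_increase (angstroms) relative to the native structure.
    """
    kept = {}
    for i, j in partner.items():
        d_native = math.dist(native_cb[i], native_cb[j])
        d_now = math.dist(snapshot_cb[i], snapshot_cb[j])
        kept[i] = (d_now - d_native) <= max_increase
    return kept

def is_closed_combined(n_contacts, bond_kept, max_open_contacts=1):
    """Closed if more than max_open_contacts native contacts remain
    or the native H-bond is kept; open otherwise."""
    return n_contacts > max_open_contacts or bond_kept

# Toy example with two residues that are H-bonded in the native structure
native = [(0, 0, 0), (5, 0, 0)]
slightly_open = [(0, 0, 0), (6.5, 0, 0)]  # distance grew by 1.5 angstroms
fully_open = [(0, 0, 0), (8, 0, 0)]       # distance grew by 3.0 angstroms
print(hbond_kept(slightly_open, native, {0: 1}))
print(hbond_kept(fully_open, native, {0: 1}))
```

Residues without a native H-bond would simply pass `bond_kept=False`, so their state is decided by the contact count alone.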

Figure 4.

Performance of criteria used to predict the protection factors for ubiquitin, CI2, and SN. The evaluation was carried out based on the average of the residuals between the experimental and predicted ln Pf for all residues within each protein. The figures show the comparison between the predictions using solely the accessibility criterion (open state: ≤ 1 native contact, white bars) or the H-bonding criterion (open state: displacement > 2.0 Å, grey bars), and the combined criterion that uses both properties (black bars).

Figure 5 shows the comparison between the experimental and predicted ln Pf patterns for the three proteins, calculated at the experimental temperature using the combined criteria.

Figure 5.

Simulated and experimental properties of ubiquitin (panels A-C), CI2 (panels D-F), and SN (panels G-I). The first column shows the free energy as a function of the global reaction coordinate Qw derived from the simulations of each protein at the Tf and extrapolated to different temperatures (the green line is the sampling temperature whereas the red line represents the extrapolated experimental temperature). The plots in the second column show the comparison of the experimental (blue line) and predicted (purple line) HX patterns for each protein at the experimental temperature. Proline residues are labeled with a black asterisk. The third column shows the structure of each protein colored by the value of the experimental and predicted ln Pf (blue: high protection, white: medium protection, red: low or no protection). Residues colored in black are those with no reported experimental Pf. Sampling was carried out at or near the Tf, and the results were extrapolated to the experimental temperature (Simulation temperature|Extrapolated temperature in reduced units: 0.900|0.868, 0.890|0.850, and 0.730|0.715, for ubiquitin, CI2, and SN, respectively). Calculation of the predicted protection factor was done as described in Methods using the combined accessibility and H-bonding criteria. The analysis corresponds to the same simulations as the ones presented in Figure 4.

The agreement between the predicted and experimental ln Pf was good for the three proteins, in spite of the simplicity of the model used. The model was able to predict the distribution of highly and poorly protected regions along the structure of the proteins studied, as evidenced in Figure 5. We were able to properly discriminate ∼80% of the residues with ln Pf higher and lower than 5 kBT. The correlation coefficient (R) between the predicted and experimental parameters was ∼0.7 for all proteins (Figure 6). To evaluate the quality of these correlations, we compared the three ubiquitin experimental data sets to each other to determine the variability of the experimental measurements. This analysis gave correlation coefficients (R) that ranged between 0.86 and 0.96.
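The three summary statistics used here (correlation coefficient, average residual, and the fraction of residues placed on the correct side of the 5 kBT threshold) can be sketched as below. The residual is assumed to be the mean absolute difference, which the text does not state explicitly, and residues without a reported experimental value are marked with `None` and skipped. All names and numbers are illustrative.

```python
import math

def paired(pred, exp):
    """Keep only residues with a reported experimental ln Pf."""
    return [(p, e) for p, e in zip(pred, exp) if e is not None]

def pearson_r(pred, exp):
    data = paired(pred, exp)
    n = len(data)
    mp = sum(p for p, _ in data) / n
    me = sum(e for _, e in data) / n
    sxy = sum((p - mp) * (e - me) for p, e in data)
    sxx = sum((p - mp) ** 2 for p, _ in data)
    syy = sum((e - me) ** 2 for _, e in data)
    return sxy / math.sqrt(sxx * syy)

def mean_abs_residual(pred, exp):
    data = paired(pred, exp)
    return sum(abs(p - e) for p, e in data) / len(data)

def fraction_same_side(pred, exp, threshold=5.0):
    """Fraction of residues predicted on the same side of the threshold."""
    data = paired(pred, exp)
    hits = sum((p > threshold) == (e > threshold) for p, e in data)
    return hits / len(data)

# Toy values of ln Pf (not experimental data)
exp_ln_pf = [1.0, 2.0, 3.0, None]
pred_ln_pf = [2.0, 4.0, 6.0, 9.0]
print(pearson_r(pred_ln_pf, exp_ln_pf),
      mean_abs_residual(pred_ln_pf, exp_ln_pf),
      fraction_same_side(pred_ln_pf, exp_ln_pf, threshold=2.5))
```

The toy data show why the statistics are reported together: a perfect correlation can coexist with a sizeable average residual when the predicted values are systematically scaled.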

Figure 6.

Linear correlation analysis between the experimental and predicted protection factors of ubiquitin (panel A), CI2 (panel B), and SN (panel C). The data are the same as those used in Figure 5. Only residues with reported experimental Pf were used for the analysis.

We expect that the performance of the predictions will depend to some extent on the level of description, i.e. the coarse graining used in the models. In principle, all-atom models could allow the analysis of solvent accessibility and H-bonding parameters in a more precise manner than our coarse grained model. These improvements, however, would be obtained at the expense of simulation time and sampling completeness.

In the simulations, WHAM was used to obtain the proper Boltzmann distribution of all the sampled states as a function of temperature. The highest ln Pf predicted for most residues at a given temperature was lower than or equal to the global stability of each protein at the same temperature. For example, compare the free energy difference between the native and unfolded state basins of ubiquitin at the experimental temperature (red curve in Figure 5A) with the predicted ln Pf pattern of the protein at the same temperature (magenta curve in Figure 5B). This result reflects the fact that all residues become open in the unfolded state, which sets an upper limit for the free energy difference between the open and closed states40.

Structural properties of residues correctly versus not correctly predicted

In spite of the similarity between the experimental and predicted Pf patterns of the three proteins, we observed some quantitative differences in particular regions. In order to evaluate the reasons for these anomalies, we divided the residues into categories depending on the difference between their experimental and predicted protection factors, and compared structural properties between the categories. Residues were classified as anomalous (∼20%) if their predicted ln Pf differed by more than 4 kBT from the experimental value, and were further subdivided into two groups depending on whether the protection factor was under- or overestimated (colored in red and yellow, respectively, in Figure 7). The well-predicted residues (∼80%) were also subdivided into two groups: those with an experimental ln Pf below 5 kBT, and those with an experimental ln Pf above 5 kBT (colored in cyan and blue, respectively, in Figure 7).
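The classification rule can be stated as a small function. The 4 kBT and 5 kBT thresholds are those given above; the treatment of values lying exactly on a threshold and the function name are assumptions of this sketch.

```python
def classify_residue(pred_ln_pf, exp_ln_pf, tolerance=4.0, threshold=5.0):
    """Assign a residue to one of the color categories used in Figure 7.

    Returns "black" when no experimental protection factor is reported.
    Values exactly on a threshold are treated as well predicted and as
    protected, respectively (an assumption, not stated in the text).
    """
    if exp_ln_pf is None:
        return "black"
    diff = pred_ln_pf - exp_ln_pf
    if diff < -tolerance:
        return "red"      # anomalous, protection underestimated
    if diff > tolerance:
        return "yellow"   # anomalous, protection overestimated
    if exp_ln_pf < threshold:
        return "cyan"     # well predicted, low or no protection
    return "blue"         # well predicted, moderate or high protection

# Toy values of (predicted, experimental) ln Pf
examples = [(2.0, 8.0), (9.0, 3.0), (3.0, 2.0), (7.0, 8.0), (1.0, None)]
print([classify_residue(p, e) for p, e in examples])
```

Averaging a structural property over the residues in each returned category gives the kind of comparison shown in panels C and D of Figure 7.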

Figure 7.

Structure of ubiquitin, CI2, and SN colored by categories defined based on the difference between predicted and experimental protection factors (panel A). Residues were classified as non-anomalous if their predicted ln Pf was within 4 kBT of the experimental value, and anomalous otherwise. The non-anomalous residues were subdivided into two groups corresponding to low/no protection (cyan) and moderate/high protection (blue), depending on whether the experimental ln Pf was lower or higher than 5 kBT. The anomalous category was subdivided into two groups depending on whether the predicted ln Pf was more than 4 kBT below (red) or above (yellow) the experimental value. The predicted values correspond to the ones obtained from the analysis using the combined accessibility and H-bonding criteria (same as Figure 5). Residues colored in black correspond to residues with no reported protection factor. A variety of structural properties measured for the residues corresponding to each category are evaluated in panels B, C, and D. The bars are colored according to the color used for each category in panel A. The figure shows the % of residues in each category (panel B), the average sequence distance between contacting residues (panel C), and the average contact energy corresponding to the AMW potential (panel D). The contact energy of each category is expressed relative to the energy of the blue category. Higher energy means more stable.

The average structural properties of the residues in each category were analyzed (Figure 7). This analysis revealed that the residues for which the protection factor was under predicted (red category) have a high contact order compared to the other groups (Figure 7C). These residues, found mainly in β-sheet regions in ubiquitin, and CI2 (Figure 7A), seem to be more stable than our simulations indicate. The reason for this behavior might be a relative penalization of long range over short range interactions produced as a consequence of an increased backbone flexibility in the model used.

On the other hand, residues with over estimated protection factors showed some heterogeneity and uneven distribution of contact energies along the protein structure. In this regard, it has to be noted that the Hamiltonian used in the predictions had a homogeneous contact potential such that all the native contacts were assigned the same strength. This kind of model was chosen to evaluate the contribution of the topology to the dynamics of the protein structure. To assess the contribution of relative contact energy, we evaluated the average strength of the contacts within each group using the AMW interaction parameters, based on a statistical contact potential optimized by maximization of the energy gap between native and molten globule like configurations of a set of training proteins41. This analysis indicated that the residues with over predicted protection (shown in yellow in Figure 7A) correspond to residues with a contact energy lower than the average (Figure 7D). The stability as well as the predicted protection factor of many of these residues, found in loops and irregular backbone configurations that are most likely dynamic, would decrease if heterogeneity was taken into account.

The analysis presented above illustrates how the comparison between the predicted and experimental protection factors could guide improvements and tuning of the energy functions and parameters used in the simulations.

Conclusions

In this work we examined a simple model to connect the structural fluctuations observed in molecular dynamics simulations of perfectly funneled energy landscapes and experimental data of the dynamics for HX. Coarse grained structure-based model simulations allow an extensive and fast sampling of the conformational space. Using these models we were able to simulate entire folding trajectories from fully unfolded configurations to the native state. This sampling advantage over full atomistic and frustrated models was even further exploited by using the umbrella sampling technique. The results allow a direct assessment of the free energy cost for local unfolding events connected both to small and large conformational fluctuations, and the calculation of protection factors based on a probabilistic analysis of the exchange competent and incompetent states of each residue assuming an EX2 mechanism.

We were able to predict the general features of the HX pattern under highly native conditions of ubiquitin, CI2, and SN. The agreement between the predicted and experimental results was striking despite the simplicity of the model in which only a homogenous contact potential was used. Only native interactions were considered whereas non-native interactions were not allowed, preventing local energy traps. Therefore, the folding mechanism as well as the global and local fluctuations only depend on topological factors. The results of the predictions presented in this work suggest that the protein topology plays a dominant role in determining the local stability of proteins. This seems to be a consequence of the minimal energetic frustration of the overall energy landscape.

A variety of computational approaches have been developed that use HX data to help the modeling of protein ensembles 37,42-45. Others have predicted protection factors based on structural properties 46,47 and have attempted to correlate the fluctuations observed in molecular dynamics simulations with HX data 48,49. In our work, we have attempted to predict HX protection factors based on the evaluation of the probability of the open and closed states of each residue in simulations that use an extremely simple energy function. Vendruscolo and colleagues took an inverse approach and used the experimental protection factors as restraints to guide Monte Carlo37 and molecular dynamics simulations43 for the characterization of structural intermediates, based on an empirical correlation between the protection factors and the number of contacts along with H-bonds of native proteins. Similarly, Dixon, Dokholyan and colleagues used hydrogen exchange data to tune the strength of the interaction parameters of the focal adhesion targeting domain on a structure-based model simulation45. In both of these approaches, the predicted protection factors were calculated based on average structural properties of each residue along the simulations (i.e. the number of contacts and H-bonds or the energy of the interactions). In our approach we calculate the probability of open and closed states directly from simple structure-based models that allow sampling of rare opening events. The approach presented in this paper also differs from that described by Hilser using COREX47 in the sampling procedure and the method used to evaluate the probability of the states. Hilser and colleagues used a combinatorial algorithm to systematically generate a large ensemble of conformational states, and an empirical parameterization of free energy to evaluate the probability of each microstate. 
By contrast, the method presented in this paper uses a coarse grained purely structure-based simulation to sample the conformational space on a funneled and totally unfrustrated landscape. Our analysis allows a direct evaluation of the probability of open and closed states from the trajectories using WHAM to obtain the Boltzmann distributions.

The results from simulations using different criteria based on the accessibility (approximated by the number of contacts of each residue) and the distance increase between H-bonded residues clearly show that a significant distortion of the local environment of the residues must occur to generate the exchange-competent state. Indeed, the Cβ of almost all neighboring residues must move more than 6.5 Å apart, and the distance between the Cβs of residues H-bonded in the native state must increase more than 2 Å. In spite of the well documented effect of hydrogen bonding, it is important to note that we were able to accurately predict the general features of the protection pattern without any explicit consideration of H-bond energy or geometry. This result suggests that the structural change needed for the H-bonded residues to become exchange-competent depends much more on the stability of the local environment than it does on the detailed energetic and microscopic fluctuations of the H-bonded interaction per se. Our analysis supports the idea that the environment has the dominant role in determining the stability of the H-bond. This approach may allow a deeper understanding of the HX process that goes beyond structural categorization yielding new information about the kinetics of formation of the open (exchange-competent) state and the size of the associated fluctuations.

Methods

HX experimental data

Ubiquitin has been extensively studied by HX under native conditions, both by D2O dilution experiments followed by HSQC at pH 3.5 (ref. 23), 5.8 (ref. 24), and 6.2 (ref. 25), which measure rates for the slowly exchanging amide protons, and by phase-modulated clean chemical exchange (CLEANEX-PM) in the 5.5-10.0 pH range (ref. 26), which measures rates for the rapidly exchanging amides. For comparison with our simulations, all the experimental results at different pHs were scaled to pH 5.8 to account for the different experimental conditions. We then averaged the ln Pf of the three sets and included the values obtained from CLEANEX-PM experiments in the 5.5-10.0 pH range. For SN and CI2 we also combined results for slow and fast exchanging amide protons. We supplemented the information obtained from HSQC proton-deuterium exchange experiments (measured at pH 5.5 for SN, ref. 33, and pH 5.3-6.8 for CI2, ref. 32) with WEX II-FHSQC (pH 6.03-7.03 for SN, ref. 34) and CLEANEX-PM measurements (pH 5.5-10.0 for CI2, ref. 26), assuming no significant change in the local stability of the proteins in the pH range used. In SN, residues 33, 77, 98, 113, 115, and 120 were discarded from the analysis because of the possible contribution of alternative magnetization transfer pathways. The global stability of the proteins used in our analysis was based on the global stability reported for each protein under the conditions of the HSQC D2O dilution experiments or similar (ubiquitin: 11-12 kBT, refs. 28,29; SN: 10 kBT, ref. 33; and CI2: 11.5 kBT, ref. 32).

Model - AMH-Gō Hamiltonian

The Gō-style Hamiltonian that we used for the simulations has been described previously50,51. It has two energy terms that evaluate the geometry of the backbone and the contacts between residues.

$$H = H_{\mathrm{backbone}} + H_{\mathrm{na}}$$

This energy function applies to a reduced set of coordinates of the heavy atoms of the backbone: Cα, Cβ, and O. In this reduced description, the positions of the N and C' atoms can be calculated assuming ideal protein backbone geometry.

The backbone potential described in detail previously50 includes many terms that ensure that the backbone adopts physically allowable conformations.

The Hna energy depends on Gaussian interaction terms for native contact pairs, and is given by

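$$H_{\mathrm{na}} = -\frac{1}{2}\sum_i |E_i|^p$$

where

$$E_i = \sum_j \varepsilon_{ij}(r_{ij}) = -\sum_j \left|\frac{\varepsilon}{a}\right|^{1/p} \theta(r_c - r_{ij}^N)\,\gamma_{ij} \times \exp\left(-\frac{(r_{ij} - r_{ij}^N)^2}{2\sigma_{ij}^2}\right)$$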

The indices i and j run over Cα and Cβ atoms, and rij is the distance between atoms i and j. The parameter p is the power of nonadditivity. Increasing p makes the interactions more extensively many-body, resulting in additional cooperativity and an increased barrier for the folding transition. In our simulations we used a moderate value of the nonadditivity parameter, p = 2.5 for all proteins, which was previously shown to increase the correlation of simulations with experimental measurements of kinetics52. The cut-off parameter rc ensures, via the step function $\theta(r_c - r_{ij}^N)$, that only sites closer than this distance in the native structure interact (we used rc = 8 Å). The well width, $\sigma_{ij} = |i - j|^{0.15}$ Å, was set to give slightly broader wells for interactions between sites separated in sequence. In the homogeneous model used in this work all interaction weights γij were set to 1. The unit of energy is denoted ε, and is defined in terms of the native state energy excluding backbone contributions via ε = |Hna|/4N, where Hna is evaluated at the native structure and N is the number of residues. This equivalence is ensured if the normalization constant a is defined as

$$a = \frac{1}{8N}\sum_i \left|\sum_j \gamma_{ij}\,\theta(r_c - r_{ij}^N)\right|^p$$
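The contact term and its normalization can be sketched numerically as follows. This is a minimal illustration and not the simulation code used in this work: it assumes the sign convention Hna = −½ Σ|Ei|^p with Gaussian wells, one interaction site per residue in the example, and it skips pairs of sites belonging to the same residue. The function name `go_contact_energy` and the `seq_index` argument are hypothetical.

```python
import numpy as np

def go_contact_energy(coords, native, seq_index, p=2.5, r_c=8.0, eps=1.0):
    """Non-additive Gaussian contact energy H_na (illustrative sketch).

    coords, native : (M, 3) arrays of interaction-site positions in angstroms.
    seq_index      : (M,) integer residue number of each site (hypothetical input).
    """
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    d_nat = np.linalg.norm(native[:, None] - native[None, :], axis=-1)
    sep = np.abs(seq_index[:, None] - seq_index[None, :])
    # Step function: pair is native if closer than r_c in the native structure.
    # Assumption: pairs within the same residue (sep == 0) are skipped.
    theta = (d_nat < r_c) & (sep > 0)
    sigma = np.where(sep > 0, sep, 1) ** 0.15      # well width |i - j|^0.15
    gauss = np.exp(-(d - d_nat) ** 2 / (2.0 * sigma ** 2))
    gamma = 1.0                                    # homogeneous weights
    n_res = len(np.unique(seq_index))
    # Normalization a = (1/8N) sum_i |sum_j gamma_ij theta_ij|^p
    a = np.sum(np.abs(np.sum(gamma * theta, axis=1)) ** p) / (8.0 * n_res)
    # |E_i| = (eps/a)^(1/p) sum_j theta_ij gamma_ij exp(...)
    e_abs = (eps / a) ** (1.0 / p) * np.sum(gamma * theta * gauss, axis=1)
    return -0.5 * np.sum(e_abs ** p)
```

With this normalization the native structure evaluates to −4Nε regardless of p, which is a convenient check that the definition of a matches the stated energy unit.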

Temperatures are quoted in terms of the reduced temperature T̃ = kBT/ε. Distances are in units of angstroms.

The simulations were based on the PDB entries 1UBQ (ref 53) chain A (human ubiquitin), 2CI2 (ref 54) chain I (chymotrypsin inhibitor 2 from barley seeds), and 1SNP (ref 55) chain A (Staphylococcal nuclease), in which the missing residues 45-50 were rebuilt with Jackal (ref 56).

Sampling

To analyze the local stability of proteins and compare this property with HX data, we performed molecular dynamics simulations and analyzed the probability and free energy of microstates using the multiple histogram technique27,57. For efficient sampling of the conformational phase space of the protein, we performed umbrella sampling by setting up a series of runs with different biasing potentials added to the Hamiltonian, each acting to constrain the protein to a chosen region of conformational space.

$$H_s = H + V_s(Q_w)$$

The functions Vs(Qw) are well-shaped potentials centered on different values of the order parameter Qw to give a good sampling of phase space along the reaction coordinate of interest, with care taken that “adjacent” simulations overlap in the regions sampled. We used $V_s(Q_w) = 5\times10^{4}\,\varepsilon\,(Q_w - Q_0)^4$, with Q0 = 0.05, 0.10, …, 1.00. The reaction coordinate used, Qw, involves a sum over all pairs (except nearest neighbors) of Cα atoms, and ranges between 0 (completely unfolded) and 1 (native reference model). Note that in many cases the intermediate values of Qw are slightly lower (∼0.1 unit) than the frequently used global order parameter Q that represents the fraction of native contacts.

$$Q_w = \frac{2}{(N-1)(N-2)}\sum_{i<j-1} \exp\left(-\frac{(r_{ij} - r_{ij}^N)^2}{2\sigma_{ij}^2}\right)$$
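As an illustration of the order parameter, the following sketch evaluates Qw from Cα coordinates. It assumes Gaussian terms with the same well width |i − j|^0.15 Å used in the contact potential and a sum over pairs with j > i + 1; the function name `q_w` is hypothetical.

```python
import numpy as np

def q_w(ca, ca_native):
    """Global order parameter Q_w from C-alpha coordinates (illustrative sketch)."""
    n = len(ca)
    i, j = np.triu_indices(n, k=2)             # all pairs with j > i + 1
    r = np.linalg.norm(ca[i] - ca[j], axis=1)
    r_nat = np.linalg.norm(ca_native[i] - ca_native[j], axis=1)
    sigma = (j - i) ** 0.15                    # well width in angstroms
    terms = np.exp(-(r - r_nat) ** 2 / (2.0 * sigma ** 2))
    # There are (N-1)(N-2)/2 such pairs, so the prefactor normalizes Q_w to 1.
    return 2.0 / ((n - 1) * (n - 2)) * terms.sum()
```

By construction the native reference structure gives Qw = 1, and a conformation in which every pair distance differs from its native value by many well widths gives a value close to 0.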

The simulations centered on values of Q0 between 0.05 and 0.50 were started from extended conformations, whereas the ones centered on values of Q0 above 0.50 were started from the native structure. Typically, a set of simulations included 20 runs centered at different values of Q0.
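The window layout can be written down compactly. The sketch below assumes that the bias prefactor is 5 × 10^4 ε and that the exponent is 4; the names `bias_potential` and `centers` are hypothetical.

```python
import numpy as np

K_BIAS = 5.0e4  # bias strength in units of epsilon (assumed reading of the prefactor)

def bias_potential(q_w, q0, k=K_BIAS):
    """Quartic umbrella bias V_s(Q_w) = k (Q_w - Q_0)^4, in units of epsilon."""
    return k * (q_w - q0) ** 4

# Window centers Q_0 = 0.05, 0.10, ..., 1.00 (20 runs per set)
centers = np.round(np.arange(1, 21) * 0.05, 2)

# Runs centered at Q_0 <= 0.50 start from an extended chain, the rest from the native structure
start_from_native = centers > 0.50
```

With these numbers, a conformation sitting at the center of a neighboring window (0.05 away) feels a bias of only about 0.31 ε, whereas a displacement of 0.1 costs 5 ε, so adjacent windows overlap while distant regions are strongly suppressed.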

In a first stage of the analysis, a set of short simulations of 10000 time steps was carried out at different temperatures (0.80, 0.85, 0.90, 0.95, 1.00) to approximately locate the folding temperature, Tf. Then, at the putative Tf, 10 sets of simulations of 20000 time steps were performed. The biasing energy applied was taken into account in the free energy calculations (see below), and unbiased probabilities were thus obtained.

Free energy calculations

We used the WHAM analysis27. During each simulation, $N_s^{obs}$ sample structures were taken at regular intervals of 50 time steps, for a total of 400 samples per simulation. The first 40 samples of each run were discarded to help ensure that the system had reached equilibrium before samples entered the free energy calculation. Given that each set of umbrella sampling included 20 simulations centered at different values of Q, 7200 independent samples (20 × 360) were measured per set. A histogram Ns(H,Q) was built for each simulation s, and the density of states n(H,Q) of the system was calculated from the histograms

$$n(H,Q) = \sum_s w_s(H,Q)\,\frac{N_s(H,Q)}{N_s^{obs}}\,Z_s(\beta_s)\,\exp\big(\beta_s (V_s(Q) + H)\big)$$

Here s labels the simulation, βs = 1/kBTs is the inverse of the simulation temperature, and ws represents a weighting function defined as

$$w_s = \frac{A_s^{-2}}{\sum_m A_m^{-2}}, \qquad A_s^{2} = \frac{n(H,Q)}{N_s^{obs}}\,Z_s(\beta_s)\,\exp\big(\beta_s (V_s(Q) + H)\big)$$

In this function m runs over all simulations. The density of states and the weighting function are functions of the partition function Zs. The partition function, on the other hand, is also a function of the density of states,

$$Z_s(\beta_s) = \sum_{H,Q} n(H,Q)\,\exp\big(-\beta_s (V_s(Q) + H)\big)$$

This set of equations for n(H,Q) and Zs self-consistently determines n(H,Q) to within a multiplicative constant, and hence the free energy to within an additive constant.

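$$F(Q,T) = -k_B T \log\left(\sum_{H} n(H,Q)\,\exp\left(-\frac{H}{k_B T}\right)\right)$$

where the sum runs over the energy H at fixed Q.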
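The self-consistent solution can be obtained by simple fixed-point iteration. The sketch below is one possible implementation under stated assumptions, not the code used in this work: it takes the weights to be inverse-variance weights (in which case the sum over simulations collapses to the standard WHAM expression), fixes the arbitrary multiplicative constant by normalizing n(H,Q) to unit sum, assumes every bin was visited at least once, and evaluates the free energy by summing over H at fixed Q. The function names `wham` and `free_energy` are hypothetical, and no protection against floating-point overflow is included.

```python
import numpy as np

def wham(hist, H, V, beta, n_iter=20000, tol=1e-12):
    """Density of states n(H,Q), up to a multiplicative constant (sketch).

    hist : (S, nH, nQ) histogram counts N_s(H,Q) for each simulation s
    H    : (nH,) energy bin centers
    V    : (S, nQ) bias energy V_s(Q) at each Q bin center
    beta : (S,) inverse temperatures 1/(k_B T_s)
    """
    n_obs = hist.sum(axis=(1, 2))                      # N_s^obs
    # Boltzmann factor exp(-beta_s (V_s(Q) + H)) for every simulation and bin
    c = np.exp(-beta[:, None, None] * (V[:, None, :] + H[None, :, None]))
    total = hist.sum(axis=0)
    Z = np.ones(len(beta))
    for _ in range(n_iter):
        # With inverse-variance weights the estimator reduces to
        # n = sum_s N_s / sum_m (N_m^obs c_m / Z_m)
        denom = (n_obs[:, None, None] * c / Z[:, None, None]).sum(axis=0)
        n = total / denom
        n /= n.sum()                                   # fix the arbitrary constant
        Z_new = (n[None] * c).sum(axis=(1, 2))         # partition functions
        converged = np.max(np.abs(np.log(Z_new / Z))) < tol
        Z = Z_new
        if converged:
            break
    return n, Z

def free_energy(n, H, kT):
    """F(Q,T) = -kT log sum_H n(H,Q) exp(-H/kT), shifted so that min(F) = 0."""
    boltz = np.exp(-(H - H.min()) / kT)                # shift only adds a constant
    F = -kT * np.log((n * boltz[:, None]).sum(axis=0))
    return F - F.min()
```

Because n(H,Q) is determined only up to a constant, only free energy differences along Q are meaningful, which is why the profile is reported relative to its minimum.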

Evaluation of the open and closed conformations

For the evaluation of the open and closed conformations of each residue along the simulations we evaluated both the number of native contacts (as an approximation of the accessibility of residues) and the distance between residues H-bonded in the native state (as an approximation of the H-bond fluctuations). These criteria were applied independently or in combination.

1. Accessibility criteria

Residues were evaluated for accessibility at each snapshot during the simulations as approximated by the calculation of their total number of native contacts (Qi). This parameter is inversely related to the global accessibility of the residues.

$$Q_i = \sum_j q_{ij}, \qquad q_{ij} = 0.5\,\big(1 + \tanh\big(5\,(r_{\mathrm{cutoff}} - r_{ij})\big)\big)$$

The total number of native contacts for residue i in any given conformation corresponds to the sum of the pairwise contacts (qij), evaluated for all residues j in contact with residue i in the native structure. The qij value was calculated using a tanh function that switches steeply between 1 and 0 depending on whether the distance between the Cβ atoms of the interacting residues is below or above the cut-off. For Gly residues the qij contact was evaluated from the Cα atom. The evaluation was done solely for native interactions, defined by analysis of the corresponding structure using a cut-off distance of 6.5 Å.
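A minimal sketch of this contact count is given below. It assumes that the coordinate arrays already hold Cβ positions (Cα for Gly) and that every pair other than a residue with itself is eligible as a native contact, since no minimum sequence separation is stated; the function name is hypothetical.

```python
import numpy as np

def native_contacts_per_residue(cb, cb_native, r_cutoff=6.5, steepness=5.0):
    """Smoothed number of native contacts Q_i for each residue (sketch).

    cb, cb_native : (N, 3) arrays of C-beta positions (C-alpha for Gly), in angstroms.
    """
    d = np.linalg.norm(cb[:, None] - cb[None, :], axis=-1)
    d_nat = np.linalg.norm(cb_native[:, None] - cb_native[None, :], axis=-1)
    native = d_nat < r_cutoff            # contacts present in the native structure
    np.fill_diagonal(native, False)      # a residue is not in contact with itself
    # q_ij switches from 1 (well inside the cut-off) to 0 (well outside)
    q = 0.5 * (1.0 + np.tanh(steepness * (r_cutoff - d)))
    return np.sum(q * native, axis=1)
```

Note that a native pair sitting exactly at the cut-off distance contributes 0.5, so Qi is a continuous quantity and thresholds such as "at most one native contact" are applied to a real number rather than to an integer count.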

The analysis of Qi along the simulations allowed the calculation of the probability of native contacts for every residue. The different structural criteria used to discriminate open and closed conformations (from 0 to 4 native contacts allowed in the open state) were applied homogeneously to all residues.

In our model only native interactions are favorable. Non-native interactions do not contribute any stabilizing energy, so residues that are not in contact in the native state collide with each other but are not held together by any force. Non-native contacts are therefore very improbable and would not be expected to contribute significantly to the predicted protection factors even if they were considered.

2. H-bonding criteria

For the evaluation of the H-bond fluctuations we first identified the backbone amide H-bonds in the native structure of the protein using the program Chimera 58. For each snapshot, the distance between the Cβ atoms of the interacting pair was compared to their distance in the native structure. For Gly residues the distance was evaluated from the Cα atom. The “displacement” was defined as the increase or decrease in this distance relative to the native state.

$$\mathrm{displacement} = r_{ij} - r_{ij}^N$$

The analysis of the displacement along the simulations allowed the calculation of the probability of the residues to be in the open or closed state. When the displacement was higher than a certain amount, the H-bonded interaction was assumed to be broken and the residue to be in the open state, and in the closed state otherwise. A variety of different displacements from 0.2 to 6 Å were evaluated for their performance.

3. Combination of H-bonding and accessibility criteria

A residue was considered to be in the closed state either if it belonged to a native H-bonded pair whose Cβ-Cβ distance had not increased by more than 2 Å, or if it retained more than one native contact.
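The combined criterion amounts to a logical OR applied snapshot by snapshot. The sketch below assumes that Qi and the displacement have already been computed for one residue along a trajectory; the function name `closed_mask` and its arguments are hypothetical.

```python
import numpy as np

def closed_mask(q_i, displacement=None, max_disp=2.0, max_open_contacts=1.0):
    """Closed-state mask for one residue over a series of snapshots (sketch).

    q_i          : native contact count Q_i in each snapshot
    displacement : r_ij - r_ij^N (angstroms) for the native H-bond partner,
                   or None if the amide is not H-bonded in the native state
    """
    q_i = np.asarray(q_i, dtype=float)
    # Closed if the residue retains more than one native contact ...
    closed = q_i > max_open_contacts
    if displacement is not None:
        # ... or if the H-bonded pair has not separated by more than max_disp
        closed |= np.asarray(displacement, dtype=float) <= max_disp
    return closed
```

Setting `displacement=None` recovers the pure accessibility criterion, so the same function can be used to compare the two definitions on identical snapshots.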

Calculation of the predicted protection factor

For evaluation of the probability of the open and closed states for each residue we carried out a two dimensional WHAM over Qw and the local order parameters Li (Qi or displacementi). The joint probabilities P(Qw,Li) thus obtained were then integrated over the global reaction coordinate and the regions of the local order parameter corresponding to the open or closed states (defined by the structural criteria chosen) obtaining the probability of the residue to be in the open or closed conformation. Then, the predicted protection factor was calculated by

$$\ln P_f^{\mathrm{pred}} = \ln\left(1 + \frac{P_{cl}}{P_{op}}\right)$$

This equation is valid under the steady state approximation (kcl>kch and/or kcl>kop) and EX2 (kcl>kch) conditions. This is a good assumption for native proteins below or near neutral pH where kcl > kch for most residues. However, care should be taken in the analysis of unstable regions where this assumption is uncertain.
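As a small worked example, the predicted protection factor can be computed from a per-snapshot closed/open classification and a set of unbiasing weights. The sketch below assumes that such weights (for instance from the WHAM analysis) are supplied by the caller; the function name is hypothetical.

```python
import numpy as np

def ln_protection_factor(closed, weights=None):
    """ln Pf = ln(1 + P_cl / P_op) from a closed/open mask (sketch).

    closed  : boolean array, True where the residue is in the closed state
    weights : optional unbiased statistical weight of each snapshot
    """
    closed = np.asarray(closed, dtype=bool)
    w = np.ones(closed.shape) if weights is None else np.asarray(weights, dtype=float)
    p_cl = w[closed].sum() / w.sum()
    p_op = 1.0 - p_cl
    # If the open state is never sampled, p_op is 0 and the result diverges.
    return np.log1p(p_cl / p_op)
```

Since Pcl + Pop = 1, the expression is equivalent to ln Pf = −ln Pop. A residue that is open in one quarter of the unbiased ensemble therefore has ln Pf = ln 4 ≈ 1.39, while a residue protected as strongly as the global stability of ubiquitin (11-12 kBT) would be open in only about one conformation in 10^5.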

Average sequence distance

The average sequence distance per contact59 for regions along the structure of ubiquitin, CI2 and SN was calculated by:

$$\langle \mathrm{Sequence\ Distance} \rangle_n = \frac{1}{c}\sum^{c} \Delta S_{i,j}$$

where c is the total number of contacts of region n, and ΔSi,j is the sequence separation, in residues, for all contacting residues i and j. The analysis was based on the 6.5 Å radial distribution of Cβ-Cβ contacts in the native structure. The contacts for glycine residues were calculated from the Cα atom.
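This average can be sketched as follows. The example assumes that a contact is counted once for every residue i of the region that participates in it (so a contact between two residues of the same region enters twice) and that no minimum sequence separation is imposed; both choices, as well as the function name, are assumptions made for illustration.

```python
import numpy as np

def average_sequence_distance(cb_native, region, r_cutoff=6.5):
    """Mean sequence separation per native contact for a region (sketch).

    cb_native : (N, 3) native C-beta positions (C-alpha for Gly), in angstroms
    region    : iterable of residue indices (0-based) defining region n
    """
    d = np.linalg.norm(cb_native[:, None] - cb_native[None, :], axis=-1)
    contact = d < r_cutoff
    np.fill_diagonal(contact, False)
    seps = []
    for i in region:
        for j in np.flatnonzero(contact[i]):
            seps.append(abs(int(i) - int(j)))   # sequence separation Delta S_ij
    return float(np.mean(seps)) if seps else float("nan")
```

Regions dominated by local contacts, such as helices, give small values, whereas regions whose contacts connect distant parts of the chain give large ones.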

Acknowledgments

We thank Joe Hegler and Vanessa Oklejas for useful discussions and technical help, the NIH for financial support (NIH grants P01 GM071862 and R01 GM44557), and the Center for Theoretical Biological Physics (CTBP).

Abbreviations

HX: Hydrogen exchange
Pf: protection factor
NMR: nuclear magnetic resonance
CI2: chymotrypsin inhibitor 2 from barley seeds
SN: Staphylococcal nuclease
Tf: folding temperature
WHAM: weighted histogram analysis method
AMH: Associative Memory Hamiltonian
AMW: AMH with water mediated contacts
Qi: number of native contacts per residue

References

1. Bryngelson JD, Wolynes PG. Proc Natl Acad Sci U S A. 1987;84:7524–7528. doi: 10.1073/pnas.84.21.7524.
2. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG. Proteins: Struct, Funct, Genet. 1995;21:167–195. doi: 10.1002/prot.340210302.
3. Oliveberg M, Wolynes PG. Q Rev Biophys. 2005;38:245–288. doi: 10.1017/S0033583506004185.
4. Levy Y, Cho SS, Shen T, Onuchic JN, Wolynes PG. Proc Natl Acad Sci U S A. 2005;102:2373–2378. doi: 10.1073/pnas.0409572102.
5. Cho SS, Weinkam P, Wolynes PG. Proc Natl Acad Sci U S A. 2008;105:118–123. doi: 10.1073/pnas.0709376104.
6. Cho SS, Levy Y, Wolynes PG. Proc Natl Acad Sci U S A. 2009;106:434–439. doi: 10.1073/pnas.0810218105.
7. Clementi C, Nymeyer H, Onuchic JN. J Mol Biol. 2000;298:937–53. doi: 10.1006/jmbi.2000.3693.
8. Koga N, Takada S. J Mol Biol. 2001;313:171–80. doi: 10.1006/jmbi.2001.5037.
9. Shoemaker BA, Wang J, Wolynes PG. J Mol Biol. 1999;287:675–694. doi: 10.1006/jmbi.1999.2613.
10. Shoemaker BA, Wolynes PG. J Mol Biol. 1999;287:657–674. doi: 10.1006/jmbi.1999.2612.
11. Levy Y, Cho SS, Onuchic JN, Wolynes PG. J Mol Biol. 2005;346:1121–1145. doi: 10.1016/j.jmb.2004.12.021.
12. Levy Y, Wolynes PG, Onuchic JN. Proc Natl Acad Sci U S A. 2004;101:511–516. doi: 10.1073/pnas.2534828100.
13. Yang SC, Cho SS, Levy Y, Cheung MS, Levine H, Wolynes PG, Onuchic JN. Proc Natl Acad Sci U S A. 2004;101:13786–13791. doi: 10.1073/pnas.0403724101.
14. Woodward C, Simon I, Tuchsen E. Mol Cell Biochem. 1982;48:135–60. doi: 10.1007/BF00421225.
15. Englander SW, Kallenbach NR. Q Rev Biophys. 1983;16:521–655. doi: 10.1017/s0033583500005217.
16. Rashin AA. J Mol Biol. 1987;198:339–49. doi: 10.1016/0022-2836(87)90317-2.
17. Linderstromlang K. Chem Ind. 1955:503–503.
18. Hvidt A, Nielsen SO. Adv Protein Chem. 1966;21:287–386. doi: 10.1016/s0065-3233(08)60129-1.
19. Krishna MM, Hoang L, Lin Y, Englander SW. Methods. 2004;34:51–64. doi: 10.1016/j.ymeth.2004.03.005.
20. Bai Y, Milne JS, Mayne L, Englander SW. Proteins. 1993;17:75–86. doi: 10.1002/prot.340170110.
21. Chamberlain AK, Handel TM, Marqusee S. Nat Struct Biol. 1996;3:782–7. doi: 10.1038/nsb0996-782.
22. Dempsey CE. Prog Nucl Magn Reson Spectrosc. 2001;39:135–170.
23. Pan Y, Briggs MS. Biochemistry. 1992;31:11405–12. doi: 10.1021/bi00161a019.
24. Johnson EC, Lazar GA, Desjarlais JR, Handel TM. Structure. 1999;7:967–76. doi: 10.1016/s0969-2126(99)80123-3.
25. Bougault C, Feng L, Glushka J, Kupce E, Prestegard JH. J Biomol NMR. 2004;28:385–90. doi: 10.1023/B:JNMR.0000015406.66725.30.
26. Hernandez G, Anderson JS, LeMaster DM. Biochemistry. 2009;48:6482–94. doi: 10.1021/bi900526z.
27. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM. J Comput Chem. 1992;13:1011–1021.
28. Khorasanizadeh S, Peters ID, Butt TR, Roder H. Biochemistry. 1993;32:7054–63. doi: 10.1021/bi00078a034.
29. Lazar GA, Desjarlais JR, Handel TM. Protein Sci. 1997;6:1167–78. doi: 10.1002/pro.5560060605.
30. Milne JS, Mayne L, Roder H, Wand AJ, Englander SW. Protein Sci. 1998;7:739–45. doi: 10.1002/pro.5560070323.
31. Sessions RB, Gibbs N, Dempsey CE. Biophys J. 1998;74:138–52. doi: 10.1016/S0006-3495(98)77775-6.
32. Neira JL, Itzhaki LS, Otzen DE, Davis B, Fersht AR. J Mol Biol. 1997;270:99–110. doi: 10.1006/jmbi.1997.1088.
33. Loh SN, Prehoda KE, Wang J, Markley JL. Biochemistry. 1993;32:11022–8. doi: 10.1021/bi00092a011.
34. Mori S, Abeygunawardana C, Berg JM, van Zijl PCM. J Am Chem Soc. 1997;119:6844–6852.
35. Clarke J, Itzhaki LS, Fersht AR. Trends Biochem Sci. 1997;22:284–7. doi: 10.1016/s0968-0004(97)01087-6.
36. Miller DW, Dill KA. Protein Sci. 1995;4:1860–73. doi: 10.1002/pro.5560040921.
37. Vendruscolo M, Paci E, Dobson CM, Karplus M. J Am Chem Soc. 2003;125:15686–7. doi: 10.1021/ja036523z.
38. Truhlar SM, Croy CH, Torpey JW, Koeppe JR, Komives EA. J Am Soc Mass Spectrom. 2006;17:1490–7. doi: 10.1016/j.jasms.2006.07.023.
39. Bentley GA, Delepierre M, Dobson CM, Wedin RE, Mason SA, Poulsen FM. J Mol Biol. 1983;170:243–7. doi: 10.1016/s0022-2836(83)80235-6.
40. Haglund E, Lind J, Oman T, Ohman A, Maler L, Oliveberg M. Proc Natl Acad Sci U S A. 2009;106:21619–24. doi: 10.1073/pnas.0907665106.
41. Papoian GA, Ulander J, Eastwood MP, Luthey-Schulten Z, Wolynes PG. Proc Natl Acad Sci U S A. 2004;101:3352–3357. doi: 10.1073/pnas.0307851100.
42. Viguera AR, Serrano L. Proc Natl Acad Sci U S A. 2003;100:5730–5. doi: 10.1073/pnas.0837456100.
43. Gsponer J, Hopearuoho H, Whittaker SB, Spence GR, Moore GR, Paci E, Radford SE, Vendruscolo M. Proc Natl Acad Sci U S A. 2006;103:99–104. doi: 10.1073/pnas.0508667102.
44. Best RB, Vendruscolo M. Structure. 2006;14:97–106. doi: 10.1016/j.str.2005.09.012.
45. Dixon RD, Chen Y, Ding F, Khare SD, Prutzman KC, Schaller MD, Campbell SL, Dokholyan NV. Structure. 2004;12:2161–71. doi: 10.1016/j.str.2004.09.011.
46. Bahar I, Wallqvist A, Covell DG, Jernigan RL. Biochemistry. 1998;37:1067–75. doi: 10.1021/bi9720641.
47. Hilser VJ, Freire E. J Mol Biol. 1996;262:756–72. doi: 10.1006/jmbi.1996.0550.
48. Sheinerman FB, Brooks CL 3rd. Proc Natl Acad Sci U S A. 1998;95:1562–7. doi: 10.1073/pnas.95.4.1562.
49. Garcia AE, Hummer G. Proteins. 1999;36:175–91.
50. Eastwood MP, Wolynes PG. J Chem Phys. 2001;114:4702–4716.
51. Latzer J, Eastwood MP, Wolynes PG. J Chem Phys. 2006;125:214905. doi: 10.1063/1.2375121.
52. Ejtehadi MR, Avall SP, Plotkin SS. Proc Natl Acad Sci U S A. 2004;101:15088–93. doi: 10.1073/pnas.0403486101.
53. Vijay-Kumar S, Bugg CE, Cook WJ. J Mol Biol. 1987;194:531–44. doi: 10.1016/0022-2836(87)90679-6.
54. McPhalen CA, James MN. Biochemistry. 1987;26:261–9. doi: 10.1021/bi00375a036.
55. Truckses DM, Somoza JR, Prehoda KE, Miller SC, Markley JL. Protein Sci. 1996;5:1907–16. doi: 10.1002/pro.5560050917.
56. http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:Jackal
57. Ferrenberg AM, Swendsen RH. Phys Rev Lett. 1989;63:1195–1198. doi: 10.1103/PhysRevLett.63.1195.
58. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. J Comput Chem. 2004;25:1605–12. doi: 10.1002/jcc.20084.
59. Plaxco KW, Simons KT, Baker D. J Mol Biol. 1998;277:985–94. doi: 10.1006/jmbi.1998.1645.
