Abstract
The ability to predict the absolute stability of proteins from their sequence and structure is a problem of great fundamental and practical importance. In this work, we report an extensive refinement and validation of our recent approach (Roca et al., FEBS Lett 2007;581:2065–2071) for predicting absolute values of protein stability, ΔGfold. This approach employs the semimacroscopic protein dipole Langevin dipole method in its linear response approximation version (PDLD/S-LRA) while using the best fitted values of the dielectric constants ε′p and ε′eff for the self energy and charge–charge interactions, respectively. The method is validated on a diverse set of 45 proteins. It is found that the best fitted values of both dielectric constants are around 40. However, the self energy of internal residues and the charge–charge interactions of Lys have to be treated with care, using somewhat lower values of ε′p and ε′eff. The predictions of ΔGfold reported here have an average error of only 1.8 kcal/mole compared to the observed values, making our method very promising for estimating protein stability. It also provides valuable insight into the complex electrostatic phenomena taking place in folded proteins.
Keywords: protein stability, folding energy, dielectric constants, electrostatics in proteins
INTRODUCTION
The ability to predict physical and chemical properties of proteins, given their sequence and folded tertiary structure, is of crucial importance for the study of enzymes.1–3 Computational approaches for predicting the thermal stability of proteins, that is, the free energy difference between their folded and unfolded states, have yet to emerge and be validated. Despite the progress in the development of models for studying the folding of proteins,4–12 there are still major problems in predicting protein stability by either microscopic or macroscopic models. For example, we lack a clear understanding of the magnitude of electrostatic contributions to thermal stability and to the overall folding free energy. Similarly, the values of the dielectric constants to be used and the contribution of the ionizable residues to the stability of a folded protein are only a few of the questions still unanswered.
Discretized continuum studies13,14 have suggested that charged and polar groups lead to destabilization of a folded protein. Other studies, however, have indicated that protein stability is far more complicated than originally thought, and that charged residues do not necessarily destabilize the protein core. Quite the opposite, they tend to increase protein stability.15–18 Even the general idea in continuum studies that internal ionizable residues tend to destabilize the protein core is now under reevaluation.19 Overall there is a growing realization that electrostatic energy is related to stability in general, and that electrostatic interactions usually stabilize the native states of proteins quite significantly. However, the exact role of electrostatic interactions in protein stability is obscured by the competition between desolvation penalties, stabilization by local protein dipoles, and charge–charge interaction.
Recently, we introduced an approach20 that evaluates the electrostatic contribution to protein stability by selecting relatively high values of the dielectric constants for charge–charge interactions (εeff) and for self energy (εp). This approach determines the absolute stability of a given protein based on (a) the use of the semimacroscopic protein dipole Langevin dipole (PDLD/S) method in its linear response approximation version (the PDLD/S-LRA method) for the self energy and (b) the use of high values of εp and εeff. Our preliminary studies20 indicated that a major part of the absolute folding energy is reproduced by the electrostatic energy evaluated with a large εeff and, more surprisingly, a large εp. The contribution of the screened charge–charge interactions is consistent with recent findings (e.g., Makhatadze and coworkers21–23 as well as Garcia Moreno and coworkers24,25 and Pace and coworkers26–28), as far as the relative contribution of surface groups is concerned. However, our finding appeared to be more general, with implications for the energetics of internal groups. The corresponding physical picture appeared to be quite intriguing, but more validation is essential. Moreover, the predictive value of our approach has not been fully validated. The present work explores this model in a much more extensive and systematic way and establishes its general validity. We focus on the challenge of finding the best fitted values for the dielectric constants and on the effectiveness of the method in predicting protein stability.
METHODS
The method described in this work focuses on the electrostatic contribution to the folding free energy, assuming that this is the main contribution. The method was described previously by Roca et al.,20 and we summarize the main points below.
Our starting point is the folding paths of Figure 1, where the system can move from the unfolded state (A) to the folded state (E) along two pathways. In the first, (A) → (B) → (C) → (D) → (E), the unfolded protein becomes uncharged (transferring the charges to solution through the proper proton transfer process), folds to its neutral folded form (C), and then moves to the final folded charged form (E) through an uncharged structure (D) that is similar to that of (E). In the second path, the charged protein folds directly by moving from (A) to (E). The electrostatic contribution to folding, ΔG^elec_fold, of a given protein at a specific ionization state and at a given pH is evaluated by29:
| (1) |
where uf and f designate the unfolded and folded states, Qi is the charge of the ith residue in the folded (f) or the unfolded (uf) state, εp is the dielectric constant used in the semimacroscopic calculation of the intrinsic pKa, and pKi,int is the intrinsic pKa of the ith residue in its given protein state when all other residues are neutral, at a given εp. Finally, rij is the distance between residues i and j, while εeff is the effective dielectric constant for charge–charge interactions inside the protein and εwater is the effective dielectric constant for charge–charge interactions in water.
Figure 1.
The thermodynamic cycle for the folding of a charged protein. Steps A → B and B → C correspond, respectively, to the uncharging of the unfolded protein and the folding of the uncharged protein. Step C → D corresponds to the change of the uncharged protein from its equilibrium structure to the structure of the folded charged protein. Step A → E corresponds to a direct folding of the ionized protein. The figure defines the free energy terms used in this work.
As clarified elsewhere,29,30 εp determines the “self energy” and the corresponding intrinsic pKa of each charged group. This parameter is not related to the response of the protein to electric field but to the method used in the calculations and to the elements included explicitly in the simulation system. Basically, εp reflects all the effects that are not included explicitly in the calculations of the self energy.30 Similarly, εeff is a parameter that reflects the compensation of the gas phase charge–charge interaction by the reorganization of the solvent and the protein.29
The folding free energy also includes non-electrostatic contributions such as configuration entropy and hydrophobic contributions. These contributions depend on the path used in Figure 1. For example, we can write according to Figure 1,
| (2) |
In this description, the electrostatic terms represent the charging process in the folded state and the uncharging in the unfolded state, while the non-electrostatic is entirely associated with the folding of the fully uncharged protein. The use of this equation for mutations of ionized residues of a given protein allows us to focus only on .
Eq. (1) describes the electrostatic contribution to the folding energy of the given protein, and the corresponding value depends, of course, upon the choice of εp and εeff. The non-electrostatic term may well have a different value for different proteins, and there is no current computational method that can provide a reliable estimate of this quantity. However, if we use the direct (A) → (E) path of Figure 1 we get a different picture of the calculated ΔGfold, since now the entire folding process can be described in terms of the corresponding change in charge–charge interactions, using
| (3) |
That is, now the non-electrostatic effects may be absorbed in the effective dielectric for the work of bringing the charged groups from the unfolded to the folded state in the (A) → (E) path of Figure 1. In other words, the (B) → (D) step involves non-electrostatic contributions that may be significant, while the step (D) → (E) is formally a well defined electrostatic step, which includes, of course, structural reorganization. The dielectric that reflects the structural reorganization in the (D) → (E) step may be modified by considering the free energy of the (B) → (D) step and requiring that this free energy will be reproduced by Eq. (1). Now if the effect of the (B) → (D) step is not large or if it is somehow correlated with the trend in the electrostatic free energy, we will have a uniform dielectric that works for all proteins. Conversely, if this is not the case we will not be able to use such approximation. It is this philosophy, though with some assumptions, that leads us to the next important step of our method. Since our concept of a dielectric constant is somewhat complex we refer the reader to Refs. 29–31 for further clarification.
If the non-electrostatic term is somehow small, reflecting compensation between the entropy and the hydrophobic effect, we may examine the approximation:
| (4) |
where (ε′p, ε′eff) is the set of εp and εeff values that successfully reproduces the folding energy. In this case we obtain:
| (5) |
where uf and f designate the unfolded and folded states, Qi is the charge of the ith residue in the folded (f) state only, pKi,int is the intrinsic pKa of the ith residue calculated from the chosen ε′p and not the original εp, and pKi,w is the pKa of the ith residue in water. Rigorously, we should use a notation that marks Qi as belonging to the folded state, but since the contribution from the unfolded state will be neglected here we keep the notation Qi. We also use 80 instead of εwater since the difference is trivial and the effect of this term is small. Finally, rij is the distance between residues i and j in the folded (f) and unfolded (uf) states. Here, ε′eff is the chosen effective dielectric constant for the charge–charge interactions. It should be pointed out that ε′eff in Eq. (5) is taken here as independent of the distance rij. In later sections, we discuss exceptions, where the choice of ε′eff for certain cases is affected by both the residues i and j and by their distance rij. At any rate, the first term in Eq. (5) represents the change of self energy upon moving a charge from water to its site in the folded protein. That is,
| (6) |
The second term in Eq. (5) represents the effect of charge–charge interaction.
Adopting the approximation of Eq. (5) reduces the complicated and difficult problem of calculating the absolute folding energy of a protein to that of finding the proper, best fitted values of ε′p and ε′eff, which provide the best estimate of ΔGfold. This idea is based on the fact that semimacroscopic models can implicitly represent physical and chemical properties even though they lack an explicit representation of the phenomena taking place. Of course, such an approach can work only if the given model retains the main physics of the real system. Overall, we expect the best fitted values of ε′p and ε′eff to be higher than the values of 4 to 8 that are used, for example, to calculate accurate pKa’s in proteins.32
Equation (5) can be further simplified by assuming that rij in the unfolded state is in general much larger than rij in the folded state, and that ε′eff is smaller than 80. Thus, Eq. (5) is finally approximated by:
| (7) |
In Eq. (7) we drop the notation (f) from the variables rij and ε′eff, and we will use rij and ε′eff as the distance and the corresponding dielectric constant for interactions between residues i and j in the folded state. This simplification may overlook cases where the contribution from the unfolded protein is significant. This might be the case when the unfolded protein has a compact structure, as implied by the studies of Raleigh and coworkers,33 or when electrostatic interactions in the unfolded state play a significant role in folding kinetics, as suggested by Pace, Trefethen, and coworkers.26,27 However, it is possible that the contribution from the unfolded configuration is relatively small, and the best way to judge this issue is to explore the validity of Eq. (7) in a large number of test cases.
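The two-term structure described above (a self-energy term built from the shift of each intrinsic pKa relative to water, plus a screened charge–charge term) can be illustrated with a short sketch. The exact form of Eq. (7) is not reproduced here; the code assumes the commonly used expressions −2.3RT·Qi·(pKi,int − pKi,w) for the self energy and 332·QiQj/(ε′eff·rij) kcal/mol for each pair, and all function and variable names are hypothetical.

```python
import math

R_KCAL = 0.0019872   # gas constant in kcal/(mol*K)
COULOMB = 332.0      # kcal*Angstrom/(mol*e^2), standard conversion factor


def self_energy_term(charges, pk_int, pk_water, temp=300.0):
    """Assumed self-energy term: -2.3*RT * sum_i Q_i * (pK_int_i - pK_w_i)."""
    rt = R_KCAL * temp
    return sum(-2.3 * rt * q * (p_int - p_w)
               for q, p_int, p_w in zip(charges, pk_int, pk_water))


def charge_charge_term(charges, coords, eps_eff):
    """Screened Coulomb sum over unique pairs i < j in the folded state."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r_ij = math.dist(coords[i], coords[j])  # Angstrom
            total += COULOMB * charges[i] * charges[j] / (eps_eff * r_ij)
    return total


def folding_energy_estimate(charges, pk_int, pk_water, coords, eps_eff):
    """Sum of the two terms; pk_int is assumed to be computed at the chosen eps_p."""
    return (self_energy_term(charges, pk_int, pk_water)
            + charge_charge_term(charges, coords, eps_eff))
```

In this sketch the dependence on ε′p enters only through the intrinsic pKa values supplied by the caller, while ε′eff enters directly in the pair term.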
The intrinsic pKa’s were evaluated with the PDLD/S-LRA method according to the standard protocol, using the POLARIS module and the ENZYMIX force field of the MOLARIS program.20,36 Each protein studied in this work was first solvated by the surface constrained all atom solvent (SCAAS) model,35,36 and all the ionizable groups (Asp, Glu, Lys, Arg, but not His) at pH = 7 were assigned a charge that is 50% of their full charge in the ionized state (this was considered the best procedure for the initial relaxation). The resulting system (protein and waters) was equilibrated by running a 100 ps molecular dynamics simulation with a 1 fs time step at 300 K. The subsequent PDLD/S-LRA calculations of the pKi,int of each ionizable residue started with a 10 ps equilibration run, followed by 25 runs of 2 ps each (starting from different configurations) on both the charged and uncharged states of the given residue. The resulting calculated pKi,int values were then averaged to evaluate the final pKi,int. The total simulation time depends on the size and the number of ionizable groups of the given protein. For example, it took 22 hours on 4 dual core nodes (Dual Intel P4 3.0 GHz, 2 GB Memory) to evaluate all the pKi,int values of snase (protein 1EY0, size 136 residues, which contains 48 ionizable residues).
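The averaging described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the MOLARIS implementation: it assumes the generic linear response form, in which the free energy is the mean of the energy-gap averages collected on the charged and uncharged states, followed by a plain average over configurations. The function names and input layout are hypothetical.

```python
from statistics import mean


def lra_free_energy(gaps_charged, gaps_uncharged):
    """Generic LRA estimate: half the sum of the two end-state averages."""
    return 0.5 * (mean(gaps_charged) + mean(gaps_uncharged))


def average_pk(pk_per_configuration):
    """Final intrinsic pKa taken as the plain mean over all configurations."""
    return mean(pk_per_configuration)
```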
Once the intrinsic pKa’s were evaluated, the ionization states of the protein residues at pH = 7 were determined by using a Monte Carlo approach described previously.36 This procedure was repeated at different (ε′p, ε′eff) and provides the charges (the Qi) of the ionized residues as a function of the given dielectric constants. Using the pKi,int and the Qi in Eq. (7) provided ΔG^elec_fold as a function of ε′p and ε′eff.
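The Monte Carlo step is only cited in the text, so the following is a minimal Metropolis sketch of how average charges at a given pH could be obtained from intrinsic pKa values and screened pair interactions. It is an assumption-laden illustration rather than the procedure of Ref. 36: the two-state (ionized or neutral) model, the energy function, the absence of a burn-in period, and every name below are hypothetical.

```python
import math
import random

R_KCAL = 0.0019872   # kcal/(mol*K)
COULOMB = 332.0      # kcal*Angstrom/(mol*e^2)


def mc_average_charges(charges, pk_int, coords, eps_eff,
                       ph=7.0, temp=300.0, sweeps=2000, seed=0):
    """Return the Metropolis-averaged charge of each ionizable group.

    charges[i] is the full charge of group i when ionized (-1 or +1).
    """
    rng = random.Random(seed)
    rt = R_KCAL * temp
    n = len(charges)
    # Free energy of ionizing group i alone at the given pH.
    g = [-2.3 * rt * q * (pk - ph) for q, pk in zip(charges, pk_int)]
    # Screened pair interactions between fully ionized groups.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = math.dist(coords[i], coords[j])
            w[i][j] = w[j][i] = COULOMB * charges[i] * charges[j] / (eps_eff * r_ij)

    state = [1] * n            # 1 = ionized, 0 = neutral
    occupancy = [0] * n
    for _ in range(sweeps):
        for i in range(n):
            field = g[i] + sum(w[i][j] * state[j] for j in range(n) if j != i)
            delta = field if state[i] == 0 else -field   # energy change of a flip
            if delta <= 0 or rng.random() < math.exp(-delta / rt):
                state[i] = 1 - state[i]
        for i in range(n):
            occupancy[i] += state[i]
    return [q * occ / sweeps for q, occ in zip(charges, occupancy)]
```

For well-separated groups the result approaches the Henderson–Hasselbalch populations; nearby groups shift each other's ionization through the pair terms.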
RESULTS
Initial estimate of the best fitted values of ε′p and ε′eff
In this work, we moved to a much more extensive benchmark than the one used in our previous work,20 considering all the proteins listed in Table I. Our test set included 45 proteins with various sequence sizes, folds, and function.
Table I.
The benchmark considered in this work
| No. | Name | PDB ID | Number of residues | Type | Source |
|---|---|---|---|---|---|
| 1 | SSO7d | 1SSO | 62 | monomer | 37 |
| 2 | Thioredoxin | 2TRX | 108 | monomer | 37 |
| 3 | Barstar | 1A19 | 89 | monomer | 37 |
| 4 | Apoflavodoxin | 1FTG | 168 | monomer | 37 |
| 5 | λ-Repressor | 1LMB | 87 | monomer | 37 |
| 6 | Snase | 1EY0 | 136 | monomer | 37 |
| 7 | BsHpr, Phosphotransferase | 2HID | 87 | monomer | 37 |
| 8 | FeCyt b562 | 1QPU | 106 | monomer | 37,38 |
| 9 | Arc Repressor | 1ARQ | 53 | dimer | 37,39 |
| 10 | Aspartate Amilotransferase | 1VPE | 398 | monomer | 3 |
| 11 | Chey | 1TMY | 118 | monomer | 3 |
| 12 | GDH Domain II | 1B26 (A02) | 234 | monomer | 2,3,37 |
| 13 | Histidine Phosphocarrier | 1Y4Y | 87 | monomer | 3 |
| 14 | Phosphotransferase, Histidine containing protein | 2HPR | 87 | monomer | 3 |
| 15 | Ribosomal | 1H7M | 99 | monomer | 34 |
| 16 | RNase H* | 1JXB | 147 | monomer | 3,40 |
| 17 | Ferridoxin | 1VJW | 59 | monomer | 3 |
| 18 | O-Methyl Guanine DNA methyltransferase | 1MGT | 169 | monomer | 3 |
| 19 | Sac7d | 1WD0 | 66 | monomer | 2,3 |
| 20 | Histoneᵃ | 1BFM | 134 | dimer | 2,41 |
| 21 | PFRD-XC4 | 1QCV | 53 | monomer | 42 |
| 22 | Staphylococcal Nuclease I92Kᵃ | 1TR5 | 130 | monomer | 43 |
| 23 | Staphylococcal Nuclease I92E | 1TR5 | 130 | monomer | 43 |
| 24 | Staphylococcal Nucleaseᵃ | 1TR5 | 130 | monomer | 43 |
| 25 | Bs CSP | 1CSP | 67 | monomer | 20,44 |
| 26 | Bc CSP | 1C9O | 66 | monomer | 20,45 |
| 27 | Tm CSP | 1G6P | 61 | monomer | 2,20 |
| 28 | Ubiquitin D21Nᵃ | 1AAR | 76 | monomer | 20,46 |
| 29 | Ubiquitin F45W | 1AAR | 76 | monomer | 20,47 |
| 30 | Ubiquitin K27Aᵃ | 1AAR | 76 | monomer | 20,46 |
| 31 | Tm DHFR | 1CZ3 | 164 | dimer | 20,48 |
| 32 | Ec DHFR | 1RX2 | 159 | monomer | 20 |
| 33 | Ribonuclease | 1X1P | 212 | monomer | 2,49 |
| 34 | Glucanase C | 1CX1 | 153 | monomer | 50 |
| 35 | Phospholipid A2 | 1P2P | 124 | monomer | 51 |
| 36 | Trypsin Proteinase | 2CI2 | 65 | monomer | 52 |
| 37 | Interleukin | 1IOB | 153 | monomer | 53 |
| 38 | Adhesion transferase Y92Eᵃ | 1K40 | 126 | monomer | 54 |
| 39 | Complement | 1GKG | 136 | monomer | 4 |
| 40 | Cytochrome b5 Rat | 1B5M | 84 | monomer | 55 |
| 41 | Gene V DNA Binding | 1VQB | 86 | monomer | 56 |
| 42 | Glu Transferase | 1GSD | 208 | dimer | 57 |
| 43 | Growth Factor | 1FGA | 124 | monomer | 58 |
| 44 | Isomerase | 1HTI | 248 | dimer | 59 |
| 45 | Ribosomal S6 | 1RIS | 97 | dimer | 60 |
ᵃ The structure used to calculate ΔG^elec_fold was obtained by mutating the corresponding PDB structure reported in the third column of this table.
We started with a rough estimate of the best fitted set of (ε′p, ε′eff) for the prediction of ΔGfold. This was done first for a single protein (λ-Repressor, PDB ID 1LMB) by the approach illustrated in Figure 2.
Figure 2.
The calculated values of ΔG^elec_fold as a function of ε′p and ε′eff for λ-Repressor. The best fitted values of ε′p and ε′eff are taken as the values that give the best agreement between ΔG^elec_fold and the observed ΔGfold. The observed stability for this protein is −4.6 kcal/mole. As seen from the figure, these values are around and .
To clarify the nature of Figure 2, it is useful to start with the evaluation of ΔG^elec_fold at a specific point, that is, for one given pair of ε′p and ε′eff. In this case, we first used the given ε′p to evaluate the intrinsic pKa values for all the ionizable residues. In the next step, we used Eq. (7) with the given ε′eff and obtained ΔG^elec_fold. This value was then assigned to the corresponding point in Figure 2. The same procedure was then repeated for other values of ε′p and ε′eff until the figure was completed. Next, we identified the set of (ε′p, ε′eff) that gave the best agreement with the observed ΔGfold (around −4.6 kcal/mole for λ-Repressor) and thereby identified the optimal dielectric constants. In the case of Figure 2, the best fitted values were found to be in the vicinity of .
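The scan over dielectric pairs for a single protein amounts to a small grid search. A minimal sketch follows, assuming the calculated values have already been tabulated per pair; the function name and the numbers in the usage example are hypothetical and are not taken from Figure 2.

```python
def best_fit_pair(calc_by_pair, observed):
    """Return the (eps_p, eps_eff) pair whose calculated value is closest to experiment.

    calc_by_pair maps (eps_p, eps_eff) tuples to calculated folding energies (kcal/mol).
    """
    return min(calc_by_pair, key=lambda pair: abs(calc_by_pair[pair] - observed))


# Hypothetical scan results for one protein with an observed value of -4.6 kcal/mol.
scan = {(35, 35): -3.0, (35, 40): -4.0, (40, 35): -5.5, (40, 40): -4.5}
best = best_fit_pair(scan, observed=-4.6)
```

Restricting the keys of the dictionary to a few allowed values gives the kind of restricted selection used for the "best fitted" results.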
After applying the above approach to all 45 proteins of our test set, we considered several alternative ways of analyzing the corresponding results. The first and simplest analysis, which is summarized in Figure 3, was based on allowing both ε′p and ε′eff to be either 35 or 40, and choosing the value of ΔG^elec_fold that minimizes the difference |ΔG^elec_fold − ΔGfold|. The corresponding results were used to generate Figure 3(A). This type of treatment will be referred to here as getting the “best fitted” ΔG^elec_fold. The same procedure was followed in the generation of Figure 3(B), but this time with ε′p and ε′eff allowed to be either 8 or 20.
Figure 3.
The correlation between the best fitted ΔG^elec_fold and ΔGfold for best fitted values of ε′p and ε′eff in a restricted range. The calculations considered all the proteins in the benchmark of Table I and take the value of ΔG^elec_fold that gives the best fitted results for the two allowed values of ε′p and ε′eff (see text for the procedure used for selecting the best fitted values). The two values considered in Figure 3(A) are εp, εeff = 35, 40 whereas in Figure 3(B) we consider εp, εeff = 8, 20. As discussed in the text, the calculated best fitted values of ΔG^elec_fold diverge greatly from the corresponding observed values. The average error between the observed ΔGfold and the best fitted ΔG^elec_fold is 3.5 kcal/mole and 7.3 kcal/mole for Figures 3(A) and 3(B), respectively. Some points of Figure 3(B) are outside the scale of this figure.
As seen from both Figures 3(A) and 3(B), there is significant disagreement between the best fitted ΔG^elec_fold and the observed ΔGfold for many proteins. However, from these initial trials it appears that with values of ε′p and ε′eff of around 40, the absolute difference between observed and calculated stability is significantly smaller than the difference obtained with other dielectric constants (see the example in Figure 3(B), where the discrepancy is much larger for the lower values of ε′p and ε′eff).
Improving the model
In the next stage of the refinement of our model, we tried to further search for the best fitted values for the dielectric constants. This was done by considering the dielectrics that minimize the function
| (8) |
where i runs over all 45 test proteins, and the function of Eq. (8) is the sum of the absolute values of the errors between calculated and observed stabilities for a specific value of ε′p and ε′eff. The resulting surface is shown in Figures 4(A) and 4(B).
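The error function described here, a sum of absolute deviations over all test proteins evaluated on a grid of dielectric pairs, can be sketched as follows. The data layout (a per-protein dictionary of calculated values keyed by dielectric pair) and all names are assumptions made for illustration.

```python
def total_abs_error(calc, observed, pair):
    """Sum over proteins of |calculated - observed| at one (eps_p, eps_eff) pair."""
    return sum(abs(calc[name][pair] - observed[name]) for name in observed)


def best_global_pair(calc, observed, pairs):
    """Return the pair that minimizes the summed absolute error."""
    return min(pairs, key=lambda pair: total_abs_error(calc, observed, pair))


# Hypothetical two-protein example (values in kcal/mol).
observed = {"protein_a": -4.6, "protein_b": -8.0}
calc = {
    "protein_a": {(20, 20): -9.0, (40, 40): -4.8},
    "protein_b": {(20, 20): -7.5, (40, 40): -8.5},
}
pairs = [(20, 20), (40, 40)]
winner = best_global_pair(calc, observed, pairs)
```

Evaluating `total_abs_error` on every grid point gives the surface that would be plotted as a contour map.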
Figure 4.
The surface (A) and contour plots (B) of . In these plots it is clearly shown that the best fitted values for the dielectric constants follow approximately the line , and that for and above, the error reaches its minimum plateau.
As seen from the figures, the best fitted values of and follow approximately a line where , and the global minimum occurs approximately where . Thus, it is concluded that the best fitted dielectrics needed for prediction of protein stability is the one around .
To improve the performance of our model, we started with a global analysis of the reasons for the disagreement between the calculated and observed values shown in Figure 3. This analysis focused first on the location of the ionized residues in the cases that have not produced satisfactory results. This was done by evaluating the distance between the ionized groups to the closest water molecule using the grid created by MOLARIS in the initial process of generating the SCAAS water sphere (see Refs. 29 and 36 for earlier studies along this line). The use of water grid in the identification of the internal/external residues is illustrated in Figure 5.
Figure 5.
Illustrating our procedure for defining internal groups. The protein is surrounded by a cubic grid of water molecules and the distance between the water oxygen and the given group is used to determine whether the group is internal or external.
The above grid analysis indicated that the main problem is associated with internal groups, and led us to modify our approach, with the working hypothesis that for truly internal groups it is better to use a lower ε′p while still using a large ε′eff. A residue was defined as internal when both of the following conditions were met: (a) the shortest distance between a grid point and a heavy atom of the given residue is greater than or equal to the threshold value reported in Table II, which depends upon the type of the residue, and (b) the given residue has a low number of water molecules (five or fewer) within a radius of 5 Å from its geometrical center. Condition (b) was used mainly as a verification of condition (a).
Table II.
The Threshold Distances Between the Terminal Heavy Atom of the Given Amino Acid and the Closest Grid Point (The Closest Water Molecule)
| Residue type | Threshold (Å) |
|---|---|
| ASP | 5.0 |
| GLU | 5.2 |
| HIS | 5.2 |
| LYS | 5.5 |
| ARG | 5.7 |
For example, a Glu residue is considered to be internal when there are five or fewer water molecules within a distance of 5 Å from its geometrical center, and the closest water is 5.2 Å or more from its furthest heavy atom, which in this case is one of the two oxygen atoms of its side chain.
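The two-condition test for internal residues can be written compactly. The sketch below uses the thresholds of Table II and the limit of five water molecules; it assumes the two geometric quantities have already been measured, and the function and argument names are hypothetical.

```python
# Threshold distances (Angstrom) from Table II.
THRESHOLD_A = {"ASP": 5.0, "GLU": 5.2, "HIS": 5.2, "LYS": 5.5, "ARG": 5.7}
MAX_WATERS = 5  # "five or fewer" water molecules within 5 Angstrom


def is_internal(residue_type, nearest_grid_distance, waters_within_5a):
    """Apply conditions (a) and (b) to one ionizable residue.

    nearest_grid_distance: distance from the terminal heavy atom to the closest
    water grid point; waters_within_5a: waters within 5 Angstrom of the
    geometrical center.
    """
    threshold = THRESHOLD_A[residue_type.upper()]
    return nearest_grid_distance >= threshold and waters_within_5a <= MAX_WATERS
```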
If a single ionizable residue is identified as an internal residue, then the intrinsic pKa for this residue is chosen to take a value between the calculated for and the for . For example, if a tested protein has 20 ionizable residues, and residue 17 is identified as internal, then in Eq. (7), for residue i = 17 the pKa value used will be:
| (9) |
Now Eq. (7) for this case (protein with 20 ionizable residues and one internal residue) becomes:
| (10) |
In some cases we identified internal ionizable residues with a very large difference between calculated and . In these cases we found it useful to use an even lower value of . Thus we further refined our procedure and used an intrinsic pKa subscript that corresponds to when the difference for values of between 4 and 8 was positive and higher than 4 for Asp and Glu, and negative and lower than −3 for Lys and Arg.
Assigning lower ε′p values and using the corresponding pKi,int for the internal ionizable residues resulted in a significant increase in the accuracy of the calculated ΔGfold for our protein test set. Now the majority of the proteins were showing excellent ΔGfold predictions for dielectrics of . However, we also observed that a significant fraction of the tested proteins give more accurate results for and . The results for both cases are shown in Figures 6(A) and 6(B), respectively.
Figure 6.
The correlation between ΔG^elec_fold and ΔGfold after using lower ε′p for the internal residues. Both Figures 6(A) and 6(B) use the same representation as in Figure 3. In this case, the test proteins fall into two main categories: (a) the one shown in Figure 6(A) where the best fitted is in the range of , and (b) the one shown in Figure 6(B) where the best fitted is in the range . The majority of the test proteins fall into the first category.
Some of the results obtained after the specialized treatment of the internal residues were still puzzling. That is, although the majority of the tested proteins were giving excellent results for , there was still a significant number of proteins where ΔG^elec_fold was substantially lower than ΔGfold (see Fig. 6B). However, the fact that in both cases the value of ε′p was around 40 indicated that the problem is due to ε′eff, which produced, in certain cases, the underestimation of ΔGfold. To decipher this nonobvious behavior of ε′eff, we concentrated on protein cases where point mutations resulted in “shifting” the best fitted values of ε′p and ε′eff from the and range to the and range, and vice versa. This was found in the case of the CSP family, where we examined three proteins with high sequence similarities and only a few point mutations. In the case of bs-CSP we obtained the best results in the and range, in the case of bc-CSP we obtained the best results in the and range, while in the case of tm-CSP we obtained the best results in the and range. The analysis of these data indicated that the interaction of a Lys residue with charged residues at 10 Å or less leads to the above shift. Thus, we modified the ε′eff for the interaction of Lys with negatively charged residues by using:
| (11) |
for a distance rij ≤ 10 Å.
This special treatment is rationalized by the fact that Lys residues can move their center quite easily and thus can adjust their interaction with counter ions (see analysis in our previous work61).
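A pair-dependent dielectric of this kind can be expressed as a small lookup. The sketch below only illustrates the selection logic: the reduced value used for Lys pairs is defined by Eq. (11) and is not reproduced here, so `eps_lys` is a hypothetical parameter supplied by the caller, and treating Asp and Glu as the negatively charged partners is an assumption.

```python
NEGATIVE_RESIDUES = {"ASP", "GLU"}  # assumed set of negatively charged partners


def pair_dielectric(res_i, res_j, r_ij, eps_eff, eps_lys, cutoff=10.0):
    """Return the dielectric to use for the i-j charge-charge interaction.

    eps_lys is applied only to Lys / negative-residue pairs within the cutoff
    (Angstrom); every other pair keeps the general eps_eff.
    """
    types = {res_i.upper(), res_j.upper()}
    lys_with_negative = "LYS" in types and bool(types & NEGATIVE_RESIDUES)
    if lys_with_negative and r_ij <= cutoff:
        return eps_lys
    return eps_eff
```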
Using both the modified ε′p for the internal residues and the modified ε′eff for Lys finally allowed us to obtain much better results, as shown in Figure 7 and in the corresponding columns of Table III. Now the agreement between the best fitted ΔG^elec_fold and ΔGfold is excellent.
Figure 7.
The best fitted ΔG^elec_fold and observed ΔGfold for the tested proteins, after the use of lower ε′p for the internal residues and lower ε′eff for Lys with rij ≤ 10 Å. In this case, all proteins show best fitted values of ΔG^elec_fold when the dielectric constants are in the range of 35 to 40. The difference between the best fitted ΔG^elec_fold and the observed ΔGfold has decreased significantly compared to the previous cases of Figures 3 and 6. (For 45 proteins, the average difference is reduced to 0.7 kcal/mole.)
Table III.
Predicted ΔGfold for the Series of Proteins Considered in this Workᵃ
| No. | Protein | Best fitted ΔG^elec_fold | Observed ΔGfold | Abs. error | Best εp | Best εeff | Predicted ΔGfold, Eq. (12) | Abs. error |
|---|---|---|---|---|---|---|---|---|
| 1 | SSO7d | 7.9 | 8.0 | 0.1 | 40 | 40 | 8.5 | 0.5 |
| 2 | Thioredoxinᵇ | 4.4 | 9.0 | 4.6 | 40 | 35 | 3.1 | 5.9 |
| 3 | Barstar | 5.9 | 5.7 | 0.2 | 35 | 35 | 4.9 | 0.8 |
| 4 | Apoflavodoxin | 4.3 | 4.3 | 0.0 | 35 | 40 | 5.0 | 0.7 |
| 5 | λ-Repressor | 4.5 | 4.6 | 0.1 | 40 | 40 | 4.8 | 0.2 |
| 6 | Snase | 6.3 | 6.2 | 0.1 | 35 | 40 | 8.0 | 1.8 |
| 7 | BsHpr, Phosphotransferase | 3.1 | 4.0 | 0.9 | 40 | 35 | 2.1 | 1.9 |
| 8 | FeCyt b562 | 7.7 | 5.2 | 2.5 | 35 | 40 | 9.7 | 4.5 |
| 9 | Arc Repressor | 4.3 | 4.6 | 0.3 | 40 | 35 | 1.7 | 2.9 |
| 10 | Aspartate Amilotransferase | 26.8 | 28.9 | 2.1 | 40 | 40 | 31.0 | 2.1 |
| 11 | Chey | 11.6 | 9.5 | 2.1 | 35 | 40 | 13.5 | 4.0 |
| 12 | GDH Domain II | 5.6 | 4.9 | 0.7 | 35 | 35 | 4.2 | 0.7 |
| 13 | Histidine Phosphocarrier | 5.4 | 8.2 | 2.8 | 40 | 35 | 4.5 | 3.7 |
| 14 | Phosphotransferase, Histidine containing protein | 5.3 | 5.2 | 0.1 | 40 | 35 | 4.0 | 1.2 |
| 15 | Ribosomal | 7.4 | 10.7 | 3.3 | 40 | 35 | 6.4 | 4.3 |
| 16 | RNase H* | 7.1 | 7.5 | 0.4 | 40 | 35 | 6.5 | 1.0 |
| 17 | Ferridoxinᵇ | 3.8 | 9.3 | 5.5 | 40 | 35 | 3.2 | 6.1 |
| 18 | O-Methyl Guanine DNA methyltransferase | 10.2 | 10.2 | 0.0 | 35 | 35 | 9.1 | 1.1 |
| 19 | Sac7d | 7.8 | 7.4 | 0.4 | 40 | 40 | 9.0 | 1.6 |
| 20 | Histone | 7.8 | 7.2 | 0.6 | 40 | 35 | 7.6 | 0.4 |
| 21 | PFRD-XC4 | 2.1 | 3.2 | 1.1 | 40 | 35 | 1.2 | 2.0 |
| 22 | Staphylococcal Nuclease I92K | 1.5 | 2.3 | 0.8 | 40 | 40 | 2.8 | 0.5 |
| 23 | Staphylococcal Nuclease I92E | 3.8 | 4.2 | 0.4 | 40 | 35 | 2.2 | 2.0 |
| 24 | Staphylococcal Nuclease | 11.5 | 11.5 | 0.0 | 35 | 35 | 10.3 | 1.2 |
| 25 | Bs CSP | 1.1 | 1.2 | 0.1 | 35 | 35 | 0.8 | 0.4 |
| 26 | Bc CSP | 5.2 | 5.0 | 0.2 | 40 | 35 | 4.2 | 0.8 |
| 27 | Tm CSP | 6.8 | 6.5 | 0.3 | 35 | 40 | 8.0 | 1.5 |
| 28 | Ubiquitin D21N | 5.1 | 6.1 | 1.0 | 40 | 35 | 3.9 | 2.2 |
| 29 | Ubiquitin F45W | 6.9 | 7.4 | 0.5 | 40 | 35 | 5.9 | 1.5 |
| 30 | Ubiquitin K27A | 4.3 | 4.4 | 0.1 | 40 | 40 | 4.8 | 0.4 |
| 31 | Tm DHFR | 30.4 | 30.1 | 0.3 | 40 | 35 | 25.9 | 4.2 |
| 32 | Ec DHFR | 6.5 | 6.0 | 0.5 | 35 | 35 | 4.9 | 1.1 |
| 33 | Ribonuclease | 11.2 | 10.5 | 0.7 | 40 | 40 | 13.1 | 2.6 |
| 34 | Glucanase Cᵇ | 0.0 | 0.0 | 0.0 | – | – | 0.0 | 0.0 |
| 35 | Phospholipid A2 | 7.8 | 6.5 | 1.3 | 35 | 40 | 9.1 | 2.6 |
| 36 | Trypsin Proteinase | 7.7 | 7.5 | 0.2 | 35 | 35 | 6.8 | 0.7 |
| 37 | Interleukin | 9.7 | 9.1 | 0.6 | 35 | 35 | 8.9 | 0.2 |
| 38 | Adhesion transferase Y92E | 7.5 | 8.0 | 0.5 | 40 | 40 | 8.4 | 0.4 |
| 39 | Complement | 2.6 | 2.7 | 0.1 | 35 | 35 | 2.5 | 0.2 |
| 40 | Cytochrome b5 Rat | 3.5 | 3.2 | 0.3 | 35 | 35 | 2.3 | 0.9 |
| 41 | Gene V DNA Binding | 7.6 | 9.0 | 1.4 | 40 | 35 | 6.6 | 2.4 |
| 42 | Glu Transferase | 13.6 | 14.4 | 0.8 | 35 | 35 | 10.2 | 4.2 |
| 43 | Growth Factor | 0.0 | 0.0 | – | – | – | 0.0 | 0.0 |
| 44 | Isomerase | 19.2 | 19.3 | 0.1 | 40 | 35 | 13.7 | 5.6 |
| 45 | Ribosomal S6 | 11.1 | 8.0 | 3.1 | 35 | 40 | 12.9 | 4.9 |
ᵃ All free energies reported are in kcal/mol. The third column reports the experimental values of the absolute stabilities of the tested proteins. The last two columns give the predicted stability according to Eq. (12) and the corresponding error. The sources of the observed values are given in Table I.
ᵇ Proteins containing a disulfide bond (which is not treated by the present method), where the corresponding “missing” energy is around 5 kcal/mole.
Using the results described in Figure 7 to calculate the sum of errors introduced in Eq. (8), we again evaluated the best fitted values for the dielectric constants by determining the region of (ε′p, ε′eff) where this sum is minimized. This treatment led to the surface described in Figures 8(A) and 8(B). The best fitted region follows again an approximate line where . The main difference between Figures 4 and 8 is that the region with corresponds to a much deeper valley (indicating a minimum in the function ).
Figure 8.
The surface (A) and contour plots (B) of obtained after refinement of for internal groups and for Lys. The plots show clearly that the best fitted values for the dielectric constants follow approximately the line , with a global minimum of around . This means that values of proximal to ΔGfold can be acquired when have high values, and .
Table III summarizes the calculated results for the best fitted dielectric constants. As seen from the table, we obtained quite accurate results. Most of the proteins tested show errors of around 1 kcal/mole, and only a few have errors above 2 kcal/mole, while still not exceeding about 3 kcal/mole. The average error for the best fitted values (shown in the fourth column of Table III) is only 0.7 kcal/mole.
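As a quick check of the error statistics quoted above, the following minimal Python sketch recomputes the mean absolute error for the Table III rows reproduced in this section (entries 35 to 45 only, so the averages differ from the values of 0.7 and 1.8 kcal/mole reported for the full benchmark). It assumes that the second numeric column of each row holds the observed stability, since that reading reproduces the tabulated errors; the variable and function names are illustrative.

```python
# Rows 35-45 of Table III as (name, best-fit calculation, observed, Eq. (12) prediction),
# all in kcal/mol. Assumption: the second numeric column is the observed value.
ROWS = [
    ("Phospholipid A2", 7.8, 6.5, 9.1),
    ("Trypsin Proteinase", 7.7, 7.5, 6.8),
    ("Interleukin", 9.7, 9.1, 8.9),
    ("Adhesion transferase Y92E", 7.5, 8.0, 8.4),
    ("Complement", 2.6, 2.7, 2.5),
    ("Cytochrome b5 Rat", 3.5, 3.2, 2.3),
    ("Gene V DNA Binding", 7.6, 9.0, 6.6),
    ("Glu Transferase", 13.6, 14.4, 10.2),
    ("Growth Factor", 0.0, 0.0, 0.0),
    ("Isomerase", 19.2, 19.3, 13.7),
    ("Ribosomal S6", 11.1, 8.0, 12.9),
]


def mean_abs_error(calculated, observed):
    """Average of |calculated - observed| over paired values."""
    pairs = list(zip(calculated, observed))
    return sum(abs(c - o) for c, o in pairs) / len(pairs)


observed = [row[2] for row in ROWS]
mae_best_fit = mean_abs_error([row[1] for row in ROWS], observed)
mae_eq12 = mean_abs_error([row[3] for row in ROWS], observed)

print(f"best-fit MAE: {mae_best_fit:.2f} kcal/mol")
print(f"Eq. (12) MAE: {mae_eq12:.2f} kcal/mol")
```

For this subset, the gap between the best-fit error and the Eq. (12) error is dominated by a few entries (Glu Transferase, Isomerase, and Ribosomal S6).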
The ΔGfold value reported in Table III for Glucanase C is zero, since the protein is unstable according to Ref. 50. In this case, it is found experimentally that without the Cys6–Cys27 disulfide bond the protein is unstable; thus our finding that the calculated folding free energy is positive for all dielectric constants is consistent with the experimental observation, since our calculations do not include the disulfide bond. Acidic fibroblast growth factor is also unstable under physiological conditions. It is reported62,63 that the protein remains relatively unstable without certain polyanions.
Concerning the two proteins Thioredoxin64,37 and Ferredoxin,3,65 our method underestimated the observed stability by 5 kcal/mole. However, each of these two proteins contains one disulfide bond, and since a disulfide bond stabilizes a protein by approximately 5 kcal/mole,66,67 we consider our method to be successful in predicting their absolute stabilities.
Predicting absolute stabilities of proteins
This work so far has shown that the calculated absolute stability of a protein can be correlated quite accurately with the corresponding observed value by choosing best fitted values of εp and εeff in the range of 35–40.
This, however, is not fully unbiased, since the “best fitted” values are taken as the values that give the best result. Thus, it is important to select a well defined scheme for actually predicting the relevant stability. After considering several options, we concluded that a reasonable prediction can be obtained by using:
| (12) |
The corresponding results are shown in the right columns of Table III and in Figure 9. As seen from the figure, the agreement is reasonable (an average error of 1.8 kcal/mole), although less impressive than that of Figure 7. However, in this case, we are not selecting any best fitted values and have a robust way of predicting protein stability.
Figure 9.
Calculated and observed folding energies for our entire benchmark. The calculations are done using Eq. (12), rather than by choosing the best fitted values of the dielectric constants, as was done in Figure 7.
Dependence of absolute stabilities on the size and fold of the protein
During the validation of our model, we examined the possible dependence of absolute stability on factors such as protein size and fold. To investigate such dependence, we chose a protein test set that included proteins of largely different size and fold. Comparisons of both the observed and the calculated stability with the size and fold of the proteins did not show any particular correlation. For example, there are many small proteins in the test set shown in Table I that have relatively large values of observed and calculated absolute stabilities (for example SSO7D, Histidine Phosphocarrier, and Sac7d), but at the same time there are many that show low values (for example Barstar, PFRD-X64, and Bs CSP). The same trend is observed for large proteins (for example staphylococcal nuclease, isomerase, Tm DHFR, and aspartate amilotransferase show large values of absolute stability, while GDH Domain II, staphylococcal nuclease I92K, and Ec DHFR have low values for absolute stability). Comparison of fold and absolute stability also has not shown any correlation; i.e., whether the structures were mainly alpha, mainly beta, amorphous, or alpha-beta did not make an important difference in absolute stability.
Finally, no correlation between the best fitted values of εp and εeff and the size or fold has been observed.
A practical and powerful approach for stabilizing proteins
The present approach should provide a powerful new way of stabilizing proteins by systematic mutations, and this is expected to help augment existing strategies (e.g., Reetz et al.68). We can of course just use Eq. (7) and systematically examine different mutants. This is done as a demonstration in Table IV. However, a more effective approach is obtained by differentiating Eq. (7) and evaluating the gradient of the free energy with respect to the charges of the protein residues.
| (13) |
Table IV.
Calculated for the Set of Dielectrics and for Different Proposed Mutations in the Wild Type Lip Aa
| Ionized to non polar mutations of the wild type Lip A | ||
| Structures | ||
| Wild type | −7.62b | |
| Arg142Ala | −9.27 | |
| Lys23Ala | −9.61 | |
| Lys70Ala | −9.65 | |
| Lys88Ala | −9.80 | |
| Arg107Ala | −10.15 | |
| Arg33Ala | −10.31 | |
| Lys23Ala//Arg33Ala | −10.64 | |
| Lys23Ala//Arg33Ala//Arg107Ala | −12.43 | |
| Non polar to ionized mutations of the wild type Lip A | ||
| Structures | ||
| Wild type | −7.62b | |
| Ala38Asp | −8.53 | |
| Ala132Asp | −8.96 | |
| Ala105Asp | −9.39 | |
| Ala15Asp | −9.52 | |
| Ala75Asp | −9.77 | |
| Ala81Asp | −9.97 | |
| Ala20Asp | −10.78 | |
| Ala68Asp | −10.89 | |
| Ala146Asp | −11.03 | |
| Ala97Asp | −11.21 | |
| Ala113Asp | −11.63 | |
| Charged to non polar mutations of the most stable mutant (variant XI) of Lip A | ||
| Structures | ||
| Variant XI | −10.30b | |
| Lys32Ala | −10.71 | |
| Asp133Ala | −11.08 | |
| Lys70Ala | −11.16 | |
| Lys23Ala//Lys70Ala | −11.32 |
a Energies in kcal/mol.
b Taken from Ref. 69.
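The entries of Table IV can be compared more directly by expressing each calculated value relative to the wild type. The following Python sketch does this for the first block of the table (ionized to non polar mutations). It assumes that a more negative calculated value corresponds to a more stable variant, which is consistent with variant XI (−10.30) being described as the most stable mutant; the names used in the code are illustrative.

```python
# Calculated values from the first block of Table IV, in kcal/mol.
WILD_TYPE = -7.62

IONIZED_TO_NONPOLAR = {
    "Arg142Ala": -9.27,
    "Lys23Ala": -9.61,
    "Lys70Ala": -9.65,
    "Lys88Ala": -9.80,
    "Arg107Ala": -10.15,
    "Arg33Ala": -10.31,
    "Lys23Ala//Arg33Ala": -10.64,
    "Lys23Ala//Arg33Ala//Arg107Ala": -12.43,
}


def relative_stability(mutants, reference):
    """Return (name, value - reference) pairs, most negative difference first."""
    differences = {name: value - reference for name, value in mutants.items()}
    return sorted(differences.items(), key=lambda item: item[1])


ranked = relative_stability(IONIZED_TO_NONPOLAR, WILD_TYPE)
for name, difference in ranked:
    print(f"{name:32s} {difference:+.2f} kcal/mol")
```

Under the stated assumption, the triple mutant heads the ranking and Arg142Ala shows the smallest calculated change.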
The gradient can then be used in predicting the change of charges that will stabilize the protein, using
| (14) |
(where α is a scaling factor) and repeating the procedure iteratively, we can predict the charge configurations that will increase the stability. Our preliminary validations of the above approach included a demonstration that we can reproduce the energetics of known mutants starting from the native protein or from other mutants. Experimental validation of our approach should be extremely useful.
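The iterative procedure described above can be sketched as a projected steepest-descent loop. Because Eqs. (7), (13), and (14) are not reproduced in this section, the sketch substitutes a toy energy (a screened Coulomb sum with a single effective dielectric plus a quadratic self-energy penalty) and assumes an update of the form q_i ← q_i − α ∂ΔG/∂q_i, with charges clipped to the range [−1, 1]. The function names, the penalty term, the coordinates, and the clipping are all illustrative assumptions, not the authors' implementation.

```python
import math

COULOMB = 332.0  # converts q_i*q_j/r (e^2/Angstrom) to kcal/mol


def toy_energy(charges, coords, eps_eff=40.0, self_penalty=1.0):
    """Hypothetical stand-in for Eq. (7): self-energy penalty plus screened Coulomb pairs."""
    energy = sum(self_penalty * q * q for q in charges)
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r = math.dist(coords[i], coords[j])
            energy += COULOMB * charges[i] * charges[j] / (eps_eff * r)
    return energy


def toy_gradient(charges, coords, eps_eff=40.0, self_penalty=1.0):
    """Analytical derivative of toy_energy with respect to each charge."""
    gradient = []
    for i, qi in enumerate(charges):
        g = 2.0 * self_penalty * qi
        for j, qj in enumerate(charges):
            if j != i:
                g += COULOMB * qj / (eps_eff * math.dist(coords[i], coords[j]))
        gradient.append(g)
    return gradient


def descend(charges, coords, alpha=0.05, steps=50):
    """Repeat q <- q - alpha * gradient, keeping each charge within [-1, 1]."""
    q = list(charges)
    for _ in range(steps):
        gradient = toy_gradient(q, coords)
        q = [max(-1.0, min(1.0, qi - alpha * gi)) for qi, gi in zip(q, gradient)]
    return q


# Three hypothetical ionizable sites (coordinates in Angstrom) and starting charges.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
start = [1.0, 1.0, -1.0]
final = descend(start, coords)

print("start energy:", round(toy_energy(start, coords), 3))
print("final energy:", round(toy_energy(final, coords), 3))
print("final charges:", [round(q, 2) for q in final])
```

In a real application the continuous charges obtained this way would still have to be mapped onto discrete mutations (for example, neutralizing or introducing an ionizable residue), and the resulting mutants re-evaluated with the full energy expression.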
DISCUSSION
This work attempted to develop and validate a method for predicting absolute folding free energies. The main idea behind our approach is the hypothesis that the folding free energy is mainly determined by electrostatic free energies. Thus, the working hypothesis is that fitting the relevant dielectric constants will give a highly predictive model. The specific strategy involved evaluation of the intrinsic pKa by the PDLD/S–LRA method with a given εp and evaluating charge–charge interactions with a dielectric εeff.
The overall best fitted dielectric constants were found to be in the range of 35–40 for both εp and εeff. However, a significant improvement was obtained by using a lower value of εp for internal groups and by reducing εeff for Lys groups.
As seen from Figure 9, the agreement between calculated and observed stability is reasonable, although slightly less impressive than that of Figure 7. However, in this case, we are not selecting any best fitted values of εp and εeff, and we have a robust way of predicting protein stability. In fact, it is quite possible that the present method can be further improved by, for example, reducing the dielectric constant for internal ion pairs, or by other physically based refinements.
The validity of our method can be best judged from its performance on the diverse test cases used. To the best of our knowledge, such a performance in the prediction of absolute folding energies has not been obtained by other approaches. Here, one can argue that the method is empirical and thus may reflect effects that are not related to the electrostatic free energy. We believe, however, that the method consistently captures electrostatic effects, and we discussed these issues in our previous work (Ref. 20). That is, using Figure 1 we rationalize the high dielectric constant even for internal groups by pointing out that going along the A → E path corresponds to the folding of a polyelectrolyte, where the energetics of buried ionized groups in the protein interior is reflected in the effective dielectric for charge–charge interactions. We also note that using a large dielectric constant is fully consistent with our previous concepts.29 Furthermore, we addressed the puzzling finding that εp for calculating pKa’s is small (when evaluated consistently while considering the local protein relaxation30) while εp for folding calculations is relatively large. This issue was explored in the specific case of ubiquitin, where it was concluded that εp in pKa calculations reflects only the relaxation in the D → E step (the protein is near its structure in the charged configuration). On the other hand, εp in folding calculations should implicitly reflect the compensating protein pre-organization in the C → E process (see Figure 5 in Ref. 20). Apparently, since this effect is not included explicitly, we have to use a large value of εp.
The present work involved a much more extensive benchmark than the one used in Ref. 20 and thus allowed us to refine our ideas about εp and εeff. We found that the idea of using large dielectrics is confirmed as far as surface groups are concerned. However, for internal groups, it appears that a compromise that involves εp ≤ 20 (which is still quite large) is required.
One may argue that the identification of our calculated folding free energy with electrostatic energy is not fully justified, since the effective dielectric reflects non-electrostatic contributions. However, this argument overlooks the logical definition of energy contributions in biophysics. That is, if a given effect scales according to the charges of the system, it can and should be defined as an electrostatic contribution. For example, the free energy change associated with turning a charge on or off (e.g., in redox and pKa processes) is rigorously defined as electrostatic free energy, despite the fact that the corresponding charging process is associated with a response of the system that reflects all the forces and interactions in the system. The same is true for the charging of an ion in water, which reflects the reorientation of the water molecules and even the water–water van der Waals interactions. On the other hand, if the folding energy were due to the size of different residues, rather than their charges, it would be logical to state that the folding energy reflects steric repulsion. Alternatively, if we were able to correlate the absolute folding free energy with the hydrophobicity of the residues, we would be able to say that the folding free energy is determined by the hydrophobic effect. The same is true with regard to configurational entropy effects. This does not mean that hydrophobic and entropic effects do not contribute to the folding free energy. However, the present study indicates that these contributions may cancel each other in some way. More specifically, it has long been argued that hydrophobic effects contribute in a major way to protein stability (e.g., Refs. 70, 71). While there is overwhelming evidence that such effects are a crucial part of the folding free energy, we are not aware of any clear correlation between the number or position of the hydrophobic residues and the observed absolute protein stability.
Thus, although the reason for the lack of correlation is unclear (it may be due to a compensation by configurational entropy), it seems to us that (until proved otherwise) we should accept the fact that the absolute protein stability is not correlated with the markers of hydrophobic free energy.
One may also claim that our effective dielectric reflects non-electrostatic contributions such as the hydrophobic contributions in the (B) → (D) step of our cycle. This argument is reasonable as far as the cycle (A) → (E) is concerned but less reasonable when Eq. (3) captures the trend in the observed folding energy. At any rate, as long as we cannot establish a correlation between hydrophobic markers and absolute stability, it would be hard to attribute the folding energy to such effects.
It must be stated here that the present work has not established that the correlation with the electrostatic free energy is perfect. Obvious examples are the effects of SS bonds in the proteins thioredoxin, ferredoxin, and glucanase C. Cases of interaction with a ligand were not explored carefully here by considering the relevant electrostatic contributions. Another interesting case is observed, for example, in the case of residues of staphylococcal nuclease,43 where the deviation between calculated and observed folding energy is most probably due to the hydrophobic contribution. However, the overall trend is clearly and overwhelmingly best correlated with the electrostatic free energy.
We would also like to mention that it is very likely that we will identify cases where the dielectric constant will not follow our idealized rules. For example, there could be cases where we will have to use a higher dielectric constant, when a large number of neighboring charged residues creates the equivalent of a high ionic strength. There could also be cases where the dielectric constant at close distances will have to be decreased (this may be handled by employing a distance dependent dielectric30). To explore and analyze such cases, we will have to focus on more comparative mutational studies, in particular between thermophile and mesophile enzymes. Clearly, further experiments and theoretical studies are needed. Fortunately, recent experimental studies provide a remarkable benchmark for such detailed studies.19 At any rate, regardless of the formal justification of our approach, its power can be exploited in predicting and analyzing protein stabilities.
As mentioned in the Results section, we have not found a correlation between ΔGfold and the size of the protein or other characteristic properties. We did observe, however, that regardless of the size or the fold of a protein, a single mutation that turns a charge on or off may have a large impact on stability. This is clearly shown in the case of the staphylococcal nuclease mutations,43 where a single mutation of a non-charged residue into lysine reduced the protein’s stability by the enormous amount of around 9 kcal/mole. Therefore, we consider the charges of a protein to be a more important factor in determining the overall stability than the size or the fold. Obviously, various protein sizes and folds may reflect the amount and type of certain ionizable residues within a protein, but in the end it is the charges of those ionizable residues that define the stability.
It may be useful to point out here that Gurd and coworkers72 invoked the idea that the stability of proteins could be evaluated by considering the ΔGij obtained with a dielectric constant of 40 or larger. However, this was an almost obvious conclusion for those who accepted the Tanford–Kirkwood model,73 which overlooked the self energy term and the destabilization expected from this term in non polar regions of the protein (see discussion in Warshel et al.74). The interesting finding that seems to emerge from the present and other works is that, despite the fact that charges should not be stable in non polar environments, the effective environment around charges in proteins behaves as a relatively polar environment.
It is significant, however, to note that Eq. (3) does not produce perfect results [and the same is true for Eq. (7)] when εp and εeff are equal. In some cases, we had to use a value that is significantly smaller than 40, and this reflects the effect of the self energy. Yet this effect is much smaller than what would be deduced from the assumption that the charged groups are in a partially non polar environment. In fact, as established in our previous work (Ref. 20) and the present work, the best fitted εp is much larger than the one that is required to obtain the observed pKa’s. The reasons for this have been discussed at great length in Ref. 20, and it is associated with the compensation in the (D) → (E) step in Figure 1.
Another interesting issue is the acid unfolding effect.75 Here, again, we are dealing with a full electrostatic effect, exactly as is the case in redox processes, since the unfolding upon charging of ionizable residues reflects exactly the free energy of the charging process, which always involves structural rearrangements. In other words, it does not matter that the unfolding process is opposed by hydrophobic forces; the overall energetics is rigorously the charging free energy and thus electrostatic free energy. This is very different from the (B) → (D) free energy, which is a non-electrostatic energy as it does not involve any change in charge. At any rate, predicting the energetics of acid unfolding is clearly a task that should be accomplished by our formulation.
The present approach focuses on absolute folding energy and is likely to work well on the less challenging problem of exploring mutational effects. The limited test cases of mutational effects examined in the present work provide encouraging results, but further studies are needed. Here, we have a very extensive set of relevant experiments (e.g., Refs. 43–46, 58).
ACKNOWLEDGMENTS
The authors thank USC’s High Performance Computing and Communication Center (HPCC) for computer time.
Grant sponsor: NIH; Grant number: GM24492
REFERENCES
- 1.Jaenicke R. Protein stability and molecular adaptation to extreme conditions. Eur J Biochem. 1991;202:715–728. doi: 10.1111/j.1432-1033.1991.tb16426.x. [DOI] [PubMed] [Google Scholar]
- 2.Luke KA, Higgins CL, Wittung-Stafshede P. Thermodynamic stability and folding of proteins from hyperthermophilic organisms. FEBS J. 2007;274:4023–4033. doi: 10.1111/j.1742-4658.2007.05955.x. [DOI] [PubMed] [Google Scholar]
- 3.Razvi A, Scholtz JM. Lessons in stability from thermophilic proteins. Protein Sci. 2006;15:1569–1578. doi: 10.1110/ps.062130306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Clark NS, Dodd I, Mossakowska DE, Smith RA, Gore MG. Folding and conformational studies on SCR1-3 domains of human complement receptor 1. Protein Eng. 1996;9:877–884. doi: 10.1093/protein/9.10.877. [DOI] [PubMed] [Google Scholar]
- 5.Dill KA. Dominant forces in protein folding. Biochemistry. 1990;29:7133–7155. doi: 10.1021/bi00483a001. [DOI] [PubMed] [Google Scholar]
- 6.Fan ZZ, Hwang JK, Warshel A. Using simplified protein representation as a reference potential for all-atom calculations of folding free energy. Theor Chem Acc. 1999;103:77–80. [Google Scholar]
- 7.Go N. Theoretical-studies of protein folding. Annu Rev Biophys Bioeng. 1983;12:183–210. doi: 10.1146/annurev.bb.12.060183.001151. [DOI] [PubMed] [Google Scholar]
- 8.Karanicolas J, Brooks CL. The structural basis for biphasic kinetics in the folding of the WW domain from a formin-binding protein: Lessons for protein design? Proc Natl Acad Sci USA. 2003;100:3954–3959. doi: 10.1073/pnas.0731771100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Khalili M, Liwo A, Scheraga HA. Kinetic studies of folding of the B-domain of staphylococcal protein A with molecular dynamics and a united-residue (UNRES) model of polypeptide chains. J Mol Biol. 2006;355:536–547. doi: 10.1016/j.jmb.2005.10.056. [DOI] [PubMed] [Google Scholar]
- 10.Levitt M, Warshel A. Computer-simulation of protein folding. Nature. 1975;253:694–698. doi: 10.1038/253694a0. [DOI] [PubMed] [Google Scholar]
- 11.Onuchic JN, Wolynes PG, Lutheyschulten Z, Socci ND. Toward an outline of the topography of a realistic protein-folding funnel. Proc Natl Acad Sci USA. 1995;92:3626–3630. doi: 10.1073/pnas.92.8.3626. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Snow CD, Nguyen N, Pande VS, Gruebele M. Absolute comparison of simulated and experimental protein-folding dynamics. Nature. 2002;420:102–106. doi: 10.1038/nature01160. [DOI] [PubMed] [Google Scholar]
- 13.Hendsch ZS, Tidor B. Do salt bridges stabilize proteins—a continuum electrostatic analysis. Protein Sci. 1994;3:211–226. doi: 10.1002/pro.5560030206. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Honig B, Yang AS. Free-energy balance in protein-folding. Adv Protein Chem. 1995;46:27–58. doi: 10.1016/s0065-3233(08)60331-9. [DOI] [PubMed] [Google Scholar]
- 15.Schwehm JM, Fitch CA, Dang BN, Garcia-Moreno EB, Stites WE. Changes in stability upon charge reversal and neutralization substitution in staphylococcal nuclease are dominated by favorable electrostatic effects. Biochemistry. 2003;42:1118–1128. doi: 10.1021/bi0266434. [DOI] [PubMed] [Google Scholar]
- 16.Whitten ST, Garcia-Moreno EB. pH dependence of stability of staphylococcal nuclease: evidence of substantial electrostatic interactions in the denatured state. Biochemistry. 2000;39:14292–14304. doi: 10.1021/bi001015c. [DOI] [PubMed] [Google Scholar]
- 17.Giletto A, Pace CN. Buried, charged, non-ion-paired aspartic acid 76 contributes favorably to the conformational stability of ribonuclease T-1. Biochemistry. 1999;38:13379–13384. doi: 10.1021/bi991422s. [DOI] [PubMed] [Google Scholar]
- 18.Xiao L, Honig B. Electrostatic contributions to the stability of hyperthermophilic proteins. J Mol Biol. 1999;289:1435–1444. doi: 10.1006/jmbi.1999.2810. [DOI] [PubMed] [Google Scholar]
- 19.Isom DG, Cannon BR, Castaneda CA, Robinson A, Garcia-Moreno EB. High tolerance for ionizable residues in the hydrophobic interior of proteins. Proc Natl Acad Sci USA. 2008;105:17784–17788. doi: 10.1073/pnas.0805113105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Roca M, Messer B, Warshel A. Electrostatic contributions to protein stability and folding energy. FEBS Lett. 2007;581:2065–2071. doi: 10.1016/j.febslet.2007.04.025. [DOI] [PubMed] [Google Scholar]
- 21.Strickler SS, Gribenko AV, Gribenko AV, Keiffer TR, Tomlinson J, Reihle T, Loladze VV, Makhatadze GI. Protein stability and surface electrostatics: a charged relationship. Biochemistry. 2006;45:2761–2766. doi: 10.1021/bi0600143. [DOI] [PubMed] [Google Scholar]
- 22.Gribenko AV, Makhatadze GI. Role of the charge-charge interactions in defining stability and halophilicity of the CspB proteins. J Mol Biol. 2007;366:842–856. doi: 10.1016/j.jmb.2006.11.061. [DOI] [PubMed] [Google Scholar]
- 23.Schweiker KL, Zarrine-Afsar A, Davidson AR, Makhatadze GI. Computational design of the Fyn SH3 domain with increased stability through optimization of surface charge charge interactions. Protein Sci. 2007;16:2694–2702. doi: 10.1110/ps.073091607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Baran KL, Chimenti MS, Schlessman JL, Fitch CA, Herbst KJ, Garcia-Moreno BE. Electrostatic effects in a network of polar and ionizable groups in staphylococcal nuclease. J Mol Biol. 2008;379:1045–1062. doi: 10.1016/j.jmb.2008.04.021. [DOI] [PubMed] [Google Scholar]
- 25.Lee KK, Fitch CA, Garcia-Moreno EB. Distance dependence and salt sensitivity of pairwise, coulombic interactions in a protein. Protein Sci. 2002;11:1004–1016. doi: 10.1110/ps.4700102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Pace CN. Single surface stabilizer. Nat Struct Biol. 2000;7:345–346. doi: 10.1038/75100. [DOI] [PubMed] [Google Scholar]
- 27.Trefethen JM, Pace CN, Scholtz JM, Brems DN. Charge-charge interactions in the denatured state influence the folding kinetics of ribonuclease Sa. Protein Sci. 2005;14:1934–1938. doi: 10.1110/ps.051401905. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Trevino SR, Gokulan K, Newsom S, Thurlkill RL, Shaw KL, Mitkevich VA, Makarov AA, Sacchettini JC, Scholtz JM, Pace CN. Asp79 makes a large, unfavorable contribution to the stability of RNase Sa. J Mol Biol. 2005;354:967–978. doi: 10.1016/j.jmb.2005.09.091. [DOI] [PubMed] [Google Scholar]
- 29.Warshel A, Sharma PK, Kato M, Parson WW. Modeling electrostatic effects in proteins. Biochim Biophys Acta. 2006;1764:1647–1676. doi: 10.1016/j.bbapap.2006.08.007. [DOI] [PubMed] [Google Scholar]
- 30.Schutz CN, Warshel A. What are the dielectric ‘constants’ of proteins and how to validate electrostatic models. Proteins. 2001;44:400–417. doi: 10.1002/prot.1106. [DOI] [PubMed] [Google Scholar]
- 31.Sham YY, Muegge I, Warshel A. The effect of protein relaxation on charge-charge interactions and dielectric constants of proteins. Biophys J. 1998;74:1744–1753. doi: 10.1016/S0006-3495(98)77885-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Sham YY, Chu ZT, Warshel A. Consistent calculations of pKa’s of ionizable residues in proteins: semi-microscopic and microscopic approaches. J Phys Chem B. 1997;101:4458–4472. [Google Scholar]
- 33.Shan B, Bhattacharya S, Eliezer D, Raleigh DP. The low-pH unfolded state of the C-terminal domain of the ribosomal protein L9 contains significant secondary structure in the absence of denaturant but is no more compact than the low-pH urea unfolded state. Biochemistry. 2008;47:9565–9573. doi: 10.1021/bi8006862. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Wong KB, Lee CF, Chan SH, Leung TY, Chen YW, Bycroft M. Solution structure and thermal stability of ribosomal protein L30e from hyperthermophilic archaeon Thermococcus celer. Protein Sci. 2003;12:1483–1495. doi: 10.1110/ps.0302303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.King G, Warshel A. A surface constrained all-atom solvent model for effective simulations of polar solutions. J Chem Phys. 1989;91:3647–3661. [Google Scholar]
- 36.Lee FS, Chu ZT, Warshel A. Microscopic and semimicroscopic calculations of electrostatic energies in proteins by the POLARIS and ENZYMIX programs. J Comp Chem. 1993;14:161–185. [Google Scholar]
- 37.Kumar S, Tsai CJ, Nussinov R. Maximal stabilities of reversible two-state proteins. Biochemistry. 2002;41:5359–5374. doi: 10.1021/bi012154c. [DOI] [PubMed] [Google Scholar]
- 38.Feng YQ, Sligar SG. Effect of heme binding on the structure and stability of Escherichia coli apocytochrome b562. Biochemistry. 1991;30:10150–10155. doi: 10.1021/bi00106a011. [DOI] [PubMed] [Google Scholar]
- 39.Bowie JU, Sauer RT. Equilibrium dissociation and unfolding of the Arc repressor dimer. Biochemistry. 1989;28:7139–7143. doi: 10.1021/bi00444a001. [DOI] [PubMed] [Google Scholar]
- 40.Hollien J, Marqusee S. A thermodynamic comparison of mesophilic and thermophilic ribonucleases H. Biochemistry. 1999;38:3831–3836. doi: 10.1021/bi982684h. [DOI] [PubMed] [Google Scholar]
- 41.Li WT, Grayling RA, Sandman K, Edmondson S, Shriver JW, Reeve JN. Thermodynamic stability of archaeal histones. Biochemistry. 1998;37:10563–10572. doi: 10.1021/bi973006i. [DOI] [PubMed] [Google Scholar]
- 42.Strop P, Mayo SL. Contribution of surface salt bridges to protein stability. Biochemistry. 2000;39:1251–1255. doi: 10.1021/bi992257j. [DOI] [PubMed] [Google Scholar]
- 43.Nguyen DM, Leila Reynald R, Gittis AG, Lattman EE. X-ray and thermodynamic studies of staphylococcal nuclease variants I92E and I92K: insights into polarity of the protein interior. J Mol Biol. 2004;341:565–574. doi: 10.1016/j.jmb.2004.05.066. [DOI] [PubMed] [Google Scholar]
- 44.Jacob M, Holtermann G, Perl D, Reinstein J, Schindler T, Geeves MA, Schmid FX. Microsecond folding of the cold shock protein measured by a pressure-jump technique. Biochemistry. 1999;38:2882–2891. doi: 10.1021/bi982487i. [DOI] [PubMed] [Google Scholar]
- 45.Mueller U, Perl D, Schmid FX, Heinemann U. Thermal stability and atomic-resolution crystal structure of the Bacillus caldolyticus cold shock protein. J Mol Biol. 2000;297:975–988. doi: 10.1006/jmbi.2000.3602. [DOI] [PubMed] [Google Scholar]
- 46.Went HM, Jackson SE. Ubiquitin folds through a highly polarized transition state. Protein Eng Des Sel. 2005;18:229–237. doi: 10.1093/protein/gzi025. [DOI] [PubMed] [Google Scholar]
- 47.Khorasanizadeh S, Peters ID, Butt TR, Roder H. Folding and stability of a tryptophan-containing mutant of ubiquitin. Biochemistry. 1993;32:7054–7063. doi: 10.1021/bi00078a034. [DOI] [PubMed] [Google Scholar]
- 48.Dams T, Jaenicke R. Stability and folding of dihydrofolate reductase from the hyperthermophilic bacterium Thermotoga maritima. Biochemistry. 1999;38:9169–9178. doi: 10.1021/bi990635e. [DOI] [PubMed] [Google Scholar]
- 49.Mukaiyama A, Takano K, Haruki M, Morikawa M, Kanaya S. Kinetically robust monomeric protein from a hyperthermophile. Biochemistry. 2004;43:13859–13866. doi: 10.1021/bi0487645. [DOI] [PubMed] [Google Scholar]
- 50.Creagh AL, Koska J, Johnson PE, Tomme P, Joshi MD, McIntosh LP, Kilburn DG, Haynes CA. Stability and oligosaccharide binding of the N1 cellulose-binding domain of Cellulomonas fimi endoglucanase CenC. Biochemistry. 1998;37:3529–3537. doi: 10.1021/bi971983o. [DOI] [PubMed] [Google Scholar]
- 51.Janssen MJ, van de Wiel WA, Beiboer SH, van Kampen MD, Verheij HM, Slotboom AJ, Egmond MR. Catalytic role of the active site histidine of porcine pancreatic phospholipase A2 probed by the variants H48Q, H48N and H48K. Protein Eng. 1999;12:497–503. doi: 10.1093/protein/12.6.497. [DOI] [PubMed] [Google Scholar]
- 52.Jackson SE, Moracci M, elMasry N, Johnson CM, Fersht AR. Effect of cavity-creating mutations in the hydrophobic core of chymotrypsin inhibitor 2. Biochemistry. 1993;32:11259–11269. doi: 10.1021/bi00093a001. [DOI] [PubMed] [Google Scholar]
- 53.Chrunyk BA, Evans J, Lillquist J, Young P, Wetzel R. Inclusion body formation and protein stability in sequence variants of interleukin-1 beta. J Biol Chem. 1993;268:18053–18061. [PubMed] [Google Scholar]
- 54.Zhou Z, Feng H, Bai Y. Detection of a hidden folding intermediate in the focal adhesion target domain: Implications for its function and folding. Proteins. 2006;65:259–265. doi: 10.1002/prot.21107. [DOI] [PubMed] [Google Scholar]
- 55.Cowley AB, Altuve A, Kuchment O, Terzyan S, Zhang X, Rivera M, Benson DR. Toward engineering the stability and hemin-binding properties of microsomal cytochromes b5 into rat outer mitochondrial membrane cytochrome b5: examining the influence of residues 25 and 71. Biochemistry. 2002;41:11566–11581. doi: 10.1021/bi026005l. [DOI] [PubMed] [Google Scholar]
- 56.Sandberg WS, Terwilliger TC. Engineering multiple properties of a protein by combinatorial mutagenesis. Proc Natl Acad Sci USA. 1993;90:8367–8371. doi: 10.1073/pnas.90.18.8367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Hornby JA, Luo JK, Stevens JM, Wallace LA, Kaplan W, Armstrong RN, Dirr HW. Equilibrium folding of dimeric class mu glutathione transferases involves a stable monomeric intermediate. Biochemistry. 2000;39:12336–12344. doi: 10.1021/bi000176d. [DOI] [PubMed] [Google Scholar]
- 58.Mach H, Ryan JA, Burke CJ, Volkin DB, Middaugh CR. Partially structured self-associating states of acidic fibroblast growth factor. Biochemistry. 1993;32:7703–7711. doi: 10.1021/bi00081a015. [DOI] [PubMed] [Google Scholar]
- 59.Mainfroid V, Mande SC, Hol WG, Martial JA, Goraj K. Stabilization of human triosephosphate isomerase by improvement of the stability of individual alpha-helices in dimeric as well as monomeric forms of the protein. Biochemistry. 1996;35:4110–4117. doi: 10.1021/bi952692n. [DOI] [PubMed] [Google Scholar]
- 60.Chen L, Cabrita GJ, Otzen DE, Melo EP. Stabilization of the ribosomal protein S6 by trehalose is counterbalanced by the formation of a putative off-pathway species. J Mol Biol. 2005;351:402–416. doi: 10.1016/j.jmb.2005.05.056. [DOI] [PubMed] [Google Scholar]
- 61.Cutler RL, Davies AM, Creighton S, Warshel A, Moore GR, Smith M, Mauk AG. Role of arginine-38 in regulation of the cytochrome c oxidation-reduction equilibrium. Biochemistry. 1989;28:3188–3197. doi: 10.1021/bi00434a012. [DOI] [PubMed] [Google Scholar]
- 62.Copeland RA, Ji H, Halfpenny AJ, Williams RW, Thompson KC, Herber WK, Thomas KA, Bruner MW, Ryan JA, Marquis-Omer D, Sanyal G, Sitrin RD, Yamazaki S, Middaugh CR. The structure of human acidic fibroblast growth factor and its interaction with heparin. Arch Biochem Biophys. 1991;289:53–61. doi: 10.1016/0003-9861(91)90441-k. [DOI] [PubMed] [Google Scholar]
- 63.Gospodarowicz D, Cheng J. Heparin protects basic and acidic FGF from inactivation. J Cell Physiol. 1986;128:475–484. doi: 10.1002/jcp.1041280317. [DOI] [PubMed] [Google Scholar]
- 64.Katti SK, LeMaster DM, Eklund H. Crystal structure of thioredoxin from Escherichia coli at 1.68 A resolution. J Mol Biol. 1990;212:167–184. doi: 10.1016/0022-2836(90)90313-B. [DOI] [PubMed] [Google Scholar]
- 65. Macedo-Ribeiro S, Darimont B, Sterner R, Huber R. Small structural changes account for the high thermostability of 1[4Fe-4S] ferredoxin from the hyperthermophilic bacterium Thermotoga maritima. Structure. 1996;4:1291–1301. doi: 10.1016/s0969-2126(96)00137-2.
- 66. Matsumura M, Matthews BW. Stabilization of functional proteins by introduction of multiple disulfide bonds. Methods Enzymol. 1991;202:336–356. doi: 10.1016/0076-6879(91)02018-5.
- 67. Zavodszky P, Kardos J, Svingor A, Petsko GA. Adjustment of conformational flexibility is a key event in the thermal adaptation of proteins. Proc Natl Acad Sci USA. 1998;95:7406–7411. doi: 10.1073/pnas.95.13.7406.
- 68. Reetz MT, Carballeira JD, Vogel A. Iterative saturation mutagenesis on the basis of B factors as a strategy for increasing protein thermostability. Angew Chem Int Ed Engl. 2006;45:7745–7751. doi: 10.1002/anie.200602795.
- 69. Jaeger KE, Dijkstra BW, Reetz MT. Bacterial biocatalysts: molecular biology, three-dimensional structures, and biotechnological applications of lipases. Annu Rev Microbiol. 1999;53:315–351. doi: 10.1146/annurev.micro.53.1.315.
- 70. Privalov PL. Stability of proteins: small globular proteins. Adv Protein Chem. 1979;33:167–241. doi: 10.1016/s0065-3233(08)60460-x.
- 71. Creighton TE. Protein folding. Biochem J. 1990;270:1–16. doi: 10.1042/bj2700001.
- 72. Matthew JB, Gurd FR, Garcia-Moreno B, Flanagan MA, March KL, Shire SJ. pH-dependent processes in proteins. CRC Crit Rev Biochem. 1985;18:91–197. doi: 10.3109/10409238509085133.
- 73. Tanford C, Kirkwood JG. Theory of protein titration curves. I. General equations for impenetrable spheres. J Am Chem Soc. 1957;79:5333.
- 74. Warshel A, Russell ST, Churg AK. Macroscopic models for studies of electrostatic interactions in proteins: limitations and applicability. Proc Natl Acad Sci USA. 1984;81:4785–4789. doi: 10.1073/pnas.81.15.4785.
- 75. Perutz MF. Electrostatic effects in proteins. Science. 1978;201:1187–1191. doi: 10.1126/science.694508.









