Abstract
We attempt to understand the origin of enhanced stability in thermophilic proteins by analyzing thermodynamic data for 116 proteins, the largest data set achieved to date. We compute changes in entropy and enthalpy at the convergence temperature where different driving forces are maximally decoupled, in contrast to the majority of previous studies that were performed at the melting temperature. We find, on average, that the gain in enthalpy upon folding is lower in thermophiles than in mesophiles, whereas the loss in entropy upon folding is higher in mesophiles than in thermophiles. This implies that entropic stabilization may be responsible for the high melting temperature, and hints at residual structure or compactness of the denatured state in thermophiles. We find a similar trend by analyzing a homologous set of proteins classified based only on the optimum growth temperature of the organisms from which they were extracted. We find that the folding free energy at the temperature of maximal stability is significantly more favorable in thermophiles than in mesophiles, whereas the maximal stability temperature itself is similar between these two classes. Furthermore, we extend the thermodynamic analysis to model the entire proteome. The results explain the high optimal growth temperature in thermophilic organisms and are in excellent quantitative agreement with full thermal growth rate data obtained in a dozen thermophilic and mesophilic organisms.
Introduction
Thermophilic proteins denature at a much higher temperature than regular mesophilic proteins. Understanding the origin of this enhanced thermostability in such proteins has become a fundamental goal in the field of protein biochemistry. Studying different mechanisms by which proteins increase or decrease stability can teach us the fundamentals of protein thermodynamics and help us design new enzymes with desired stability. The vast majority of biophysical studies have been directed toward understanding the origin of enhanced stability in proteins under conditions of high temperature (1–6), and only a few studies have investigated acidophilic and halophilic (7) enzymes as well. The unusual tolerance to high temperature raises several interesting questions: What is responsible for the stability of thermophilic proteins? Is it the significant alteration of the average enthalpy, entropy, or specific heat, or a combination of all these? Is there any systematic principle that proteins may utilize to withstand such high temperatures? Does a high melting temperature, Tm, also imply a high maximal stability free energy?
Various researchers have tried to address these questions by directly comparing different homologs of proteins extracted from mesophilic and thermophilic organisms (1,2,8–14). However, it is not clear which of the proposed mechanisms proteins adopt, or whether proteins adopt different mechanisms simultaneously to different extents. It has been widely demonstrated that a reduced specific heat change (ΔCp) leads to increased stability (11,13–15) by broadening the melting curve while keeping the location and magnitude of the maximum of the stability curve unchanged. A direct comparison between several thermophilic and mesophilic homologs supports this observation (11). However, a significant number of studies suggested otherwise by demonstrating little dependence between Tm and the ΔCp of unfolding (9,16). Furthermore, mounting evidence indicates a possible connection between the denatured state and enhanced stability (13,15,17–20). Several careful analyses have shown that denatured states are more compact in thermophiles than in mesophiles and may retain residual structure (13,18), indicating the role of entropy in determining stability. Another possibility is that specific amino acid substitutions lead to reduced entropy in the unfolded state due to different degrees of flexibility associated with them (17,21). The effect of entropy on increased stability may also arise from different degrees of compactness in the native structure as a result of different mutations (22). Enhanced stability has also been attributed to electrostatics (23–30). Genome- and proteome-wide analyses have been carried out to elucidate the origin of stability (5,25,31–36).
These different and often contradictory studies make it very hard to identify the principle behind increased stability. Conclusions are very specific to proteins, and to date no systematic study (8) has employed a large data set of different protein families to explain increased stability. Sometimes the conclusions conflict depending on the list of proteins studied. We believe this is mainly due to the lack of 1) a systematic analysis of a large data set (as previous studies were mostly restricted to smaller sets of proteins); and 2) proper decoupling of different driving forces. The latter point is related to the fact that enthalpy and entropy changes are significantly temperature-dependent. Due to the nonzero ΔCp arising from the hydrophobic effect, both enthalpic and entropic changes are temperature-dependent in the following manner (37,38):
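$$\Delta H(T) = \Delta H(T_h) + \Delta C_p\,(T - T_h), \qquad \Delta S(T) = \Delta S(T_s) + \Delta C_p \ln\!\left(\frac{T}{T_s}\right) \tag{1}$$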
where Th and Ts are two reference temperatures. This also raises the natural question: At what temperature should one compute these quantities for comparison? From Eq. 1 it is evident that both enthalpy and entropy changes have different hydrophobic contributions (due to nonzero ΔCp) at different temperatures. For example, the total change in entropy has a contribution due to the configurational entropy of the protein chain as well as the mixing entropy of amino acids. This makes it difficult to isolate and study the role of different driving forces separately.
To significantly decouple the hydrophobic effect from conformational entropy and purely enthalpic contributions (polar and van der Waals forces), one should compute ΔH and ΔS at temperatures where these effects are minimal. Extensive studies (37,39–42) showed the existence of a temperature at which enthalpy and entropy (per residue) for many different proteins converge. This temperature, known as the convergence temperature, is now believed to be the temperature at which hydrophobic effects are zero and the contribution to enthalpy is purely due to van der Waals or polar (hydrogen bond) interactions. Similarly, at the convergence temperature, entropy is primarily conformational in origin, because the hydrophobic contribution is minimal. Hydrocarbon-transfer experiments by Baldwin demonstrated the existence of a similar convergence temperature at which the transfer entropy is zero and very close to the protein convergence temperature (41). This finding strongly suggests that at this temperature, the protein chain conformational entropy is significantly decoupled from the solvent entropy. Extensive work by Robertson and Murphy (37) provided the most recent and reliable estimate of these convergence temperatures based on the largest set of proteins: Ts = 385 K for entropy, and Th = 373.5 K for enthalpy. Along with the original work by Robertson and Murphy (37), in a previous study (38) we showed that ΔH and ΔS can be very well approximated as linear functions of chain length N when computed at 373.5 K and 385 K, respectively. In fact, the correlation coefficient between the changes in enthalpy and entropy versus chain length is highest at these two temperatures (37). Based on all of these findings, and several other studies (43,44), it is clear that at these temperatures, sequence effects are minimal and the major contribution to enthalpy and entropy is due to the polymeric nature of the protein alone (37,45).
Thus, the slope and intercept of the linear dependence of these properties on protein chain length give us an average estimate of changes in enthalpy, conformational entropy, and specific heat of a protein upon folding purely based on the chain length N. This defines an ideal thermal protein and can serve as a first estimate for the folding free energy when we do not have any information about the protein other than its chain length (38).
In this work, we compute and compare changes in entropy and enthalpy of thermophilic and mesophilic proteins at Ts and Th, respectively, for two reasons: 1) at these convergence temperatures, the hydrophobic effect can be separated from enthalpy and conformational entropy, and thus different driving forces are maximally decoupled; and 2) it provides a common reference temperature to compare different proteins. This has not been explored before and is in striking contrast to the common practice of computing thermodynamic properties at the Tm for comparison. At Tm, however, it is not guaranteed that the hydrophobic effect is separate from the conformational entropy.
Below, we outline how we carry out the analysis to derive ideal-thermal-protein parameter values from the largest protein set achieved to date, almost doubling the previous largest set (37). We then proceed to further divide this set into two classes: one based on Tm-values, and a smaller set of homologous proteins based on the optimal growth temperature of the organism from which proteins were selected. From our classification scheme, we find new ideal-thermal-protein parameter values for thermophilic (high Tm) and mesophilic (low Tm) proteins. We carry out a statistical analysis on the distribution of entropy, enthalpy, and specific heat changes (per amino acid) between these two classes of proteins. The results reveal a new, to our knowledge, thermodynamic principle that proteins on average may adopt to withstand high temperature. In general, we find that lower entropic loss upon folding may be responsible for enhanced stability in thermophilic proteins. Finally, we show how, based on these new parameters, we can model the entire proteome of different organisms and compare the results with thermal growth rate data.
Materials and Methods
Model
We carried out a thermodynamic analysis of proteins based on a significantly large data set (116 proteins), almost doubling the number of proteins (n = 63) from the earlier work of Robertson and Murphy (37). In this analysis we compute the differences in entropy (ΔS), enthalpy (ΔH), and specific heat (ΔCp) between two states (denatured and native). Thus, ΔS is defined as ΔS = Su − Sf, where Su is the entropy of the unfolded state and Sf is the entropy of the folded state. This definition is used throughout for enthalpy, specific heat, and free-energy change as well.
As outlined above, it is most instructive to compute ΔH at 373.5 K and ΔS at 385 K to maximally decouple the effects of sequence and other driving forces. Furthermore, at these two special temperatures, Th = 373.5 K and Ts = 385 K, the changes in enthalpy and entropy show a strong linear chain length dependence (37). One can compute these quantities from enthalpy and entropy values reported at Tm using the following equation:
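$$\Delta H(T_h) = \Delta H(T_m) + \Delta C_p\,(T_h - T_m), \qquad \Delta S(T_s) = \Delta S(T_m) + \Delta C_p \ln\!\left(\frac{T_s}{T_m}\right) \tag{2}$$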
We assume that specific heat is independent of temperature (37,39). We removed all of the proteins that were either multimeric or non-two-state folders from the original list of the Robertson and Murphy (37) analysis. We added several new two-state folders for which we could find changes in enthalpy, entropy, and specific heat. Furthermore, our search was limited to only monomeric proteins under conditions where reversible transition was observed and closer to the isoelectric point to further decouple the electrostatic contributions. This resulted in a total of 116 strictly monomeric proteins, which are listed in Table S1 of the Supporting Material along with their sources. Our analysis also extends the range of applicability by including proteins with a longer chain length and a wider range of Tm-values than originally considered in the Robertson and Murphy (37) analysis.
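As a minimal sketch of the conversion from Tm to the convergence temperatures, the following Python function shifts unfolding data under the constant-ΔCp assumption stated above. The function and argument names and the example numbers are illustrative and are not taken from the data set. The sketch additionally assumes a two-state transition, so that ΔS(Tm) = ΔH(Tm)/Tm, and uses kJ/mol and kJ/(mol K) throughout.

```python
import math

T_H = 373.5  # K, enthalpy convergence temperature quoted in the text
T_S = 385.0  # K, entropy convergence temperature quoted in the text


def to_convergence(dH_tm, tm, dCp):
    """Shift unfolding dH and dS from Tm to T_H and T_S, assuming constant dCp.

    dH_tm : enthalpy change at Tm, kJ/mol
    tm    : melting temperature, K
    dCp   : specific heat change, kJ/(mol K)
    Assumption: two-state folding, so dG(Tm) = 0 and dS(Tm) = dH(Tm) / Tm.
    """
    dS_tm = dH_tm / tm
    dH_conv = dH_tm + dCp * (T_H - tm)
    dS_conv = dS_tm + dCp * math.log(T_S / tm)
    return dH_conv, dS_conv


# Hypothetical protein: dH(Tm) = 300 kJ/mol, Tm = 330 K, dCp = 5 kJ/(mol K)
dH_conv, dS_conv = to_convergence(300.0, 330.0, 5.0)
print(dH_conv, dS_conv)
```

Dividing the two returned values by the chain length N gives the per-residue quantities compared later in the text.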
However, because this set does not distinguish moderate-Tm (mesophilic) and high-Tm (thermophilic) proteins, which may have different thermodynamic properties, we further divided the set of 116 proteins into two sets. We achieved this by defining a cutoff Tm (Tc). For this classification, we revisited the analysis described above. We determined the optimal cutoff temperature Tc by minimizing the least-squares error of fitting the chain-length-dependent linear equation for all three thermodynamic quantities (ΔH, ΔS, and ΔCp) separately when proteins were subdivided into two classes: 1) proteins with Tm > Tc; and 2) proteins with Tm < Tc. This method yielded a choice of Tc for which the least-squares error was minimized for enthalpy, entropy, and specific heat change independently. Thus, we take Tc as the Tm below which we identify proteins as mesophilic, and above which they are termed thermophilic for this analysis. We also note that the least-squares fitting error of these quantities with chain length was significantly reduced when proteins were subdivided into two families compared with the undivided set.
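The cutoff search can be sketched as follows for a single thermodynamic quantity. This is only an illustration of the idea, assuming an ordinary least-squares line per class and a user-supplied grid of candidate cutoffs. The function names, the `min_size` guard, and the tuple layout are hypothetical choices rather than the implementation used for the analysis, and the synthetic data are invented for the example.

```python
def linfit_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys).

    Assumes the xs are not all identical (otherwise the slope is undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def best_cutoff(proteins, candidates, min_size=3):
    """Return (Tc, total squared error) minimizing the two-class fitting error.

    proteins   : list of (chain_length, tm, value) tuples
    candidates : iterable of trial cutoff temperatures
    min_size   : smallest class size allowed (hypothetical safeguard)
    """
    best = None
    for tc in candidates:
        low = [(n, v) for n, tm, v in proteins if tm < tc]
        high = [(n, v) for n, tm, v in proteins if tm >= tc]
        if len(low) < min_size or len(high) < min_size:
            continue
        sse = linfit_sse(*zip(*low)) + linfit_sse(*zip(*high))
        if best is None or sse < best[1]:
            best = (tc, sse)
    return best


# Synthetic data: low-Tm proteins follow value = 2N, high-Tm proteins value = N
lengths = [50, 100, 150, 200]
data = [(n, tm, 2.0 * n) for n, tm in zip(lengths, [300, 305, 310, 315])]
data += [(n, tm, 1.0 * n) for n, tm in zip(lengths, [350, 355, 360, 365])]
print(best_cutoff(data, [308, 330, 358], min_size=2))
```

Running the same search once per quantity (enthalpy, entropy, specific heat change) and checking that the selected cutoffs agree would mirror the independent minimization described above.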
Results
We found that the linear correlation of the overall set was slightly lower than that reported by Robertson and Murphy (37), and the average thermodynamic parameters changed only slightly. Fig. S1 shows the results of this analysis. The slopes and intercepts of these different thermodynamic quantities against chain length determine the properties of an ideal thermal protein (38).
Ideal mesophilic and thermophilic proteins have different thermodynamic properties
It is likely that proteins with higher Tm evolved with a different set of thermodynamic rules than their mesophilic counterparts. Thus, it is natural to think that thermophiles and mesophiles would have significantly different ideal-thermal-protein parameter values. Several indirect experimental results support this. For example, by comparing thermodynamic properties between thermophilic and mesophilic homologs based on native-state hydrogen exchange, Hollien and Marqusee (46) demonstrated that the increased stability is not the result of a localized effect, but is distributed throughout. A proportional increase in stability for all residues results in an overall enhanced stability, indicating the possible role played by simple properties (e.g., chain length) in stability determination, and highlighting the importance of studying average parameters. Prompted by this, we subdivided our master set into two classes as described in Materials and Methods. Based on the analysis outlined above, we find a good correlation between thermodynamic parameters and chain length for mesophilic proteins (see Fig. S2). These 59 proteins, out of a total of 116, have Tm-values below Tc. Thus, based on this analysis of a modified data set of proteins, we find the new parameters for an ideal mesophilic protein to be
(3) |
We classified the remaining 57 proteins with Tm ≥ Tc as thermophilic proteins and carried out a similar analysis on this set (see Fig. S3). Once again, we find a good correlation between the thermodynamic parameters and the protein chain lengths for this thermophilic protein set. The new thermodynamic parameters thus obtained define an ideal thermophilic protein as
(4) |
In the absence of any information other than the chain length, one can estimate the thermophilic and mesophilic protein thermodynamic parameters based on Eqs. 3 and 4.
Specific enthalpy and entropy changes at the convergence temperature are lower in thermophiles than in mesophiles on average
Based on the slopes and intercepts reported above, it is clear that changes in entropy and enthalpy upon folding are lower in thermophiles than in mesophiles. Here, we use a slightly different and more rigorous approach to verify this finding. We compute changes in thermodynamic parameters per amino acid for the mesophilic and thermophilic sets, and construct the distributions of ΔH/N, ΔS/N, and ΔCp/N. We compare the means of these quantities between mesophiles and thermophiles. Our results are summarized in Table 1. From the numbers reported in the table, it is clear that thermophilic proteins on average have a lower value of enthalpic, entropic, and specific heat change per residue. We performed a two-sample t-test on these distributions, and the results indicate that thermophiles have a lower change in entropy per amino acid than mesophiles, with a p-value of 0.00002. We also find that changes in enthalpy per amino acid are lower for thermophiles than for mesophiles, with a p-value of 0.001, whereas for specific heat the p-value is 0.008.
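A comparison of two means of this kind can be sketched with the Python standard library alone. The text does not say which variant of the two-sample t-test was used, so the example below assumes Welch's unequal-variance form, and it replaces the exact t-distribution with a normal approximation for the p-value, which is reasonable only for samples of roughly the size considered here (59 and 57). The function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (large samples only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical per-residue entropy changes, J/(mol K)
thermo_sample = [13.5, 14.8, 14.0, 13.9, 14.6, 14.2]
meso_sample = [16.5, 17.4, 17.1, 16.8, 17.3, 16.9]
t_stat, dof = welch_t(thermo_sample, meso_sample)
print(t_stat, dof, approx_two_sided_p(t_stat))
```

For small samples, the exact p-value should be taken from a t-distribution with the returned degrees of freedom rather than from the normal approximation.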
Table 1.
Mean values of parameters
Parameter | Mesophile | Thermophile | p-Value
---|---|---|---
ΔH(Th)/N (kJ/mol) | 5.18 | 4.52 | 0.001
ΔS(Ts)/N (J/(mol K)) | 16.97 | 14.12 | 0.00002
ΔCp/N (kJ/(mol K)) | 0.055 | 0.049 | 0.008
ΔH(Tm)/N (kJ/mol) | 2.72 | 3.65 | 5.3
ΔS(Tm)/N (J/(mol K)) | 8.26 | 10.22 | 0.00003
TS (K) | 281.6 | 284.2 | 0.3
ΔG(TS)/N (kJ/mol) | 0.21 | 0.40 | 3.3
Comparison at convergence temperatures reveals that the changes in enthalpy and entropy are smaller in thermophiles than in mesophiles. When we compare enthalpy and entropy changes at Tm-values, the thermophilic changes are greater. TS is the temperature of maximal stability, which appears to be similar between thermophiles and mesophiles on average. However, the free energy (ΔG) at the temperature of maximal stability (per amino acid) is significantly more favorable in thermophiles than in mesophiles. The p-value is a measure of confidence from testing the difference of two means.
However, when enthalpy and entropy changes are computed at their respective Tm-values, we find the reverse effect. Based on the numbers reported in Table 1, we find that thermophiles have a higher change in specific entropy and enthalpy than mesophiles when computed at the Tm, in agreement with an earlier study (9).
Maximal stability free energy is more favorable in thermophiles than in mesophiles
Based on our two sets, we calculate the average of the temperature of maximal stability (TS) and the free-energy change at this temperature (ΔG(TS)). We find that the temperature of maximal stability is similar between thermophiles and mesophiles. However, the free-energy change at this temperature is almost twice as favorable in thermophiles compared with mesophiles (see Table 1), in accordance with previous studies (9,47). This is also evident from a comparison of the stability curves of the ideal mesophilic (blue) and ideal thermophilic (red) proteins in Fig. 1.
Reduction in folding entropy (per amino acid) is responsible for high Tm
Using the thermodynamic parameters reported in Eqs. 3 and 4, we can compute the temperature-dependent free energy (in kJ/mol) of an ideal thermophilic and ideal mesophilic protein as
(5) |
(6) |
Using the average chain length N of our 116-protein data set in the equations above, and plotting ΔG as a function of temperature in Fig. 1, we can make the following points: 1) The thermophilic curve (red) is broader and shifted upward and slightly to the right compared with the mesophilic curve (blue). 2) The thermophilic Tm is 25 K higher than the mesophilic Tm, and these temperatures are approximately in the same range as reported earlier (47). 3) The cold denaturation temperature of thermophiles (239 K) is slightly lower than that of mesophiles (244 K), as previously suggested (48). 4) When the mesophilic specific heat change is replaced by the corresponding thermophilic parameter, keeping the others intact, we find an almost negligible change in free energy with a slight destabilizing effect (orange curve). 5) If we substitute the mesophilic enthalpy change by the thermophilic value, keeping the other parameters for mesophilic proteins intact, we find a strong destabilizing effect (green curve). 6) In contrast, substituting only the mesophilic entropy by the thermophilic parameter, we find a significant stabilizing effect that shifts the Tm to a much higher value (black curve). Thus, varying all three parameters individually, we clearly demonstrate that an ideal thermophilic protein gains a high Tm by lowering the entropic loss upon folding.
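The parameter-substitution argument above can be explored numerically with a short sketch. The code below does not use the fitted ideal-protein parameters of Eqs. 3 to 6. As a stand-in, it multiplies the per-residue means of Table 1 by a hypothetical chain length of 100, assumes that those means are in kJ/mol, J/(mol K), and kJ/(mol K), respectively, and assumes the standard constant-ΔCp stability curve. The bisection bracket of 300 to 500 K is likewise an illustrative choice.

```python
import math

T_H, T_S = 373.5, 385.0  # K, convergence temperatures quoted in the text


def delta_g(T, dH, dS, dCp):
    """Unfolding free energy (kJ/mol) at temperature T for constant dCp.

    dH  : enthalpy change at T_H, kJ/mol
    dS  : entropy change at T_S, kJ/(mol K)
    dCp : specific heat change, kJ/(mol K)
    """
    return dH + dCp * (T - T_H) - T * (dS + dCp * math.log(T / T_S))


def melting_temperature(dH, dS, dCp, lo=300.0, hi=500.0, tol=1e-6):
    """Bisection for the heat-denaturation root of delta_g within [lo, hi]."""
    if not (delta_g(lo, dH, dS, dCp) > 0 > delta_g(hi, dH, dS, dCp)):
        raise ValueError("no heat-denaturation root bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_g(mid, dH, dS, dCp) > 0:
            lo = mid  # still folded at mid, so Tm lies above it
        else:
            hi = mid
    return 0.5 * (lo + hi)


N = 100  # hypothetical chain length
meso = dict(dH=5.18 * N, dS=16.97e-3 * N, dCp=0.055 * N)
thermo = dict(dH=4.52 * N, dS=14.12e-3 * N, dCp=0.049 * N)
# Mesophilic parameters with only the entropy replaced by the thermophilic value
entropy_swap = dict(meso, dS=thermo["dS"])

for name, p in [("mesophile", meso), ("thermophile", thermo),
                ("entropy swap", entropy_swap)]:
    print(name, round(melting_temperature(**p), 1))
```

With these stand-in numbers, swapping only the enthalpy leaves no temperature in the bracket at which the protein is stable, so `melting_temperature` raises an error for that case, which is in line with the strong destabilizing effect described in point 5.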
Next, we directly compare all possible pairings of thermophilic with mesophilic parameters against the pairings' respective Tm-values. After decomposition, the above analysis yields 59 mesophiles and 57 thermophiles, leading to a total of 3363 comparable pairs. Also, from the above analysis, all thermophilic Tm-values are greater than the mesophilic values, so when we consider the difference in Tm-values of the ith thermophile to the jth mesophile, we get ΔTm = Tm,i − Tm,j > 0. Computing the difference in specific entropy for the same ith thermophile to jth mesophile pair gives
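$$\Delta\!\left(\frac{\Delta S}{N}\right)_{ij} = \left(\frac{\Delta S}{N}\right)_{i}^{\text{thermo}} - \left(\frac{\Delta S}{N}\right)_{j}^{\text{meso}} \tag{7}$$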
Plotting ΔTm and Δ(ΔS/N) against each other for all 3363 possible i,j pairs (see Fig. 2) lets us count the pairs in which the higher Tm is associated with a difference in entropic change (per amino acid) that is less than zero, that is, the pairings in which the thermophilic ΔS/N is less than the mesophilic ΔS/N. We performed similar calculations for the specific changes in enthalpy and specific heat, counting the pairings in which the thermophilic enthalpic gain (per amino acid) is lower than that of the mesophilic counterpart, and those in which the thermophilic ΔCp/N is less than that of its mesophilic counterpart.
This analysis based on each protein pair shows a significant correlation between increased Tm and reduced entropy change upon folding, further justifying our claim based on the ideal-protein parameter comparison.
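The all-pairs comparison is simple to express in code. The sketch below is illustrative only: the function names and the small input lists are hypothetical, and each protein is represented as a (Tm, per-residue entropy change) tuple.

```python
from itertools import product


def pairwise_differences(thermo, meso):
    """Return (dTm, d(dS/N)) for every thermophile i and mesophile j."""
    return [(tm_t - tm_m, s_t - s_m)
            for (tm_t, s_t), (tm_m, s_m) in product(thermo, meso)]


def fraction_negative(pairs):
    """Fraction of pairs whose per-residue entropy difference is below zero."""
    return sum(1 for _, d in pairs if d < 0) / len(pairs)


# Hypothetical (Tm in K, dS/N in J/(mol K)) entries
thermo_set = [(350.0, 14.0), (360.0, 17.5)]
meso_set = [(320.0, 17.0), (330.0, 16.0), (325.0, 15.0)]
pairs = pairwise_differences(thermo_set, meso_set)
print(len(pairs), fraction_negative(pairs))
```

With 57 thermophiles and 59 mesophiles, the same call produces the 57 × 59 = 3363 pairs discussed above, and the same two functions apply unchanged to the per-residue enthalpy and specific heat changes.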
Homologous protein pairs reveal a similar trend
Thus far, our analysis has been based on a mean-field approach for a data set in which we classified proteins into thermophile and mesophile groups based on their respective Tm-values. Here we consider an alternate approach by constraining the data set to consist of only pairs, or groupings, of mesophile and thermophile homologs in which the classification is based on the organism from which the proteins have been extracted. Several proteins derived from thermophilic organisms have been studied (1,2,10,11) and compared with their homologs extracted from mesophilic organisms. The protein pairs considered here show either a low root mean-square deviation (RMSD) or a high sequence identity, and have been published as a relevant grouping based on their homology. Our 10 groupings of homologs include six thermophilic-mesophilic pairs and four groupings of at least four proteins, giving a total of 16 thermophiles and 17 mesophiles. The majority of the data shared at least sequence identity with the homologs and a backbone RMSD of <2 Å within each pairing or group. The following groups were published as being homologous, but were either below this criterion or unavailable to calculate: The pairing of MGMT-AdaC showed only sequence identity, but had a calculated backbone RMSD of 1.9 Å (49). The S16 pair had identity, but calculation of RMSD was unavailable (19). Calculation of sequence identity and RMSD was unavailable for Phycocyanin (50). Within the SH3 domain-containing group of eight proteins, certain pairs (e.g., Sac7d and Fyn) had an RMSD as high as 9.9 Å (9,16).
As described in the previous section, we compared each thermophilic protein with all other mesophilic proteins within the same group to compute differences in changes in specific entropy, Δ(ΔS/N); enthalpy, Δ(ΔH/N); specific heat, Δ(ΔCp/N); and the change in Tm (ΔTm). When we directly compare these quantities only within groupings of homologs, we find that a high percentage of thermophilic entropy changes (per amino acid) are smaller than the mesophilic entropy changes (per amino acid) in the same group. Likewise, changes in enthalpy (per amino acid) and in specific heat (per amino acid) tend to be smaller in thermophiles than in their mesophilic counterparts (see Fig. 3). Thus, a direct comparison of normalized thermodynamic data shows that thermophiles have a high propensity to experience reduced entropic, enthalpic, and specific heat changes compared with their mesophilic counterparts. This gives additional support to our previous analysis based on Tm alone. Because of the small number of data points, linear regression was not informative.
Growth-rate calculation of organisms and comparison with experiments
Based on our analysis above, we can model protein stability as a function of chain length using parameters of the ideal thermal protein (38,51). We can extend this to calculate the stability distribution of the entire proteome by using the chain length distribution of the proteome. For many organisms, the chain length distribution can be modeled as a Gamma distribution (52):
$$P(N) = \frac{N^{\alpha-1}\, e^{-N/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}} \tag{8}$$
where α and θ are two parameters. We previously used this approach to model proteome stability distribution for different organisms (e.g., Escherichia coli, yeast, and Caenorhabditis elegans) and growth rates (51). However, in this work, using the proteome chain length distribution and free-energy equations (Eqs. 5 and 6), we compute the free-energy distribution of the entire proteome to calculate the growth rate of several mesophilic and thermophilic organisms. As before, for a given proteome, we take
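$$r(T) = r_0 \exp\!\left(-\frac{\Delta H^{\dagger}}{RT}\right) \prod_{i=1}^{\Gamma} \frac{1}{1 + \exp\!\left(-\Delta G_i/RT\right)} \tag{9}$$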
where r0 is an intrinsic rate, and ΔH† represents an Arrhenius activation barrier for a metabolic reaction rate (51,53,54). The product term describes the stabilities of the proteins (ΔGi), where Γ is the number of essential proteins that are important for the growth rate. The expression above assumes that fitness depends on all of the essential proteins and their propensity to be in the folded state. This is motivated by the fact that compromising the stability of any of these essential proteins is lethal to the organism. This explains the product in Eq. 9. Furthermore, it assumes that growth rate is related to fitness. Equation 9 has already been successfully used to model growth rates in different organisms (51,53,54). Taking the logarithm of Eq. 9 gives
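$$\ln r(T) = \ln r_0 - \frac{\Delta H^{\dagger}}{RT} - \sum_{i=1}^{\Gamma} \ln\!\left[1 + \exp\!\left(-\Delta G_i/RT\right)\right] \tag{10}$$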
We approximate the sum as the integral over the entire proteome free-energy distribution, P(ΔG), and express the average rate (53) as
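$$\langle \ln r(T) \rangle = \ln r_0 - \frac{\Delta H^{\dagger}}{RT} - \Gamma \int \ln\!\left[1 + \exp\!\left(-\Delta G/RT\right)\right] P(\Delta G)\, d\Delta G \tag{11}$$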
The expression above requires the stability distribution P(ΔG). We estimate this distribution by using the proteome length distribution (Eq. 8) and free energy (Eq. 5 for mesophiles and Eq. 6 for thermophiles). Equation 11 predicts that cellular growth rates increase with temperature at low temperature due to the assumed activated process. However, growth rates decrease at high temperatures due to proteome denaturation (see Fig. 4). It predicts maximum growth at an optimal growth temperature. The experimental growth curves are highly asymmetrical near their temperature of maximum growth, and our model predicts this well. For this calculation, our model requires two free parameters, ΔH† and Γ, which we determine by fitting the experimental data on several mesophilic and thermophilic organisms (see Fig. 4). The values of the fitted parameters are reported in Table S2 using the corresponding expressions of ΔG for the respective organism.
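The shape of the predicted growth curve can be illustrated with a rough numerical sketch. Everything specific in the code below is an assumption made for illustration: the two-state folded fraction 1/(1 + exp(−ΔG/RT)), the replacement of the integral over the stability distribution by a sum over chain lengths weighted by a Gamma distribution, the per-residue Table 1 means used in place of Eqs. 5 and 6 (with no intercept terms and assumed units), and the values chosen for α, θ, Γ, and the activation barrier.

```python
import math

R = 8.314e-3             # gas constant, kJ/(mol K)
T_H, T_S = 373.5, 385.0  # K, convergence temperatures quoted in the text


def delta_g(T, N, h, s, c):
    """Unfolding free energy (kJ/mol) of an N-residue chain.

    h, s, c are per-residue dH(T_H), dS(T_S), and dCp in kJ units (assumed).
    """
    return N * (h + c * (T - T_H) - T * (s + c * math.log(T / T_S)))


def gamma_pdf(N, alpha, theta):
    """Gamma density used as the proteome chain-length distribution."""
    return (N ** (alpha - 1) * math.exp(-N / theta)
            / (math.gamma(alpha) * theta ** alpha))


def log_growth_rate(T, params, alpha, theta, n_essential, dH_act,
                    ln_r0=0.0, n_max=2000):
    """Average ln(rate): Arrhenius term minus the proteome unfolding penalty."""
    penalty = 0.0
    norm = 0.0
    for N in range(1, n_max + 1):
        w = gamma_pdf(N, alpha, theta)
        x = -delta_g(T, N, *params) / (R * T)
        # Numerically stable evaluation of ln(1 + exp(x))
        penalty += w * (max(x, 0.0) + math.log1p(math.exp(-abs(x))))
        norm += w
    return ln_r0 - dH_act / (R * T) - n_essential * penalty / norm


MESO = (5.18, 16.97e-3, 0.055)  # per-residue stand-ins taken from Table 1
temps = list(range(280, 341, 5))
curve = [log_growth_rate(T, MESO, alpha=2.5, theta=120.0,
                         n_essential=100, dH_act=60.0) for T in temps]
t_opt = temps[curve.index(max(curve))]
print(t_opt)
```

Under these assumptions the curve rises slowly on the cold side, where the Arrhenius term dominates, and falls steeply once a sizable part of the proteome starts to unfold, which is the asymmetry described above. Fitting real growth data would require treating the activation barrier and the number of essential proteins as free parameters, as stated in the text.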
Discussion
In this work, we analyzed the largest data set obtained to date to compare thermodynamic properties between thermophiles and mesophiles. In contrast to previous studies, we computed enthalpy (per amino acid) and entropy (per amino acid) changes at the convergence temperature to compare thermophiles and mesophiles. Our rationale was to separate hydrophobic effects from other driving forces, e.g., to decouple conformational entropy and solvation entropy. With this definition, we find, in general and with high statistical confidence, a smaller gain in enthalpy (per amino acid) upon folding in thermophiles than in mesophiles. This is a destabilizing effect. However, when we consider conformational entropy, we find that, on average, thermophiles sacrifice less entropy upon folding than mesophiles, thus overcompensating for the destabilizing effect due to enthalpy. This is responsible for the extra stability in thermophiles. It may appear that our finding is in conflict with studies that predicted higher specific entropy and enthalpy in thermophiles than in mesophiles (9). This apparent contradiction is due to the temperatures at which the thermodynamic quantities were calculated. When we compute entropy and enthalpy at the Tm, we are able to reproduce the results of Kumar et al. (9). However, as noted above, the entropy computed at the Tm is not purely conformational due to the presence of solvation entropy. We find the temperatures for maximal stability to be similar among thermophiles and mesophiles, whereas the folding free energy (per amino acid) at these temperatures is significantly more favorable in thermophiles than in mesophiles, in agreement with previous studies (9,47).
To our knowledge, the analysis presented here is novel because it computes thermodynamic properties at the convergence temperature to depict the lower folding entropy (conformational) and enthalpy associated with thermophiles. Furthermore, the combined findings of reduced entropy loss and lowered enthalpy gain raise the possibility that thermophiles may retain partial contacts in their denatured state. The presence of strong hydrophobic interactions, disulfide bonds (31,55), or electrostatic interactions (24,25,56) may be responsible for such residual structure in the denatured state. This would explain the lowered gain in enthalpy upon folding, as there are already existing favorable interactions in the denatured state. However, due to the presence of these native/nonnative contacts, the conformational entropy in the denatured state will also be lowered. This could explain the reduced loss in folding entropy (conformational) observed in thermophiles. This is consistent with experimental studies indicating that a reduction of the unfolded state entropy is responsible for enhanced stability (17,18,20). The existence of residual structure in the unfolded state (13,15), and the compact denatured state (19) in thermophiles provide strong evidence that a reduced entropy change upon folding is responsible for the high Tm. Wallgren et al. (19) showed that thermophilic ribosomal protein S16 has a more compact denatured state than the mesophilic homolog. Similar observations based on radius measurements were made by Liu et al. (18) in connection to their studies on Taq DNA polymerase. A comparative investigation of two thermophilic α-amylases showed a more compact unfolded state when they were denatured thermally than when they were denatured chemically. Also, the amylase with the higher thermal stability showed a more compact state than the amylase with lower Tm (20). Residual structure has been seen in thermophilic RNases H, but not in the mesophilic homolog (13). 
The existence of residual structure and native/nonnative contacts, and local compactness of the denatured ensemble have been addressed in several other contexts as well (24,57–64). The presence of residual structure in the denatured state would be responsible for a lowered solvent-accessible surface area compared with a fully unfolded extended state. This would consequently explain a lower ΔCp in thermophiles, and is consistent with other works in the literature (11,13–15) as well. Thus, our finding of a reduction in folding entropy is not in contradiction to studies hinting at reduced ΔCp, but in accordance with them. It appears that our calculation based on the convergence temperature reconciles previous observations by properly extracting the conformational entropy. However, it should be remembered that lowering ΔCp alone, and keeping ΔH and ΔS intact, will lead to destabilization rather than stabilization of the protein (see Fig. 1).
Another possible explanation for the reduced change in folding entropy is that specific amino acid substitutions lead to reduced entropy in the unfolded state due to the different degrees of flexibility associated with them (17,21). This is also consistent with the technique of enhancing stability by reducing conformational entropy of the denatured state by adding proline residues in β turns and at other locations in proteins (65,66). Nemethy et al. (67) quantified possible changes in unfolded chain entropy from amino-acid substitution. The effect of entropy on increased stability may also arise from different degrees of compactness in the native structure as a result of different mutations (22). Substitution of amino acids could change the entropy of the folded state. Factors such as rigidity, compactness, and rotameric states in the native state also play an important role in stabilizing thermophiles, and would be related to the entropy change as well. In computational studies, rubredoxin was found to be more globally rigid with respect to temperature than its mesophilic counterpart (68), and thermophilic RNase H was shown to have less backbone flexibility at the same temperatures, and less conformational entropy over a large temperature range than its mesophilic homolog (69).
However, a more microscopic model will be needed to further investigate the quantitative contributions of the unfolded-state and native-state entropy differences. Based on our finding at the convergence temperature, the lowered change in entropy and enthalpy is in accordance with several experimental studies that point to residual structure and reduced specific heat change (11,13–15,18–20). Our finding does not contradict previous studies; rather, it reconciles all of them. However, reduced enthalpy and specific heat upon folding have a destabilizing effect that is overcompensated for by the reduced loss in entropy, which imparts higher stability in thermophiles. Thus, we conclude that the key factor behind increased thermal tolerance is the reduction of folding entropy.
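The compensation argument can be made concrete with a simple two-state model. This is a hedged sketch under stated assumptions, not the analysis performed in this work: the unfolding free energy is taken to follow a Gibbs-Helmholtz form with assumed convergence temperatures T_H = 373.5 K and T_S = 385 K, and the three parameter sets are hypothetical values chosen only to show the direction of each effect. The helper names are likewise illustrative.

```python
import math

T_H, T_S = 373.5, 385.0  # assumed convergence temperatures (K)


def unfolding_free_energy(T, dH, dS, dCp):
    """Unfolding free energy (kJ/mol); dH at T_H, dS at T_S, constant dCp."""
    return dH + dCp * (T - T_H) - T * (dS + dCp * math.log(T / T_S))


def melting_temperature(dH, dS, dCp, lo=300.0, hi=400.0):
    """Bisect for the temperature (K) where the unfolding free energy is zero.

    Assumes the protein is folded at `lo` and unfolded at `hi`.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if unfolding_free_energy(mid, dH, dS, dCp) > 0:
            lo = mid  # still folded: melting point lies above mid
        else:
            hi = mid  # already unfolded: melting point lies below mid
    return 0.5 * (lo + hi)


# Hypothetical parameter sets: (dH in kJ/mol, dS in kJ/mol/K, dCp in kJ/mol/K).
cases = {
    "mesophile-like": (564.0, 1.81, 5.8),
    "reduced dH and dCp only": (550.0, 1.81, 5.4),
    "reduced dH, dCp, and dS": (550.0, 1.70, 5.4),
}
melting = {name: melting_temperature(*params) for name, params in cases.items()}
for name, tm in melting.items():
    print(f"{name}: Tm = {tm:.1f} K")
```

In this sketch, lowering the enthalpy and heat capacity changes alone lowers the melting temperature, whereas additionally lowering the entropy change raises it above the mesophile-like value.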
In addition, our study, which is based on a homologous series, emphasizes the role of entropy in thermal stabilization of proteins. As expected from the stringent comparison within homolog families, the general trend is clearer (see Fig. 3). Moreover, the quantitative agreement between experimental thermal growth data and our proteome modeling based on ideal-mesophilic-protein and ideal-thermophilic-protein parameters provides additional support for our approach and findings. Growth rates computed using Eq. 11, along with protein thermodynamic data, capture the optimum growth temperature as well as the asymmetric temperature-dependent growth curve across many thermophilic and mesophilic organisms. Furthermore, it should be noted that we can only use mesophilic (thermophilic) ideal protein parameters to fit mesophilic (thermophilic) organisms. We obtain unphysical parameter values and a poor fit when we use mesophilic protein parameters to fit thermophilic-organism growth data, and vice versa. Thus, the extreme sensitivity of the growth-data fits to the thermodynamic parameters, and the fact that modeling succeeds only upon proper selection of protein thermodynamic parameters, suggest that engineering protein stability is one of the key factors used by organisms to adapt to these temperatures. It should also be noted that application of the growth rate (Eq. 9) to model fitness may not always be accurate and depends on the nutrient condition (70,71).
Summary
In the thermodynamic analysis presented here, we attempted to elucidate the origin of enhanced stability in thermophilic proteins and proteomes. We make six key points: First, we construct the largest set of proteins (n = 116) achieved to date for which full thermal data are available, and analyze its thermodynamic properties. Second, we calculate entropy and enthalpy changes at the convergence temperature to maximally decouple the effect of different driving forces, in contrast to previous attempts. Third, based on these results, we find that, on average, thermophilic proteins have less change in specific enthalpy and entropy upon folding. However, a lower enthalpic gain or a reduced specific heat change has a destabilizing effect that is compensated for by reduced entropic loss, which is ultimately responsible for the high Tm. We also find the temperature of maximal stability (Ts) to be similar between the two classes, although the gain in folding free energy (per amino acid) at the maximal stability temperature is significantly higher in thermophiles than in mesophiles, in agreement with previous studies (9,47). Fourth, our analysis allows us to directly extract conformational entropy, in contrast to previous studies, and hints at a possible role of residual/compact structure in the denatured state. Furthermore, based on the average parameters, we give equations for ideal mesophilic and ideal thermophilic protein free energy as a function of temperature that can be used in the absence of any information other than the chain length of the protein. Fifth, our analysis based on homologous protein sets reveals a similar trend and supports the role of entropy in increased denaturation temperatures. Finally, we extend our ideal protein calculation to model the proteome free-energy distribution and predict the growth rates of several mesophilic and thermophilic organisms.
We find that our model captures high optimal temperatures in thermophiles and is in excellent quantitative agreement with thermal growth data. This also hints at the possibility that altering protein thermodynamics to gain high Tm is a strategy that thermophilic organisms may have adopted to deal with high temperature.
Acknowledgments
We thank Ken Dill for many inspiring discussions. We also thank the anonymous reviewers of the manuscript for several constructive suggestions, and Christopher Smith for performing a preliminary search of proteins as part of his honors thesis.
K.G. received support from the Faculty Research Fund, University of Denver.
Supporting Material
References
- 1. Kumar S., Nussinov R. How do thermophilic proteins deal with heat? Cell. Mol. Life Sci. 2001;58:1216–1233. doi: 10.1007/PL00000935.
- 2. Razvi A., Scholtz J.M. Lessons in stability from thermophilic proteins. Protein Sci. 2006;15:1569–1578. doi: 10.1110/ps.062130306.
- 3. Jaenicke R. Do ultrastable proteins from hyperthermophiles have high or low conformational rigidity? Proc. Natl. Acad. Sci. USA. 2000;97:2962–2964. doi: 10.1073/pnas.97.7.2962.
- 4. England J.L., Shakhnovich B.E., Shakhnovich E.I. Natural selection of more designable folds: a mechanism for thermophilic adaptation. Proc. Natl. Acad. Sci. USA. 2003;100:8727–8731. doi: 10.1073/pnas.1530713100.
- 5. Berezovsky I.N., Shakhnovich E.I. Physics and evolution of thermophilic adaptation. Proc. Natl. Acad. Sci. USA. 2005;102:12742–12747. doi: 10.1073/pnas.0503890102.
- 6. Ladenstein R. Heat capacity, configurational entropy, and the role of ionic interactions in protein thermostability. Biotechnol. Biotechnol. Equip. 2008;22:612–619.
- 7. Mevarech M., Frolow F., Gloss L.M. Halophilic enzymes: proteins with a grain of salt. Biophys. Chem. 2000;86:155–164. doi: 10.1016/s0301-4622(00)00126-5.
- 8. Jaenicke R., Böhm G. The stability of proteins in extreme environments. Curr. Opin. Struct. Biol. 1998;8:738–748. doi: 10.1016/s0959-440x(98)80094-8.
- 9. Kumar S., Tsai C.J., Nussinov R. Thermodynamic differences among homologous thermophilic and mesophilic proteins. Biochemistry. 2001;40:14152–14165. doi: 10.1021/bi0106383.
- 10. Singleton R., Jr., Amelunxen R.E. Proteins from thermophilic microorganisms. Bacteriol. Rev. 1973;37:320–342. doi: 10.1128/br.37.3.320-342.1973.
- 11. Zhou H.X. Toward the physical basis of thermophilic proteins: linking of enriched polar interactions and reduced heat capacity of unfolding. Biophys. J. 2002;83:3126–3133. doi: 10.1016/S0006-3495(02)75316-2.
- 12. Razvi A., Scholtz J.M. A thermodynamic comparison of HPr proteins from extremophilic organisms. Biochemistry. 2006;45:4084–4092. doi: 10.1021/bi060038+.
- 13. Robic S., Guzman-Casado M., Marqusee S. Role of residual structure in the unfolded state of a thermophilic protein. Proc. Natl. Acad. Sci. USA. 2003;100:11345–11349. doi: 10.1073/pnas.1635051100.
- 14. Ratcliff K., Corn J., Marqusee S. Structure, stability, and folding of ribonuclease H1 from the moderately thermophilic Chlorobium tepidum: comparison with thermophilic and mesophilic homologues. Biochemistry. 2009;48:5890–5898. doi: 10.1021/bi900305p.
- 15. Fu H., Grimsley G., Pace C.N. Increasing protein stability: importance of ΔC(p) and the denatured state. Protein Sci. 2010;19:1044–1052. doi: 10.1002/pro.381.
- 16. Graziano G. Is there a relationship between protein thermal stability and the denaturation heat capacity change? J. Therm. Anal. Calorim. 2008;93:429–438.
- 17. Matthews B.W., Nicholson H., Becktel W.J. Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proc. Natl. Acad. Sci. USA. 1987;84:6663–6667. doi: 10.1073/pnas.84.19.6663.
- 18. Liu C., Yang Y., Licata V. Thermodynamic and structural origins for the extreme stability of Taq DNA polymerase. Biophys. J. 2009;96:330a.
- 19. Wallgren M., Adén J., Wolf-Watz M. Extreme temperature tolerance of a hyperthermophilic protein coupled to residual structure in the unfolded state. J. Mol. Biol. 2008;379:845–858. doi: 10.1016/j.jmb.2008.04.007.
- 20. Fitter J., Haber-Pohlmeier S. Structural stability and unfolding properties of thermostable bacterial α-amylases: a comparative study of homologous enzymes. Biochemistry. 2004;43:9589–9599. doi: 10.1021/bi0493362.
- 21. Scott K.A., Alonso D.O., Daggett V. Conformational entropy of alanine versus glycine in protein denatured states. Proc. Natl. Acad. Sci. USA. 2007;104:2661–2666. doi: 10.1073/pnas.0611182104.
- 22. Berezovsky I., Chen W., Choi P., Shakhnovich E. Entropic stabilization of proteins and its proteomic consequences. PLOS Comput. Biol. 2005;1:e47. doi: 10.1371/journal.pcbi.0010047.
- 23. Elcock A.H. The stability of salt bridges at high temperatures: implications for hyperthermophilic proteins. J. Mol. Biol. 1998;284:489–502. doi: 10.1006/jmbi.1998.2159.
- 24. Cho J.-H., Sato S., Raleigh D.P. Thermodynamics and kinetics of non-native interactions in protein folding: a single point mutant significantly stabilizes the N-terminal domain of L9 by modulating non-native interactions in the denatured state. J. Mol. Biol. 2004;338:827–837. doi: 10.1016/j.jmb.2004.02.073.
- 25. Ge M., Xia X.-Y., Pan X.-M. Salt bridges in the hyperthermophilic protein Ssh10b are resilient to temperature increases. J. Biol. Chem. 2008;283:31690–31696. doi: 10.1074/jbc.M805750200.
- 26. Kumar S., Tsai C.J., Nussinov R. Factors enhancing protein thermostability. Protein Eng. 2000;13:179–191. doi: 10.1093/protein/13.3.179.
- 27. Kumar S., Ma B., Tsai C., Nussinov R. Electrostatic strengths of salt bridges in thermophilic and mesophilic glutamate dehydrogenase monomers. Proteins. 2000;38:368–383. doi: 10.1002/(sici)1097-0134(20000301)38:4<368::aid-prot3>3.0.co;2-r.
- 28. Dominy B.N., Perl D., Brooks C.L., 3rd The effects of ionic strength on protein stability: the cold shock protein family. J. Mol. Biol. 2002;319:541–554. doi: 10.1016/S0022-2836(02)00259-0.
- 29. Dominy B.N., Minoux H., Brooks C.L., 3rd An electrostatic basis for the stability of thermophilic proteins. Proteins. 2004;57:128–141. doi: 10.1002/prot.20190.
- 30. Lee C.F., Allen M.D., Wong K.B. Electrostatic interactions contribute to reduced heat capacity change of unfolding in a thermophilic ribosomal protein L30e. J. Mol. Biol. 2005;348:419–431. doi: 10.1016/j.jmb.2005.02.052.
- 31. Rosato V., Pucello N., Giuliano G. Evidence for cysteine clustering in thermophilic proteomes. Trends Genet. 2002;18:278–281. doi: 10.1016/S0168-9525(02)02691-4.
- 32. Chakravarty S., Varadarajan R. Elucidation of factors responsible for enhanced thermal stability of proteins: a structural genomics based study. Biochemistry. 2002;41:8152–8161. doi: 10.1021/bi025523t.
- 33. Das R., Gerstein M. The stability of thermophilic proteins: a study based on comprehensive genome comparison. Funct. Integr. Genomics. 2000;1:76–88. doi: 10.1007/s101420000003.
- 34. Saelensminde G., Halskau O., Jr., Jonassen I. Amino acid contacts in proteins adapted to different temperatures: hydrophobic interactions and surface charges play a key role. Extremophiles. 2009;13:11–20. doi: 10.1007/s00792-008-0192-4.
- 35. Bastolla U., Demetrius L. Stability constraints and protein evolution: the role of chain length, composition and disulfide bonds. Protein Eng. Des. Sel. 2005;18:405–415. doi: 10.1093/protein/gzi045.
- 36. Gu J., Hilser V.J. Sequence-based analysis of protein energy landscapes reveals nonuniform thermal adaptation within the proteome. Mol. Biol. Evol. 2009;26:2217–2227. doi: 10.1093/molbev/msp140.
- 37. Robertson A.D., Murphy K.P. Protein structure and the energetics of protein stability. Chem. Rev. 1997;97:1251–1268. doi: 10.1021/cr960383c.
- 38. Ghosh K., Dill K.A. Computing protein stabilities from their chain lengths. Proc. Natl. Acad. Sci. USA. 2009;106:10649–10654. doi: 10.1073/pnas.0903995106.
- 39. Privalov P.L., Khechinashvili N.N. A thermodynamic approach to the problem of stabilization of globular protein structure: a calorimetric study. J. Mol. Biol. 1974;86:665–684. doi: 10.1016/0022-2836(74)90188-0.
- 40. Privalov P.L. Stability of proteins: small globular proteins. Adv. Protein Chem. 1979;33:167–241. doi: 10.1016/s0065-3233(08)60460-x.
- 41. Baldwin R.L. Temperature dependence of the hydrophobic interaction in protein folding. Proc. Natl. Acad. Sci. USA. 1986;83:8069–8072. doi: 10.1073/pnas.83.21.8069.
- 42. Doig A.J., Williams D.H. Why water-soluble, compact, globular proteins have similar specific enthalpies of unfolding at 110 degrees C. Biochemistry. 1992;31:9371–9375. doi: 10.1021/bi00154a007.
- 43. Murphy K.P., Privalov P.L., Gill S.J. Common features of protein unfolding and dissolution of hydrophobic compounds. Science. 1990;247:559–561. doi: 10.1126/science.2300815.
- 44. Fu L., Freire E. On the origin of the enthalpy and entropy convergence temperatures in protein folding. Proc. Natl. Acad. Sci. USA. 1992;89:9335–9338. doi: 10.1073/pnas.89.19.9335.
- 45. Murphy K.P., Gill S.J. Solid model compounds and the thermodynamics of protein unfolding. J. Mol. Biol. 1991;222:699–709. doi: 10.1016/0022-2836(91)90506-2.
- 46. Hollien J., Marqusee S. Structural distribution of stability in a thermophilic enzyme. Proc. Natl. Acad. Sci. USA. 1999;96:13674–13678. doi: 10.1073/pnas.96.24.13674.
- 47. Kumar S., Nussinov R. Experiment-guided thermodynamic simulations on reversible two-state proteins: implications for protein thermostability. Biophys. Chem. 2004;111:235–246. doi: 10.1016/j.bpc.2004.06.005.
- 48. Kumar S., Tsai C.J., Nussinov R. Temperature range of thermodynamic stability for the native state of reversible two-state proteins. Biochemistry. 2003;42:4864–4873. doi: 10.1021/bi027184+.
- 49. Shiraki K., Nishikori S., Imanaka T. Comparative analyses of the conformational stability of a hyperthermophilic protein and its mesophilic counterpart. Eur. J. Biochem. 2001;268:4144–4150. doi: 10.1046/j.1432-1327.2001.02324.x.
- 50. Chen C.-H., Roth L., Berns D. Thermodynamics elucidation of the structural stability of a thermophilic protein. Biophys. Chem. 1994;50:313–321. doi: 10.1016/0301-4622(96)00013-0.
- 51. Ghosh K., Dill K. Cellular proteomes have broad distributions of protein stability. Biophys. J. 2010;99:3996–4002. doi: 10.1016/j.bpj.2010.10.036.
- 52. Zhang J. Protein-length distributions for the three domains of life. Trends Genet. 2000;16:107–109. doi: 10.1016/s0168-9525(99)01922-8.
- 53. Chen P., Shakhnovich E.I. Thermal adaptation of viruses and bacteria. Biophys. J. 2010;98:1109–1118. doi: 10.1016/j.bpj.2009.11.048.
- 54. Ratkowsky D.A., Olley J., Ross T. Unifying temperature effects on the growth rate of bacteria and the stability of globular proteins. J. Theor. Biol. 2005;233:351–362. doi: 10.1016/j.jtbi.2004.10.016.
- 55. Clarke J., Hounslow A.M., Daggett V. The effects of disulfide bonds on the denatured state of barnase. Protein Sci. 2000;9:2394–2404. doi: 10.1110/ps.9.12.2394.
- 56. Pace C.N., Alston R.W., Shaw K.L. Charge-charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci. 2000;9:1395–1398. doi: 10.1110/ps.9.7.1395.
- 57. Shortle D., Chan H.S., Dill K.A. Modeling the effects of mutations on the denatured states of proteins. Protein Sci. 1992;1:201–215. doi: 10.1002/pro.5560010202.
- 58. Dill K.A., Shortle D. Denatured states of proteins. Annu. Rev. Biochem. 1991;60:795–825. doi: 10.1146/annurev.bi.60.070191.004051.
- 59. Miyazawa S., Jernigan R.L. Protein stability for single substitution mutants and the extent of local compactness in the denatured state. Protein Eng. 1994;7:1209–1220. doi: 10.1093/protein/7.10.1209.
- 60. Shortle D. The denatured state (the other half of the folding equation) and its role in protein stability. FASEB J. 1996;10:27–34. doi: 10.1096/fasebj.10.1.8566543.
- 61. Shortle D., Ackerman M.S. Persistence of native-like topology in a denatured protein in 8 M urea. Science. 2001;293:487–489. doi: 10.1126/science.1060438.
- 62. Lindorff-Larsen K., Kristjansdottir S., Vendruscolo M. Determination of an ensemble of structures representing the denatured state of the bovine acyl-coenzyme A binding protein. J. Am. Chem. Soc. 2004;126:3291–3299. doi: 10.1021/ja039250g.
- 63. Bowler B.E. Thermodynamics of protein denatured states. Mol. Biosyst. 2007;3:88–99. doi: 10.1039/b611895j.
- 64. Nick Pace C., Huyghues-Despointes B.M., Grimsley G.R. Urea denatured state ensembles contain extensive secondary structure that is increased in hydrophobic proteins. Protein Sci. 2010;19:929–943. doi: 10.1002/pro.370.
- 65. Watanabe K., Suzuki Y. Protein thermostabilization by proline substitutions. J. Mol. Catal. B. Enzym. 1998;4:167–180.
- 66. Trevino S.R., Schaefer S., Pace C.N. Increasing protein conformational stability by optimizing beta-turn sequence. J. Mol. Biol. 2007;373:211–218. doi: 10.1016/j.jmb.2007.07.061.
- 67. Nemethy G., Leach S., Scheraga H. The influence of amino acid side chains on the free energy of helix-coil transitions. J. Phys. Chem. 1966;70:998–1004.
- 68. Rader A.J. Thermostability in rubredoxin and its relationship to mechanical rigidity. Phys. Biol. 2009;7:16002. doi: 10.1088/1478-3975/7/1/016002.
- 69. Livesay D.R., Jacobs D.J. Conserved quantitative stability/flexibility relationships (QSFR) in an orthologous RNase H pair. Proteins. 2006;62:130–143. doi: 10.1002/prot.20745.
- 70. Dietz K. Darwinian fitness, evolutionary entropy and directionality theory. Bioessays. 2005;27:1097–1101. doi: 10.1002/bies.20317.
- 71. Demetrius L., Legendre S., Harremöes P. Evolutionary entropy: a predictor of body size, metabolic rate and maximal life span. Bull. Math. Biol. 2009;71:800–818. doi: 10.1007/s11538-008-9382-6.
- 72. Benson D., Karsch-Mizrachi I., Lipman D., Ostell J., Wheeler D. GenBank. Nucleic Acids Res. 2005;31:23–27. doi: 10.1093/nar/gkg057.