Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 1999 Oct 26;96(22):12482–12487. doi: 10.1073/pnas.96.22.12482

Folding protein models with a simple hydrophobic energy function: The fundamental importance of monomer inside/outside segregation

Antônio F Pereira de Araújo 1,
PMCID: PMC22956  PMID: 10535948

Abstract

The present study explores a “hydrophobic” energy function for folding simulations of the protein lattice model. The contribution of each monomer to conformational energy is the product of its “hydrophobicity” and the number of contacts it makes, i.e., E(h⃗, c⃗) = −Σi=1N cihi = −(h⃗.c⃗) is the negative scalar product between two vectors in N-dimensional cartesian space: h⃗ = (h1, … , hN), which represents monomer hydrophobicities and is sequence-dependent; and c⃗ = (c1, … , cN), which represents the number of contacts made by each monomer and is conformation-dependent. A simple theoretical analysis shows that restrictions are imposed concomitantly on both sequences and native structures if the stability criterion for protein-like behavior is to be satisfied. Given a conformation with vector c⃗, the best sequence is a vector h⃗ on the direction upon which the projection of c⃗ − c̄⃗ is maximal, where c̄⃗ is the diagonal vector with components equal to c̄, the average number of contacts per monomer in the unfolded state. Best native conformations are suggested to be not maximally compact, as assumed in many studies, but the ones with largest variance of contacts among its monomers, i.e., with monomers tending to occupy completely buried or completely exposed positions. This inside/outside segregation is reflected on an apolar/polar distribution on the corresponding sequence. Monte Carlo simulations in two dimensions corroborate this general scheme. Sequences targeted to conformations with large contact variances folded cooperatively with thermodynamics of a two-state transition. Sequences targeted to maximally compact conformations, which have lower contact variance, were either found to have degenerate ground state or to fold with much lower cooperativity.


Current theories of macromolecular folding predict that an important factor determining “protein-like” behavior is that the native structure corresponds to a sufficiently deep global minimum in the folding energy landscape (14). From an operational perspective, the simplest mathematical expression for this idea is the requirement of a large negative “Z-score” that, in analogy to elementary statistics, represents the difference between the energy of the native structure and the average energy of unfolded conformations in terms of the corresponding standard deviation (57). If sequences of N monomers are represented by an ordered set of N letters, {ai} = {a1, … , aN}, where ai stands for the monomer type at position i along the sequence, and conformations are represented by the set of N vectors, {Inline graphic} = {Inline graphic, … , Inline graphic}, where Inline graphic = (xi, yi, zi) are the coordinates of monomer i, then a given sequence {ai}* will be likely to fold to a given conformation {Inline graphic}* if {ai}* and {Inline graphic}* correspond to a large negative Z-score:

graphic file with name M7.gif 1

where E({ai}*, {Inline graphic}*) is the energy of the given sequence in the given conformation and Ē and σE are the average and standard deviation of the energy of the given sequence distributed among the unfolded state ensemble.

The simplifying assumption of statistical independence of energies among different conformations results in the so-called Random Energy Model (REM) (8, 9), which was originally proposed for spin glasses (10) and shown to be a reasonable approximation for protein lattice models (1115) (for recent theoretical reviews, including the REM, see refs. 4, 16, and 17). The REM’s most important prediction is that thermodynamic properties of the unfolded state, which is dominated by high energy conformations, do not depend on sequence but only on monomer composition. The energy of the native structure, on the other hand, which is by definition the lowest energy conformation, is sequence dependent. If the energy function is defined, E({ai}*, {Inline graphic}*) can be computed directly, while Ē and σE can be easily estimated from the composition of {ai}*. Protein-like behavior arises when Z is much larger, in absolute value, than Zc:

graphic file with name M10.gif 2

where

graphic file with name M11.gif 3

is the Z-score corresponding to the critical energy Ec, the energy below which thermodynamics become sequence dependent. Zc is another sequence-independent quantity, but the crucial value of S0, the entropy of the unfolded state, may not be easy to estimate a priori.

The above scheme is very general, as it does not make any assumptions about how different conformations are represented or about the form of the energy function. Many theoretical studies and lattice model simulations have used an energy function of the form

graphic file with name M12.gif 4

where Δ(i, j) is 1 if monomers i and j form a contact and 0 otherwise. In the lattice model, i and j are said to form a contact if Inline graphic and Inline graphic are adjacent lattice sites and |ji| > 1. Baiaj is taken from a symmetric L × L matrix of energy interactions, B, containing the energetic contribution of all possible pairwise interactions. L is the number of available monomer types, or “letters,” and has ranged from 2, as it does in the HP model, to N, as it does in the independent interaction model (reviewed in ref. 17).

Once B is chosen it is possible to use Eqs. 1 and 4 to produce sequences with good Z-scores in a particular conformation. The stability criterion can be easily satisfied for arbitrarily chosen maximally compact conformations if the number of letters is large enough. Many studies have used sequences selected to fold to maximally compact conformations in the cubic lattice using an alphabet of 20 monomer types (reviewed in ref. 18). Sequences have been selected either by direct Z-score minimization or by energy minimization with fixed monomer composition, which is also equivalent to minimization of Eq. 1 because in this last case Ē and σE are constants. Well selected sequences of up to at least 80 monomers (3) fold cooperatively in Monte Carlo simulations to the target structure starting from any randomly generated initial conformation and the target structure is thermodynamically stable at a temperature where folding is fast (18). This clear protein-like behavior indicates that Eq. 2 is indeed a valid selection criterion.

The resulting scheme permits some objections, however (see, for example, ref. 19). First, although real proteins are made up of 20 monomer types, it is well known that amino acids can be grouped in a much smaller number of classes sharing similar physical chemical properties, such as aliphatics, aromatics, polar and charged, for example, or, even more simply, hydrophobics and hydrophilics. This suggests that fundamental properties of the folding process could be encoded by a smaller, physically meaningful, alphabet. Second, the choice of the target conformation, which plays the role of the native structure in the model, appears to be completely arbitrary, as far as it is maximally compact. The model does not have anything to say about the fact that native structures of globular proteins tend to fall into a relatively small number of folds.

This study combines the general stability criterion for sequence selection with a simple and physically intuitive energy function for the protein lattice model. Energetic contributions for possible contacts are taken from an interaction matrix with elements of the form

graphic file with name M15.gif 5

where hi = h(ai) is the “hydrophobicity” of the monomer type ai. Hydrophobicities can be either positive or negative. The additive form of the B matrix elements shown in Eq. 5 implies that Eq. 4 can be expressed simply as (1, 5, 20)

graphic file with name M16.gif 6

where h⃗.c⃗ is the scalar product between two vectors in N-dimensional cartesian space: the hydrophobicity vector, h⃗ = (h1, … , hN), which represents monomer hydrophobicities and depends only on the sequence, and the contact vector, c⃗ = (c1, … , cN), which represents the number of contacts made by each monomer and is therefore dependent on the conformation. Note that the function is “unspecific” in the sense that the energetic contribution of a given monomer involved in a contact does not depend on its contact partner.

Results

Theoretical Analysis.

To use Eqs. 1 and 6 to compute Z-scores, it is necessary to estimate values for Ē and σE, the average and standard deviation of conformational energies of a given sequence over the ensemble of unfolded conformations. If there are M unfolded conformations, and unfolded conformation j corresponds to contact vector, c⃗j = (c1j, … , cNj), then

graphic file with name M17.gif 7

and

graphic file with name M18.gif 8

where (c̄)i and (σc2)i are the average and variance taken over unfolded conformations of contacts made by monomer i. Eq. 1 becomes

graphic file with name M19.gif 9

where (σc2)i = σc2 was assumed to be independent of i. Eq. 9 has a clear and important meaning. The sequence, h⃗, not only must be compatible with the target structure, c⃗, but it also must be not compatible with the average unfolded state, c̄⃗. This is best achieved when h⃗ is most compatible with c⃗ − c̄⃗, in the sense that the projection of c⃗ − c̄⃗ on the h⃗ direction is maximal. Note that the above expression does not depend on the length of h⃗. Multiplication of all hydrophobicities by a constant is equivalent to a simple change of units of the energy scale and does not affect the Z-score.

The most negative Z-score for a given conformation, Zbest, would correspond to the sequence h⃗best that minimizes Eq. 9. Direct optimization of this equation is problematic, however, as it involves quantities related to the unfolded state, which is not known a priori and can depend itself on the composition of h⃗ in a nontrivial way. A simple and useful conformation-dependent upper estimate for Zbest can nevertheless be obtained if the sequence for a given conformation is taken as

graphic file with name M20.gif 10

where c⃗up is the diagonal vector with all components equal to C/N with C = Σi=1N ci, and if it is assumed that c̄⃗ is diagonal. This assumption will be appropriate if any possible dependence on sequence position of the average number of contacts made by monomers in the unfolded state (which could arise, for example, from a dependence on individual monomer hydrophobicities) does not deviate c̄⃗ significantly from the direction determined by vector 1⃗ = (1, … , 1). As h⃗up is perpendicular to 1⃗ (i.e., h⃗up.1⃗ = CC = 0), the projection of c⃗ − c̄⃗ on h⃗up = c⃗ − c⃗up must be equal to this last vector (i.e., (c⃗ − c̄⃗).(c⃗ − c⃗up) = (c⃗ − c⃗up)2 for any diagonal vector c̄⃗) and the corresponding Z-score, as given by Eq. 9, is independent of the diagonal average unfolded state, c̄⃗.

graphic file with name M21.gif 11

where σ = Inline graphic is the standard deviation of the number of contacts made by monomers in the native conformation. The scaling with Inline graphic is expected, and it also occurs for Zc, Eq. 3, since the entropy scales with N, i.e., S0 = NS0, where s0 is the entropy per monomer.

Note that the upper limit Zup will actually be equal to Zbest if c̄⃗, for h⃗ = h⃗up, happens to be equal to c⃗up (i.e., if c̄, the average number of contacts made by each monomer in the unfolded state, is C/N). In this hypothetical situation, the sequence taken as in Eq. 10 acquires a very intuitive physical meaning. If a given monomer makes fewer contacts in the target structure than in the unfolded state, then it should be hydrophilic to destabilize the unfolded state more than the target conformation. Conversely, if it makes more contacts in the target conformation than in the unfolded state, then it should be hydrophobic to stabilize the target conformation more than the unfolded state.

For the more general case, where c̄ is not necessarily equal to C/N, Eq. 11 shows that for a given conformation with contact standard deviation σ, its best possible Z-score is at least as negative as Zup. For a given N, structures with abnormally high standard deviations of the number of contacts made by its monomers will therefore correspond to very negative Z-scores. In other words, conformations with monomers “maximally segregated” between completely buried and completely exposed environments are likely to be the most appropriate native structures for the hydrophobic energy function. This inside (more contacts than average)/outside (fewer contacts than average) segregation of monomers in the native conformation is reflected, because of Eq. 10, on the apolar (positive hydrophobicity)/polar (negative hydrophobicity) distribution on the corresponding sequence.

MC Simulations.

Monte Carlo simulations of a two-dimensional square lattice protein model were performed as a simple test for the presently proposed principle of maximal inside/outside segregation for good native structures. Chains of 24 beads were chosen for this initial study because they can be accommodated inside a region of the lattice with the interesting property of having 12 internal and 12 external sites. Conformation conf1 (Fig. 1 Upper Left) was generated to fit inside this region. If it were not for chain ends, each bead accommodated on an internal site would make the maximal number of two contacts, while the ones on external sites would make no contacts at all. This would result in an average of 1 contact per monomer with a corresponding standard deviation also of 1. Chain end effects are significant, however, at least for chains of this size. The average and standard deviation of contacts per monomer of conformation conf1 is c̄ = 1.08 and σ = 0.95 (Table 1), respectively. An unexpected rearrangement of conf1 is conf2 (Fig. 1 Upper Right), which is less compact (12 contacts instead of 13) and does not fit inside the original region. As it has one of the chain ends making three contacts, however, its contact standard deviation increases to σ = 1, while the average decreases to c̄ = 1. Ideal hydrophobicity vectors for these two conformations (sequences seq1 and seq2) were generated according to Eq. 10, with the value of c̄ taken as unity (Table 1).

Figure 1.

Figure 1

Two-dimensional conformations discussed in the present study: conf1, Upper Left; conf2, Upper Right; conf3, Lower Left; and conf4, Lower Right. They are all in a regular square lattice and are 24 beads long.

Table 1.

Conformations and corresponding ideal sequences represented by contact and hydrophobicity vectors, respectively.

c⃗ (upper) and h⃗ (lower) vectors σ E Tf
conf1 1 0 2 0 2 2 2 2 2 2 0 2 0 0 2 0 2 0 0 2 0 2 0 1
seq1 0 −1 1 −1 1 1 1 1 1 1 −1 1 −1 −1 1 −1 1 −1 −1 1 −1 1 −1 0 1.08 0.95 −24 0.92
conf2 1 0 1 0 2 1 2 2 2 0 0 2 0 0 2 0 2 0 0 2 0 2 0 3
seq2 0 −1 0 −1 1 0 1 1 1 −1 −1 1 −1 −1 1 −1 1 −1 −1 1 −1 1 −1 2 1.00 1.00 −24 0.87
conf3 1 1 1 1 1 0 1 2 2 2 2 2 2 2 2 1 1 0 1 1 1 1 0 2
seq3 0 0 0 0 0 −1 0 1 1 1 1 1 1 1 1 0 0 −1 0 0 0 0 −1 1 1.25 0.66 −18
conf4 2 0 1 1 1 0 1 2 2 2 2 2 2 2 2 1 1 0 1 1 0 2 0 2
seq4 1 −1 0 0 0 −1 0 1 1 1 1 1 1 1 1 0 0 −1 0 0 −1 1 −1 1 1.25 0.77 −22 0.71

The average number of contacts per monomer, c̄, contact standard deviation, σ, energy, E = −c⃗.h⃗, and folding transition temperature determined by the heat capacity peak, Tf, are also shown. No Tf was calculated for seq3 because it does not fold to a single structure (see text). 

A short simulation of seq2 using the standard Metropolis algorithm (21) with end flips, kinks and crankshaft moves (see, e.g., ref. 22) beginning from a randomly generated initial conformation illustrates how the chain easily folds to the corresponding target conformation in an apparent cooperative two-state transition (Fig. 2 Upper). The simulation temperature is T = 0.9, which happens to be close to the folding temperature (see Table 1). Energy and number of native contacts, Q, are plotted every thousand time steps (1 time step equals N = 24 move attempts). The minimal energy is E = −24, which corresponds to the maximal number of native contacts, Q = 12. The distribution of the equilibrium conformational ensemble at the same temperature grouped by Q gives a direct confirmation that the transition is reasonably two-state (Fig. 2 Lower). The distribution was obtained from the average of 10 independent simulations, 10 million time steps each, recorded every 10,000 steps. Results were qualitatively similar for seq1, although the transition seemed slightly less cooperative and the contribution of intermediate values of Q in the equilibrium ensemble at the folding temperature was more pronounced (not shown).

Figure 2.

Figure 2

(Upper) A short Monte Carlo trajectory for seq2 at T = 0.9, which is close to its folding temperature. Energy and number of native contacts are displayed for a total time of 1 million time steps, recorded every 1,000 steps. The chain folds cooperatively to its target conformation which corresponds to the energy value of −24 and 12 native contacts. (Lower) The equilibrium ensemble for the same sequence and at the same temperature, grouped according to the number of native contacts. The histogram was constructed from the average over 10 simulations of 10 million time steps run at T = 0.9. Error bars represent the standard deviation of the mean.

Very long simulations of 100 million time steps, recorded every 10,000 steps, were also performed. At simulation temperature T = 1, the chain folded and unfolded many times and no other conformation with energy less than or equal to the target conformation was ever found. The histogram technique (14) was used to generate the thermal average of the number of native contacts (Q) and the heat capacity (CV = σE2/T2) as a function of temperature (Fig. 3). Representative averages and standard deviations of the same quantities obtained from 10 independent simulations of 10 million time steps are also shown for seq1 at some temperatures. Good agreement with the curve indicates that the long simulation used by the histogram technique actually sampled an adequate fraction of conformational space.

Figure 3.

Figure 3

Heat capacity (Upper) and average number of native contacts (Lower) as a function of temperature for seq1, seq2, and seq4. Curves were obtained from a very long simulation of 100 million time steps run at a temperature close to Tf for each sequence. Independent results obtained from simulations of 10 million time steps run at selected temperatures are shown for seq1 in the form of the average value over 10 simulations (points) and corresponding standard deviation (error bars).

Heat capacity curves for both seq1 and seq2 display the characteristic single sharp peak of cooperative transitions. The maximum of this peak was used to define the folding transition temperature Tf, which is higher for seq1 than for seq2. The transition of seq2, however, as monitored by Q, seems to be more cooperative than for seq1. Note that values of Tf determined from the CV curves agree very well, for seq1 and seq2, with the temperature at the middle of the transition as monitored by the temperature dependence of Q. This is consistent with a two-state transition and indicates that Q can be used as an order parameter. It is interesting to note that the unfolded state has a larger heat capacity than the folded state for both sequences. This result is somehow unexpected because the well known increase in heat capacity upon unfolding of real proteins is traditionally attributed to the ordering of water around exposed apolar residues (23). As there are no explicit water molecules in the present simulations, this result points out the possibility that other factors in addition to solvation can be involved in the experimentally observed heat capacity change.

Conformation conf3 (Fig. 1 Lower Left) was generated as an example of maximally compact conformation inbedded in a regular region (a 4 × 6 rectangle, in this case). It has 15 contacts but its contact standard deviation is low. Long simulations of the corresponding sequence seq3 found no conformation with energy lower than the expected −18 but showed that this value corresponded not only to conf3, but also to a completely different conformation with the same number of contacts, conf4 (Fig. 1 Lower Right). As seq3 has two different conformations with minimal energy, it cannot fold in a protein-like manner. conf3 and conf4 have only 5 out of 15 contacts in common, and their contact vectors are not identical. Note that conf4 has a slightly higher contact standard deviation than conf3 and, according to the segregation principle, would have a better chance of being the native conformation of a better chosen sequence. The sequence obtained from its contact vector, seq4, corresponds to a much lower energy, −22.

The same procedure used for seq1 and seq2 suggests that conf4 is the single global energy minimum for seq4 but the corresponding temperature dependences of CV and Q are different (Fig. 3). There is a single peak in the CV curve, but it is very broad and its maximum occurs at a higher temperature (Tf = 0.71) than the mid-point of the transition as monitored by Q, which is less than 0.6. Taken together, the results for seq4 show that its folding is much less cooperative and that the transition can hardly be considered as protein-like.

Discussion

The hydrophobic energy function, upon which this study is based, is no more than a lattice implementation of the physical assumption that conformational free energy of real proteins can be computed from solvent accessible surface areas of different types, such as aliphatic, aromatic and polar, combined with unit free energies of hydration, which can in principle be obtained from partition experiments (2427). Recent estimates for free energy changes involved in specific pairwise interactions obtained from HPLC experimental data are not inconsistent with this simple scheme (28, 29). A recent decomposition (30) of the Miyazawa and Jerningan potential (31) has also suggested that hydrophobicity can largely explain the statistical distribution of amino acid contacts in protein structures. The conjecture that the energy can be expressed as in Eq. 6 is particularly interesting because sequence and conformation appear to contribute to the energy on an equal basis. Favorable energies correspond to a large scalar product between two vectors, which can be interpreted as a “compatibility” between sequence and structure.

Sequences and structures are not equivalent, however, as sequences are simple walks in one dimension (each step being a hydrophobicity) while structures are self-avoiding walks in three dimensions. As a consequence, there is an inevitable loss of information when conformations are represented by contact vectors and some contact vectors can represent more than one conformation. It is not clear whether this is evidence that the folding potential felt by real proteins cannot be represented in such a simple form or whether it points out another real physical restriction imposed on protein native structures, as suggested before in the context of maximally compact conformations (20). Some of the intrinsic differences between sequences and conformations are nevertheless reflected in the present vector representation. For example, the set of possible contact vectors is much more restricted than the set of possible hydrophobicity vectors, as most N-dimensional vectors represent no viable conformation.

The situation is similar to the one described in ref. 20, where it was additionally proposed that the number of sequences encoding a given conformation is proportional to the volume of the Voronoi polytope around its contact vector. This prediction arose from the restriction of conformational space to maximally compact conformations and the explicit assumption that all structures had the same number of surface and core residues (20). This is equivalent to restrict all contact vectors to a single vector length. An analogous proposal for the present model, where polytopes around c⃗ are replaced by polytopes around c⃗ − c̄⃗, is not valid, however, because the length of c⃗ − c̄⃗, even for the simplest case of constant c̄⃗ (i.e., sequences with same composition), depends on σ. A good illustration is provided by seq3, which is a vector intended to be parallel to conf3 − c̄⃗. As seq3.conf3 equals seq3.conf4 it means that the projection of seq3 on conf3 − c̄⃗ equals the one on conf4 − c̄⃗, which is not parallel to seq3. This situation is possible because conf4 corresponds to a larger σ than conf3.

It was implicitly assumed in the present theoretical analysis that monomer hydrophobicities could be tuned arbitrarily to optimize the direction of hydrophobicity vectors. It is clear, however, that the set of possible directions cannot be continuous for real protein sequences, as they are constructed from a limited number of amino acids. In more detailed model, therefore, optimization of Eq. 9 should be restricted to sequences that are compatible with a limited alphabet. Note, however, that such eventual restrictions could probably be at least partly compensated for by small adjustments of the contact vector, which should be continuous in this more realistic situation. In addition, it is important to remember that the alphabet of real amino acids is in itself a product of evolution. The requirement of producing sequences with hydrophobicity vectors sufficiently close to the appropriate directions determined by evolving structures could well have been an important selective pressure.

The results from Monte Carlo simulations demonstrate the possibility of folding protein lattice models with this simple and unspecific hydrophobic energy function when the native conformation is well chosen. The specificity required for the chain to fold to a single conformation does not come from specific interactions between its monomers, but only from the specific pattern of hydrophobicities along its sequence. This result supports, therefore, the basic assumption behind the many enumeration studies performed with the two letter “exact” HP model, which is similar in spirit to the present energy function (reviewed in ref. 19). Complete enumeration of small chains has actually shown that sequences made up of only two letters, H for hydrophobic and P for polar, are capable of displaying protein-like thermodynamics. In addition, this behavior was found to be dependent on structural details of the native conformation.

Actually, it is interesting to note that the presently proposed principle of inside/outside segregation can help to rationalize some previous results observed for the HP model even though the hydrophobic energy function, from which it was derived, is not identical to the HP function. In addition to the possibility of more than two letters for the hydrophobic function, the two functions do not account for hydrophilic (polar) residues in the same manner. Polar–polar interactions are neutral in the HP model but repulsive according to the hydrophobic function, because of the unfavorable contribution from both residues involved in the contact. In this way, even for the case of an ideal native structure, with all monomers either completely buried or completely exposed, the two-letter alphabet resulting from the hydrophobic function would not be HP. According to the description of 2 × 2 matrices used in ref. 17 the limit hydrophobic function, ( 0 +1−1 0), corresponds to an “angle” of 0, while for the HP function, (0 0−1 0), the angle is −arccos(Inline graphic). A good example of the relation between folding cooperativity and native structure contact variance discussed here is, however, unknowingly provided by figure 24 of ref. 19 for the HP model. The failure of previous attempts to use the stability criterion for HP sequences (19, 32) can also be explained by the small inside/outside segregation of the target maximally compact conformations.

The possibility of a non-maximally compact global energy minimum has been avoided in many studies by making all interactions attractive (e.g., refs. 13 and 33), but this ad hoc solution is clearly not consistent with experiments. Real proteins contain hydrophilic residues which interact more favorably with the solvent than with hydrophobic moieties, and therefore their effective interactions with hydrophobic residues cannot be favorable. The present results suggest that this apparently innocent requirement of maximal compactness, in addition to not being rooted on any fundamental physical principle, can be an unnecessary source of artificial complications. Maximally compact conformations are actually poor choices for native structures in the case of square lattice models because most of the monomers make not the maximal or minimum number of contacts but something in between. It has also been noted, even for the case of real proteins, that the assumption of maximal compactness is not well justified. Globular proteins are certainly compact, but their shapes differ significantly from perfect spheres (reviewed in ref. 19). This study suggests that this relative compactness is likely to be an indirect effect of the principle of inside/outside segregation and not of fundamental importance in itself.

Finally, the generality of the theoretical analysis suggests that inside/outside segregation could well be involved in many structural properties observed in real proteins. Although these initial computational results do not permit such generalizations, it is not unreasonable to speculate that segregation can be related to the organization of long chains in independent domains, for example, or even to properties of fibrous proteins, such as the association of long helices to form coiled coils. Further studies involving a systematic exploration of conformational space of different chain sizes, in two and three dimensions, as well as a direct computation of some quantity equivalent to σ in real structures, will be required to elucidate these fundamental questions concerning the generality and applicability of the principle of inside/outside segregation in protein structures.

Acknowledgments

I am grateful to Fernando Oliveira and Georgios Pappas, Jr. for useful discussions and to Tom Pochapsky and Eugene Shakhnovich for their critical reading of the manuscript. This study was supported by the Brazilian Conselho Nacional de Pesquisa (CNPq).

Footnotes

This paper was submitted directly (Track II) to the PNAS office.

References

  • 1.Goldstein R, Luthey-Schulten Z, Wolynes P. Proc Natl Acad Sci USA. 1992;89:4918–4922. doi: 10.1073/pnas.89.11.4918. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Šali A, Shakhnovich E, Karplus M. Nature (London) 1994;369:248–251. doi: 10.1038/369248a0. [DOI] [PubMed] [Google Scholar]
  • 3.Shakhnovich E I. Phys Rev Lett. 1994;72:3907–3910. doi: 10.1103/PhysRevLett.72.3907. [DOI] [PubMed] [Google Scholar]
  • 4.Bryngelson J, Onuchic J, Socci N, Wolynes P. Proteins: Struct Funct Genet. 1995;21:167–195. doi: 10.1002/prot.340210302. [DOI] [PubMed] [Google Scholar]
  • 5.Bowie T U, Luthy R, Eisenberg D. Science. 1991;253:164–169. doi: 10.1126/science.1853201. [DOI] [PubMed] [Google Scholar]
  • 6.Gutin A, Abkevich V, Shakhnovich E. Proc Natl Acad Sci USA. 1995;92:3066–3076. doi: 10.1073/pnas.92.5.1282. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Abkevich V I, Gutin A M, Shakhnovich E I. J Mol Biol. 1995;252:460–471. doi: 10.1006/jmbi.1995.0511. [DOI] [PubMed] [Google Scholar]
  • 8.Bryngelson J, Wolynes P. Proc Natl Acad Sci USA. 1987;84:7524–7528. doi: 10.1073/pnas.84.21.7524. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Shakhnovich E, Gutin A. Biophys Chem. 1989;34:187–199. doi: 10.1016/0301-4622(89)80058-4. [DOI] [PubMed] [Google Scholar]
  • 10.Derrida B. Phys Rev B. 1981;24:2613–2626. [Google Scholar]
  • 11.Shakhnovich E, Gutin A. Proc Natl Acad Sci USA. 1993;90:7195–7199. doi: 10.1073/pnas.90.15.7195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Abkevich V I, Gutin A M, Shakhnovich E I. J Chem Phys. 1994;101:6052–6062. [Google Scholar]
  • 13.Socci N, Onuchic J. J Chem Phys. 1994;101:1519–1528. [Google Scholar]
  • 14.Socci N, Onuchic J. J Chem Phys. 1995;103:4732–4744. [Google Scholar]
  • 15.Pereira de Araújo A F, Pochapsky T C. Folding Design. 1996;1:299–314. doi: 10.1016/S1359-0278(96)00043-0. [DOI] [PubMed] [Google Scholar]
  • 16.Onuchic J N, Luthey-Schulten Z, Wolynes P G. Annu Rev Phys Chem. 1997;48:545–600. doi: 10.1146/annurev.physchem.48.1.545. [DOI] [PubMed] [Google Scholar]
  • 17.Pande V S, Grosberg A, Tanaka T. Biophys J. 1997;73:3192–3210. doi: 10.1016/S0006-3495(97)78345-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Shakhnovich E I. Curr Opin Struct Biol. 1997;7:29–40. doi: 10.1016/s0959-440x(97)80005-x. [DOI] [PubMed] [Google Scholar]
  • 19.Dill K, Bronberg S, Yue K, Fiebig K, Yee D, Thomas P D, Chan H S. Protein Sci. 1995;4:561–602. doi: 10.1002/pro.5560040401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Li H, Tang C, Wingreen N S. Proc Natl Acad Sci USA. 1998;95:4987–4990. doi: 10.1073/pnas.95.9.4987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Metropolis N, Rosembluth A, Rosembluth M, Teller A. J Chem Phys. 1953;21:1087–1092. [Google Scholar]
  • 22.Betancourt M, Onuchic J. J Chem Phys. 1995;103:773–787. [Google Scholar]
  • 23.Privalov P L. In: Protein Folding. Creighton T E, editor. New York: Freeman; 1992. [Google Scholar]
  • 24.Murphy K P, Bhakuni V, Xie D, Freire E. J Mol Biol. 1992;227:293–306. doi: 10.1016/0022-2836(92)90699-k. [DOI] [PubMed] [Google Scholar]
  • 25.Makhatadze G I, Privalov P L. J Mol Biol. 1993;232:639–659. doi: 10.1006/jmbi.1993.1416. [DOI] [PubMed] [Google Scholar]
  • 26.Makhatadze G I, Privalov P L. J Mol Biol. 1993;232:660–679. doi: 10.1006/jmbi.1993.1417. [DOI] [PubMed] [Google Scholar]
  • 27.Chang H S, Dill K A. Annu Rev Biophys Biomol Struct. 1997;26:425–459. doi: 10.1146/annurev.biophys.26.1.425. [DOI] [PubMed] [Google Scholar]
  • 28.Pochapsky T C, Gopen Q. Protein Sci. 1992;1:786–795. doi: 10.1002/pro.5560010610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Pereira de Araújo A F, Pochapsky T C, Joughin B. Biophys J. 1999;76:2319–2328. doi: 10.1016/S0006-3495(99)77389-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Li H, Tang C, Wingreen N S. Phys Rev Lett. 1997;79:765–768. [Google Scholar]
  • 31.Miyazawa S, Jerningan R. Macromolecules. 1985;18:534–552. [Google Scholar]
  • 32.Yue K, Fiebig K M, Thomas P D, Chang H S, Shakhnovich E I, Dill K A. Proc Natl Acad Sci USA. 1995;92:325–329. doi: 10.1073/pnas.92.1.325. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Li H, Helling R, Tang C, Wingreen N S. Science. 1996;273:666–669. doi: 10.1126/science.273.5275.666. [DOI] [PubMed] [Google Scholar]

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES