Proceedings of the National Academy of Sciences of the United States of America
2008 Feb 27;105(9):3321–3326. doi: 10.1073/pnas.0712240105

Assessing the solvent-dependent surface area of unfolded proteins using an ensemble model

Haipeng Gong, George D Rose
PMCID: PMC2265189  PMID: 18305164

Abstract

We present a physically rigorous method to calculate solvent-dependent accessible surface areas (ASAs) of amino acid residues in unfolded proteins. ASA values will be larger in a good solvent, where solute–solvent interactions dominate and promote chain extension. Conversely, they will be smaller in a poor solvent, where solute–solute interactions dominate and promote chain collapse. In the method described here, these solvent-dependent effects are modeled by Boltzmann-weighting a simulated ensemble for solvent quality—good or poor. Solvent quality is parameterized as intramolecular hydrogen bond strength, using a “hydrogen bond dial” that can be varied from “off” to “high” (i.e., from 0 to −6 kcal/mol per hydrogen bond). When plotted as a function of hydrogen bond strength, the Boltzmann-weighted distribution of conformers describes a sigmoidal curve, with a transition midpoint near 1.5 kcal/mol per hydrogen bond. ASA tables for the 20 residues are provided under good solvent conditions and at this transition midpoint. For the backbone, these midpoint ASA values are found to be in good agreement with the earlier estimate of unfolded state ASA given by the mean of Creamer's upper and lower bounds [Creamer TP, et al. (1997) Biochemistry 36:2832–2835], a gratifying result in that cosolvents of experimental interest, such as urea (good solvent) and trimethylamine N-oxide (poor solvent), are known to affect the backbone predominantly. Unanticipated results from our simulations predict that a significant population of three-residue, hydrogen-bonded turns (inverse γ-turns) will be detectable in blocked polyalanyl heptamers in poor solvent—an experimentally verifiable conjecture.

Keywords: γ-turns, hydrogen bonding, protein folding, unfolded state


Globular proteins fold to uniquely ordered, biologically relevant conformers under physiological conditions, but they unfold at high temperature, high pressure, extremes of pH, or in the presence of denaturing solvents. The transition between these states, N(ative) ⇌ U(nfolded), is spontaneous, cooperative, and reversible (1). With some exceptions (2), the N state is the biologically relevant form, and it is amenable to structural characterization (3). In contrast, the U state can only be described by a statistical model (4–6) and characterized in terms of averaged dimensions, for example, its mean radius of gyration (〈Rg〉) or its mean-squared end-to-end distance (〈L²〉) (7).

In addition to these classical measures, a reliable estimate of accessible surface area (ASA) (8) in the unfolded state has become a pressing need of late, motivated particularly by recent work that requires it (9, 10). Calculation of native-state ASA is straightforward in proteins of known structure (11–13), but corresponding values in the unfolded state are model-dependent. Creamer et al. (14, 15) sought to estimate the ASA of residues in the unfolded state by bracketing it between two reliable extremes, an upper limit modeled on an extended polypeptide in good solvent and a lower limit extracted from chain segments in proteins of known structure. As a practical measure, Schellman (16) took the mean of these two extremes, and Auton and Bolen (9) followed suit. However, simply using the numerical average is unsatisfying because it lacks a rigorous physical basis. In an innovative approach, Goldenberg (17) simulated populations of protein-length chains by adapting a standard software package that had been developed originally to generate three-dimensional models from NMR-derived distance constraints. But this approach is not entirely satisfying either because a disproportionately large fraction of the residues fall within sterically restricted regions of conformational space (see table 1 in ref. 17).

Table 1.

Size and mean radius of gyration for generated ensembles

Length*                 2         3         4         5         6         7
No. of conformers†      13,000    24,000    60,000    192,000   998,000   7,333,000
Calculated 〈Rg〉 (Å)‡    3.013     3.727     4.418     5.084     5.722     6.343
Predicted 〈Rg〉 (Å)§     3.031     3.866     4.595     5.253     5.860     6.428

*Ensembles were generated for Ac-(Ala)n-Nme peptides, with n ranging from 2 to 7.

†The number of generated conformers in the ensemble.

‡The calculated mean radius of gyration, 〈Rg〉, for the ensemble.

§The value of 〈Rg〉 predicted for a statistical coil of corresponding length in good solvent from Eq. 1, using a prefactor of R0 = 2.0 and an exponent of ν = 0.6.
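As a quick consistency check, the short Python sketch below recomputes the "Predicted 〈Rg〉" row of Table 1 from Eq. 1 with the footnoted values R0 = 2.0 and ν = 0.6. It assumes that N in Eq. 1 is simply n, the number of alanine residues (blocking groups not counted); that assumption is ours, but it reproduces the tabulated values to within rounding.

```python
# Recompute the "Predicted <Rg>" row of Table 1 from Eq. 1: <Rg> = R0 * N**nu.
# Assumption (not stated in the text): N is taken to be n, the number of Ala
# residues in Ac-(Ala)n-Nme, with the blocking groups not counted.

R0 = 2.0   # prefactor from the Table 1 footnote
NU = 0.6   # good-solvent exponent from the Table 1 footnote


def predicted_rg(n: int, r0: float = R0, nu: float = NU) -> float:
    """Mean radius of gyration (angstroms) predicted by the power law of Eq. 1."""
    return r0 * n ** nu


for n in range(2, 8):
    print(n, f"{predicted_rg(n):.3f}")
```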

Here, we describe a physically based method to calculate both backbone and side-chain residue surface areas. The desired data are extracted from an ensemble of clash-free, hydrogen bond-satisfied peptides, generated by randomly varying backbone dihedral angles (including the ω-angle) within sterically allowed regions of conformational space (φ,ψ-regions). Hydrogen bond partners for polar groups in members of this ensemble are provided by either the backbone or water. The size of the ensemble needed to achieve statistical significance is determined by stringent convergence criteria, as described in Methods. Resulting radii of gyration and end-to-end distances of this ensemble are shown to be Gaussian-distributed, with averages as expected for randomly coiled peptides of corresponding length (4).

By using this model, ASA values can be scaled to reflect solvent quality—good or poor—by Boltzmann-weighting the ensemble according to hydrogen bond strength. Results from these simulations predict that a significant population of inverse γ-turns (C7-equatorial) will be found in blocked polyalanyl heptamers in poor solvent, an experimentally verifiable conjecture.

Results

Description of Simulations.

Representative ensembles of clash-free, hydrogen bond-satisfied conformers were generated for successive blocked polyalanyl peptides, ranging in length from monomers to seven-mers, by Monte Carlo fragment assembly using a nine-state model. Hydrogen bond satisfaction for backbone polar groups is provided by either intramolecular partners or solvent. At each peptide length, the backbone accessible surface area (ASA) was calculated, an additional peptide unit was added, and a new round of simulation was initiated. This cycle was repeated until the ASA reached a plateau, which occurred by n = 7, as shown in Fig. 1. Perhaps surprisingly, a blocked seven-mer is sufficient to assess unfolded-state ASA.

Fig. 1.

Ensemble-based backbone ASA. The average backbone accessible surface area, 〈ASA〉, of the central residue in peptide ensembles is plotted as a function of chain length. Peptides are of the form Ac-(Ala)n-Nme, with n ranging from monomers to seven-mers and Nme being N-methylamide. When n is an even number, 〈ASA〉 is averaged over the two central residues. Every conformer in each ensemble is clash-free, with backbone polar groups that form isoenergetic hydrogen bonds, either to other backbone partners or to solvent. From the plot, it is apparent that backbone 〈ASA〉 attains a plateau by n = 7.

Conformers of desired length were generated using our previously described fragment-based method (18, 19), although several improvements to the earlier work are introduced here. In essence, conformation space is divided into mesostates, each of which corresponds to a 70° × 70° grid square on the φ,ψ-map (Fig. 2a). These nine states cover >90% of the high-resolution (≤2 Å, R ≤ 0.25) nonredundant (20) coil library (21), a repository of conformers other than α-helices and β-strands excised from proteins of known structure (20) and taken as a model for unfolded-state conformers (22, 23). The remaining 10% of conformers not covered by these nine mesostates lie on the right-hand side of the φ,ψ-map (i.e., φ > 0°), and 77% of these are glycine residues.

Fig. 2.

Nine-state model. (a) Major populated regions of the φ,ψ-map are subdivided into nine, 70° × 70° grid squares (labeled A–I), termed mesostates. Approximately 90% of the coil library (21) is included within these nine regions; 77% of the remaining excluded residues are contributed by glycines that have φ > 0. (b) A scatter plot of all φ,ψ-angles from simulations using the nine-state model for n = 7. Every φ,ψ-pair visited during the simulation is marked by a point on the map, each from a clash-free, hydrogen bond-satisfied peptide conformer. Sampled regions closely resemble both the classical Ramachandran diagram (55) and the nine-state model.

Clash-free, hydrogen bond-satisfied (by either backbone or water) conformers were generated at random from fragments in a fragment library. This library of six-residue fragments was constructed from a computational procedure, not from a structural database, using a method described in ref. 24. A six-residue fragment length was chosen because systematic local steric clash can span six consecutive residues in polypeptides (24–26). The fragment-assembly algorithm was devised to generate conformers of any desired length from an unbiased fusion of library fragments. Each conformer was then mapped into its corresponding mesostate sequence, a many:one mapping. Consequently, the distribution of companion mesostate sequences mirrors the underlying distribution of torsion angles in the fragment library and is suitable for use in evaluating when the stopping criteria have been met.

The size of the ensemble needed for statistical significance at each peptide length is based on two rigorous stopping criteria: (i) additional conformers are generated at random until the entropy of the conformational space spanned by the ensemble attains equilibrium, and (ii) the mesostate sequences of at least 95% of the previous 1,000 conformers have already been sampled. Ensemble size expands rapidly with peptide length, reaching 7,333,000 conformers for blocked seven-residue peptides (Table 1). A plot of all φ,ψ-values sampled by this ensemble is shown in Fig. 2b, and a full description of the algorithm is given in Methods.

Using these criteria, each generated ensemble closely approximates a normal distribution (Fig. 3), with a calculated mean radius of gyration, 〈Rg〉, as expected for a statistical coil of that length (Table 1). In greater detail, the well known power-law relationship (27) between chain length and 〈Rg〉 for a random coil with excluded volume in good solvent is given by the equation

$$\langle R_g \rangle = R_0 N^{\nu} \qquad (1)$$

where N is the chain length in monomer units, R0 is a prefactor related to the persistence length, and ν is the characteristic exponent that depends on solvent quality. When best-fit to this equation, our 〈Rg〉 values are

[Eq. 2: Eq. 1 with the best-fit values of R0 and ν; equation image not reproduced]

Both the prefactor and exponent are close to those reported by Kohn et al. (1.330 and 0.605, respectively) from small-angle x-ray scattering measurements of 26 denatured proteins (28).
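A best fit of this kind can be obtained by ordinary least squares on log〈Rg〉 versus log N. The sketch below applies that to the calculated row of Table 1, again assuming N = n; the log-space fit is an illustrative choice and not necessarily the procedure behind the published fit. With these six points it returns an exponent close to 0.6.

```python
import math

# Calculated <Rg> values (angstroms) from Table 1, Ac-(Ala)n-Nme ensembles.
N_VALUES = [2, 3, 4, 5, 6, 7]
RG_CALC = [3.013, 3.727, 4.418, 5.084, 5.722, 6.343]


def fit_power_law(ns, rgs):
    """Least-squares fit of log<Rg> = log R0 + nu * log N; returns (R0, nu).

    Fitting in log space is an assumption made for this sketch.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in rgs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    nu = sxy / sxx                 # slope of the log-log line
    r0 = math.exp(my - nu * mx)    # intercept back-transformed to a prefactor
    return r0, nu


r0, nu = fit_power_law(N_VALUES, RG_CALC)
print(f"R0 = {r0:.3f}, nu = {nu:.3f}")
```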

Fig. 3.

Radii of gyration from simulations. Radii of gyration, Rg, from Ac-(Ala)n-Nme ensembles for n = 4, 5, 6, and 7 (colored black, red, green, and blue, respectively). Data are displayed as probability densities (like smoothed histograms): the distribution of Rg values has been normalized to unity in these equal-area curves. Each generated ensemble closely approximates a normal distribution, with a mean 〈Rg〉 near the value expected for a statistical coil of corresponding length in good solvent (see Table 1).

Dialing-in Hydrogen Bond Energy.

If a randomly generated conformer juxtaposes a backbone donor and acceptor and has acceptable hydrogen bond geometry, the criterion for steric clash is relaxed, and the groups are classified as hydrogen bond-satisfied. Consequently, unweighted hydrogen-bonded pairs are present in the initial population, and the ensemble can then be reweighted based on an assigned intramolecular hydrogen bond energy, which is varied between 0 and −6 kcal/mol.

When the Boltzmann-weighted distribution of conformers is plotted as a function of the assigned hydrogen bond energy, characteristic measures such as 〈Rg〉 and 〈end-to-end distance〉 describe a sigmoidal transition with a midpoint near 1.5 kcal/mol per hydrogen bond (Fig. 4). This midpoint approximates experimentally determined values (29–31), although perhaps fortuitously so, given the omission of all energetic terms in our simulations except sterics and hydrogen bonding. Regardless, the transition midpoint is similar for all parameters of interest and provides a self-consistent reference point at which to characterize the hydrogen bond-weighted population.
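One simple way to locate such a midpoint on a profile sampled at discrete hydrogen bond energies is to find where the curve crosses halfway between its two end values and interpolate linearly between grid points. The sketch below does that; the logistic profile used to exercise it is hypothetical and is not data from Fig. 4. Energies are kept negative, following the dial convention (0 to −6 kcal/mol).

```python
import math


def transition_midpoint(eps, values):
    """Energy at which <X>(eps) crosses halfway between its end values.

    eps    : hydrogen bond energies in the order the dial is turned (0, -0.1, ...)
    values : ensemble-averaged parameter at each energy
    Uses linear interpolation between the two bracketing grid points.
    """
    half = 0.5 * (values[0] + values[-1])
    for k in range(1, len(values)):
        a, b = values[k - 1] - half, values[k] - half
        if a == 0:
            return eps[k - 1]
        if a * b < 0 or b == 0:
            t = a / (a - b)
            return eps[k - 1] + t * (eps[k] - eps[k - 1])
    raise ValueError("profile never crosses its half-way value")


# Hypothetical sigmoidal <Rg> profile centred at -1.5 kcal/mol (illustration only).
eps_grid = [-0.1 * k for k in range(61)]  # 0 to -6 in steps of 0.1
rg_profile = [4.5 + 2.0 / (1 + math.exp(-(e + 1.5) / 0.3)) for e in eps_grid]
print(f"midpoint ~ {transition_midpoint(eps_grid, rg_profile):.2f} kcal/mol")
```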

Fig. 4.

Characteristic parameters. Mean values of Rg (Å), end-to-end distance (Å), and backbone and side-chain ASA (Å²) are shown for the Ac-(Ala)7-Nme ensemble, plotted as a function of the Boltzmann-weighted hydrogen bond strength in nominal units of kilocalories per mole, as it is “dialed” upward from 0. All plots are approximately sigmoidal and have midpoints ≈1.5 kcal/mol per H bond, near the value of an experimentally determined intramolecular hydrogen bond (30, 31). At unrealistically strong hydrogen bond energies, a small number of unusual conformers can dominate the Boltzmann-weighted population, perturbing the sigmoid curve, as seen in the 〈ASAbackbone〉 plot. In this particular case, the perturbation is a singularity that results from overweighting a small subpopulation with multiple sites of self-interaction, and it is length-specific; the corresponding plot for six-mers is smoothly sigmoidal (data not shown). In practice, such singularities are not a problem under more realistic solvent conditions. As seen in the four profiles, the transition midpoint is similar for all ensemble-averaged parameters of interest, and it provides a physically based reference point.

Surface Area of Residues in the Unfolded State.

As shown in Fig. 1, backbone ASA for a blocked polyalanyl peptide reaches a plateau by n = 7. Accordingly, the central residue in an Ac-(Ala)3-Xaa-(Ala)3-Nme peptide is used to estimate the average accessible surface area of unfolded state residues, where Xaa is any of the 20 possible residues. Backbone and side-chain ASA values for each residue type are listed in Table 2.

Table 2.

Backbone and side-chain accessible surface area (Å²)

        This work (ε in kcal/mol)*          Creamer's values†
        Backbone          Side chain        Backbone                  Side chain
Residue ε = 0   ε = −1.5  ε = 0   ε = −1.5  Upper   Lower   Average   Upper   Lower   Average
Ala     38.8    33.6      63.1    60.2      35.9    19.8    27.85     63.6    46.6    55.10
Arg     32.0    24.7      198.3   185.2     33.0    17.1    25.05     185.3   156.9   171.10
Asn     34.6    28.8      90.0    84.3      32.7    17.6    25.15     95.6    84.5    90.05
Asp     33.5    27.1      103.6   99.4      33.9    18.1    26.00     94.8    79.2    87.00
Cys     34.0    28.0      97.0    94.0      34.5    18.2    26.35     83.0    62.9    72.95
Gln     32.6    25.0      119.4   113.7     33.4    17.2    25.30     128.7   105.0   116.85
Glu     32.7    26.0      133.6   130.8     33.5    17.9    25.70     123.9   102.8   113.35
Gly     74.9    67.9      0.0     0.0       75.7    54.6    65.15     0.0     0.0     0.00
His     31.4    27.0      144.4   140.0     33.4    14.9    24.15     119.1   103.9   111.50
Ile     27.7    22.5      139.4   135.6     24.7    15.2    19.95     134.1   100.1   117.10
Leu     30.3    24.6      143.3   139.5     30.7    14.7    22.70     117.7   101.4   109.55
Lys     32.1    26.5      170.5   160.5     33.8    18.3    26.05     158.8   142.5   150.65
Met     32.5    26.6      150.5   147.2     33.8    16.7    25.25     139.5   105.3   122.40
Phe     31.3    22.0      166.8   166.6     33.3    15.3    24.30     139.8   118.7   129.25
Pro     27.5    24.4      103.5   101.4     26.1    18.9    22.50     90.5    83.5    87.00
Ser     36.0    30.3      75.6    71.1      35.0    23.8    29.40     73.3    59.7    66.50
Thr     30.8    25.2      101.7   96.6      29.5    18.6    24.05     91.2    77.3    84.25
Trp     30.0    23.2      209.2   202.9     32.0    15.1    23.55     158.4   154.7   156.55
Tyr     31.3    26.5      182.5   179.2     33.5    17.7    25.60     152.3   131.0   141.65
Val     28.7    23.2      114.8   111.5     24.9    15.9    20.40     110.9   81.8    96.35

Mean accessible surface areas, 〈ASAbackbone〉 and 〈ASAside chain〉, for the central residue of the Ac-(Ala)3-X-(Ala)3-Nme ensemble, where X is any of the 20 residues.

*Hydrogen bond strength (in kcal/mol per hydrogen bond) is “dialed” from “off” to “high” as solvent quality is shifted from good to poor. Values are given at “off” and near the transition midpoint, at −1.5 (see Fig. 4).

†Upper and lower bounds for 〈ASA〉 from the Creamer models (14, 15), and the arithmetic average of these two extrema (see Fig. 5).

Overall, the backbone values in Table 2 at ε = 0 are similar to Creamer's upper bounds (15). In some cases, they even exceed Creamer's extended-conformation upper limit by a few square ångstroms, because peptide flexibility in our ensemble model allows for greater curvature and, hence, slightly greater backbone exposure than a completely extended conformation. Unlike backbone values, side-chain ASA has not yet reached a plateau at n = 7; therefore, the side-chain values listed in Table 2 can only be regarded as upper limits.

Of course, ASA values are solvent-dependent. A good solvent like urea interacts favorably with the peptide backbone (10, 32–34), promoting extended conformations (35). Conversely, a poor solvent like trimethylamine N-oxide (TMAO) interacts unfavorably with the backbone, promoting contracted conformations (36) together with associated intramolecular hydrogen bonding (37). The issue of whether aqueous buffer alone is a good or poor solvent remains a topic of considerable controversy.

To capture solvent dependence, parameters of interest, 〈X〉, were reweighted using an intramolecular hydrogen bond potential, ε, that ranges from 0 to −6 kcal/mol in decrements of 0.1 kcal/mol:

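$$\langle X \rangle = \frac{1}{Z} \sum_{i=1}^{N} X_i \, e^{-\beta \varepsilon H_i}, \qquad Z = \sum_{i=1}^{N} e^{-\beta \varepsilon H_i} \qquad (3)$$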

where Xi is the parameter value in conformation i, ε is the hydrogen bond energy in kilocalories per mole, Hi is the number of intramolecular hydrogen bonds in conformation i, N is the number of conformers in the ensemble, R is the gas constant, T is the absolute temperature, and β = 1/RT is taken at T = 300 K. This sum is normalized by the partition function, Z, the Boltzmann-weighted sum over all conformers in the canonical ensemble, i.e., Z = Σi exp(−βεHi). The value ε = 0 represents an unweighted average, in which conformers satisfying hydrogen bond geometry are present but make no energetic contribution to the ensemble; increasingly negative ε-values correspond to increasingly poor solvent and correspondingly stronger backbone–backbone hydrogen bonds. The transition from good to poor solvent for parameters of interest in Fig. 4 is consistent with the coil-globule transition observed in the denatured state of protein L on transfer from denaturing solvent to folding conditions (38).
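Eq. 3 can be evaluated directly from per-conformer values Xi and hydrogen bond counts Hi. The sketch below is a minimal Python version, assuming R = 1.987204 × 10⁻³ kcal/(mol·K) and T = 300 K; subtracting the largest log-weight before exponentiating is a standard numerical safeguard (the shift cancels between the numerator and Z) rather than part of the published method. The five-conformer ensemble is made up for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def reweighted_mean(x, h, eps, temperature=300.0):
    """Boltzmann-reweighted ensemble average <X> of Eq. 3.

    x   : parameter value X_i of each conformer (e.g. Rg or ASA)
    h   : number of intramolecular hydrogen bonds H_i in each conformer
    eps : hydrogen bond energy in kcal/mol (0 = dial off, negative = stronger)
    """
    beta = 1.0 / (R_KCAL * temperature)
    logw = [-beta * eps * hi for hi in h]      # log of exp(-beta*eps*H_i)
    m = max(logw)                              # shift to avoid overflow
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)                                 # partition function (shifted)
    return sum(wi * xi for wi, xi in zip(w, x)) / z


# Hypothetical ensemble: extended conformers without H bonds, compact ones with 1-2.
rg = [7.0, 6.8, 6.5, 5.2, 4.9]
hb = [0, 0, 0, 1, 2]
for eps in (0.0, -1.5, -6.0):
    print(eps, round(reweighted_mean(rg, hb, eps), 3))
```

At ε = 0 the result is the plain mean; as ε becomes more negative, the conformers with the most hydrogen bonds dominate the average.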

For calculations that require unfolded state ASA values under folding conditions, we propose using the seven-mer ensemble, weighted by a hydrogen bond energy of −1.5 kcal/mol. This value corresponds to an approximate transition midpoint for the parameters shown in Fig. 4, akin to a θ-point (ref. 4, p 34), where expanded conformers counterbalance contracted conformers. The average per-residue backbone and side-chain ASA values calculated at this hydrogen bond energy are listed in Table 2. Backbone values are in close agreement with the average of Creamer's upper and lower bounds (15), as illustrated in Fig. 5, and are therefore also close to Schellman's estimate (16) used by Auton and Bolen (9, 10). As noted, however, side-chain values have not quite leveled off at n = 7, so the calculated values are often larger than the mean of Creamer's upper and lower bounds. Nevertheless, these side-chain ASAs can serve to refine the estimated upper bounds at the transition midpoint.

Fig. 5.

Comparison of 〈ASAbackbone〉 with the Creamer extrema. Blue dots represent the Boltzmann-weighted, ensemble-averaged values of backbone 〈ASA〉 at the transition midpoint for Ac-(Ala)7-Nme peptides. In every case, these values fall within the interval between Creamer's upper (downward-pointing red triangles) and lower (upward-pointing red triangles) bounds (15).
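The comparison in Fig. 5 can be reproduced for individual rows of Table 2. The sketch below takes a handful of residues, computes the mean of Creamer's bounds, and checks whether the midpoint (ε = −1.5) backbone value lies inside the interval; the residue subset is an arbitrary choice, and the numbers are copied from Table 2.

```python
# Selected rows of Table 2, backbone ASA in square angstroms:
# (midpoint value at -1.5 kcal/mol, Creamer upper bound, Creamer lower bound)
BACKBONE = {
    "Ala": (33.6, 35.9, 19.8),
    "Gly": (67.9, 75.7, 54.6),
    "Ile": (22.5, 24.7, 15.2),
    "Pro": (24.4, 26.1, 18.9),
    "Trp": (23.2, 32.0, 15.1),
}


def compare_with_creamer(table):
    """For each residue return (Creamer mean, midpoint inside bounds?, midpoint - mean)."""
    out = {}
    for res, (mid, upper, lower) in table.items():
        mean = 0.5 * (upper + lower)
        out[res] = (round(mean, 2), lower <= mid <= upper, round(mid - mean, 2))
    return out


for res, (mean, inside, diff) in compare_with_creamer(BACKBONE).items():
    print(f"{res}: Creamer mean {mean:6.2f}, inside bounds: {inside}, midpoint - mean = {diff:+.2f}")
```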

γ-Turns in the Unfolded Population.

At the transition midpoint, the reweighted φ,ψ-map has a significant population in the γ-basin (Fig. 6a), resembling the corresponding basin from the coil library (Fig. 6b). In the coil library, the γ-basin is centered at φ,ψ = −85°, 78°, adjacent to the PII basin, centered at φ,ψ = −65°, 141°. Unlike a three-residue turn of PII helix, which lacks intramolecular hydrogen bonds (39), a residue in γ-conformation (i.e., C7-equatorial) at position i engenders a three-residue, hydrogen-bonded inverse γ-turn: NH (i + 1)⋯O=C (i − 1) (40). The emergence of a pronounced γ-basin in these simulations is insensitive to the particular choice of hydrogen bond energy; any value within a broad range would suffice because a seven-residue peptide can realize a sterically accessible, hydrogen-bonded conformer in only a few conformational classes (e.g., γ-, β-, or α-helical turn), of which the γ-turn pays the smallest price in conformational entropy. The agreement between these simple simulations and data from solved structures in the coil library is consistent with the supposition that the γ-population, observed in experimentally determined structures, is promoted by local hydrogen bonding. If this supposition is correct, blocked polyalanyl heptamers in poor solvent will populate γ-turns in detectable measure.

Fig. 6.

Weighted φ,ψ-maps from simulation and experiment. (a) An energy-weighted map of the Ac-(Ala)7-Nme ensemble with hydrogen-bond energy set to its value at the transition midpoint in Fig. 4. (b) A population-weighted map of conformers from the coil library (21). These contour maps are partitioned into a 2° × 2° grid; grid squares are colored by the total number of conformers they contain. Colors are indicated on the rainbow scale, to the right of each figure. In energy-weighted simulations (a), the only conspicuous region of high population density corresponds to inverse γ-turns (i.e., C7-equatorial), a three-residue, hydrogen-bonded NH (i + 1)⋯O=C (i − 1) conformation (40). A distinct peninsula that also corresponds to inverse γ-turns is observed in experimentally derived structures (b).
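Maps like Fig. 6 amount to a weighted two-dimensional histogram on a 2° × 2° grid. A possible sketch with NumPy's histogram2d follows. How the energy weighting is applied is our assumption: each residue is given the Boltzmann factor of the conformer it belongs to. The random sample and the extra weight near the γ-basin centre (φ,ψ = −85°, 78°) are purely illustrative.

```python
import numpy as np


def weighted_phi_psi_map(phi, psi, weights, bin_deg=2.0):
    """Weighted population map on a bin_deg x bin_deg grid over the phi,psi plane.

    phi, psi : dihedral angles in degrees, in [-180, 180)
    weights  : per-residue weights, e.g. exp(-beta*eps*H) of the parent conformer
               (an assumption about how energy weighting is applied)
    """
    nbins = int(round(360.0 / bin_deg))
    counts, phi_edges, psi_edges = np.histogram2d(
        phi, psi, bins=nbins,
        range=[[-180.0, 180.0], [-180.0, 180.0]],
        weights=weights,
    )
    return counts, phi_edges, psi_edges


# Hypothetical sample: residues near the gamma basin carry a larger weight.
rng = np.random.default_rng(0)
phi = rng.uniform(-180, 180, 1000)
psi = rng.uniform(-180, 180, 1000)
w = np.where(np.hypot(phi + 85, psi - 78) < 20, 10.0, 1.0)
counts, _, _ = weighted_phi_psi_map(phi, psi, w)
print(counts.shape)  # 180 x 180 cells for 2-degree bins
```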

Discussion

For solutes like peptides that undergo solvent-dependent conformational changes (41, 42), characteristic system parameters like 〈ASA〉 and 〈Rg〉 are underdetermined without an explicit specification of solvent conditions. In particular, the population of intramolecular hydrogen-bonded structures in peptides depends on solvent quality, defined as the character of some solvent relative to a reference solvent, typically aqueous buffer. A good solvent favors enhanced solute–solvent interactions; a poor solvent favors enhanced solute–solute interactions. Cosolvents that enhance solvent quality, like urea, promote backbone–solvent hydrogen bonding, whereas cosolvents that reduce solvent quality, like TMAO, promote backbone–backbone hydrogen bonding (37, 43). Consistent with these experimental data, peptide–solvent interactions are known to have a dramatic effect on the backbone but only a comparatively small effect on side chains (9, 10, 32–34).

In essence, these conformation-dependent effects can be characterized as solvent-mediated competition between inter- and intramolecular hydrogen bonds (44). The ensemble approach developed here seeks to capture solvent conditions by a “hydrogen bond dial” that models solvent quality by regulating the energy assigned to an intramolecular hydrogen bond (25). As the dial is varied from “off” to “high” (i.e., 0 to −6 kcal/mol), parameters of interest trace sigmoidal curves with similar transition midpoints. Approximately 72% of the conformers in the ensemble of blocked polyalanyl seven-mers include at least one hydrogen-bonded configuration, resulting in a substantial population of contracted conformers at the transition midpoint.

Almost all of these hydrogen-bonded configurations are involved in peptide chain turns, predominantly inverse γ-turns (Fig. 6a) that form N–H (i + 1) ⋯ (i − 1) O=C hydrogen bonds (40). A single residue at position i with backbone dihedral angles in the γ-basin is sufficient to engender a γ-turn. Other recent studies also report a significant turn population in peptides (45–49) under various solvent conditions. Schweitzer-Stenner and Measey (49) report that the turn population is length-dependent, with only a small fraction in trimers and tetramers but a large fraction in seven-mers. Their focus is on β-turns, but the same conclusion would hold for inverse γ-turns.

There is an extensive earlier chemical literature on γ-turns; see, for example, refs. 50–52. More recently, Kallenbach and coworkers (45) found they can drive the short pentapeptide AcGGAGGNH₂ from PII toward an increasing population of hydrogen-bonded γ- or β-turns by using selected cosolvent additions that produce graduated reductions in solvent quality. Conversely, Creamer and coworkers (53) found that they can promote PII conformation by addition of the cosolvent urea, which enhances solvent quality by interacting with the protein backbone (10, 32–34) and competing with intramolecular hydrogen bonds (37).

Much recent work—both experimental and theoretical—reports a significant population of PII conformation in unfolded proteins; see ref. 54 and references therein. In retrospect, a source of unsuspected confusion in interpreting such results is the widespread tendency to overclassify γ-conformers into the PII-basin when subdividing the φ,ψ-map. Ramachandran et al. (55) distinguished PII conformation from 2.27-helix (their terminology for inverse γ-conformers), but this distinction has often been ignored in more recent work—regrettably so. The PII- and γ-basins occupy distinct, nonintersecting regions of the φ,ψ-map (Fig. 6b), and they are chemically distinct as well; only the latter forms intramolecular hydrogen-bonded structures.

We caution that the approach presented here is an imperfect model. The hydrogen bond dial eliminates the need for explicit hydrogen bond energies (intrapeptide, peptide–water, water–water), substituting instead the energy by which an intramolecular hydrogen bond exceeds that of a corresponding peptide–water hydrogen bond (29, 31). Further, for peptides in a good solvent, hydrogen-bonded conformers were accepted deliberately, albeit with energy = 0. Were the donor–acceptor distance of close approach constrained to be no less than the sum of their respective van der Waals radii, conformers would be more extended, on average, than those generated here. Such populations could be modeled readily by regenerating the ensemble using the current approach but with the criteria for steric collision imposed on all interatomic interactions, including donor–acceptor pairs. In this case, the hydrogen-bonding requirements of backbone polar groups would be satisfied exclusively by solvent. However, our emphasis here is on unfolded proteins under folding conditions, not under good-solvent extremes. Moreover, despite this caveat, the present model is probably sufficient for most practical purposes given that the 〈Rg〉 values in Table 1 exhibit only minor deviations from the expected averages for randomly coiled peptides (4).

We conclude this discussion by again underscoring the degree to which the unfolded state is a dynamic ensemble for which the values of familiar characteristic parameters depend on solvent quality. Seen in a larger context, our results suggest the need for a reassessment of hydrogen bonding in unfolded proteins under solvent conditions of experimental interest (29) such as molten globules (56, 57). The prediction of a population of γ-turns in short peptides, described in the preceding, is but one example. In general, it is apparent that protein–solvent interactions in the unfolded states can depart radically from the naive picture of a featureless backbone (24, 26) with polar groups that are satisfied exclusively by solvent (58).

Methods

Boltzmann-weighted ASAs were calculated for the middle residue, Xaa, in a blocked alanine-based host, N-acetyl-(Ala)k-Xaa-(Ala)k-N-methylamide; Xaa is any of the 20 residues. Backbone conformational entropy was calculated by using a corresponding polyalanyl peptide, Ac-(Ala)n-Nme, with n = 2k + 1. Backbone (φ, ψ, ω) and side-chain (χi) torsions were varied as described; bond lengths and angles, taken from the LINUS package (59, 60), were held constant at typical values.

Nine-State Model.

The φ,ψ-map (Fig. 2a) was subdivided into nine sterically allowed regions (gray areas labeled A to I), each 70° × 70°, called mesostates. Only 10% of residues in the coil library (21) fall outside these nine states; 77% of these outliers are glycines with φ > 0. Any backbone structure with known φ,ψ-angles can be mapped into a conformational string, that is, a linear sequence of mesostates chosen from these nine sterically allowed regions. In cases where none of the nine mesostates included a particular φ,ψ-pair, the residue in question was assigned to the closest mesostate. Conformational strings obtained in this way are used to estimate the effective size of the ensemble, as described next.
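The mapping from φ,ψ-angles to a conformational string can be sketched as follows. The nine mesostate centres below are hypothetical placeholders (the true 70° × 70° squares are those drawn in Fig. 2a), and the "closest mesostate" rule is interpreted as the smallest periodic distance from the point to each square, which is zero for a point inside it.

```python
import math

# Hypothetical mesostate centres (phi, psi) in degrees, labelled A-I. The real
# 70 x 70 degree squares are those in Fig. 2a; these are placeholders only.
MESOSTATES = {
    "A": (-145.0, 155.0), "B": (-75.0, 155.0),
    "C": (-145.0, 85.0),  "D": (-75.0, 85.0),
    "E": (-145.0, 15.0),  "F": (-75.0, 15.0),
    "G": (-145.0, -45.0), "H": (-75.0, -45.0),
    "I": (60.0, 40.0),
}
HALF_WIDTH = 35.0  # half of the 70-degree side


def wrap(delta):
    """Map an angular difference onto [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def distance_to_square(phi, psi, centre):
    """Periodic distance from (phi, psi) to a square; 0 if the point is inside."""
    dx = max(abs(wrap(phi - centre[0])) - HALF_WIDTH, 0.0)
    dy = max(abs(wrap(psi - centre[1])) - HALF_WIDTH, 0.0)
    return math.hypot(dx, dy)


def mesostate(phi, psi):
    """Label of the square containing (phi, psi), else of the closest square."""
    return min(MESOSTATES, key=lambda s: distance_to_square(phi, psi, MESOSTATES[s]))


def conformational_string(torsions):
    """Many:one map from a list of (phi, psi) pairs to a mesostate string."""
    return "".join(mesostate(phi, psi) for phi, psi in torsions)


print(conformational_string([(-65.0, 141.0), (-85.0, 78.0), (-60.0, -40.0)]))
```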

Entropy and the Size of Conformational Space.

Conformers of length n are generated at random by fragment assembly from a fragment library (described below). We seek a thermodynamically based estimator that measures the degree to which this generated ensemble of conformers, Ω, spans available conformational space. The entropy of the ensemble, S = R log Ω, is such a measure but suffers from the familiar problem that S increases without limit as the number of generated conformers increases. However, the number of possible conformational strings is fixed at 9ⁿ, and the probability of any given string is determined by the underlying distribution of its torsion angles in the fragment library.

Using this relationship, the desired estimator is obtained by mapping the generated conformers into their corresponding conformational strings. Then, taking each conformational string to represent a microstate, the ensemble entropy, S = R log Ω, can be calculated as

$$S = -R \sum_{i} p_i \log p_i \qquad (4)$$

where summation is over the ensemble of generated strings, Ω; pi is the probability that the conformer will adopt the ith conformational string; and R is the ideal gas constant. Accordingly, we use S/R=−Σi pi log pi to estimate the magnitude of conformational space.
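Eq. 4 can be estimated from the observed frequencies of conformational strings. The sketch below assumes natural logarithms and takes pi as the fraction of the ensemble carrying string i. Because at most 9ⁿ distinct strings exist, this estimate is bounded above by n log 9, which is why it can level off as the ensemble grows.

```python
import math
from collections import Counter


def string_entropy(strings):
    """S/R = -sum_i p_i log p_i over conformational strings (Eq. 4).

    p_i is estimated as the observed frequency of string i in the ensemble.
    """
    counts = Counter(strings)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Single-residue strings spread evenly over the nine mesostates give log 9.
print(string_entropy(list("ABCDEFGHI")))
```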

Construction of the Fragment Library.

A library of 78,761 six-residue fragments was constructed by using the Fitzkee method (24), a Monte Carlo-based simulation that provides clash-free conformers, with all backbone hydrogen-bonding groups satisfied either internally or by solvent. Atomic radii from ref. 60 were used here: C (sp2) = 1.5 Å, C (sp3) = 1.65 Å, O (sp2) = 1.35 Å, O (sp3) = 1.5 Å, N (sp2) = 1.35 Å, and H = 1.1 Å. All radii were scaled by 0.95. Water molecules were modeled as spheres with radii of 1.25 Å. When generating library conformers, backbone φ,ψ-angles were allowed to sample all conformational space; the ω-angle was chosen at random from a normal distribution with a mean of 180° and a standard deviation of 5°.
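The steric test implied by these radii can be sketched as a pairwise distance check. The rule below, in which two atoms clash when they are closer than the sum of their scaled radii, is our reading of the text and should be treated as an assumption, as should scaling the water radius along with the atomic radii. The `exempt` flag is a hypothetical stand-in for the relaxation applied to donor–acceptor pairs with acceptable hydrogen bond geometry.

```python
import math

# Radii (angstroms) listed in the text; "W" is the water sphere.
RADII = {
    "C_sp2": 1.50, "C_sp3": 1.65,
    "O_sp2": 1.35, "O_sp3": 1.50,
    "N_sp2": 1.35, "H": 1.10,
    "W": 1.25,
}
SCALE = 0.95  # scale factor applied to radii (assumed here to include water)


def clashes(type_a, xyz_a, type_b, xyz_b, exempt=False, scale=SCALE):
    """Assumed clash rule: atoms clash if closer than the sum of scaled radii.

    exempt : True for a donor-acceptor pair with acceptable hydrogen bond
             geometry, for which the clash criterion is relaxed
    """
    if exempt:
        return False
    cutoff = scale * (RADII[type_a] + RADII[type_b])
    return math.dist(xyz_a, xyz_b) < cutoff
```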

The Fragment Assembly Algorithm.

Peptides of length n were assembled from a library of pretested, clash-free, hydrogen bond-satisfied fragments, an approach that eliminates substantial sampling overhead. A fragment length of six residues was chosen because systematic local steric clash can span six consecutive residues (24–26). Conformers were generated as follows:

  1. Typically, fragment-assembly algorithms sample termini less frequently than central residues. To eliminate end-effect bias, five virtual residues are appended to either end of the fragment.

  2. A residue is chosen at random together with a six-residue window, itself chosen randomly from the six possible windows that include the chosen residue. Next, a randomly chosen six-residue library fragment is substituted for the six residues within the window. Backbone φ-, ψ-, and ω-angles of nonvirtual residues in this window are then varied; new values are selected from a normal distribution centered at existing values, with a standard deviation of 5° for φ and ψ and 2° for ω. However, torsions of glycine and proline residues, as well as side-chain torsions, are chosen at random, as in ref. 59. Coordinates are updated at each iteration and retained if clash-free and hydrogen bond-satisfied. Atom and water radii are the same as those used in constructing the fragment library.

  3. Step 2 is repeated until the torsion angles of every nonvirtual residue in the peptide have been replaced at least once.
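Steps 1–3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: torsions are stored as hypothetical per-residue [φ, ψ, ω] lists, the starting chain is an arbitrary extended conformation, `is_valid` is a hypothetical callback standing in for the clash and hydrogen-bond-satisfaction test on rebuilt coordinates, and the random resampling of glycine, proline, and side-chain torsions is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

N_VIRTUAL = 5  # virtual residues appended to each end (step 1)
WINDOW = 6     # fragment and window length

# Hypothetical representation: one [phi, psi, omega] list (degrees) per residue.
Chain = List[List[float]]
Fragment = Sequence[Sequence[float]]


def wrap(angle: float) -> float:
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def choose_window(n_real: int, rng: random.Random) -> Tuple[int, int]:
    """Step 2a: pick a real residue, then one of the six windows containing it.

    Indices refer to the padded chain (length n_real + 2 * N_VIRTUAL); with this
    padding, terminal and central residues alike have all six windows available.
    """
    residue = N_VIRTUAL + rng.randrange(n_real)
    start = residue - rng.randrange(WINDOW)  # residue's offset inside the window
    return residue, start


def mc_step(chain: Chain, library: Sequence[Fragment], n_real: int,
            is_valid: Callable[[Chain], bool],
            rng: random.Random) -> Tuple[Chain, range]:
    """Step 2b: substitute a random library fragment, perturb it, accept if valid."""
    _, start = choose_window(n_real, rng)
    fragment = rng.choice(library)
    trial = [list(t) for t in chain]
    for k in range(WINDOW):
        i = start + k
        phi, psi, omega = fragment[k]
        if N_VIRTUAL <= i < N_VIRTUAL + n_real:  # perturb nonvirtual residues only
            phi = wrap(rng.gauss(phi, 5.0))
            psi = wrap(rng.gauss(psi, 5.0))
            omega = wrap(rng.gauss(omega, 2.0))
        trial[i] = [phi, psi, omega]
    if is_valid(trial):
        return trial, range(start, start + WINDOW)
    return chain, range(0)  # rejected: keep the previous conformer


def build_conformer(n_real: int, library: Sequence[Fragment],
                    is_valid: Callable[[Chain], bool], rng: random.Random,
                    max_steps: int = 100_000) -> Chain:
    """Step 3: repeat step 2 until every nonvirtual residue has been replaced."""
    chain: Chain = [[-180.0, 180.0, 180.0] for _ in range(n_real + 2 * N_VIRTUAL)]
    pending = set(range(N_VIRTUAL, N_VIRTUAL + n_real))
    for _ in range(max_steps):
        chain, replaced = mc_step(chain, library, n_real, is_valid, rng)
        pending.difference_update(replaced)
        if not pending:
            return chain
    raise RuntimeError("not every residue was replaced within max_steps")
```

In this sketch, the five virtual residues on each end mean that `choose_window` never has to truncate or reject a window at the termini, and a window can be placed even when the peptide is shorter than six residues.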

Steps 1–3 yield a clash-free, hydrogen bond-satisfied conformer, which is then mapped to its corresponding conformational string; both conformer and string are saved. The distribution of conformational strings is updated after every 1,000 repetitions of steps 1–3, and the ensemble entropy (Eq. 4) is recalculated to evaluate the stopping criteria. This procedure is applicable to peptides of any length, including those shorter than the six-residue library fragments.
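Eq. 4 is not reproduced in this section. Assuming it takes the Shannon form S/R = −Σᵢ pᵢ ln pᵢ, where pᵢ is the observed frequency of the i-th distinct conformational string, the batch update described above might look like the following Python sketch. The class and method names are hypothetical.

```python
import math
from collections import Counter


def entropy_over_R(counts: Counter) -> float:
    """S/R = -sum(p ln p) over distinct conformational strings (assumed form of Eq. 4)."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c)


class StringEnsemble:
    """Hypothetical helper that tallies conformational strings in batches."""

    def __init__(self, batch_size: int = 1000):
        self.batch_size = batch_size
        self.counts: Counter = Counter()
        self.pending: list = []
        self.history: list = []  # S/R recorded after each completed batch

    def add(self, conf_string: str) -> None:
        """Queue one new string; fold the batch into the distribution when it is full."""
        self.pending.append(conf_string)
        if len(self.pending) == self.batch_size:
            self.counts.update(self.pending)
            self.pending.clear()
            self.history.append(entropy_over_R(self.counts))
```

Recomputing the entropy only once per batch, rather than after every conformer, matches the 1,000-repetition update interval described above.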

Stopping Criteria.

New conformers are generated and added to the growing ensemble until two stopping criteria are satisfied.

  1. The ensemble entropy converges. This criterion is met when the values of S/R from three successive rounds (with 1,000 new conformers added in each round) differ pairwise by less than 0.001.

  2. The sampling of conformational space approaches saturation. This criterion is met when the conformational strings of 95% or more of the last 1,000 newly generated conformers have already been sampled.
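The two stopping criteria translate directly into threshold checks. In this Python sketch the function names are hypothetical, and counting a string as "already sampled" if it occurred anywhere earlier in the run, including earlier in the same batch of 1,000, is an assumed reading of criterion 2.

```python
from typing import Iterable, Sequence


def entropy_converged(history: Sequence[float], tol: float = 1e-3) -> bool:
    """Criterion 1: the last three S/R values differ pairwise by less than tol."""
    if len(history) < 3:
        return False
    a, b, c = history[-3:]
    return abs(a - b) < tol and abs(a - c) < tol and abs(b - c) < tol


def sampling_saturated(recent: Sequence[str], seen_before: Iterable[str],
                       threshold: float = 0.95) -> bool:
    """Criterion 2: at least 95% of the newest conformers repeat a sampled string.

    Assumption: a string counts as already sampled if it occurred earlier in the
    run, including earlier within the same batch.
    """
    if not recent:
        return False
    seen = set(seen_before)
    repeats = 0
    for s in recent:
        if s in seen:
            repeats += 1
        else:
            seen.add(s)
    return repeats / len(recent) >= threshold


def should_stop(history: Sequence[float], recent: Sequence[str],
                seen_before: Iterable[str]) -> bool:
    """Stop only when both criteria hold."""
    return entropy_converged(history) and sampling_saturated(recent, seen_before)
```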

ACKNOWLEDGMENTS.

We thank Neville Kallenbach for his many durable contributions, Buzz Baldwin, Juliette Lecomte, and Lauren Perskie for helpful discussion and careful reading of the manuscript, Lauren Perskie for Fig. 6b, and the Mathers Foundation for support.

Footnotes

The authors declare no conflict of interest.

References

  1. Anfinsen CB. Principles that govern the folding of protein chains. Science. 1973;181:223–230. doi: 10.1126/science.181.4096.223.
  2. Dunker AK, et al. Intrinsically disordered protein. J Mol Graphics Model. 2001;19:26–59. doi: 10.1016/s1093-3263(00)00138-8.
  3. Berman HM, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
  4. Flory PJ. Statistical Mechanics of Chain Molecules. New York: Wiley; 1969.
  5. Tanford C. Protein denaturation. Adv Protein Chem. 1968;23:121–282. doi: 10.1016/s0065-3233(08)60401-5.
  6. Fitzkee NC, Rose GD. Reassessing random-coil statistics in unfolded proteins. Proc Natl Acad Sci USA. 2004;101:12497–12502. doi: 10.1073/pnas.0404236101.
  7. Fleming PJ, Rose GD. In: Protein Folding Handbook. Kiefhaber T, Buchner J, editors. Vol 2. Weinheim: Wiley-VCH; 2005. pp. 710–736.
  8. Richards FM. Areas, volumes, packing, and protein structure. Annu Rev Biophys Bioeng. 1977;6:151–176. doi: 10.1146/annurev.bb.06.060177.001055.
  9. Auton M, Bolen DW. Predicting the energetics of osmolyte-induced protein folding/unfolding. Proc Natl Acad Sci USA. 2005;102:15065–15068. doi: 10.1073/pnas.0507053102.
  10. Auton M, Holthauzen LM, Bolen DW. Anatomy of energetic changes accompanying urea-induced protein denaturation. Proc Natl Acad Sci USA. 2007;104:15317–15322. doi: 10.1073/pnas.0706251104.
  11. Lee B, Richards FM. The interpretation of protein structures: Estimation of static accessibility. J Mol Biol. 1971;55:379–400. doi: 10.1016/0022-2836(71)90324-x.
  12. Lesser GJ, Rose GD. Hydrophobicity of amino acid subgroups in proteins. Proteins Struct Funct Genet. 1990;8:6–13. doi: 10.1002/prot.340080104.
  13. Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH. Hydrophobicity of amino acid residues in globular proteins. Science. 1985;229:834–838. doi: 10.1126/science.4023714.
  14. Creamer TP, Srinivasan R, Rose GD. Modeling unfolded states of peptides and proteins. Biochemistry. 1995;34:16245–16250. doi: 10.1021/bi00050a003.
  15. Creamer TP, Srinivasan R, Rose GD. Modeling unfolded states of proteins and peptides. II. Backbone solvent accessibility. Biochemistry. 1997;36:2832–2835. doi: 10.1021/bi962819o.
  16. Schellman JA. Protein stability in mixed solvents: A balance of contact interaction and excluded volume. Biophys J. 2003;85:108–125. doi: 10.1016/S0006-3495(03)74459-2.
  17. Goldenberg DP. Computational simulation of the statistical properties of unfolded proteins. J Mol Biol. 2003;326:1615–1633. doi: 10.1016/s0022-2836(03)00033-0.
  18. Gong H, Fleming PJ, Rose GD. Building native protein conformation from highly approximate backbone torsion angles. Proc Natl Acad Sci USA. 2005;102:16227–16232. doi: 10.1073/pnas.0508415102.
  19. Gong H, Shen Y, Rose GD. Building native protein conformation from NMR backbone chemical shifts using Monte Carlo fragment assembly. Protein Sci. 2007;16:1515–1521. doi: 10.1110/ps.072988407.
  20. Wang G, Dunbrack RL Jr. PISCES: A protein sequence culling server. Bioinformatics. 2003;19:1589–1591. doi: 10.1093/bioinformatics/btg224.
  21. Fitzkee NC, Fleming PJ, Rose GD. The Protein Coil Library: A structural database of nonhelix, nonstrand fragments derived from the PDB. Proteins. 2005;58:852–854. doi: 10.1002/prot.20394.
  22. Serrano L. Comparison between the phi distribution of the amino acids in the protein database and NMR data indicates that amino acids have various phi propensities in the random coil conformation. J Mol Biol. 1995;254:322–333. doi: 10.1006/jmbi.1995.0619.
  23. Smith LJ, et al. Analysis of main chain torsion angles in proteins: Prediction of NMR coupling constants for native and random coil conformations. J Mol Biol. 1996;255:494–506. doi: 10.1006/jmbi.1996.0041.
  24. Fitzkee NC, Rose GD. Sterics and solvation winnow accessible conformational space for unfolded proteins. J Mol Biol. 2005;353:873–887. doi: 10.1016/j.jmb.2005.08.062.
  25. Pappu RV, Srinivasan R, Rose GD. The Flory isolated-pair hypothesis is not valid for polypeptide chains: Implications for protein folding. Proc Natl Acad Sci USA. 2000;97:12565–12570. doi: 10.1073/pnas.97.23.12565.
  26. Fitzkee NC, Rose GD. Steric restrictions in protein folding: An alpha-helix cannot be followed by a contiguous beta-strand. Protein Sci. 2004;13:633–639. doi: 10.1110/ps.03503304.
  27. Flory PJ. Principles of Polymer Chemistry. Ithaca, NY: Cornell Univ Press; 1953.
  28. Kohn JE, et al. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc Natl Acad Sci USA. 2004;101:12491–12496. doi: 10.1073/pnas.0403643101.
  29. Pace CN, Trevino S, Prabhakaran E, Scholtz JM. Protein structure, stability and solubility in water and other solvents. Philos Trans R Soc Lond B Biol Sci. 2004;359:1225–1235. doi: 10.1098/rstb.2004.1500.
  30. Scholtz JM, et al. Calorimetric determination of the enthalpy change for the alpha-helix to coil transition of an alanine peptide in water. Proc Natl Acad Sci USA. 1991;88:2854–2858. doi: 10.1073/pnas.88.7.2854.
  31. Richardson JM, Lopez MM, Makhatadze GI. Enthalpy of helix-coil transition: missing link in rationalizing the thermodynamics of helix-forming propensities of the amino acid residues. Proc Natl Acad Sci USA. 2005;102:1413–1418. doi: 10.1073/pnas.0408004102.
  32. Scholtz JM, Barrick D, York EJ, Stewart JM, Baldwin RL. Urea unfolding of peptide helices as a model for interpreting protein unfolding. Proc Natl Acad Sci USA. 1995;92:185–189. doi: 10.1073/pnas.92.1.185.
  33. O'Brien EP, Dima RI, Brooks B, Thirumalai D. Interactions between hydrophobic and ionic solutes in aqueous guanidinium chloride and urea solutions: Lessons for protein denaturation mechanism. J Am Chem Soc. 2007;129:7346–7353. doi: 10.1021/ja069232+.
  34. Cannon JG, Anderson CF, Record MT Jr. Urea-amide preferential interactions in water: quantitative comparison of model compound data with biopolymer results using water accessible surface areas. J Phys Chem B. 2007;111:9675–9685. doi: 10.1021/jp072037c.
  35. Auton M, Ferreon AC, Bolen DW. Metrics that differentiate the origins of osmolyte effects on protein stability: A test of the surface tension proposal. J Mol Biol. 2006;361:983–992. doi: 10.1016/j.jmb.2006.07.003.
  36. Qu Y, Bolen CL, Bolen DW. Osmolyte-driven contraction of a random coil protein. Proc Natl Acad Sci USA. 1998;95:9268–9273. doi: 10.1073/pnas.95.16.9268.
  37. Street TO, Bolen DW, Rose GD. A molecular mechanism for osmolyte-induced protein stability. Proc Natl Acad Sci USA. 2006;103:13997–14002. doi: 10.1073/pnas.0606236103.
  38. Sherman E, Haran G. Coil-globule transition in the denatured state of a small protein. Proc Natl Acad Sci USA. 2006;103:11539–11543. doi: 10.1073/pnas.0601395103.
  39. Creamer TP. Left-handed polyproline II helix formation is (very) locally driven. Proteins. 1998;33:218–226.
  40. Rose GD, Gierasch LM, Smith JA. Turns in peptides and proteins. Adv Protein Chem. 1985;37:1–109. doi: 10.1016/s0065-3233(08)60063-7.
  41. Avbelj F, Luo P, Baldwin RL. Energetics of the interaction between water and the helical peptide group and its role in determining helix propensities. Proc Natl Acad Sci USA. 2000;97:10786–10791. doi: 10.1073/pnas.200343197.
  42. Mezei M, Fleming PJ, Srinivasan R, Rose GD. Polyproline II helix is the preferred conformation for unfolded polyalanine in water. Proteins. 2004;55:502–507. doi: 10.1002/prot.20050.
  43. Baskakov I, Bolen DW. Forcing thermodynamically unfolded proteins to fold. J Biol Chem. 1998;273:4831–4834. doi: 10.1074/jbc.273.9.4831.
  44. Rose GD, Fleming PJ, Banavar JR, Maritan A. A backbone-based theory of protein folding. Proc Natl Acad Sci USA. 2006;103:16623–16633. doi: 10.1073/pnas.0606843103.
  45. Liu Z, et al. Solvent dependence of PII conformation in model alanine peptides. J Am Chem Soc. 2004;126:15141–15150. doi: 10.1021/ja047594g.
  46. Zagrovic B, et al. Unusual compactness of a polyproline type II structure. Proc Natl Acad Sci USA. 2005;102:11698–11703. doi: 10.1073/pnas.0409693102.
  47. Crick SL, Jayaraman M, Frieden C, Wetzel R, Pappu RV. Fluorescence correlation spectroscopy shows that monomeric polyglutamine molecules form collapsed structures in aqueous solutions. Proc Natl Acad Sci USA. 2006;103:16764–16769. doi: 10.1073/pnas.0608175103.
  48. Makowska J, et al. Polyproline II conformation is one of many local conformational states and is not an overall conformation of unfolded peptides and proteins. Proc Natl Acad Sci USA. 2006;103:1744–1749. doi: 10.1073/pnas.0510549103.
  49. Schweitzer-Stenner R, Measey TJ. The alanine-rich XAO peptide adopts a heterogeneous population, including turn-like and polyproline II conformations. Proc Natl Acad Sci USA. 2007;104:6649–6654. doi: 10.1073/pnas.0700006104.
  50. Beveridge DL, Ravishanker G, Mezei M, Gedulin B. Solvent effect on conformational stability in the Ala dipeptide: Full free energy simulations. J Biomol Struct Dyn. 1986;3:237–252.
  51. Sapse A-M, Mallah-Levy L, Daniels SB, Erickson BW. The γ-turn: Ab initio calculations on proline and N-acetylproline amide. J Am Chem Soc. 1987;109:3526–3529.
  52. Milner-White EJ. Situations of gamma-turns in proteins. Their relation to alpha-helices, beta-sheets and ligand binding sites. J Mol Biol. 1990;216:386–397.
  53. Whittington SJ, Chellgren BW, Hermann VM, Creamer TP. Urea promotes polyproline II helix formation: implications for protein denatured states. Biochemistry. 2005;44:6269–6275. doi: 10.1021/bi050124u.
  54. Shi Z, Chen K, Liu Z, Kallenbach NR. Conformation of the backbone in unfolded proteins. Chem Rev. 2006;106:1877–1897. doi: 10.1021/cr040433a.
  55. Ramachandran GN, Sasisekharan V. Conformation of polypeptides and proteins. Adv Protein Chem. 1968;23:283–438. doi: 10.1016/s0065-3233(08)60402-7.
  56. Ptitsyn OB. Molten globule and protein folding. Adv Protein Chem. 1995;47:83–229. doi: 10.1016/s0065-3233(08)60546-x.
  57. Kuwajima K. The molten globule state of alpha-lactalbumin. FASEB J. 1996;10:102–109. doi: 10.1096/fasebj.10.1.8566530.
  58. Fleming PJ, Rose GD. Do all backbone polar groups in proteins form hydrogen bonds? Protein Sci. 2005;14:1911–1917. doi: 10.1110/ps.051454805.
  59. Srinivasan R, Rose GD. Ab initio prediction of protein structure using LINUS. Proteins. 2002;47:489–495. doi: 10.1002/prot.10103.
  60. Srinivasan R, Rose GD. A physical basis for protein secondary structure. Proc Natl Acad Sci USA. 1999;96:14258–14263. doi: 10.1073/pnas.96.25.14258.