Skip to main content
PLOS Computational Biology logoLink to PLOS Computational Biology
. 2016 Jan 4;12(1):e1004686. doi: 10.1371/journal.pcbi.1004686

Hydrodynamic Radii of Intrinsically Disordered Proteins Determined from Experimental Polyproline II Propensities

Maria E Tomasso 1, Micheal J Tarver 1, Deepa Devarajan 1, Steven T Whitten 1,*
Editor: Predrag Radivojac2
PMCID: PMC4699819  PMID: 26727467

Abstract

The properties of disordered proteins are thought to depend on intrinsic conformational propensities for polyproline II (PP II) structure. While intrinsic PP II propensities have been measured for the common biological amino acids in short peptides, the ability of these experimentally determined propensities to quantitatively reproduce structural behavior in intrinsically disordered proteins (IDPs) has not been established. Presented here are results from molecular simulations of disordered proteins showing that the hydrodynamic radius (R h) can be predicted from experimental PP II propensities with good agreement, even when charge-based considerations are omitted. The simulations demonstrate that R h and chain propensity for PP II structure are linked via a simple power-law scaling relationship, which was tested using the experimental R h of 22 IDPs covering a wide range of peptide lengths, net charge, and sequence composition. Charge effects on R h were found to be generally weak when compared to PP II effects on R h. Results from this study indicate that the hydrodynamic dimensions of IDPs are evidence of considerable sequence-dependent backbone propensities for PP II structure that qualitatively, if not quantitatively, match conformational propensities measured in peptides.

Author Summary

Molecular models of disordered protein structures are needed to elucidate the functional mechanisms of intrinsically disordered proteins, a class of proteins implicated in many disease pathologies and human health issues. Several studies have measured intrinsic conformational propensities for polyproline II helix, a key structural motif of disordered proteins, in short peptides. Whether or not these experimental polyproline II propensities, which vary by amino acid type, reproduce structural behavior in intrinsically disordered proteins has yet to be demonstrated. Presented here are simulation results showing that polyproline II propensities from short peptides accurately describe sequence-dependent variability in the hydrodynamic dimensions of intrinsically disordered proteins. Good agreement was observed from a simple molecular model even when charge-based considerations were ignored, predicting that global organization of disordered protein structure is strongly dependent on intrinsic conformational propensities and, for many intrinsically disordered proteins, modulated only weakly by coulombic effects.

Introduction

Many proteins, and protein domains, that perform critical biological tasks have disordered structures under normal solution conditions [13]. These proteins are referred to as intrinsically disordered [4] and, accordingly, molecular models of disordered protein structures are needed to understand the physical basis for the activities [2,3], roles regulating key signaling pathways [5], and relationships to human health issues [69] that have been linked to intrinsically disordered proteins (IDPs).

The properties of disordered protein structures are often associated with conformational propensities for polyproline II (PP II) helix [1012] and charge-based intramolecular interactions [1315]. PP II propensities are locally-determined [16] and intrinsic to amino acid type [1719], while charge-charge interactions seem to be important for organizing disordered structures owing to both long and short range contacts [1315,20,21]. Since chain preferences for PP II increase the hydrodynamic sizes of IDPs [22,23], and Coulombic interaction energies are distance-dependent, it could be argued that charge effects on IDP structures are modulated locally by intrinsic PP II propensities. A number of issues with that hypothesis, however, are apparent. First, it has not been established if PP II propensities measured in short peptide models of the unfolded states of proteins [1719] translate to IDPs. It could be that PP II propensities are negligible and unimportant in IDP systems. Second, methods capable of separating the impact of weak to possibly strong local conformational propensities and charge-charge interactions in the context of flexible and disordered protein structures have not been demonstrated, but are required for testing any potential interdependence.

To investigate such issues, a computer algorithm [2224] based on the Hard Sphere Collision (HSC) model [25] was developed for parsing the contributions of intrinsic PP II propensities and charge to the structures of IDPs, as represented by the hydrodynamic radius (R h). A HSC model was chosen since PP II propensities and charge effects could be added separately and in steps, to isolate contributions to simulated IDP structures. R h was chosen since experimental values are available for a wide range of IDP sequences, allowing direct comparisons to model-simulated R h.

Here we demonstrate that R h for disordered proteins trend with chain propensities for PP II structure by a simple power-law scaling relationship. Using experimental PP II propensities for the common biological amino acids from Kallenbach [17], Creamer [18], and Hilser [19], this relationship was tested against experimental R h from 22 IDPs [23,2642] ranging in size from 73 to 260 residues and net charge from 1 to 43. We observed that the power-law scaling function was able to reproduce IDP R h with good agreement when using propensities from Hilser, while the Kallenbach and Creamer scales consistently overestimated R h. The ability to describe R h from just intrinsic PP II propensities associated with a sequence was supported by simulation results showing that charge effects on IDP R h are generally weak. Relative to the effects of PP II propensities, charge effects on IDP R h were substantial only when charged side chains were separated in sequence by 2 or fewer residue positions and if the sequence had higher than typical bias for one charge type (i.e., positive or negative). Overall, these results demonstrated that two seemingly disparate experimental datasets, IDP R h and intrinsic PP II propensities, are in qualitative agreement; providing evidence for considerable sequence-dependent conformational preferences for PP II structure in the disordered states of biological proteins.

Results

Computer simulation of R h dependence on PP II propensity

R h for IDPs are sensitive to site-specific and general structural perturbations such as amino acid substitutions [23], changes in net charge [13,14], charge rearrangements [15], and temperature changes [22,43,44]. Fig 1 shows that IDP R h differ substantially from R h for folded proteins [22,45,46] that have similar residue length, N. R h from modeling proteins with no strongly preferred conformations [22], which is referred to as a random coil [47], is also provided for comparison to the experimental values. The solid line representing coil R h was determined from computer simulation of randomly configured polypeptide chains using a HSC model [22]. Owing to favorable native contacts that promote stable globular structures, folded proteins have R h that are compacted relative to the R h of simulated random coils. In contrast, the data in Fig 1 indicate that R h from IDPs are generally larger than random coil estimates.

Fig 1. R h comparison to number of residues, N. Filled and open circles represent experimental R h for IDPs [23,2642] and folded proteins [22,45,46], respectively.

Fig 1

The solid line is the R h dependence on N estimated from simulations of randomly configured protein structures [22]. Stippled lines show R h for randomly configured structures with chain propensities for PP II (f PPII) from 0.1 to 1 in 0.1 increments. Every other stippled line is end-labeled by its f PPII value.

The dependence of R h on N for chemically denatured proteins follows a power-law scaling relationship,

Rh=RoNv, (1)

where R o is 2.2 Å and v is 0.57 [45]. To understand changes in R o and v that are required for modeling the dependence of R h on N for IDPs, it is useful to recognize that unfolded proteins in aqueous solutions absent high concentrations of guanidine hydrochloride or urea show R h compaction [48] with a concomitant decrease in v [49]. Consistent with that observation, Marsh and Forman-Kay demonstrated that R h and N scale with v = 0.509 for IDPs under normal conditions [49]. R o for IDPs was found to depend on PRO content and net charge by,

Ro=(1.24fPRO+0.904)(0.00759|Q|+0.963)2.49, (2)

where f PRO is the fractional number of PRO residues and |Q| the absolute net charge determined from sequence [49]. Since PRO residues have strong propensities for PP II helix, which is an extended structure [50], and repulsive interactions between charged groups likewise favor extended conformations to minimize unfavorable energetics, a simple molecular interpretation of Eq (2) can be offered whereby the R h dependence on N for IDPs follows a baseline trend of R h = (2.17 Å)∙N 0.509 (i.e., R o with f PRO and |Q| set to zero) with sequence-dependent increases in R h from this baseline owing to chain propensities for PP II and repulsive charge-charge interactions. Simulated R h for random coils were observed to trend with N by R h = (2.16 Å)∙N 0.509 [22], supporting this hypothesis (and reproduced in Fig 1). The effects of ALA to GLY substitutions on IDP R h also indicated that chain propensities for PP II structure modulate IDP R h and not simply PRO content [23].

To model the effects of PP II propensities on coil R h, a sampling bias for PP II structure was applied to random coil simulations and the relationship between R h, N, and fractional number of residues in the PP II conformation, f PPII, was determined [22,23]. This is shown in Fig 1 by stippled lines to demonstrate that increases in f PPII cause increases in coil R h. These results were generated from simulations that modeled PP II bias by applying an identical sampling bias for PP II structure at each residue position in a polypeptide chain and, accordingly, did not include effects that could be caused by position-specific variations in PP II propensity.

To test for effects on coil R h owing to PP II propensity variations within a polypeptide chain, conformational ensembles for N = 15, 25, 35, 50, and 75 were generated for poly-ALA with the algorithm modified to allow position-specific sampling rates for PP II structure. It was shown previously that the effects of N on R h were mostly insensitive to amino acid sequence in HSC model simulations of disordered proteins [22] and thus poly-ALA was chosen as a computational simplification. Variations in PP II propensity among residue positions were simulated by applying a sampling bias for PP II structure (S PPII) at every position, every second position, every third position, every fourth position, or every fifth position in the poly-ALA chains. S PPII at values of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9 were tested at the indicated residue locations. This PP II sampling strategy resulted in 225 separate simulated ensembles (5 N lengths X 5 patterns X 9 S PPII values).

A set of simulations using randomly determined position-specific bias for PP II structure was also modeled using poly-ALA chains. These additional simulations used N = 15, 25, and 35, with each residue position assigned a different random value for S PPII. Position-specific random assignments were repeated 3 times for S PPII ranging from 0 to 1, 0 to 0.5, 0.25 to 0.75, and 0.5 to 1, resulting in an additional 36 simulated ensembles (3 N lengths X 3 distributions of random position-specific PP II biases X 4 applied ranges in PP II sampling bias).

The ensemble-averaged fractional number of residues in the PP II conformation (i.e., the propensity) can be different from S PPII in these simulations since randomly generated structures containing van der Waals contact violations are removed from the calculation. Differences between the applied sampling rate (i.e., S PPII) and the observed ensemble-averaged rate (i.e., f PPII) at S PPII-targeted positions followed the same Gaussian relationship that was established previously for whole-chain S PPII and f PPII comparisons [22] and thus straight-forward conversion between applied and observed bias rates was available (S1 Fig). f PPII determined from simulation for residue positions with no applied S PPII was 0.012 ± 0.004.

Cumulative results from the >250 separate ensemble simulations were analyzed in terms of the power-law scaling relationship given by Eq (1). Previously, we demonstrated that the exponential term, v, was dependent on S PPII while R o was mostly independent of S PPII with an averaged value of 2.16 Å [22]. Fig 2A shows v, determined from ln(R h/2.16)/ln(N), for each simulated ensemble and plotted as a function of f PPII calculated for the whole chain. R h for each simulated ensemble was calculated as,

Rh=L/2, (3)

and f PPII,chain as,

fPPII,chain=NPPII/N (4)

Fig 2. Simulated effect of PP II propensities on coil R h.

Fig 2

Each circle and square represents a simulated disordered polypeptide. Squares are from ensembles simulated with position-specific PP II propensities assigned randomly; circles had PP II propensity assignments that followed the sequence patterns described in the text. In panel A, f PPII,chain was calculated as <N PPII>/N, where <N PPII> was the ensemble averaged number of residues with (Φ,Ψ) in the PP II region (-75±10, 145±10), and v was calculated as ln(R h/R o)/ln(N) using <L>/2 for R h and 2.16 Å for R o. These data were fit to v = v o + β∙ln(1-f PPII,chain), with v o and β as fit parameters, producing the solid line. In panel B, R h,HSC was calculated as <L>/2. R h,fit was determined from f PPII,chain using R h,fit = (2.16 Å)∙N v and the panel A fit for v. R h,HSC and R h,fit correlation (R2) is provided in the figure.

In Eq (3), <L> = ∑ L iP i, where L i is the maximum Cα-Cα distance calculated for state i, P i is the Boltzmann probability for state i, and the summation was over all states i of an ensemble. In Eq (4), <N PPII> = ∑ N PPII,iP i, where N PPII,i is the number of residues in the PP II conformation for state i. The distinction of “chain” given to f PPII in Eq (4) was provided to limit confusion between f PPII calculated for a whole chain versus f PPII calculated for specific residue positions.

The relationship between v and f PPII,chain for all simulations followed a logarithmic trend that was fit to the equation,

v(fPPII,chain)=vo+βln(1fPPII,chain), (5)

using the Levenberg-Marquardt method of nonlinear least squares [51,52]. The parameters v o and β were found to be 0.503 ± 0.002 and -0.11 ± 0.003, respectively. Fig 2B shows that R h determined from f PPII,chain (Eq (4)) and N by combining Eqs (1) and (5) (see Eq (6) below) correlated strongly with R h calculated directly from a simulated ensemble (Eq (3)). All possible patterns of position-specific PP II bias were not tested in our computer trials. Results in Fig 2 predict, however, that in general a quantitative relationship exists for disordered proteins between R h, N, and the ensemble-averaged per-residue chain propensity for PP II structure (f PPII,chain).

Test of model using experimental PP II propensities

Results from HSC model simulations that are summarized in Figs 1 and 2 can be interpreted as an ideal relationship between R h and N that includes the general effects of sterics and PP II propensities but is absent other intrinsic and intramolecular factors. Contributions from Coulombic interaction energies to IDP R h will be discussed below and added to this model. First, the simulation-derived relationship between R h, N, and f PPII,chain is tested by applying experimental PP II propensities to the sequences of IDPs in Fig 1. The identity, sequence, and experimental R h for each IDP are given in Supporting Information (S1 and S2 Tables). This dataset includes 22 IDPs containing 3016 total residue positions. Amino acids represented at rates greater than 0.05 in this dataset were, in rank order and listed by their three letter codes, SER (0.104), GLU (0.100), LEU (0.083), PRO (0.080), ASP (0.074), GLY (0.073), ALA (0.073), THR (0.061), LYS (0.055), GLN (0.053), and VAL (0.053).

Amino acid PP II propensities reported by Kallenbach [17], Creamer [18], and Hilser [19] for disordered proteins are reproduced in Table 1 and were used for testing the relationship,

Rh=2.16N0.5030.11ln(1fPPII,chain) (6)

Table 1. Intrinsic backbone PP II propensities measured in disordered peptides.

Kallenbach [17] Creamer [18] Hilser [19]
host a Ac-G2XG2-NH2 Ac-P3XP3GY-NH2 Ac-VP2XVP2R3Y-NH2
ALA (A) 0.818 0.61 0.37
CYS (C) 0.557 0.55 0.25
ASP (D) 0.552 0.63 0.30
GLU (E) 0.684 0.61 0.42
PHE (F) 0.639 0.58 0.17
GLY (G) - 0.58 0.13
HIS (H) 0.428 0.55 0.20
ILE (I) 0.519 0.50 0.39
LYS (K) 0.581 0.59 0.56
LEU (L) 0.574 0.58 0.24
MET (M) 0.498 0.55 0.36
ASN (N) 0.667 0.55 0.27
PRO (P) - 0.67 1.00
GLN (Q) 0.654 0.66 0.53
ARG (R) 0.638 0.61 0.38
SER (S) 0.774 0.58 0.24
THR (T) 0.553 0.53 0.32
VAL (V) 0.743 0.49 0.39
TRP (W) 0.764 - 0.25
TYR (Y) 0.630 - 0.25
average 0.626 0.58 0.35

asequence of host peptide used to measure PP II propensity at the guest position, X

These propensity scales were chosen since weak correlations are observed among the group (S2 Fig), indicating a potential for yielding different results when each set is used separately with Eq (6) for a given IDP sequence. A physical explanation for the different PP II propensity values reported for the amino acids is not given here (e.g., the reported ALA PP II propensities are very different when compared), other than to note that their measurements used host peptide sequences that were also very different (Table 1). Kallenbach measured PP II propensities in the background of a GLY-rich host peptide, whereas the scale reported by Creamer was determined for positions flanked on both sides by PRO residues. The propensity scale from Hilser was measured for positions located in between PRO and valine (VAL). Other PP II propensity scales were not included in these tests due to similarities to the Kallenbach, Creamer, or Hilser reported values. For example, a PP II propensity scale from Zondlo [53] correlated with the Creamer values (coefficient of determination, R2, gave a correlation of 0.58), likely owing to the use of a host peptide that also flanked the guest position with PRO residues.

Inspection of Table 1 shows that PP II propensities for tryptophan (TRP) and tyrosine (TYR) were not reported by Creamer. For these amino acids, we used the averaged Creamer-reported value calculated from the 18 other amino acids (0.58). In the Hilser set, TRP and TYR had lower than average PP II propensity. In contrast, TRP and TYR had higher than average PP II propensity in the Kallenbach set. Using the Creamer average was a compromise that likely had low significance in our tests since TRP and TYR had very low representation among the IDP sequences; 0.008 and 0.012, respectively. PP II propensities were not reported for PRO and GLY by Kallenbach. Here, we used 1 for PRO since it is generally accepted that PRO has the highest propensity for PP II structure [10,12,1719]. This gave PRO a larger value than ALA (0.818), which was the amino acid with the highest reported propensity in the Kallenbach set. GLY was given a propensity of 0.50, which is lower than the Kallenbach average (0.626) but higher than the lowest value (0.428). This also was a compromise from observing that GLY had the lowest value in the Hilser set (0.13), but an average value in the Creamer set (0.58).

f PPII,chain was calculated for each IDP by using the amino acid PP II propensity given in Table 1, summing over the IDP sequence, and dividing by N. Fig 3A shows the experimental scales predict different chain propensities for PP II structure for each IDP sequence. The scale from Kallenbach gave f PPII,chain ranging from 0.746 to 0.628, whereas the Creamer and Hilser scales gave f PPII,chain from 0.609 to 0.579 and 0.489 to 0.283, respectively. Eq (6) was then used to predict R h from f PPII,chain for comparison to experimentally observed R h, which is shown in Fig 3B. The average prediction error (|R h,predictedR h,observed|) and the correlation between predicted and observed R h is given in Table 2. To assess contributions from the amino acid scales for predicting R h, a null model was included by assigning each amino acid the PP II propensity of 0.012, the background f PPII calculated from HSC simulations when no sampling bias for PP II structure was applied (i.e., S PPII = 0). Accordingly, the null model represents random coil values.

Fig 3. Chain propensity for PP II from experimental scales and comparison of predicted and observed R h.

Fig 3

Panel A gives f PPII,chain for each IDP sequence, ordered left to right to show the range obtained with each scale, calculated using experimental PP II propensities from Kallenbach (red triangles), Creamer (blue squares), and Hilser (open circles). X is f PPII,chain from the null model. Panel B shows R h predicted for each IDP using Eq (6) and f PPII,chain from panel A. Symbols in panel B match panel A representations. Black dots show R h predicted from the composite propensity scale. Stippled line is the identity line.

Table 2. Comparison of predicted and observed R h.

Propensity Scale Average Error (Å) a R2 b Average Normalized Error c R2 d
Null (random coil) 7.1 ± 3.7 0.797 -0.28 ± 0.13 0.265
Kallenbach 13.4 ± 5.4 0.819 0.51 ± 0.15 0.301
Creamer 8.4 ± 4.3 0.817 0.32 ± 0.13 0.297
Hilser 2.5 ± 1.8 0.825 0.006 ± 0.12 0.407
Composite 2.4 ± 1.8 0.834 0.015 ± 0.12 0.423
Static 2.6 ± 2.0 0.799 -0.016 ± 0.13 0.291

adetermined from |predicted R h—observed R h|

bcoefficient of determination, correlation of predicted R h and observed R h

cdetermined from (predicted R h—observed R h)/(random coil R h)

dcoefficient of determination, correlation of normalized error and net charge density

Different values of f PPII,chain predict different R h for a given IDP sequence, as expected from Eq (6). For example, the null model, which used the smallest f PPII,chain values, predict R h that are smaller than observed for each IDP. In contrast, PP II propensities from Kallenbach and Creamer, which report relatively large f PPII,chain values, predict R h that are larger than observed for each IDP. Experimental propensities from Hilser predict R h that trend with the identity line, showing good agreement, but also showing scatter relative to that line (average error was 2.5 Å). In an attempt to reduce prediction error, a composite PP II propensity scale that used the Hilser values by default but the Kallenbach values for residues located between GLY (i.e., GLY-X-GLY) and Creamer values for residues located between PRO (i.e., PRO-X-PRO) was tested. This context-specific composite propensity scale (identified as “Composite” in Table 2 and Fig 3B) caused only small changes in predicted R h, with no significant improvement in prediction capabilities relative to using only the Hilser reported PP II propensities.

Since R h increases with N (Fig 1), prediction error was normalized for peptide length by,

normalized error = (predicted Rh  observed Rh)/(random coil Rh) (7)

Random coil R h was calculated using Eq (6) with f PPII,chain = 0.012, the null model value. Average normalized error is given in Table 2 for each propensity scale. Fig 4 shows trends in the normalized error with N and net charge density, determined as the absolute net charge normalized for peptide length,

net charge density = |Q|/(random coil Rh) (8)

Fig 4. Correlation of normalized error in predicted R h to N and net charge density.

Fig 4

Normalized error and net charge density were calculated for each IDP using Eqs (7) and (8), respectively. In both panels, red triangles show normalized error from R h predicted using the Kallenbach reported propensities, blue squares from Creamer reported propensities, open circles from Hilser reported propensities, black dots from the composite propensity scale, and X is the null model. Lines are linear fits to the five prediction sets colored as the symbols (Kallenbach scale was red; Creamer was blue, Hilser was stippled black, composite was solid black, and null was dotted black).

S1 Table gives net charge and N for each IDP. No obvious bias with peptide length (i.e., N) was observed in the normalized error for the Hilser and composite propensity scales. Normalized error clearly increased with N when using Kallenbach and Creamer values, indicating that these PP II propensities may be over-estimated when applied to IDP sequences to predict R h. Since the exponent in Eq (6) becomes larger with increasing f PPII,chain, a set of propensity values that systematically are too large would cause normalized errors that increase with N.

It is interesting to note that normalized error correlated with net charge density for each experimental propensity scale (Fig 4B and Table 2), suggesting that prediction error was caused partially by charge effects on R h that were not included in the model. This is not surprising since Marsh and Forman-Kay demonstrated that increases in net charge correlate with increases in IDP R h [49] and the trend we observed of decreasing normalized error with increased net charge density is consistent with their conclusions. Extrapolating this trend to zero net charge density for the Hilser and composite propensity scales yields positive normalized errors suggesting that, in the background of no net charge contributions to R h, the PP II propensities reported by Hilser may also be slightly too large when using Eq (6) to predict R h.

While this analysis of experimental PP II propensities indicated that one of the scales was capable of reproducing experimental R h with good agreement for a set of IDPs, it is important to recognize that comparative tests based on Eq (6) may not be suitable for affirmation. Since R h in this model depends only on N and chain averaged propensity for PP II structure, contrived scales that predict IDP R h with similar agreement in terms of the average prediction error are simple to generate. For example, each IDP could be given a sequence-independent f PPII,chain value of 0.364, which was determined by converting experimental R h to an apparent f PPII,chain using Eq (6) and then averaging over the IDP dataset. Using this static f PPII,chain to predict IDP R h gives an average prediction error (identified as “Static” in Table 2) that is close to the error obtained when using the experimental scale from Hilser. Correlations between predicted and observed R h and between normalized error and net charge density for the contrived static scale, however, decreased relative to the correlations that were observed with the experimental scales, suggesting that static representations of f PPII,chain may not fully capture some molecular dependencies that are inherent to IDP R h.

To further investigate the capabilities of Eq (6) for relating IDP R h and PP II propensity, random sets of amino acid scales were generated following a two-step protocol and analyzed. First, a random number between 0 and 1 was used to target an average propensity for a scale. Then, random scales were generated, where each amino acid was assigned a different random value between 0 and 1, until a set was found whose average for the 20 amino acids matched the target determined in the first step (±0.05). The goal from using two steps to generate scales was to ensure that chain averaged propensities in the high, medium, and low range were evenly sampled. This sampling scheme was repeated until 100,000 random scales were generated. Each propensity scale was then used to predict R h from Eq (6) and the results are summarized in Fig 5. It was observed that randomly generated scales gave average prediction errors for the IDP dataset ranging from 1.9 to 239.8 Å, correlations between predicted and observed R h ranging from 0.02 to 0.88, and correlations between normalized error and net charge density from 0 to 0.81. Optimal values for these metrics (i.e., highest correlations coupled with lowest average error), seem to focus toward values of R2 and average error that are obtained when using experimental PP II propensities from Hilser. This result shows that experimental R h of the IDP dataset are in good qualitative agreement with experimental PP II propensities reported by Hilser, and vice versa, giving evidence that the molecular properties of IDPs that link R h, N, and f PPII,chain are well-approximated by the simple power-law scaling relationship of Eq (6).

Fig 5. R h prediction from random PP II propensity scales.

Fig 5

Random scales were generated as described in the text and used to predict R h for each IDP by Eq (6). Shown is the correlation (R2) obtained for each scale between observed and predicted R h plotted against the correlation obtained between the normalized error (n. error) and the net charge density (ncd). Shown by color is the average prediction error of each scale. Random scales giving average prediction error larger than 75 Å were omitted to emphasize differences at lower error values.

Effects of Coulombic interaction energies on R h

In the HSC model used for this study, a computer algorithm generates polypeptide structures by random conformational search until R h (Eq (3)) converges to a stable ensemble-averaged value [22]. A structure-based energy function parameterized to solvent-accessible surface areas that has been tested extensively [5462] is used to population-weight each randomly generated structure. To approximate charge effects on ensemble populations, the energy function was modified to include Coulombic interaction energies by,

ΔGCoulomb=332DH2O2iZi(jZjRijeκRij), (9)

where the constant 332 converts the energy into units of kilocalories per mole at 25°C, D H2O is the dielectric of water, Z is the charge at site i or j, R ij is the distance between two charged sites i and j (in Å), κ (the Debye parameter) accounts for screening from solution ionic strength, and the sums are over all charge-bearing sites. The Debye parameter was calculated as,

κ=2.913I/DH2O, (10)

where I is ionic strength (in molarity, M). D H2O used was 78.3 [63] and I was 0.1 M to represent normal conditions. Since the simulations used poly-ALA chains, charged residues were modeled with a positive or negative charge located at the coordinates of the Cβ atom to denote the approximate location for flexible and charged side chains. Coordinates for the backbone N and O atoms of the first and last residues were used to assign positive and negative charge, respectively, to N- and C-termini. Simulations were limited to 25 residue poly-ALA chains to establish trends for the effects of charge on R h in this model. For each ensemble, an identical S PPII was applied at each residue position. S PPII was varied among the different simulations to target ensemble-averaged f PPII,chain ranging from 0.1 to 0.92.

Fig 6A shows that introducing charge at N- and C-termini had no effect on simulated R h for poly-ALA chains. Modeling negative charge at the Cβ position of each residue, or positive charge (S3 Fig), caused large increases in R h from repulsive electrostatic intramolecular interactions. Identical charge at every other residue position caused smaller increases in R h, while identical charge at every third position gave R h that were mostly similar to R h of poly-ALA modeled with no charges. These data predict that the effects of charge on IDP R h should weaken as charged residues separate in sequence, as expected. Fig 6B shows the ensemble-averaged distance between “charged” Cβ atoms that were closest in sequence for each ensemble in panel A, indicating repulsive charge-charge interactions at distances ≥9 Å had only minor effects on R h. The Debye length for the modeled conditions (i.e., 1/κ) was 9.6 Å, which is the distance where interactions between charged groups become negligible at a given ionic strength. The simulation results thus trend with expected outcomes for fully solvated charges. It was also observed that, for polypeptides with each residue position charged, f PPII,chain calculated for an ensemble was larger than expected based upon the applied S PPII (Fig 6A inset). This result predicts that repulsive charge-charge interactions between side chain groups preferentially select for the extended PP II structure to minimize unfavorable interaction energies.

Fig 6. Simulated effect of charged residues on R h.

Fig 6

In panel A, the stippled line is R h from Eq (6) with N = 25 and f PPII,chain = 0–0.98. Plotted symbols are R h from poly-ALA simulations (N = 25) calculated using Eq (3). Open squares are uncharged poly-ALA and open circles have charged termini. Filled circles have each residue modeled with negative charge at the Cβ atom. Filled squares have every other residue modeled with negative charge, filled triangles have every third residue with negative charge, and X is every fourth residue with negative charge. In panel B, <R ij> is the ensemble averaged distance (in Å) between Cβ atoms from two charged residues, i and j, closest in sequence. Panel B symbols match panel A representations. A inset: comparison of observed f PPII,chain (shown as obs f PPII) to f PPII,chain expected from the applied S PPII (shown as applied f PPII; calculated as f PPII = S PPII 0.062∙exp(-(S PPII-0.63)2/(2∙0.282)) [22]. Note that filled circles trend higher than other plotted data. Inset symbols match panel representations.

To test the effects of clusters of charge on R h, polypeptides with patterns consisting of three consecutively charged residues were also simulated (Fig 7). Similar trends were observed, whereby the effects of charge on R h weaken as charged groups (i.e., clusters) were separated in sequence. Charge clusters, however, affected R h when modeled with 4 intervening non-charged residues, with weaker effects persisting at even larger separation distances between the clusters. This contrasts with the simulation results for non-clustered charged residues that exhibited negligible effects on R h when charges were separated by as little as 2 intervening uncharged residue positions (Fig 6A).

Fig 7. Simulated effect of clusters of charged residues on R h.

Fig 7

Filled circles, open circles, open squares, and the stippled line were reproduced from Fig 6A. As in Fig 6A, R h was calculated from poly-ALA simulations with N = 25. A charge cluster was defined as three consecutive residues with negative charge modeled at the Cβ atoms. Charge clusters separated in sequence by two uncharged residues (no charge modeled at Cβ) are shown with filled squares whereas charge clusters separated by four uncharged residues are shown with filled triangles. X and + symbols represent charge clusters separated by six and eight uncharged residues, respectively. Inset: comparison of observed f PPII,chain to f PPII,chain expected from the applied S PPII (following Fig 6A inset description). Inset symbols match panel representations.

Since IDPs, in general, contain both positive and negative charges, simulations with opposite charge at adjacent residue positions were also performed. Fig 8A shows that repeating patterns of opposite charge had minimal effects on R h in these simulations, even when each residue position was charged. This was mostly the case for charge clusters too (Fig 8B) with the exception that the simulation would sporadically generate ensembles with compacted R h, whereby “compacted” is used to indicate R h smaller than what was observed for non-charged poly-ALA coils of identical N. Overall, the amount of R h compaction owing to favorable interactions between oppositely charged residues (or clusters) was small when compared to increases in R h that were observed owing to unfavorable interactions between identically charged residues (or clusters).

Fig 8. Simulated effect on R h from oppositely charged residues.

Fig 8

Stippled line in each panel was reproduced from Fig 6A. As in Fig 6A, R h was calculated from poly-ALA simulations with N = 25. Charge was modeled with opposite charge at adjacent residue positions (panel A) or adjacent clusters (panel B). In panel A, filled circles have each residue modeled with charge at the Cβ atom (first residue negative, second residue positive, third residue negative, etc.). Filled squares have every other residue modeled with charge (first residue negative, third residue positive, etc.), filled triangles have every third residue modeled with charge, and X represents every fourth residue modeled with charge. In panel B, each residue in a cluster had identical charge while clusters adjacent in sequence had opposite charge. Filled circles are poly-ALA with every residue charged (i.e., residues 1–3 having negative charge, residues 4–6 with positive charge, residues 7–9 with negative charge, etc.). Charge clusters separated in sequence by two uncharged residues are shown with filled squares (i.e., residue 1–3 with negative charge, residues 4–5 uncharged, residues 6–8 with positive charge, etc.) whereas charge clusters separated by four uncharged residues are shown by filled triangles. X and + symbols represent charge clusters separated by six and eight uncharged residues, respectively. Insets: comparison of observed f PPII,chain to f PPII,chain expected from the applied S PPII (following Fig 6A inset description). Inset symbols match panel representations.

The results in Figs 68 from modeling charge effects on R h indicate that, in general, the strongest effects on R h should occur owing to identical charges at sequentially-adjacent residue positions (Figs 6 and 7) and for polypeptides with the least amount of mixing of positive and negative charge types (Fig 8). To test these two general observations, the IDP dataset was analyzed to determine the net number of adjacent charges in each IDP sequence. This was calculated by first summing the number of ASP residues that had GLU or ASP immediately next or prior in sequence with the number of GLU residues that had GLU or ASP immediately next or prior in sequence to determine the total number of negative charges with an adjacent negatively charged neighbor. A similar calculation was performed using LYS and ARG to determine the number of positive charges with an adjacent positively charged neighbor. The net number of adjacent charges for an IDP was then the absolute value in the difference between the positive and negative adjacent charge numbers (provided in S1 Table). Fig 9A shows that normalized error in predicted R h for the IDP dataset trends with the net adjacent charge density (i.e., net adjacent charge normalized for peptide length), similar to the correlation that was observed between normalized error and net charge density (Fig 4B). This should be expected since net charge and net adjacent charge correlate with R2 = 0.64 in the dataset.

Fig 9. Correlation of normalized error in predicted R h to net adjacent charge density.

Fig 9

Panel A symbols and lines match their Fig 4 representations. Panel B shows correlations (R2) between normalized error and net adjacent charge density for all IDPs, IDPs in the high charge bias group (labeled as “high bias”), and IDPs in the low charge bias group (labeled as “low bias”). Red columns are correlations from using the Kallenbach propensity scale to predict R h, blue from using the Creamer propensities, white the Hilser propensities, and black the composite propensity scale.

The set of IDPs was also split according to the amount of mixing of positive and negative charge types in a given sequence. To do this, a “charge bias” was calculated for each IDP as the simple ratio of total negative charges (sum of ASP and GLU residues) to total positive charges (sum of LYS and ARG residues), or vice versa, depending on which ratio gave a value greater than 1. As a metric for separating IDPs with “high” and “low” charge bias, a “typical” charge bias was calculated for the entire dataset by the concatenated sequence and found to be 1.9. The average IDP charge bias, found to be 4.2, was not used to separate IDPs since: 1) ratio-based distributions are skewed, 2) only 7 IDPs would have been in the “high” charge bias set, and 3) 4 of these 7 were sequences derived from the p53 protein. Using the charge bias of the concatenated sequence gave 12 IDPs in the high charge bias set and 10 IDPs in the low charge bias set.

Fig 9B shows that correlations between net adjacent charge density and normalized error in predicted R h persisted in the set of IDPs with high charge bias and mostly disappeared for IDPs with low charge bias, seeming to agree with the simulation prediction that significant mixing of positive and negative charge types in a sequence should reduce charge effects on R h. Applying this analysis to net charge density gave different results (S4 Fig). Correlations between net charge density and normalized error in predicted R h decreased for both the high and low charge bias sets. This could be owing to trends shown in Fig 6, whereby net charge effects on R h depended strongly on the distance between the charged groups. Overall, these results seem to indicate that charge effects on IDP structures are highly dependent on sequence, however, charge effects on R h can be weakened substantially by mixing negative and positive charge types or by slight increases in the distances between charged groups in sequence. The hypothesis that charge effects on R h may be generally weak for IDPs is supported by data in Fig 3B showing that R h could be predicted without specific consideration of charges when provided an appropriate amino acid scale for intrinsic PP II propensities.

Discussion

Fig 1 shows that experimental R h for IDPs are much larger than computational predictions based on random coil modeling of the R h dependence on N. Numerous studies have demonstrated the importance of Coulombic effects for regulating IDP structural preferences [1315]. Thus, it could be surprising to note that sequence effects on IDP R h can be predicted with good agreement from sequence differences in PP II propensity, even when other intramolecular factors are ignored. R h predicted from IDP sequence and Eq (6) seemed to work best when using an experimental PP II propensity scale from Hilser and colleagues [19], or a composite scale that combined the Hilser, Kallenbach [17], and Creamer [18] propensities, giving an average error of ~2.5 Å for an IDP dataset covering a wide range of residue lengths, net charge, and sequence composition. As examples of sequence differences in this dataset, the fractional number of PRO residues (f PRO = (# PRO residues)/N) varied from 0 to 0.24, SER from 0.02 to 0.20, GLU from 0.06 to 0.31, and ALA from 0 to 0.16, indicating significant sequence diversity among the IDPs that were tested.

If it were established that molecular descriptions for R h depend mostly on PP II propensities for disordered proteins, this would have important implications. First, R h well-above random coil estimates would indicate non-trivial preferences for PP II structure. Fig 1 shows this to be the case for many IDPs. And second, large variations in R h for IDPs with similar N would indicate large differences in propensity for PP II structure among the biologically common amino acids. Observed differences in amino acid propensity for PP II [1719,53] are thus consistent with the observed differences in R h for IDPs with similar N. For example, consider that R h varied from 24.5 Å to 32.4 Å for IDPs with N = 87–97 in Fig 1. The average prediction error in R h for these 8 IDPs from using Eq (6) and the composite propensity scale was only 1.7 ± 0.7 Å, though net charge ranged from 4 to 29 for these proteins. In contrast, predictions using random coil values give R h from 20.5 to 21.7 Å with an average error of 6.4 ± 2.7 Å.

The simulation-derived relationship between R h, N, and f PPII,chain appears to be surprisingly simple for disordered proteins. As noted above, Eq (6) should be interpreted as an ideal relationship that excludes many molecular factors known to regulate structural preferences in proteins (e.g., electrostatic effects, cis-trans isomerization rates). Observed deviations from this “ideal” behavior can then be interpreted in terms of factors that were not modeled, as shown (Fig 4B). We recognize that exclusive use of poly-ALA for computational modeling may prove to be unjustified with further studies. Poly-ALA was used as a simplifying step since the effects of N on R h were mostly independent of amino acid sequence in previous HSC-based simulations and agreed with general IDP trends determined from a literature survey [22,49]. As shown here, this simulation-derived relationship provides a straight-forward molecular explanation for R h variations among IDPs. The R h dependence on f PPII,chain also predicts heat-induced compaction of IDP R h since the enthalpy of unfolding PP II structure is positive [16,64]. Many studies have demonstrated R h compaction caused by elevated temperatures for IDPs [22,43,44].

As noted above, the simulation results presented here could be interpreted as indicating that charge effects on R h are generally weak for IDPs, relative to the effects of intrinsic PP II propensities. These data demonstrate, however, that certain sequence patterns of charge can modulate R h substantially (see Fig 6). For charged groups, this would be those that are separated at distances averaging less than the solution Debye length, involving identical charge type (i.e., positive or negative), and within a region showing higher than typical charge bias. These general rules are in qualitative agreement with results from Pappu and colleagues showing that simulated hydrodynamic sizes for highly charged and disordered polypeptides, with every residue modeled as GLU or LYS, depend strongly on the mixing of negative and positive charge types [15]. In that study, mixing of charge types in a sequence caused structural compaction relative to biased charge distributions, similar to our own conclusions. The observation that unfavorable charge-charge interactions between side chain groups can promote PP II structure (Figs 6A and 7 insets) has also been noticed in computational studies from other researchers [14,65]. This result predicts multiple mechanisms for charge-mediated regulation of IDP structure; possibly owing to both the accumulation of charge and local modulation of PP II propensities. Overall, these data demonstrate the importance of sequence context for understanding the structural properties of IDPs and for describing quantitatively how disordered protein structures respond to discrete perturbations such as changes in charge state and amino acid substitutions.

Methods

Computer generation of polypeptide structures

Detailed description of the computer algorithm that was used is provided elsewhere [22,24]. Briefly, simulations of disordered protein structures were limited to poly-ALA polypeptides. Main chain atoms of poly-ALA were generated using the standard bond angles and bond lengths [66] and a random sampling of the dihedral angles Φ, Ψ, and ω. The dihedral angle ω was given a Gaussian fluctuation of ±5° around the trans value of 180°. To sample conformational space efficiently, (Φ,Ψ) values were restricted to the allowed Ramachandran regions [67]. Of the two possible positions of the side chain Cβ atom, the one corresponding to L-alanine was used throughout the studies. To calculate state distributions typical of protein ensembles, a structure-based energy function parameterized to solvent-accessible surface areas was used to population-weight the generated structures [5462].

Supporting Information

S1 Fig. Comparison of f PPII and S PPII.

In this figure, S PPII is the average applied sampling rate for PP II for residues with S PPII ≠ 0 in a simulation, while f PPII was the observed per-position average PP II rate, also excluding residues with S PPII = 0. Open circles are from ensembles where position-specific S PPII followed the pattern specified in the text (i.e., different simulations had different S PPII ranging from 0.1 to 0.9 in 0.1 increments applied to each residue, every other residue, every third residue, etc.) which is why circles align at S PPII = 0.1–0.9 in 0.1 increments. Blue circles give the average f PPII for each applied S PPII. Open squares represent this calculation performed on simulations using randomly assigned position-specific S PPII. Stippled line is the identity; solid line is the relationship between f PPII and S PPII established previously for S PPII applied at constant values across all residues [22]. In general, f PPII trends with S PPII by: f PPII = S PPII-0.062∙exp(-(S PPII-0.63)2/(2∙0.282)). This gives the algorithm the ability to target specific f PPII from the applied value of S PPII.

(TIF)

S2 Fig. Correlation of experimental PP II propensities for the common amino acids.

Panel A, correlation of Kallenbach [17] and Creamer reported values [18]. Panel B, correlation of Kallenbach and Hilser reported values [19]. Panel C, correlation of Creamer and Hilser reported values. Panel D, correlation of Creamer and Zondlo reported values [53].

(TIF)

S3 Fig. Simulated effect of positive charged residues on R h.

Stippled line is R h from Eq (6) with N = 25 and f PPII,chain from 0 to 0.98. Symbols are simulated R h from ensembles of poly-ALA (N = 25) using Eq (3) (R h = <L>/2). Filled circles have each residue modeled with positive charge at the Cβ atom. Filled squares have every other residue modeled with positive charge, filled triangles have every third residue modeled with positive charge, and X represents every fourth residue modeled with positive charge. Inset: comparison of observed f PPII,chain to f PPII,chain expected from the applied S PPII (following Fig 6A inset description). Inset symbols match panel representations.

(TIF)

S4 Fig. Correlation of normalized error in predicted R h to net charge density.

Shown are correlations (R2) between normalized error and net charge density for all IDPs, IDPs in the high charge bias group (labeled as “high bias”), and IDPs in the low charge bias group (labeled as “low bias”). Red columns are correlations from using the Kallenbach propensity scale to predict R h, blue from using the Creamer propensities, white the Hilser propensities, and black the composite propensity scale.

(TIF)

S1 Table. IDP dataset.

(DOCX)

S2 Table. Sequence of each IDP in dataset.

(DOCX)

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award R15GM115603 and by the Division of Materials Research of the National Science Foundation under award DMR-1205670. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ. Intrinsic protein disorder in complete genomes. Genome Inf Ser. 2000;11: 161–171. [PubMed] [Google Scholar]
  • 2. Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK. Intrinsic disorder and functional proteomics. Biophys J. 2007;92: 1439–1456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004;337: 635–645. [DOI] [PubMed] [Google Scholar]
  • 4. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, et al. Classification of intrinsically disordered regions and proteins. Chem Rev. 2014;114: 6589–6631. 10.1021/cr400525m [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004;32: 1037–1049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Uversky VN, Davé V, Iakoucheva LM, Malaney P, Metallo SJ, Pathak RR, Joerger AC. Pathological unfoldomics of uncontrolled chaos: intrinsically disordered proteins and human diseases. Chem Rev. 2014;114: 6844–6879. 10.1021/cr400713r [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Wright PE, Dyson HJ. Intrinsically disordered proteins in cellular signaling and regulation. Nat Rev Mol Cell Biol. 2015;16: 18–29. 10.1038/nrm3920 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Babu MM, van der Lee R, de Groot NS, Gsponer J. Intrinsically disordered proteins: regulation and disease. Curr Opin Struct Biol. 2011;21: 432–440. 10.1016/j.sbi.2011.03.011 [DOI] [PubMed] [Google Scholar]
  • 9. Iakoucheva LM, Brown CJ, Lawson JD, Obradović Z, Dunker AK. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol. 2002;323: 573–584. [DOI] [PubMed] [Google Scholar]
  • 10. Shi Z, Chen K, Liu Z, Kallenbach NR. Conformation of the backbone in unfolded proteins. Chem Rev. 2006;106: 1877–1897. [DOI] [PubMed] [Google Scholar]
  • 11. Shi Z, Olson CA, Rose GD, Baldwin RL, Kallenbach NR. Polyproline II structure in a sequence of seven alanine residues. Proc Natl Acad Sci USA. 2002;99: 9190–9195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Schweitzer-Stenner R. Conformational propensities and residual structures in unfolded peptides and proteins. Mol BioSyst. 2012;8: 122–133. 10.1039/c1mb05225j [DOI] [PubMed] [Google Scholar]
  • 13. Müller-Späth S, Soranno A, Hirschfeld V, Hofmann H, Rüegger S, Reymond L, Nettels D, Schuler B. Charge interactions can dominate the dimensions of intrinsically disordered proteins. Proc Natl Acad Sci USA. 2010;107: 14609–14614. 10.1073/pnas.1001743107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Mao AH, Crick SL, Vitalis A, Chicoine CL, Pappu RV. Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proc Natl Acad Sci USA. 2010;107: 8183–8188. 10.1073/pnas.0911107107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Das RK, Pappu RV. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc Natl Acad Sci USA. 2013;110: 13392–13397. 10.1073/pnas.1304749110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Chen K, Liu Z, Kallenbach NR. The polyproline II conformation in short alanine peptides is noncooperative. Proc Natl Acad Sci USA. 2004;101: 15352–15357. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Shi Z, Chen K, Liu Z, Ng A, Bracken WC, Kallenbach NR. Polyproline II propensities from GGXGG peptides reveal an anticorrelation with beta-sheet scales. Proc Natl Acad Sci USA. 2005;102: 17964–17968. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Rucker AL, Pager CT, Campbell MN, Qualls JE, Creamer TP. Host-guest scale of left-handed polyproline II helix formation. Proteins 2003;53: 68–75. [DOI] [PubMed] [Google Scholar]
  • 19. Elam WA, Schrank TP, Campagnolo AJ, Hilser VJ. Evolutionary conservation of the polyproline II conformation surrounding intrinsically disordered phosphorylation sites. Protein Sci. 2013;22: 405–417. 10.1002/pro.2217 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Cho JH, Sato S, Horng JC, Anil B, Raleigh DP. Electrostatic interactions in the denatured state ensemble: their effect upon protein folding and protein stability. Arch Biochem Biophys. 2008;469:20–28. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Cho JH, Raleigh DP. Mutational analysis demonstrates that specific electrostatic interactions can play a key role in the denatured state ensemble of proteins. J Mol Biol. 2005;353:174–185. [DOI] [PubMed] [Google Scholar]
  • 22. Langridge TD, Tarver MJ, Whitten ST. Temperature effects on the hydrodynamic radius of the intrinsically disordered N-terminal region of the p53 protein. Proteins 2014;82: 668–678. 10.1002/prot.24449 [DOI] [PubMed] [Google Scholar]
  • 23. Perez RB, Tischer A, Auton M, Whitten ST. Alanine and proline content modulate global sensitivity to discrete perturbations in disordered proteins. Proteins 2014;82: 3373–3384. 10.1002/prot.24692 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Whitten ST, Yang HW, Fox RO, Hilser VJ. Exploring the impact of polyproline II (PII) conformational bias on the binding of peptides to the SEM-5 SH3 domain. Protein Sci. 2008;17: 1200–1211. 10.1110/ps.033647.107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Richards FM. Areas, volumes, packing, and protein structure. Annu Rev Biophys Bioeng. 1977;6: 151–176. [DOI] [PubMed] [Google Scholar]
  • 26. Lowry DF, Stancik A, Shrestha RM, Daughdrill GW. Modeling the accessible conformations of the intrinsically unstructured transactivation domain of p53. Proteins 2008;71: 587–598. [DOI] [PubMed] [Google Scholar]
  • 27. Donaldson L, Capone JP. Purification and characterization of the carboxyl-terminal transactivation domain of Vmw65 from herpes simplex virus type 1. J Biol Chem. 1992;267: 1411–1414. [PubMed] [Google Scholar]
  • 28. Sivakolundu SG, Nourse A, Moshiach S, Bothner B, Ashley C, Satumba J, Lahti J, Kriwacki RW. Intrinsically unstructured domains of Arf and Hdm2 form bimolecular oligomeric structures in vitro and in vivo. J Mol Biol. 2008;384: 240–254. 10.1016/j.jmb.2008.09.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Yi S, Boys BL, Brickenden A, Konermann L, Choy WY. Effects of zinc binding on the structure and dynamics of the intrinsically disordered protein prothymosin alpha: evidence for metalation as an entropic switch. Biochemistry 2007;46: 13120–13130. [DOI] [PubMed] [Google Scholar]
  • 30. Sanchez-Puig N, Veprintsev DB, Fersht AR. Binding of natively unfolded HIF-1alpha ODD domain to p53. Mol Cell. 2005;17: 11–21. [DOI] [PubMed] [Google Scholar]
  • 31. Campbell KM, Terrell AR, Laybourn PJ, Lumb KJ. Intrinsic structural disorder of the C-terminal activation domain from the bZIP transcription factor Fos. Biochemistry 2000;39: 2708–2713. [DOI] [PubMed] [Google Scholar]
  • 32. Geething NC, Spudich JA. Identification of a minimal myosin Va binding site within an intrinsically unstructured domain of melanophilin. J Biol Chem. 2007;282: 21518–21528. [DOI] [PubMed] [Google Scholar]
  • 33. Soragni A, Zambelli B, Mukrasch MD, Biernat J, Jeganathan S, Griesinger C, Ciurli S, Mandelkow E, Zweckstetter M. Structural characterization of binding of Cu(II) to tau protein. Biochemistry 2008;47: 10841–10851. 10.1021/bi8008856 [DOI] [PubMed] [Google Scholar]
  • 34. Adkins JN, Lumb KJ. Intrinsic structural disorder and sequence features of the cell cycle inhibitor p57Kip2. Proteins 2002;46: 1–7. [DOI] [PubMed] [Google Scholar]
  • 35. Uversky VN, Permyakov SE, Zagranichny VE, Rodionov IL, Fink AL, Cherskaya AM, Wasserman LA, Permyakov EA. Effect of zinc and temperature on the conformation of the gamma subunit of retinal phosphodiesterase: a natively unfolded protein. J Proteome Res. 2002;1: 149–159. [DOI] [PubMed] [Google Scholar]
  • 36. Haaning S, Radutoiu S, Hoffmann SV, Dittmer J, Giehm L, Otzen DE, Stougaard J. An unusual intrinsically disordered protein from the model legume Lotus japonicus stabilizes proteins in vitro. J Biol Chem. 2008;283: 31142–31152. 10.1074/jbc.M805024200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Permyakov SE, Millett IS, Doniach S, Permyakov EA, Uversky VN. Natively unfolded C-terminal domain of caldesmon remains substantially unstructured after the effective binding to calmodulin. Proteins 2003;53: 855–862. [DOI] [PubMed] [Google Scholar]
  • 38. Paleologou KE, Schmid AW, Rospigliosi CC, Kim HY, Lamberto GR, Fredenburg RA, Lansbury PT Jr, Fernandez CO, Eliezer D, Zweckstetter M, Lashuel HA. Phosphorylation at Ser-129 but not the phosphomimics S129E/D inhibits the fibrillation of alpha-synuclein. J Biol Chem. 2008;283: 16895–16905. 10.1074/jbc.M800747200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Baker JMR. Structural characterization and interactions of the CFTR regulatory region. Ph.D. Dissertation. University of Toronto. 2009.
  • 40. Choi UB, McCann JJ, Weninger KR, Bowen ME. Beyond the random coil: stochastic conformational switching in intrinsically disordered proteins. Structure 2011;19: 566–576. 10.1016/j.str.2011.01.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Magidovich E, Orr I, Fass D, Abdu U, Yifrach O. Intrinsic disorder in the C-terminal domain of the Shaker voltage-activated K1 channel modulates its interaction with scaffold proteins. Proc Natl Acad Sci USA. 2007;104: 13022–13027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Sanchez-Puig N, Veprintsev DB, Fersht AR. Human full-length securin is a natively unfolded protein. Protein Sci. 2005;14: 1410–1418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Kjaergaard M, Nørholm AB, Hendus-Altenburger R, Pedersen SF, Poulsen FM, Kragelund BB. Temperature-dependent structural changes in intrinsically disordered proteins: formation of alpha-helices or loss of polyproline II. Protein Sci. 2010;19: 1555–1564. 10.1002/pro.435 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Wuttke R, Hofmann H, Nettels D, Borgia MB, Mittal J, Best RB, Schuler B. Temperature-dependent solvation modulates the dimensions of disordered proteins. Proc Natl Acad Sci USA. 2014;111: 5213–5218. 10.1073/pnas.1313006111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Wilkins DK, Grimshaw SB, Receveur V, Dobson CM, Jones JA, Smith LJ. Hydrodynamic radii of native and denatured proteins measured by pulse field gradient NMR techniques. Biochemistry 1999;38: 16424–16431. [DOI] [PubMed] [Google Scholar]
  • 46. Tcherkasskaya O, Davidson EA, Uversky VN. Biophysical constraints for protein structure prediction. J Proteome Res. 2003;2: 37–42. [DOI] [PubMed] [Google Scholar]
  • 47. Fitzkee NC, Rose GD. Reassessing random-coil statistics in unfolded proteins. Proc Natl Acad Sci USA. 2004;101: 12497–12502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Auton M, Ferreon ACM, Bolen DW. Metrics that differentiate the origins of osmolyte effects on protein stability: a test of the surface tension proposal. J Mol Biol. 2006;361:983–992. [DOI] [PubMed] [Google Scholar]
  • 49. Marsh JA, Forman-Kay JD. Sequence determinants of compaction in intrinsically disordered proteins. Biophys J. 2010;98: 2383–2390. 10.1016/j.bpj.2010.02.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Cowan PM, McGavin S. Structure of poly-L-proline. Nature 1955; 176:501–503. [Google Scholar]
  • 51. Levenberg K. A method for the solution of certain non-linear problems in least squares. Quart Appl Math. 1944;2: 164–168. [Google Scholar]
  • 52. Marquardt DW. An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math. 1963;11: 431–441. [Google Scholar]
  • 53. Brown AM, Zondlo NJ. A propensity scale for type II polyproline helices (PPII): aromatic amino acids in proline-rich sequences strongly disfavor PPII due to proline-aromatic interactions. Biochemistry 2012;51: 5041–5051. 10.1021/bi3002924 [DOI] [PubMed] [Google Scholar]
  • 54. Baldwin RL. Temperature dependence of the hydrophobic interaction in protein folding. Proc Natl Acad Sci USA. 1986;83: 8069–8072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Murphy KP, Freire E. Thermodynamics of structural stability and cooperative folding behavior in proteins. Adv Protein Chem. 1992;43: 313–361. [DOI] [PubMed] [Google Scholar]
  • 56. Murphy KP, Bhakuni V, Xie D, Freire E. Molecular basis of cooperativity in protein folding. III. Structural identification of cooperative folding units and folding intermediates. J Mol Biol. 1992;227: 293–306. [DOI] [PubMed] [Google Scholar]
  • 57. Lee KH, Xie D, Freire E, Amzel LM. Estimation of changes in side chain configurational entropy in binding and folding: General methods and application to helix formation. Proteins 1994;20: 68–84. [DOI] [PubMed] [Google Scholar]
  • 58. Xie D, Freire E. Structure based prediction of protein folding intermediates. J Mol Biol. 1994;242: 62–80. [DOI] [PubMed] [Google Scholar]
  • 59. Gómez J, Hilser VJ, Xie D, Freire E. The heat capacity of proteins. Proteins 1995;22: 404–412. [DOI] [PubMed] [Google Scholar]
  • 60. D’Aquino JA, Gómez J, Hilser VJ, Lee KH, Amzel LM, Freire E. The magnitude of the backbone conformational entropy change in protein folding. Proteins 1996;25: 143–156. [DOI] [PubMed] [Google Scholar]
  • 61. Habermann SM, Murphy KP. Energetics of hydrogen bonding in proteins: A model compound study. Protein Sci. 1996;5: 1229–1239. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Luque I, Mayorga OL, Freire E. Structure-based thermodynamic scale of alpha-helix propensities in amino acids. Biochemistry 1996;35: 13681–13688. [DOI] [PubMed] [Google Scholar]
  • 63. Malmberg CG, Maryott AA. Dielectric constant of water from 0 to 100C. J Res Natl Bur Stand. 1956;56: 1–8. [Google Scholar]
  • 64. Hamburger JB, Ferreon JC, Whitten ST, Hilser VJ. Thermodynamic mechanism and consequences of the polyproline II (PII) structural bias in the denatured states of proteins. Biochemistry 2004;43: 9790–9799. [DOI] [PubMed] [Google Scholar]
  • 65. Krimm S, Mark JE. Conformations of polypeptides with ionized side chains of equal length. Proc Natl Acad Sci USA. 1968;60: 1122–1129. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Momany FA, McGuire RF, Burgess AW, Scheraga HA. Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen bond interactions, and intrinsic torsional potentials for the naturally occurring amino acids. J Phys Chem. 1975;79: 2361–2381. [Google Scholar]
  • 67. Mandel N, Mandel G, Trus BL, Rosenberg J, Carlson G, Dickerson RE. Tuna cytochrome c at 2.0 Å resolution. III. Coordinate optimization and comparison of structures. J Biol Chem. 1977;252: 4619–4636. [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. Comparison of f PPII and S PPII.

In this figure, S PPII is the average applied sampling rate for PP II for residues with S PPII ≠ 0 in a simulation, while f PPII was the observed per-position average PP II rate, also excluding residues with S PPII = 0. Open circles are from ensembles where position-specific S PPII followed the pattern specified in the text (i.e., different simulations had different S PPII ranging from 0.1 to 0.9 in 0.1 increments applied to each residue, every other residue, every third residue, etc.) which is why circles align at S PPII = 0.1–0.9 in 0.1 increments. Blue circles give the average f PPII for each applied S PPII. Open squares represent this calculation performed on simulations using randomly assigned position-specific S PPII. Stippled line is the identity; solid line is the relationship between f PPII and S PPII established previously for S PPII applied at constant values across all residues [22]. In general, f PPII trends with S PPII by: f PPII = S PPII-0.062∙exp(-(S PPII-0.63)2/(2∙0.282)). This gives the algorithm the ability to target specific f PPII from the applied value of S PPII.

(TIF)

S2 Fig. Correlation of experimental PP II propensities for the common amino acids.

Panel A, correlation of Kallenbach [17] and Creamer reported values [18]. Panel B, correlation of Kallenbach and Hilser reported values [19]. Panel C, correlation of Creamer and Hilser reported values. Panel D, correlation of Creamer and Zondlo reported values [53].

(TIF)

S3 Fig. Simulated effect of positive charged residues on R h.

Stippled line is R h from Eq (6) with N = 25 and f PPII,chain from 0 to 0.98. Symbols are simulated R h from ensembles of poly-ALA (N = 25) using Eq (3) (R h = <L>/2). Filled circles have each residue modeled with positive charge at the Cβ atom. Filled squares have every other residue modeled with positive charge, filled triangles have every third residue modeled with positive charge, and X represents every fourth residue modeled with positive charge. Inset: comparison of observed f PPII,chain to f PPII,chain expected from the applied S PPII (following Fig 6A inset description). Inset symbols match panel representations.

(TIF)

S4 Fig. Correlation of normalized error in predicted R h to net charge density.

Shown are correlations (R2) between normalized error and net charge density for all IDPs, IDPs in the high charge bias group (labeled as “high bias”), and IDPs in the low charge bias group (labeled as “low bias”). Red columns are correlations from using the Kallenbach propensity scale to predict R h, blue from using the Creamer propensities, white the Hilser propensities, and black the composite propensity scale.

(TIF)

S1 Table. IDP dataset.

(DOCX)

S2 Table. Sequence of each IDP in dataset.

(DOCX)

Data Availability Statement

All relevant data are within the paper and its Supporting Information files.


Articles from PLoS Computational Biology are provided here courtesy of PLOS

RESOURCES