Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2024 Feb 19.
Published in final edited form as: J Phys Chem B. 2023 May 18;127(21):4746–4760. doi: 10.1021/acs.jpcb.3c01619

The Analytical Flory Random Coil is a Simple-to-Use Reference Model for Unfolded and Disordered Proteins

Jhullian J Alston 1,2,*, Garrett M Ginell 1,2,*, Andrea Soranno 1,2, Alex S Holehouse 1,2,
PMCID: PMC10875986  NIHMSID: NIHMS1966831  PMID: 37200094

Abstract

Denatured, unfolded, and intrinsically disordered proteins (collectively referred to here as unfolded proteins) can be described using analytical polymer models. These models capture various polymeric properties and can be fit to simulation results or experimental data. However, the model parameters commonly require users’ decisions, making them useful for data interpretation but less clearly applicable as stand-alone reference models. Here we use all-atom simulations of polypeptides in conjunction with polymer scaling theory to parameterize an analytical model of unfolded polypeptides that behave as ideal chains (ν = 0.50). The model, which we call the analytical Flory Random Coil (AFRC), requires only the amino acid sequence as input and provides direct access to probability distributions of global and local conformational order parameters. The model defines a specific reference state to which experimental and computational results can be compared and normalized. As a proof-of-concept, we use the AFRC to identify sequence-specific intramolecular interactions in simulations of disordered proteins. We also use the AFRC to contextualize a curated set of 145 different radii of gyration obtained from previously published small-angle X-ray scattering experiments of disordered proteins. The AFRC is implemented as a stand-alone software package and is also available via a Google colab notebook. In summary, the AFRC provides a simple-to-use reference polymer model that can guide intuition and aid in interpreting experimental or simulation results.

INTRODUCTION

Proteins are finite-sized heteropolymers, and the application of polymer physics has provided a useful toolkit for understanding protein structure and function19. In particular, there has been significant interest in unfolded proteins under both native and non-native conditions2,1017. Depending on the experimental techniques employed, a variety of polymeric properties can be measured, including the radius of gyration (Rg), the hydrodynamic radius (Rh), the end-to-end distance (Re), and the apparent scaling exponent (νapp). These and many other parameters can be calculated directly from all-atom simulations, and the synergy of simulation and experiment has provided a powerful approach for constructing large ensembles of unfolded proteins for greater insight into the unfolded state15,1828.

Polymers can be described in terms of scaling laws, expressions that describe how chain dimensions vary as a function of chain length2931. Polymer scaling laws typically have the format D = R0Nν. Here, D reports on chain dimensions, R0 is a prefactor in units of spatial distance, and N is the number of monomers, which in the case of proteins is typically written in terms of the number of amino acids. ν (or, more accurately, νapp when applied to finite-sized heteropolymers like proteins) is the (apparent) Flory scaling exponent. In principle, νapp lies between 0.33 (as is obtained for a perfect spherical globule) and 0.59 (as obtained for a self-avoiding chain). However, for finite-sized polymers, values beyond 0.59 can be obtainable for self-repulsive chains 3234. The applicability of polymer scaling laws to describe real proteins assumes they are sufficiently long to display bona fide polymeric behavior and that they are sufficiently self-similar over a certain length scale, analogous to fractals. While this assumption often holds true, it is worth noting that sequence-encoded patterns in specific chemistries and/or secondary structure can lead to deviations from homopolymer-like behavior 18,3537.

To what extent do polymer scaling laws apply to real proteins? For denatured polypeptides, Kohn et al. reported the ensemble-average radius of gyration using the scaling expressions Rg = 1.927N0.59811. This result provides strong experimental evidence to support a model whereby denaturants unfolded proteins by uniformly weakening intramolecular protein-protein interactions1. A value for νapp of 0.598 also agrees with the previously reported value of 0.57 by Wilkins et al. and earlier work by Damaschun1,10,12. In short, under strongly denaturing conditions, proteins appear to behave as polymers in a good solvent 1,32,3841.

For proteins under native or native-like conditions, the apparent scaling exponents obtained for unfolded polypeptides are more variable. Marsh and Forman-Kay reported an average scaling expression of Rh = 2.49N0.509, for a set of intrinsically disordered proteins, while Bernadó and Svergun found a similar average relationship in Rg = 2.54N0.52 42,43. More recently, various means to estimate νapp for individual proteins have enabled values of νapp between 0.42 and 0.60 to be measured for a wide range of unfolded proteins of different lengths and compositions15,18,23,25,39,40,4446. An emerging consensus suggests that νapp depends on the underlying amino acid sequence2,17,47. If sequence-encoded chemical biases enable intramolecular interactions, then νapp may be lower than 0.5. Notably, despite clear conceptual limitations, the physics of homopolymers remains a convenient tool through which unfolded proteins can be assessed15,18,36,37,48.

Given the variety in scaling exponents for unfolded proteins under native conditions, we felt that a sequence-specific reference model would be helpful for the field. Such a model could provide a touchstone for experimentally measurable polymeric parameters, including intermolecular distances, the radius of gyration, the end-to-end distance, and the hydrodynamic radius. Similarly, such a model would provide a simple reference state with which simulations could be directly compared and used to identify sequence-specific effects. Finally, a standard reference model could offer an easy way to compare unfolded proteins of different lengths to assess if they behave similarly despite different absolute dimensions.

Here, we perform sequence-specific numerical simulations for polypeptides as an ideal chain, so-called Flory Random Coil (FRC) simulations2,31. Under these conditions, chain-chain, chain-solvent, and solvent-solvent interactions are all equivalent, no long-range excluded volume contributions are included, and as such, the polypeptide behaves as a Gaussian chain with νapp = 0.5. Because our FRC implementation minimizes finite-chain artifacts, we can parameterize an analytical, sequence-specific model using standard approaches from scaling theory, a model we call the Analytical Flory Random Coil (AFRC). This model enables the calculation of distance distributions for the end-to-end distance and the radius of gyration, as well as a variety of additional parameters that become convenient for the analysis of all-atom simulations and experiments.

The AFRC is not a predictor of unfolded protein dimensions. Those dimensions depend on the complex interplay of chain:chain and chain:solvent interactions, which are themselves determined by sequence-encoded chemistry4953. Instead, the AFRC provides a simple reference state that can aid in interpreting experimental and computational results without needing information other than the protein sequence. The AFRC is implemented in a stand-alone Python package and is also provided as a simple Google Colab notebook. We demonstrate the utility of this model by comparing experimental data and computational results.

The remainder of this paper is outlined as follows. First, we discuss the implementation details of the model, including a comparison against existing polymer models. Next, we analyzed previously published all-atom simulations to demonstrate how the AFRC can identify signatures of sequence-specific intramolecular interactions in disordered ensembles. Finally, we use the AFRC model to re-interpret previously reported small-angle X-ray scattering data of intrinsically disordered proteins.

METHODS

Flory Random Coil (FRC) Monte Carlo simulations were run using a customized version of CAMPARI (V1). Simulations were run in a simulation droplet with a radius of 500 Å for 25 × 106 steps with 50 × 103 steps discarded as equilibration. Conformers were saved every 5 × 103 steps, generating 5 × 103 independent conformations. For each Monte Carlo move, a residue is randomly selected, and the phi and psi angles for that residue are changed to a new pair sampled from a precomputed set of residue-specific allowed dihedrals (Fig. S1). Sampling is independent of the state of any other residues, such that the system evolves via a set of n random walks through dihedral space, where n is the number of residues. In this way, moves are in effect rejection-free but micro-reversibility is maintained, meaning these ensembles are sufficiently well-sampled and enable calibration for FRC fitting parameters (Table S1).

Homopolymeric FRC simulations were run for length of 51, 101, 151, 251 and 351 residues for all twenty amino acids (i.e. 100 independent sequences in total). Heteropolymeric simulations were run for lengths 10, 20, 30, 40, 50, 100, 120, 140, 180, 200, 250, 300, 350, 400, 450, 500 (i.e. 320 independent sequences in total). For each length series, twenty separate simulations were run where, for each sequence, one of the twenty amino acids is enriched (30% of the sequence) while the remaining residues are randomly selected. All FRC simulations were analyzed using SOURSOP28.

Excluded volume (EV) simulations were run using CAMPARI (V2). In EV simulations, the underlying energy function for the ABSINTH forcefield is altered such that solvation, attractive Lennard-Jones, and polar (charge) interactions are set to zero, as has been described previously54. EV simulations were used solely to compare finite-size effects for ensembles constructed for real chains. Excluded volume (EV) Monte Carlo simulations were run for homopolymers of 50, 100, 150, 200, 250, 300, 350, 400, 450, and 500 residue poly-alanine chains as a reference model to quantify finite-size effects. Simulations were run in a simulation droplet with a radius of 500 Å for 21 × 106 steps, with 1 × 106 steps discarded as equilibration. It is worth noting that given chains are generated in a random non-overlapping starting configuration and the only criterion for move acceptance or rejection is steric overlap, strictly speaking, no equilibration is needed as the chain begins the simulation “equilibrated” in the context of the underlying Hamiltonian. Conformers were saved every 2 × 104 steps, generating 1 × 103 independent conformations, a sufficiently large ensemble for our purposes of calculating internal scaling profiles (Fig. 1E). However, as a point of pedagogy, we note that ensembles of 1 × 103 conformers for large (200+) residue IDRs would (in general) be insufficient for calculating many other types of simulation-derived properties (e.g. local dihedral distributions) due to the mismatch between the number of potential microstates and observed number of conformers.

Figure 1: The AFRC is a pre-parameterized polymer model based on residue-specific polypeptide behavior.

Figure 1:

A. Schematic of the amino acid dihedral angles. B. Ramachandran map for alanine used to select acceptable backbone conformations for the FRC simulations. All twenty amino acids are shown in Fig. S1. C. Graphical rendering of an FRC ensemble for a 100-residue homopolymer. The red chain is a highlighted single conformation, and the shaded other chains shown to highlight the heterogeneous nature of the underlying ensemble. D. Flory Random Coil (FRC) simulations performed using a modified version of the ABSINTH implicit model and CAMPARI simulation engine yield ensembles that scale as ideal chains (i.e., Re and Rg scale with the number of monomers to the power of 0.5). E. Internal scaling profiles for FRC simulations and Excluded Volume (EV) simulations for poly-alanine chains of varying lengths (filled circles demark the end of profiles for different polymer lengths). Internal scaling profiles map the average distance between all pairs of residues |i-j| apart in sequence space, where i and j define two residues. This double average reports on the fact we average over both all pairs of residues that are |i-j| apart and do so over all possible configurations. EV simulations show a characteristic tapering (“dangling end” effect) for large values of |i-j|. All FRC simulation profiles superimpose on top of one another, reflecting the absence of finite chain effects. F. Histograms of end-to-end distances (blue) taken from FRC simulations vs. corresponding probability density profiles generated by the Analytical FRC (AFRC) model (black line) show excellent agreement. G. Histograms of radii of gyration (red) taken from FRC simulations vs. corresponding probability density profiles generated by the AFRC model (black line) also show excellent agreement.

For quantifying dangle end effects of internal vs. external inter-residue distances (Fig. S1D), we ran extensive additional simulations of an A151 homopolymer (to match FRC simulations). For these simulations, ten independent replicas were run for 8.05 × 107 steps, with the first 5 × 105 discarded as equilibration. Conformers were saved every 2 × 104 steps. These simulations generated an ensemble of 4 × 104 conformations, enabling a robust assessment of finite-size effects.

Finite-size/sampling effects for FRC simulations are assessed in several ways, by comparing internal scaling profiles as a function of length and by calculating the Flory characteristic ratio (see supplementary information). Parameter fitting to extract residue-specific parameters follows a simple scan of possible parameters for each amino acid independently, with the optimal parameter sitting at the bottom of a convex function with respect to root mean square error (i.e., there is a single best-fitting parameter in all cases).

1.2. All-atom simulations

All-atom simulations were analyzed as described previously, and all the all-atom trajectories can be obtained as described previously28. Specifically, all-atom simulations included both Monte Carlo and molecular dynamics simulations. Monte Carlo simulations include those of Ash155, p5356, p2757, the notch intracellular domain58, the hnRNPA1 low complexity domain25. Molecular dynamics simulations include alpha-synuclein, DrkN, ACTR and NTail 59. All simulation trajectories were taken from previously published studies, such that the simulation details can be obtained from those papers.

1.3. SAXS data

Experimental SAXS data includes 145 separate radius of gyration values. All values and associated references are included in table S4. In addition, all data are tabulated at the main GitHub directory for this paper (https://github.com/holehouse-lab/supportingdata/tree/master/2023/alston_ginell_2023) and available as an Excel spreadsheet and Pandas-compatible CSV file.

1.4. Amino acid sequence analysis

Sequence analysis to calculate the fraction of charged residues and proline residues was done using localCIDER60 and sparrow (https://github.com/idptools/sparrow).

1.5. AFRC implementation

The AFRC is implemented as a stand-alone Python package. All code is open-sourced and available at https://github.com/idptools/afrc. All documentation is available at https://afrc.readthedocs.io/. The package itself can be downloaded from https://pypi.org/project/afrc and installed using the command pip install afrc. A Google colab notebook that implements the AFRC along with the other three analytical models described in this work are linked from https://github.com/idptools/afrc.

The afrc package uses numpy and scipy, and in addition to the AFRC implements the Worm-like chain (WLC), the self-avoiding random walk (SAW), and the ν-dependent self-avoiding random walk (SAW-ν) 23,61.

1.6. Figures and analysis in this paper

Jupyter notebooks to recreate all figures in this paper are available at https://github.com/holehouse-lab/supportingdata/tree/master/2023/alston_ginell_2023.

RESULTS

Implementation of a numerical model for sequence-specific ideal chain simulations

We used a Monte Carlo-based approach to construct sequence-specific atomistic ensembles of polypeptides as ideal chains. All-atom simulations with all non-bonded and solvation interactions scaled to zero were performed using a modified version of the CAMPARI Monte Carlo simulation engine using bond lengths and atomic radii defined by the ABSINTH-OPLS forcefield2,62,63. We modified CAMPARI to reproduce Flory’s rotational isomeric state approximation31,64. In this method, an initial conformation of the polypeptide is randomly generated. Upon each Monte Carlo step, a residue is randomly selected, the backbone dihedrals are rearranged to one of a subset of allowed residue-specific psi/phi values (i.e., specific isomeric states), and the chain is rearranged accordingly (Fig. 1A, B). Allowed phi/psi values are selected from a database of residue-specific allowed values as determined by all-atom simulations of peptide units, with the associated Ramachandran maps shown in Fig. S1. Importantly, the Monte Carlo moves in these simulations approach are rejection-free. That is, only allowed phi/psi angles are proposed, and no consideration of steric overlap in the resulting conformation is given. The ensemble generated by these simulations is referred to as the Flory Random Coil (FRC, Fig. 1C) and has been used as a convenient reference frame for comparing simulations of disordered and unfolded polypeptides for over a decade (as reviewed by Mao et al.2)15,54,6568.

FRC simulations enable the construction of ensembles where each amino acid exists in a locally allowed configuration, yet no through-space interactions occur. This has two important implications for the construction of an ideal chain model. Firstly, each monomer has no preference for chain:chain vs. chain:solvent interactions (each monomer is “agnostic” to its surroundings). As a result, both internal and global dimensions show scaling behavior with an apparent scaling exponent (νapp) of 0.5 (Fig. 1D), analogous to that of a polymer in a theta solvent. Secondly, terminal residues sample conformational space in the same way as residues internal to the chain (Fig. S2). This means that end-effects that emerge finite-chain effects are not experienced in terms of end effects (Fig. 1E). This is in contrast to finite-sized self-avoiding chains, in which internal scaling profiles reveal a noticeable and predictable “dangling end” finite-chain effect (Fig 1E, Fig. S2). In summary, FRC simulations enable us to generate ensembles at all-atom resolution that are nearly fully approximations of ideal chains, reproducing the behavior of a hypothetical “ideal” polypeptide.

Constructing an analytical description of the Flory Random Coil

Our FRC ensembles enable the calculation of a range of polymeric properties, including inter-residue distances, inter-residue contact probabilities, the hydrodynamic radius, or the radius of gyration. Comparing these properties with experiments or simulations is often convenient, offering a standard reference frame for normalization and biophysical context2,15,17,36,37. However, performing and analyzing all-atom simulations with CAMPARI necessitates a level of computational sophistication that may make these calculations inaccessible to many scientists. To address this, we next sought to develop a set of closed-form analytical expressions to reproduce these properties and implement them as an easy-to-use package available both locally and – importantly – via a simple web interface (Google colab notebook).

FRC simulations generate ensembles that – by definition – reproduce the statistics expected for an ideal chain. As mentioned, polymer scaling behavior generally takes the form;

D=R0Nv (1)

For an ideal chain, νapp should not depend on the amino acid sequence (as all chains should scale with νapp = 0.5). However, the prefactor R0 can and will show sequence dependence. As such, computing polymeric properties from sequences necessitates a means to calculate sequence-specific prefactor values. Prefactor values were parameterized using homopolymer simulations of each amino acid (see supplementary information). The inter-residue distance prefactor A0 was parameterized by fitting internal scaling profiles using equation (2);

r(i,j)2=A0|ij|v (2)

In equation 2, |i-j| is the number of residues between residues at position i and j, the left-hand-side reports on the root-mean-square (RMS) distance between residues i and j in the chain, ν is the scaling exponent (in our case this is equal to 0.5), and A0 is a prefactor for which we can directly solve for. The double angle brackets around the RMS distance reflect the fact we are averaging over all pairs of residues that are |i-j| apart and doing so for all chain configurations. Plotting |i-j| vs. the RMSD generates the internal scaling profile shown in Fig. 1E. By fitting homopolymers of the 20 amino acids, a set of residue-specific A0 prefactors was determined, as listed in Supplementary Table 1.

For our homopolymers, we can calculate the root-mean-squared end-to-end distances using equation (3);

re2=A0Nv (3)

From this, we can then use the standard function for P(r) of a Gaussian chain to calculate the end-to-end distance distribution;

P(r)=4πr2(32πre2)3/2e(3r22re2) (4)

After determining residue-specific A0, a comparison of analytical and numerical simulation distributions show excellent agreement when homopolymer end-to-end distance distributions are compared between FRC simulations and the AFRC-derived values (Fig. 1f).

We next took a similar route to define the radius of gyration (Rg) distribution. While no closed-form solution for the distribution of the radius of gyration exists, Lhuillier previously defined a closed-form approximation for this distribution for a fractal chain69;

PRg(x)Nvdf(x)(xNv) (5)

Where;

f(x)exp[(Nvx)αd(xNv)δ] (6)

And the variables α and δ are defined as:

α=1(vd1) (7)
δ=1(1v) (8)

Here, x represents the distance in some arbitrary units (written as such to avoid confusion with r, which represents the distance in Angstroms [Å]), N and ν again represent the total number of residues and the scaling exponent (0.5.), while d is the dimensionality (d=3). This allows us to calculate α and δ exactly, given ν is fixed at 0.5. Consequently, we can recast equation 5 into units of Å using a sequence-specific normalization factor (X0);

r=X0x (9)

To calculate X0, we fit numerically-generated P(Rg) distributions from homopolymer simulations with a series of analytically generated distributions to identify the best-fitting amino acid-specific X0 values. This fitting is done to match the ensemble-average Rg the two distributions, which conveniently follow a convex relationship with a single best-fitting value. These prefactors are listed in Supplementary Table 1. As with the end-to-end distances, a comparison of numerically-generated P(Rg) with analytically-generated P(Rg) values are in extremely good agreement (Fig. 1g). Comparing ensemble average end-to-end distance and radii of gyration for homopolymers of all 20 amino acids in lengths from 50 to 350 amino acids revealed a Pearson correlation coefficient of 0.999 and a root mean square error (RMSE) of 0.8 Å and 0.3 Å for the end-to-end distance and radius of gyration, respectively (Fig. S2).

With analytical expressions for computing the end-to-end distance and radius of gyration probability distributions in hand, we can calculate additional polymeric properties. Given the fractal nature of the Flory Random Coil and the absence of end effects, we can calculate all possible inter-residue distances and, correspondingly, contact frequencies between pairs of residues (Fig. 2a, b). Similarly, using either the Kirkwood-Riseman equation or a recently derived empirical relationship, we can compute an approximation for the ensemble-average hydrodynamic radius7072. In summary, the AFRC offers an analytic approach for calculating sequence-specific ensemble properties for unfolded homopolymers.

Figure 2. The AFRC enables the calculation of intra-residue distance distributions and expected distance-dependent contact fractions.

Figure 2.

A. We compared all-possible mean inter-residue distances obtained from FRC simulations with predictions from the AFRC. The maximum deviation across the entire chain is around 2.5 Å, with 92% of all distances having a deviation of less than 1 Å. B. Using the inter-residue distance, we can calculate the average fraction of an ensemble in which two residues are in contact (i.e., within some threshold distance). Here, we assess how that fractional contact varies with the contact threshold (different lines) and distance between the two residues. The AFRC does a somewhat poor job of estimating contact fractions for pairs of residues separated by 1,2 or 3 amino acids due to the discrete nature of the FRC simulations vis the continuous nature of the Gaussian chain distribution. However, the agreement is excellent above a sequence separation of three or more amino acids, suggesting that the AFRC offers a reasonable route to normalize expected contact frequencies.

Generalization to heteropolymers

Our parameterization has thus far focused exclusively on homopolymer sequences. However, Flory’s rotational isomeric state approach requires complete independence of each amino residue31,64. Consequently, we expected the prefactor associated with a given heteropolymer to reflect a weighted average of prefactors taken from homopolymers, where the sequence composition determines the weights.

To test this expectation, we compared numerical simulations with AFRC predictions for a set of different polypeptide sequences finding excellent agreement in both end-to-end distances and radii of gyration (Fig. 3a, b and Fig. S3). Similarly, given the absence of end-effects, our analytical end-to-end distance expression works equally well for intramolecular distances in addition to the end-to-end distance. To assess this, we compared internal scaling profiles between FRC simulations and AFRC predictions (Fig. 3c). These profiles compare the ensemble average distance between each possible inter-residue distance and offer a convenient means to assess both short and long-range intramolecular distances. We performed FRC simulations for 320 different polypeptide sequences ranging in length from 10 to 500 amino acids with a systematic variation in amino acid composition. Across all internal scaling profile comparisons between FRC and AFRC simulations, the overall average RMSE was 0.5 Å, with almost all (92%) of individual comparisons revealing an RMSE under 1 Å (Fig. 3D). Similarly, the Pearson’s correlation coefficient between internal scaling profiles for FRC vs. AFRC for all ten-residue chains was 0.9993, which was the worst correlation across all lengths (Fig. S4). In summary, the AFRC faithfully reproduces homo- and hetero-polymeric dimensions for polypeptides under the FRC assumptions.

Fig. 3. The AFRC generalizes to arbitrary heteropolymeric sequences with the same precision and accuracy as it does for homopolymeric sequences.

Fig. 3

A. Representative examples of randomly selected heteropolymers of lengths 100, 250, and 450, comparing the AFRC-derived end-to-end distance distribution (black curve) with the empirically-determined end-to-end distance histogram from FRC simulations (blue bars). B. The same three polymers, as shown in A, now compare the AFRC-derived radius of gyration distance distribution (black curve) with the empirically-determined radius of gyration histogram from FRC simulations (blue bars). C Comparison of AFRC vs. FRC simulation-derived internal scaling profiles for a 150-amino acid random heteropolymer. The deviation between FRC and AFRC for these profiles offers a measure of agreement across all length scales. D Comparison of root-mean-square error (RMSE) obtained from internal scaling profile comparisons (i.e., as shown in C) for 320 different heteropolymers straddling 10 to 500 amino acids in length. In all cases, the agreement with theory and simulations is excellent.

Comparison with existing polymer models

For completeness, we compared the end-to-end distance distributions obtained from several other polymer models used throughout the literature for describing unfolded and disordered polypeptides. Previously-used polymer models offer a means to analytically fit experimental or computational results and benefit from taking one (or more) parameters that define the model’s behavior. While the AFRC does not enable fitting to experimental or simulated data, it only requires an amino acid sequence as input. With this in mind, the AFRC serves a fundamentally different purpose than commonly used models.

We wondered if dimensions obtained from the AFRC would be comparable with dimensions obtained from other polymer models when using parameters used previously in the literature. We compared distributions obtained from the worm-like chain (WLC), the self-avoiding walk (SAW) model, and a recently-developed ν-dependent self-avoiding walk (SAW-ν)23,61. For the WLC model, we used a persistence length of 3.0 Å and an amino acid size of 3.8 Å (such that the contour length, lc, is defined as N×3.861). For the SAW model, we used a scaling prefactor of 5.5 Å (i.e., assuming 〈Re〉 = 5.5N0.598)23,32,61. Finally, for SAW-ν, we computed distributions using a prefactor of 5.5 Å and using several different ν values6,23. These values were chosen because previous studies have used them to describe intrinsically disordered proteins.

Fig. 4A shows comparisons of the AFRC distance distribution obtained for a 100-mer polyalanine (A100) vs. the WLC and SAW (top) and vs. ν-dependent distributions (bottom). The AFRC is slightly more expanded than the WLC model using the parameters provided, although the persistence length can, of course, be varied to explore more compact (lower lp) or more extended (higher lp) distributions (Fig. S6A). The AFRC is substantially more compact than the SAW model. The comparison with the SAW model is important, as with a prefactor of 5.5 Å the SAW model describes a polypeptide as a self-avoiding random coil (ν=0.588), whereas the AFRC describes a polypeptide as an ideal chain (ν = 0.5), such that we should expect the SAW to be more expanded than the AFRC. Finally, in comparing the AFRC with the SAW-ν model, we find that the AFRC distribution falls almost completely top of the ν = 0.50 distribution. This indicates that both models arrive at nearly identical distance distributions despite being developed independently. This result is both confirmatory and convenient, as it means the AFRC and SAW-ν models can be used to analyze the same data without concern for model incompatibility.

Fig. 4. The AFRC is complementary to existing polymer models.

Fig. 4

(A) Comparison of end-to-end distance distributions for several other analytical models, including the Wormlike Chain (WLC), the self-avoiding walk (SAW), and the ν-dependent SAW model (SAW-ν). The AFRC behaves like a ν-dependent SAW with a scaling exponent of 0.5. (B) Comparisons of ensemble-average radii of gyration as a function of chain length for the same sets of polymer models. The AFRC behaves as expected and again is consistent with a ν-dependent SAW with a scaling exponent of 0.5.

We emphasize that this comparison with the existing polymer model is not presented to imply the AFRC is better than existing models but to highlight their compatibility. One can tune input parameters for all three models to arrive at qualitatively matching end-to-end distributions (Fig. S6B). The major difference between these three models and the AFRC is simply that the AFRC requires only amino acid sequence as input, making it a convenient reference point. For completeness, all four models are implemented in our Google colab notebook.

We also compared ensemble-average radii of gyration obtained from the various models with those obtained from the AFRC. While the WLC, SAW, and SAW-ν models do not provide approximate closed-form solutions for the radius of gyration distribution, they do enable an estimate of the ensemble-average radius of gyration to be calculated23,61. Using the same model parameters as was used in Fig. 4A, the AFRC falls between the SAW and the WLC. Moreover, the AFRC radii of gyration scale almost 1:1 with the SAW-ν derived radii as a function of chain length when ν = 0.50. As such, we conclude that the AFRC is consistent with existing polymer models yet benefits from being both parameter-free (for the user) and offering full distributions for the radius of gyration and intramolecular distance distributions per-residue contact fractions, convenient properties for normalization in simulations and experiment.

Comparison with all-atom simulations

Our work thus far has focussed on developing and testing the robustness of the AFRC. Having done this, we next sought to ask how similar (or dissimilar) distributions obtained from the AFRC are compared to all-atom simulations. We used simulations generated via all-atom molecular dynamics with the Amber99-disp forcefield and all-atom Monte Carlo simulations with the ABSINTH-OPLS forcefield25,5559,63. Specifically, we examined nine different fully disordered proteins: The unfolded Drosophila Drk N-terminal SH2 domain (DrkN, 59 residues)59,73,74, the ACTR domain of p160 (ACTR, 71 residues)39,40,59,75, a C-terminal disordered subregion of the yeast transcription factor Ash1 (Ash1, 83 residues)55, the N-terminal disordered regions of p53 (p53, 91 residues)56,76, the C-terminal IDR of p27 (p26, 107 residues)57, the intrinsically disordered intracellular domain of the notch receptor (Notch, 132 residues)58, the C-terminal disordered domain of the measles virus nucleoprotein (Ntail, 132 residues)59,77, the C-terminal low-complexity domain of hnRNPA1 (A1-LCD, 137 residues)25, and full-length alpha-synuclein (asyn, 140 residues)59,78,79.

We compared distributions for the end-to-end distance and radius of gyration for our all-atom simulations with analogous distributions generated by the AFRC (Fig. 5). These comparisons revealed that while the general shape of the distributions recovered from simulations was not dissimilar from the AFRC-derived end-to-end distance and radius of gyration distributions, the width and mean were often different. This is hardly surprising, given that the global dimensions of an unfolded protein depend on the underlying amino acid sequence. The ratio of the mean end-to-end distance divided by the AFRC-derived mean end-to-end distance (or the corresponding ratio for the radius of gyration) was found to range between 0.7 and 1.4. In some cases, the end-to-end distance ratio or radius of gyration ratio varied within the same protein. For example, for the 132-residue intracellular-domain IDR from Notch (Notch), the end-to-end distance ratio was 0.8 (i.e., smaller than predicted by the AFRC), while the radius of gyration ratio was 1.0. Similarly, in alpha-synuclein (Asyn), the corresponding ratios were 0.7 and 0.9, again reporting a smaller end-to-end distance than radius of gyration. As suggested previously, discrepancies in end-to-end distance vs. radius of gyration vs. expectations from homopolymer models are diagnostic of sequence-encoded conformational biases18,35,36,49,50,80,81.

Fig. 5. AFRC-derived distance distributions enable simulations to be qualitatively compared against a null model.

Fig. 5

A. Comparison of the end-to-end distance distributions from the AFRC (black line) and SAW-ν (blue dashed line, with ν=0.5 and prefactor = 5.5) with the simulation-derived end-to-end distribution (blue bars) for all-atom simulations of nine different disordered proteins. B. Comparison of the AFRC-derived radius of gyration distributions (black line) with the simulation-derived radius of gyration distribution (red bars) for all-atom simulations of nine different disordered proteins.

We also used the AFRC to calculate scaling maps. Scaling maps are non-redundant matrices of inter-residue distances obtained from simulations and normalized by the expected inter-residue distances obtained by the AFRC (Fig. 6)49,55,81. We compared these scaling maps (top left triangle of each panel) against absolute distances (bottom right triangle). This comparison highlights the advantage that using a reference polymer model offers. Long-range sequence-specific conformational biases are much more readily visualized as deviations from an expected polymer model. Moreover, the same dynamic range of values can be used for chains of different lengths, normalizing the units from Å to a unitless ratio.

Fig. 6. The AFRC enables a consistent normalization of intra-chain distances to identify specific sub-regions that are closer or further apart than expected.

Fig. 6

Inter-residue scaling maps (top left) and distance maps (bottom right) reveal the nuance of intramolecular interactions. Scaling maps (top left) report the average distance between each pair of residues (i,j) divided by the distance expected for an AFRC-derived distance map, providing a unitless parameter that varies between 0.7 and 1.3 in these simulations. Distance maps (bottom right) report the absolute distance between each pair of residues in angstroms. While distance maps provide a measure of absolute distance in real space, scaling maps provide a cleaner, normalized route to identify deviations from expected polymer behavior, offering a convenient means to identify sequence-specific effects. For example, in Notch and alpha-synuclein, scaling maps clearly identify end-to-end distances as close than expected. Scaling maps also offer a much sharper resolution for residue-specific effects - for example, in p53, residues embedded in the hydrophobic transactivation domains are clearly identified as engaging in transient intramolecular interactions, leading to sharp deviations from expected AFRC distances.

Returning to the notch simulations, both types of intramolecular distance analysis clearly illustrate a strong long-range interaction between the N-terminal residues 1–30 and the remainder of the sequence. The long-range interaction between chain ends influences the end-to-end distance much more substantially than it does the radius of gyration (Fig. 6). Similarly, in alpha-synuclein, we observed long-range interactions between the negatively charged C-terminus and the positively-charged residues 20–50, leading to a reduction in the end-to-end distance. In short, the AFRC provides a convenient approach to enable direct interrogation of sequence-to-ensemble relationships in all-atom simulations.

Finally, we calculated per-residue contact scores for each residue in our nine proteins (Fig. 7). These contact scores sum the length-normalized fraction of the simulation in which each residue is in contact with any other residue in the sequence25. While this collapses information on residue-specificity into a single number, it integrates information from the typically-sparse contact maps for IDR ensembles to identify residues that may have an outside contribution towards short (<6 Å) range molecular interactions. We and others have previously used this approach to identify “stickers” - regions or residues in IDRs that have an outsized contribution to intra- and inter-molecular interactions25,68,82,83.

Fig. 7. The AFRC enables an expected contract fraction to be calculated, such that normalized contact frequencies can be easily calculated for simulations.

Fig. 7

Across the nine different simulated disordered proteins, we computed the contact fraction (i.e., the fraction of simulations each residue is in contact with any other residue) and divided this value by the expected contact fraction from the AFRC model. This analysis revealed subregions within IDRs that contribute extensively to intramolecular interactions, mirroring finer-grain conclusions obtained in Fig. 6.

In some proteins, specific residues or subregions were identified as contact hotspots. This includes the aliphatic residues in ACTR, and hydrophobic residues in the p53 transactivation domains, in line with recent work identifying aliphatic residues as driving intramolecular interactions68,84. Most visually noticeable, aromatic residues in the A1-LCD appear as spikes that uniformly punctuate the sequence, highlighting their previously-identified role as evenly-spaced stickers25. Intriguingly, in alpha-synuclein, several regions in the aggregation-prone non-amyloid core (NAC) region (residues 61–95) appear as contact score spikes, potentially highlighting the ability of intramolecular interactions to guide regions or residues that may mediate inter-molecular interaction.

Comparison with SAXS-derived radii of gyration

Having compared AFRC-derived parameters with all-atom simulations, we next sought to determine if AFRC-derived polymeric properties compared reasonably with experimentally-measured values. As a reminder, the AFRC is not a predictor of IDR behavior; instead, it offers a null model against which IDR dimensions can be compared. To perform a comparison with experimentally derived data, we curated a dataset of 145 examples of radii of gyration measured by small-angle X-ray scattering (SAXS) of disordered proteins. We choose to use SAXS data because SAXS-derived radii of gyration offer a label-free, model-free means to determine the overall dimensions of a disordered protein. That said, SAXS-derived measurements are not without their caveats (see discussion), and where possible, we re-analyzed primary scattering data to ensure all radii of gyration reported here are faithful and accurate.

To assess our SAXS-derived radii of gyration, we calculated expected dimensions for denatured proteins, folded globular domains, or AFRC chains by fitting scaling laws with the form Rg = R0Nν against different polymer models. We used a denatured-state polymer model (ν = 0.59, R0 = 1.98, as defined by Kohn et al.) and a folded globular domain model (ν = 0.33, R0 = 2.86, as obtained from PDBSELECT25 originally plotted by Holehouse & Pappu)11,48,85. We also calculated the AFRC-derived radii of gyration for all 145 chains and fitted a polymer scaling model to the resulting data where the only free parameter was R0 (ν = 0.50, R0 = 2.50). This analysis showed that the majority of the 145 proteins have a radius of gyration above that of the AFRC-derived radius of gyration (see discussion), with some even exceeding the expected radius of gyration of a denatured protein (Fig. 8A). Based on these data, we determined an empirical upper and lower bound for the biologically accessible radii of gyration given a chain length (see discussion). This threshold suggests that, for a sequence of a given length, there is a wide range of possible IDR dimensions accessible (Fig. 8B, Fig. S5).

Fig 8. Comparison of AFRC-derived radii of gyration with experimentally-measured values.

Fig 8.

A. We compared 145 experimentally-measured radii of gyration against three empirical polymer scaling models that capture the three classes of polymer scaling (ν = 0.33 [globular domains], ν = 0.5 [AFRC], and ν = 0.59 [denatured state]). Individual points are colored by their normalized radius of gyration (SAXS-derived radius of gyration divided by AFRC-derived radius of gyration). B. The same data as in panel A with the empirically defined upper and lower bound. As with panel A, individual points are colored by their normalized radius of gyration. C. Comparison of SAXS-derived radii of gyration and AFRC-derived radii of gyration, as with panels A and B, individual points are colored by their normalized radius of gyration.

Finally, we wondered how well the AFRC-derived radii of gyration would correlate with experimentally-measured values. Based on the upper and lower bounds shown in Fig. 8B, we excluded four radii of gyration that appear to be spuriously large, leaving 141 data points. For these 141 points, we calculated the Pearson correlation coefficient (r) and the RMSE between the experimentally-measured radii of gyration and the AFRC-derived radii of gyration. This analysis yielded a correlation coefficient of 0.91 and an RMSE of 6.4 Å (Fig. 8C). To our surprise, these metrics outperform several established coarse-grained models for assessing intrinsically disordered proteins, as reported recently86. We again emphasize that the AFRC is not a predictor of IDR dimensions. However, we tentatively suggest that this result demonstrates that a reasonably good correlation between amino acid sequence and global dimensions can be obtained solely by recognizing that disordered proteins are flexible polymers. With this in mind, we conclude that the AFRC provides a convenient and easily-accessible control for experimentalists measuring the global dimensions of disordered proteins.

Reference implementation and distribution

Computational and theoretical tools are only as useful as they are usable. To facilitate the adoption of the AFRC as a convenient reference ensemble, we provide the AFRC as a stand-alone Python package distributed through PyPI (pip install afrc). We also implemented the additional polymer modes described in Fig. 4 with a consistent programmatic interface, making it relatively straightforward to apply these models to analyze and interpret computational and experimental data. Finally, to further facilitate access, we provide an easy-to-use Google colab notebook for calculating expected parameters for easy comparison with experiments and simulations. All information surrounding access to the AFRC model is provided at https://github.com/idptools/afrc.

DISCUSSION & CONCLUSION

In this work, we have developed and presented the Analytical Flory Random Coil (AFRC) as a simple-to-use reference model for comparing against simulations and experiments of unfolded and disordered proteins. We demonstrated that the AFRC behaves as a truly ideal chain and faithfully reproduces homo- and hetero-polymeric inter-residue and radius of gyration distributions obtained from explicit numerical simulations. We also compared the AFRC against several previously-established analytical polymer models, showing that ensemble-average or distribution data obtained from the AFRC are interoperable with existing models. Finally, we illustrated how the AFRC could be used as a null model for comparing data obtained from simulations and from experiments.

The AFRC differs from established polymer models in two key ways. While existing models define functional forms for polymeric properties, they do not prescribe specific length scales or parameters for those models. This is not a weakness - it simply reflects how analytical models work. However, the need to provide ‘appropriate’ parameters to ensure these models recapitulate behaviors expected for polypeptides places the burden on selecting and/or justifying those parameters on the user. The AFRC combines several existing analytical models (the Gaussian chain and the Lhuillier approximation for the radius of gyration distribution) with specific parameters obtained from numerical simulations to provide a “parameter-free” polymer model defined by its reference implementation (as opposed to the mathematical form of the underlying distributions). We place parameter free in quotation marks because the freedom from parameters is at the user level - the model itself is explicitly parameterized to reproduce polypeptides dimensions. However, from the user’s perspective, no information is needed other than the amino acid sequence.

Although the AFRC was explicitly parameterized to recapitulate numerical FRC simulations, sequence-specific effects do not generally have a major impact on the resulting dimensions. For example, Fig. S6 illustrates the radius of gyration or end-to-end distance obtained for varying lengths of poly-alanine and poly-glycine. This behavior is not a weakness of the model - it is the model. This relatively modest sequence dependence reflects the fact that for an ideal chain, both the second and third virial coefficients are set to zero (i.e., the integral of Mayer f-function should equal 0)87. As such, the AFRC does not enable explicitly excluded volume contributions to the chain’s dimensions from sidechain volume, although this is captured implicitly based on the allowed isomeric states (compare glycine to alanine in Fig. S1). In summary, the AFRC does not offer any new physics, but it does encapsulate previously derived physical models along with numerically-derived sequence-specific parameters to make it easy to construct null models explicitly for comparison with polypeptides.

In comparing AFRC-derived polymeric properties with those obtained from all-atom simulations, we recapitulate sequence-to-ensemble features identified previously 25,28,58,59. When comparing the normalized radii of gyration (RgSim/ RgAFRC), we noticed the lower and upper bounds obtained here appear to be approximately 0.8 and 1.4, respectively. To assess if this trend held true for experimentally derived radii of gyration, we calculated the normalized radii of gyration for the 141 values reported in Fig. 8C, recapitulating a similar range (0.8 to 1.46). Based on these values, we defined an empirical boundary condition for the anticipated range in which we would expect to see a disordered chain’s radius of gyration as between 0.8RgAFRC and 1.45RgAFRC (Fig. 8B). We emphasize this is not a hard threshold. However, it offers a convenient rule-of-thumb, such that measured radii of gyration can be compared against this value to assess if a potentially spurious radius of gyration has been obtained (either from simulations or experiments). Such a spurious value does not necessarily imply a problem, but may warrant further investigation to explain its physical origins.

Our comparison with experimental data focussed on radii of gyration obtained from SAXS experiments. We chose this route given the wealth of data available and the label-free and model-free nature in which SAXS data are collected and analyzed. Given the AFRC offers the expected dimensions for a polypeptide behaving qualitatively as if it is in a theta solvent, it may be tempting to conclude from these data that the vast majority of disordered proteins are found in a good solvent environment (Fig. 9A). The solvent environment reflects the mean-field interaction between a protein and its environment. In the good solvent regime, protein:solvent interactions are favored, while in the poor solvent regime protein:protein interactions are favored 2,6,44,48. However, it is worth bearing in mind that SAXS experiments generally require relatively high concentrations of protein to obtain reasonable signal-to-noise43. Recent advances in size exclusion chromatography (SEC) coupled SAXS have enabled the collection of scattering data for otherwise aggregation-prone proteins with great success88. However, there is still a major acquisition bias in the technical need of these experiments to work with high concentrations of soluble proteins when integrated over all existing measured data. By definition, such highly soluble proteins experience a good solvent environment. Given this acquisition bias, we remain agnostic as to whether these results can be used to extrapolate to the solution behavior of all IDRs.

Fig 9. AFRC-normalized radii of gyration from experimentally-measured proteins.

Fig 9.

A. Histogram showing the normalized radii of gyration for 141 different experimentally-measured sequences. B. Comparison of normalized radii of gyration for 141 different experimentally-measured sequences against the fraction of charge and proline residues in those sequences. Individual points are colored by their normalized radius of gyration. Grey bars reflect the average radius of gyrations obtained by binning sequences with the corresponding fraction of charge and proline residues. The colored sigmoidal curve is included to guide the eye across the transition region, suggesting that – on average – the midpoint of this transition is at a fraction of charged and proline residues of ~0.25. The Pearson correlation coefficient (r) for the fraction of charged and proline residues vs. normalized radius of gyration is 0.58).

Prior work has implicated the presence of charged and proline residues as mediating IDR chain expansion 33,34,42,46,49,55,65,8992. We took advantage of the fact that the AFRC enables a length normalization of experimental radii of gyration and assessed the normalized radius of gyration vs. the fraction of charged and proline residue (Fig. 9B). Our data support this conclusion as a first approximation, but also clearly demonstrate that while this trend is true on average, there is variance in this relationship. Notably, for IDRs with a fraction of charged and proline residues between 0.2 and 0.4, the full range of possible IDR dimensions are accessible. The transition from (on average) more compact to (on average) more expanded chains occurs around a fraction of proline and charged residues of around 0.25 – 0.30, in qualitative agreement with prior work exploring the fraction of charge residues required to drive chain expansion 33,34,42. However, we emphasize that there is massive variability observed on a per-sequence basis. In summary, while the presence of charged and proline residues clearly influences IDR dimensions, complex patterns of intramolecular interactions can further tune this behavior 2,17,28.

In summary, the AFRC offers a convenient, analytical approach to obtain a well-defined reference state for comparing and contrasting simulations and experiments of unfolded and disordered proteins. It can be easily integrated into complex analysis pipelines, or used for one-off analysis via a Google Colab notebook without requiring any computational expertise at all.

Supplementary Material

1

ACKNOWLEDGEMENTS

We thank members of the Pappu lab and Holehouse lab for many useful discussions over the years. We are indebted to Dr. Nick Lyle for the original implementation of the CAMPARI-based FRC engine. We thank Dr. Erik Martin for bringing the work of Lhuillier to our attention. Funding for this work was provided by the National Institute on Allergic and Infectious Diseases with R01AI163142 to A.S.H. and A.S., by the National Science Foundation with 2128068 to A.S.H., by the Longer Life Foundation, an RGA/Washington University in St. Louis Collaboration to A.S.H., and by the National Cancer Institute with F99CA264413 to J.J.A. We also thank members of the Water and Life Interface Institute (WALII), supported by NSF DBI grant #2213983, for helpful discussions.

Footnotes

SUPPORTING INFORMATION

Supporting information contains additional analyses

REFERENCES

  • (1).Dill KA; Shortle D Denatured States of Proteins. Annu. Rev. Biochem 1991, 60, 795–825. [DOI] [PubMed] [Google Scholar]
  • (2).Mao AH; Lyle N; Pappu RV Describing Sequence–ensemble Relationships for Intrinsically Disordered Proteins. Biochem. J 2013, 449 (2), 307–318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (3).Chan HS; Dill KA Polymer Principles in Protein Structure and Stability. Annu. Rev. Biophys. Biophys. Chem 1991, 20, 447–490. [DOI] [PubMed] [Google Scholar]
  • (4).Pappu RV; Wang X; Vitalis A; Crick SL A Polymer Physics Perspective on Driving Forces and Mechanisms for Protein Aggregation - Highlight Issue: Protein Folding. Arch. Biochem. Biophys 2008, 469 (1), 132–141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (5).Schuler B; Soranno A; Hofmann H; Nettels D Single-Molecule FRET Spectroscopy and the Polymer Physics of Unfolded and Intrinsically Disordered Proteins. Annu. Rev. Biophys 2016, 45, 207–231. [DOI] [PubMed] [Google Scholar]
  • (6).Soranno A Physical Basis of the Disorder-Order Transition. Arch. Biochem. Biophys 2020, 685, 108305. [DOI] [PubMed] [Google Scholar]
  • (7).Lin Y-H; Forman-Kay JD; Chan HS Theories for Sequence-Dependent Phase Behaviors of Biomolecular Condensates. Biochemistry 2018, 57 (17), 2499–2508. [DOI] [PubMed] [Google Scholar]
  • (8).Thirumalai D; O’Brien EP; Morrison G; Hyeon C Theoretical Perspectives on Protein Folding. Annu. Rev. Biophys 2010, 39, 159–183. [DOI] [PubMed] [Google Scholar]
  • (9).Cubuk J; Soranno A Macromolecular Crowding and Intrinsically Disordered Proteins: A Polymer Physics Perspective. ChemSystemsChem 2022. 10.1002/syst.202100051. [DOI] [Google Scholar]
  • (10).Wilkins DK; Grimshaw SB; Receveur V; Dobson CM; Jones JA; Smith LJ Hydrodynamic Radii of Native and Denatured Proteins Measured by Pulse Field Gradient NMR Techniques. Biochemistry 1999, 38 (50), 16424–16431. [DOI] [PubMed] [Google Scholar]
  • (11).Kohn JE; Millett IS; Jacob J; Zagrovic B; Dillon TM; Cingel N; Dothager RS; Seifert S; Thiyagarajan P; Sosnick TR; Hasan MZ; Pande VS; Ruczinski I; Doniach S; Plaxco KW Random-Coil Behavior and the Dimensions of Chemically Unfolded Proteins. Proc. Natl. Acad. Sci. U. S. A 2004, 101 (34), 12491–12496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (12).Damaschun G; Damaschun H; Gast K; Zirwer D; Bychkova VE Solvent Dependence of Dimensions of Unfolded Protein Chains. Int. J. Biol. Macromol 1991, 13 (4), 217–221. [PubMed] [Google Scholar]
  • (13).Calmettes P; Durand D; Desmadril M; Minard P; Receveur V; Smith JC How Random Is a Highly Denatured Protein? Biophys. Chem 1994, 53 (1–2), 105–113. [DOI] [PubMed] [Google Scholar]
  • (14).Mok YK; Kay CM; Kay LE; Forman-Kay J NOE Data Demonstrating a Compact Unfolded State for an SH3 Domain under Non-Denaturing Conditions. J. Mol. Biol 1999, 289 (3), 619–638. [DOI] [PubMed] [Google Scholar]
  • (15).Peran I; Holehouse AS; Carrico IS; Pappu RV; Bilsel O; Raleigh DP Unfolded States under Folding Conditions Accommodate Sequence-Specific Conformational Preferences with Random Coil-like Dimensions. Proc. Natl. Acad. Sci. U. S. A 2019, 116 (25), 12301–12310. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (16).Meng W; Luan B; Lyle N; Pappu RV; Raleigh DP The Denatured State Ensemble Contains Significant Local and Long-Range Structure under Native Conditions: Analysis of the N-Terminal Domain of Ribosomal Protein L9. Biochemistry 2013, 52 (15), 2662–2671. [DOI] [PubMed] [Google Scholar]
  • (17).Das RK; Ruff KM; Pappu RV Relating Sequence Encoded Information to Form and Function of Intrinsically Disordered Proteins. Curr. Opin. Struct. Biol 2015, 32 (0), 102–112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (18).Fuertes G; Banterle N; Ruff KM; Chowdhury A; Mercadante D; Koehler C; Kachala M; Estrada Girona G; Milles S; Mishra A; Onck PR; Gräter F; Esteban-Martín S; Pappu RV; Svergun DI; Lemke EA Decoupling of Size and Shape Fluctuations in Heteropolymeric Sequences Reconciles Discrepancies in SAXS vs. FRET Measurements. Proc. Natl. Acad. Sci. U. S. A 2017, 114 (31), E6342–E6351. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (19).Milles S; Mercadante D; Aramburu IV; Jensen MR; Banterle N; Koehler C; Tyagi S; Clarke J; Shammas SL; Blackledge M; Gräter F; Lemke EA Plasticity of an Ultrafast Interaction between Nucleoporins and Nuclear Transport Receptors. Cell 2015, 163 (3), 734–745. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (20).Gomes G-NW; Krzeminski M; Namini A; Martin EW; Mittag T; Head-Gordon T; Forman-Kay JD; Gradinaru CC Conformational Ensembles of an Intrinsically Disordered Protein Consistent with NMR, SAXS, and Single-Molecule FRET. J. Am. Chem. Soc 2020, 142 (37), 15697–15710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (21).Zosel F; Mercadante D; Nettels D; Schuler B A Proline Switch Explains Kinetic Heterogeneity in a Coupled Folding and Binding Reaction. Nat. Commun 2018, 9 (1), 3332. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (22).Borgia A; Borgia MB; Bugge K; Kissling VM; Heidarsson PO; Fernandes CB; Sottini A; Soranno A; Buholzer KJ; Nettels D; Kragelund BB; Best RB; Schuler B Extreme Disorder in an Ultrahigh-Affinity Protein Complex. Nature 2018, 555 (7694), 61–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (23).Zheng W; Zerze GH; Borgia A; Mittal J; Schuler B; Best RB Inferring Properties of Disordered Chains from FRET Transfer Efficiencies. J. Chem. Phys 2018, 148 (12), 123329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (24).Chung HS; Piana-Agostinetti S; Shaw DE; Eaton WA Structural Origin of Slow Diffusion in Protein Folding. Science 2015, 349 (6255), 1504–1510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (25).Martin EW; Holehouse AS; Peran I; Farag M; Incicco JJ; Bremer A; Grace CR; Soranno A; Pappu RV; Mittag T Valence and Patterning of Aromatic Residues Determine the Phase Behavior of Prion-like Domains. Science 2020, 367 (6478), 694–699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (26).Alston JJ; Soranno A; Holehouse AS Integrating Single-Molecule Spectroscopy and Simulations for the Study of Intrinsically Disordered Proteins. Methods 2021, 193, 116–135. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (27).Bottaro S; Lindorff-Larsen K Biophysical Experiments and Biomolecular Simulations: A Perfect Match? Science 2018, 361 (6400), 355–360. [DOI] [PubMed] [Google Scholar]
  • (28).Lalmansingh JM; Keeley AT; Ruff KM; Pappu RV; Holehouse AS SOURSOP: A Python Package for the Analysis of Simulations of Intrinsically Disordered Proteins. bioRxiv 2023. 10.1101/2023.02.16.528879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (29).Rubinstein M; Colby RH Polymer Physics; Oxford University Press: New York, 2003. [Google Scholar]
  • (30).de Gennes PG Scaling Concepts in Polymer Physics; Cornell University Press: Ithaca, N.Y., 1979. [Google Scholar]
  • (31).Flory PJ Statistical Mechanics of Chain Molecules; Oxford University Press: New York, 1969. [Google Scholar]
  • (32).Hofmann H; Soranno A; Borgia A; Gast K; Nettels D; Schuler B Polymer Scaling Laws of Unfolded and Intrinsically Disordered Proteins Quantified with Single-Molecule Spectroscopy. Proc. Natl. Acad. Sci. U. S. A 2012, 109 (40), 16155–16160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (33).Mao AH; Crick SL; Vitalis A; Chicoine CL; Pappu RV Net Charge per Residue Modulates Conformational Ensembles of Intrinsically Disordered Proteins. Proc. Natl. Acad. Sci. U. S. A 2010, 107 (18), 8183–8188. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (34).Müller-Späth S; Soranno A; Hirschfeld V; Hofmann H; Rüegger S; Reymond L; Nettels D; Schuler B Charge Interactions Can Dominate the Dimensions of Intrinsically Disordered Proteins. Proc. Natl. Acad. Sci. U. S. A 2010, 107 (33), 14609–14614. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (35).Ruff KM; Holehouse AS SAXS versus FRET: A Matter of Heterogeneity? Biophys. J 2017, 113 (5), 971–973. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (36).Song J; Gomes G-N; Shi T; Gradinaru CC; Chan HS Conformational Heterogeneity and FRET Data Interpretation for Dimensions of Unfolded Proteins. Biophys. J 2017, 113 (5), 1012–1024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (37).Stenzoski NE; Zou J; Piserchio A; Ghose R; Holehouse AS; Raleigh DP The Cold-Unfolded State Is Expanded but Contains Long- and Medium-Range Contacts and Is Poorly Described by Homopolymer Models. Biochemistry 2020, 59 (36), 3290–3299. [DOI] [PubMed] [Google Scholar]
  • (38).Canchi DR; García AE Cosolvent Effects on Protein Stability. Annual Reviews of Physical Chemistry 2013, 64, 273–293. [DOI] [PubMed] [Google Scholar]
  • (39).Borgia A; Zheng W; Buholzer K; Borgia MB; Schüler A; Hofmann H; Soranno A; Nettels D; Gast K; Grishaev A; Best RB; Schuler B Consistent View of Polypeptide Chain Expansion in Chemical Denaturants from Multiple Experimental Methods. J. Am. Chem. Soc 2016, 138 (36), 11714–11726. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (40).Zheng W; Borgia A; Buholzer K; Grishaev A; Schuler B; Best RB Probing the Action of Chemical Denaturant on an Intrinsically Disordered Protein by Simulation and Experiment. J. Am. Chem. Soc 2016, 138 (36), 11702–11713. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (41).Tran HT; Pappu RV Toward an Accurate Theoretical Framework for Describing Ensembles for Proteins under Strongly Denaturing Conditions. Biophys. J 2006, 91 (5), 1868–1886. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (42).Marsh JA; Forman-Kay JD Sequence Determinants of Compaction in Intrinsically Disordered Proteins. Biophys. J 2010, 98 (10), 2383–2390. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (43).Bernadó P; Svergun DI Structural Analysis of Intrinsically Disordered Proteins by Small-Angle X-Ray Scattering. Mol. Biosyst 2011, 8 (1), 151–167. [DOI] [PubMed] [Google Scholar]
  • (44).Riback JA; Bowman MA; Zmyslowski AM; Knoverek CR; Jumper JM; Hinshaw JR; Kaye EB; Freed KF; Clark PL; Sosnick TR Innovative Scattering Analysis Shows That Hydrophobic Disordered Proteins Are Expanded in Water. Science 2017, 358 (6360), 238–241. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (45).Aznauryan M; Delgado L; Soranno A; Nettels D; Huang J-R; Labhardt AM; Grzesiek S; Schuler B Comprehensive Structural and Dynamical View of an Unfolded Protein from the Combination of Single-Molecule FRET, NMR, and SAXS. Proc. Natl. Acad. Sci. U. S. A 2016, 113 (37), E5389–E5398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (46).Bremer A; Farag M; Borcherds WM; Peran I; Martin EW; Pappu RV; Mittag T Deciphering How Naturally Occurring Sequence Features Impact the Phase Behaviours of Disordered Prion-like Domains. Nat. Chem 2022, 14 (2), 196–207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (47).Martin EW; Holehouse AS Intrinsically Disordered Protein Regions and Phase Separation: Sequence Determinants of Assembly or Lack Thereof. Emerg Top Life Sci 2020, 4 (3), 307–329. [DOI] [PubMed] [Google Scholar]
  • (48).Holehouse AS; Pappu RV Collapse Transitions of Proteins and the Interplay Among Backbone, Sidechain, and Solvent Interactions. Annu. Rev. Biophys 2018, 47, 19–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (49).Huihui J; Ghosh K An Analytical Theory to Describe Sequence-Specific Inter-Residue Distance Profiles for Polyampholytes and Intrinsically Disordered Proteins. J. Chem. Phys 2020, 152 (16), 161102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (50).Sawle L; Ghosh K A Theoretical Method to Compute Sequence Dependent Configurational Properties in Charged Polymers and Proteins. J. Chem. Phys 2015, 143 (8), 085101. [DOI] [PubMed] [Google Scholar]
  • (51).Firman T; Ghosh K Sequence Charge Decoration Dictates Coil-Globule Transition in Intrinsically Disordered Proteins. J. Chem. Phys 2018, 148 (12), 123305. [DOI] [PubMed] [Google Scholar]
  • (52).Das S; Lin Y-H; Vernon RM; Forman-Kay JD; Chan HS Comparative Roles of Charge, π, and Hydrophobic Interactions in Sequence-Dependent Phase Separation of Intrinsically Disordered Proteins. Proc. Natl. Acad. Sci. U. S. A 2020, 117 (46), 28795–28805. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (53).Lin Y-H; Chan HS Phase Separation and Single-Chain Compactness of Charged Disordered Proteins Are Strongly Correlated. Biophys. J 2017, 112 (10), 2043–2046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (54).Holehouse AS; Garai K; Lyle N; Vitalis A; Pappu RV Quantitative Assessments of the Distinct Contributions of Polypeptide Backbone Amides versus Side Chain Groups to Chain Expansion via Chemical Denaturation. J. Am. Chem. Soc 2015, 137 (8), 2984–2995. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (55).Martin EW; Holehouse AS; Grace CR; Hughes A; Pappu RV; Mittag T Sequence Determinants of the Conformational Properties of an Intrinsically Disordered Protein Prior to and upon Multisite Phosphorylation. J. Am. Chem. Soc 2016, 138 (47), 15323–15335. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (56).Holehouse AS; Sukenik S Controlling Structural Bias in Intrinsically Disordered Proteins Using Solution Space Scanning. J. Chem. Theory Comput 2020, 16 (3), 1794–1805. [DOI] [PubMed] [Google Scholar]
  • (57).Das RK; Huang Y; Phillips AH; Kriwacki RW; Pappu RV Cryptic Sequence Features within the Disordered Protein p27Kip1 Regulate Cell Cycle Signaling. Proc. Natl. Acad. Sci. U. S. A 2016, 113 (20), 5616–5621. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (58).Sherry KP; Das RK; Pappu RV; Barrick D Control of Transcriptional Activity by Design of Charge Patterning in the Intrinsically Disordered RAM Region of the Notch Receptor. Proc. Natl. Acad. Sci. U. S. A 2017, 114 (44), E9243–E9252. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (59).Robustelli P; Piana S; Shaw DE Developing a Molecular Dynamics Force Field for Both Folded and Disordered Protein States. Proc. Natl. Acad. Sci. U. S. A 2018, 115 (21), E4758–E4766. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (60).Holehouse AS; Das RK; Ahad JN; Richardson MOG; Pappu RV CIDER: Resources to Analyze Sequence-Ensemble Relationships of Intrinsically Disordered Proteins. Biophys. J 2017, 112 (1), 16–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (61).O’Brien EP; Morrison G; Brooks BR; Thirumalai D How Accurate Are Polymer Models in the Analysis of Forster Resonance Energy Transfer Experiments on Proteins? J. Chem. Phys 2009, 130 (12), 124903. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (62).Vitalis A; Pappu RV Chapter 3 Methods for Monte Carlo Simulations of Biomacromolecules. In Annual Reports in Computational Chemistry; Wheeler RA, Ed.; Elsevier, 2009; Vol. 5, pp 49–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (63).Vitalis A; Pappu RV ABSINTH: A New Continuum Solvation Model for Simulations of Polypeptides in Aqueous Solutions. J. Comput. Chem 2009, 30 (5), 673–699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (64).Volkenstein MV Molecular Biophysics; Academic Press, New York, 1977. [Google Scholar]
  • (65).Das RK; Pappu RV Conformations of Intrinsically Disordered Proteins Are Influenced by Linear Sequence Distributions of Oppositely Charged Residues. Proc. Natl. Acad. Sci. U. S. A 2013, 110 (33), 13392–13397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (66).Kentsis A; Mezei M; Gindin T; Osman R Unfolded State of Polyalanine Is a Segmented Polyproline II Helix. Proteins 2004, 55 (3), 493–501. [DOI] [PubMed] [Google Scholar]
  • (67).Crick SL; Jayaraman M; Frieden C; Wetzel R; Pappu RV Fluorescence Correlation Spectroscopy Shows That Monomeric Polyglutamine Molecules Form Collapsed Structures in Aqueous Solutions. Proc. Natl. Acad. Sci. U. S. A 2006, 103 (45), 16764–16769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (68).Ruff KM; Choi YH; Cox D; Ormsby AR; Myung Y; Ascher DB; Radford SE; Pappu RV; Hatters DM Sequence Grammar Underlying the Unfolding and Phase Separation of Globular Proteins. Mol. Cell 2022, 82 (17), 3193–3208.e8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (69).Lhuillier D A Simple-Model for Polymeric Fractals in a Good Solvent and an Improved Version of the Flory Approximation. Journal De Physique 1988, 49 (5), 705–710. [Google Scholar]
  • (70).Nygaard M; Kragelund BB; Papaleo E; Lindorff-Larsen K An Efficient Method for Estimating the Hydrodynamic Radius of Disordered Protein Conformations. Biophys. J 2017, 113 (3), 550–557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (71).Kirkwood JG; Riseman J The Intrinsic Viscosities and Diffusion Constants of Flexible Macromolecules in Solution. J. Chem. Phys 1948, 16 (6), 565–573. [Google Scholar]
  • (72).Pesce F; Newcombe EA; Seiffert P; Tranchant EE; Olsen JG; Grace CR; Kragelund BB; Lindorff-Larsen K Assessment of Models for Calculating the Hydrodynamic Radius of Intrinsically Disordered Proteins. Biophys. J 2022. 10.1016/j.bpj.2022.12.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (73).Marsh JA; Neale C; Jack FE; Choy W-Y; Lee AY; Crowhurst KA; Forman-Kay JD Improved Structural Characterizations of the drkN SH3 Domain Unfolded State Suggest a Compact Ensemble with Native-like and Non-Native Structure. J. Mol. Biol 2007, 367 (5), 1494–1510. [DOI] [PubMed] [Google Scholar]
  • (74).Bezsonova I; Singer A; Choy W-Y; Tollinger M; Forman-Kay JD Structural Comparison of the Unstable drkN SH3 Domain and a Stable Mutant. Biochemistry 2005, 44 (47), 15550–15560. [DOI] [PubMed] [Google Scholar]
  • (75).Demarest SJ; Martinez-Yamout M; Chung J; Chen H; Xu W; Dyson HJ; Evans RM; Wright PE Mutual Synergistic Folding in Recruitment of CBP/p300 by p160 Nuclear Receptor Coactivators. Nature 2002, 415 (6871), 549–553. [DOI] [PubMed] [Google Scholar]
  • (76).Wells M; Tidow H; Rutherford TJ; Markwick P; Jensen MR; Mylonas E; Svergun DI; Blackledge M; Fersht AR Structure of Tumor Suppressor p53 and Its Intrinsically Disordered N-Terminal Transactivation Domain. Proc. Natl. Acad. Sci. U. S. A 2008, 105 (15), 5762–5767. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (77).Longhi S; Receveur-Bréchot V; Karlin D; Johansson K; Darbon H; Bhella D; Yeo R; Finet S; Canard B The C-Terminal Domain of the Measles Virus Nucleoprotein Is Intrinsically Disordered and Folds upon Binding to the C-Terminal Moiety of the Phosphoprotein. J. Biol. Chem 2003, 278 (20), 18638–18648. [DOI] [PubMed] [Google Scholar]
  • (78).Syme CD; Blanch EW; Holt C; Jakes R; Goedert M; Hecht L; Barron LD A Raman Optical Activity Study of Rheomorphism in Caseins, Synucleins and Tau. New Insight into the Structure and Behaviour of Natively Unfolded Proteins. Eur. J. Biochem 2002, 269 (1), 148–156. [DOI] [PubMed] [Google Scholar]
  • (79).Theillet F-X; Binolfi A; Bekei B; Martorana A; Rose HM; Stuiver M; Verzini S; Lorenz D; van Rossum M; Goldfarb D; Selenko P Structural Disorder of Monomeric α-Synuclein Persists in Mammalian Cells. Nature 2016, 530 (7588), 45–50. [DOI] [PubMed] [Google Scholar]
  • (80).Moses D; Guadalupe K; Yu F; Flores E; Perez A; McAnelly R; Shamoon NM; Cuevas-Zepeda E; Merg AD; Martin EW; Holehouse AS; Sukenik S Structural Biases in Disordered Proteins Are Prevalent in the Cell. bioRxiv, 2022, 2021.11.24.469609. 10.1101/2021.11.24.469609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (81).Ghosh K; Huihui J; Phillips M; Haider A Rules of Physical Mathematics Govern Intrinsically Disordered Proteins. Annu. Rev. Biophys 2022. 10.1146/annurev-biophys-120221-095357. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (82).Mohanty P; Shenoy J; Rizuan A; Mercado Ortiz JF; Fawzi NL; Mittal J Aliphatic Residues Contribute Significantly to the Phase Separation of TDP-43 C-Terminal Domain. bioRxiv, 2022, 2022.11.10.516004. 10.1101/2022.11.10.516004. [DOI] [Google Scholar]
  • (83).Murthy AC; Tang WS; Jovic N; Janke AM; Seo DH; Perdikari TM; Mittal J; Fawzi NL Molecular Interactions Contributing to FUS SYGQ LC-RGG Phase Separation and Co-Partitioning with RNA Polymerase II Heptads. Nat. Struct. Mol. Biol 2021, 28 (11), 923–935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (84).Rekhi S; Devarajan DS; Howard MP; Kim YC; Nikoubashman A; Mittal J Role of Strong Localized vs. Weak Distributed Interactions in Disordered Protein Phase Separation. bioRxiv, 2023, 2023.01.27.525976. 10.1101/2023.01.27.525976. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (85).Griep S; Hobohm U PDBselect 1992–2009 and PDBfilter-Select. Nucleic Acids Res. 2009, 38 (Database issue), D318–D319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (86).Joseph JA; Reinhardt A; Aguirre A; Chew PY; Russell KO; Espinosa JR; Garaizar A; Collepardo-Guevara R Physics-Driven Coarse-Grained Model for Biomolecular Phase Separation with near-Quantitative Accuracy. Nat Comput Sci 2021, 1 (11), 732–743. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (87).Dill K; Bromberg S Molecular Driving Forces: Statistical Thermodynamics in Biology, Chemistry, Physics, and Nanoscience; Garland Science, 2010. [Google Scholar]
  • (88).Martin EW; Hopkins JB; Mittag T Small-Angle X-Ray Scattering Experiments of Monodisperse Intrinsically Disordered Protein Samples close to the Solubility Limit. Methods Enzymol. 2021, 646, 185–222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (89).González-Foutel NS; Glavina J; Borcherds WM; Safranchik M; Barrera-Vilarmau S; Sagar A; Estaña A; Barozet A; Garrone NA; Fernandez-Ballester G; Blanes-Mira C; Sánchez IE; de Prat-Gay G; Cortés J; Bernadó P; Pappu RV; Holehouse AS; Daughdrill GW; Chemes LB Conformational Buffering Underlies Functional Selection in Intrinsically Disordered Protein Regions. Nat. Struct. Mol. Biol 2022, 29 (8), 781–790. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (90).Zerze GH; Best RB; Mittal J Sequence- and Temperature-Dependent Properties of Unfolded and Disordered Proteins from Atomistic Simulations. J. Phys. Chem. B 2015, 119 (46), 14622–14630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (91).Sørensen CS; Kjaergaard M Effective Concentrations Enforced by Intrinsically Disordered Linkers Are Governed by Polymer Physics. Proc. Natl. Acad. Sci. U. S. A 2019, 116 (46), 23124–23131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (92).Riback JA; Katanski CD; Kear-Scott JL; Pilipenko EV; Rojek AE; Sosnick TR; Drummond DA Stress-Triggered Phase Separation Is an Adaptive, Evolutionarily Tuned Response. Cell 2017, 168 (6), 1028–1040.e19. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

1

RESOURCES