Abstract
Intrinsically disordered protein regions (IDRs) make up over 30% of the human proteome and exist in a dynamic conformational ensemble instead of a native, well-folded structure. Tethering IDRs to a surface (for example, the surface of a well-folded region of the same protein) can reduce the number of accessible conformations in these ensembles. This reduces the ensemble’s conformational entropy, generating an effective entropic force that pulls away from the point of tethering. Recent experimental work has shown that this entropic force causes measurable, physiologically relevant changes to protein function. But how the magnitude of this force depends on IDR sequence remains unexplored. Here, we use all-atom simulations to analyze how structural preferences in IDR ensembles contribute to the entropic force they exert upon tethering. We show that sequence-encoded structural preferences play an important role in determining the magnitude of this force: compact, spherical ensembles generate an entropic force that can be several times higher than more extended ensembles. We further show that changes in the surrounding solution’s chemistry can modulate the IDR entropic force strength. We propose that the entropic force is a sequence-dependent, environmentally tunable property of terminal IDR sequences.
Introduction
Intrinsically disordered proteins and protein regions (IDRs) do not have a native structure. Instead, IDRs exist in a constantly interchanging conformational ensemble that contains transient and relatively weak intramolecular interactions. These interactions define the structural preferences and the resulting average shape of the ensemble. Decades of work have linked the structural preferences of IDRs to their biological functions.1−4
Unlike well-folded proteins, IDR ensembles have a high conformational entropy. This conformational entropy can be reduced by covalently linking or tethering the IDR through one of its termini to a surface (Figure 1A). In this case, entropy is reduced due to the constraint placed upon the ensemble by the surface it is tethered to. As a result, upon tethering, an IDR will try to maximize its conformational entropy by producing an effective force that pulls up and away from the point of tethering, gaining entropy by increasing its number of accessible conformations generating an entropic force (Figure 1B).5,6
This tethering scenario may seem rare when considering naturally occurring proteins, but it is rather common: in eukaryotes, IDRs are often tethered to a more rigid surface that constrains the chain’s conformational entropy, and this tethering results in measurable effects. For example, IDRs tethered to a cell membrane can sense the curvature of the membrane and help to facilitate the endocytosis process through entropic force.7−10 The same entropic force can also help translocate IDRs through the bacterial cell wall to the extracellular environment, an essential process for bacterial infection.11,12 An even more prevalent scenario occurs when disordered N- or C-terminal IDRs are attached to a well-folded protein region (Figure 1A). The entropic force exerted by such disordered terminal regions can influence protein function, including ligand binding affinity and thermodynamic stability.13,14 These examples suggest that entropic force may be an important and prevalent mechanism unique to IDRs that mediates biological function.
To address the dynamics of IDR ensembles, previous studies have successfully applied analytical polymer models to describe IDR structural preferences (e.g., self-avoiding random chains and worm-like chains). These models can predict the average properties of IDR ensembles and have been systematically validated using experimental methods including small-angle X-ray scattering and single-molecule FRET experiments.3,15−21 In addition, polymer models have been used previously to understand how chains exert an entropic force. Beyond the steric and chemical features of the monomers, these studies have implicated the length and the geometry of the constraining surface as major factors affecting polymer entropic force strength.5,6,22
Thus, previous entropic force studies of IDRs also focused on the role of sequence length and the geometry of the constraining surface.9,11,13 IDR length is indeed a critical factor in determining entropic force magnitude8,13 since the longer the chain, the higher the number of conformations available. But is chain length always the most dominant factor affecting entropic force magnitude? Previous research has shown that, unlike homopolymers, IDR ensembles have distinct sequence-encoded structural preferences.17,18,20,23−28 These structural biases affect the average shape occupied by IDR ensembles,29,30 but their role in determining IDR entropic force strength has not been tested.
To link IDR structural biases with entropic force strength, we use all-atom Monte Carlo simulations to sample the conformational ensembles of over 90 experimentally validated IDR sequences. To gauge the magnitude of the entropic force sequences can exert, we measure the reduction in the number of allowed conformations upon tethering their ensembles to a flat surface. Our simulations show that the entropic force depends not only on the length of the IDR but also on its sequence-encoded ensemble shape, with more compact ensembles exerting a stronger entropic force. To further test this finding, we alter the dimensions of each ensemble by changing their interaction with the surrounding solution (while keeping the sequence intact). We show that solution-induced compaction also increases the entropic force but only for a subset of the sequences. Our findings reveal how sequence-encoded intramolecular and protein:solution interactions combine to modulate the magnitude of the entropic force exerted by a tethered IDR. They also suggest that the entropic force can be tuned by evolution to exert an optimized effect on full-length proteins.
Materials and Methods
Intrinsically Disordered Protein Prediction with the AlphaFold Database
Systematic evaluations of AlphaFold2 (AF2) previously showed that it is a good predictor of intrinsically disordered regions.31−33 We downloaded the predicted structures of three different proteomes (Saccharomyces cerevisiae: UP000002311, Arabidopsis thaliana: UP000006548, Homo sapiens: UP000005640) from the AF2 database version 3.34 The disorder predictions are obtained from the AF2’s pLDDT score. Based on a previous report,31 we used 30 consecutive residues with pLDDT <50% as an indicator for IDRs. Detected IDRs are labeled as terminal if they start at the N-terminal or end at the C-terminal of the protein in the AF2 database.
All-Atom Monte Carlo Simulation
All IDRs were simulated with the ABSINTH implicit solvent force field using the CAMPARI simulation suite v2_09052017.35 We chose the ABSINTH force field and the CAMPARI simulation suite because of their extensive benchmarking and their computational efficiency, allowing us to simulate sequences that are 30–100 residues long in 2–4 days on a single processor. The example parameter file and simulation settings are provided in the GitHub repository. Simulations were conducted at 310 K with 107 steps of equilibration. After equilibration, conformations were written every 12,500 steps. For each IDR, we performed five independent simulations with ∼5600 individual conformations in each repeat. This leads to a total of ∼28,000 conformations for each IDR (details in Table S1). For PUMA scrambles, we performed three individual repeats, leading to ∼16,800 conformations.
Calculation of Ensemble Properties
Normalized End-to-End Distance
The end-to-end distance Ree of polymers can be calculated based on the number of residues in the chain (N) using Ree = R0Nν.17 The scaling law ν can have a range of fractional values. Specifically, ν = 0.59 for expanded chains, ν = 0.5 for ideal (or θ-state) polymers, and ν = 0.33 for collapsed/compacted polymers.17,36 For homopolymers, the prefactor R0 is constant and depends on the segment length of the monomer.17,37 Seven glycine–serine dipeptide repeat (GS repeats) sequences with 8, 16, 24, 32, 40, 48, and 64 GS segments were simulated and analyzed as described above with five individual repeats. We use GS repeats because they maintain a consistent, experimentally validated point of reference for the entire dataset. The GS repeat Ree data were fitted using the Scipy curve_fit function to the power law equation above (Figure S1). The results of the fit gave a prefactor R0 = 0.55 ± 0.06 nm and an exponent ν = 0.48 ± 0.03 for GS repeats, demonstrating ideal polymer behavior. This is in agreement with previous experimental data.38,39 We use this fitted curve to normalize IDR Ree for comparison across IDRs of different lengths. We interpolate and extrapolate corresponding GS repeats Ree based on the length of the IDR of interest. We calculate the normalized Ree using the following equation.
1 |
Here, R̅ee is the normalized end-to-end distance, Ree is the end-to-end distance of the target IDR, and ReeGS is the calculated end-to-end distance of a GS repeat sequence of the same length as the target IDR (obtained from the fit shown in Figure S1).
Asphericity
IDR ensemble properties were analyzed using the MDTraj python library.40Ree was calculated between the Cα of the first and last residues of the IDR. Helicity was calculated using the DSSP algorithm integrated into MDTraj.41 Asphericity was calculated using the gyration tensor of the simulated IDR ensemble as described previously.42−44
2 |
Here, δ is the asphericity and λ1,2,3 are the three principal moments of the gyration tensor. The standard deviation of all of these properties was calculated based on the averages from the five independent repeats. Analysis scripts are available at the accompanying GitHub repository at https://github.com/sukeniklab/Entropic_Force.
Entropy Analysis
To calculate the effect of tethering on IDR conformational entropy, we count the number of allowed conformations in the ensemble upon tethering (Figure 1B). To do this, we first tether each conformation of each simulated IDR ensemble to a single point on a flat surface and then calculate the number of allowed conformations ΩT from the total number of conformations in the simulated ensemble ΩU. Tethering is done relative to the first, second, and third Cα coordinates of each conformation, labeled here as A, B, and C, respectively (Figure 1C). For each conformation, we move A to the origin of the coordinate system. We plot the constraining surface, Sc, perpendicular to the surface containing atoms A, B, and C.
Enhanced Sampling
In order to better understand the spatial relationship between the ensemble and the constraining surface, we perform several geometric transformations on each sampled conformation for calculating ΩT: (1) To account for the possibility of stretching at the point of tethering, we vary the distance d between point A and Sc. (2) To account for the possibility of rotation around the point of tethering, we vary the half-angle θ formed between the norm vector to Sc with vector AB. (3) We rotate the vector AB with an angle ϕ. All of the coordinates specified here are illustrated in Figure 1C. In total, we make 36 transformations (3 values for d, 3 values for θ, and 6 values for ϕ) for each conformation of each simulated ensemble.
Entropy Calculation
We consider the interaction between the IDR and the constraining surface as a hard sphere interaction. Accessible conformations are defined as those with no Cα that is positioned below the constraining surface. We use the dot product between the norm vector of Sc and the coordinate of Cα to calculate and determine the relative position of the Cα to the surface Sc. We then count the number of all accessible conformations in the tethered, original ensemble and all d, θ, and ϕ permutations. Finally, we sum the number of accessible states from these perturbations and calculate the entropic force strength. The entropic force is then given by13
3 |
Here, kB is the Boltzmann constant, ΩT is the total number of possible IDR conformations when the ensemble is tethered to a surface, and ΩU is the total number of conformations sampled for the same IDR ensemble when untethered. The entropic force strength is proportional to ΔS. The transformation and analysis scripts are provided as Jupyter notebooks at https://github.com/sukeniklab/Entropic_Force.
Ensemble XZ-Projections
For each IDR conformation, we move A to the origin of the coordinate system and rotate the conformation to make AB fall on the Z-axis (Z > 0). The XZ-coordinate of each Cα will provide an ensemble projection of the IDR ensemble on the XZ plane. The Cα density was normalized by the number of amino acids in the sequence, the frame number of trajectories, and the bin size.
Solution Space Scanning Simulations
Solution space scanning simulations are conducted as described previously.23,35,45 Briefly, we modify the effective Hamiltonian of the ABSINTH force field to alter protein backbone:solvent interactions. The ABSINTH hamiltonian is a sum of four energy terms
4 |
ULJ, Wel, and Ucorr represent Lennard-Jones (LJ) potential, electrostatic interaction, and torsional correction terms for dihedral angles, respectively. Wsolv is the solvation free energy and is equal to the transfer free energy between vacuum and diluted aqueous solution. Changing the free energy term Wsolv results in a change in the protein:solvent relative interaction strength, defined by
5 |
Wsolvmax is the solvation free energy calculated based on fully extended protein conformation in different solution conditions. Negative values of the protein:solvent interaction represent solutions that are attractive to the protein backbone, such as urea solutions, while positive values represent solutions that are repulsive to the protein backbone, such as those containing protective osmolytes. A value of 0 represents a buffered, aqueous solution with no cosolutes. We simulated seven different solution conditions for each IDR with a protein:solvent relative interaction strength ranging from +3% (equivalent roughly to 1 M TMAO) to −3% (equivalent roughly to 1.5 M urea).45 It is important to note, however, that even the most attractive solutions used here are not sufficient to unfold well-folded protein domains. We use the same temperature and sampling method for each solution condition as we do for aqueous solutions. The simulation averages of ensemble properties and entropic force in all solution conditions, as well as sequence details, are reported for all IDRs in Table S1.
Limitations and Drawbacks of Entropic Force Calculations
In our calculations, we completely neglect any interactions between the IDR and the surface other than steric, hard-core repulsions. We also assume that the constraining surface is completely flat. In the context of an actual, full-length protein, constraining surfaces will have distinct chemical moieties, including hydrophobic, polar, and charged residues. Specific surface chemistries will introduce an enthalpic component to the free energy change upon IDR tethering, which can alter and sometimes completely reverse the force induced by tethering. These effects are very important as shown in several cases, especially when charges are introduced.8,46,47
Another limitation is that the constraining surface we use is fixed, flat, and does not change over time. The surface of folded domains displays irregular shapes and fluctuations and motions that may change the number of allowed conformations or change the overall entropy of the entire system, which we did not consider here. Indeed, some of the solution chemistry changes we use in this work may also act to alter these fluctuations.
To mitigate these limitations, we stress that the entire dataset was obtained using the same methods and analysis and compared against the same GS repeat benchmarks. This self-consistency is what allows us to probe the role of the ensemble itself on the entropic force, all other factors being held constant.
Results and Discussion
Human Proteome Is Rich in Disordered Terminal Sequences
We define terminal IDRs as those that exist at the N- or C-termini of proteins. We reason that with one free end, such IDRs can exert an entropic force against the more rigid, folded protein domain to which they are connected (Figure 1A). To see if terminal IDRs are common in proteomes, we tested their prevalence in the yeast, arabidopsis, and human proteomes using the AlphaFold Protein Structure Database v348 (Figure 2A). The confidence score of AlphaFold2 (pLDDT) has been previously shown to be a good indicator of potential disordered regions31 and so was used to identify disordered regions in the three proteomes. A protein segment was marked as disordered when it had more than 30 consecutive residues with a “very low” pLDDT score (<50%). For the proteomes we tested, over 40% of proteins have at least one disordered segment, in line with previous studies49 (Figure 2A, left). In the human proteome specifically, over half of the proteins that contain IDRs have at least one at either the N- or C-terminal (Figure 2A, right). This result indicates that terminal-tethered IDRs exist widely in eukaryotes and that the entropic force scenario described above can occur in many proteins.
Based on past work, we reasoned that length is a factor that contributes strongly to the entropic force mechanism in these IDRs.8,13 We therefore wanted to test if there is a significant difference in the length distribution of terminal vs nonterminal IDRs.50,51 Our analysis reveals that the length distribution is roughly the same between terminal and nonterminal IDRs (Figure 2B).
IDR Simulation Database Reveals Structural Diversity
With IDR sequence length being roughly the same in both terminal and nonterminal sequences, we turned our attention to the structural preferences of their ensemble. Ensemble average end-to-end distance (Ree) has been widely used to quantify the global dimensions and the internal structure of dynamic IDR conformational ensembles.17,18 Since ensemble dimensions cannot be accurately predicted from the sequence, we used the ABSINTH force field to gain an atomic-level simulation of over 90 IDR ensembles. Most of these sequences are experimentally validated IDR sequences from the DisProt database52 (Table S1). These sequences have a diverse distribution of properties including the length, fraction of charged residues (FCR), and net charge per residue (NCPR) (Figure 3A–C).
Simulations reveal a large distribution of Ree (sometimes varying by more than a factor of 2 for sequences with the same number of amino acids), indicating distinct structural preferences in these sequences. To compare different IDRs of various lengths across the proteome, we use GS repeats as a homopolymer point of reference. It has been shown experimentally that GS repeats have a similar ensemble to an ideal homopolymer (a polymer where Ree scales as N0.5).17,36,53 We simulated several different lengths of GS repeat sequences using the ABSINTH force field. Our simulation data shows that Ree of GS repeats follows a scaling law with an exponent of 0.48 ± 0.03 (Figure S1), which matches previously reported experimental results.38 Our analysis shows that a large majority of the sequences measured deviate from the GS repeat line (Figure 3D).
Quantifying the Entropic Force of Disordered Ensembles Using Enhanced Sampling
We next wanted to probe if these structural preferences alter the magnitude of the entropic force these sequences exert. To assess how ensemble structural preferences change the entropic force, we quantified the change in IDR conformational entropy upon tethering the simulated ensemble to a flat surface and the change in allowed conformations/accessible states (as described in the Materials and Methods section and in eq 3). The change in conformational entropy upon tethering, ΔS/kB, is directly correlated to the magnitude of the entropic force (Figure 1B).
To obtain the number of allowed conformations in the tethered state, ΩT, we tethered our simulated IDR conformational ensemble to a flat surface through the N-terminal Ca. Beyond the conformations included in the ensemble, the geometry of the tethering point can also affect the magnitude of ΔS/kB. To account for this, we introduced an enhanced sampling method to vary tethering configurations and measure the entropic force at various ensemble orientations relative to the tethered surface (Figure 1C, Materials and Methods section). With these variations, we generate additional conformations and plot an accessible state heatmap to visualize the number of allowed conformations in each orientation (Figure 4A). To obtain a measure of the entropic force that will be comparable between all sequences, we sum the number of allowed conformations in all different orientations to provide a single entropic force strength for each sequence (Figure 4B).
Validation of the Entropic Force Calculation Using Experimental Data
Several studies have highlighted the importance of IDR length on the entropic force it exerts.9,11 A recent study by Keul et al. demonstrated that the length of a terminal IDR tail was the only factor determining its functional effect on the folded enzyme to which it was tethered.13 The study focused on the C-terminal IDR of a key glycolytic enzyme, UDP-glucose 6-dehydrogenase (UGDH). The study showed that the C-terminal IDR acts, through the entropic force it exerts, as an allosteric switch that alters the affinity of the protein to its allosteric feedback inhibitor UDP-xylose. The authors discovered that the entropic force (and the measured binding affinity) depends solely on the length of the terminal IDR and not on its amino acid composition or sequence (Table S1). As a test of our method, we wanted to see if this length-dependent behavior for the UGDH IDR sequence is reproduced in our simulations.
The homopolymeric GS repeat entropic force was fitted to an exponential decay function, indicating that it is solely determined by the sequence length.5 In agreement with Kuel et al.’s observations, UGDH-derived sequences of different lengths also fell on the same line as the GS repeats (Figure 4B). This indicates that the terminal UGDH IDR has entropic force strength similar to that of a homopolymer. However, UGDH might be a special case resulting from the specific amino acid composition. Indeed, two other IDR sequences display significantly different ΔS/kB despite having the same number of residues (Figure 4B). For example, we selected a disordered region of the type II methyltransferase (M.PvuII, Disprot ID: DP00060r010) from the DisProt database and compared it to the C-terminal intracellular region of the mu-type opioid receptor (MOR-1, Disprot ID: DP00974r002). Both sequences are 38 residues long. Despite this, the C-terminal region of the MOR-1 has half as many accessible states as M.Pvull when tethered to a constraining surface, generating a stronger entropic force (Figure 4B).
Is the magnitude of the entropic force dependent on amino acid composition alone, or on the sequence of the IDR? To answer this question, we generated a library of scrambled sequences of a naturally occurring sequence, the BH3 IDR domain of the p53-upregulated modulator of apoptosis (PUMA)54 (Figure 4C). Despite having the same sequence length and same amino acid composition, scrambles of the PUMA sequence demonstrated a significant difference in entropic force strength. The maximum entropic force of PUMA scrambles is more than two times the minimum force. We observed that scrambled sequences can exert both a stronger and a weaker entropic force upon tethering compared to the wild-type sequence. This result suggests that the order of amino acids in an IDR sequence, and not just amino acid composition, plays a vital role in determining entropic force strength (Figure 4C).
Overall, our simulations recapitulated experimental observables, which implicate IDR length as a key factor affecting IDR entropic force, and also highlighted the role of amino acid composition and sequence in the magnitude of this force.
Systematic Analysis of IDR Entropic Force
We next wanted to understand the role that sequence plays in determining IDR entropic force. We looked for sequence feature correlations with entropic force but found no strong correlations with most individual sequence features18,19 (Figure S2). One exception was the hydropathy decoration parameter proposed by Mittal and co-workers,21 which showed a strong negative correlation with entropic force (Figure S2H), though this can be largely attributed to the length dependence of this metric (Figure S2I). We therefore focused our attention on IDR ensemble dimensions, which are encoded in the sequence but are difficult to predict from the structure.3,18 We applied our enhanced sampling analysis to 94 IDR sequences obtained from the DisProt database. We observed IDRs generating both higher and lower entropic force compared to GS repeats, despite having the same length (Figure 5A).
To ascertain how ensemble dimensions may play a role in determining ΔS/kB, we must first find a way to compare the ensembles of IDRs of various lengths. To do this, we normalize the average Ree of all IDRs against the Ree of a GS repeat sequence of the same length to get normalized end-to-end distance R̅ee (eq 1, Materials and Methods section). R̅ee has a negative value when the ensemble is more compact than a GS repeat and a positive value when an ensemble is more expanded. We plot ΔS/kB for each sequence as a function of this normalized distance in Figure 5B. It is immediately noticeable that the vertical red line drawn at R̅ee = 0 separates sequences with a higher entropic force (purple markers) from those with a weaker entropic force (green markers). This means ensembles that are on average more compact than an equivalent GS repeat (as indicated by a negative R̅ee) tend to generate a stronger entropic force, while more expanded ensembles tend to generate a weaker entropic force than equivalent GS repeats.
This seemed counterintuitive since our initial thought was that an expanded ensemble should take up more space and would therefore lose more conformational entropy upon tethering to the constraining surface. However, a more expanded ensemble will tend to have a higher persistence length and a more ellipsoid shape.55 These properties mean that the backbone will point away from the tethered surface (because of this longer persistence length), reducing the number of conformations that will sterically clash with the surface. To validate this hypothesis, we calculated the average asphericity of the IDR ensemble.42 Similar to R̅ee, ensembles with low asphericity have a lower entropic force, and ensembles with a high asphericity have a stronger entropic force than that of GS repeats (Figures S3 and S4). This suggests that a more spherical ensemble tends to have a higher possibility of clashing with the constraining surface and thus generates a stronger entropic force, while a more elongated ensemble tends to have less interaction with the constraining surface. To verify this, we visualized the position of Cα atoms on an XZ plane that is normal to the constraining surface for several sequences (Figure 5C). This visualization highlights how spherical ensembles with a low asphericity tend to have more atoms at or under the constraining surface (located at Z = 0), while ellipsoidal ensembles with a high asphericity tend to expand with a higher atom density above the constraining surface.
Changes in Solution Chemistry Alter IDR Entropic Force Strength
An alternative way to change ensemble dimensions and one that does not involve a change in the IDR sequence is to expose IDRs to different solution environments.23,45 We designed the solution space scanning method to simulate IDR ensemble structural preferences under changing solution conditions.45 Briefly, the method alters IDR ensembles by tuning the protein backbone:solution interactions of the ABSINTH force field to be more or less repulsive than the value for water (see the Materials and Methods section). Usually, IDRs have a more compact conformational ensemble in repulsive solutions (e.g., in the presence of an osmolyte or a more crowded environment). In attractive solutions (e.g., urea or other denaturants), IDRs have an expanded conformational ensemble. However, this general trend can be mitigated and sometimes even reversed based on the IDR sequence.23,24,45
To see how solution-induced changes in the ensemble affect entropic force, we used solution space scanning to simulate the ensemble average Ree of the proteins shown in Figure 5A in seven different solution conditions. We observed significant compaction of the ensemble in the repulsive solution, and the ensemble change is correlated with protein:solvent interaction strength (Figure 6A). To quantify how ΔS changes with solution condition change, we use the change in entropic force between the solute and buffer with the following equation
6 |
Here, ΔΔS/kB represents the change in the entropic force in different protein:solution interactions. We calculate the entropic force change between the buffer/aqueous condition and other solution conditions. Our analysis shows that, on average, IDRs will generate a stronger entropic force when their ensemble is compacted due to the presence of a repulsive solution (Figure 6B). This result strengthens our conclusion that compact IDR ensembles tend to exert a larger entropic force than extended ensembles.
However, not every IDR is sensitive to solution condition changes. We observed that some IDRs do not have a significant entropic force change, despite significant changes in their ensemble (Figure 6C,D). For example, M.PvuII displays a significant change in Ree but almost no change in entropic force (Figure 6C). On the other hand, the ensemble of the IDR of the regulator of nonsense transcripts 2 (hUpf2, Disprot ID: DP00949r013) is very sensitive to solution changes, and ΔS/kB changes accordingly. This suggests that solution-driven changes in entropic force response are highly sequence-dependent. Different sequences encode diverse structural ensembles that in turn influence IDR environmental response. Interestingly, the UGDH IDR has a low sensitivity of entropic force despite its high sensitivity ensemble. Considering that UGDH performs an allosteric function through its entropic force, this indicates that some sequences may evolve to generate a stable entropic force for performing their function. This has been previously proposed as a general property of proline-rich domains in the Wnt signaling pathway.56
Conclusions
Here, we report on a computational method to quantify the conformational entropic force of tethered IDRs using all-atom Monte Carlo simulations. Compared to coarse-grained or analytical models, this method offers an accurate, quantitative metric of how IDR entropic force is determined by sequence-encoded conformational ensemble preferences. Our method is compared with and qualitatively matches previously published experimental measurements of entropic force (Figure 4B). Our results also support the current literature and highlight that IDR sequence length is indeed a key factor in the entropic force it exerts (Figure 4B). Despite its drawbacks and limitations (see the Materials and Methods section), our method offers an accessible description of the entropic force, which is computationally easy to calculate and a self-consistent dataset from which we draw conclusions linking between IDR sequence and entropic force.
Our simulations show that there is more to the story of entropic force than just the length of the sequence. We reveal that IDR structural preferences can determine the magnitude of entropic force strength. We show that the structural preferences of IDR ensembles are encoded not just in amino acid composition but also in their arrangement in the sequence, which can be an important factor in determining entropic force strength. Perhaps counterintuitively, we find that more expanded IDR ensembles can exert a weaker entropic force than more compact IDR ensembles when tethered to a flat surface (Figures 5A,B and S4).
We also show that the entropic force exerted by an IDR can change when the surrounding chemical environment changes. By modulating protein backbone:solvent interactions, we altered IDR ensembles and showed that the entropic force magnitude of most IDRs increased as their ensembles became more compact, validating the trend shown for different sequences (Figure 6B). This result also suggests the possibility of manipulating IDR entropic force by altering the physical–chemical composition of the cellular environment.23,24,57
Since the dimensional properties of IDR sequences are sequence-encoded,58 we propose that some sequences have evolved to exert an outsized entropic force on the protein they are tethered to, while other sequences have evolved to exert a weak force. Our study further suggests that this entropic force can be modulated by post-translational modifications, binding of small molecules or other proteins,59 and changes in the cellular environment that are known to alter IDR ensembles.1,2,24,60 Taken together, the entropic force is a sequence-encoded, tunable function that may be more common than previously realized in IDR-containing proteins.
Acknowledgments
The authors thank David Moses for his comments and suggestions. They also thank members of the Water and Life Interface Institute (WALII), supported by NSF DBI Grant 2213983, for helpful discussions. Research reported in this publication was supported by the NIH under Award R35GM137926 and NSF IntBIO Grant 2128067 to S.S. F.Y. was funded by a computational fellowship from the NSF-CREST Center for Cellular and Biomolecular Machines at UC Merced (NSF-HRD-1547848 and NSF-HRD-2112675). The authors acknowledge computing time on the MERCED cluster at UC Merced, NSF Grant ACI-1429783, on the XSEDE computational infrastructure framework, Grant TG-MCB190103 to SS, supported by NSF Grant ACI-154856, and on the Expanse cluster through HPC@UC, supported by NSF Grant OAC-1928224.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpcb.3c00698.
Average end-to-end distance from five repeats vs the total number of residues for a series of Gly-Ser repeats (Figures S1); sequence features correlated with the entropic force strength (Figure S2); average asphericity of GS repeats vs the number of residues in the sequence (Figure S3); and entropic force as a function of average asphericity (Figure S4) (PDF)
Table S1 contains all IDR sequences, the ensemble analysis result, and the entropic force analysis result of the simulation database. All code and data used to generate the figures in this manuscript is provided at https://github.com/sukeniklab/Entropic_Force (TXT)
The authors declare no competing financial interest.
Special Issue
Published as part of The Journal of Physical Chemistry virtual special issue “Early-Career and Emerging Researchers in Physical Chemistry Volume 2”.
Supplementary Material
References
- Wright P. E.; Dyson H. J. Intrinsically Unstructured Proteins: Re-Assessing the Protein Structure-Function Paradigm. J. Mol. Biol. 1999, 293, 321–331. 10.1006/jmbi.1999.3110. [DOI] [PubMed] [Google Scholar]
- Dyson H. J.; Wright P. E. Intrinsically Unstructured Proteins and Their Functions. Nat. Rev. Mol. Cell Biol. 2005, 6, 197–208. 10.1038/nrm1589. [DOI] [PubMed] [Google Scholar]
- van der Lee R.; Buljan M.; Lang B.; Weatheritt R. J.; Daughdrill G. W.; Dunker A. K.; Fuxreiter M.; Gough J.; Gsponer J.; Jones D. T.; et al. Classification of Intrinsically Disordered Regions and Proteins. Chem. Rev. 2014, 114, 6589–6631. 10.1021/cr400525m. [DOI] [PMC free article] [PubMed] [Google Scholar]
- González-Foutel N. S.; Glavina J.; Borcherds W. M.; Safranchik M.; Barrera-Vilarmau S.; Sagar A.; Estaña A.; Barozet A.; Garrone N. A.; Fernandez-Ballester G.; et al. Conformational Buffering Underlies Functional Selection in Intrinsically Disordered Protein Regions. Nat. Struct. Mol. Biol. 2022, 29, 781–790. 10.1038/s41594-022-00811-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Polson J. M.; MacLennan R. G. Entropic Force of Cone-Tethered Polymers Interacting with a Planar Surface. Phys. Rev. E 2022, 106, 024501 10.1103/PhysRevE.106.024501. [DOI] [PubMed] [Google Scholar]
- Maghrebi M. F.; Kantor Y.; Kardar M. Entropic Force of Polymers on a Cone Tip. Europhys. Lett. 2011, 96, 66002. 10.1209/0295-5075/96/66002. [DOI] [Google Scholar]
- McMahon H. T.; Gallop J. L. Membrane Curvature and Mechanisms of Dynamic Cell Membrane Remodelling. Nature 2005, 438, 590–596. 10.1038/nature04396. [DOI] [PubMed] [Google Scholar]
- Zeno W. F.; Thatte A. S.; Wang L.; Snead W. T.; Lafer E. M.; Stachowiak J. C. Molecular Mechanisms of Membrane Curvature Sensing by a Disordered Protein. J. Am. Chem. Soc. 2019, 141, 10361–10371. 10.1021/jacs.9b03927. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zeno W. F.; Baul U.; Snead W. T.; DeGroot A. C. M.; Wang L.; Lafer E. M.; Thirumalai D.; Stachowiak J. C. Synergy between Intrinsically Disordered Domains and Structured Proteins Amplifies Membrane Curvature Sensing. Nat. Commun. 2018, 9, 4152 10.1038/s41467-018-06532-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fakhree M. A. A.; Blum C.; Claessens M. M. A. E. Shaping Membranes with Disordered Proteins. Arch. Biochem. Biophys. 2019, 677, 108163 10.1016/j.abb.2019.108163. [DOI] [PubMed] [Google Scholar]
- Halladin D. K.; Ortega F. E.; Ng K. M.; Footer M. J.; Mitić N. S.; Malkov S. N.; Gopinathan A.; Huang K. C.; Theriot J. A.. Entropy-Driven Translocation of Disordered Proteins through the Gram-Positive Bacterial Cell Wall. bioRxiv, 2020, 10.1101/2020.11.24.396366. [DOI] [PMC free article] [PubMed]
- Pizarro-Cerdá J.; Cossart P. Bacterial Adhesion and Entry into Host Cells. Cell 2006, 124, 715–727. 10.1016/j.cell.2006.02.012. [DOI] [PubMed] [Google Scholar]
- Keul N. D.; Oruganty K.; Schaper Bergman E. T.; Beattie N. R.; McDonald W. E.; Kadirvelraj R.; Gross M. L.; Phillips R. S.; Harvey S. C.; Wood Z. A. The Entropic Force Generated by Intrinsically Disordered Segments Tunes Protein Function. Nature 2018, 563, 584–588. 10.1038/s41586-018-0699-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dave K.; Gasic A. G.; Cheung M. S.; Gruebele M. Competition of Individual Domain Folding with Inter-Domain Interaction in WW Domain Engineered Repeat Proteins. Phys. Chem. Chem. Phys. 2019, 21, 24393–24405. 10.1039/C8CP07775D. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nettels D.; Gopich I. V.; Hoffmann A.; Schuler B. Ultrafast Dynamics of Protein Collapse from Single-Molecule Photon Statistics. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 2655–2660. 10.1073/pnas.0611093104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- O’Brien E. P.; Morrison G.; Brooks B. R.; Thirumalai D. How Accurate Are Polymer Models in the Analysis of Förster Resonance Energy Transfer Experiments on Proteins?. J. Chem. Phys. 2009, 130, 124903 10.1063/1.3082151. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hofmann H.; Soranno A.; Borgia A.; Gast K.; Nettels D.; Schuler B. Polymer Scaling Laws of Unfolded and Intrinsically Disordered Proteins Quantified with Single-Molecule Spectroscopy. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 16155–16160. 10.1073/pnas.1207719109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Das R. K.; Pappu R. V. Conformations of Intrinsically Disordered Proteins Are Influenced by Linear Sequence Distributions of Oppositely Charged Residues. Proc. Natl. Acad. Sci. U.S.A. 2013, 110, 13392–13397. 10.1073/pnas.1304749110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Firman T.; Ghosh K. Sequence Charge Decoration Dictates Coil-Globule Transition in Intrinsically Disordered Proteins. J. Chem. Phys. 2018, 148, 123305 10.1063/1.5005821. [DOI] [PubMed] [Google Scholar]
- Schuler B.; Soranno A.; Hofmann H.; Nettels D. Single-Molecule FRET Spectroscopy and the Polymer Physics of Unfolded and Intrinsically Disordered Proteins. Annu. Rev. Biophys. 2016, 45, 207–231. 10.1146/annurev-biophys-062215-010915. [DOI] [PubMed] [Google Scholar]
- Zheng W.; Dignon G.; Brown M.; Kim Y. C.; Mittal J. Hydropathy Patterning Complements Charge Patterning to Describe Conformational Preferences of Disordered Proteins. J. Phys. Chem. Lett. 2020, 11, 3408–3415. 10.1021/acs.jpclett.0c00288. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nowicki W.; Nowicka G.; Narkiewicz-Michałek J. Influence of Confinement on Conformational Entropy of a Polymer Chain and Structure of Polymer–Nanoparticles Complexes. Polymer 2009, 50, 2161–2171. 10.1016/j.polymer.2009.02.044. [DOI] [Google Scholar]
- Moses D.; Yu F.; Ginell G. M.; Shamoon N. M.; Koenig P. S.; Holehouse A. S.; Sukenik S. Revealing the Hidden Sensitivity of Intrinsically Disordered Proteins to Their Chemical Environment. J. Phys. Chem. Lett. 2020, 11, 10131–10136. 10.1021/acs.jpclett.0c02822. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moses D.; Guadalupe K.; Yu F.; Flores E.; Perez A.; McAnelly R.; Shamoon N. M.; Cuevas-Zepeda E.; Merg A. D.; Martin E. W.. et al. Structural Biases in Disordered Proteins Are Prevalent in the Cell. bioRxiv, 2022, 10.1101/2021.11.24.469609. [DOI] [PMC free article] [PubMed]
- Brady J. P.; Farber P. J.; Sekhar A.; Lin Y.-H.; Huang R.; Bah A.; Nott T. J.; Chan H. S.; Baldwin A. J.; Forman-Kay J. D.; Kay L. E. Structural and Hydrodynamic Properties of an Intrinsically Disordered Region of a Germ Cell-Specific Protein on Phase Separation. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, E8194–E8203. 10.1073/pnas.1706197114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sottini A.; Borgia A.; Borgia M. B.; Bugge K.; Nettels D.; Chowdhury A.; Heidarsson P. O.; Zosel F.; Best R. B.; Kragelund B. B.; Schuler B. Polyelectrolyte Interactions Enable Rapid Association and Dissociation in High-Affinity Disordered Protein Complexes. Nat. Commun. 2020, 11, 5736 10.1038/s41467-020-18859-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Theillet F.-X.; Binolfi A.; Bekei B.; Martorana A.; Rose H. M.; Stuiver M.; Verzini S.; Lorenz D.; van Rossum M.; Goldfarb D.; Selenko P. Structural Disorder of Monomeric α-Synuclein Persists in Mammalian Cells. Nature 2016, 530, 45–50. 10.1038/nature16531. [DOI] [PubMed] [Google Scholar]
- Martin E. W.; Holehouse A. S.; Grace C. R.; Hughes A.; Pappu R. V.; Mittag T. Sequence Determinants of the Conformational Properties of an Intrinsically Disordered Protein Prior to and upon Multisite Phosphorylation. J. Am. Chem. Soc. 2016, 138, 15323–15335. 10.1021/jacs.6b10272. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baul U.; Chakraborty D.; Mugnai M. L.; Straub J. E.; Thirumalai D. Sequence Effects on Size, Shape, and Structural Heterogeneity in Intrinsically Disordered Proteins. J. Phys. Chem. B 2019, 123, 3462–3474. 10.1021/acs.jpcb.9b02575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Song J.; Li J.; Chan H. S. Small-Angle X-Ray Scattering Signatures of Conformational Heterogeneity and Homogeneity of Disordered Protein Ensembles. J. Phys. Chem. B 2021, 125, 6451–6478. 10.1021/acs.jpcb.1c02453. [DOI] [PubMed] [Google Scholar]
- Reid Alderson T.; Pritišanac I.; Moses A. M.; Forman-Kay J. D.. Systematic Identification of Conditionally Folded Intrinsically Disordered Regions by AlphaFold2. bioRxiv, 2022 10.1101/2022.02.18.481080. [DOI] [PMC free article] [PubMed]
- Tunyasuvunakool K.; Adler J.; Wu Z.; Green T.; Zielinski M.; Žídek A.; Bridgland A.; Cowie A.; Meyer C.; Laydon A.; et al. Highly Accurate Protein Structure Prediction for the Human Proteome. Nature 2021, 596, 590–596. 10.1038/s41586-021-03828-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Piovesan D.; Monzon A. M.; Tosatto S. C. E. Intrinsic Protein Disorder and Conditional Folding in AlphaFoldDB. Protein Sci. 2022, 31, e4466 10.1002/pro.4466. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Varadi M.; Anyango S.; Deshpande M.; Nair S.; Natassia C.; Yordanova G.; Yuan D.; Stroe O.; Wood G.; Laydon A.; et al. AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models. Nucleic Acids Res. 2022, 50, D439–D444. 10.1093/nar/gkab1061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vitalis A.; Pappu R. V. ABSINTH: A New Continuum Solvation Model for Simulations of Polypeptides in Aqueous Solutions. J. Comput. Chem. 2009, 30, 673–699. 10.1002/jcc.21005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flory P. J. The Configuration of Real Polymer Chains. J. Chem. Phys. 1949, 17, 303–310. 10.1063/1.1747243. [DOI] [Google Scholar]
- Hammouda B.SANS from Homogeneous Polymer Mixtures: A Unified Overview. In Polymer Characteristics; Springer Berlin Heidelberg: Berlin, Heidelberg, 1993; pp 87–133. [Google Scholar]
- Sørensen C. S.; Kjaergaard M. Effective Concentrations Enforced by Intrinsically Disordered Linkers Are Governed by Polymer Physics. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 23124–23131. 10.1073/pnas.1904813116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Basak S.; Sakia N.; Dougherty L.; Guo Z.; Wu F.; Mindlin F.; Lary J. W.; Cole J. L.; Ding F.; Bowen M. E. Probing Interdomain Linkers and Protein Supertertiary Structure In Vitro and in Live Cells with Fluorescent Protein Resonance Energy Transfer. J. Mol. Biol. 2021, 433, 166793 10.1016/j.jmb.2020.166793. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McGibbon R. T.; Beauchamp K. A.; Harrigan M. P.; Klein C.; Swails J. M.; Hernández C. X.; Schwantes C. R.; Wang L.-P.; Lane T. J.; Pande V. S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015, 109, 1528–1532. 10.1016/j.bpj.2015.08.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kabsch W.; Sander C. Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22, 2577–2637. 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
- Vitalis A.Probing the Early Stages of Polyglutamine Aggregation with Computational Methods; Washington University: St. Louis, 2009. [Google Scholar]
- Steinhauser M. O. A Molecular Dynamics Study on Universal Properties of Polymer Chains in Different Solvent Qualities. Part I. A Review of Linear Chain Properties. J. Chem. Phys. 2005, 122, 094901 10.1063/1.1846651. [DOI] [PubMed] [Google Scholar]
- Tran H. T.; Pappu R. V. Toward an Accurate Theoretical Framework for Describing Ensembles for Proteins under Strongly Denaturing Conditions. Biophys. J. 2006, 91, 1868–1886. 10.1529/biophysj.106.086264. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Holehouse A. S.; Sukenik S. Controlling Structural Bias in Intrinsically Disordered Proteins Using Solution Space Scanning. J. Chem. Theory Comput. 2020, 16, 1794–1805. 10.1021/acs.jctc.9b00604. [DOI] [PubMed] [Google Scholar]
- Knotts T. A. 4th; Rathore N.; de Pablo J. J. An Entropic Perspective of Protein Stability on Surfaces. Biophys. J. 2008, 94, 4473–4483. 10.1529/biophysj.107.123158. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Taneja I.; Holehouse A. S. Folded Domain Charge Properties Influence the Conformational Behavior of Disordered Tails. Curr. Res. Struct. Biol. 2021, 3, 216–228. 10.1016/j.crstbi.2021.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jumper J.; Evans R.; Pritzel A.; Green T.; Figurnov M.; Ronneberger O.; Tunyasuvunakool K.; Bates R.; Žídek A.; Potapenko A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589. 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Deiana A.; Forcelloni S.; Porrello A.; Giansanti A. Intrinsically Disordered Proteins and Structured Proteins with Intrinsically Disordered Regions Have Different Functional Roles in the Cell. PLoS One 2019, 14, e0217889 10.1371/journal.pone.0217889. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vuzman D.; Levy Y. Intrinsically Disordered Regions as Affinity Tuners in Protein–DNA Interactions. Mol. BioSyst. 2012, 8, 47–57. 10.1039/c1mb05273j. [DOI] [PubMed] [Google Scholar]
- Moesa H. A.; Wakabayashi S.; Nakai K.; Patil A. Chemical Composition Is Maintained in Poorly Conserved Intrinsically Disordered Regions and Suggests a Means for Their Classification. Mol. Biosyst. 2012, 8, 3262–3273. 10.1039/c2mb25202c. [DOI] [PubMed] [Google Scholar]
- Piovesan D.; Tabaro F.; Mičetić I.; Necci M.; Quaglia F.; Oldfield C. J.; Aspromonte M. C.; Davey N. E.; Davidović R.; Dosztányi Z.; Elofsson A.; et al. DisProt 7.0: A Major Update of the Database of Disordered Proteins. Nucleic Acids Res. 2017, 45, D219–D227. 10.1093/nar/gkw1056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Rosmalen M.; Krom M.; Merkx M. Tuning the Flexibility of Glycine-Serine Linkers To Allow Rational Design of Multidomain Proteins. Biochemistry 2017, 56, 6565–6574. 10.1021/acs.biochem.7b00902. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wicky B. I. M.; Shammas S. L.; Clarke J. Affinity of IDPs to Their Targets Is Modulated by Ion-Specific Changes in Kinetics and Residual Structure. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, 9882–9887. 10.1073/pnas.1705105114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chin A. F.; Toptygin D.; Elam W. A.; Schrank T. P.; Hilser V. J. Phosphorylation Increases Persistence Length and End-to-End Distance of a Segment of Tau Protein. Biophys. J. 2016, 110, 362–371. 10.1016/j.bpj.2015.12.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu C.; Yao M.; Hogue C. W. V. Near-Membrane Ensemble Elongation in the Proline-Rich LRP6 Intracellular Domain May Explain the Mysterious Initiation of the Wnt Signaling Pathway. BMC Bioinf. 2011, 12, S13. 10.1186/1471-2105-12-s13-s13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cuevas-Velazquez C. L.; Vellosillo T.; Guadalupe K.; Schmidt H. B.; Yu F.; Moses D.; Brophy J. A. N.; Cosio-Acosta D.; Das A.; Wang L.; et al. Intrinsically Disordered Protein Biosensor Tracks the Physical-Chemical Effects of Osmotic Stress on Cells. Nat. Commun. 2021, 12, 5438 10.1038/s41467-021-25736-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Das R. K.; Ruff K. M.; Pappu R. V. Relating Sequence Encoded Information to Form and Function of Intrinsically Disordered Proteins. Curr. Opin. Struct. Biol. 2015, 32, 102–112. 10.1016/j.sbi.2015.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- De Los Rios P.; Ben-Zvi A.; Slutsky O.; Azem A.; Goloubinoff P. Hsp70 Chaperones Accelerate Protein Translocation and the Unfolding of Stable Protein Aggregates by Entropic Pulling. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 6166–6171. 10.1073/pnas.0510496103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dunker A. K.; Lawson J. D.; Brown C. J.; Williams R. M.; Romero P.; Oh J. S.; Oldfield C. J.; Campen A. M.; Ratliff C. M.; Hipps K. W.; et al. Intrinsically Disordered Protein. J. Mol. Graph. Model. 2001, 19, 26–59. 10.1016/S1093-3263(00)00138-8. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.