Proceedings of the National Academy of Sciences of the United States of America. 2021 Oct 29;118(44):e2111696118. doi: 10.1073/pnas.2111696118

Accurate model of liquid–liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties

Giulio Tesei a,1, Thea K Schulze a, Ramon Crehuet a,b, Kresten Lindorff-Larsen a,1
PMCID: PMC8612223  PMID: 34716273

Significance

Cells may compartmentalize proteins via a demixing process known as liquid–liquid phase separation (LLPS), which is often driven by intrinsically disordered proteins (IDPs) and regions. Protein condensates arising from LLPS may develop into insoluble protein aggregates, as in neurodegenerative diseases and cancer. Understanding the process of formation, dissolution, and aging of protein condensates requires models that accurately capture the underpinning interactions at the residue level. In this work, we leverage data from biophysical experiments on IDPs in dilute solution to develop a sequence-dependent model which predicts conformational and phase behavior of diverse and unrelated protein sequences with good accuracy. Using the model, we gain insight into the coupling between chain compaction and LLPS propensity.

Keywords: intrinsically disordered proteins, liquid–liquid phase separation, force field parameterization, biomolecular condensates, protein interactions

Abstract

Many intrinsically disordered proteins (IDPs) may undergo liquid–liquid phase separation (LLPS) and participate in the formation of membraneless organelles in the cell, thereby contributing to the regulation and compartmentalization of intracellular biochemical reactions. The phase behavior of IDPs is sequence dependent, and its investigation through molecular simulations requires protein models that combine computational efficiency with an accurate description of intramolecular and intermolecular interactions. We developed a general coarse-grained model of IDPs, with residue-level detail, based on an extensive set of experimental data on single-chain properties. Ensemble-averaged experimental observables are predicted from molecular simulations, and a data-driven parameter-learning procedure is used to identify the residue-specific model parameters that minimize the discrepancy between predictions and experiments. The model accurately reproduces the experimentally observed conformational propensities of a set of IDPs. Through two-body as well as large-scale molecular simulations, we show that the optimization of the intramolecular interactions results in improved predictions of protein self-association and LLPS.


Many intrinsically disordered proteins (IDPs) and proteins with disordered regions can condense into liquid-like droplets, namely, a biomolecule-rich phase coexisting with a more dilute solution (1–5). This demixing process is known as liquid–liquid phase separation (LLPS) and is one of the ways cells compartmentalize proteins, often together with nucleic acids (6). While LLPS plays crucial biological roles in the cell, its dysregulation leads to maturation of biomolecular condensates into hydrogel-like assemblies, promoting the formation of neurotoxic oligomers and amyloid fibrils (5, 7). A quantitative model for the “molecular grammar” of LLPS, including the influence of disease-associated mutations and posttranslational modifications (PTMs) on the propensity to phase separate, is key to understanding these processes. The sequences of IDPs and intrinsically disordered regions that easily undergo LLPS are often characterized by stretches enriched in small polar residues (spacers) interspersed by, e.g., aromatic or arginine residues (stickers), which are instrumental for the formation of reversible physical cross-links via π–π, cation–π, and sp²–π interactions (8–12). Y and R residues were shown to be necessary for the LLPS of a number of proteins including FUS, hnRNPA1, LAF-1, and Ddx4 (8, 10, 11, 13–17). While the propensity to undergo LLPS increases with the number of Y residues in the sequence, recent studies have revealed that the role of R residues is context dependent (16) and strongly affected by salt concentration (17), reflecting the unusual characteristics of the R side chain (18, 19).

Here we present the development of a coarse-grained (CG) model capable of predicting the phase behavior of IDPs based on amino acid sequence. CG models enable the combination of a sequence-dependent description with the computational efficiency necessary to explore the long time and large length scales involved in phase transitions (11, 20, 21). Although CG molecular simulations have been employed to explain the sequence dependence of the LLPS of a number of IDPs (11, 15, 17, 20–22) as well as the effect of phosphorylation on LLPS propensities (23, 24), such models have proven difficult to use to predict the phase behavior of very diverse sequences (25). Building on recent developments, including experimental phase diagrams of a number of IDPs (3, 4, 15, 16), we trained and tested a robust sequence-dependent model of the LLPS of IDPs. In particular, due to the similarity between intramolecular interactions within IDPs and intermolecular interactions between IDPs (12, 26), we reasoned that by optimizing a model to capture structural preferences for a broad set of monomeric IDPs, we could obtain a good model for interactions between IDPs.

The starting point for our analyses is the hydrophobicity scale (HPS) model (21) (with minor modification; SI Appendix) wherein, besides steric repulsion and salt-screened charge–charge interactions, residue–residue interactions are determined by hydropathy parameters (λ) which were derived from the atomic partial charges of a classical all-atom force field (27). Recently, the development of the HPS-Urry model (28) presented substantial improvements in accuracy over the original HPS model. These were achieved using a hydrophobicity scale derived from transition temperatures of elastin-like peptides (29) and further shifting the λ parameters by -0.08 to improve agreement with experimentally measured radii of gyration.

To address the current limitations, we improve upon these models by optimizing the λ parameters through a Bayesian parameter-learning procedure (30–33), leveraging as prior knowledge the probability distribution of the λ parameters evaluated from analyzing 87 hydrophobicity scales. The training set comprises small-angle X-ray scattering (SAXS) and paramagnetic relaxation enhancement (PRE) NMR data of 45 IDPs which we selected from the literature. First, we run Langevin dynamics simulations of single IDPs and estimate the experimental observables using state-of-the-art methods (34). Second, we employ a Bayesian regularization approach to prevent overfitting the training data and select three models which are equally accurate with respect to single-chain conformational properties. Third, through two-chain simulations, we validate the models by comparing predicted and experimental intermolecular PRE NMR data for the low-complexity domain (LCD) of the heterogeneous nuclear ribonucleoprotein (hnRNP) A2 (A2 LCD) (22) and the LCD of the RNA-binding protein fused in sarcoma (FUS LCD) (23). Fourth, we perform coexistence simulations to test the models against the phase behavior of A2 LCD (22, 24); FUS LCD (35, 36); variants of hnRNP A1 LCD (A1 LCD) (15, 16); the N-terminal region of the germ-granule protein Ddx4 (Ddx4 LCD) (8, 10, 13); and the N-terminal, R-/G-rich domain of the P granule protein LAF-1 (LAF-1 RGG domain). We use the final model to provide insight into the interactions between IDPs within condensates and to help elucidate the contribution of different amino acids to the driving force for LLPS.

Results and Discussion

Analysis of Hydrophobicity Scales

The λ values of the original HPS model are based on a hydrophobicity scale derived by Kapcha and Rossky from the atomic partial charges of an all-atom force field (27). Dozens of amino acid hydrophobicity scales have been derived from experimental as well as bioinformatics approaches such as the partitioning of amino acids between water and organic solvent, the partitioning of peptides to the lipid membrane interface, and the accessible surface area of residues in folded proteins (37, 38). To carry out the Bayesian optimization of the amino acid specific λ values, we sought to estimate the prior probability distribution of the hydropathy parameters from the analysis of 98 hydrophobicity scales collected by Simm et al. (38). Each scale was minimum–maximum (min–max) normalized, and after ranking in the ascending order of the HPS scale, we discarded all the scales yielding a linear fit with negative slope. This procedure allowed us to identify scales which were present in the set both in their original form and as the additive inverse of the hydropathy values (reversed scales). For most scales, the selection criterion resulted in discarding the reversed form. However, for scales where the most negative values of the hydropathy parameter correspond to the most hydrophobic amino acids, such as the scales by Bull and Breese (39), Guy (40), Bishop et al. (41), and Welling et al. (42), we retained only the reversed form. The 87 scales that remained after this filtering were used to calculate the average scale (AVG) and the probability distribution of the λ values for the 20 amino acids, P(λ), which is normalized so that $\sum_{aa}\int_{\lambda_{aa}=0}^{\lambda_{aa}=1} P(\lambda_{aa})\,d\lambda_{aa} = 20$ (Fig. 1A). For the optimization described below we use the AVG scale as starting point, as well as an indication of the typical accuracy obtained from the prior knowledge encoded in P(λ).
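The filtering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`min_max`, `slope`, `filter_and_average`) and the three-residue toy scales are hypothetical, and the fit is taken to be an ordinary least-squares line of the normalized values against the residue rank in the reference scale.

```python
from statistics import mean

def min_max(scale):
    """Rescale a {residue: value} dict to the [0, 1] interval."""
    lo, hi = min(scale.values()), max(scale.values())
    return {aa: (v - lo) / (hi - lo) for aa, v in scale.items()}

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def filter_and_average(scales, reference):
    """Discard scales with a negative slope against the reference ranking,
    then average the retained (normalized) scales residue by residue."""
    order = sorted(reference, key=reference.get)  # ascending reference order
    ranks = list(range(len(order)))
    kept = []
    for s in scales:
        norm = min_max(s)
        if slope(ranks, [norm[aa] for aa in order]) >= 0:
            kept.append(norm)
    avg = {aa: mean(s[aa] for s in kept) for aa in order}
    return kept, avg

# Toy data (hypothetical values): the second scale is a reversed copy of the first.
reference = {"G": 0.1, "A": 0.5, "F": 1.0}
scales = [
    {"G": 1.0, "A": 3.0, "F": 5.0},
    {"G": -1.0, "A": -3.0, "F": -5.0},
    {"G": 0.0, "A": 1.0, "F": 4.0},
]
kept, avg = filter_and_average(scales, reference)
```

In this toy set the reversed scale is the only one discarded, which mirrors how the criterion removes one of the two forms of a scale present both in its original and reversed form.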

Fig. 1.

Assessing the HPS, AVG, and HPS-Urry models using experimental data reporting on single-chain conformational properties. (A) Probability distributions of the λ parameters calculated from 87 min–max normalized hydrophobicity scales. Lines are the λ parameters of the HPS model (blue), the average over the hydrophobicity scales (orange) and the HPS-Urry model (green) (28). Intramolecular PRE intensity ratios for (B) the S42C mutant of α-Synuclein and (C) the S143C mutant of A2 LCD from simulations and experiments (22, 43) (black). (D) χ² values quantifying the discrepancy between simulated and experimental intramolecular PRE data, scaled by the hyperparameter η = 0.1 (Materials and Methods). Relative difference between simulated and experimental radii of gyration (E) for proteins that do not readily undergo phase separation alone and (F) for variants of A1 LCD, with negative values corresponding to the simulated ensembles being more compact than in experiments.

We assessed the HPS, HPS-Urry, and AVG parameter sets by running simulations of 45 IDPs ranging in length between 24 and 334 residues and compared the results against experiments. Specifically, we compared the simulations with the radii of gyration, Rg, of 42 IDPs (SI Appendix, Table S1) and intramolecular PRE data of six IDPs (SI Appendix, Table S2) (16, 22, 23, 43–57). Compared to the AVG scale, the HPS model overestimates the compaction of α-Synuclein whereas it closely reproduces the PRE data for A2 LCD (Fig. 1 B and C). In general, the HPS model accurately predicts the conformational properties of sequences with high LLPS propensity, e.g., FUS LCD, A2 LCD, and A1 LCD (Fig. 1 D and F), while the AVG scale is considerably more accurate at reproducing the Rg of proteins that do not readily undergo phase separation alone (Fig. 1E). The recently proposed HPS-Urry model (28) is the most accurate at predicting the intramolecular PRE data while it shows intermediate accuracy for the Rg values of both proteins that do not readily undergo phase separation alone and A1 LCD variants. The HPS-Urry model in particular differs significantly from the HPS and AVG models for the λ parameters for R and E as well as the reversal of the order of hydrophobicity of Y and F (Fig. 1A).

Optimization of Amino Acid–Specific Hydrophobicity Values

To obtain a model that more accurately predicts the conformational properties of IDPs of diverse sequences and LLPS propensities, we trained the λ values on a large set of experimental Rg and PRE data using a Bayesian parameter-learning procedure (30) shown schematically in Fig. 2 (Materials and Methods). We initially performed an optimization run starting from the AVG λ values and setting the hyperparameters to θ = η = 0.1 (SI Appendix, Fig. S1A). We collected the optimized sets of λ values which yielded $\eta\chi^2_{\mathrm{PRE}} < 21$ and $\chi^2_{R_g} < 3$ (circles in Fig. 3A). The optimization was repeated starting from all λ = 0.5 to verify that the parameter space sampled by our method is independent of the initial conditions (SI Appendix, Figs. S1D and S2A). Thus, while we used the AVG model as starting point, our final parameters only depend on P(λ) via its use as the prior in the Bayesian optimization.
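To make the roles of the two hyperparameters concrete, the sketch below writes down one plausible regularized cost. The functional form is an assumption for illustration only (the actual expression is given in Materials and Methods, which is not reproduced here): the Rg discrepancy, plus the PRE discrepancy weighted by η, minus θ times the log-prior of the candidate λ values. The function names are hypothetical; the acceptance thresholds are the ones quoted in the text.

```python
import math

def regularized_cost(chi2_rg, chi2_pre, prior_probs, eta=0.1, theta=0.1):
    """Assumed cost: chi2_Rg + eta * chi2_PRE - theta * sum(ln P(lambda_aa)).

    prior_probs holds P(lambda_aa) evaluated at the candidate lambda values.
    """
    log_prior = sum(math.log(p) for p in prior_probs)
    return chi2_rg + eta * chi2_pre - theta * log_prior

def passes_thresholds(chi2_rg, chi2_pre, eta=0.1):
    """Selection criterion quoted in the text: eta*chi2_PRE < 21 and chi2_Rg < 3."""
    return eta * chi2_pre < 21 and chi2_rg < 3

# Lowering theta reduces the penalty for lambda values that are unlikely under the prior.
strong_prior = regularized_cost(2.0, 100.0, [0.5], theta=0.1)
weak_prior = regularized_cost(2.0, 100.0, [0.5], theta=0.02)
```

Under this assumed form, a smaller θ lets the data terms dominate, which is the sense in which θ acts as a confidence in the prior.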

Fig. 2.

Flowchart illustrating the Bayesian parameter-learning procedure (Materials and Methods).

Fig. 3.

Selection and performance of the M1–3 models with respect to the training data. (A) Overview of the optimal λ sets with $\eta\chi^2_{\mathrm{PRE}} < 21$ and $\chi^2_{R_g} < 3$ collected through the parameter learning procedures started from λ0 = AVG (upward triangles), M1 (squares), and M2 (downward triangles). The gray gradient shows the Spearman’s correlation coefficient between experimental and simulated Rg values for the A1 LCD variants in the training set. Colored open symbols indicate the M1 (blue upward triangle), M2 (orange square), and M3 (green downward triangle) scales, whereas the adjacent values are the respective Spearman’s correlation coefficients. (B) Covariance matrix of the λ sets with $\eta\chi^2_{\mathrm{PRE}} < 21$ and $\chi^2_{R_g} < 3$. (C) M1 (blue), M2 (orange), and M3 (green) scales. Solid lines are guides for the eye, whereas the gray shaded area shows the mean ±2 SD of the λ sets with $\eta\chi^2_{\mathrm{PRE}} < 21$ and $\chi^2_{R_g} < 3$. Comparison between (D) $\eta\chi^2_{\mathrm{PRE}}$ and (E) $\chi^2_{R_g}$ values for the HPS model (gray) and the optimized M1 (blue), M2 (orange), and M3 (green) models.

From the pool of optimized parameters, we selected the λ set which resulted in the largest Spearman’s correlation coefficient (ρ = 0.78) between simulated and experimental Rg values for the A1 LCD variants. We base this final selection of the optimal λ set on the Spearman’s correlation coefficient of the A1 LCD variants because we expect that capturing the experimental ranking in chain compaction will result in accurate predictions of the relative LLPS propensities (15, 16, 20, 58, 59). Further, the systematic mutagenesis studies enable us to more clearly decouple the parameters for Y vs. F and R vs. K (15, 16). We note that while this selection uses only the A1 LCD variants, all three parameter sets result in good agreement with the full PRE and Rg dataset (Fig. 3A). The selected model, referred to as M1 hereafter, is the starting point for two consecutive optimization cycles (SI Appendix, Fig. S1B) which were performed with a lower weight for the prior (θ = 0.05), yielding a new pool of optimized parameters (squares in Fig. 3A) and model M2 (largest ρ = 0.75). To generate a third model, we further decreased the confidence parameter to θ = 0.02 and performed an additional optimization run starting from M2 (SI Appendix, Fig. S1C). From the collected optimal parameters (triangles in Fig. 3A), we selected M3 (largest ρ = 0.73). As shown in Fig. 3B, the optimal λ values collected through the four independent optimization runs (SI Appendix, Fig. S1 A–D) are weakly intercorrelated. The covariance values range between −0.015 and 0.015 for most amino acids, with the exception of the SDs of N, C, T, M, W, and I. C, M, W, and I are among the least frequent amino acids in the training set (SI Appendix, Fig. S3), and unsurprisingly, we observe the largest covariance values for C–W (0.017), C–M (−0.02), and C–I (−0.016). Fig. 3C shows that M1–3 fall within two SDs above and below the mean of the λ values yielding $\eta\chi^2_{\mathrm{PRE}} < 21$ and $\chi^2_{R_g} < 3$ (gray shaded area). Despite their differences, M1–3 fit the training data equally accurately and result in an improvement in $\chi^2_{\mathrm{PRE}}$ and $\chi^2_{R_g}$ of ∼ 30 and ∼ 95%, respectively, with respect to the HPS model (Fig. 3 D and E).
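The selection criterion, picking from a pool the parameter set whose simulated Rg values best reproduce the experimental ranking, can be illustrated as follows. This is a minimal sketch: the function names, the candidate labels, and the Rg numbers are hypothetical, and Spearman's ρ is computed as the Pearson correlation of average ranks.

```python
import math

def ranks(values):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def select_best(candidates, rg_exp):
    """Return the name of the candidate whose simulated Rg values rank most like rg_exp."""
    return max(candidates, key=lambda name: spearman(candidates[name], rg_exp))

# Hypothetical Rg values (nm) for three variants under two candidate lambda sets.
rg_exp = [2.5, 2.6, 2.8]
candidates = {"set_a": [2.5, 2.7, 2.6], "set_b": [2.4, 2.6, 2.9]}
best = select_best(candidates, rg_exp)
```

A rank-based criterion rewards getting the order of compaction right even when absolute Rg values are offset, which matches the stated motivation for using it.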

Notably, the optimization procedure captures the sequence dependence of the chain dimensions (Fig. 4) and results in accurate predictions of intramolecular PRE data for both highly soluble IDPs and proteins that more readily phase separate (SI Appendix, Figs. S4 B–D and S5–S10), as well as in radii of gyration with relative errors $-14\% < \Delta R_g / R_{g,\mathrm{exp}} < 12\%$ (SI Appendix, Fig. S4 E and F). Besides reproducing the experimental Rg values for the longer chains with high accuracy, the optimized models also capture the differences in Rg and scaling exponents, ν, for the variants of A1 LCD (Fig. 4B and SI Appendix, Fig. S11). The lower Pearson’s correlation coefficients observed for ν, compared to the corresponding Rg data, may originate from the different models used to infer ν from SAXS experiments and simulation data, i.e., the molecular form factor method (16, 52) and least-squares fit to long intramolecular pairwise distances, $R_{ij}$, vs. $|i-j| > 10$ (60) (SI Appendix, Fig. S12).
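As an illustration of how a scaling exponent can be extracted from simulated pairwise distances, the sketch below fits $R_{ij} = R_0 |i-j|^{\nu}$ for sequence separations above 10. It assumes a linear least-squares fit in log-log space, which is one common choice and not necessarily the exact fitting procedure used for the figures; the function name and the synthetic data are hypothetical.

```python
import math

def fit_scaling_exponent(separations, distances, min_sep=10):
    """Fit ln(R_ij) = ln(R_0) + nu * ln(|i-j|) using only |i-j| > min_sep."""
    pts = [(math.log(s), math.log(r))
           for s, r in zip(separations, distances) if s > min_sep]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    nu = num / den
    r0 = math.exp(my - nu * mx)
    return r0, nu

# Synthetic distances generated with R_0 = 0.55 nm and nu = 0.5 (hypothetical values).
seps = list(range(1, 101))
dists = [0.55 * s ** 0.5 for s in seps]
r0, nu = fit_scaling_exponent(seps, dists)
```

Restricting the fit to long separations avoids the short-range regime where local chain stiffness distorts the power law.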

Fig. 4.

(A) Comparison between experimental and predicted radii of gyration (SI Appendix, Table S1), Rg, for the HPS, HPS-Urry, and M1–3 models. (B) Zoom-in on the Rg values of the A1 LCD variants, with Pearson’s r coefficients for this subset of the training data reported in the legend.

To assess the impact of phase separating proteins on the optimized models, we perform an optimization run wherein the A1 LCD variants are removed from the training set. The major difference between the resulting optimal λ set and models M1–3 is the considerably smaller values for R and Y residues (SI Appendix, Fig. S2C). Indeed, the large λ values for R and Y residues in M1–3 relative to the HPS, AVG, and HPS-Urry models are a striking feature which resonates with previous experimental findings pointing to the important role of R and Y residues in driving LLPS (8, 14–16, 22, 61, 62).

To identify the hydrophobicity scales which most closely resemble M1–3, we construct a dendrogram (SI Appendix, Fig. S13) complementing the 87 scales retained from the set by Simm et al. (38) with the Urry, Kapcha–Rossky, and M1–3 scales and using average linkage-based hierarchical clustering and Euclidean distances as the metric. This analysis reveals that the hydrophobicity scales by Urry et al. (29), Bishop et al. (41), Wimley and White (63), and the membrane protein surrounding hydrophobicity scale by Ponnuswamy and Gromiha (64) are those with greatest similarity to M1–3. These scales, which are characterized by a λ value for the R residue above the 80% quantile, are possibly the best of the unmodified scales for the properties that we optimized M1–3 to reproduce.

Testing Protein–Protein Interactions

To test whether the parameters trained on single-chain conformational properties are transferable to protein–protein interactions, we compared experimental intermolecular PRE rates, Γ2, of FUS LCD and A2 LCD (22, 23) with predictions from two-chain simulations of the M1–3 models performed at the same conditions as the reference experiments. Intermolecular Γ2 values were obtained from solutions of spin-labeled 14N protein and 15N protein without a spin label in equimolar amount and report on the transient interactions between a paramagnetic nitroxide probe attached to a cysteine residue of the spin-labeled chain and all the amide protons of the 15N-labeled chain. We carried out the calculation of the PRE rates using the software DEER-PREdict (34), assuming an effective correlation time of the spin label, τt, of 100 ps and fitting an overall molecular correlation time, τc, within the interval 1 ≤ τc ≤ 20 ns. In agreement with experiments, Γ2 values predicted by the M1–3 models are characterized by no distinctive peaks along the protein sequence (Fig. 5 A–E), which is consistent with transient and nonspecific protein–protein interactions. Notably, while PRE rates for FUS LCD are of the same magnitude for all spin-labeled sites, the A2 LCD presents larger Γ2 values for S99C than for S143C, indicating that the tyrosine-rich aggregation-prone region (residues 84 to 107) is involved in more frequent intermolecular contacts with the entire sequence. The discrepancy between predicted and experimental intermolecular PRE data, $\chi^2_{\mathrm{PRE}}$, varies significantly as a function of τc (Fig. 5 F and G). For both FUS LCD and A2 LCD, the optimal τc is larger for M1 than for M3, which suggests that the latter has more attractive intermolecular interactions. While for M1 the minimum of $\chi^2_{\mathrm{PRE}}$ is at τc = 17 ns for both proteins, for M3 the optimal τc value is ∼ 8 ns smaller for FUS LCD than for A2 LCD. Although the accuracy of τc is difficult to assess in the case of transiently interacting IDPs, this large difference in τc (Fig. 5) suggests that the protein–protein interactions predicted for FUS LCD by M3 may be overly attractive.

Fig. 5.

Testing the M1–3 models using experimental findings on protein–protein interactions. Comparison between experimental (black) intermolecular PRE rates (SI Appendix, Table S3) and predictions from the M1 (blue), M2 (orange) and M3 (green) models for (A–C) FUS LCD and (D and E) A2 LCD calculated using the best-fit correlation time, τc. (F and G) Discrepancy between calculated and experimental intermolecular PRE rates, $\chi^2_{\mathrm{PRE}}$, as a function of τc. (H) Second virial coefficients, B22, of FUS LCD (circles) and A2 LCD (squares) calculated from two-chain simulations of the M1–3 models. Error bars are SEMs estimated by bootstrapping 1,000 times 40 B22 values calculated from trajectory blocks of 875 ns. (I) Probability of the bound state estimated from protein–protein interaction energies in two-chain simulations of the M1–3 models. (J) Dissociation constants, Kd, of FUS LCD (circles) and A2 LCD (squares) calculated from two-chain simulations of the M1–3 models. For pB and Kd, error bars are SDs of 10 simulation replicas. Lines in H and J are guides for the eye.

To quantify protein–protein interactions with the optimized models, we calculated second virial coefficients, B22, from two-chain simulations (SI Appendix). The net interactions are attractive for both sequences (B22 < 0) and considerably stronger for A2 LCD than for FUS LCD. As expected from the λ values and amino acid compositions, M3 presents the most negative B22 values (large λ values for Q, G, and P), followed by M2 and M1 (Fig. 5H).
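The error bars on B22 are described in the Fig. 5 caption as SEMs obtained by bootstrapping block values 1,000 times. A minimal sketch of such a bootstrap is given below; the function name, the fixed seed, and the placeholder block values are assumptions for illustration, not data from the simulations.

```python
import random
from statistics import mean, stdev

def bootstrap_sem(block_values, n_resamples=1000, seed=0):
    """Standard error of the mean estimated by resampling blocks with replacement."""
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(block_values)
    resampled_means = [mean(rng.choices(block_values, k=n))
                       for _ in range(n_resamples)]
    return stdev(resampled_means)

# Placeholder block values standing in for 40 per-block B22 estimates.
blocks = [float(i) for i in range(40)]
sem = bootstrap_sem(blocks)
```

Resampling whole trajectory blocks rather than individual frames keeps the estimate from being biased by correlations between consecutive frames.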

To test whether predictions of protein self-association by M1–3 are sequence dependent, we compared the probability of finding proteins in the bound dimeric state, pB, in simulations of α-Synuclein, p15PAF, full-length tau (ht40), A2 LCD, and FUS LCD performed at the solution conditions of the reference experimental data (43, 50, 65) (SI Appendix). In agreement with experimental findings, we find that the highly soluble α-Synuclein, p15PAF, and ht40 proteins do not self-associate substantially in our simulations, whereas A2 LCD and FUS LCD have pB ∼ 4 and ∼ 1%, respectively. We further estimated the dissociation constants of A2 LCD and FUS LCD using $K_d = (1 - p_B)^2 / (N_A\, p_B\, V)$ and $K_d = 1 / [N_A\, p_B\, (V - B_{22})]$ self-consistently (66), where $N_A$ is Avogadro’s number (Fig. 5J and SI Appendix, Fig. S14).
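The first of the two expressions for Kd follows from the law of mass action for two chains in a box of volume V, and can be evaluated as sketched below. The function name, the choice of nm³ as input unit, and the example box size are assumptions for illustration; only the formula itself comes from the text.

```python
AVOGADRO = 6.02214076e23   # 1/mol
NM3_TO_LITRE = 1e-24       # 1 nm^3 = 1e-24 L

def kd_from_bound_fraction(p_bound, volume_nm3):
    """K_d = (1 - p_B)^2 / (N_A * p_B * V), returned in mol/L."""
    volume_l = volume_nm3 * NM3_TO_LITRE
    return (1.0 - p_bound) ** 2 / (AVOGADRO * p_bound * volume_l)

# Example with a hypothetical cubic box of side 40 nm and p_B = 0.04.
kd_example = kd_from_bound_fraction(0.04, 40.0 ** 3)
```

Because pB appears in the denominator, small bound fractions translate into large Kd values, so the statistical uncertainty of pB dominates the error on weakly associating systems.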

Testing LLPS Propensities

To test the ability of the models to capture the sequence dependence of LLPS propensity, we performed multichain simulations in a slab geometry and calculated protein concentrations of the coexisting condensate, ccon, and dilute phase, csat. We compared our simulation results to an extensive set of sequences which have been shown to undergo LLPS below an upper critical solution temperature (UCST), namely, FUS LCD (23, 35, 36), A2 LCD (22, 24), the NtoS variant of A2 LCD (24), and LAF-1 RGG domain (11, 67–69), as well as variants of A1 LCD (15, 16) and Ddx4 LCD (8, 10, 13). From simulations of the optimized models at 37 °C, we observed that for a number of sequences in the test set, the predicted csat values are too low to allow for converged estimates from µs-timescale trajectories (SI Appendix, Fig. S15). Conversely, the least LLPS-prone variants of Ddx4 LCD yielded one-phase systems when simulated at 37 °C using HPS-Urry and M1–3 models. Thus, to be able to estimate converged csat values (SI Appendix, Figs. S16, S17, and S18), simulations were carried out at 50 °C, except for the HPS-Urry model which we simulated at 24 °C (SI Appendix, Table S4). The FtoA and RtoA variants of Ddx4 LCD were also simulated at 24 °C using the M1–3 models as in simulations of the same systems at 50 °C we only observed a single phase.
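One simple way to read ccon and csat off a slab simulation is to average a time-averaged concentration profile along the long box axis over a central (dense) window and over the outer (dilute) regions. The sketch below assumes a profile centered on the slab at z = 0 and user-chosen window boundaries; the function name, the boundaries, and the toy profile are hypothetical and do not reproduce the protocol of the SI Appendix.

```python
from statistics import mean

def coexistence_concentrations(z_centers, profile, dense_halfwidth, dilute_start):
    """Average a concentration profile over the dense window (|z| < dense_halfwidth)
    and over the dilute regions (|z| > dilute_start). Returns (c_con, c_sat)."""
    dense = [c for z, c in zip(z_centers, profile) if abs(z) < dense_halfwidth]
    dilute = [c for z, c in zip(z_centers, profile) if abs(z) > dilute_start]
    return mean(dense), mean(dilute)

# Toy profile (arbitrary units): a slab centered at z = 0 flanked by dilute phase.
z = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
conc = [0.1, 0.1, 5.0, 5.0, 5.0, 0.1, 0.1]
c_con, c_sat = coexistence_concentrations(z, conc, dense_halfwidth=1.5, dilute_start=1.5)
```

In practice the windows would be chosen to exclude the interfacial region, and very low csat values require long trajectories to converge, as noted above.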

Simulations using M1 at 50 °C most closely recapitulate the experimental trend in csat across the diverse sequences (Fig. 6 A, D, and G) and reproduce the reference ccon and csat values measured at room temperature. Conversely, HPS overestimates the relative LLPS propensity of FUS LCD, whereas simulations using HPS-Urry at 24 °C show deviations of about an order of magnitude from the reference csat values for A2 LCD, Ddx4 LCD, A1 LCD, and FUS LCD. Regarding the LAF-1 RGG domain, all of the models overestimate by at least a factor of ∼ 5 the experimental ccon (68, 69), whereas M1 reproduces within a factor of ∼ 2 the experimental csat value from temperature-dependent turbidity measurements (11), both for the wild type (WT) and for variants with randomly shuffled sequence (LAF-1 shuf) and without residues 21 to 30 (LAF-1 Δ 21 to 30) (SI Appendix, Fig. S19). Although M1–3 fit the training data equally well, the prediction of LLPS propensities for the diverse sequences in Fig. 6 A and D differ considerably, with Pearson’s correlation coefficients between simulation and experimental log10(csat) values ranging from 0.67 for M1 to 0.14 for M3 (Fig. 6G). The discrepancy is particularly evident for the Ddx4 LCD and FUS LCD which are rich in N and Q residues, respectively, i.e., the residues for which the M1 and M3 λ sets differ the most.

Fig. 6.

Protein concentrations (A–C) in the condensate and (D–F) in the dilute phase from slab simulations of the M1–3, HPS, and HPS-Urry models performed at 50 °C (closed symbols), 37 °C (crosses in H), and 24 °C (open symbols). Red open squares show experimental measurements at ∼ 24 °C (A, C, D, and F) and ∼ 4 °C (B and E). Correlation between log10(csat/M) from simulations and experiments for (G) diverse sequences and (H) A1 LCD variants. Solid lines show linear fits to the simulation data at 50 °C. Dashed lines show linear fits to the HPS-Urry data at 24 °C (G and H) and to the M1–3 data at 37 °C (H). Values reported in the legends are Pearson’s correlation coefficients. Error bars are SEMs of averages over blocks of 0.3 µs. We note that the correlation coefficients reported in G are associated with a substantial uncertainty as they are calculated over only three (HPS), four (HPS-Urry), and five points (M1–3).

We further test our predictions against 15 variants of A1 LCD (Fig. 6 B and E). These include aromatic and charge variants, which were designed to decipher the role of Y vs. F residues and of R, D, E, and K residues, respectively, in the driving forces for phase separation (16). The nomenclature, $\pm N_X\mathrm{X}\pm N_Z\mathrm{Z}$, denotes an increase or decrease in the number of residues of type X and Z with respect to the WT, which is achieved by mutations to or from G and S residues while maintaining a constant G/S ratio. M1–3 are found to be equally accurate and present a considerable improvement over previous models with respect to their ability to recapitulate the trends in LLPS propensity for the aromatic and charged variants of A1 LCD. Since M1–3 were selected based on their performance in predicting the experimental ranking for the Rg values of 21 A1 LCD variants (SI Appendix, Table S1), this result supports our model development strategy. For M1–3, Pearson’s correlation coefficients exceed 0.7 between log10(csat) values measured at 4 °C (16) and simulation predictions at both 50 and 37 °C (Fig. 6H). Moreover, csat values from simulations at 37 °C are in agreement with the reference csat values at 4 °C (Fig. 6H and SI Appendix, Fig. S15). As we observed for the diverse sequences, quantitative agreement with the experimental csat values is achieved by carrying out simulations of the M1 model at a temperature systematically higher by 30 °C than the experimental conditions. In addition to the lack of temperature dependence of the hydropathy parameters (70), the inconsistency between the temperature dependence of chain compaction and phase separation might be attributed to the long range of the nonelectrostatic interactions, which we compute up to distances of 4 nm (SI Appendix). Moreover, the significant decrease in the number of interaction sites upon coarse-graining at the amino acid level, and the resulting reduction in configurational entropy (71, 72), may promote LLPS by lowering the entropic penalty associated with partitioning a chain from the dilute solution to the condensate.

M1–3 reproduce the experimental ranking for LLPS propensity of the Ddx4 LCD variants, i.e., WT CS > FtoA RtoK (Fig. 6 C and F), and for all the variants, M1 and M3 consistently display the highest and lowest LLPS propensities, respectively. Simulations at 50 °C using M2 are in quantitative agreement with the experimental csat values (13) for both WT and the CS variant, which has the same net charge and amino acid composition as the WT but a more uniform charge distribution along the sequence. Moreover, as observed experimentally (13), M1–3 predict a single phase for the RtoK variant at 24 °C. As previously shown by Das et al. (25), the HPS model predicts a considerable increase in LLPS propensity upon replacement of all 24 R residues in the Ddx4 LCD with K (RtoK variant; Fig. 6C), in apparent contrast to experimental observations (10, 13). Interestingly, augmenting the HPS model with stronger cation–π interactions for R-aromatic than for K-aromatic pairs (25) has been shown to be insufficient to capture the lower LLPS propensity of the RtoK variant compared to WT. On the other hand, our data for the M1–3 and HPS-Urry models indicate that making all the interactions involving R more favorable results in more accurate predictions. In fact, a large λ value for R may better mimic its relatively unfavorable free energy of hydration (19) as well as the occurrence of R-aromatic cation–π interactions, R-R π-stacking, and R-D/E bidentate H-bonding (10, 17, 18, 73). Compared to the Kapcha–Rossky scale, it is noteworthy that the increase in the λ values of R, Y, and G in M1–3 is accompanied by an overall decrease in the average λ value. Hence, the optimization procedure led to the enhancement of specific attractive forces while maintaining a balance between electrostatic and nonelectrostatic interactions (25), which reveals itself, for example, in the ability of M1–3 to recapitulate the lower LLPS propensity of the CS variant with respect to Ddx4 LCD WT.

The M1 and M2 parameter sets differ mainly in the λ value of the N residue (Fig. 3C) and perform equally well against the test set (Fig. 6). Therefore, we further test the ability of M1 and M2 to predict the LLPS propensity of the NtoS variant of A2 LCD with respect to the WT. Only the M1 model, which has λ values for N and S of similar magnitude, correctly predicts approximately the same LLPS propensity for variant and WT (SI Appendix, Fig. S20), in agreement with experiments (24).

Correlating Single-Chain Properties and Phase Separation

Motivated by recent experiments on the A1 LCD (15, 16), we perform a detailed analysis of the coupling between chain compaction and phase behavior of the A1 LCD variants. In agreement with previous observations (16), the log10(csat) values for the aromatic variants show a linear relationship with the scaling exponent, νsim, whereas changes in the number of charged residues (charge variants) result in significant deviations from the lines of best fit (Fig. 7 A–C). Following the approach of Bremer et al. (16), we plot the residuals for the charge variants with respect to the lines of best fit as a function of the net charge per residue (NCPR) (Fig. 7 D–F). The results for M1 and M2 show the V-shaped profile observed for the experimental data (16) and support the suggestion that mean-field electrostatic repulsion between the net charge of the proteins is responsible for breaking the coupling between chain compaction and LLPS propensity (16). In agreement with experimental data (16), we observe that for M1 and M2 the driving forces for LLPS are maximal for small positive values of NCPR (≈ 0.02).
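For reference, the net charge per residue can be computed directly from a sequence as sketched below. The charge assignment is an assumption for illustration: R and K count as +1, D and E as −1, and all other residues, including H, as neutral; the function name and the toy sequences are hypothetical.

```python
# Assumed residue charges; histidine is treated as neutral in this sketch.
CHARGES = {"R": 1, "K": 1, "D": -1, "E": -1}

def ncpr(sequence):
    """Net charge per residue: (number of R + K - D - E) / sequence length."""
    net = sum(CHARGES.get(aa, 0) for aa in sequence)
    return net / len(sequence)

# Toy sequences (not A1 LCD): one neutral overall, one with net positive charge.
neutral = ncpr("RKDE")
positive = ncpr("RRGG")
```

With this convention, adding D residues to a positively charged sequence lowers the NCPR towards zero, which is the direction of the +4D and +8D variants discussed in the text.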

Fig. 7.

Correlation between chain compaction and LLPS propensity for aromatic and charge variants of A1 LCD. log10(csat/M) vs. νsim for A1 LCD variants from simulations performed using the (A) M1, (B) M2, and (C) M3 models. Black and colored circles indicate aromatic and charge variants, respectively. Black lines are linear fits to the aromatic variants. (D–F) Residuals from the linear fits of A–C for the charge variants of A1 LCD as a function of the NCPR. Values reported in the legends are Pearson’s correlation coefficients. Error bars of log10(csat) values are SEMs of averages over blocks of 0.3 µs. Error bars of νsim are SDs from fits to $R_{ij} = R_0 |i-j|^{\nu_{\mathrm{sim}}}$ in the long-distance region, $|i-j| > 10$. Solid lines are linear fits to the data. Dotted lines in D–F are lines of best fit to the experimental data by Bremer et al. (16).

The dependence of LLPS on NCPR is clarified by comparing the residual nonelectrostatic energy maps of +8D (NCPR = 0), +4D (NCPR ≈ 0.03), and –4D (NCPR ≈ 0.09) with respect to the WT of A1 LCD (NCPR ≈ 0.06) (SI Appendix, Figs. S21 and S22). While in the case of NCPR = 0 the residual interaction patterns within the isolated chain and between chains in the condensate largely overlap, the energy baselines are clearly down- and up-shifted for NCPR ≈ 0.03 and NCPR ≈ 0.09, respectively (SI Appendix, Figs. S21 G–I and S22 G–I). Although the interaction patterns are still dominated by the stickers, deviations of the NCPR from 0.02 result in electrostatic mean-field repulsive interactions that disfavor LLPS. The LLPS-promoting effect of small positive NCPR values finds explanation in the amphiphilic character of the R side chains (18), which compensates for the repulsion introduced by the excess positive charge by allowing for favorable interactions with both Y and negatively charged residues. As opposed to M1 and M2, the readily phase-separating M3 model shows a weaker dependence on NCPR, especially for variants of net negative charge. This suggests that the experimental observations regarding the coupling between conformational and phase behavior of A1 LCD stem from a well-defined balance between mean-field repulsion and sticker-driven LLPS, which can be offset by an overall moderate increase of 3 to 4% in the λ values of the residues present in A1 LCD.

Comparing Intramolecular and Intermolecular Interactions

After establishing the ability of model M1 to accurately predict trends in LLPS propensity for diverse sequences, we analyze the nonelectrostatic residue–residue energies for FUS LCD and A2 LCD within a single chain, as well as between pairs of chains in the dilute regime and in condensates. We find a striking similarity between intramolecular and intermolecular interaction patterns for both proteins (Fig. 8), consistent with a mostly uniform distribution of stickers along the linear sequence (Fig. 8 G and H) (15, 74). Notably, besides the aromatic F and Y residues, the analysis also identifies an M residue and four R residues as stickers in FUS LCD and A2 LCD, respectively. Therefore, the parameter-learning procedure presented herein corroborates the important role of R as a sequence-dependent sticker (16), whereby the large λ value for R in models M1–3 presumably reflects the ability of the amphiphilic guanidinium moiety to engage in H-bonding, as well as π stacking and charge–π interactions (18). Further, in the dilute regime, the intramolecular and intermolecular interactions are weaker in the N- and C-terminal regions than for the rest of the chain, as evident from the upturning baselines of the one-dimensional (1D) interaction energy projections. This result is consistent with the faster local motions of the terminal residues inferred from ¹⁵N NMR relaxation data for both unfolded proteins (75) and a number of phase-separating IDPs (15, 22, 23). We also find that the aggregation-prone Y-rich region of A2 LCD (residues 84 to 107) interacts with the entire polypeptide chain (Fig. 8 D–F) and thus likely drives chain compaction and self-association as well as LLPS. Finally, in line with previous observations from theory, simulations, and experiments (16, 76, 77), we observe that the polypeptide chains of A1 LCD, A2 LCD, and FUS LCD are more expanded in the condensed phase than in the dilute phase (SI Appendix, Fig. S23). In particular, we find that the scaling exponents of the LCDs increase toward ν = 0.5 in the condensed phase and that differences in compaction between WT and charge variants of A1 LCD are greater in the dilute than in the condensed phase (SI Appendix, Fig. S23).
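As an illustration of how a scaling exponent can be extracted from a profile of mean interresidue distances, the sketch below fits $R_{ij} = R_0 |i-j|^{\nu}$ in the long-distance region, $|i-j| > 10$. It uses a linear regression in log–log space, which is an assumption made for simplicity and not necessarily the fitting procedure used in this work. The function name `fit_scaling_exponent` and the synthetic input profile are hypothetical.

```python
import numpy as np


def fit_scaling_exponent(separations, distances, min_separation=10):
    """Fit R_ij = R0 * |i-j|**nu by linear regression in log-log space.

    Only sequence separations larger than `min_separation` are used.
    Returns the tuple (nu, R0).
    """
    s = np.asarray(separations, dtype=float)
    r = np.asarray(distances, dtype=float)
    mask = s > min_separation  # long-distance region only
    slope, intercept = np.polyfit(np.log(s[mask]), np.log(r[mask]), 1)
    return float(slope), float(np.exp(intercept))


# Synthetic, noise-free profile with nu = 0.5 (placeholder values, not data).
separations = np.arange(1, 101)
distances = 0.55 * separations**0.5
nu, r0 = fit_scaling_exponent(separations, distances)
print(nu, r0)
```

On the noise-free synthetic profile the fit recovers the exponent and prefactor used to generate it. With simulation data, the uncertainty of the fitted exponent would need to be estimated separately.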

Fig. 8.

Comparing residue–residue interactions in dilute solution and in the condensate. Energy maps from simulations of the M1 model of (A–C) FUS LCD and (D–F) A2 LCD calculated using nonelectrostatic interaction energies. The 1D projections of the energy maps for (G) FUS LCD and (H) A2 LCD, normalized by the absolute average interaction energy |E| and shifted vertically for clarity. Colors indicate that the energies were calculated within a single chain at infinite dilution (blue), between two chains in the dilute regime (orange), and between a chain located at the center of a condensate and the surrounding chains (green).

Conclusions

In this work we implement and validate an automated procedure to develop an accurate model of the LLPS of IDPs based on experimental data reporting on single-chain conformational properties. We show that this strategy succeeds, in agreement with the previously observed coupling between chain compaction and propensity for phase separation (15, 20, 58, 59), but also appears to recapitulate the recent discovery that charge effects may break this relationship (16). Our work differs from related previous studies (28, 30, 33, 78) in several ways including the size of the dataset used for optimization, the use of both NMR PREs and Rg values, and the introduction of a prior for the λ values. Moreover, by carrying out model optimizations with and without the A1 LCD variants, we show that the presence of phase-separating IDPs in the training set helps the parameter-learning procedure to capture the role of Y and R residues as stickers. The accuracy and general applicability of our model can be tested further by future experiments on systems that were not used for training or testing. We also note that our automated, Bayesian optimization approach makes it relatively straightforward to continue to develop and improve the model as additional data become available.

Simulations performed using the model optimized herein reveal that at least for sequences characterized by a relatively uniform distribution of stickers, residue–residue interactions determining chain compaction also drive self-association and LLPS. Moreover, we show that the experimentally observed dependency of LLPS on protein net charge appears to be captured by salt-screened electrostatic repulsion, even when assuming a uniform dielectric constant throughout the two-phase system.

We have here shown how our model may be used to help elucidate the residues that are important for LLPS of IDPs with UCST behavior. Further, we suggest the model could be applied to study the influence of disease-associated mutations on the material properties of protein self-coacervates (79, 80), the LLPS of protein mixtures as a function of composition, and the partitioning of proteins that do not readily undergo phase separation alone into condensates formed by other proteins (81, 82). Finally, owing to the generalized parameter-learning approach, the model could readily be refined as new experimental data are collected, and it should be possible to extend it to account for specific pairwise interactions such as cation–π interactions (25), PTMs (83), the salting-out effect (84), and the temperature dependence of solvent-mediated interactions (70).

Materials and Methods

We use the Cα-based model proposed by Dignon et al. (21), augmented with extra charges for the termini and a temperature-dependent treatment of the dielectric constant of water (SI Appendix). Langevin dynamics simulations are conducted in the NVT ensemble using HOOMD-blue v2.9 (85) with a time step of 5 fs and a friction coefficient of 0.01 ps⁻¹ (SI Appendix). Additionally, 100- and 300-chain simulations of the LAF-1 RGG domain are performed using OpenMM v7.5 (86) (SI Appendix, Fig. S20).

Bayesian Parameter-Learning Procedure

The λ values are optimized using a Bayesian parameter-learning procedure (30, 87, 88). The training set consists of the experimental Rg values of 42 IDPs (SI Appendix, Table S1) and the intramolecular PRE data of six proteins (SI Appendix, Table S2) (16, 22, 23, 43–57). To guide the optimization within physically reasonable parameters and to avoid overfitting the training set, we introduce a regularization term which penalizes deviations of the λ values from the probability distribution, P(λ), which is the prior knowledge obtained from the statistical analysis of 87 hydrophobicity scales. The optimization procedure consists of the following steps (Fig. 2):

  • 1.
    Single-chain CG simulation of the proteins of the training set (SI Appendix, Table S1).
  • 2.
    Conversion from CG to all-atom trajectories using the powerful chain restoration algorithm (PULCHRA v3.06) (89) for the proteins in SI Appendix, Table S2 for which we calculate the PRE data.
  • 3.
    Calculation of per-frame radii of gyration and PRE data. The PRE rates, $\Gamma_2$, and intensity ratios, $I_{\mathrm{para}}/I_{\mathrm{dia}}$, are calculated using the rotamer library approach implemented in DEER-PREdict (34) with $\tau_t = 100$ ps and optimizing the correlation time, $\tau_c \in [1, 10]$ ns, against the experimental data.
  • 4.
    Random selection of six λ values which are nudged by random numbers picked from a normal distribution of zero mean and SD 0.05. The prior probability distribution, P(λ), sets the bounds of the parameter space: any $\lambda_i$ for which $P(\lambda_i) = 0$ is further nudged until $P(\lambda_i) \neq 0$.
  • 5.
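    Calculation of the Boltzmann weights for the ith frame as $w_i = \exp\{-[U(\mathbf{r}_i, \boldsymbol{\lambda}_k) - U(\mathbf{r}_i, \boldsymbol{\lambda}_0)]/k_B T\}$, where $U(\mathbf{r}_i, \boldsymbol{\lambda}_k)$ and $U(\mathbf{r}_i, \boldsymbol{\lambda}_0)$ are the total Ashbaugh–Hatch energies of the ith frame for trial and initial λ values, respectively. If the effective fraction of frames,
    $$\phi_{\mathrm{eff}} = \exp\left[-\sum_i^{N_{\mathrm{frames}}} w_i \log(w_i \times N_{\mathrm{frames}})\right], \tag{1}$$
    is below 30%, the trial $\boldsymbol{\lambda}_k$ is discarded.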
  • 6.
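    The per-frame radii of gyration and PRE observables are reweighted, and the extent of agreement with the experimental data is estimated as
    $$\chi^2_{R_g} = \left(\frac{R_g^{\mathrm{exp}} - R_g^{\mathrm{calc}}}{\sigma^{\mathrm{exp}}}\right)^2 \tag{2}$$
    and
    $$\chi^2_{\mathrm{PRE}} = \frac{1}{N_{\mathrm{labels}} N_{\mathrm{res}}} \sum_j^{N_{\mathrm{labels}}} \sum_i^{N_{\mathrm{res}}} \left(\frac{Y_{ij}^{\mathrm{exp}} - Y_{ij}^{\mathrm{calc}}}{\sigma_{ij}^{\mathrm{exp}}}\right)^2, \tag{3}$$
    where $\sigma_{ij}^{\mathrm{exp}}$ is the error on the experimental values, $Y$ is either $I_{\mathrm{para}}/I_{\mathrm{dia}}$ or $\Gamma_2$, $N_{\mathrm{labels}}$ is the number of spin-labeled mutants, and $N_{\mathrm{res}}$ is the number of measured residues.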
  • 7.
    Following the Metropolis criterion (90), the kth set of λ values is accepted with probability
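    $$A_{k-1 \to k} = \begin{cases} \exp\left[\dfrac{L(\boldsymbol{\lambda}_{k-1}) - L(\boldsymbol{\lambda}_k)}{\xi_k}\right], & L(\boldsymbol{\lambda}_k) > L(\boldsymbol{\lambda}_{k-1}) \\ 1, & L(\boldsymbol{\lambda}_k) \leq L(\boldsymbol{\lambda}_{k-1}), \end{cases} \tag{4}$$

where the control parameter, $\xi_k$, scales with the number of iterations as $\xi_k = \xi_0 \times 0.99^k$. $L$ is the cost function

$$L(\boldsymbol{\lambda}) = \chi^2_{R_g}(\boldsymbol{\lambda}) + \eta \chi^2_{\mathrm{PRE}}(\boldsymbol{\lambda}) - \theta \sum_i \ln[P(\lambda_i)], \tag{5}$$

where $\chi^2_{R_g}(\boldsymbol{\lambda})$ and $\chi^2_{\mathrm{PRE}}(\boldsymbol{\lambda})$ are averages over the proteins in the training sets. θ and η are hyperparameters of the optimization procedure. θ determines the trade-off between overfitting and underfitting the training set, whereas η sets the relative weight of the PRE data with respect to the radii of gyration.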
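Steps 5 to 7 can be sketched in a few lines of NumPy. The code below is a minimal illustration and not the authors' implementation; the function names are hypothetical. Two assumptions are made explicitly: the weights are normalized to sum to one before the effective fraction is evaluated, and the calculated Rg is taken as a plain weighted mean of the per-frame values. The numerical values of η, θ, and the energies in the usage lines are arbitrary placeholders.

```python
import numpy as np


def boltzmann_weights(u_trial, u_initial, kT):
    """Normalized weights, w_i proportional to exp{-[U_trial,i - U_initial,i]/kT}."""
    delta = -(np.asarray(u_trial, float) - np.asarray(u_initial, float)) / kT
    delta -= delta.max()  # numerical stability; cancels upon normalization
    w = np.exp(delta)
    return w / w.sum()  # assumption: weights normalized to sum to one


def effective_fraction(w):
    """phi_eff = exp[-sum_i w_i log(w_i * N_frames)] for normalized weights."""
    w = np.asarray(w, float)
    nonzero = w > 0  # the limit of w*log(w) for w -> 0 is 0
    return float(np.exp(-np.sum(w[nonzero] * np.log(w[nonzero] * w.size))))


def chi2_rg(rg_frames, w, rg_exp, sigma_exp):
    """Squared, error-normalized deviation of the reweighted Rg from experiment."""
    rg_calc = np.sum(np.asarray(w, float) * np.asarray(rg_frames, float))
    return float(((rg_exp - rg_calc) / sigma_exp) ** 2)


def cost(chi2_rg_mean, chi2_pre_mean, prior_probs, eta, theta):
    """L = chi2_Rg + eta * chi2_PRE - theta * sum_i ln P(lambda_i)."""
    log_prior = float(np.sum(np.log(np.asarray(prior_probs, float))))
    return chi2_rg_mean + eta * chi2_pre_mean - theta * log_prior


def acceptance_probability(cost_prev, cost_trial, xi):
    """Metropolis acceptance probability at control parameter xi."""
    if cost_trial <= cost_prev:
        return 1.0
    return float(np.exp((cost_prev - cost_trial) / xi))


# Placeholder usage: identical energies give uniform weights and phi_eff = 1.
w_uniform = boltzmann_weights([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], kT=2.5)
print(effective_fraction(w_uniform))
# A trial that strongly disfavors one of two frames roughly halves phi_eff.
w_skewed = boltzmann_weights([0.0, 10.0], [0.0, 0.0], kT=1.0)
print(effective_fraction(w_skewed))
```

In a full reweighting cycle, a trial set of λ values would first be checked against the 30% threshold on the effective fraction and only then passed to the cost function and the acceptance test.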

Steps 4 to 7 are iterated until $\xi < 10^{-15}$, when the reweighting cycle is interrupted and a new CG simulation is carried out with the trained λ values. A complete parameter-learning procedure consists of two reweighting cycles starting from $\xi_0 = 2$ followed by three cycles starting from $\xi_0 = 0.1$. The threshold on $\phi_{\mathrm{eff}}$ results in average absolute differences between $\chi^2$ values estimated from reweighting and calculated from trajectories performed with the corresponding parameters of ∼1.8 and ∼0.8 for $\eta\chi^2_{\mathrm{PRE}}$ and $\chi^2_{R_g}$, respectively (SI Appendix, Fig. S24).
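The length of a reweighting cycle follows from the geometric decay of the control parameter. The short sketch below counts the iterations needed for $\xi_k = \xi_0 \times 0.99^k$ to fall below $10^{-15}$ for the two starting values quoted above. The function name `iterations_per_cycle` is hypothetical, and the convention that the first trial move corresponds to k = 1 is an assumption.

```python
def iterations_per_cycle(xi0, decay=0.99, threshold=1e-15):
    """Smallest k for which xi0 * decay**k drops below the threshold."""
    xi = xi0
    k = 0
    while xi >= threshold:
        xi *= decay  # one iteration of steps 4 to 7
        k += 1
    return k


for xi0 in (2.0, 0.1):
    print(xi0, iterations_per_cycle(xi0))
```

Under this counting convention, a cycle comprises roughly 3,500 iterations when starting from $\xi_0 = 2$ and roughly 3,200 when starting from $\xi_0 = 0.1$.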

Acknowledgments

We thank Veronica Ryan and Nicolas L. Fawzi for sharing the PRE data for FUS LCD, FUS12E LCD, and A2 LCD as well as Robert Konrat for sharing the intramolecular PRE data for Osteopontin. We thank Robert B. Best for sharing data on compaction of IDPs; Gregory L. Dignon and Jeetain Mittal for help setting up simulations with the HPS model; and Tanja Mittag, Massimiliano Bonomi, and Benjamin Schuler for helpful discussions. We acknowledge funding from the BRAINSTRUC structural biology initiative from the Lundbeck Foundation (Grant R155-2015-2666) and acknowledge access to computational resources from the Resource for Biomolecular Simulations (ROBUST) (supported by Novo Nordisk Foundation Grant NNF18OC0032608) and Biocomputing Core Facility at the Department of Biology, University of Copenhagen. T.K.S. acknowledges support from the Novo Scholarship Programme 2021. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement 101025063.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2111696118/-/DCSupplemental.

Data Availability

Datasets, amino acid sequences, code, and Jupyter Notebooks for reproducing our simulations and analyses have been deposited in publicly accessible repositories on GitHub (https://github.com/KULL-Centre/papers/tree/main/2021/CG-IDPs-Tesei-et-al) (91) and on Zenodo (DOI: 10.5281/zenodo.5005953) (92).

References

  • 1. Patel A., et al., A liquid-to-solid phase transition of the ALS protein FUS accelerated by disease mutation. Cell 162, 1066–1077 (2015).
  • 2. Wegmann S., et al., Tau protein liquid-liquid phase separation can initiate tau aggregation. EMBO J. 37, e98049 (2018).
  • 3. Kanaan N. M., Hamel C., Grabinski T., Combs B., Liquid-liquid phase separation induces pathogenic tau conformations in vitro. Nat. Commun. 11, 2809 (2020).
  • 4. Ray S., et al., α-Synuclein aggregation nucleates through liquid-liquid phase separation. Nat. Chem. 12, 705–716 (2020).
  • 5. Hardenberg M. C., et al., Observation of an α-synuclein liquid droplet state and its maturation into Lewy body-like assemblies. J. Mol. Cell Biol. 13, 282–294 (2021).
  • 6. Shin Y., Brangwynne C. P., Liquid phase condensation in cell physiology and disease. Science 357, eaaf4382 (2017).
  • 7. Nedelsky N. B., Taylor J. P., Bridging biophysics and neurology: Aberrant phase transitions in neurodegenerative disease. Nat. Rev. Neurol. 15, 272–286 (2019).
  • 8. Nott T. J., et al., Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell 57, 936–947 (2015).
  • 9. Brangwynne C. P., Tompa P., Pappu R. V., Polymer physics of intracellular phase transitions. Nat. Phys. 11, 899–904 (2015).
  • 10. Vernon R. M., et al., Pi-Pi contacts are an overlooked protein feature relevant to phase separation. eLife 7, e31486 (2018).
  • 11. Schuster B. S., et al., Identifying sequence perturbations to an intrinsically disordered protein that determine its phase-separation behavior. Proc. Natl. Acad. Sci. U.S.A. 117, 11421–11431 (2020).
  • 12. Dignon G. L., Best R. B., Mittal J., Biomolecular phase separation: From molecular driving forces to macroscopic properties. Annu. Rev. Phys. Chem. 71, 53–75 (2020).
  • 13. Brady J. P., et al., Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. Proc. Natl. Acad. Sci. U.S.A. 114, E8194–E8203 (2017).
  • 14. Wang J., et al., A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell 174, 688–699.e16 (2018).
  • 15. Martin E. W., et al., Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 367, 694–699 (2020).
  • 16. Bremer A., et al., Deciphering how naturally occurring sequence features impact the phase behaviors of disordered prion-like domains. bioRxiv [Preprint] (2021). 10.1101/2021.01.01.425046 (Accessed 4 January 2021).
  • 17. Krainer G., et al., Reentrant liquid condensate phase of proteins is stabilized by hydrophobic and non-ionic interactions. Nat. Commun. 12, 1085 (2021).
  • 18. Vazdar M., et al., Arginine “magic”: Guanidinium like-charge ion pairing from aqueous salts to cell penetrating peptides. Acc. Chem. Res. 51, 1455–1464 (2018).
  • 19. Fossat M. J., Zeng X., Pappu R. V., Uncovering differences in hydration free energies and structures for model compound mimics of charged side chains of amino acids. J. Phys. Chem. B 125, 4148–4161 (2021).
  • 20. Dignon G. L., Zheng W., Best R. B., Kim Y. C., Mittal J., Relation between single-molecule properties and phase behavior of intrinsically disordered proteins. Proc. Natl. Acad. Sci. U.S.A. 115, 9929–9934 (2018).
  • 21. Dignon G. L., Zheng W., Kim Y. C., Best R. B., Mittal J., Sequence determinants of protein phase behavior from a coarse-grained model. PLOS Comput. Biol. 14, e1005941 (2018).
  • 22. Ryan V. H., et al., Mechanistic view of hnRNPA2 low-complexity domain structure, interactions, and phase separation altered by mutation and arginine methylation. Mol. Cell 69, 465–479.e7 (2018).
  • 23. Monahan Z., et al., Phosphorylation of the FUS low-complexity domain disrupts phase separation, aggregation, and toxicity. EMBO J. 36, 2951–2967 (2017).
  • 24. Ryan V. H., et al., Tyrosine phosphorylation regulates hnRNPA2 granule protein partitioning and reduces neurodegeneration. EMBO J. 40, e105001 (2021).
  • 25. Das S., Lin Y. H., Vernon R. M., Forman-Kay J. D., Chan H. S., Comparative roles of charge, π, and hydrophobic interactions in sequence-dependent phase separation of intrinsically disordered proteins. Proc. Natl. Acad. Sci. U.S.A. 117, 28795–28805 (2020).
  • 26. Choi J. M., Holehouse A. S., Pappu R. V., Physical principles underlying the complex biology of intracellular phase transitions. Annu. Rev. Biophys. 49, 107–133 (2020).
  • 27. Kapcha L. H., Rossky P. J., A simple atomic-level hydrophobicity scale reveals protein interfacial structure. J. Mol. Biol. 426, 484–498 (2014).
  • 28. Regy R. M., Thompson J., Kim Y. C., Mittal J., Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins. Protein Sci. 30, 1371–1379 (2021).
  • 29. Urry D. W., et al., Hydrophobicity scale for proteins based on inverse temperature transitions. Biopolymers 32, 1243–1250 (1992).
  • 30. Norgaard A. B., Ferkinghoff-Borg J., Lindorff-Larsen K., Experimental parameterization of an energy function for the simulation of unfolded proteins. Biophys. J. 94, 182–192 (2008).
  • 31. Wang L. P., Martinez T. J., Pande V. S., Building force fields: An automatic, systematic, and reproducible approach. J. Phys. Chem. Lett. 5, 1885–1891 (2014).
  • 32. Tiana G., Giorgetti L., “Coarse graining of a giant molecular system: The chromatin fiber” in Biomolecular Simulations: Methods in Molecular Biology, Bonomi M., Camilloni C., Eds. (Springer, New York, 2019), vol. 2022, pp. 399–411.
  • 33. Dannenhoffer-Lafage T., Best R. B., A data-driven hydrophobicity scale for predicting liquid–liquid phase separation of proteins. J. Phys. Chem. B 125, 4046–4056 (2021).
  • 34. Tesei G., et al., DEER-PREdict: Software for efficient calculation of spin-labeling EPR and NMR data from conformational ensembles. PLOS Comput. Biol. 17, e1008551 (2021).
  • 35. Burke K. A., Janke A. M., Rhine C. L., Fawzi N. L., Residue-by-residue view of in vitro FUS granules that bind the C-terminal domain of RNA polymerase II. Mol. Cell 60, 231–241 (2015).
  • 36. Murthy A. C., et al., Molecular interactions underlying liquid-liquid phase separation of the FUS low-complexity domain. Nat. Struct. Mol. Biol. 26, 637–648 (2019).
  • 37. Chan H. S., Amino Acid Side-Chain Hydrophobicity (American Cancer Society, 2002).
  • 38. Simm S., Einloft J., Mirus O., Schleiff E., 50 years of amino acid hydrophobicity scales: Revisiting the capacity for peptide classification. Biol. Res. 49, 31 (2016).
  • 39. Bull H. B., Breese K., Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues. Arch. Biochem. Biophys. 161, 665–670 (1974).
  • 40. Guy H. R., Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophys. J. 47, 61–70 (1985).
  • 41. Bishop C. M., Walkenhorst W. F., Wimley W. C., Folding of β-sheets in membranes: Specificity and promiscuity in peptide model systems. J. Mol. Biol. 309, 975–988 (2001).
  • 42. Welling G. W., Weijer W. J., van der Zee R., Welling-Wester S., Prediction of sequential antigenic regions in proteins. FEBS Lett. 188, 215–218 (1985).
  • 43. Dedmon M. M., Lindorff-Larsen K., Christodoulou J., Vendruscolo M., Dobson C. M., Mapping long-range interactions in α-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J. Am. Chem. Soc. 127, 476–477 (2005).
  • 44. Jephthah S., Staby L., Kragelund B. B., Skepö M., Temperature dependence of intrinsically disordered proteins in simulations: What are we missing? J. Chem. Theory Comput. 15, 2672–2683 (2019).
  • 45. Fagerberg E., Månsson L. K., Lenton S., Skepö M., The effects of chain length on the structural properties of intrinsically disordered proteins in concentrated solutions. J. Phys. Chem. B 124, 11843–11853 (2020).
  • 46. Kjaergaard M., et al., Temperature-dependent structural changes in intrinsically disordered proteins: Formation of α-helices or loss of polyproline II? Protein Sci. 19, 1555–1564 (2010).
  • 47. Gomes G. W., et al., Conformational ensembles of an intrinsically disordered protein consistent with NMR, SAXS, and single-molecule FRET. J. Am. Chem. Soc. 142, 15697–15710 (2020).
  • 48. Shrestha U. R., et al., Generation of the configurational ensemble of an intrinsically disordered protein from unbiased molecular dynamics simulation. Proc. Natl. Acad. Sci. U.S.A. 116, 20446–20452 (2019).
  • 49. Johnson C. L., et al., The two-state prehensile tail of the antibacterial toxin colicin N. Biophys. J. 113, 1673–1684 (2017).
  • 50. De Biasio A., et al., p15PAF is an intrinsically disordered protein with nonrandom structural preferences at sites of interaction with other proteins. Biophys. J. 106, 865–874 (2014).
  • 51. Paz A., et al., Biophysical characterization of the unstructured cytoplasmic domain of the human neuronal adhesion protein neuroligin 3. Biophys. J. 95, 1928–1944 (2008).
  • 52. Riback J. A., et al., Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science 358, 238–241 (2017).
  • 53. Ahmed M. C., et al., Refinement of α-synuclein ensembles against SAXS data: Comparison of force fields and methods. Front. Mol. Biosci. 8, 216 (2021).
  • 54. Mylonas E., et al., Domain conformation of tau protein studied by solution small-angle X-ray scattering. Biochemistry 47, 10345–10353 (2008).
  • 55. Platzer G., et al., The metastasis-associated extracellular matrix protein osteopontin forms transient structure in ligand interaction sites. Biochemistry 50, 6113–6124 (2011).
  • 56. Mittag T., et al., Structure/function implications in a dynamic complex of the intrinsically disordered Sic1 with the Cdc4 subunit of an SCF ubiquitin ligase. Structure 18, 494–506 (2010).
  • 57. Kurzbach D., et al., Detection of correlated conformational fluctuations in intrinsically disordered proteins through paramagnetic relaxation interference. Phys. Chem. Chem. Phys. 18, 5753–5758 (2016).
  • 58. Panagiotopoulos A. Z., Wong V., Floriano M. A., Phase equilibria of lattice polymers from histogram reweighting Monte Carlo simulations. Macromolecules 31, 912–918 (1998).
  • 59. Lin Y. H., Chan H. S., Phase separation and single-chain compactness of charged disordered proteins are strongly correlated. Biophys. J. 112, 2043–2046 (2017).
  • 60. Shrestha U. R., Smith J. C., Petridis L., Full structural ensembles of intrinsically disordered proteins from unbiased molecular dynamics simulations. Commun. Biol. 4, 243 (2021).
  • 61. Greig J. A., et al., Arginine-enriched mixed-charge domains provide cohesion for nuclear speckle condensation. Mol. Cell 77, 1237–1250.e4 (2020).
  • 62. Fisher R. S., Elbaum-Garfinkle S., Tunable multiphase dynamics of arginine and lysine liquid condensates. Nat. Commun. 11, 4628 (2020).
  • 63. Wimley W. C., White S. H., Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat. Struct. Biol. 3, 842–848 (1996).
  • 64. Ponnuswamy P. K., Gromiha M. M., Prediction of transmembrane helices from hydrophobic characteristics of proteins. Int. J. Pept. Protein Res. 42, 326–341 (1993).
  • 65. Mukrasch M. D., et al., Structural polymorphism of 441-residue tau at single residue resolution. PLoS Biol. 7, e34 (2009).
  • 66. Jost Lopez A., Quoika P. K., Linke M., Hummer G., Köfinger J., Quantifying protein–protein interactions in molecular simulations. J. Phys. Chem. B 124, 4673–4685 (2020).
  • 67. Elbaum-Garfinkle S., et al., The disordered P granule protein LAF-1 drives phase separation into droplets with tunable viscosity and dynamics. Proc. Natl. Acad. Sci. U.S.A. 112, 7189–7194 (2015).
  • 68. Wei M. T., et al., Phase behaviour of disordered proteins underlying low density and high permeability of liquid organelles. Nat. Chem. 9, 1118–1125 (2017).
  • 69. Taylor N. O., Wei M. T., Stone H. A., Brangwynne C. P., Quantifying dynamics in phase-separated condensates using fluorescence recovery after photobleaching. Biophys. J. 117, 1285–1300 (2019).
  • 70. Dignon G. L., Zheng W., Kim Y. C., Mittal J., Temperature-controlled liquid-liquid phase separation of disordered proteins. ACS Cent. Sci. 5, 821–830 (2019).
  • 71. Foley T. T., Shell M. S., Noid W. G., The impact of resolution upon entropy and information in coarse-grained models. J. Chem. Phys. 143, 243104 (2015).
  • 72. Jin J., Pak A. J., Voth G. A., Understanding missing entropy in coarse-grained systems: Addressing issues of representability and transferability. J. Phys. Chem. Lett. 10, 4549–4557 (2019).
  • 73. Tesei G., et al., Self-association of a highly charged arginine-rich cell-penetrating peptide. Proc. Natl. Acad. Sci. U.S.A. 114, 11428–11433 (2017).
  • 74. Zeng X., Holehouse A. S., Chilkoti A., Mittag T., Pappu R. V., Connecting coil-to-globule transitions to full phase diagrams for intrinsically disordered proteins. Biophys. J. 119, 402–418 (2020).
  • 75. Wirmer J., Peti W., Schwalbe H., Motional properties of unfolded ubiquitin: A model for a random coil protein. J. Biomol. NMR 35, 175–186 (2006).
  • 76. Raos G., Allegra G., Chain collapse and phase separation in poor-solvent polymer solutions: A unified molecular description. J. Chem. Phys. 104, 1626–1645 (1996).
  • 77. Wen J., et al., Conformational expansion of tau in condensates promotes irreversible aggregation. J. Am. Chem. Soc. 143, 13056–13064 (2021).
  • 78. Latham A. P., Zhang B., Maximum entropy optimized force field for intrinsically disordered proteins. J. Chem. Theory Comput. 16, 773–781 (2020).
  • 79. Elbaum-Garfinkle S., Matter over mind: Liquid phase separation and neurodegeneration. J. Biol. Chem. 294, 7160–7168 (2019).
  • 80. Brown D. G., Shorter J., Wobst H. J., Emerging small-molecule therapeutic approaches for amyotrophic lateral sclerosis and frontotemporal dementia. Bioorg. Med. Chem. Lett. 30, 126942 (2020).
  • 81. Siegert A., et al., Interplay between tau and α-synuclein liquid–liquid phase separation. Protein Sci. 30, 1326–1336 (2021).
  • 82. Ruff K. M., Dar F., Pappu R. V., Polyphasic linkage and the impact of ligand binding on the regulation of biomolecular condensates. Biophys. Rev. 2, 021302 (2021).
  • 83. Perdikari T. M., et al., A predictive coarse-grained model for position-specific effects of post-translational modifications. Biophys. J. 120, 1187–1197 (2021).
  • 84. Wohl S., Jakubowski M., Zheng W., Salt-dependent conformational changes of intrinsically disordered proteins. J. Phys. Chem. Lett. 12, 6684–6691 (2021).
  • 85. Anderson J. A., Glaser J., Glotzer S. C., HOOMD-blue: A python package for high-performance molecular dynamics and hard particle Monte Carlo simulations. Comput. Mater. Sci. 173, 109363 (2020).
  • 86. Eastman P., et al., OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Comput. Biol. 13, e1005659 (2017).
  • 87. Cesari A., et al., Fitting corrections to an RNA force field using experimental data. J. Chem. Theory Comput. 15, 3425–3431 (2019).
  • 88. Orioli S., Larsen A. H., Bottaro S., Lindorff-Larsen K., “How to learn from inconsistencies: Integrating molecular simulations with experimental data” in Computational Approaches for Understanding Dynamical Systems: Protein Folding and Assembly, Strodel B., Barz B., Eds. (Elsevier, 2020), pp. 123–176.
  • 89. Rotkiewicz P., Skolnick J., Fast procedure for reconstruction of full-atom protein models from reduced representations. J. Comput. Chem. 29, 1460–1465 (2008).
  • 90. Schuur P. C., Classification of acceptance criteria for the simulated annealing algorithm. Math. Oper. Res. 22, 266–275 (1997).
  • 91. Tesei G., Schulze T. K., Crehuet R., Lindorff-Larsen K., CG model of liquid-liquid phase behaviour of IDPs. GitHub. https://github.com/KULL-Centre/papers/tree/main/2021/CG-IDPs-Tesei-et-al. Accessed 30 September 2021.
  • 92. Tesei G., Schulze T. K., Crehuet R., Lindorff-Larsen K., CG model of liquid-liquid phase behaviour of IDPs. Zenodo. 10.5281/zenodo.5005953. Accessed 10 September 2021.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
