Author manuscript; available in PMC: 2026 Feb 19.
Published in final edited form as: J Chem Theory Comput. 2025 Oct 25;21(21):11282–11292. doi: 10.1021/acs.jctc.5c01329

Sequence-Dependent Conformational Landscapes of Intrinsically Disordered Proteins Reveal Asymmetric Chain Compaction

Cong Wang 1, Bin Zhang 1,*
PMCID: PMC12914743  NIHMSID: NIHMS2135286  PMID: 41137856

Abstract

Intrinsically disordered proteins (IDPs) exhibit highly dynamic and heterogeneous conformational ensembles that are strongly influenced by sequence features. While global properties such as chain compaction and scaling behavior have been widely studied, they often fail to resolve the fine-grained, sequence-specific structural variation that underlies IDP function. Here, we perform long-timescale atomistic simulations of 47 representative IDP sequences from the yeast proteome to systematically investigate the relationship between sequence composition and conformational ensemble. To analyze the high-dimensional structural data, we apply uniform manifold approximation and projection (UMAP), a nonlinear dimensionality reduction technique that preserves local structural relationships. The resulting low-dimensional embeddings effectively differentiate IDP ensembles and reveal a novel descriptor—local compactness asymmetry—that quantifies directional differences in chain organization. This metric, denoted γRg, captures conformational features orthogonal to traditional global measures such as radius of gyration and end-to-end distance. We show that γRg correlates with sequence-level asymmetries in charge and hydropathy, and that conformational dynamics preferentially occur in the more extended region of the chain. The simulation dataset generated in this work also provides a valuable resource for training machine learning models and developing improved coarse-grained force fields for disordered proteins.

Introduction

Intrinsically disordered proteins (IDPs) are widely present in nature,1–4 exhibiting high sequence diversity and playing essential roles in biological processes such as signal transduction, molecular recognition, and regulation.5–8 Dysregulation of IDPs can result in a variety of human diseases, including neurodegeneration and cancer.9–14 Unlike ordered proteins, IDPs lack a stable, well-defined three-dimensional structure due to their enrichment in hydrophilic amino acids and weak intramolecular interactions.15–17 Instead, they adopt dynamic conformations, forming a heterogeneous conformational ensemble that is influenced by both their sequence and environmental conditions. Understanding this ensemble is crucial for elucidating IDPs’ functional mechanisms, characterizing their interactions with other biomolecules, and designing therapeutic strategies targeting disordered regions.18,19

Extensive experimental efforts have been devoted to characterizing the conformational properties of IDPs, using techniques such as nuclear magnetic resonance (NMR) spectroscopy,20–22 small-angle X-ray scattering (SAXS),23,24 and fluorescence resonance energy transfer (FRET).25–27 These methods have provided important insights into the conformational diversity of IDPs. However, their ensemble-averaged nature limits the resolution at which individual conformations can be discerned. Moreover, their inherently low throughput poses practical challenges for large-scale studies aimed at extracting general sequence–ensemble relationships.

Computational modeling offers a valuable complement to experimental approaches by enabling direct access to structural ensembles. Among these, coarse-grained molecular dynamics (MD) simulations are particularly popular due to their efficiency.28–34 Numerous coarse-grained force fields have been developed to simulate IDPs with increasing accuracy, facilitating the rapid generation of conformational ensembles.35–55 However, their reduced resolution limits the ability to capture local structural fluctuations and sequence-specific interactions that are likely critical for function. While recent deep learning models have achieved remarkable success in predicting the structure of folded proteins from sequence,56,57 the high conformational heterogeneity of IDPs renders analogous predictions substantially more difficult.58–61

Beyond the limitations of current simulation models, there is also a need to revisit the metrics used to describe IDP conformational ensembles. Traditional descriptors (such as radius of gyration, end-to-end distance, asphericity, and the Flory–Huggins scaling exponent) are largely inspired by polymer physics studies of homopolymers.62–68 These global measures offer important summary statistics but often fail to capture the sequence-encoded features that distinguish individual IDPs.

In this study, we seek to enrich our understanding of the sequence–ensemble relationship by performing extensive atomistic simulations of IDPs in explicit solvent. Recent advances in force field development have improved the accuracy of atomistic modeling of IDPs,69–74 while modern hardware and software now allow simulations to reach biologically relevant timescales. To systematically explore sequence space, we selected a diverse set of representative IDP sequences from the yeast proteome. The resulting simulation data provide a valuable resource, which we make freely available to the community.

To analyze the high-dimensional conformational ensembles generated by these simulations, we employed uniform manifold approximation and projection (UMAP),75 a nonlinear dimensionality reduction technique that preserves local geometric features of the data. By comparing IDP ensembles across different sequences, we identified sequence-dependent structural features not captured by traditional global metrics. In particular, we discovered a novel descriptor that quantifies asymmetric chain compaction, revealing directional preferences in local conformational organization. This analysis also uncovered dynamic correlations between global expansion and local asymmetry, further highlighting the structured nature of IDP fluctuations. Together, these findings provide a refined view of IDP conformational landscapes and demonstrate how high-resolution simulations integrated with data-driven analyses can reveal previously inaccessible features of IDP behavior.

Results

Selection of representative intrinsically disordered proteins from the yeast proteome

IDPs exhibit immense diversity in their sequences and compositions, contributing to their varied conformations and functional roles.76,77 A comprehensive characterization of the relationship between sequence and conformational ensemble requires adequate coverage of sequence space. We selected representative sequences from the yeast proteome because its disordered regions have been extensively analyzed at the sequence level in prior studies.

Our sequence selection criteria followed the clustering work by Zarin et al.78, with custom modifications to ensure amino acid diversity. Using sequence-based evolutionary features, Zarin et al.78 clustered the IDRs from the S. cerevisiae proteome and their orthologous IDRs into 53 groups using a hierarchical clustering algorithm (Figure 1A). These clusters represent IDPs with distinct sequence motifs that encode unique biological functions, providing excellent representation of the diverse biological sequences.

Figure 1: Selection and characterization of representative IDP sequences from the yeast proteome.

(A) Schematic overview of the workflow, including sequence clustering, representative selection, atomistic simulations, and ensemble analysis. (B) Criteria used to select one representative 40-residue IDR per cluster, including sequence length, disorder content (pLDDT), and amino acid diversity. (C) Distribution of pLDDT scores across all 40-residue IDPs (red) and the selected representative sequences (blue). The threshold of pLDDT < 72, used to define disorder, is indicated. (D) Amino acid composition of all candidate 40-residue IDPs (red) and selected representative sequences (blue), demonstrating that the selected set preserves the overall compositional diversity of the yeast IDR proteome.

We attempted to select one sequence per cluster for further characterization using the following criteria (Figure 1B). First, we restricted the sequence length to 40 amino acids, balancing the need to capture essential sequence features against simulation efficiency. A consistent sequence length also facilitates the comparison of structural ensembles between different IDPs. Second, highly disordered sequences were prioritized to minimize the inclusion of ordered regions, which can occasionally be present in IDPs. The degree of disorder was evaluated using per-residue pLDDT scores predicted by ColabFold.79 Following the threshold established by Wilson et al.,80 residues with pLDDT scores < 72 were classified as disordered. Sequences in which more than 90% of residues were disordered were retained for further consideration (Figure 1C).
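The length and disorder criteria can be sketched as a simple filter. This is a minimal illustration rather than the authors' pipeline: the function names and the example pLDDT values are hypothetical, and only the thresholds (40 residues, pLDDT < 72, more than 90% disordered residues) come from the text.

```python
def disordered_fraction(plddt, threshold=72.0):
    """Fraction of residues whose pLDDT score is below the disorder threshold."""
    if not plddt:
        raise ValueError("pLDDT list is empty")
    return sum(1 for score in plddt if score < threshold) / len(plddt)


def passes_disorder_filter(plddt, length=40, min_fraction=0.90):
    """True if the segment has the required length and is > 90% disordered."""
    return len(plddt) == length and disordered_fraction(plddt) > min_fraction


# Hypothetical per-residue scores: 37 disordered residues, 3 ordered ones.
example_scores = [50.0] * 37 + [80.0] * 3
keep = passes_disorder_filter(example_scores)
```

Note that the comparison is strict, so a segment with exactly 90% disordered residues would be rejected under this reading of "more than 90%".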

Finally, for the 47 clusters that still contained at least one sequence after filtering, we ranked the sequences in each cluster based on cysteine (C) and tryptophan (W) content. If multiple sequences had identical C and W content, we used amino acid diversity as a secondary criterion, giving priority to sequences with a broader variety of amino acid types. This ranking criterion was adopted because random selection of a single sequence per cluster could have led to an imbalanced amino acid representation, particularly for highly underrepresented residues such as C and W. As a result, the top-ranked sequences from all clusters together yield a balanced and representative amino acid composition (Figure 1D). The complete list of selected sequences is available in the Supporting Information.
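The ranking step might look like the sketch below. Two points are assumptions, not statements from the text: that a higher combined C and W count ranks first, and that "amino acid diversity" means the number of distinct residue types. The sequences are short hypothetical strings used only to exercise the logic.

```python
def rank_key(seq):
    """Sort key: more C/W residues first, then more distinct residue types."""
    cw_count = seq.count("C") + seq.count("W")
    diversity = len(set(seq))
    # Negate both values so that min() returns the top-ranked sequence.
    return (-cw_count, -diversity)


def pick_representative(cluster):
    """Return the top-ranked sequence of a non-empty cluster."""
    if not cluster:
        raise ValueError("cluster has no sequences left after filtering")
    return min(cluster, key=rank_key)


# Hypothetical cluster: the last two sequences tie on C/W content,
# so the one with more distinct residue types is selected.
cluster = ["AAAAKKKK", "ACAAKKKK", "ACDEKKKK"]
representative = pick_representative(cluster)
```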

Atomistic simulations generate conformational ensembles for IDPs

We carried out long-timescale simulations of the selected representative sequences using an explicit-solvent atomistic force field, a99SB-disp,81 which has been shown to accurately capture IDP conformations.81–84 Each simulation lasted 10.5 µs. To verify equilibration of the trajectories, we compared the radius of gyration distributions between the first and second halves of the simulations (Figure S1) and further assessed convergence using block-averaged distributions of multiple properties (Figures S2–S4, S6–S7, and S20–S21).

Consistent with previous studies,68,87–89 the Flory–Huggins scaling exponents, obtained by fixing the prefactor to 0.55 nm,90 fall within the range of 0.5–0.6 for most proteins (Figure 2A), indicative of expanded conformations. The average radius of gyration (Rg) values also cluster around 1.6 nm (Figure 2D, inset), suggesting a broadly comparable level of compaction across proteins.
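One way to extract a scaling exponent with a fixed prefactor is sketched below. The text does not state the functional form or the fitting procedure, so this assumes the common power law R(s) = b·s^ν for the mean distance between residues separated by s positions, with b = 0.55 nm, and fits ν by least squares in log space. The input data are synthetic.

```python
import math


def fit_flory_exponent(separations, distances, prefactor=0.55):
    """Least-squares estimate of nu in R(s) = prefactor * s**nu (log space)."""
    numerator = 0.0
    denominator = 0.0
    for s, r in zip(separations, distances):
        x = math.log(s)              # log of sequence separation
        y = math.log(r / prefactor)  # log of distance in units of the prefactor
        numerator += x * y
        denominator += x * x
    if denominator == 0.0:
        raise ValueError("need at least one separation greater than 1")
    return numerator / denominator


# Synthetic check: distances generated with nu = 0.6 for a 40-residue chain.
separations = list(range(2, 40))
synthetic_distances = [0.55 * s ** 0.6 for s in separations]
nu = fit_flory_exponent(separations, synthetic_distances)
```

Fixing the prefactor leaves ν as the only free parameter, which makes exponents directly comparable across sequences of the same length.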

Figure 2: Diverse conformational ensembles revealed by atomistic simulations of distinct IDP sequences.

(A) Distribution of Flory–Huggins scaling exponents computed for all simulated IDPs. (B) Scatter plot of mean hydropathy versus absolute mean net charge, with each point representing an IDP sequence and colored by its average Rg. Mean hydropathy is calculated as the average normalized Kyte–Doolittle score85 across all residues in the sequence, as introduced by Uversky et al.86. (C) Scatter plot of the fraction of positively charged residues versus the fraction of negatively charged residues for each IDP sequence, with points colored by average Rg (Pappu plot). (D) Probability density distributions of Rg for each IDP ensemble, with curves colored by their respective average Rg. The inset shows the distribution of average Rg values across all sequences. (E) Probability density distributions of end-to-end distance (Ree) for each IDP ensemble, colored by their average Ree. (F) Probability density distributions of asphericity for each ensemble, with curves colored by the corresponding average asphericity. (G) Representative IDPs with nearly identical average Rg values, highlighting their distinct local secondary structure propensities along the sequence. Line plots show residue-wise probabilities of helix and strand formation.

The distributions of Rg, end-to-end distance (Ree), and asphericity further reinforce this observation, showing substantial overlap across different IDP ensembles (Figures 2D–F and Figures S2–S4). These similarities imply a degree of global conformational resemblance when assessed using conventional shape descriptors.

This resemblance is in line with sequence-level features known to influence protein size. As shown in Figure 2B, most IDPs share moderate mean hydropathy and net charge, conditions associated with extended ensembles. IDPs exhibiting larger average Rg values tend to deviate from this trend, typically displaying lower hydropathy and higher net charge. A similar pattern emerges in the Pappu plot (Figure 2C), where most IDPs show comparable fractions of positively and negatively charged residues, while sequences with more asymmetric charge distributions tend to have larger Rg values.

However, visual inspection of representative 3D structures reveals substantial differences (Figure S5), even among IDPs with nearly identical Rg values (Figure 2G). This diversity is further reflected in the residue-wise probability of secondary structure formation (Figure 2G and S6), and in the distribution of root-mean-square deviations (RMSD) relative to IDP 1 (Figure S7).

Together, these results underscore the limitations of global descriptors such as Rg in capturing critical, sequence-specific conformational features of IDPs. While secondary structure profiles can reveal distinctions between ensembles, their discrete nature makes them less convenient for detailed structural analyses. This motivates the development of continuous, numeric metrics that can more effectively characterize and differentiate IDP conformations.

UMAP embeddings differentiate IDP conformational landscapes

To systematically distinguish IDP conformations and identify new structural descriptors, we applied Uniform Manifold Approximation and Projection (UMAP) analysis.75 As described in the Methods section, UMAP is a dimensionality reduction technique that generates low-dimensional projections that best preserve the topological structure of the original high-dimensional data.75 This approach maintains local neighborhood relationships—conformations that are similar in the high-dimensional space remain close in the projection—while also ensuring that structurally divergent conformations are well separated. This allows the resulting low-dimensional embeddings to effectively consolidate structural information and accentuate key differences across the ensemble of 3D conformations.

Each conformation, sampled from atomistic simulations, was represented as a vector encoding the non-redundant pairwise distances between all α-carbon atoms. To provide a reference for comparison, we also included conformations generated from the self-avoiding worm-like chain (SAWLC) polymer model91 (see Methods). These vectors served as input to the UMAP algorithm, which mapped the high-dimensional structural data onto two variables, denoted as UMAP1 and UMAP2, collectively referred to as the UMAP embedding.
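The feature vector described above can be illustrated with a short sketch. It assumes each conformation is given as a list of (x, y, z) α-carbon coordinates in nm; the function name and the straight-line test chain are hypothetical. For a 40-residue chain, the non-redundant pairs (i < j) give 40 × 39 / 2 = 780 distances per conformation.

```python
import math
from itertools import combinations


def pairwise_distance_vector(coords):
    """Distances between all unique pairs (i < j) of alpha-carbon positions."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


# Hypothetical fully extended chain with 0.38 nm spacing along x.
chain = [(0.38 * i, 0.0, 0.0) for i in range(40)]
features = pairwise_distance_vector(chain)
n_features = len(features)  # 780 for 40 residues
```

Because the vector depends only on internal distances, it is invariant to rotation and translation of the whole chain, so no structural alignment is needed before the embedding.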

As shown in Figure S8, the UMAP embeddings of different IDP sequences form distinct distributions, underscoring the sequence dependence of IDP conformational ensembles. To illustrate the utility of these embeddings, we focus on IDPs 9, 12, and 30, which differ markedly in sequence and secondary structure content (Figure 3A) but exhibit nearly identical average Rg values and similar Rg distributions (Figure 3B). Despite these similarities in global metrics, their UMAP embeddings are clearly separated (Figure 3C), particularly along the UMAP2 axis (Figure 3D), indicating differences in their dominant conformational states. This demonstrates that UMAP effectively resolves sequence-specific conformational features that are not captured by Rg alone.

Figure 3: UMAP embeddings differentiate IDP conformational landscapes.

(A) Top: Amino acid sequences of IDPs 9, 12, and 30, with positively charged residues shown in red and negatively charged residues in blue. Bottom: Representative conformations from atomistic simulations of the three IDPs, and for the SAWLC model. Structures are colored from blue (N-terminus) to red (C-terminus). (B) Probability density distributions of Rg for the four ensembles, illustrating their similar global compaction. (C) Two-dimensional probability density distributions of UMAP embeddings for the same ensembles, showing clear separation in conformational space despite similar Rg. (D) One-dimensional probability density distributions along the UMAP2 axis, highlighting differences in dominant conformational features not resolved by global metrics.

Although the UMAP1 and UMAP2 distributions peak at different values for each protein, the ranges of the embeddings exhibit considerable overlap. This reflects the flexible nature of IDPs, whose conformational states form a continuum rather than discrete structural classes. This pattern holds across all simulated IDPs: the overall distribution of UMAP embeddings spans a continuous space with no sharply defined clusters (Figure S9), consistent with the inherent heterogeneity and dynamic nature of IDP ensembles.

To evaluate the robustness of our approach, we also tested an alternative representation for UMAP input: the RMSD of each conformation relative to a set of reference structures. The resulting embeddings closely resemble those obtained from pairwise α-carbon distances (Figure S10A), further supporting the reliability and consistency of the analysis.

Linking UMAP Embeddings to Structural and Sequence Features

While the UMAP embeddings effectively differentiate IDP conformations, their physical interpretation is not immediately apparent. To better understand the structural features encoded by each axis, we examined representative conformations along UMAP1 and UMAP2. As shown in Figure 4A, structures evolve smoothly from compact to extended as UMAP1 increases, indicating that this axis primarily reflects changes in global size. In contrast, UMAP2 captures asymmetry in chain compaction—specifically, how structural compactness is distributed between the two termini—revealing features not accessible through conventional global metrics.

Figure 4: Linking UMAP embeddings to structural and sequence features.

(A) Representative conformations illustrating structural variation along the UMAP1 and UMAP2 axes. (B) Pearson correlation coefficients between UMAP1 or UMAP2 and various conformational descriptors, including Rg, Ree, asphericity, and local compactness asymmetry γRg. (C) Scatter plot of radius of gyration (Rg) versus γRg, illustrating their low correlation. (D) UMAP embedding of all conformational ensembles, with each point colored by Rg. (E) UMAP embedding of all conformational ensembles, with each point colored by γRg. (F) Correlation between γRg and differences in sequence features between the N- and C-terminal halves of each IDP. (G) Correlation between UMAP2 values and the same sequence feature differences as in (F).

To provide a more quantitative interpretation of the UMAP embeddings, we evaluated their correlations with several commonly used structural descriptors: the radius of gyration (Rg), end-to-end distance (Ree), asphericity, and a newly defined metric, γRg. The latter quantifies local compactness asymmetry along the chain and is defined as

$$\gamma_{R_g} = \frac{R_g^{(1)} - R_g^{(2)}}{R_g^{(1)} + R_g^{(2)}}, \qquad (1)$$

where Rg(1) and Rg(2) are the radii of gyration computed from the first and second halves of the sequence, respectively.
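As a worked illustration of Eq. 1, the sketch below computes γRg for a single conformation. It assumes unweighted α-carbon coordinates given as (x, y, z) tuples and splits the chain at the midpoint; whether the original analysis used mass weighting or all atoms is not stated, so treat those choices as assumptions. The test chain is hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a set of 3D points."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def gamma_rg(coords):
    """Local compactness asymmetry: (Rg1 - Rg2) / (Rg1 + Rg2)."""
    half = len(coords) // 2
    rg_first = radius_of_gyration(coords[:half])   # N-terminal half
    rg_second = radius_of_gyration(coords[half:])  # C-terminal half
    return (rg_first - rg_second) / (rg_first + rg_second)


# Hypothetical chain: first half spaced 1.0 apart, second half 0.5 apart,
# so the N-terminal half is twice as extended as the C-terminal half.
chain = [(float(i), 0.0, 0.0) for i in range(4)]
chain += [(3.5 + 0.5 * i, 0.0, 0.0) for i in range(4)]
asymmetry = gamma_rg(chain)
```

By construction γRg lies between −1 and 1; it is positive when the first half is more extended and changes sign when the chain is read in the opposite direction.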

As summarized in Figure 4B and Figures S12S12, UMAP1 exhibits strong correlations with Rg, Ree, and asphericity, with the highest correlation observed for Rg. This confirms that UMAP1 retains key global size and shape features after dimensionality reduction. In contrast, UMAP2 shows weak correlations with these conventional descriptors but is strongly correlated with γRg, suggesting that UMAP2 captures orthogonal structural variation related to local asymmetry. These relationships are further illustrated in Figures 4D, 4E, S13 and S14.

Notably, γRg is nearly uncorrelated with Rg itself (Figure 4C), whereas Ree and asphericity exhibit strong correlations with Rg (Figure S15), highlighting γRg as a complementary descriptor. While global metrics characterize overall compactness, γRg resolves asymmetry along the sequence, providing a more nuanced view of IDP structural variability.

To investigate the sequence determinants of this asymmetry, we computed the correlation between γRg and the difference in several common sequence features across the N- and C-terminal halves of each chain (Figure 4F). For a given sequence feature f, the difference is defined as

$$\Delta f = f^{(1)} - f^{(2)}, \qquad (2)$$

where f (1) and f (2) are the values of f computed over the first and second halves of the sequence, respectively.
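Eq. 2 can be applied to any per-sequence feature. The sketch below uses the absolute net charge per residue as the example feature; it assumes K and R carry +1, D and E carry −1, and all other residues (including histidine) are neutral, which is a common convention but not one stated in the text. The sequence is a short hypothetical string.

```python
def abs_ncpr(seq):
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def half_difference(seq, feature):
    """Feature over the first half minus the feature over the second half."""
    half = len(seq) // 2
    return feature(seq[:half]) - feature(seq[half:])


# Hypothetical sequence with a charged N-terminal half and a neutral C-terminal half.
delta_ncpr = half_difference("KKGGSSGG", abs_ncpr)
```

Passing the feature as a function keeps the half-splitting logic in one place, so the same helper could be reused for other composition-based descriptors.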

The sequence features analyzed include the absolute net charge per residue (|NCPR|), sequence charge decoration (SCD),66 fraction of charged residues (FCR), fraction of aromatic residues (faromatic), sequence hydropathy difference (SHD),66 mean hydropathy (⟨h⟩), κ parameter,63 and the fractions of polar and aliphatic residues (fpolar and faliphatic, respectively). Among these, Δ|NCPR| exhibits the strongest correlation with γRg, indicating that asymmetry in charge density along the chain is a key determinant of local compactness differences. Specifically, a higher |NCPR| in the N-terminal half or a lower |NCPR| in the C-terminal half is associated with a larger γRg, corresponding to greater extension of the N-terminal segment.

Other sequence features also show non-zero correlations with γRg, though none exceed an absolute value of 0.4, suggesting that γRg reflects a composite effect of multiple sequence characteristics rather than being dominated by a single factor. These results highlight the multifactorial nature of sequence-dependent conformational asymmetry in IDPs. We observe similar trends in the correlation between UMAP2 and these same sequence feature differences (Figure 4G), further supporting the interpretation that UMAP2 captures local structural asymmetry and its underlying sequence determinants.

Sequence-encoded bias in IDP conformational fluctuations

The pronounced asymmetry observed at the termini of various IDP conformations prompted us to further investigate their underlying dynamics. We hypothesize that the conventional coil–globule transition model,92 which resembles a first-order phase transition, may be insufficient to describe the conformational behavior of IDPs. Classical coil–globule transitions typically assume an isotropic, global collapse of the chain without directional bias. In contrast, the variation observed along UMAP2 and the γRg metric suggests a potential preference for asymmetric compaction—where one end of the chain may collapse more readily than the other. This directional bias implies that IDP dynamics may deviate from the uniform behavior predicted by standard polymer models and instead reflect sequence-encoded preferences for localized structural reorganization.

To investigate the coupling between global and local conformational dynamics in IDPs, we computed the time-lagged cross-correlation between fluctuations in radius of gyration (Rg) and local compactness asymmetry γRg:

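$$C_{\Delta R_g,\,\Delta\gamma_{R_g}}(\tau) = \left\langle \Delta R_g(t),\ \Delta\gamma_{R_g}(t+\tau) \right\rangle_t \qquad (3)$$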

where ΔRg(t) = Rg(t + 1 ns) − Rg(t), and ΔγRg(t) is defined analogously. Figure 5A (and Figure S16) summarizes C(τ) for all IDP systems alongside the SAWLC polymer model as a reference.
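A minimal sketch of the time-lagged cross-correlation in Eq. 3 follows. The normalization is an assumption: the average is implemented here as a Pearson correlation between the two increment series, which the text does not confirm. With frames saved every 100 ps, a 1 ns increment corresponds to `step=10`; the short series below are hypothetical and use `step=1`.

```python
import math


def increments(series, step=1):
    """Finite differences x(t + step) - x(t) over a fixed frame offset."""
    return [series[i + step] - series[i] for i in range(len(series) - step)]


def lagged_correlation(a, b, lag=0):
    """Pearson correlation between a(t) and b(t + lag) for lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    shifted = b[lag:]
    n = min(len(a), len(shifted))
    x, y = a[:n], shifted[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    norm_x = math.sqrt(sum((u - mean_x) ** 2 for u in x))
    norm_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (norm_x * norm_y)


# Hypothetical time series in which the asymmetry tracks the global size.
rg = [1.0, 1.2, 1.1, 1.5, 1.3, 1.6]
gamma = [0.1 * value for value in rg]
c_zero = lagged_correlation(increments(rg), increments(gamma), lag=0)
```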

Figure 5: Dynamic coupling between global expansion and local asymmetry in IDP conformations.

(A) Time-lagged cross-correlation between fluctuations in radius of gyration, ΔRg(t), and in local compactness asymmetry, ΔγRg(t), for three representative IDP ensembles. (B) Grid-wise cross-correlation at zero lag time (τ = 0) between ΔRg(t) and ΔγRg(t) mapped onto the UMAP embeddings of IDPs 9, 12, and 30, and the SAWLC model. (C) Zero-lag cross-correlation between ΔRg(t) and ΔγRg(t) across all IDP ensembles, visualized over the UMAP embedding. (D) Schematic illustrations of positive and negative coupling between ΔRg and ΔγRg, corresponding to extended N-terminal or C-terminal regions, respectively, highlighting directional bias captured by UMAP2.

Despite exhibiting similar Rg distributions, the IDPs (e.g., IDP 9, 12, and 30) display distinct correlation profiles, indicating heterogeneity in their conformational dynamics. In particular, the zero-lag correlation C(0) varies substantially across systems—near zero for IDP 9, but positive or negative for IDPs 12 and 30, respectively. These correlations decay rapidly with increasing lag, typically vanishing beyond 1 ns, consistent with the fast internal dynamics of disordered proteins.

To further resolve the spatial origin of this dynamic coupling, we mapped C(0) values across the UMAP embedding (Figure 5B and Figure S17). As expected, the SAWLC ensemble displays negligible correlations throughout the landscape, consistent with its lack of structured dynamics. In contrast, the IDP ensembles exhibit a clear spatial trend: conformations with large UMAP2 values (corresponding to γRg>0) are enriched in positive C(0) values, whereas those with negative γRg show negative correlations. These observations indicate that the sign and magnitude of dynamic coupling between global and local shape fluctuations are linked to specific regions of conformational space.

This trend can be rationalized by considering the physical interpretation of γRg. When γRg>0, the N-terminal half of the chain is more extended than the C-terminal half. In this case, increases in Rg are primarily driven by fluctuations in the already extended N-terminal region, while the compact C-terminal region remains relatively unchanged—resulting in a positive correlation between ΔRg and ΔγRg. The opposite scenario occurs when γRg<0: fluctuations in Rg arise from the extended C-terminal segment, leading to negative correlations. Despite the sign reversal, both cases point to a shared physical mechanism: dynamic fluctuations preferentially originate from the more extended, and therefore more flexible, region of the chain. Compact regions, stabilized by local interactions, contribute less to the overall conformational change.

Conclusions and Discussion

In this study, we employed atomistic simulations to generate conformational ensembles for representative IDP sequences selected from the yeast proteome. To analyze the resulting high-dimensional structural data, we applied UMAP for dimensionality reduction, which revealed the heterogeneous nature of IDP ensembles and enabled systematic comparisons across sequences. By examining the physical interpretation of the UMAP embeddings, we identified a new descriptor, local compactness asymmetry γRg, which captures sequence-dependent structural features not reflected in conventional global conformational metrics such as radius of gyration and end-to-end distance. This descriptor provides complementary insight by quantifying asymmetric chain compaction within IDP ensembles, thereby uncovering structural variations that are not apparent from global shape alone. Moreover, we demonstrated that γRg correlates with asymmetries in sequence features between the two halves of the chain, including charge properties and residue hydrophobicities, highlighting the influence of sequence on local conformational organization. These results establish a framework for integrating atomistic simulations with dimensionality reduction to investigate sequence-encoded structural diversity in disordered proteins.

Beyond its immediate use in revealing sequence-dependent conformational features, the dataset generated in this study provides a valuable resource for the broader community. The large collection of atomistic simulations across diverse IDP sequences offers a high-resolution benchmark for investigating sequence-to-ensemble relationships. These data can be directly leveraged to train or evaluate machine learning models aimed at predicting structural ensembles from sequence, particularly those employing deep generative frameworks.53,58–61,93 Furthermore, the conformational ensembles serve as a rich training set for parameterizing and validating coarse-grained force fields, enabling the development of transferable models that more accurately capture IDP behavior across different sequence and environmental contexts.33 However, we note that, despite our best efforts, statistical variations can arise within the IDP conformational space, as indicated by both the comparison between the first and second halves of the simulations (Figures S18 and S19) and the block-averaged UMAP distributions (Figures S20 and S21). Future model development should account for these variations, which are inherent to the dataset because of finite sampling and incomplete statistical convergence of the simulations.

Methods

Explicit solvent atomistic simulations

We performed explicit-solvent atomistic simulations for each individual IDP as follows. Initial structures were generated using ColabFold,79 and the system setup, including coordinate, topology, and force field files, was carried out using GROMACS.94 Each IDP was solvated in a dodecahedral water box with a box length of 8.0 nm, and NaCl was added to a final concentration of 0.15 M. Simulations employed the a99SB-disp force field for proteins, along with its corresponding water model.81

Molecular dynamics simulations were executed using the OpenMM package.95 After energy minimization, production simulations were run in the constant-pressure, constant-temperature (NPT) ensemble at 300 K and 1 bar for 10.5 µs. We applied the hydrogen mass repartitioning scheme96 to enable a 4 fs integration timestep and used the Langevin middle integrator97 for temperature control. Configurations were saved every 100 ps, yielding 105,000 frames per trajectory for each of the 47 simulated IDPs.
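The frame and step counts implied by these settings can be checked with integer arithmetic. The sketch below only restates the numbers given in the Methods (10.5 µs, 4 fs timestep, 100 ps save interval, and the every-10th-frame subsampling used for the UMAP analysis); the variable names are illustrative.

```python
# All times are expressed in integer femtoseconds to avoid rounding issues.
FS_PER_PS = 1_000
FS_PER_US = 1_000_000_000

total_time_fs = 105 * FS_PER_US // 10   # 10.5 microseconds
timestep_fs = 4                         # enabled by hydrogen mass repartitioning
save_interval_fs = 100 * FS_PER_PS      # one frame every 100 ps

total_steps = total_time_fs // timestep_fs
steps_per_frame = save_interval_fs // timestep_fs
frames_per_trajectory = total_time_fs // save_interval_fs
umap_frames = frames_per_trajectory // 10   # every 10th frame retained
total_frames = frames_per_trajectory * 47   # across all simulated IDPs
```

Each trajectory therefore corresponds to 2.625 billion integration steps, with one configuration written every 25,000 steps.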

To verify the adequacy of the simulation box, we monitored the minimum image distances between atoms across periodic boundaries; no artifacts from inter-box interactions were observed (Figure S22).

Self-avoiding worm-like chain (SAWLC) model

The SAWLC model was used to generate a reference ensemble for comparison with ensembles obtained from atomistic simulations. In this model, the number of beads and bond length were set to 40 and 0.38 nm, respectively. The model incorporated a persistence length of 0.4 nm and a linker diameter of 0.38 nm. SAWLC ensembles were generated using the PolymerCpp Python package,98 with a total of 105,000 samples produced.

Conformational ensemble analysis

We applied several metrics to characterize IDP conformations: the radius of gyration (Rg), the end-to-end distance (Ree), the asphericity, the local compactness asymmetry γRg, and the Flory–Huggins scaling exponent. The values of Rg, Ree, γRg, and the Flory–Huggins scaling exponent were obtained using the Python package MDTraj,99 while the asphericity was calculated using the Python package SOURSOP.100

UMAP analysis

We applied Uniform Manifold Approximation and Projection (UMAP)75 to perform dimensionality reduction across all 47 IDP ensembles, along with a reference ensemble generated from the SAWLC polymer model, enabling direct comparisons across systems. To reduce computational cost, we subsampled each trajectory by retaining every 10th frame, resulting in 10,500 conformations per ensemble.

Two types of structural descriptors were used as input to the UMAP algorithm: (1) non-redundant pairwise distances between all α-carbon atoms in each conformation, and (2) root-mean-square deviation (RMSD) of each conformation relative to a common set of reference structures. Reference conformations were uniformly selected every 2,000 frames across all ensembles to ensure a consistent comparison basis.

Unless stated otherwise, the pairwise distance descriptor was used for all figures in the main text. UMAP was run with n_neighbors = 200 and min_dist = 0.99; all other parameters were kept at their default values.
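With the umap-learn package, these settings correspond to a call of the following form. This is a minimal sketch rather than the authors' script: the wrapper function and the `features` argument (a stacked array of descriptor vectors from all ensembles) are assumptions, and the import is deferred so that the parameter block can be inspected without the package installed.

```python
# Parameters stated in the text; everything else is left at the library defaults.
UMAP_KWARGS = {"n_neighbors": 200, "min_dist": 0.99}

def embed(features):
    """Project descriptor vectors of shape (n_conformations, n_features) with UMAP."""
    import umap  # provided by the umap-learn package
    reducer = umap.UMAP(**UMAP_KWARGS)
    return reducer.fit_transform(features)
```

Fitting one reducer on conformations pooled from all ensembles places them in a shared embedding, which is what allows direct comparison across systems. A large n_neighbors emphasizes global structure, and a min_dist close to 1 spreads points out rather than packing them into tight clusters.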

Supplementary Material

supplementary material

Acknowledgement

This work was supported by the National Institutes of Health (Grant R35GM133580).

Footnotes

Competing interests: The authors declare that they have no competing interests.

Data and code availability:

MD simulation trajectories of IDPs and the structural ensemble generated using the SAWLC model are available at Zenodo. The analysis code is available at GitHub.

References

  • (1) Dunker AK; Romero P; Obradovic Z; Garner EC; Brown CJ Intrinsic protein disorder in complete genomes. Genome informatics 2000, 11, 161–171.
  • (2) Ward JJ; Sodhi JS; McGuffin LJ; Buxton BF; Jones DT Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. Journal of molecular biology 2004, 337, 635–645.
  • (3) Pancsa R; Tompa P Structural disorder in eukaryotes. PloS one 2012, 7, e34687.
  • (4) Latham AP; Zhang B Molecular Determinants for the Layering and Coarsening of Biological Condensates. Aggregate 2022, 3, 1–9.
  • (5) Xie H; Vucetic S; Iakoucheva LM; Oldfield CJ; Dunker AK; Uversky VN; Obradovic Z Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. Journal of proteome research 2007, 6, 1882–1898.
  • (6) Wright PE; Dyson HJ Intrinsically disordered proteins in cellular signalling and regulation. Nature reviews Molecular cell biology 2015, 16, 18–29.
  • (7) Bah A; Forman-Kay JD Modulation of intrinsically disordered protein function by post-translational modifications. Journal of Biological Chemistry 2016, 291, 6696–6705.
  • (8) Tantos A; Han K-H; Tompa P Intrinsic disorder in cell signaling and gene transcription. Molecular and cellular endocrinology 2012, 348, 457–465.
  • (9) Vavouri T; Semple JI; Garcia-Verdugo R; Lehner B Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell 2009, 138, 198–208.
  • (10) Babu MM The contribution of intrinsically disordered regions to protein function, cellular complexity, and human disease. Biochemical Society Transactions 2016, 44, 1185–1200.
  • (11) Elbaum-Garfinkle S Matter over mind: Liquid phase separation and neurodegeneration. Journal of Biological Chemistry 2019, 294, 7160–7168.
  • (12) Uversky VN Intrinsic disorder, protein–protein interactions, and disease. Advances in protein chemistry and structural biology 2018, 110, 85–121.
  • (13) Uversky VN; Davé V; Iakoucheva LM; Malaney P; Metallo SJ; Pathak RR; Joerger AC Pathological unfoldomics of uncontrolled chaos: intrinsically disordered proteins and human diseases. Chemical reviews 2014, 114, 6844–6879.
  • (14) Zhang Y; Zheng J; Zhang B Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation. bioRxiv 2024, 2024–12.
  • (15) Uversky VN Intrinsically disordered proteins from A to Z. The international journal of biochemistry & cell biology 2011, 43, 1090–1103.
  • (16) Oldfield CJ; Dunker AK Intrinsically disordered proteins and intrinsically disordered protein regions. Annual review of biochemistry 2014, 83, 553–584.
  • (17) Hansen JC; Lu X; Ross ED; Woody RW Intrinsic protein disorder, amino acid composition, and histone terminal domains. Journal of Biological Chemistry 2006, 281, 1853–1856.
  • (18) Biesaga M; Frigolé-Vivas M; Salvatella X Intrinsically disordered proteins and biomolecular condensates as drug targets. Current Opinion in Chemical Biology 2021, 62, 90–100.
  • (19) Joshi P; Vendruscolo M Druggability of intrinsically disordered proteins. Intrinsically disordered proteins studied by NMR spectroscopy 2015, 383–400.
  • (20) Kosol S; Contreras-Martos S; Cedeño C; Tompa P Structural characterization of intrinsically disordered proteins by NMR spectroscopy. Molecules 2013, 18, 10802–10828.
  • (21) Jensen MR; Zweckstetter M; Huang J.-r.; Blackledge M Exploring free-energy landscapes of intrinsically disordered proteins at atomic resolution using NMR spectroscopy. Chemical reviews 2014, 114, 6632–6660.
  • (22) Gibbs EB; Cook EC; Showalter SA Application of NMR to studies of intrinsically disordered proteins. Archives of biochemistry and biophysics 2017, 628, 57–70.
  • (23) Kikhney AG; Svergun DI A practical guide to small angle X-ray scattering (SAXS) of flexible and intrinsically disordered proteins. FEBS letters 2015, 589, 2570–2577.
  • (24) Sibille N; Bernado P Structural characterization of intrinsically disordered proteins by the combined use of NMR and SAXS. Biochemical society transactions 2012, 40, 955–962.
  • (25) Schuler B; Soranno A; Hofmann H; Nettels D Single-molecule FRET spectroscopy and the polymer physics of unfolded and intrinsically disordered proteins. Annual Review of Biophysics 2016, 45, 207–231.
  • (26) Regmi R; Srinivasan S; Latham AP; Kukshal V; Cui W; Zhang B; Bose R; Schlau-Cohen GS Phosphorylation-Dependent Conformations of the Disordered Carboxyl-Terminus Domain in the Epidermal Growth Factor Receptor. The Journal of Physical Chemistry Letters 2020, 11, 10037–10044.
  • (27) LeBlanc SJ; Kulkarni P; Weninger KR Single molecule FRET: A powerful tool to study intrinsically disordered proteins. Biomolecules 2018, 8, 140.
  • (28) Tozzini V Coarse-grained models for proteins. Current opinion in structural biology 2005, 15, 144–150.
  • (29) Riniker S; Allison JR; van Gunsteren WF On developing coarse-grained models for biomolecular simulation: a review. Physical Chemistry Chemical Physics 2012, 14, 12423–12430.
  • (30) Dignon GL; Zheng W; Mittal J Simulation methods for liquid–liquid phase separation of disordered proteins. Current opinion in chemical engineering 2019, 23, 92–98.
  • (31) Liu S; Wang C; Latham AP; Ding X; Zhang B OpenABC enables flexible, simplified, and efficient GPU accelerated simulations of biomolecular condensates. PLoS Computational Biology 2023, 19, e1011442.
  • (32) Dhamankar S; Webb MA Chemically specific coarse-graining of polymers: methods and prospects. Journal of Polymer Science 2021, 59, 2613–2643.
  • (33) Liu S; Wang C; Zhang B Toward Predictive Coarse-Grained Simulations of Biomolecular Condensates. Biochemistry 2025, 64, 1750–1761.
  • (34) Latham AP; Zhang B On the Stability and Layered Organization of Protein-DNA Condensates. Biophysical Journal 2022, 121, 1727–1737.
  • (35) Vitalis A; Pappu RV ABSINTH: a new continuum solvation model for simulations of polypeptides in aqueous solutions. Journal of computational chemistry 2009, 30, 673–699.
  • (36) Dignon GL; Zheng W; Best RB; Kim YC; Mittal J Relation between single-molecule properties and phase behavior of intrinsically disordered proteins. Proceedings of the National Academy of Sciences 2018, 115, 9929–9934.
  • (37) Wu H; Wolynes PG; Papoian GA AWSEM-IDP: a coarse-grained force field for intrinsically disordered proteins. The Journal of Physical Chemistry B 2018, 122, 11115–11125.
  • (38) Joseph JA; Reinhardt A; Aguirre A; Chew PY; Russell KO; Espinosa JR; Garaizar A; Collepardo-Guevara R Physics-driven coarse-grained model for biomolecular phase separation with near-quantitative accuracy. Nature Computational Science 2021, 1, 732–743.
  • (39) Latham AP; Zhang B Consistent force field captures homologue-resolved HP1 phase separation. Journal of chemical theory and computation 2021, 17, 3134–3144.
  • (40) Regy RM; Thompson J; Kim YC; Mittal J Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins. Protein Science 2021, 30, 1371–1379.
  • (41) Dannenhoffer-Lafage T; Best RB A data-driven hydrophobicity scale for predicting liquid–liquid phase separation of proteins. The Journal of Physical Chemistry B 2021, 125, 4046–4056.
  • (42) Tesei G; Schulze TK; Crehuet R; Lindorff-Larsen K Accurate model of liquid–liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties. Proceedings of the National Academy of Sciences 2021, 118, e2111696118.
  • (43) Zhang Y; Liu X; Chen J Toward accurate coarse-grained simulations of disordered proteins and their dynamic interactions. Journal of chemical information and modeling 2022, 62, 4523–4536.
  • (44) Zhang Y; Li S; Gong X; Chen J Toward accurate simulation of coupling between protein secondary structure and phase separation. Journal of the American Chemical Society 2023, 146, 342–357.
  • (45) Thomasen FE; Pesce F; Roesgaard MA; Tesei G; Lindorff-Larsen K Improving Martini 3 for disordered and multidomain proteins. Journal of chemical theory and computation 2022, 18, 2033–2041.
  • (46) Tesei G; Lindorff-Larsen K Improved predictions of phase behaviour of intrinsically disordered proteins by tuning the interaction range. Open Research Europe 2023, 2, 94.
  • (47) Cao F; von Bülow S; Tesei G; Lindorff-Larsen K A coarse-grained model for disordered and multi-domain proteins. Protein Science 2024, 33, e5172.
  • (48) Souza PC et al. Martini 3: A General Purpose Force Field for Coarse-Grained Molecular Dynamics. Nature Methods 2021, 18, 382–388.
  • (49) Ding X; Zhang B Contrastive Learning of Coarse-Grained Force Fields. Journal of Chemical Theory and Computation 2022, 18, 6334–6344.
  • (50) Ding X Optimizing force fields with experimental data using ensemble reweighting and potential contrasting. The Journal of Physical Chemistry B 2024, 128, 6760–6769.
  • (51) Tejedor AR; Aguirre Gonzalez A; Maristany MJ; Chew PY; Russell K; Ramirez J; Espinosa JR; Collepardo-Guevara R Chemically Informed Coarse-Graining of Electrostatic Forces in Charge-Rich Biomolecular Condensates. ACS Central Science 2025, 11, 302–321.
  • (52) Airas J; Ding X; Zhang B Transferable Implicit Solvation via Contrastive Learning of Graph Neural Networks. ACS Cent. Sci 2023, 9, 2286–2297.
  • (53) Lotthammer JM; Ginell GM; Griffith D; Emenecker R; Holehouse AS Direct prediction of intrinsically disordered protein conformational properties from sequence. Biophysical Journal 2024, 123, 43a.
  • (54) Chakraborty D; Mondal B; Thirumalai D Brewing coffee: A sequence-specific coarse-grained energy function for simulations of DNA-protein complexes. Journal of Chemical Theory and Computation 2024, 20, 1398–1413.
  • (55) Mugnai ML; Chakraborty D; Nguyen HT; Maksudov F; Kumar A; Zeno W; Stachowiak JC; Straub JE; Thirumalai D Sizes, conformational fluctuations, and SAXS profiles for intrinsically disordered proteins. Protein Science 2025, 34, e70067.
  • (56) Jumper J; Evans R; Pritzel A; Green T; Figurnov M; Ronneberger O; Tunyasuvunakool K; Bates R; Žídek A; Potapenko A; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
  • (57) Baek M; DiMaio F; Anishchenko I; Dauparas J; Ovchinnikov S; Lee GR; Wang J; Cong Q; Kinch LN; Schaeffer RD; et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876.
  • (58) Janson G; Feig M Transferable deep generative modeling of intrinsically disordered protein conformations. PLOS Computational Biology 2024, 20, e1012144.
  • (59) Janson G; Valdes-Garcia G; Heo L; Feig M Direct generation of protein conformational ensembles via machine learning. Nature Communications 2023, 14, 774.
  • (60) Zhu J; Li Z; Zheng Z; Zhang B; Zhong B; Bai J; Hong X; Wang T; Wei T; Yang J; et al. Precise generation of conformational ensembles for intrinsically disordered proteins via fine-tuned diffusion models. bioRxiv 2024, 2024–05.
  • (61) Novak B; Lotthammer JM; Emenecker RJ; Holehouse AS Accurate predictions of conformational ensembles of disordered proteins with STARLING. bioRxiv 2025, 2025–02.
  • (62) Mao AH; Crick SL; Vitalis A; Chicoine CL; Pappu RV Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proceedings of the National Academy of Sciences 2010, 107, 8183–8188.
  • (63) Das RK; Pappu RV Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proceedings of the National Academy of Sciences 2013, 110, 13392–13397.
  • (64) Müller-Späth S; Soranno A; Hirschfeld V; Hofmann H; Rüegger S; Reymond L; Nettels D; Schuler B Charge interactions can dominate the dimensions of intrinsically disordered proteins. Proceedings of the National Academy of Sciences 2010, 107, 14609–14614.
  • (65) Das RK; Ruff KM; Pappu RV Relating sequence encoded information to form and function of intrinsically disordered proteins. Current opinion in structural biology 2015, 32, 102–112.
  • (66) Zheng W; Dignon G; Brown M; Kim YC; Mittal J Hydropathy patterning complements charge patterning to describe conformational preferences of disordered proteins. The journal of physical chemistry letters 2020, 11, 3408–3415.
  • (67) Holla A; Martin EW; Dannenhoffer-Lafage T; Ruff KM; König SL; Nüesch MF; Chowdhury A; Louis JM; Soranno A; Nettels D; et al. Identifying Sequence Effects on Chain Dimensions of Disordered Proteins by Integrating Experiments and Simulations. JACS Au 2024, 4, 4729–4743.
  • (68) Baul U; Chakraborty D; Mugnai ML; Straub JE; Thirumalai D Sequence effects on size, shape, and structural heterogeneity in intrinsically disordered proteins. The Journal of Physical Chemistry B 2019, 123, 3462–3474.
  • (69) Zheng W; Dignon GL; Jovic N; Xu X; Regy RM; Fawzi NL; Kim YC; Best RB; Mittal J Molecular details of protein condensates probed by microsecond long atomistic simulations. The Journal of Physical Chemistry B 2020, 124, 11671–11679.
  • (70) Galvanetto N; Ivanović MT; Chowdhury A; Sottini A; Nüesch MF; Nettels D; Best RB; Schuler B Extreme dynamics in a biomolecular condensate. Nature 2023, 619, 876–883.
  • (71) Rekhi S; Garcia CG; Barai M; Rizuan A; Schuster BS; Kiick KL; Mittal J Expanding the molecular language of protein liquid–liquid phase separation. Nature Chemistry 2024, 16, 1113–1124.
  • (72) Wang C; Kilgore HR; Latham AP; Zhang B Nonspecific Yet Selective Interactions Contribute to Small Molecule Condensate Binding. J. Chem. Theory Comput 2024, 20, 10247–10258.
  • (73) Latham AP; Zhu L; Sharon DA; Ye S; Willard AP; Zhang X; Zhang B Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates. eLife 2024, 12.
  • (74) Zhou L; Zhu L; Wang C; Xu T; Wang J; Zhang B; Zhang X; Wang H Multiphasic Condensates Formed with Mono-Component of Tetrapeptides via Phase Separation. Nat Commun 2025, 16, 2706.
  • (75) McInnes L; Healy J; Melville J UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 2018.
  • (76) Van Der Lee R et al. Classification of Intrinsically Disordered Regions and Proteins. Chemical Reviews 2014, 114, 6589–6631.
  • (77) Abyzov A; Blackledge M; Zweckstetter M Conformational Dynamics of Intrinsically Disordered Proteins Regulate Biomolecular Condensate Chemistry. Chemical Reviews 2022, 122, 6719–6748.
  • (78) Zarin T; Strome B; Nguyen Ba AN; Alberti S; Forman-Kay JD; Moses AM Proteome-wide signatures of function in highly diverged intrinsically disordered regions. Elife 2019, 8, e46883.
  • (79) Mirdita M; Schütze K; Moriwaki Y; Heo L; Ovchinnikov S; Steinegger M ColabFold: making protein folding accessible to all. Nature methods 2022, 19, 679–682.
  • (80) Wilson CJ; Choy W-Y; Karttunen M AlphaFold2: a role for disordered protein/region prediction? International Journal of Molecular Sciences 2022, 23, 4591.
  • (81) Robustelli P; Piana S; Shaw DE Developing a molecular dynamics force field for both folded and disordered protein states. Proceedings of the National Academy of Sciences 2018, 115, E4758–E4766.
  • (82) Chen J; Liu X; Chen J Targeting intrinsically disordered proteins through dynamic interactions. Biomolecules 2020, 10, 743.
  • (83) Liu X; Chen J Residual structures and transient long-range interactions of p53 transactivation domain: Assessment of explicit solvent protein force fields. Journal of chemical theory and computation 2019, 15, 4708–4720.
  • (84) Mukherjee S; Schäfer LV Thermodynamic forces from protein and water govern condensate formation of an intrinsically disordered protein domain. Nature Communications 2023, 14, 5892.
  • (85) Kyte J; Doolittle RF A simple method for displaying the hydropathic character of a protein. Journal of molecular biology 1982, 157, 105–132.
  • (86) Uversky VN; Gillespie JR; Fink AL Why are “natively unfolded” proteins unstructured under physiologic conditions? Proteins: structure, function, and bioinformatics 2000, 41, 415–427.
  • (87) Le Guillou J; Zinn-Justin J Critical exponents for the n-vector model in three dimensions from field theory. Physical Review Letters 1977, 39, 95.
  • (88) Gates ZP; Baxa MC; Yu W; Riback JA; Li H; Roux B; Kent SB; Sosnick TR Perplexing cooperative folding and stability of a low-sequence complexity, polyproline 2 protein lacking a hydrophobic core. Proceedings of the National Academy of Sciences 2017, 114, 2241–2246.
  • (89) Shrestha UR; Smith JC; Petridis L Full structural ensembles of intrinsically disordered proteins from unbiased molecular dynamics simulations. Communications biology 2021, 4, 243.
  • (90) Hofmann H; Soranno A; Borgia A; Gast K; Nettels D; Schuler B Polymer scaling laws of unfolded and intrinsically disordered proteins quantified with single-molecule spectroscopy. Proceedings of the National Academy of Sciences 2012, 109, 16155–16160.
  • (91) Burke KA; Janke AM; Rhine CL; Fawzi NL Residue-by-residue view of in vitro FUS granules that bind the C-terminal domain of RNA polymerase II. Molecular cell 2015, 60, 231–241.
  • (92) Grosberg A Statistical Physics of Macromolecules; AIP Press: New York, 1994.
  • (93) Tesei G; Trolle AI; Jonsson N; Betz J; Knudsen FE; Pesce F; Johansson KE; Lindorff-Larsen K Conformational ensembles of the human intrinsically disordered proteome. Nature 2024, 626, 897–904.
  • (94) Abraham MJ; Murtola T; Schulz R; Páll S; Smith JC; Hess B; Lindahl E GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1, 19–25.
  • (95) Eastman P; Swails J; Chodera JD; McGibbon RT; Zhao Y; Beauchamp KA; Wang L-P; Simmonett AC; Harrigan MP; Stern CD; et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS computational biology 2017, 13, e1005659.
  • (96) Hopkins CW; Le Grand S; Walker RC; Roitberg AE Long-time-step molecular dynamics through hydrogen mass repartitioning. Journal of chemical theory and computation 2015, 11, 1864–1874.
  • (97) Zhang Z; Liu X; Yan K; Tuckerman ME; Liu J Unified efficient thermostat scheme for the canonical ensemble with holonomic or isokinetic constraints via molecular dynamics. The Journal of Physical Chemistry A 2019, 123, 6056–6079.
  • (98) Stefko M; Douglass K; Manley S PolymerCpp (0.1.3). 2020; 10.5281/zenodo.3928659.
  • (99) McGibbon RT; Beauchamp KA; Harrigan MP; Klein C; Swails JM; Hernández CX; Schwantes CR; Wang L-P; Lane TJ; Pande VS MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophysical journal 2015, 109, 1528–1532.
  • (100) Lalmansingh JM; Keeley AT; Ruff KM; Pappu RV; Holehouse AS SOURSOP: A Python package for the analysis of simulations of intrinsically disordered proteins. Journal of Chemical Theory and Computation 2023, 19, 5609–5620.
