Abstract
Over a third of residues in the canonical human proteome are predicted to fall within intrinsically disordered protein regions (IDRs), which do not adopt stable folded structures. These IDRs play critical roles in biological regulation and organization, serving as targets for post-translational modifications, as scaffolds, and as mediators of biomolecular condensates. To address the pressing need for valid, biologically relevant structural models that enable functional insight, we developed the AlphaFlex workflow, which uses IDPConformerGenerator or IDPForge to calculate fully atomistic conformer ensembles for proteins predicted to have disordered regions, modeled in the context of highly confident folded domains from AlphaFold2. We illustrate our approach by generating conformational ensembles of the human proteins in the AlphaFold2 database, with completed AlphaFlex models deposited in the Protein Ensemble Database, which is mirrored in UniProt. This transformative resource of AlphaFlex ensembles provides more realistic and biologically relevant full-length models for proteins with IDRs, which we illustrate for scaffold proteins with folded domains connected by IDRs, proteins with IDRs that interact with folded domains, regulatory and condensate proteins requiring exposed binding elements, and a conditionally folding IDR.
INTRODUCTION
Structure has driven functional insight into biology ever since the elucidation of the protein α-helix(1) in 1951, and the decades of experimental structure determination that followed created the data necessary for the AlphaFold breakthrough in predicting folded protein structure(2). The AlphaFold approach has since had a dramatic impact on the biological community, reinforcing the widely accepted protein structure-function paradigm centered on the folded state(2). However, it is increasingly appreciated that all proteomes also encode intrinsically disordered proteins and regions (IDPs/IDRs, referred to here as IDRs), which do not adopt a well-defined tertiary structure but instead populate fluctuating and heterogeneous structural ensembles(3–6). Their highly dynamic nature facilitates diverse biological functions (e.g., transcription, translation, signaling) via a variety of mechanisms, including direct protein and nucleic acid binding(7, 8), scaffolding cellular structures and protein complexes(5, 9, 10), mediating the formation of biomolecular condensates(11), exerting pressure through excluded volume(12), and creating appropriate spacing between folded domains(13, 14). For the roughly one third of residues in the canonical human proteome that are located in IDRs(15), the powerful AlphaFold2 tool can provide only low-confidence predictions(2) that represent each IDR as a single conformational state rather than as a conformational ensemble. The single conformations that AlphaFold2 provides for IDRs often appear as featureless extended curves surrounding clustered folded domains, which obscures binding motifs. Furthermore, AlphaFold2 incorrectly predicts confident structures for conditionally folding IDRs (i.e., those that fold upon binding or post-translational modification), estimated to involve about 15% of human IDR residues(16).
The poor modeling of IDRs, and hence of the orientations and proximities of folded domains connected by IDRs, presents a major issue: understanding IDR biology is hampered by the lack of conformational ensembles that effectively represent the heterogeneous sampling so critical for their function. Full-length structural ensembles that correctly represent IDRs in the context of folded domains are crucial for understanding protein structure-function relationships and for providing biologically relevant models. For example, regulatory IDRs can block catalytic, allosteric or binding sites, with the block “removed” by changes in post-translational modifications or by binding to other partners(17). Conformational ensembles that show the extent of conformational heterogeneity, including multiple potential interactions or their absence, provide insight into such regulatory mechanisms. Another example is conditional folding, where structural models of both the disordered and the folded states are key for understanding how the condition affects downstream events(16). A further important example is scaffolds and proteins involved in biomolecular condensates. These most often contain multiple folded domains that bind proteins and/or nucleic acids, along with IDRs that function as linkers, low-complexity phase-separating regions, and/or modulators of chain solvation(8, 18). The long “reach” of extended IDRs is essential for binding to various targets, which may be geometrically constrained, and conformational heterogeneity is key for the exchanging interactions that underlie condensate formation. The need for chain extension for function is illustrated by the N-terminal IDR (150 aa) of SMC4, a key component of the condensin complex required for chromosome condensation.
This SMC4 IDR contains DNA-binding regions and is involved in forming biomolecular condensates that promote proper chromosome compaction, segregation, and genomic stability(7). The extension of the N-IDR of SMC4 is proposed to enhance the activity of the condensin complex by contacting distant DNA base-pairs.
Hence, beyond the AlphaFold2 database that offers full-length atomistic models focused on folded proteins, a corresponding database of all-atom protein ensembles that represent the IDRs within the context of folded domains is sorely needed. Here, we present the AlphaFlex scalable workflow that leverages AlphaFold’s strengths for predicting the folded domains, while overcoming its known deficiencies by generating all-atom IDR ensembles across the localized disordered regions. Of the 23,391 proteins from the canonical human proteome in the AlphaFold2 database(19), we identify 14,792 proteins (63%) as having an IDR of 15 or more consecutive amino acids using a union of multiple independent IDR predictions. We use two independent approaches, IDPConformerGenerator(20, 21) and IDPForge(22), to generate full-length protein ensembles of atomistic conformations including the IDRs and the folded domains, which are made easily accessible in the AlphaFlex database. At present there are 7,783 completed protein ensembles deposited in the community Protein Ensemble Database (PED)(3) (https://proteinensemble.org) that is mirrored in the UniProt database.
These completed AlphaFlex protein ensembles represent the full range of complexity, from terminal IDRs to multiple IDRs separating both non-interacting and interacting folded domains, with sequence lengths of up to over a thousand amino acids, allowing us to consider their biological implications. Analysis of the AlphaFlex ensembles reveals that IDRs generated within the context of interacting folded domains have statistically different global structural properties from IDR ensembles produced in isolation in previous work(20, 23). AlphaFlex ensembles have global structural properties and Cα-Cα distances that are better aligned with the scaffold role of a subset of these proteins than the structures provided by AlphaFold2(2). Furthermore, AlphaFlex ensembles contain more fractional α-helical content, which can correspond to biological function(24, 25), than published IDR representations generated by methods such as CALVADOS(23) and, as expected, than low-confidence AlphaFold2 regions(2), especially at longer sequence lengths. Additionally, the AlphaFlex ensembles leave protein binding sites and post-translational modification sites accessible, whereas these sites are more occluded in the AlphaFold2 models. The AlphaFlex ensembles provide a realistic representation of entire proteins containing disorder within the context of the high-confidence AlphaFold2-predicted folded domains, while supporting meaningful interpretations of biological function that we illustrate for proteins whose IDRs regulate cell morphology(26), transcription(27) and translation(24), phase separation(28), and RNA stability(29). These ensembles are publicly available as a workflow tool and through deposition in the PED(3) as a transformative resource for the biological community, paralleling the previous impact of the AlphaFold2 database.
RESULTS
AlphaFlex workflow for generating atomistic ensembles of proteins with IDRs
A diagram of the AlphaFlex workflow is given in Fig. 1. Stage 1 of the workflow assigns each residue in a protein sequence to either an IDR or a folded domain. The predicted local distance difference test (pLDDT) metric provided by AlphaFold has been used as an indicator of intrinsic disorder(16, 30). However, pLDDT is not intended to be a disorder predictor, and it is incorrectly confident in predicting structure for conditionally folding IDRs(16). Therefore, we define an IDR as 15 or more consecutive amino acids predicted to be disordered by the union of 5 metrics: pLDDT < 70 and the disorder predictions of 4 bona fide disorder predictors, metapredict(31), flDPnn(32), ADOPT(33), and SPOT-Disorder(34). Statistics for the number of IDRs from each metric are given in fig. S1. A folded domain region is defined by residues that are not classified as disordered by the union of the 5 metrics.
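The Stage 1 rule (union of disorder calls, then runs of at least 15 consecutive residues) can be sketched as below. This is a minimal illustration, not the AlphaFlex code: it assumes each of the four predictors' outputs has already been converted into a per-residue boolean disorder call, and the function names `union_disorder_mask` and `find_idrs` are hypothetical. Ranges are reported 1-based and inclusive, matching how residue ranges are quoted in this paper.

```python
import numpy as np

MIN_IDR_LENGTH = 15  # minimum number of consecutive disordered residues


def union_disorder_mask(plddt, predictor_masks, plddt_cutoff=70.0):
    """Hypothetical helper: a residue is disordered if pLDDT < cutoff OR any predictor flags it."""
    mask = np.asarray(plddt, dtype=float) < plddt_cutoff
    for calls in predictor_masks:
        mask = mask | np.asarray(calls, dtype=bool)
    return mask


def find_idrs(mask, min_length=MIN_IDR_LENGTH):
    """Hypothetical helper: 1-based inclusive (start, end) ranges of disordered runs >= min_length."""
    idrs = []
    start = None  # 0-based index where the current disordered run began
    for i, disordered in enumerate(mask):
        if disordered and start is None:
            start = i
        elif not disordered and start is not None:
            if i - start >= min_length:
                idrs.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(mask) - start >= min_length:
        idrs.append((start + 1, len(mask)))
    return idrs


# Toy example: residues 11-30 have low pLDDT; one predictor flags only residues 36-40.
plddt = [90.0] * 10 + [40.0] * 20 + [95.0] * 10
other = [False] * 35 + [True] * 5
print(find_idrs(union_disorder_mask(plddt, [other])))  # [(11, 30)]
```

The 5-residue run at the C-terminus is discarded because it is shorter than the 15-residue minimum, so only residues 11–30 form an IDR.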
Fig. 1. AlphaFlex Workflow.
(A) Stages of the AlphaFlex workflow. In stage 1, IDRs are defined as the union of 5 different predictors/indicators of disorder(2, 31–34). Stage 2 determines whether two confidently predicted folded domains or segments interact, based on the AlphaFold PAE (predicted aligned error). Atomistic ensembles are calculated in stage 3 using information from stages 1 and 2, with either IDPConformerGenerator(20, 21) or IDPForge(22). In stage 4, ensembles are processed and deposited into the Protein Ensemble Database(3). (B) AlphaFold2(2) structural prediction of Zinc finger protein 675 (AF-Q8TD23-F1-v6), colored by pLDDT(19): yellow and orange represent low (50 < pLDDT < 70) and very low (pLDDT < 50) confidence, respectively, indicative of disorder (pLDDT < 70), while light blue (70 < pLDDT < 90) and blue (pLDDT > 90) represent high and very high confidence, respectively, for folded structure. (C) PAE matrix from the AlphaFold2 database for AF-Q8TD23-F1-v6, with darker green indicating smaller error. (D) Left: AlphaFold2 prediction, scaled and oriented to match the ensemble on the right. Right: 100 AFX-IDPCG conformations of Zinc finger protein 675 aligned to the C-terminal folded domain (grey, residues 231–568), with IDRs in orange and other folded elements from the AlphaFold2 template in blue.
For proteins with at least one IDR identified above, Stage 2 classifies the entire protein sequence into one of three categories of increasing structural complexity, which have implications for how conformational sampling for the IDR regions is accomplished in Stage 3. We define the following three categories of proteins containing IDRs as:
1. completely disordered IDPs, or proteins with a folded domain and only N- and/or C-terminal IDRs (often referred to as “tails”);
2. proteins with IDRs between non-interacting folded domains (often referred to as “linkers”), potentially also having N- and C-terminal IDRs; and
3. proteins with IDRs within a single folded domain or separating two domains that are likely to form stable interactions (often referred to as “loops”), potentially also having N- and C-terminal IDRs and/or IDRs between non-interacting folded domains.
Note that although the words “tail”, “linker” and “loop” have sometimes been used to imply a lack of function beyond acting as non-interacting tethers or elements, we use them here only to delineate connectivity, fully appreciating the rich functional repertoires of all IDRs(35). Of the 14,792 identified proteins in the canonical human proteome, 46% (6,751) are in category 1, 22% (3,275) are in category 2, and the remaining 32% (4,766) are in category 3.
To distinguish categories 2 and 3, we must consider whether the two folded segments connected by an IDR interact (e.g., two parts of a single domain separated by a loop, or two tightly bound domains) or whether the folded domains are largely independent of each other (i.e., “beads-on-a-string”). We use the mean predicted aligned error (PAE) from AlphaFold to discriminate between interacting (PAE ≤ 15 Å, category 3) and non-interacting (PAE > 15 Å, category 2) folded domains. This cutoff is justified by the solute carrier family 26 member 9 (SLC26A9) protein (fig. S2), whose folded elements have a mean PAE between them of just over 10 Å, a standard PAE cutoff value(36–38), and would therefore be classified as non-interacting at that cutoff. However, an X-ray crystal structure of the SLC26A9 domain (RCSB PDB ID 7CH1)(39) aligns to the predicted full-length structure with an RMSD of 1.4 Å, indicating that mean PAE values above 10 Å can occur for elements with well-defined relative orientations. We therefore took a more conservative approach, increasing the PAE cutoff from 10 to 15 Å. In the AlphaFlex workflow, for proteins in category 3 we generate IDR ensembles while maintaining the AlphaFold relative orientations of the interacting folded domains, whereas IDR ensemble generation for proteins in category 2 does not maintain the AlphaFold relative orientations of the folded domains.
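The Stage 2 category assignment can be sketched as follows, under simplifying assumptions that are not taken from AlphaFlex itself: folded segments are taken to be the gaps between IDRs, only consecutive folded segments on either side of each internal IDR are compared, and both off-diagonal PAE blocks are averaged because AlphaFold PAE matrices are not symmetric (how AlphaFlex averages them is assumed here). `classify_protein`, `folded_segments` and `mean_pae_between` are hypothetical names.

```python
import numpy as np

PAE_CUTOFF = 15.0  # Å; mean PAE <= cutoff is treated as "interacting"


def mean_pae_between(pae, seg_a, seg_b):
    """Mean PAE over both off-diagonal blocks of two 1-based inclusive segments."""
    a = slice(seg_a[0] - 1, seg_a[1])
    b = slice(seg_b[0] - 1, seg_b[1])
    return float(np.concatenate([pae[a, b].ravel(), pae[b, a].ravel()]).mean())


def folded_segments(length, idrs):
    """Assumption: folded segments are the gaps between sorted IDR ranges."""
    segments, pos = [], 1
    for start, end in sorted(idrs):
        if start > pos:
            segments.append((pos, start - 1))
        pos = end + 1
    if pos <= length:
        segments.append((pos, length))
    return segments


def classify_protein(length, idrs, pae, cutoff=PAE_CUTOFF):
    """Return category 1, 2 or 3 (simplified sketch)."""
    folded = folded_segments(length, idrs)
    if len(folded) < 2:
        return 1  # fully disordered, or one folded region with terminal IDRs only
    # Consecutive folded segments are separated by an internal IDR.
    for left, right in zip(folded, folded[1:]):
        if mean_pae_between(pae, left, right) <= cutoff:
            return 3  # at least one IDR joins interacting folded segments
    return 2  # every internal IDR joins non-interacting folded segments


# Toy 60-residue protein: folded 1-20 and 41-60 joined by an IDR at 21-40.
pae = np.full((60, 60), 25.0)
print(classify_protein(60, [(21, 40)], pae))  # 2
pae[0:20, 40:60] = pae[40:60, 0:20] = 5.0    # make the two folded segments "interact"
print(classify_protein(60, [(21, 40)], pae))  # 3
```

Returning category 3 as soon as any internal IDR joins interacting segments mirrors the category definitions above, where category 3 proteins may additionally contain linker IDRs.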
Stage 3 of the workflow can use either IDPConformerGenerator(20, 21) or IDPForge(22) (fig. S3), both of which provide all-atom ensembles including hydrogen atom positions. IDPConformerGenerator statistically samples backbone torsion angles (ω, φ, ψ) based on their distributions for similar sequences in the RCSB PDB(20). IDPForge(22) is a machine learning tool trained on conformations from IDPConformerGenerator(20, 21), CALVADOS(40), and AlphaFold(2), and can generate atomistic conformations of proteins containing folded domains and IDRs. IDPForge ensembles have flexibility in the folded structures at the IDR boundaries(22), which may be more physically realistic than the fixed backbone coordinates at these boundaries in the IDPConformerGenerator approach.
Because generating conformations with IDPConformerGenerator for IDRs between two interacting (fixed) folded domains is computationally demanding, IDPForge was used to generate ensembles for category 3 proteins (203 AFX-IDPForge ensembles deposited in total); IDPForge is roughly 3 times faster than IDPConformerGenerator on this task. Both IDPConformerGenerator and IDPForge ensembles have been validated to reflect experimental structural properties of IDRs(20, 22), hence we chose to provide the community with both sets of ensembles using AlphaFlex; the AlphaFlex ensembles generated with IDPConformerGenerator are referred to as AFX-IDPCG, and those generated with IDPForge as AFX-IDPForge. The resulting ensembles of 100 conformations per protein were analyzed for global shape properties, intramolecular contacts, and local secondary structure elements.
At the time of writing, over half of the AlphaFlex database has been calculated: a total of 7,783 unique proteins, comprising 6,755 AFX-IDPCG ensembles from category 1, 824 AFX-IDPCG ensembles from category 2, and, from category 3, 203 AFX-IDPForge ensembles plus 113 AFX-IDPCG ensembles (112 redundant with AFX-IDPForge ensembles and 1 unique). These have reached stage 4 and are deposited in the PED(3). Because most of the currently calculated AlphaFlex ensembles are AFX-IDPCG models, and the global structural metrics of AFX-IDPCG and AFX-IDPForge ensembles are similar (see below), much of the analysis below focuses on AFX-IDPCG models. Note that we use “AlphaFlex” to refer to the general computational workflow, the resulting ensemble models and the collective database.
Global metrics of ensembles of proteins with IDRs in absence or presence of folded domains
All previous proteome-wide conformational databases of IDRs either defined an IDR and then generated its ensemble in isolation, without endpoints fixed to the folded domains, as in the database reported for CALVADOS(23), or consist of low-confidence single-conformer predictions, as is the case for AlphaFold2. In contrast, IDPConformerGenerator and IDPForge generate all-atom IDR ensembles in the context of the folded domains(21) (Fig. 1D). This difference in approach between AlphaFlex and CALVADOS or AlphaFold2 is evident when we compare the global structural metrics of radius of gyration (Rg), end-to-end distance (Ree), hydrodynamic radius (Rh), solvent accessible surface area (SASA), asphericity (A), and mean curvature (κ).
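Three of these global metrics (Rg, Ree and A) can be computed per conformer from Cartesian coordinates and then averaged over an ensemble. The sketch below is illustrative only: it assumes unweighted coordinates (for example, Cα atoms only; mass weighting or all atoms would give slightly different values) and uses the common gyration-tensor definition of asphericity, which is 0 for a spherically symmetric shape and 1 for a rod. Whether AlphaFlex uses exactly these conventions is an assumption, and Rh, SASA and curvature are not covered. All function names are hypothetical.

```python
import numpy as np


def gyration_tensor(coords):
    """3x3 gyration tensor of an (N, 3) coordinate array (unweighted)."""
    centered = coords - coords.mean(axis=0)
    return centered.T @ centered / len(coords)


def radius_of_gyration(coords):
    # The trace of the gyration tensor equals Rg^2.
    return float(np.sqrt(np.trace(gyration_tensor(coords))))


def end_to_end_distance(coords):
    # Distance between the first and last atoms (e.g., terminal Calpha atoms).
    return float(np.linalg.norm(coords[-1] - coords[0]))


def asphericity(coords):
    """0 for a spherically symmetric shape, 1 for a perfect rod."""
    l1, l2, l3 = np.linalg.eigvalsh(gyration_tensor(coords))
    numerator = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2
    return float(numerator / (2.0 * (l1 + l2 + l3) ** 2))


def ensemble_means(conformers):
    """Average each metric over an iterable of (N, 3) arrays, one per conformer."""
    metrics = [(radius_of_gyration(c), end_to_end_distance(c), asphericity(c)) for c in conformers]
    rg, ree, a = np.mean(metrics, axis=0)
    return {"Rg": float(rg), "Ree": float(ree), "A": float(a)}


# Three collinear points form a perfect rod.
rod = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0]])
print(round(radius_of_gyration(rod), 3), end_to_end_distance(rod), round(asphericity(rod), 3))
# 0.816 2.0 1.0
```

Averaging per-conformer values, rather than computing metrics on an averaged structure, keeps the ensemble nature of the comparison: an averaged IDR structure would be artificially compact.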
Fig. 2 compares AlphaFlex ensembles and AlphaFold2 structures of full-length IDR-containing proteins as a function of sequence length. While the two sets have partially overlapping distributions below a sequence length of 1000, AlphaFold2’s Rg and Rh plateau at much lower values (Fig. 2A, B), and its Ree is relatively flat as a function of increasing sequence length (Fig. 2C), suggesting that the conformers AlphaFold2 generates are anomalously collapsed. In contrast, AlphaFlex shows the expected increase in Rg, Rh and Ree with increasing length (Fig. 2A–C). AlphaFold2 tends towards a spherical overall shape above a length of 750, while AlphaFlex has a broader distribution of asphericity, centered around more rod-like shapes (Fig. 2D). SASA follows similar trends in the two sets (Fig. 2E). In addition, we analyzed the curvature of IDRs extracted from AlphaFold2 and AlphaFlex. IDRs from AlphaFold2, particularly the longer ones, cluster at lower curvature (~1), reflecting more extended chains, whereas the curvatures of extracted AlphaFlex IDRs follow a more normal distribution centered around 2, reflecting the presence of turns in the chains (Fig. 2F).
Fig. 2. Global structural properties of full-length proteins and extracted IDRs from AlphaFold compared to AlphaFlex ensembles.
Error bars on each red dot represent the standard error of the mean for the AFX-IDPCG ensembles (N = 100 conformers per protein; 7,783 deposited proteins analyzed). Units for A–C are ångströms (Å). AlphaFlex is plotted in red and AlphaFold in black. (A) Radius of gyration (Rg). (B) Hydrodynamic radius (Rh). (C) End-to-end distance (Ree). (D) Asphericity (A). (E) Solvent accessible surface area (SASA). (F) Mean curvature (κ) of extracted IDRs versus arc length.
Because CALVADOS generates IDRs in isolation, we also compared the Rg, Rh, Ree, A and SASA metrics of isolated IDRs from AFX-IDPCG and CALVADOS. Since CALVADOS uses a different definition of IDRs than AlphaFlex, only IDRs with exactly the same boundaries in both CALVADOS and AlphaFlex were analyzed. A total of 1,387 IDR sequences from the human proteome met this criterion, and the normalized Jensen-Shannon divergences between the distributions of each structural property were calculated (table S1.1). IDPConformerGenerator and CALVADOS show no statistically significant differences in 4 of the 5 global structural properties for isolated IDR ensembles (fig. S4), with only small differences found for asphericity (see also table S1.1).
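A normalized Jensen-Shannon comparison of two property distributions can be sketched as below. This is an assumed reading of “normalized”: both samples are histogrammed on shared bins (bin widths such as the 1 Å used for Rg in the Fig. 3 legend are a natural choice) and base-2 logarithms are used, which bound the divergence between 0 (identical) and 1 (non-overlapping). The function name is hypothetical. SciPy’s `scipy.spatial.distance.jensenshannon(p, q, base=2)` returns the square root of this quantity (the JS distance), so its output would need squaring to match.

```python
import numpy as np


def normalized_js_divergence(x, y, bin_width):
    """Base-2 Jensen-Shannon divergence between histograms of two samples (0 to 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    # floor + 1 bins guarantees the last edge lies beyond the largest value.
    n_bins = int(np.floor((hi - lo) / bin_width)) + 1
    edges = lo + bin_width * np.arange(n_bins + 1)  # shared bins for both samples
    p = np.histogram(x, bins=edges)[0].astype(float)
    q = np.histogram(y, bins=edges)[0].astype(float)
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        nonzero = a > 0  # terms with a = 0 contribute nothing
        return np.sum(a[nonzero] * np.log2(a[nonzero] / b[nonzero]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Synthetic Rg samples (Å) for illustration only; not data from this study.
rng = np.random.default_rng(0)
rg_a = rng.normal(25.0, 3.0, 1000)
rg_b = rng.normal(27.0, 3.0, 1000)
js = normalized_js_divergence(rg_a, rg_b, bin_width=1.0)
print(js, js > 0.15)  # compared against the 0.15 cutoff shown in Fig. 3F
```

Because the value depends on the bin width, comparisons across properties are only meaningful when each property keeps a fixed, stated bin width.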
To understand these global shape differences better, we analyzed proteins in each of the 3 categories. IDRs derived from the full-length AlphaFlex ensembles and IDRs generated in isolation by IDPConformerGenerator(20, 21) show no significant differences in any of the 5 global metrics for categories 1 and 2 (fig. S5.1 and S5.2, and tables S1.2 and S1.3). However, for category 3 proteins there are global structural differences between IDRs derived from full-length ensembles and those generated in isolation. To simplify the analysis, we consider only the 112 category 3 proteins with a single IDR connecting two interacting folded domains (“loop” IDR). Fig. 3 (and fig. S6 and table S1.4) shows that IDRs extracted from AlphaFlex ensembles have Ree and asphericity values corresponding to statistically more compact and more spherical conformations than IDRs built in isolation by IDPConformerGenerator. This is expected, since an IDR between two fixed folded domains predicted to interact with each other is constrained in its end-to-end distance and orientation, unlike the IDRs in categories 1 and 2. A comparison of the structural property distributions of the redundant full-length category 3 proteins between AFX-IDPCG and AFX-IDPForge shows no significant differences (fig. S7, tables S1.5 and S1.6).
Fig. 3. Distributions of ensemble structural properties (category 3, single loop IDR subset, N = 112 proteins) between IDRs generated by IDPConformerGenerator in isolation and IDRs extracted from AFX-IDPCG ensembles.
Blue distributions represent IDR ensembles generated in isolation by IDPConformerGenerator. Orange distributions represent IDRs extracted from full-length AlphaFlex ensembles, i.e., generated in the context of the folded domains. (A) Rg (Å) with 1 Å bins. (B) Rh (Å) with 1 Å bins. (C) Ree (Å) with 1 Å bins. (D) Asphericity with bins of 0.01 units. (E) SASA with 20 nm² bins. (F) Normalized Jensen-Shannon (JS) divergence of all structural properties. The black dashed line marks a 0.15 cutoff for significantly different properties. SHANK3 is excluded from these analyses since it contains terminal IDRs, IDRs between non-interacting domains and more than one loop IDR between interacting domains, making it out-of-distribution with respect to the single loop IDR-containing proteins that currently comprise the category 3 proteins analyzed here.
Local secondary structural features and domain contacts of proteins with IDRs
Many IDRs adopt transient α-helical structure, which is functionally relevant for binding in complexes involving IDRs(5). Disorder within IDRs is a measure of conformational heterogeneity, not a lack of fractional secondary structure. Hence, predicted conformational ensembles should not violate the known presence of intramolecular hydrogen bonds and the torsion angle distributions present in the Ramachandran maps of all proteins. We expect the torsion angle sampling of IDRs to fall between the distributions found in the full non-redundant high-resolution database taken from the RCSB PDB and the torsion angle distributions of residues not within an α-helix or β-strand (defined as loops, L+). We compared AlphaFlex IDRs with AlphaFold2 single-state IDRs and with CALVADOS IDR ensembles generated in isolation(23), using the AlphaFold2 (N = 14,792 proteins, 1 conformation per IDR), CALVADOS (N = 28,058 isolated IDRs, 100 conformations per IDR), and AlphaFlex (N = 7,783 proteins, 100 conformations per IDR) datasets. For the AlphaFold2 dataset, IDRs are defined as regions with pLDDT < 70. The entire CALVADOS representation of human proteome IDRs was downsampled from 1000 to 100 frames and then back-mapped to all-atom models using cg2all(41) to obtain secondary structure features. In contrast, IDPConformerGenerator and IDPForge, as used in the AlphaFlex workflow, provide atomistic models directly.
AFX-IDPCG and AFX-IDPForge IDRs have α and β region populations in the Ramachandran diagram that fall between the upper and lower bounds expected from the RCSB and RCSB-L+ proteins, reflecting unbiased sampling of these energetically favorable basins of torsion angle space in disordered regions(42). AlphaFold2 low-pLDDT regions and CALVADOS IDRs have similar distributions to each other, but show reduced α region populations compared to AlphaFlex IDRs, instead favoring the β and coil regions (Table 1). The low level of α region torsion angles within AlphaFold2 low-confidence regions has consequences for chain curvature. As noted above, longer AlphaFold2 IDRs cluster at lower curvature (~1), reflecting more extended chains, while the curvatures of AlphaFlex IDRs cluster around 2, owing to the more prominent α region torsion angles (Fig. 2F).
Table 1. Torsion angle distributions of the AlphaFold, CALVADOS, and AlphaFlex databases.
Alpha (α) regions of the Ramachandran diagram are defined as −180° < φ < 10° and −120° < ψ < 45°. Beta (β) regions are defined as −180° < φ < 0° and either −180° < ψ < −120° or 45° < ψ < 180°. Coil regions are defined as φ, ψ torsion angle pairs belonging to neither the α nor the β region. Torsion angle distributions of the entire IDPConformerGenerator database, which includes RCSB PDB X-ray crystal structures with a resolution of 2.0 Å or better as of 2024, are given as a reference, both for all residues and for residues not within regular α-helical or β structure (L+). Protein regions with pLDDT < 70 from the AlphaFold database were analyzed (N = 14,792 structures). For the CALVADOS ensembles (N = 28,085 IDRs, back-mapped to all-atom), AFX-IDPCG extracted IDRs (N = 10,921 IDRs extracted from the 7,783 deposited proteins), and AFX-IDPForge extracted IDRs (N = 203 ensembles), 100 conformers per ensemble were analyzed. Values for the AlphaFlex and CALVADOS databases are given as the mean percentage ± standard deviation over all structural conformations.
| Ensemble | % α ± SD | % β ± SD | % coil ± SD |
|---|---|---|---|
| RCSB PDB Database | 50.1* | 43.5* | 6.4* |
| RCSB PDB Database L+ | 35.8* | 53.1* | 11.1* |
| AlphaFold | 17.6* | 58.9* | 23.5* |
| CALVADOS | 21.1 ± 1.0 | 57.9 ± 1.9 | 21.0 ± 1.5 |
| AFX-IDPCG | 46.5 ± 7.0 | 47.1 ± 6.7 | 6.4 ± 3.0 |
| AFX-IDPForge | 36.6 ± 5.3 | 44.9 ± 7.7 | 18.5 ± 4.7 |

Values marked with an * have no associated standard deviation because they are not derived from ensembles.
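The α/β/coil assignment behind Table 1 can be sketched as a direct transcription of the boundaries stated in the table legend. This sketch assumes φ and ψ are given in degrees and keeps the strict inequalities exactly as written, so angle pairs lying exactly on a boundary (for example ψ = 45°) are counted as coil; the function names are hypothetical.

```python
import numpy as np


def classify_torsions(phi, psi):
    """Label (phi, psi) pairs in degrees as 'alpha', 'beta' or 'coil' (Table 1 boundaries)."""
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    alpha = (phi > -180) & (phi < 10) & (psi > -120) & (psi < 45)
    beta = (phi > -180) & (phi < 0) & (
        ((psi > -180) & (psi < -120)) | ((psi > 45) & (psi < 180))
    )
    labels = np.full(phi.shape, "coil", dtype="<U5")  # default: neither alpha nor beta
    labels[beta] = "beta"
    labels[alpha] = "alpha"  # the psi ranges are disjoint, so no pair is both
    return labels


def region_percentages(phi, psi):
    """Percentage of torsion pairs in each region, as reported per ensemble in Table 1."""
    labels = classify_torsions(phi, psi)
    return {r: float(100.0 * np.mean(labels == r)) for r in ("alpha", "beta", "coil")}


print(region_percentages([-60, -120, 60, -60], [-45, 130, 40, -150]))
# {'alpha': 25.0, 'beta': 50.0, 'coil': 25.0}
```

For an ensemble, the percentages would be computed per conformer and then summarized as the mean ± standard deviation reported in the table.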
We also calculated the percentages of DSSP(43)-defined secondary structure in IDRs from AlphaFlex, AlphaFold and CALVADOS (table S2). While the amount of stable hydrogen-bonded helix in IDRs is not precisely known, the experimental observation of fractional α-helix(44, 45) and the biological importance of α-helical IDR recognition elements(46, 47) suggest that it should be above 10%. We observe that IDRs extracted from the AlphaFlex ensembles have the highest percentage of DSSP-defined α-helix, reaching ~24% for IDRs of up to 500 residues and 23% for IDRs of at least 1000 residues (table S2.3), consistent with IDRs having substantial fractional α-helical structure(44, 45). Although AlphaFold2 has ~10% DSSP-defined α-helix across all pLDDT < 70 residues, this decreases to ~6% for IDRs longer than 100 residues, with minimal α-helical sampling (~1%) for IDRs with a sequence length of 1000 amino acids (table S2.4). CALVADOS completely lacks hydrogen-bonded helix in its IDR ensembles. This is expected, since coarse-grained models represent each residue as a single bead and therefore cannot capture the hydrogen bonding and electrostatic interactions required to form local secondary structure. It is also consistent with CALVADOS being optimized primarily for Rg, with poorer agreement with contacts derived from NOEs and PREs(22).
AlphaFlex ensembles also differ from AlphaFold2 in their intramolecular contacts between IDRs and folded domains. The PAE metric can be used to quantify the likelihood of contacts between IDRs (D) and folded domains (F). The mean PAE values between IDR and folded-domain residues reveal that approximately 6.7% (992) of the 14,792 AlphaFold2 proteins contain at least one IDR-folded domain pair with a mean PAE ≤ 15 Å when the IDR and folded domain are separated by at least 5 amino acid residues. This supports the presence of intramolecular IDR-folded domain interactions which, owing to the dynamic nature of IDRs, would be expected to produce a distribution of intramolecular Cα (IDR) to Cα (folded domain) distances. Figure 4 highlights two examples, the putative CENPB DNA-binding domain-containing protein 1 (UniProt ID B2RD01, Fig. 4A–C) and Antigen peptide transporter 1 (UniProt ID Q03518, Fig. 4D–F). This comparison shows AlphaFold2’s tendency to underrepresent IDR-folded domain contacts (Fig. 4A, 4D), whereas AlphaFlex exhibits intramolecular IDR-folded domain contacts ranging from close to distant (Fig. 4B, 4E). This distribution of distances is biologically relevant, providing access to folded domains and to IDR segments containing post-translational modification or protein binding sites, while also allowing potential regulatory contacts.
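One way to implement this IDR-folded domain PAE screen is sketched below. The phrase “separated by at least 5 amino acid residues” is interpreted here, as an assumption, as excluding residue pairs fewer than 5 positions apart in sequence; the PAE blocks in both directions are averaged, also an assumption. The function names are hypothetical and this is not the AlphaFlex implementation.

```python
import numpy as np


def mean_pae_idr_folded(pae, idr, folded, min_separation=5):
    """Mean PAE between an IDR and a folded segment (1-based inclusive ranges).

    Both PAE blocks (IDR->folded and folded->IDR) are averaged, and only residue
    pairs at least `min_separation` apart in sequence are included (an assumption).
    """
    i = np.arange(idr[0] - 1, idr[1])
    j = np.arange(folded[0] - 1, folded[1])
    ii, jj = np.meshgrid(i, j, indexing="ij")
    keep = np.abs(ii - jj) >= min_separation
    if not keep.any():
        return float("nan")  # no residue pairs far enough apart
    values = np.concatenate([pae[ii, jj][keep], pae[jj, ii][keep]])
    return float(values.mean())


def has_idr_folded_contact(pae, idrs, folded_segments, cutoff=15.0):
    """True if any IDR-folded segment pair has a mean PAE <= cutoff (in Å)."""
    for d in idrs:
        for f in folded_segments:
            value = mean_pae_idr_folded(pae, d, f)
            if not np.isnan(value) and value <= cutoff:
                return True
    return False


# Toy 40-residue protein: IDR 1-15 followed by a folded segment 16-40.
pae = np.full((40, 40), 30.0)
print(has_idr_folded_contact(pae, [(1, 15)], [(16, 40)]))  # False
pae[0:15, 15:40] = pae[15:40, 0:15] = 4.0
print(has_idr_folded_contact(pae, [(1, 15)], [(16, 40)]))  # True
```

Excluding sequence-adjacent pairs keeps the low PAE values that typically occur right at an IDR-domain boundary from dominating the average.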
Fig. 4. Comparison of Cα-Cα distance matrices derived from AlphaFold and AlphaFlex representations.
The first column shows the Cα-Cα distance matrix from the AlphaFold(2) predicted structure. The second column shows the mean Cα-Cα distance matrix over N = 100 AlphaFlex conformers. Matrices have been normalized (with 1 representing the longest distance). The last column shows the difference between the normalized Cα-Cα distance matrices, with the AlphaFold matrix subtracted from the mean AlphaFlex matrix. (A-C) Category 1 Putative CENPB DNA-binding domain-containing protein 1 (UniProt ID B2RD01). Total protein length: 187 aa. IDR ranges: 1–21, 66–187. (D-F) Category 2 Antigen peptide transporter 1 (UniProt ID Q03518). Total protein length: 808 aa. IDR ranges: 1–114, 135–231.
These Cα-Cα distance matrices also reflect the different IDR boundaries of AlphaFold2 (based on pLDDT confidence values) and AlphaFlex, the latter defining more residues as IDRs. The AlphaFlex ensembles for Putative CENPB DNA-binding domain-containing protein 1 (B2RD01) have N-terminal (residues 1–21) and C-terminal (residues 66–187) IDRs, while AlphaFold2 predicts residues 84–140 to be folded. The AlphaFlex ensemble for Antigen peptide transporter 1 (Q03518) contains an N-terminal IDR (residues 1–114) and a linker IDR (residues 135–231) and ends with a C-terminal folded domain, while AlphaFold2 has three confidently predicted folded regions (residues 89–92, 115–134, and 232–808). Misrepresenting an IDR as a potential folded domain is not only an error in assignment but also under-represents the conformational heterogeneity of IDRs, which can be closer to or further from the folded domain, as shown in Fig. 4C and 4F. In particular, IDRs between two folded domains that do not interact tightly (mean PAE between the folded domains > 15 Å) lead to wide ranges of distances between the folded domains (Fi and Fj).
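The normalized matrices and difference maps of Fig. 4 can be sketched as below, assuming Cα coordinates are available as NumPy arrays with the same residue order in the AlphaFold model and in every AlphaFlex conformer. Because each matrix is divided by its own maximum, the difference map compares relative rather than absolute distances: positive entries mark residue pairs that are relatively farther apart in the ensemble than in the single structure. The function names are hypothetical.

```python
import numpy as np


def ca_distance_matrix(ca):
    """Pairwise Calpha-Calpha distances for an (N, 3) coordinate array."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def normalized_mean_distance_matrix(ensemble):
    """Mean distance matrix over conformers (n_conf, N, 3), scaled so the maximum is 1."""
    mean_d = np.mean([ca_distance_matrix(c) for c in ensemble], axis=0)
    return mean_d / mean_d.max()


def normalized_difference(ensemble, single_structure):
    """Normalized ensemble-mean matrix minus normalized single-structure matrix."""
    ref = ca_distance_matrix(single_structure)
    return normalized_mean_distance_matrix(ensemble) - ref / ref.max()


# Toy example: a 3-residue "single structure" and a one-conformer "ensemble".
ref = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0]])
ens = np.array([[[0.0, 0, 0], [3, 0, 0], [4, 0, 0]]])
print(normalized_difference(ens, ref).round(2))
```

Averaging distances over conformers before normalizing (rather than normalizing each conformer) follows the caption’s description of a mean AlphaFlex matrix scaled so that the longest distance is 1.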
Biological relevance of AlphaFlex IDR ensembles
The AlphaFlex ensembles across all three categories are structurally different from the AlphaFold structures, as well as from the CALVADOS ensembles. This difference is not merely a matter of physically possible conformations but is critically relevant for biological function. Single-state structural representations from AlphaFold can hinder understanding of biological mechanisms through their often-misleading implications, while ensemble representations, such as those provided by AlphaFlex, can illuminate regulatory roles of IDR-containing proteins, including scaffolding and other binding processes(10, 28). These roles are especially evident for proteins in categories 2 and 3, with IDRs between non-interacting and interacting folded domains, respectively. For category 2 proteins, in which the IDRs are generated between non-interacting folded domains (in addition to any N- and C-terminal IDRs), the AlphaFlex ensembles show distributions of Rg and Ree consistent with a “beads-on-a-string” picture, whereas AlphaFold2 provides a single compact conformation of the full-length structure in which the IDRs are low-curvature chains that avoid steric clashes, often forming IDR “lassos” around central folded domains (Fig. 5A–F). The Rg and Ree distributions show that AlphaFlex ensembles on average span a broader range of compaction and extension. The difference in scale is apparent when all structures of an ensemble are viewed at once (Fig. 5), where a single AlphaFold2 structure is about one-fifth the scale of the full ensemble sampled by AlphaFlex. This has significant implications for the “reach” of the protein, with the AlphaFlex conformations enabling the IDR chain to bridge the longer distances important for scaffolding.
Fig. 5. Comparisons between AlphaFold2 predictions and AlphaFlex ensemble representations and biological relevance for five example proteins.
(A-B) Zinc finger MYM-type protein 6 (UniProt ID O95789), 1325 aa long, 449 disordered residues. Ensembles are presented aligned to the C-terminal folded domain (residues 808–1325), colored grey. (C-D) Histone deacetylase complex subunit SAP130 (UniProt ID Q9H0E3), 1048 aa long, 918 disordered residues. Ensembles are presented aligned to the C-terminal folded domain (residues 916–1033), colored grey. (E-F) SH3 and multiple ankyrin repeat domains protein 3 (SHANK3, UniProt ID Q9BYB0), 1806 aa long, 1360 disordered residues. Ensembles are presented aligned to the N-terminal folded domain (residues 76–414), colored grey. (G-H) CCR4-NOT transcription complex subunit 7 (CNOT7, UniProt ID Q9UIV1), 285 aa long, 25 disordered residues. Ensembles are presented aligned to the N-terminal folded domain (residues 1–260). (I-J) Eukaryotic translation initiation factor 4E-binding protein 2 (4E-BP2, UniProt ID Q13542), 120 aa long, predicted to be fully disordered. The first column shows AlphaFold representations of the full-length protein colored by pLDDT range (see Fig. 1 legend). The second column shows AlphaFlex ensemble representations of the full-length protein, in which orange regions are defined as disordered by the union of 5 disorder metrics and blue regions are the resulting folded domains, with structures taken from AlphaFold. AlphaFold2 structures are also shown in the second column when a different scale is used to better represent the size of the conformational ensemble. The third column shows boxplots of Rg and Ree (Å) for the AlphaFlex ensemble, with the single Rg and Ree values of the AlphaFold structure overlaid (red triangle). (K) AlphaFlex conformers (N = 10) of SHANK3, with residues that undergo post-translational modifications (as annotated in UniProt) shown as green spheres. The topmost structure, outlined in a black box, is the AlphaFold-predicted model of SHANK3, colored by AlphaFold pLDDT (see Fig. 1 legend). The bottom 10 conformations are shown at the same scale and orientation. In the AlphaFlex conformers, IDRs are colored orange and folded domains blue.
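The panels above show ensembles superposed on a single folded domain. A common way to do this is a least-squares (Kabsch) superposition fitted on the domain's atoms and then applied to the whole conformer, so that the IDRs move rigidly with their domain. The sketch below assumes matched Cα coordinates and 0-based domain indices, and the function names are hypothetical; it illustrates the standard algorithm and is not necessarily the procedure used to prepare the figure.

```python
import numpy as np


def kabsch(mobile: np.ndarray, reference: np.ndarray):
    """Optimal rotation R and centroids for superposing `mobile` onto `reference`.

    Both arrays are (n, 3) with matched atoms. Apply the result as
    (x - mobile_centroid) @ R.T + reference_centroid.
    """
    mob_c = mobile.mean(axis=0)
    ref_c = reference.mean(axis=0)
    H = (mobile - mob_c).T @ (reference - ref_c)  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mob_c, ref_c


def align_on_domain(conformer: np.ndarray, reference: np.ndarray, domain_idx) -> np.ndarray:
    """Fit on the folded-domain atoms only, then move the whole conformer."""
    R, mob_c, ref_c = kabsch(conformer[domain_idx], reference[domain_idx])
    return (conformer - mob_c) @ R.T + ref_c


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between matched (n, 3) coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```

In an ensemble, `reference` would be one chosen conformer (or the AlphaFold2 model) and `align_on_domain` would be applied to each conformer in turn, with `domain_idx` covering only the folded-domain residues.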
In addition, Fig. 5 demonstrates that global structural properties (Rg and Ree) have clear functional consequences: folded domains whose binding interactions are required for function (blue folded domains in Fig. 5 A–F) and post-translational modification sites (Fig. 5K, green spheres) are accessible in the AlphaFlex ensembles but more occluded in the AlphaFold conformation. Zinc finger MYM-type protein 6 is a nuclear protein involved in regulating cell morphology(26). It has 8 DNA- (and potentially RNA-) binding zinc fingers and many post-translational modifications (PTMs), including SUMOylation, ubiquitination and phosphorylation, which provide binding sites for SUMO-interacting and ubiquitin-interacting domains(48). The AlphaFold2 model, with its IDRs wrapped around the folded zinc finger domains, provides no access for nucleic acid binding or for the enzymes needed for PTMs, and leaves no space for the SUMO and ubiquitin additions (also see fig. S8.1). In contrast, the AlphaFlex ensemble, with its range of conformations, provides accessibility and space for binding partners, enzymes and PTMs (Fig. 5A–B).
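One simple way to quantify the accessibility argument above is to count, for each PTM site, how many other residues crowd around it in each conformer. The sketch below uses a Cα neighbor count within a cutoff as a crude burial proxy; the 10 Å cutoff, the "exposed" threshold of 8 neighbors, and the function names are hypothetical choices, not values from this work. A solvent-accessible surface area calculation (for example MDTraj's `shrake_rupley`) would be the more rigorous alternative.

```python
import numpy as np


def neighbor_counts(coords: np.ndarray, site_idx, cutoff: float = 10.0) -> np.ndarray:
    """Count other Cα atoms within `cutoff` Å of each site, per conformer.

    coords: (n_conformers, n_residues, 3); site_idx: 0-based residue indices.
    Returns an integer array of shape (n_conformers, n_sites).
    """
    sites = coords[:, site_idx, :]                                   # (C, S, 3)
    d = np.linalg.norm(sites[:, :, None, :] - coords[:, None, :, :], axis=-1)
    return (d < cutoff).sum(axis=-1) - 1  # subtract the site itself (distance 0)


def fraction_exposed(coords: np.ndarray, site_idx, cutoff: float = 10.0,
                     max_neighbors: int = 8) -> np.ndarray:
    """Fraction of conformers in which each site has few neighbors (burial proxy)."""
    return (neighbor_counts(coords, site_idx, cutoff) <= max_neighbors).mean(axis=0)


# Toy ensemble of a 20-residue chain: one extended conformer, one fully collapsed.
extended = np.column_stack([np.arange(20) * 3.8, np.zeros(20), np.zeros(20)])
collapsed = np.zeros((20, 3))   # every residue piled onto a single point
ensemble = np.stack([extended, collapsed])
sites = [0, 10]                 # hypothetical PTM-site indices
print(fraction_exposed(ensemble, sites))
```

Applied to a single AlphaFold2 model, the same function returns only 0 or 1 per site, whereas an ensemble yields a graded fraction that reflects how often each site is exposed.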
Another case is histone deacetylase complex subunit SAP130, which functions in the assembly, regulatory interactions, and/or enzymatic activity of the mSin3A corepressor complex that plays a role in transcription(27). Like Zinc finger MYM-type protein 6, SAP130 is highly post-translationally modified, including by arginine methylation, phosphorylation, SUMOylation, and ubiquitination(48) (Fig. 5C–D and also fig. S8.2). The AlphaFold2 model, with very long IDRs wrapped around a small folded domain, makes accessible neither the folded element that engages in regulatory interactions within the mSin3A complex nor the sites targeted by the enzymes needed for PTMs and subsequent regulatory binding, whereas the AlphaFlex ensemble samples a range of accessibilities.
An important example of a scaffold protein is SH3 and multiple ankyrin repeat domains protein 3 (SHANK3, Fig. 5E–F, K), which phase separates and mediates assembly of the postsynaptic density (PSD), a biomolecular condensate on the postsynaptic side of synapses that controls responses to synaptic stimulation in neurons, including local actin cytoskeletal dynamics(28). SHANK3 has multiple folded binding domains (ankyrin repeats, SH3, PDZ, and SAM) and long IDRs that are required for the multivalent interactions underlying its phase separation and scaffolding functions. It also has many PTM sites(48), highlighted in Fig. 5K. As in the previous examples, the AlphaFold2 representation places all the folded binding domains at the center, with the long IDRs wrapped around them in a roughly spherical shape that is incompatible with the phase separation and scaffolding function of the protein and with enzyme access to the PTM sites. The AlphaFlex model, in contrast, shows the long reach of the individual binding domains in this primarily “beads-on-a-string” scaffold.
Other examples highlighting the biological relevance of AlphaFlex ensembles relative to AlphaFold2 structures include enzymes and conditionally folding IDRs. CNOT7 is a subunit of CCR4-NOT, the major cellular poly(A) RNA deadenylase complex, which regulates RNA stability(49). AlphaFold2 predicts, with low confidence, an α-helix that lies in the CNOT7 active site and would block enzymatic function. In contrast, the AlphaFlex ensemble contains heterogeneous conformations, most with the active site open, consistent with enzymatic activity, and some with the IDR closer to the active site, which could be important for regulating activity (Fig. 5G–H). The IDP 4E-BP2 has a conditionally folding IDR. In isolation, NMR data show a fractionally populated α-helix that is stabilized by binding to eIF4E (the mRNA 5’ “cap-binding protein”)(50); this binding inhibits mRNA translation by blocking eIF4E binding to eIF4G in the translation initiation complex(51). Translation is stimulated by multi-site phosphorylation of 4E-BP2(52), which leads to conditional folding of a β-sheet domain that includes the α-helical residues, reducing binding to eIF4E and enabling eIF4G to compete(24). The AlphaFold2 conformation of 4E-BP2 contains the folded β-sheet domain, which obscures the mechanism of phosphoregulation of translation by presenting a folded conformation that lacks accessibility to the key phosphoregulatory sites and cannot bind eIF4E. In contrast, the AlphaFlex ensemble (Fig. 5I–J) represents 4E-BP2 as an IDP with up to 50% fractional α-helical structure for the eIF4E-binding α-helix and with accessible phosphorylation sites (fig. S9), consistent with its functional state in regulating translation.
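The fractional α-helicity quoted for 4E-BP2 is a per-residue population across the ensemble. A minimal sketch is shown below, assuming per-conformer DSSP-style secondary-structure strings are already available (for example from MDTraj's `compute_dssp`, which in its simplified mode assigns "H" to helical residues). The function names are hypothetical, and which codes count as helix is a choice the user must make.

```python
import numpy as np


def fractional_helicity(dssp_per_conformer, helix_codes=("H",)) -> np.ndarray:
    """Per-residue fraction of conformers assigned to one of `helix_codes`."""
    if len({len(s) for s in dssp_per_conformer}) != 1:
        raise ValueError("all conformers must have the same number of residues")
    ss = np.array([list(s) for s in dssp_per_conformer])  # (n_conformers, n_res)
    return np.isin(ss, list(helix_codes)).mean(axis=0)


def max_helicity(frac: np.ndarray, start: int, end: int) -> float:
    """Highest per-residue helicity within a 1-based, inclusive segment."""
    return float(frac[start - 1:end].max())


# Four toy conformers of a 5-residue segment ("C" = coil, "H" = helix).
toy = ["CHHHC", "CCHHC", "CCCCC", "CHHCC"]
frac = fractional_helicity(toy)   # per-residue fractions 0, 0.5, 0.75, 0.5, 0
print(frac, max_helicity(frac, 2, 4))
```

Reporting the maximum over the residues of a binding helix, as `max_helicity` does, is one way to express an "up to X% helical" statement; averaging over the segment would give a lower, more conservative number.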
DISCUSSION AND CONCLUSION
Physically meaningful models of full-length proteins are a key first step toward biological insight into structure-function relationships, which are better described as structural ensemble-function relationships. These ensemble representations are essential, since single conformations, including those provided by AlphaFold2, cannot explain most of the relevant biology. AlphaFold2 incorrectly predicts confident structures for conditionally folding IDRs(16), which is particularly problematic for IDRs that adopt multiple conformations in different biological contexts, such as 4E-BP2(24). More generally, the single conformation AlphaFold2 provides for IDRs is often unphysically collapsed into “lasso-like” conformations surrounding clustered folded domains that satisfy only steric clash restraints rather than physically reasonable global shape properties. Such structural models also imply contacts between domains that are not confidently predicted by AlphaFold2’s own PAE metric and, just as importantly, they unintentionally yield structures that violate known biological principles and functional roles. Thus, although the AlphaFold developers have been transparent that IDR structure predictions are of low confidence, these predictions are qualitatively wrong in their physical dimensions and, most critically, biologically implausible.
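The distinction between satisfying steric restraints and having a physically reasonable global shape can be made concrete: a clash check is purely local and says nothing about overall dimensions. The sketch below counts non-adjacent Cα pairs closer than a cutoff and compares two toy chains that both pass the check but differ greatly in size. The 3.0 Å threshold, the sequence-separation rule, and the function names are hypothetical choices for illustration, not the criteria used by AlphaFold2 or AlphaFlex.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """All-against-all distance matrix for an (n, 3) coordinate array."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def count_clashes(coords: np.ndarray, cutoff: float = 3.0, min_seq_sep: int = 2) -> int:
    """Count Cα pairs closer than `cutoff` Å, ignoring pairs fewer than
    `min_seq_sep` residues apart in sequence (i.e. bonded neighbors)."""
    d = pairwise_distances(coords)
    i, j = np.triu_indices(len(coords), k=min_seq_sep)
    return int((d[i, j] < cutoff).sum())


def max_extent(coords: np.ndarray) -> float:
    """Largest intramolecular distance, a simple measure of global size."""
    return float(pairwise_distances(coords).max())


# Two clash-free 30-residue toy chains with 3.8 Å spacing between consecutive Cα:
n = 30
straight = np.column_stack([np.arange(n) * 3.8, np.zeros(n), np.zeros(n)])
r = 3.8 / (2 * np.sin(np.pi / n))         # ring radius giving 3.8 Å chords
t = 2 * np.pi * np.arange(n) / n
ring = np.column_stack([r * np.cos(t), r * np.sin(t), np.zeros(n)])
print(count_clashes(straight), count_clashes(ring))   # both chains are clash-free
print(max_extent(straight), max_extent(ring))         # about 110 Å versus about 36 Å
```

Both toy chains pass the clash check, yet their global sizes differ roughly threefold, which is why clash-free IDR geometry alone cannot validate a model's dimensions.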
The AlphaFlex ensembles provide powerful insights into the functional roles of proteins with intrinsic disorder through more effective modeling of conformational sampling. AlphaFlex combines the strength of AlphaFold(2) predictions of folded domains with IDPConformerGenerator and IDPForge, low-computational-cost (table S3) atomistic IDR modeling strategies that are well validated against experimental solution data for proteins with intrinsic disorder(20, 22). Both agree well not only with experimental Rg values but also with NOE and PRE distance data and with chemical shifts. In addition, because AlphaFlex uses IDPConformerGenerator and IDPForge, its IDRs sample local secondary structure, predominantly fractional α-helical elements(20, 22). This local structure also gives AlphaFlex IDRs greater curvature than the low-confidence regions of AlphaFold2 models, which is much more consistent with their biological roles.
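Comparisons with NOE and PRE data require ensemble averaging that respects the steep distance dependence of these observables, both of which scale approximately as r^-6. A common convention, sketched here under the assumption that per-conformer proton-proton distances have already been extracted, is the r^-6 average, r_eff = <r^-6>^(-1/6), which is dominated by the shortest distances sampled. The function names and the simple upper-bound violation check are hypothetical; the validation protocols actually used are described in refs. 20 and 22.

```python
import numpy as np


def r6_average(distances) -> np.ndarray:
    """Ensemble <r^-6>^(-1/6) per restraint.

    distances: (n_conformers, n_restraints) array-like in Å, all > 0.
    """
    d = np.asarray(distances, dtype=float)
    return np.mean(d ** -6.0, axis=0) ** (-1.0 / 6.0)


def upper_bound_violations(distances, upper_bounds, tolerance: float = 0.5) -> np.ndarray:
    """Boolean mask of restraints whose r^-6 average exceeds its bound by > tolerance Å."""
    return r6_average(distances) - np.asarray(upper_bounds, dtype=float) > tolerance


# One restraint, two conformers: a close contact (3 Å) and a distant one (100 Å).
d = [[3.0], [100.0]]
print(r6_average(d))   # ~3.37 Å, far below the 51.5 Å arithmetic mean
```

The example shows why a heterogeneous ensemble can satisfy short NOE restraints even when most conformers are extended: a minority of close contacts dominates the average.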
Our AlphaFlex workflow enables interchangeable modeling strategies and holistic approaches to defining IDR boundaries on a proteome-wide scale, and it provides an accessible way to visualize full-length ensembles containing IDRs through the PED(3). Other recently described methods with different philosophies for modeling IDRs include Ensemblify(36) and AF-CALVADOS(53). Both Ensemblify and AF-CALVADOS rely heavily on AlphaFold’s pLDDT and PAE metrics to define IDR boundaries and interacting folded domains. Although we agree that the PAE score is a valuable metric for interactions between confidently predicted folded domains, the per-residue pLDDT is not the best predictor of intrinsic protein disorder(54) and can lead to biologically inconsistent IDR assignments that obscure functional states, particularly for conditionally folding IDRs(54). AlphaFlex therefore defines IDRs as the union of the regions identified by multiple disorder predictors.
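Defining IDRs as the union of several predictors reduces to a logical OR over per-residue disorder calls, followed by extraction of contiguous segments. The sketch below assumes each predictor's output has already been binarized (True = disordered). The 15-residue minimum segment length is borrowed from the IDR size criterion used later in this section to select AlphaFold database entries; whether AlphaFlex applies it at this step is an assumption, and the function names are hypothetical.

```python
import numpy as np


def union_disorder(masks) -> np.ndarray:
    """A residue is disordered if any predictor calls it disordered."""
    stacked = np.vstack([np.asarray(m, dtype=bool) for m in masks])
    return stacked.any(axis=0)


def idr_segments(mask, min_len: int = 15):
    """Contiguous disordered runs of at least `min_len` residues,
    returned as 1-based, inclusive (start, end) pairs."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.concatenate([[False], mask, [False]]).astype(int)
    edges = np.flatnonzero(np.diff(padded))   # run starts and stops alternate
    starts, stops = edges[::2], edges[1::2]   # 0-based start, exclusive stop
    return [(int(s) + 1, int(e)) for s, e in zip(starts, stops) if e - s >= min_len]


n = 40
pred_a = np.zeros(n, dtype=bool)
pred_a[0:10] = True    # predictor A: residues 1-10
pred_b = np.zeros(n, dtype=bool)
pred_b[5:25] = True    # predictor B: residues 6-25
pred_c = np.zeros(n, dtype=bool)
pred_c[30:35] = True   # predictor C: residues 31-35 (shorter than min_len)
idrs = idr_segments(union_disorder([pred_a, pred_b, pred_c]))
print(idrs)  # [(1, 25)]
```

Taking the union makes the IDR call deliberately inclusive: a residue is modeled as disordered if any predictor flags it, which is why AlphaFlex boundaries generally enclose more residues than pLDDT-based ones.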
Using these boundaries and PAE values, our AlphaFlex workflow distinguished three categories of proteins with IDRs: those with N- and/or C-terminal IDRs (category 1), those with IDRs between non-interacting folded domains (category 2), and those with IDRs between stably interacting folded domains (category 3). Ensembles for all categories are generated in the context of the folded domains to obtain representative structures that provide better models for understanding biological function, for example mediating complex formation or cellular scaffolding(5, 10, 28). We found that, when folded domains interact (category 3), they bias the structural landscape of the IDRs in the full-length protein relative to the same IDRs generated in isolation. Although terminal IDRs (category 1) and linker IDRs between non-interacting folded domains (category 2) show no differences in global structural properties compared with IDRs generated in isolation, full-length ensembles must still be modeled to avoid steric clashes and to generate Rg and Ree distributions that capture the reach and accessibility folded domains need to fulfill scaffolding and regulatory binding functions, such as in large multi-domain complexes.
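The three-way classification can be sketched as a small decision rule over IDR boundaries and the AlphaFold2 PAE matrix. In the illustration below, folded domains are the residue runs left between IDRs, a pair of domains counts as interacting when their mean inter-domain PAE (averaged over both off-diagonal blocks, since PAE is asymmetric) falls below a cutoff, and any interacting pair places the protein in category 3. The 10 Å cutoff, the any-pair rule, and the function names are hypothetical placeholders rather than the AlphaFlex criteria, and the sketch assumes the protein has at least one folded domain.

```python
from itertools import combinations

import numpy as np


def folded_domains(n_res: int, idrs):
    """Runs of non-IDR residues, as 1-based inclusive (start, end) pairs."""
    is_idr = np.zeros(n_res, dtype=bool)
    for start, end in idrs:
        is_idr[start - 1:end] = True
    domains, run_start = [], None
    for k in range(n_res):
        if not is_idr[k] and run_start is None:
            run_start = k
        elif is_idr[k] and run_start is not None:
            domains.append((run_start + 1, k))
            run_start = None
    if run_start is not None:
        domains.append((run_start + 1, n_res))
    return domains


def mean_interdomain_pae(pae: np.ndarray, a, b) -> float:
    """Mean PAE between domains a and b, symmetrized over both blocks."""
    ab = pae[a[0] - 1:a[1], b[0] - 1:b[1]]
    ba = pae[b[0] - 1:b[1], a[0] - 1:a[1]]
    return 0.5 * (ab.mean() + ba.mean())


def categorize(n_res: int, idrs, pae: np.ndarray, pae_cutoff: float = 10.0) -> int:
    """1: a single folded domain with terminal IDRs; 2: linker IDRs between
    non-interacting domains; 3: at least one interacting pair of domains."""
    domains = folded_domains(n_res, idrs)
    if len(domains) <= 1:
        return 1
    if any(mean_interdomain_pae(pae, a, b) < pae_cutoff
           for a, b in combinations(domains, 2)):
        return 3
    return 2


# Toy 100-residue protein with uniformly high (unconfident) PAE.
pae = np.full((100, 100), 25.0)
print(categorize(100, [(1, 10), (91, 100)], pae))  # 1
print(categorize(100, [(1, 10), (41, 60)], pae))   # 2
pae[10:40, 60:100] = pae[60:100, 10:40] = 5.0       # domains 11-40 and 61-100 interact
print(categorize(100, [(1, 10), (41, 60)], pae))   # 3
```

Because the rule depends on the IDR boundaries, a disorder-predictor union that enlarges IDRs can split what pLDDT treats as one domain into several, which changes the category assigned.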
For the AlphaFold database entries of the human proteome with IDRs of 15 residues or more (N = 14,792), we have so far calculated AFX-IDPCG ensembles for all category 1 proteins, AFX-IDPCG ensembles for a subset (24%) of category 2 proteins, and AFX-IDPForge and AFX-IDPCG ensembles for a small subset (4%) of the most difficult category 3 set. Complete sets of both AFX-IDPCG and AFX-IDPForge ensembles across all three categories are in progress. Current and future ensembles generated by the AlphaFlex workflow may be used as information-dense training data for machine learning. Because IDPConformerGenerator enables custom secondary structure sampling (CSSS) biased by NMR chemical shifts(20), AFX-IDPCG ensembles can be regenerated using experimental data where available. In addition, AlphaFlex may be used to model protein ensembles beyond the canonical human proteome, leveraging the strengths of experimentally validated protein prediction and modeling strategies.
Depositing AlphaFlex ensembles in the PED creates a community resource that enables proteome-wide analyses and biological interpretation of the full protein conformational space, highlighting the importance for function of the global and local structural properties of IDRs in the context of full-length proteins. Furthermore, UniProt currently links to PED ensembles (listed just below AlphaFold in the “Structure” section), making our conformational ensembles of proteins with IDRs straightforward to access; for example, see Epsin-2 (UniProt ID O95208) and Homeobox protein Hox-C10 (UniProt ID Q9NYD6). Importantly, AlphaFlex predictions provide a physically relevant conformational ensemble of monomeric proteins in dilute solution. Modeling the proteome’s conformational ensembles in other environments, including lipid bilayers for membrane proteins, complexes (homo- and hetero-multimers) and biomolecular condensates, represents the next pressing frontier.
Supplementary Material
Acknowledgments:
Z.H.L. and J.D.F.-K. would like to acknowledge Dr. Simon Sharpe and Dr. Hue Sun Chan for contributing computational hardware for the AlphaFlex ensemble calculations. S.C.E.T. and A.M. Monzon acknowledge ELIXIR, the research infrastructure for life-science data; views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. The authors acknowledge Dr. Jonathon Ditlev for helpful discussions and scientific input.
Funding:
National Institutes of Health grant R01GM127627 (THG, JDFK)
Canadian Institutes of Health Research grant #PJT-180472 (JDFK, AMMoses)
Canada Research Chairs Program (JDFK, AMMoses)
Natural Sciences and Engineering Research Council of Canada grant RGPIN-2024-05725 (JDFK), fellowship PGS D 588933 2024 (ZHL)
National Institute of General Medical Sciences grant R01GM147677 (NLF)
European Cooperation in Science and Technology Action ML4NGP grant CA21160 (SCET, AMMonzon)
PNRR project ELIXIRxNextGenIT grant IR0000010 (SCET, AMMonzon)
Footnotes
Competing interests: Authors declare that they have no competing interests.
Data and materials availability:
The AlphaFlex workflow and analysis scripts are available under a subdirectory of the IDPConformerGenerator repository on GitHub after update v0.8.X (github.com/julie-forman-kay-lab/IDPConformerGenerator). The code for conformer generation with IDPForge is available at https://github.com/THGLab/IDPForge.git. Ensembles calculated in this study have been uploaded to Zenodo (10.5281/zenodo.17684898), with all AlphaFlex ensembles being uploaded to the Protein Ensemble Database on a rolling basis. Users can search the PED with the keyword “AlphaFlex” to identify AFX-IDPCG and AFX-IDPForge ensembles.
References
1. Eisenberg D., The discovery of the alpha-helix and beta-sheet, the principal structural features of proteins. Proc. Natl. Acad. Sci. U. S. A. 100, 11207–11210 (2003).
2. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., Bridgland A., Meyer C., Kohl S. A. A., Ballard A. J., Cowie A., Romera-Paredes B., Nikolov S., Jain R., Adler J., Back T., Petersen S., Reiman D., Clancy E., Zielinski M., Steinegger M., Pacholska M., Berghammer T., Bodenstein S., Silver D., Vinyals O., Senior A. W., Kavukcuoglu K., Kohli P., Hassabis D., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
3. Ghafouri H., Lazar T., Del Conte A., Tenorio Ku L. G., PED Consortium, Tompa P., Tosatto S. C. E., Monzon A. M., PED in 2024: improving the community deposition of structural ensembles for intrinsically disordered proteins. Nucleic Acids Res. 52, D536–D544 (2024).
4. Bhowmick A., Brookes D. H., Yost S. R., Dyson H. J., Forman-Kay J. D., Gunter D., Head-Gordon M., Hura G. L., Pande V. S., Wemmer D. E., Wright P. E., Head-Gordon T., Finding our way in the Dark Proteome. J. Am. Chem. Soc. 138, 9730–9742 (2016).
5. Liu Z. H., Tsanai M., Zhang O., Head-Gordon T., Forman-Kay J. D., Biological insights from integrative modeling of intrinsically disordered protein systems. Curr. Opin. Struct. Biol. 93, 103063 (2025).
6. Burger V., Gurry T., Stultz C., Intrinsically disordered proteins: Where computation meets experiment. Polymers (Basel) 6, 2684–2719 (2014).
7. Pastic A., Nosella M. L., Kochhar A., Liu Z. H., Forman-Kay J. D., D’Amours D., Chromosome compaction is triggered by an autonomous DNA-binding module within condensin. Cell Rep. 43, 114419 (2024).
8. Toyama Y., Takeuchi K., Shimada I., Regulatory role of the N-terminal intrinsically disordered region of the DEAD-box RNA helicase DDX3X in selective RNA recognition. Nat. Commun. 16, 7762 (2025).
9. Ahangama Liyanage L., McCready F., Chung S., Arsenault J., Wei W., Lin X., Wang L.-Y., Ellis J., Ditlev J. A., Disease-linked mutations dysregulate neuronal condensate physical properties, composition, and RNA translation. bioRxiv 2024.11.01.621623 (2024).
10. Reed B. J., Locke M. N., Gardner R. G., A conserved deubiquitinating enzyme uses intrinsically disordered regions to scaffold multiple protein interaction sites. J. Biol. Chem. 290, 20601–20612 (2015).
11. Borcherds W., Bremer A., Borgia M. B., Mittag T., How do intrinsically disordered protein regions encode a driving force for liquid-liquid phase separation? Curr. Opin. Struct. Biol. 67, 41–50 (2021).
12. Houser J. R., Cho H. W., Hayden C. C., Yang N. X., Wang L., Lafer E. M., Thirumalai D., Stachowiak J. C., Molecular mechanisms of steric pressure generation and membrane remodeling by disordered proteins. Biophys. J. 121, 3320–3333 (2022).
13. Oh C., Buckley P. M., Choi J., Hierro A., DiMaio D., Sequence-independent activity of a predicted long disordered segment of the human papillomavirus type 16 L2 capsid protein during virus entry. Proc. Natl. Acad. Sci. U. S. A. 120, e2307721120 (2023).
14. Keul N. D., Oruganty K., Schaper Bergman E. T., Beattie N. R., McDonald W. E., Kadirvelraj R., Gross M. L., Phillips R. S., Harvey S. C., Wood Z. A., The entropic force generated by intrinsically disordered segments tunes protein function. Nature 563, 584–588 (2018).
15. Tsang B., Pritišanac I., Scherer S. W., Moses A. M., Forman-Kay J. D., Phase separation as a missing mechanism for interpretation of disease mutations. Cell 183, 1742–1756 (2020).
16. Alderson T. R., Pritišanac I., Kolarić Đ., Moses A. M., Forman-Kay J. D., Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc. Natl. Acad. Sci. U. S. A. 120, e2304302120 (2023).
17. Senanayaka D., Zeng D., Alishiri S., Martin W. J., Moore K. I., Patel R., Luka Z., Hirschi A., Reiter N. J., Autoregulatory mechanism of enzyme activity by the nuclear localization signal of lysine-specific demethylase 1. J. Biol. Chem. 300, 107607 (2024).
18. Martin E. W., Holehouse A. S., Peran I., Farag M., Incicco J. J., Bremer A., Grace C. R., Soranno A., Pappu R. V., Mittag T., Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 367, 694–699 (2020).
19. Fleming J., Magana P., Nair S., Tsenkov M., Bertoni D., Pidruchna I., Lima Afonso M. Q., Midlik A., Paramval U., Žídek A., Laydon A., Kovalevskiy O., Pan J., Cheng J., Avsec Ž., Bycroft C., Wong L. H., Last M., Mirdita M., Steinegger M., Kohli P., Váradi M., Velankar S., AlphaFold Protein Structure Database and 3D-Beacons: New data and capabilities. J. Mol. Biol. 437, 168967 (2025).
20. Teixeira J. M. C., Liu Z. H., Namini A., Li J., Vernon R. M., Krzeminski M., Shamandy A. A., Zhang O., Haghighatlari M., Yu L., Head-Gordon T., Forman-Kay J. D., IDPConformerGenerator: A flexible software suite for sampling the conformational space of disordered protein states. J. Phys. Chem. A 126, 5985–6003 (2022).
21. Liu Z. H., Teixeira J. M. C., Zhang O., Tsangaris T. E., Li J., Gradinaru C. C., Head-Gordon T., Forman-Kay J. D., Local Disordered Region Sampling (LDRS) for ensemble modeling of proteins with experimentally undetermined or low confidence prediction segments. Bioinformatics 39, btad739 (2023).
22. Zhang O., Liu Z. H., Forman-Kay J. D., Head-Gordon T., Deep learning of proteins with local and global regions of disorder. arXiv [q-bio.BM] (2025). http://arxiv.org/abs/2502.11326.
23. Tesei G., Trolle A. I., Jonsson N., Betz J., Knudsen F. E., Pesce F., Johansson K. E., Lindorff-Larsen K., Conformational ensembles of the human intrinsically disordered proteome. Nature 626, 897–904 (2024).
24. Dawson J. E., Bah A., Zhang Z., Vernon R. M., Lin H., Chong P. A., Vanama M., Sonenberg N., Gradinaru C. C., Forman-Kay J. D., Non-cooperative 4E-BP2 folding with exchange between eIF4E-binding and binding-incompatible states tunes cap-dependent translation inhibition. Nat. Commun. 11, 3146 (2020).
25. Lee J., Taneva S. G., Holland B. W., Tieleman D. P., Cornell R. B., Structural basis for autoinhibition of CTP:phosphocholine cytidylyltransferase (CCT), the regulatory enzyme in phosphatidylcholine synthesis, by its membrane-binding amphipathic helix. J. Biol. Chem. 289, 1742–1755 (2014).
26. Kamaliyan Z., Clarke T. L., Zinc finger proteins: guardians of genome stability. Front. Cell Dev. Biol. 12, 1448789 (2024).
27. Fleischer T. C., Yun U. J., Ayer D. E., Identification and characterization of three new components of the mSin3A corepressor complex. Mol. Cell. Biol. 23, 3456–3467 (2003).
28. Zeng M., Chen X., Guan D., Xu J., Wu H., Tong P., Zhang M., Reconstituted postsynaptic density as a molecular platform for understanding synapse formation and plasticity. Cell 174, 1172–1187.e16 (2018).
29. Irwin R., Harkness R. W., Forman-Kay J. D., A FRET-based assay and computational tools to quantify enzymatic rates and explore the mechanisms of RNA deadenylases in heterogeneous environments. Methods Mol. Biol. 2723, 69–91 (2024).
30. Brotzakis Z. F., Zhang S., Murtada M. H., Vendruscolo M., AlphaFold prediction of structural ensembles of disordered proteins. Nat. Commun. 16, 1632 (2025).
31. Lotthammer J. M., Hernández-García J., Griffith D., Weijers D., Holehouse A. S., Emenecker R. J., Metapredict enables accurate disorder prediction across the Tree of Life. bioRxiv 2024.11.05.622168 (2024).
32. Hu G., Katuwawala A., Wang K., Wu Z., Ghadermarzi S., Gao J., Kurgan L., flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat. Commun. 12, 4438 (2021).
33. Redl I., Fisicaro C., Dutton O., Hoffmann F., Henderson L., Owens B. M. J., Heberling M., Paci E., Tamiola K., ADOPT: intrinsic protein disorder prediction through deep bidirectional transformers. NAR Genom. Bioinform. 5, lqad041 (2023).
34. Hanson J., Yang Y., Paliwal K., Zhou Y., Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks. Bioinformatics 33, 685–692 (2017).
35. Holehouse A. S., Kragelund B. B., The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 25, 187–211 (2024).
36. Fernandes N. P., Gomes T., Cordeiro T. N., Ensemblify: a user-friendly tool for generating ensembles of intrinsically disordered regions of AlphaFold and user-defined models. bioRxiv 2025.08.26.672300 (2025).
37. Dall’Armellina F., Urbé S., Rigden D. J., AlphaFold-driven discovery of ORP-PIP phosphatase interactions using new generation confidence scores. bioRxiv 2025.09.09.675126 (2025).
38. Harmalkar A., Lyskov S., Gray J. J., Reliable protein-protein docking with AlphaFold, Rosetta, and replica-exchange. eLife (2025). doi:10.7554/elife.94029.2.
39. Chi X., Jin X., Chen Y., Lu X., Tu X., Li X., Zhang Y., Lei J., Huang J., Huang Z., Zhou Q., Pan X., Structural insights into the gating mechanism of human SLC26A9 mediated by its C-terminal sequence. Cell Discov. 6, 55 (2020).
40. Tesei G., Lindorff-Larsen K., Improved predictions of phase behaviour of intrinsically disordered proteins by tuning the interaction range. Open Res. Eur. 2, 94 (2022).
41. Pang Y. T., Yang L., Gumbart J. C., From simple to complex: Reconstructing all-atom structures from coarse-grained models using cg2all. Structure 32, 5–7 (2024).
42. Paiz E. A., Lewis K. A., Whitten S. T., Structural and energetic characterization of the denatured state from the perspectives of peptides, the coil library, and intrinsically disordered proteins. Molecules 26, 634 (2021).
43. Kabsch W., Sander C., Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
44. Dyson H. J., Wright P. E., NMR illuminates intrinsic disorder. Curr. Opin. Struct. Biol. 70, 44–52 (2021).
45. Camacho-Zarco A. R., Schnapka V., Guseva S., Abyzov A., Adamski W., Milles S., Jensen M. R., Zidek L., Salvi N., Blackledge M., NMR provides unique insight into the functional dynamics and interactions of intrinsically disordered proteins. Chem. Rev. 122, 9331–9356 (2022).
46. Oldfield C. J., Cheng Y., Cortese M. S., Romero P., Uversky V. N., Dunker A. K., Coupled folding and binding with alpha-helix-forming molecular recognition elements. Biochemistry 44, 12454–12470 (2005).
47. Yang J., Gao M., Xiong J., Su Z., Huang Y., Features of molecular recognition of intrinsically disordered proteins via coupled folding and binding. Protein Sci. 28, 1952–1965 (2019).
48. UniProt Consortium, UniProt: The universal protein knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).
49. Raisch T., Valkov E., Regulation of the multisubunit CCR4-NOT deadenylase in the initiation of mRNA degradation. Curr. Opin. Struct. Biol. 77, 102460 (2022).
50. Lukhele S., Bah A., Lin H., Sonenberg N., Forman-Kay J. D., Interaction of the eukaryotic initiation factor 4E with 4E-BP2 at a dynamic bipartite interface. Structure 21, 2186–2196 (2013).
51. Sonenberg N., Gingras A.-C., The mRNA 5′ cap-binding protein eIF4E and control of cell growth. Curr. Opin. Cell Biol. 10, 268–275 (1998).
52. Bah A., Vernon R. M., Siddiqui Z., Krzeminski M., Muhandiram R., Zhao C., Sonenberg N., Kay L. E., Forman-Kay J. D., Folding of an intrinsically disordered protein by phosphorylation as a regulatory switch. Nature 519, 106–109 (2015).
53. von Bülow S., Johansson K. E., Lindorff-Larsen K., AF-CALVADOS: AlphaFold-guided simulations of multi-domain proteins at the proteome level. bioRxiv 2025.10.19.683306 (2025).
54. Ruff K. M., Pappu R. V., AlphaFold and implications for intrinsically disordered proteins. J. Mol. Biol. 433, 167208 (2021).
55. Eastman P., Galvelis R., Peláez R. P., Abreu C. R. A., Farr S. E., Gallicchio E., Gorenko A., Henry M. M., Hu F., Huang J., Krämer A., Michel J., Mitchell J. A., Pande V. S., Rodrigues J. P., Rodriguez-Guerra J., Simmonett A. C., Singh S., Swails J., Turner P., Wang Y., Zhang I., Chodera J. D., De Fabritiis G., Markland T. E., OpenMM 8: Molecular dynamics simulation with machine learning potentials. J. Phys. Chem. B 128, 109–116 (2024).
56. Huang X., Pearce R., Zhang Y., FASPR: an open-source tool for fast and accurate protein side-chain packing. Bioinformatics 36, 3758–3765 (2020).
57. Tian C., Kasavajhala K., Belfon K. A. A., Raguette L., Huang H., Migues A. N., Bickel J., Wang Y., Pincay J., Wu Q., Simmerling C., ff19SB: Amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J. Chem. Theory Comput. 16, 528–552 (2020).
58. Rodrigues J. P. G. L. M., Teixeira J. M. C., Trellet M., Bonvin A. M. J. J., pdb-tools: A swiss army knife for molecular structures. F1000Res. 7, 1961 (2018).
59. Lin Z., Akin H., Rao R., Hie B., Zhu Z., Lu W., Smetanin N., Verkuil R., Kabeli O., Shmueli Y., Dos Santos Costa A., Fazel-Zarandi M., Sercu T., Candido S., Rives A., Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
60. McGibbon R. T., Beauchamp K. A., Harrigan M. P., Klein C., Swails J. M., Hernández C. X., Schwantes C. R., Wang L.-P., Lane T. J., Pande V. S., MDTraj: A modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 109, 1528–1532 (2015).
61. Harris C. R., Millman K. J., van der Walt S. J., Gommers R., Virtanen P., Cournapeau D., Wieser E., Taylor J., Berg S., Smith N. J., Kern R., Picus M., Hoyer S., van Kerkwijk M. H., Brett M., Haldane A., Del Río J. F., Wiebe M., Peterson P., Gérard-Marchant P., Sheppard K., Reddy T., Weckesser W., Abbasi H., Gohlke C., Oliphant T. E., Array programming with NumPy. Nature 585, 357–362 (2020).
62. Fleming P. J., Fleming K. G., HullRad: Fast calculations of folded and disordered protein and nucleic acid hydrodynamic properties. Biophys. J. 114, 856–869 (2018).
63. Melnikov D., Niemi A. J., Sedrakyan A., Topological indices of proteins. Sci. Rep. 9, 14641 (2019).
64. Schrödinger LLC, The PyMOL Molecular Graphics System, Version 3.0 (2024).
65. Lin J., Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37, 145–151 (1991).
66. Del Conte A., Ghafouri H., Clementel D., Mičetić I., Piovesan D., Tosatto S. C. E., Monzon A. M., DRMAAtic: dramatically improve your cluster potential. Bioinform. Adv. 5, vbaf112 (2025).