Nucleic Acids Research. 2024 May 15;52(W1):W176–W181. doi: 10.1093/nar/gkae385

AIUPred: combining energy estimation with deep learning for the enhanced prediction of protein disorder

Gábor Erdős, Zsuzsanna Dosztányi
PMCID: PMC11223784  PMID: 38747347

Abstract

Intrinsically disordered proteins and protein regions (IDPs/IDRs) carry out important biological functions without relying on a single well-defined conformation. As these proteins are a challenge to study experimentally, computational methods play important roles in their characterization. One of the commonly used tools is the IUPred web server, which provides prediction of disordered regions and their binding sites. IUPred is rooted in a simple biophysical model and uses a limited number of parameters derived largely from globular protein structures. This enabled an incredibly fast and robust prediction method; however, its limitations have also become apparent in light of recent breakthrough methods using deep learning techniques. Here, we present AIUPred, a novel version of IUPred which incorporates deep learning techniques into the energy estimation framework. It achieves improved performance while keeping the robustness of the original method. Based on the evaluation of recent benchmark datasets, AIUPred scored amongst the top three single sequence based methods. With a new web server we offer fast and reliable visual analysis for users as well as options to analyze whole genomes in mere seconds with the downloadable package. AIUPred is available at https://aiupred.elte.hu.

Introduction

Intrinsically disordered proteins and protein regions (IDPs/IDRs) carry out vital biological functions without conforming to a single well-defined structure, challenging the conventional structure-function paradigm (1). These regions are characterized as ensembles of highly fluctuating conformations in isolation with tailored properties for specific functions (2,3). IDPs/IDRs can fulfill their function by serving as flexible linkers between domains, by participating in molecular recognition or driving the formation of membraneless organelles. Their behavior is often regulated in a context dependent manner, by post-translational modification, or environmental conditions, such as pH or redox potential. IDPs are ubiquitous in all kingdoms of life and are also abundant, especially in higher eukaryotes. It is estimated that 50% of the human proteome contains a long disordered segment (>30 residues) (4). The central resource for experimentally verified disordered segments is the DisProt database. While the number of entries in DisProt grows steadily, currently cataloging 2896 entries (5), a substantial portion of IDRs remains unexplored. This creates a strong need for computational tools that can be used to study specific proteins or to carry out large scale analysis of disordered regions.

In the last two decades, over 100 different methods have been developed for the sequence based prediction of protein disorder (6,7). These tools leverage diverse principles, including amino acid scales, biophysical models, and various machine learning techniques. In recent years, deep learning methods brought breakthroughs in structure prediction through AlphaFold2 (8) and are also transforming the field of disorder prediction (9). The Critical Assessment of Intrinsic protein Disorder (CAID) experiment was established in 2018 as a community-based blind test to determine the state of the art in the prediction of intrinsically disordered regions, leveraging the annotation provided by the DisProt database. Benchmarking results and statistics are available at the CAID prediction portal for the second round of CAID evaluation (10). The latest results highlight that while significant advances have been achieved in terms of general disorder prediction, the usability of top scoring methods is hindered by immense computational time and resource requirements, which arise from their reliance on evolutionary information and complex neural network architectures. Only a handful of methods rely on single sequence information and even fewer focus on balancing speed with accuracy.

The IUPred method—originally released in 2005—relies on biophysical principles to achieve accurate and fast prediction of disorder propensity (11). The method utilizes a pairwise statistical potential energy based forcefield that was optimized on a set of high resolution globular structures (12). This energy is subsequently extrapolated directly from the sequence, demonstrating a strong correlation between order and the ability for residues to engage in favorable interactions within their sequential neighborhood. Consequently, residues lacking this property are inferred to reside within disordered regions. The robustness of the biophysical principles underlying the algorithm, coupled with its logistic regression based disorder calculation, enables swift and accurate predictions, rendering IUPred one of the most widely adopted disorder prediction tools.

Over the years, subsequent iterations of IUPred have expanded its utility beyond the prediction of general disorder. These advancements have unveiled its capability to predict additional attributes, such as conditional disorder arising from interactions with partner proteins (the ANCHOR and ANCHOR2 methods) or from environmental changes in redox potential (as demonstrated by IUPred2A-redox) (13,14). However, the subsequent versions of IUPred, including IUPred2A and IUPred3, have not introduced significant alterations to the underlying energy model or the general disorder prediction methodology, aside from minor adjustments (14–16).

This paper introduces AIUPred, an enhanced version of the original IUPred disorder prediction method. IUPred is based on a unique approach, in which energy-like quantities are predicted directly from the sequence and these energy values are subsequently used to predict disorder. With AIUPred, we kept the core concept of IUPred, but replaced the simple approaches with state-of-the-art neural network architectures. This enabled us to further improve prediction accuracy, while maintaining the method's hallmark feature: speed. The new method outperforms every earlier version of IUPred by a substantial margin, including the version submitted for the second round of CAID. It is fast and scales exceptionally well on GPUs (even consumer level GPUs). We also updated the visual design of our web server to meet modern standards and made it more responsive.

Materials and methods

IUPred, the predecessor of AIUPred, relies on a simple energy estimation method (11). The approach is based on statistical potentials, or pairwise contact energies, which are optimized on high resolution structures of globular proteins using the algorithm of P. Thomas and K. Dill (12) (Step 0). For proteins with known structure we can assign an energy to each residue based on its contacts and these energy scores. In the next step, we establish the energy estimation approach using a dataset of globular proteins again. The basic assumption is that for each residue, the energy calculated from the structure can be approximated using a simple formalism from the sequence alone (Step 1). We showed that the estimated energies can discriminate between ordered and disordered residues. For the prediction, the estimated energies are converted into disorder propensities using a logistic regression like method (Step 2). Here, we introduce the AIUPred disorder prediction method, which keeps the original energy estimation framework but combines it with the latest neural network architectures for both the energy prediction and the disorder propensity calculation.

In IUPred, the energy estimation relied on a 20 × 20 symmetrical energy predictor matrix whose parameters were derived using a linear regression-like method. In AIUPred, this is replaced with a transformer neural network. The transformer neural network was trained to predict positional energies calculated using the IUPred energy function for 17 282 non-redundant structures derived from the PISCES database (version: PISCES_cullpdb_pc40.0_res0.0–2.2_len40-10000_R0.25_Xray_d2022_05_22_chains18454) (17). From this dataset, we filtered out sequences that could not be mapped to UniProt sequences, sequences shorter than 30 residues, and sequences with transmembrane regions (based on the PDBTM database (18)). The dataset consists of 4 276 509 positions, which were split into two parts for training and testing, containing 80% and 20% of the positions, respectively. The network architecture was optimized to facilitate rapid and accurate energy prediction by keeping the number of optimizable parameters to a minimum. For a detailed description of the architecture see Supplementary Material.
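To make the matrix-based formulation concrete, the following is a minimal Python sketch of how a per-residue energy can be estimated from sequence with a 20 × 20 predictor matrix and then mapped to a disorder propensity. It is an illustration under stated assumptions, not the published implementation: the matrix values are toy numbers, the neighbourhood bounds (`min_sep`, `max_sep`) and the logistic parameters (`midpoint`, `slope`) are hypothetical, and all function names are invented for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWYC")

# Toy symmetric 20 x 20 matrix: NOT the published IUPred parameters.
# Hydrophobic pairs get a favourable (negative) value, all other pairs zero.
TOY_MATRIX = {
    a: {b: (-1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0)
        for b in AMINO_ACIDS}
    for a in AMINO_ACIDS
}


def estimate_energies(sequence, matrix=TOY_MATRIX, min_sep=2, max_sep=100):
    """Estimate a per-residue energy from the composition of its neighbourhood.

    min_sep and max_sep are assumed bounds on the sequence separation of the
    residues that count as neighbours.
    """
    seq = sequence.upper()
    energies = []
    for i, res in enumerate(seq):
        lo = max(0, i - max_sep)
        hi = min(len(seq), i + max_sep + 1)
        neighbours = [seq[j] for j in range(lo, hi) if abs(i - j) >= min_sep]
        known = [n for n in neighbours if n in matrix]
        if res not in matrix or not known:
            energies.append(0.0)  # no information: neutral value in this sketch
            continue
        # Equivalent to: sum over residue types of
        # (matrix entry) x (fraction of that type in the neighbourhood).
        energies.append(sum(matrix[res][n] for n in known) / len(known))
    return energies


def energy_to_propensity(energy, midpoint=-0.5, slope=6.0):
    """Logistic map: less favourable (higher) energy gives higher propensity.

    midpoint and slope are hypothetical values chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-slope * (energy - midpoint)))


if __name__ == "__main__":
    for e in estimate_energies("MSSSSGGGILVILVAAWF"):
        print(round(e, 3), round(energy_to_propensity(e), 3))
```

In AIUPred both of these hand-built steps are replaced by trained networks; the sketch only shows the shape of the calculation that the networks take over.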

The second step in IUPred is translating the estimated energies into disorder tendencies using a logistic regression like approach. In AIUPred, the logistic regression is also replaced by a neural network. For a detailed description of the network used for transfer learning see Supplementary Material. For training and testing, the DisProt database (version 2023-06) was used as the positive dataset (divided in an 80–20 ratio for training and testing), after filtering for redundancy using the CD-HIT algorithm (19). As a negative dataset, we used a set of globular domains comprising sequence regions that encode single structural domains with determined monomeric structures in the PDB (4549 protein regions), split into two parts in an 80–20 ratio, as used in IUPred2A (14). For validation, the CAID1-PDB and CAID2-PDB datasets were used (each entry that is already part of DisProt was removed manually). All these datasets are available at the web server under https://aiupred.elte.hu/statistics.

To further increase the prediction capabilities of the network and the interpretability of the results an additional smoothing function was added using the Savitzky-Golay filter with parameters (11,5).
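The smoothing step can be illustrated with a small, dependency-free Python sketch of a Savitzky-Golay filter. It assumes that the parameters (11,5) denote a window length of 11 and a polynomial order of 5, which is the argument order used by SciPy's `savgol_filter(x, window_length, polyorder)`; the source does not spell this out. The function names and the edge handling (fitting the polynomial on the nearest full window) are choices made for this sketch.

```python
from fractions import Fraction


def _solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def savgol_weights(window, order, t):
    """Weights that evaluate the least-squares polynomial at offset t from the window centre."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    xs = [Fraction(k) for k in range(-half, half + 1)]
    # Normal equations (A^T A) c = v with A[k][j] = xs[k]**j and v[j] = t**j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(order + 1)]
           for i in range(order + 1)]
    v = [Fraction(t) ** j for j in range(order + 1)]
    c = _solve(ata, v)
    return [sum(c[j] * x ** j for j in range(order + 1)) for x in xs]


def savgol_smooth(values, window=11, order=5):
    """Smooth a profile; positions near the ends reuse the nearest full window."""
    n = len(values)
    if n < window:
        return list(values)  # too short to smooth: returned unchanged in this sketch
    half = window // 2
    table = {t: [float(w) for w in savgol_weights(window, order, t)]
             for t in range(-half, half + 1)}
    smoothed = []
    for i in range(n):
        start = min(max(i - half, 0), n - window)
        t = i - (start + half)  # offset of position i from the window centre
        smoothed.append(sum(w * values[start + k] for k, w in enumerate(table[t])))
    return smoothed
```

Because a filter of this kind reproduces any polynomial up to the chosen order exactly, it damps residue-to-residue noise in the profile while leaving broader trends largely intact.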

The resulting network showed great improvement over the previous iteration in multiple aspects. First, the general accuracy of the method improved substantially on both the first and second validation datasets from the CAID Prediction Portal (Table 1). To assess the significance of the improvement, we conducted a bootstrapped validation by selecting 100 sequences randomly from the independent validation dataset and assessing the resulting AUC values. Across all instances, AIUPred consistently outperformed IUPred, yielding an average AUC value of 0.9226 compared to IUPred's 0.8992. The difference between the mean values of the two sets is statistically significant, as determined by a Mann-Whitney test yielding a p-value of 2.5e-4. We also tested the precision of the method on the prediction of fully disordered proteins as well as short disordered regions. On fully disordered proteins derived from the DisProt database, AIUPred vastly outperforms its predecessor (Table 1). AIUPred also shows clear improvement in predicting short disordered regions. We collected regions from the DisProt database that are between 10 and 30 residues long. AIUPred correctly predicted 44% of them, while IUPred achieved only 37%. We also studied missing residues of similar size in X-ray structures from the PDB, which most often originate from short disordered segments. AIUPred was able to correctly identify 44% of such regions, while IUPred identified only 36%.
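The bootstrapped comparison described above can be sketched in a few lines of Python. This is an illustration only: the number of bootstrap rounds (`n_rounds`), sampling with replacement, pooling the residues of the drawn sequences before computing the AUC, and all function names are assumptions, since the text specifies only that 100 sequences were drawn at random. The AUC is computed from rank sums, which is the same statistic that underlies the Mann-Whitney test.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed to compute an AUC")
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = rank_sum - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


def bootstrap_auc(proteins, n_rounds=1000, sample_size=100, seed=0):
    """Compare two predictors on repeated random draws of sequences.

    proteins: list of (labels, scores_a, scores_b) tuples, one per sequence,
    where labels are 1 for disordered and 0 for ordered residues.
    Returns a list of (auc_a, auc_b) pairs, one per round.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_rounds):
        drawn = rng.choices(proteins, k=sample_size)  # with replacement (assumed)
        labels = [l for p in drawn for l in p[0]]
        scores_a = [s for p in drawn for s in p[1]]
        scores_b = [s for p in drawn for s in p[2]]
        if len(set(labels)) < 2:
            continue  # a draw with a single class has no defined AUC
        results.append((roc_auc(labels, scores_a), roc_auc(labels, scores_b)))
    return results
```

The two resulting lists of AUC values can then be passed to a two-sample test such as Mann-Whitney to judge whether the difference in means is significant.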

Table 1. Comparison of IUPred and AIUPred on various validation datasets

| Method | CAID1-PDB AUC | CAID2-PDB AUC | Fully disordered DisProt (%) | Short disorder DisProt (%) | Missing residues in PDB (%) |
| --- | --- | --- | --- | --- | --- |
| IUPred | 0.8615 | 0.8815 | 65 | 37 | 36 |
| AIUPred | 0.8772 | 0.912 | 76 | 44 | 46 |

We used the CAID prediction portal to compare AIUPred to SPOT-Disorder-Single, the highest-ranking single sequence-based method in the second round of CAID. SPOT-Disorder-Single achieved AUC = 0.917 on the CAID2-PDB set. In terms of speed, it processed an average of 0.023 proteins per second on a commercial CPU (equivalent to 240 hours for the human proteome) and does not leverage a dedicated GPU. AIUPred achieved AUC = 0.912 on the CAID2 dataset. However, it can complete the analysis of the human proteome in just 1.5 h, processing 3.5 proteins per second on a CPU. Additionally, utilizing a commercial-grade GPU (NVidia 1080) significantly accelerated this process, reducing the time to just 3 min (equivalent to analyzing 100 proteins per second) (Table 2).

Table 2. Comparison of top scoring single sequence methods from CAID2. Due to the lack of a downloadable package, information about GPU utilization is not available for the SPOT-Disorder-Single and SETH methods

| Method | Accuracy (AUC), CAID2 (PDB set) | Running time on the human proteome, CPU | Running time on the human proteome, GPU |
| --- | --- | --- | --- |
| SETH-0 | 0.930 | 1000 h | 1.5 h |
| SPOT-Disorder-Single | 0.917 | 240 h | n/a |
| AIUPred | 0.912 | 1.5 h | 3 min |
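The throughput and running-time figures quoted for the human proteome can be cross-checked with simple arithmetic. The short Python sketch below multiplies each reported rate by the corresponding reported running time to obtain the number of proteins it implies; the function name is hypothetical and the proteome size itself is not stated in the text.

```python
def implied_protein_count(proteins_per_second, seconds):
    """Number of proteins processed at a given rate over a given time."""
    return proteins_per_second * seconds


# (rate in proteins per second, running time in seconds), as reported in the text
REPORTED_RUNS = {
    "SPOT-Disorder-Single, CPU": (0.023, 240 * 3600),
    "AIUPred, CPU": (3.5, 1.5 * 3600),
    "AIUPred, GPU": (100, 3 * 60),
}

if __name__ == "__main__":
    for name, (rate, seconds) in REPORTED_RUNS.items():
        print(f"{name}: about {round(implied_protein_count(rate, seconds))} proteins")
```

All three combinations imply a set of roughly 18 000 to 20 000 proteins, so the rounded rates and running times are mutually consistent.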

Server description

Main page

In AIUPred, we combined the input and output pages into a single interface. The new design also incorporates a ‘dark mode’ to reduce eye strain during longer analysis sessions. As an input, the user can provide either a standard UniProt accession or an identifier (20). If a UniProt accession or identifier is submitted, the corresponding sequence is retrieved. To speed up this process, we created a snapshot of UniProt and stored it locally (downloaded on 2023-11-23). If a protein is not found in the locally stored database, a direct query against UniProt is carried out. The submission page also contains a box for entering a protein sequence using one-letter amino acid codes, in plain text or FASTA format. In addition to disorder prediction (default option), the user can also carry out predictions for disordered binding regions (ANCHOR2), or select the method to identify redox state dependent disordered regions using the radio buttons. There is also an option to omit the default smoothing function. The page also features two examples: one presents the user with the accession of the human p53 protein (P04637), and the other opens the advanced menu, fills the sequence input box with the sequence of the Human adenovirus protein E1A (P03255) and selects the ANCHOR2 context dependent disorder prediction option.

Disorder prediction output

Once the proper inputs are selected and submitted, the server calculates the results on the latest Django based back-end (version 5.0). The result of the analysis is shown directly below the main input field, which enables the quick analysis of another protein.

The output of the requested calculation is visualized using the latest PlotlyJS library (version 2.20) integrated into the Django frontend template system. In the prediction profile, values >0.5 indicate disordered regions. Integration with the UniProt resource enables the display of various additional information about the requested protein (when available). In case of a sequence input, AIUPred tries to match that sequence to a UniProt entry based on hashes generated from the sequence. If only one match is found, AIUPred will map the input to the matching UniProt entry. To help the interpretation of the results, additional annotations are also presented. These include information from three different databases: experimentally verified disordered regions from DisProt and disordered binding regions from DIBS and MFIB, together with known motifs from the ELM database (21–23). Low-throughput post-translational modifications (including Ser, Thr and His phosphorylations, methylation, ubiquitylation and acetylation sites) from PhosphoSitePlus are also indicated (24). In addition, PFAM annotations are shown, with the different types of sequence families (domain, families, repeats, motifs, disordered) highlighted in different colors (25). As AIUPred has been improved significantly in terms of speed, the limiting bottleneck is the calculation of the PFAM annotation. To speed up access to results, each prediction is calculated on-the-fly on the server side and returned asynchronously, so that users can see each result as soon as it is ready.
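For readers who work with the downloaded numerical output rather than the plot, the 0.5 cutoff mentioned above can be applied programmatically. The following Python sketch turns a per-residue profile into a list of disordered regions; the function name, the 1-based inclusive coordinates and the optional `min_length` filter are assumptions made for this example.

```python
def disordered_regions(profile, cutoff=0.5, min_length=1):
    """Return (start, end) pairs, 1-based and inclusive, where the score exceeds the cutoff."""
    regions = []
    start = None
    for i, value in enumerate(profile):
        if value > cutoff:
            if start is None:
                start = i  # a new region opens here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(profile) - start >= min_length:
        regions.append((start + 1, len(profile)))
    return regions


if __name__ == "__main__":
    example_profile = [0.1, 0.6, 0.7, 0.5, 0.9]
    print(disordered_regions(example_profile))
```

Raising `min_length` is a simple way to ignore isolated residues that cross the cutoff, for example when only long disordered segments are of interest.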

If the ANCHOR2 option is selected, AIUPred will generate and display the results of the ANCHOR2 prediction with a blue line in addition to the usual red line for the AIUPred prediction. A higher value of the blue line corresponds to disordered regions that likely undergo a disorder-to-order transition upon binding to a partner protein. If the ‘Redox state’ option is selected from the same submenu, AIUPred will carry out an analysis of redox potential dependent disordered regions. For this, we use the same parameters as in IUPred2A, but with the AIUPred prediction profiles used to calculate the ‘Redox plus’ and ‘Redox minus’ lines. These lines correspond to the state with cysteine stabilization (redox-plus), achieved either through disulfide bonds or by Zn binding, and to the state without cysteine stabilization (redox-minus), modeled by a cysteine/serine swap (14). The area between the resulting ‘Redox plus’ and ‘Redox minus’ lines is filled with orange color if a region is predicted to undergo an order-to-disorder or disorder-to-order transition upon a change in the environmental redox potential.
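The cysteine/serine swap behind the redox-minus state is simple to express in code. The Python sketch below builds the swapped sequence and, given two already computed profiles, lists the positions where they fall on opposite sides of the 0.5 cutoff. The second function is only an illustrative stand-in: the actual criteria follow the IUPred2A parameters, which are not reproduced here, and both function names are hypothetical.

```python
def redox_minus_sequence(sequence):
    """Model the state without cysteine stabilization by replacing every Cys with Ser."""
    return sequence.upper().replace("C", "S")


def cutoff_crossing_positions(redox_plus, redox_minus, cutoff=0.5):
    """1-based positions where the two profiles disagree about the cutoff.

    Assumption for illustration only: a position is flagged when exactly one
    of the two profiles exceeds the cutoff. This is not the IUPred2A rule.
    """
    if len(redox_plus) != len(redox_minus):
        raise ValueError("profiles must have the same length")
    return [
        i + 1
        for i, (plus, minus) in enumerate(zip(redox_plus, redox_minus))
        if (plus > cutoff) != (minus > cutoff)
    ]


if __name__ == "__main__":
    print(redox_minus_sequence("MCACS"))
    print(cutoff_crossing_positions([0.2, 0.4, 0.6], [0.3, 0.7, 0.8]))
```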

The generated plots, including the annotations, can be downloaded in high resolution PNG format using the ‘camera’ icon in the top right corner of the plot. In addition to the graphical output, users can also download the results in both plain text and JSON format. AIUPred also supports RESTful API access, which enables programmatic access to results directly from the web server in the same formats.

Besides the visualization features of the server, we also offer a downloadable package that is free for academic use. The package contains an executable Python script as well as an importable Python library for programmatic access. Both require a currently supported version of Python 3 (3.7+) as well as the PyTorch and SciPy libraries.

Use cases

Example 1. Analysis of the Human TP53-binding protein 1 (UniProt: Q12888)

The human TP53-binding protein is a repair protein involved in the response to DNA damage and in telomere dynamics. It contains an experimentally verified long disordered segment spanning from the N-terminus up to residue 1483, followed by a structured domain responsible for the recruitment of proteins to double stranded breaks in DNA. After the domain there is another short experimentally verified disordered region followed by a C-terminal BRCT domain (26–28). All of the structural features described in the literature are captured by AIUPred (Figure 1), which also indicates that the second disordered region is likely longer than current experimental verification suggests. The N-terminal disordered region harbors multiple binding sites for the short linear motif binding protein LC8/DYNLL1 (at positions 1150–1157, 1167–1174, 1192–1199). All of these sites are located in correctly predicted disordered regions (29).

Figure 1. AIUPred analysis of the Human TP53-binding protein 1 (UniProt: Q12888). The prediction profile is generated using the default options (AIUPred - only disorder, Default smoothing). Known binding regions are indicated by red arrows. Below the plot, additional annotations are shown, from top to bottom: annotated regions from PFAM (PFAM row), annotated motifs from the Eukaryotic Linear Motif database (ELM row) and annotated low-throughput post-translational modification sites from the PhosphoSitePlus database (PTM row).

Example 2. Redox potential sensitive conditional disorder in the 33 kDa chaperonin (Hsp33) protein from E. coli (UniProt: P0A6Y5)

Hsp33 falls within the holdase category of molecular chaperones. Its functionality is intricately tied to oxidative environments, yet this protein predominantly exists in a disordered functional state. In unstressed conditions, Hsp33 assumes a tightly folded configuration, bound to zinc and exhibiting minimal activity. However, under oxidative stress, the protein undergoes significant structural changes (30). Specifically, the formation of two intramolecular disulfide bonds and the release of zinc ions prompt the unfolding of the zinc-binding domain. This structural alteration exposes the substrate binding surface of the chaperone, essential for its activity. Notably, the redox switch domain spanning residues 224–285 is accurately pinpointed by AIUPred's redox-dependent disorder prediction (Figure 2).

Figure 2. AIUPred redox potential dependent conditional disorder prediction of the E. coli protein Hsp33 (UniProt: P0A6Y5). The prediction profile is generated using the Redox state option. The Redox plus and Redox minus lines indicate the disorder prediction profile in the two scenarios, corresponding to states when cysteine residues are strongly stabilizing and when they are not, respectively.

Conclusion

While a large number of methods have been developed for the sequence based prediction of disordered regions, relatively few of them are available as a user-friendly web server. IUPred has served the research community for nearly two decades, providing fast and robust disorder prediction methods. In this work we presented the AIUPred web server, the newest installment of IUPred. In AIUPred, we kept the original energy prediction framework, which was key to the robustness of the method, but incorporated state-of-the-art deep learning algorithms. The transformer architecture showcased here enables more accurate structural energy computation directly from the sequence. We show that although this energy-like quantity was never trained on disordered regions, it can be utilized to efficiently predict IDRs. We demonstrated that this resulted in improved performance using recently published benchmark datasets, which placed AIUPred among the top single sequence based methods for intrinsically disordered proteins. The new method is also fast and scales exceptionally well with GPUs (even consumer level GPUs). We have also implemented several upgrades on the website to improve its user-friendliness. The accelerated speed of AIUPred prompted us to optimize the performance of related calculations on the back-end server, resulting in a swift visualization tool for the analysis of protein disorder.

Reformulating the original approach can open up novel ways not only to predict general disorder but also to characterize additional features of disordered regions. We demonstrated this by updating our prediction method for redox-sensitive disordered regions. In future iterations we plan to exploit the improved energy estimation for further applications, including the prediction of binding regions. AIUPred presents another way to exploit the power of deep learning methods for the improved sequence-based prediction of protein disorder. The next step in this field is the prediction of conformational ensembles of IDPs, for which deep learning based methods have also started to emerge (31–33).

Acknowledgements

This paper is dedicated to the memory of Dr István Simon. We would like to thank András Lukács for providing access to GPU servers of the AI Research Group, Institute of Mathematics, ELTE Eötvös Loránd University. We also acknowledge Norbert Deutsch and Rámi Abo Alhuda for their critical comments on the manuscript and the web server.

Contributor Information

Gábor Erdős, Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary.

Zsuzsanna Dosztányi, Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary.

Data availability

AIUPred is free and open to all users and there is no login requirement. AIUPred can be accessed at https://aiupred.elte.hu.

Supplementary data

Supplementary Data are available at NAR Online.

Funding

National Research, Development and Innovation Fund of Hungary [K139284 to Z.D.]; University Excellence Award of ELTE (to D.Z. 2022); ELIXIR, the research infrastructure for life-science data. Funding for open access charge: National Research, Development and Innovation Fund of Hungary [K139284 to Z.D.].

Conflict of interest statement. None declared.

References

1. Wright P.E., Dyson H.J. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol. 1999; 293:321–331.
2. Wright P.E., Dyson H.J. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 2015; 16:18–29.
3. van der Lee R., Buljan M., Lang B., Weatheritt R.J., Daughdrill G.W., Dunker A.K., Fuxreiter M., Gough J., Gsponer J., Jones D.T. et al. Classification of intrinsically disordered regions and proteins. Chem. Rev. 2014; 114:6589–6631.
4. Oates M.E., Romero P., Ishida T., Ghalwash M., Mizianty M.J., Xue B., Dosztányi Z., Uversky V.N., Obradovic Z., Kurgan L. et al. D2P2: database of disordered protein predictions. Nucleic Acids Res. 2013; 41:D508–D516.
5. Aspromonte M.C., Nugnes M.V., Quaglia F., Bouharoua A., DisProt Consortium, Tosatto S.C.E., Piovesan D. DisProt in 2024: improving function annotation of intrinsically disordered proteins. Nucleic Acids Res. 2024; 52:D434–D441.
6. Zhao B., Kurgan L. Surveying over 100 predictors of intrinsic disorder in proteins. Expert Rev. Proteomics. 2021; 18:1019–1029.
7. Kurgan L., Hu G., Wang K., Ghadermarzi S., Zhao B., Malhis N., Erdős G., Gsponer J., Uversky V.N., Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat. Protoc. 2023; 18:3157–3172.
8. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A. et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596:583–589.
9. Zhao B., Ghadermarzi S., Kurgan L. Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins. Comput. Struct. Biotechnol. J. 2023; 21:3248–3258.
10. Conte A.D., Mehdiabadi M., Bouhraoua A., Miguel Monzon A., Tosatto S.C.E., Piovesan D. Critical assessment of protein intrinsic disorder prediction (CAID) - results of round 2. Proteins. 2023; 91:1925–1934.
11. Dosztányi Z., Csizmók V., Tompa P., Simon I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol. 2005; 347:827–839.
12. Thomas P.D., Dill K.A. An iterative method for extracting energy-like quantities from protein structures. Proc. Natl. Acad. Sci. U.S.A. 1996; 93:11628–11633.
13. Mészáros B., Simon I., Dosztányi Z. Prediction of protein binding regions in disordered proteins. PLoS Comput. Biol. 2009; 5:e1000376.
14. Mészáros B., Erdos G., Dosztányi Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018; 46:W329–W337.
15. Dosztányi Z., Csizmok V., Tompa P., Simon I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics. 2005; 21:3433–3434.
16. Erdős G., Pajkos M., Dosztányi Z. IUPred3: prediction of protein disorder enhanced with unambiguous experimental annotation and visualization of evolutionary conservation. Nucleic Acids Res. 2021; 49:W297–W303.
17. Wang G., Dunbrack R.L. Jr PISCES: a protein sequence culling server. Bioinformatics. 2003; 19:1589–1591.
18. Kozma D., Simon I., Tusnády G.E. PDBTM: protein Data Bank of transmembrane proteins after 8 years. Nucleic Acids Res. 2013; 41:D524–D529.
19. Fu L., Niu B., Zhu Z., Wu S., Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012; 28:3150–3152.
20. UniProt Consortium UniProt: the Universal Protein knowledgebase in 2023. Nucleic Acids Res. 2023; 51:D523–D531.
21. Schad E., Fichó E., Pancsa R., Simon I., Dosztányi Z., Mészáros B. DIBS: a repository of disordered binding sites mediating interactions with ordered proteins. Bioinformatics. 2018; 34:535–537.
22. Fichó E., Reményi I., Simon I., Mészáros B. MFIB: a repository of protein complexes with mutual folding induced by binding. Bioinformatics. 2017; 33:3682–3684.
23. Dinkel H., Michael S., Weatheritt R.J., Davey N.E., Van Roey K., Altenberg B., Toedt G., Uyar B., Seiler M., Budd A. et al. ELM–the database of eukaryotic linear motifs. Nucleic Acids Res. 2012; 40:D242–D251.
24. Hornbeck P.V., Kornhauser J.M., Tkachev S., Zhang B., Skrzypek E., Murray B., Latham V., Sullivan M. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 2012; 40:D261–D270.
25. Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G.A., Sonnhammer E.L.L., Tosatto S.C.E., Paladin L., Raj S., Richardson L.J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021; 49:D412–D419.
26. Derbyshire D.J., Basu B.P., Serpell L.C., Joo W.S., Date T., Iwabuchi K., Doherty A.J. Crystal structure of human 53BP1 BRCT domains bound to p53 tumour suppressor. EMBO J. 2002; 21:3863–3872.
27. Joo W.S., Jeffrey P.D., Cantor S.B., Finnin M.S., Livingston D.M., Pavletich N.P. Structure of the 53BP1 BRCT region bound to p53 and its comparison to the Brca1 BRCT structure. Genes Dev. 2002; 16:583–593.
28. Lambrus B.G., Holland A.J. A new mode of mitotic surveillance. Trends Cell Biol. 2017; 27:314–321.
29. Howe J., Weeks A., Reardon P., Barbar E. Multivalent binding of the hub protein LC8 at a newly discovered site in 53BP1. Biophys. J. 2022; 121:4433–4442.
30. Reichmann D., Xu Y., Cremers C.M., Ilbert M., Mittelman R., Fitzgerald M.C., Jakob U. Order out of disorder: working cycle of an intrinsically unfolded chaperone. Cell. 2012; 148:947–957.
31. Lotthammer J.M., Ginell G.M., Griffith D., Emenecker R.J., Holehouse A.S. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat. Methods. 2024; 21:465–476.
32. Tesei G., Trolle A.I., Jonsson N., Betz J., Knudsen F.E., Pesce F., Johansson K.E., Lindorff-Larsen K. Conformational ensembles of the human intrinsically disordered proteome. Nature. 2024; 626:897–904.
33. Zhu J., Li Z., Tong H., Lu Z., Zhang N., Wei T., Chen H. Phanto-IDP: compact model for precise intrinsically disordered protein backbone generation and enhanced sampling. Brief. Bioinform. 2023; 25:bbad429.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
