Abstract
Native proteins fold with varying degrees of flexibility, which makes their structures difficult to represent and complicates the understanding of the relationship between structure and function. Although many methods and databases provide information on intrinsically disordered proteins (IDPs), they are mainly limited to identifying intrinsically disordered regions (IDRs) and offer little knowledge of the possible folding patterns. To overcome this barrier, the protein structure fingerprint technology has been developed. It includes the PFSC (Protein Folding Shape Code) (Yang, 2008) and PFVM (Protein Folding Variation Matrix) (Yang et al., 2022) algorithms as well as the FiveFold approach (Yang et al., 2025) for protein structure prediction, which together can explicitly expose the possible conformational structures of intrinsically disordered proteins. Three proteins, human cellular tumor antigen P53, human alpha-synuclein, and human protamine-2, are taken as examples to demonstrate how the folding conformations of intrinsically disordered proteins can be obtained. For intrinsically disordered proteins with given structures, the folding features may be revealed by the alignment of PFSC strings; for those without a given structure, the folding possibilities can be exhibited by the local folding variations in the PFVM. Furthermore, multiple conformational 3D structures of an intrinsically disordered protein can be predicted by the FiveFold approach, which provides a significant tool for further understanding the intrinsic disorder of proteins.
Keywords: Intrinsically disordered protein, Protein folding, Protein conformation, Protein structure prediction
Highlights
• Native proteins have flexible conformations.
• Although many methods and databases provide information on intrinsically disordered proteins (IDPs), they are mainly limited to identifying intrinsically disordered regions (IDRs) and lack knowledge of possible folding patterns.
• To overcome this barrier, the FiveFold approach was developed based on the PFSC-PFVM (Protein Folding Shape Code - Protein Folding Variation Matrix) algorithms; it can explicitly expose the possible conformational structures of intrinsically disordered proteins.
• PFSC alignment can expose the differences in folding conformation of an intrinsically disordered protein.
• The PFVM can reveal local folding variations of an intrinsically disordered protein from the sequence alone.
• Furthermore, the conformational 3D structures of an intrinsically disordered protein can be predicted by the FiveFold approach.
• Three proteins, human cellular tumor antigen P53, human protamine-2, and human alpha-synuclein, are taken as examples to demonstrate how the folding conformations of intrinsically disordered proteins can be obtained.
1. Introduction
Although it has long been held that a well-defined protein 3D structure fulfills its biological function, more and more studies have revealed that protein intrinsic disorder also plays important roles in biological functions (Bu and Callaway, 2011; Kamerlin and Warshel, 2010; Collins et al., 2008a). Vladimir N. Uversky, one of the most influential researchers in the field of IDPs, has extensively explored the biophysical properties of IDPs and their role in various diseases. Peter E. Wright applied nuclear magnetic resonance (NMR) spectroscopy to study the dynamics of IDPs, which are involved in cellular regulation, signaling, and interaction with other biomolecules. Protein intrinsic disorder plays a significant role in both biological functions and pathological syndromes. Disordered folding conformations in proteins are particularly implicated in cell signaling, transcription, chromatin remodeling, and binding affinity (Iakoucheva et al., 2002; Sandhu, 2009; Collins et al., 2008b; van der Lee et al., 2014). Many studies have also shown that IDPs and IDRs are involved in various human diseases, including neurodegenerative and cardiovascular diseases, diabetes, cancer, and amyloidosis (Uversky et al., 2008; Langella et al., 2021). The flexibility of folding conformations increases the complexity of understanding the relationship between protein structure and biological function. Furthermore, a thorough understanding of intrinsically disordered proteins touches on the essence of the protein folding problem.
Under physiological conditions, any protein has flexible conformations to some degree rather than a single rigid structure. Conformational flexibility allows a protein to interact properly with other proteins, ligands, and other partners, and to adjust to changing environments and conditions. Some proteins are disordered along almost the entire sequence (IDPs), while others have only partial regions of folding disorder along the sequence (IDRs). The extent of the disordered regions in a protein needs to be known first; it may be determined by experimental measurement, retrieved from a database, or obtained by computational prediction. Protein intrinsic disorder can be detected to some degree by various experimental approaches, such as X-ray crystallography (Dunker et al., 2001), nuclear magnetic resonance (NMR) (Dyson and Wright, 2002, 2004), and cryo-electron microscopy (cryo-EM) (Yan et al., 2015). About four thousand unique proteins in the Protein Data Bank (PDB) have been experimentally validated to contain disordered regions (Monzon et al., 2020). Over two hundred million sequences in the MobiDB database have been annotated for disorder based on missing residues in X-ray crystallographic structures and flexible regions in NMR structures (Piovesan et al., 2023). The DisProt database also holds annotations covering over 2300 protein entries and more than 2500 scientific publications involving IDPs and IDRs (Quaglia et al., 2022). Furthermore, protein intrinsic disorder can be predicted by various computational tools, and several databases have been established (Zhao and Kurgan, 2021). The prediction approaches fall into four broad categories.
First, some approaches are based on the physical or chemical properties of residues or on prior knowledge of intrinsically disordered segments, such as FoldIndex (Prilusky and others, 2005), NORSp (Liu and Rost, 2003), GlobPlot (Linding et al., 2003a), CH plot (Huang et al., 2014), DisPredict (Iqbal and Hoque, 2015), PONDR (Xue et al., 2010), and SLIDER (Peng et al., 2014). Second, some approaches are based on inter-residue contacts, such as IUPred (Erdős and others, 2021), FoldUnfold (Galzitskaya et al., 2006), and Ucon (Schlessinger and others, 2007). Third, some approaches apply algorithms to structural data sets to determine the disordered and ordered regions in an amino acid sequence, such as PrDOS (Ishida and Kinoshita, 2007), MobiDB (Piovesan and others, 2021), IDEAL (Fukuchi et al., 2014), PONDR (Obradovic et al., 2003), Spritz (Vullo et al., 2006), DisEMBL (Linding et al., 2003b), RONN (Yang and others, 2005), s2D (Sormanni et al., 2015), MFDp (Mizianty et al., 2013), Disopred3 (Jones and Cozzetto, 2015), and D2P (Yang et al., 2022; Oates et al., 2013). Fourth, some approaches are based on deep learning. The recently developed AlphaFold approach, which achieved protein structure prediction accuracy competitive with experimental determination, marks low-confidence regions of a protein with a low pLDDT (predicted Local Distance Difference Test) score (Carugo, 2023), and these regions are taken as IDRs (Jumper et al., 2021; Tunyasuvunakool et al., 2021; Ruff and Pappu, 2021; Wilson et al., 2022; Guo et al., 2022). In addition, some approaches, such as IUPred2A and DEPICTER2, attempt to predict the relationship between IDRs and disorder-related biological functions (Basul et al., 2023; Meszaros et al., 2018; Oldfield et al., 2020). Over more than two decades, the effort focused on developing qualitative prediction of IDPs and IDRs has made much progress in exposing protein intrinsic disorder.
Despite progress in experimental measurement, databases, and computational prediction, two challenges persist in the investigation of intrinsically disordered proteins. First, knowledge about the specific folding conformations of intrinsically disordered proteins is lacking. Second, it is unclear how to describe the variable conformations of an IDP or IDR. Some proteins may have multiple stable conformational states, and some may retain folding flexibility and never stabilize in any state. From the perspective of the protein folding problem, Cyrus Levinthal, an American molecular biologist, pointed out as early as 1969 that a protein may have an astronomical number of folding conformations along its amino acid sequence (Levinthal, 1969), which is closely associated with the issue of protein intrinsic disorder. In practice, it is of interest to know both the most probable stable conformations under physiological conditions and the unstable conformations relevant to biological functions. To overcome these hurdles, the protein structure fingerprint technology, including the PFSC-PFVM algorithms and the FiveFold approach, has been developed to expose the flexible conformations of intrinsically disordered proteins. First, a set of 27 PFSC alphabetic letters is established to cover the full folding space of a mathematical model of five freely connected points representing five amino acid residues. Then, a database, 5AAPFSC, is established to collect all possible folding patterns in PFSC letters for any combination of five amino acids. Subsequently, the PFVM is built from the protein sequence by extracting from the 5AAPFSC database all possible local folding shapes in PFSC letters for each set of five residues, which exposes the folding flexibility of protein intrinsic disorder along the sequence.
Based on the PFVM, a massive number of folding conformations in PFSC strings are obtained by combining the various local folding variations, and these represent the multiple conformations of intrinsically disordered proteins. Furthermore, from these PFSC strings an ensemble of conformational 3D structures can be constructed as the predicted protein structures, which is called the FiveFold approach (Yang et al., 2025). In summary, the protein structure fingerprint technology can expose the folding features of intrinsically disordered proteins. First, for a protein with a sufficient number of given structures, the intrinsically disordered features may be revealed by the alignment of PFSC strings, because each protein structure conformation can be represented by a PFSC string. Second, for a protein without a given structure, the intrinsically disordered features may be exposed by its PFVM, because the local folding variations along the sequence are displayed in the PFVM. Third, multiple conformational structures can be predicted by FiveFold, because multiple conformations in PFSC strings can be formed according to the PFVM. Therefore, the protein structure fingerprint technology is an effective tool for discovering the multiple conformational structures of intrinsically disordered proteins.
2. Materials and methods
The protein structure fingerprint technology is composed of four modules that reveal the folding features of intrinsically disordered proteins. The four modules and the process flow chart are presented in Fig. 1 (Yang et al., 2025). The detailed algorithms can be found in previous publications (Yang, 2008; Yang et al., 2022) and are briefly described here. First, a set of 27 alphabetic letters was mathematically defined to cover the full folding space of five freely connected points. Applied to five residues as a folding element in a protein, it is called the PFSC (Protein Folding Shape Code). The conformation of any protein 3D structure can be completely described by a PFSC string. Then, all possible folding patterns for each of the 3,200,000 (20^5) permutations of five residues drawn from the 20 amino acids were generated and assembled into the 5AAPFSC database. Next, a protein's local folding variations along its sequence are fully presented by its PFVM (Protein Folding Variation Matrix), which is assembled by extracting the folding shapes of five amino acid residues from the 5AAPFSC database according to the sequence. Based on a PFVM, a massive number of folding conformations in PFSC strings can be explicitly obtained by combining PFSC letters from each column. Finally, an ensemble of multiple conformational protein structures can be predicted based on the PFSC strings from the PFVM; this approach is called FiveFold. The protein structure fingerprint technology can thus be used to investigate IDPs and IDRs.
Fig. 1.
The protein structure fingerprint technology, including the PFSC, PFVM, and FiveFold algorithms. The cube contains the set of 27 PFSC letters as folding patterns. The blue arrows indicate the process of generating the PFSC string for a protein with a known 3D structure. The green arrows indicate the process of constructing the 5AAPFSC database, which contains all folding shapes in PFSC letters for the 3,200,000 permutations of five amino acids. The red arrows indicate the process of obtaining the PFVM and predicting an ensemble of protein structures from a sequence. Red and pink PFSC letters denote typical helix and helix-like local folds, blue and light blue letters denote beta-strand and beta-strand-like local folds, and black letters denote irregular folds.
PFSC: A protein conformation can be completely described by a string of PFSC alphabetic letters. Mathematically, starting from five successively connected points without any constraint, a set of folding shapes for five amino acid residues is obtained that completely covers the folding space; it is represented by the 27 PFSC letters, including “$”. The 27 PFSC letters characterize well the folding patterns of five amino acid residues, including alpha-helix, beta-strand, irregular folds, and mixtures of these in various degrees. The folding conformation of any protein with a given structure can be fully described by a PFSC string without gaps along the sequence, covering secondary structure fragments as well as tertiary structure fragments. All structural data in PDB were therefore converted into PFSC strings and stored in the PDB-PFSC database. As a result, any protein conformations can be readily compared by aligning their PFSC strings.
5AAPFSC: For the 20 amino acids, 3,200,000 permutations of five amino acids are mathematically generated. A given set of five amino acids may have multiple folding patterns. All possible folding patterns, in PFSC letters, for each five amino acid fragment are assembled into the 5AAPFSC database. Structural data for two-thirds of the permutations of five amino acids are available in PDB, from which their folding patterns were collected. The structures for the remaining one-third of the permutations were computed by MD simulation with CHARMM (Chemistry at Harvard Macromolecular Mechanics) (Brooks et al., 2009). All folding patterns for five amino acids were then converted into PFSC letters and stored in the 5AAPFSC database.
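The count of 3,200,000 is simply 20^5, the number of ordered five-residue sequences over 20 amino acids. The short sketch below checks this and shows one way such fragments could be indexed. The base-20 index and the function names are assumptions made for illustration; the actual layout of the 5AAPFSC database is not described here.

```python
# Illustrative only: the indexing scheme is an assumption, not the 5AAPFSC format.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pentamer_count() -> int:
    """Number of ordered five-residue sequences over 20 amino acids."""
    return len(AMINO_ACIDS) ** 5


def pentamer_to_index(pentamer: str) -> int:
    """Map a five-residue string to a unique integer in [0, 20**5)."""
    if len(pentamer) != 5:
        raise ValueError("expected exactly five residues")
    index = 0
    for aa in pentamer:
        index = index * 20 + AA_INDEX[aa]  # treat the fragment as a base-20 number
    return index


print(pentamer_count())            # 3200000
print(pentamer_to_index("AAAAA"))  # 0
print(pentamer_to_index("YYYYY"))  # 3199999
```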
PFVM: A protein's local folding variations along its sequence are fully presented by its PFVM. A PFVM is assembled by extracting the folding shapes of five amino acid residues from the 5AAPFSC database according to the sequence. In the PFVM, the protein sequence is displayed horizontally on top from the N-terminus to the C-terminus, and the possible folding shapes in PFSC letters for each set of five amino acid residues are listed in a column placed below the middle of the five residues. Based on a PFVM, an astronomical number of folding conformations in PFSC strings can be explicitly obtained, without ambiguity, by combining PFSC letters. The PFSC string in the first row of the PFVM (PFVM-01) is one of the most probable conformations of the protein, and further probable conformations can be formed by replacing PFSC letters with others from the same column of the PFVM. Thus, a set of the most probable folding conformations in PFSC strings can be obtained to represent the conformational flexibility of the protein.
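The assembly described above can be sketched as a sliding five-residue window over the sequence with a lookup per window. In this minimal sketch the lookup table `TOY_5AAPFSC`, its letters, and the function names are hypothetical placeholders, not entries or interfaces of the real 5AAPFSC database.

```python
from math import prod

# Hypothetical toy lookup: pentamer -> ranked PFSC letters (most favored first).
# Real entries would come from the 5AAPFSC database; these letters are placeholders.
TOY_5AAPFSC = {
    "MDVFM": ["A", "B"],
    "DVFMK": ["A", "V", "J"],
    "VFMKG": ["B"],
}


def build_pfvm(sequence, lookup):
    """Return one column of candidate PFSC letters per five-residue window."""
    windows = [sequence[i:i + 5] for i in range(len(sequence) - 4)]
    return [lookup[w] for w in windows]


def top_row(pfvm):
    """PFVM-01: the first (most favored) letter of every column."""
    return "".join(column[0] for column in pfvm)


def conformation_count(pfvm):
    """Number of PFSC strings obtainable by picking one letter per column."""
    return prod(len(column) for column in pfvm)


pfvm = build_pfvm("MDVFMKG", TOY_5AAPFSC)
print(top_row(pfvm))             # AAB
print([len(c) for c in pfvm])    # [2, 3, 1]
print(conformation_count(pfvm))  # 6
```

A sequence of N residues yields N - 4 columns, and the product of the column sizes grows very quickly with N, which is consistent with the "astronomical number" of conformations mentioned above. The per-column letter count is one simple measure of local folding variation.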
FiveFold: An ensemble of multiple conformational protein structures can be predicted based on the PFSC strings from the PFVM. For each PFSC string, a protein 3D structure can be constructed by high-throughput screening of the PDB-PFSC database for homologous conformations. A larger protein is usually divided into smaller pieces of about 50–100 residues for the homologous conformation search; the fragments are then reconnected into the whole protein, and the optimized 3D structure is obtained by molecular dynamics simulation. Thus, one protein 3D structure can be constructed from one PFSC string, and an ensemble of protein 3D structures for multiple conformations is predicted from the multiple PFSC strings of the PFVM. In this way, the multiple conformational structures of an intrinsically disordered protein can be predicted.
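The text does not say how the cut points for the 50–100 residue pieces are chosen. Purely as an illustration, the sketch below assumes a near-equal split with an upper bound of 100 residues per piece; the function name and the splitting rule are hypothetical, and the actual procedure may, for example, follow domain boundaries instead.

```python
from math import ceil


def split_into_fragments(length, max_len=100):
    """Split residues 1..length into near-equal consecutive (start, end) ranges,
    each no longer than max_len. Positions are 1-based and inclusive."""
    n_parts = max(1, ceil(length / max_len))
    base, extra = divmod(length, n_parts)
    ranges, start = [], 1
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first pieces
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# A 393-residue protein (the length of human P53) under this assumed rule:
print(split_into_fragments(393))  # [(1, 99), (100, 197), (198, 295), (296, 393)]
```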
AlphaFold, Robetta, and I-TASSER predicted structures: The structures of the human P53, alpha-synuclein, and protamine-2 proteins were predicted by AlphaFold, I-TASSER, and Robetta. The AlphaFold structures were obtained from the AlphaFold Protein Structure Database (Jumper et al., 2021). The I-TASSER predicted structures were obtained from the I-TASSER online server (Yang et al., 2015), which applies a multiple threading approach through PDB to acquire multiple templates representing different conformations of homologous proteins. The Robetta predicted structures were obtained from the Robetta online server (Baek et al., 2021), which extensively samples the protein's structural landscape using Rosetta's fragment-based modeling and energy optimization to obtain multiple conformational structures.
3. Results
The conformations of an intrinsically disordered protein can be explicitly revealed by PFSC and PFVM. If a protein has a large number of given structures, its folding differences due to intrinsic disorder may be discovered by the alignment of PFSC strings, because the conformation of any given structure can be completely described by a string of PFSC letters. For example, human cellular tumor antigen (P53_HUMAN) has many experimentally determined structures in PDB, so these given structures may expose, to some degree, the folding differences due to intrinsic disorder. However, most proteins do not have enough structural data, or have no given structure at all; in those cases, the folding features of protein intrinsic disorder can be exposed by the PFVM, which is obtained from the sequence. The folding patterns of intrinsic disorder in three proteins, human cellular tumor antigen (P53_HUMAN), alpha-synuclein (SYUA_HUMAN), and protamine-2 (PRM2_HUMAN), were each revealed by the respective PFVM. In addition, the FiveFold predicted structures were compared with the given structures and the AlphaFold predicted structures.
3.1. Folding patterns exposed by PFSC
Human P53 is a well-studied protein of 393 residues, and many of its structures are available in PDB. Over 170 structural entries cover the region 93–292, which includes the DNA-binding domain (DBD), and three structures cover almost the entire protein. To expose the intrinsically disordered features, a set of P53_HUMAN structures of the P53 DBD, comprising twelve structures from X-ray diffraction and one NMR structure with ten models, was compared by PFSC alignment. Table 1 shows that these structures were measured by different methods, at different laboratories, at different times, under different conditions, and with different interacting molecules. Although a domain is expected to have a stable conformation, the superimposition of the P53 DBD structures showed folding disorder to some degree under different conditions. The structural similarities and differences of the superimposed P53 DBD structures are displayed in Fig. 2A and B. However, it was hard to illustrate the folding differences between these structures by visualization alone. To expose the intrinsic disorder among these given structures, the conformation of each structure was converted into a PFSC string. The PFSC strings of the 22 given structures of the P53 DNA-binding domain were aligned and are displayed in Fig. 3. Each PFSC letter represents the folding shape of five amino acid residues. Red and pink PFSC letters denote typical helix and helix-like local folds, blue and light blue letters denote beta-strand and beta-strand-like local folds, and black letters denote irregular folds. By observing the colors of the PFSC letters, folding similarity and dissimilarity could be easily recognized, and the folding differences due to protein intrinsic disorder were exposed.
First, the secondary structure fragments of the P53 DBD were generally well aligned. Second, intrinsically stable regions and intrinsically disordered regions could be distinguished by comparing the aligned PFSC strings. For the P53 DBD (94–293), the intrinsically stable regions included 125–136, 140–147, 153–176, 191–199, 226–237, 251–261 and 266–285; the intrinsically disordered regions included 94–124, 137–139, 148–152, 177–190, 200–225, 238–250 and 262–265. Third, some structural fragments missing from the PFSC strings might belong to intrinsically disordered regions that are hard to determine experimentally, such as 113–123 in 3D05-A and 7EDS-A, and 182–188 in 3D05-A, 3IGK-A, 4IBV-A, 6ZNC-A, and 7B4D-A. Fourth, any local difference in folding pattern due to intrinsic disorder could be discovered from differing PFSC letters after alignment, because each PFSC letter represents the folding shape of five amino acids. Therefore, PFSC alignment is well able to expose the intrinsically disordered features of a protein with given structures.
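The comparison of aligned PFSC strings can be illustrated with a simple per-column agreement score. This is a minimal sketch: the scoring rule, the threshold, the gap symbol, and the toy strings are all assumptions made for illustration, and the stable and disordered regions listed above were not necessarily derived with this rule.

```python
from collections import Counter


def column_agreement(aligned, gap="-"):
    """For each alignment column, the fraction of non-gap letters that equal
    the most common letter in that column."""
    scores = []
    for column in zip(*aligned):
        letters = [c for c in column if c != gap]
        if not letters:
            scores.append(0.0)  # column missing in every structure
            continue
        most_common = Counter(letters).most_common(1)[0][1]
        scores.append(most_common / len(letters))
    return scores


def stable_columns(aligned, threshold=1.0):
    """0-based indices of columns whose agreement reaches the threshold."""
    return [i for i, s in enumerate(column_agreement(aligned)) if s >= threshold]


# Placeholder PFSC-like strings; '-' marks residues missing from a structure.
toy = ["AAABVB",
       "AAAJVB",
       "AA-BWB"]
print(stable_columns(toy))  # [0, 1, 2, 5]
```

Columns where the structures disagree (here columns 3 and 4) are candidates for locally variable folding, while gap characters correspond to fragments that were not resolved experimentally.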
Table 1.
Differences in environment and measurement method for the given P53_HUMAN structures. The 16 protein structures from PDB span 1996–2022; 12 were measured by X-ray diffraction, one by solution NMR, and three by electron microscopy. Differences in interacting molecules and residue replacements are indicated.
| PDB ID / Chain | Interaction Molecule | Residue Replacement | Method | Year |
|---|---|---|---|---|
| 1YCS-A | Zn+2, 53BP2 | | X-Ray Diffraction | 1996 |
| 2PCX-A | Zn+2 | R282Q | X-Ray Diffraction | 2007 |
| 3D05-A | Zn+2 | R249S | X-Ray Diffraction | 2008 |
| 3IGK-A | Zn+2, DNA | H168R, R249S | X-Ray Diffraction | 2009 |
| 4IBV-A | Zn+2, DNA, C2H6O2 | R273C | X-Ray Diffraction | 2012 |
| 4IJT-A | Zn+2, DNA, C4H10O2S2, C2H6O2 | R273H | X-Ray Diffraction | 2012 |
| 4MZI-A | Zn+2 | V123G, C135V, C141V, W146Y, C182S, V203A, R209P, C229Y, HYNY(233–236)YFKF, T253V, N269D | X-Ray Diffraction | 2013 |
| 5BUA-A | Zn+2, DNA, C8H16N2O3 | | X-Ray Diffraction | 2015 |
| 6ZNC-A | Zn+2, DNA, C8H13NO, C8H13NO | | X-Ray Diffraction | 2020 |
| 7B4D-A | Zn+2, DNA, C8H13NO, CH2O2 | R273C | X-Ray Diffraction | 2020 |
| 7EDS-A | Zn+2, ARSENIC | M133T | X-Ray Diffraction | 2021 |
| 8HLL-A | Zn+2, Apoptosis regulator Bcl-2 | | X-Ray Diffraction | 2022 |
| 2FEJ-A | Zn+2 | | Solution NMR | 2005 |
| 6XRE-M | Zn+2, Mg+2, DNA-directed RNA polymerases | | Electron Microscopy | 2020 |
| 8F2H-A | P53 dimer | | Electron Microscopy | 2022 |
| 8F2I-A | P53 monomer | | Electron Microscopy | 2022 |
Fig. 2.
The superposition of given structures from PDB for the P53 protein DBD. A: twelve structures measured by X-ray diffraction or electron microscopy. B: ten structural models measured by NMR.
Fig. 3.
The conformation comparison of given structures from PDB for the P53 protein DBD (94–292) by PFSC alignment. The PDB ID and chain or model are listed on the left. The top section is the amino acid sequence and position number. The bottom section is the protein conformation of each given structure as a PFSC string. Red and pink PFSC letters denote typical helix and helix-like local folds, blue and light blue letters denote beta-strand and beta-strand-like local folds, and black letters denote irregular folds. Yellow on the sequence indicates the relatively stable folding regions in the protein.
3.2. Folding patterns exposed by PFVM
With the sequence alone, a protein's intrinsically disordered patterns can be comprehensively exposed by the PFVM platform. Sequence information is available for over 245,000,000 proteins in the UniProt database, but most of these proteins have no given structure for the study of intrinsic disorder. Even for proteins with some given structures, the amount of structural data is not enough for the investigation of protein intrinsic disorder. The AlphaFold approach and its database can predict a protein 3D structure from its sequence, but the result is limited to a structure in a single state, i.e. it does not offer structures for multiple folding states, so it does not meet the requirements for investigating protein intrinsic disorder. Significantly, the PFVM provides an effective tool to expose the local folding variations along the sequence, which may be associated with the intrinsic disorder of a protein. Three sample proteins, human P53 (P53_HUMAN), alpha-synuclein (SYUA_HUMAN), and protamine-2 (PRM2_HUMAN), demonstrate how intrinsically disordered features are directly exposed by the PFVM. In addition, multiple conformational structures were predicted by FiveFold.
3.3. Human P53 protein
The human P53 protein comprises four domains: transactivation domain 1 (6–30), transactivation domain 2 (35–59), the DNA-binding domain (100–288), and the tetramerization domain (319–357). The PFVM for human P53 protein with 393 residues is displayed in sections A and B of Fig. 4. Red and pink PFSC letters denote typical helix and helix-like local folds, blue and light blue letters denote beta-strand and beta-strand-like local folds, and black letters denote irregular folds. Each column lists the possible folding shapes in PFSC letters for five residues, with the most favored folding shapes ranked on top. Thus, the local folding variations, with their folding patterns and numbers of possible folds along the sequence, are well displayed in the PFVM, which provides a platform for investigating protein intrinsic disorder. In section A of Fig. 4, the intrinsically disordered regions are marked on the sequence in blue and the relatively stable regions in yellow. The top row of the PFVM (PFVM-01) is the PFSC string composed of the most favored folding shapes, which represents one of the most probable conformations, and it displays some secondary structure fragments such as alpha-helix and beta-strand. The relatively stable regions clearly have secondary structure fragments in PFVM-01. Starting from PFVM-01, a PFSC letter may be replaced by another PFSC letter from the same column of the PFVM, so that further probable conformations are formed to represent the folding flexibility of multiple conformations; five PFSC strings as possible conformations are listed in section D of Fig. 4.
Fig. 4.
The PFVM and the alignment of conformations in PFSC strings for the entire P53 protein (1–393). A and C: P53 sequence and residue position ruler. According to MobiDB annotations of intrinsic disorder, the relatively stable regions are marked in yellow and the intrinsically disordered regions in blue. B: P53 protein PFVM. D: five possible P53 conformations in PFSC strings formed from the PFVM, which are the conformations used for FiveFold. E: five conformations in PFSC strings for structures predicted by Robetta (based on RoseTTAFold). F: five conformations in PFSC strings for structures predicted by I-TASSER. G: the conformation in a PFSC string for the single structure predicted by AlphaFold. H: the conformations in PFSC strings for three given structures from PDB. Red and pink PFSC letters denote typical helix and helix-like local folds, blue and light blue letters denote beta-strand and beta-strand-like local folds, and black letters denote irregular folds.
The regions and degrees of protein intrinsic disorder predicted for P53 by different methods are shown in Fig. 5, which includes the curve of the number of local folding patterns along the sequence in the PFVM together with results from PrDOS, PONDR, MobiDB-Lite, PDB, and AlphaFold. In general, the PFVM indicated that the N-terminus (1–93) and C-terminus (293–393) were more disordered than the DNA-binding domain region (94–292), which agreed overall with the results of the other methods. However, the PFVM clearly gave more detailed disorder information for P53: it indicated some intrinsically ordered regions in the N-terminus and C-terminus as well as some intrinsically disordered regions within the stable DNA-binding domain region. More importantly, the PFVM not only provided information about the regions and degree of protein intrinsic disorder but also offered the possible folding patterns.
Fig. 5.
Prediction of protein intrinsic disorder for human P53 by various methods. The method names or resources are listed in the left column. Disorder is presented by curves: for the PFVM, the Y-axis values are the numbers of local folding patterns of five residues; for PrDOS and PONDR, they are the disorder probability of the geometry space of each residue, shown against a threshold line. The disordered regions in PDB, MobiDB-Lite, and AlphaFold are marked in red.
According to the PFSC strings in section D of Fig. 4, the multiple conformational structures of the P53 protein were predicted by the FiveFold approach, and the superposition of the five predicted structures is displayed in section A of Fig. 6. The multiple conformational structures of the P53 protein were also predicted by the Robetta and I-TASSER methods, and the superposition of the five predicted structures for each method is shown in sections B and C. The single structure predicted by AlphaFold is shown in section D, and the superposition of three given structures from PDB together with the AlphaFold structure is displayed in section E. The superposed images made it obvious that the structures predicted by the same method differed from one another, and that the structures from different methods also differed. However, the structural differences and similarities were blurred in the images, whereas the conformational differences between these structures were well exposed by PFSC string alignment. Each 3D protein structure was converted into a PFSC string describing its conformation, and the PFSC strings were then aligned for comparison. With PFSC alignment, structural similarities and differences could be easily exposed. First, the secondary structure fragments were generally aligned across the different methods, as seen from the red and pink colors for alpha helix and the blue and light blue colors for beta strand. Second, the predicted DNA-binding domain (DBD) was relatively more stable than the N-terminus and C-terminus. Third, the FiveFold and I-TASSER methods better indicated the intrinsically disordered region in the N-terminus. Thus, PFSC string alignment revealed structural similarities and differences better than traditional structure superposition.
Fig. 6.
The images of the superposition of P53 protein structures. A: five structures predicted by FiveFold. B: five structures predicted by Robetta. C: five structures predicted by I-TASSER. D: one structure predicted by AlphaFold. E: three given structures from PDB and AlphaFold.
3.4. Alpha-synuclein protein
Alpha-synuclein (SYUA_HUMAN) is a small soluble protein with 140 residues, and it is considered a typical example of an intrinsically disordered protein because it lacks a single stable 3D structure (van Rooijen et al., 2009). Alpha-synuclein can adopt alpha-helical conformations when it interacts with lipid membranes. Under pathological conditions, it can aggregate into beta-sheet structures, forming oligomers and fibrils that are toxic to neurons. Alpha-synuclein is a neuronal protein that regulates synaptic vesicle trafficking and subsequent neurotransmitter release, and it is involved in Parkinson's disease, Lewy body dementia, and multiple system atrophy (Spillantini et al., 1997).
The intrinsic disorder of Alpha-synuclein was exposed in Fig. 7, which included the PFVM, the NMR structure, and the structures predicted by the FiveFold, I-TASSER, Robetta, and AlphaFold methods. First, the NMR structure revealed the intrinsically disordered regions. The 34 models of the NMR structure clearly displayed the region 1–22 as a stable alpha-helical fragment; the region 23–47 (circled) with some disorder due to an agglomeration of disordered sequence, whose residues were marked in red under the NMR images and which caused the orientation of the following fragment to disperse; the region 48–100 as a stable alpha-helical fragment in different spatial orientations; and the region 100–140 with wider disorder. Second, the AlphaFold predicted structure also exposed some intrinsic disorder. The AlphaFold structure displayed the region 1–90 as a relatively stable long alpha helix with pLDDT scores larger than 70, and the region 90–140 as disordered with pLDDT scores around 50 or lower. Third, through local folding variations, the PFVM provided rich information to expose the intrinsic disorder of the Alpha-synuclein protein. In the PFVM, the local folding variations along the sequence displayed the intrinsic disorder features in terms of folding patterns and the number of possible folds. The PFSC string in the top row (PFVM-01) showed the regions of protein secondary structure fragments: the regions 1–22 and 48–93 were almost entirely alpha-helical folds, whereas the regions 23–47 and 93–140 had various folds, which agreed with the NMR structural information. The regions of the sequence with obvious secondary structure fragments were relatively stable and marked in yellow; the regions of the sequence with various folds were widely disordered and marked in blue. In addition, for the investigation of intrinsic disorder, the PFVM provided information on the local folding variations of each set of five residues along the sequence.
AlphaFold predicted only one static structure in a single state, so it could not fully reveal the intrinsic disorder features of Alpha-synuclein. However, both the PFVM and the NMR structure exposed the intrinsically disordered features of Alpha-synuclein.
Fig. 7.
The structural information on intrinsic disorder for the Alpha-synuclein (SYUA_HUMAN) protein. A: the sequence and the ruler for residue position. Yellow on the sequence marked the relatively stable regions and blue the widely disordered regions. B: the PFVM for the Alpha-synuclein protein. C: five conformations in PFSC strings, formed from the PFVM, for FiveFold. D: five conformations in PFSC strings for the structures predicted by Robetta (based on RoseTTAFold). E: five conformations in PFSC strings for the structures predicted by I-TASSER. F: the conformation in PFSC string for the structure predicted by AlphaFold. Bottom section: the images of the NMR structure, the AlphaFold predicted structure, the FiveFold predicted structures, the I-TASSER predicted structures, and the Robetta predicted structures. The disordered region of the sequence (23–47) in the NMR structure was marked by a circle and its residues were marked in red under the NMR structure. The pLDDT is a per-residue model confidence score for AlphaFold. The PFSC letters in red and pink were for typical helix and helix-like local folds; blue and light blue were for beta strand and beta-strand-like local folds; and black was for irregular folds.
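The pLDDT cut-offs quoted in the text (above 70 for the relatively stable helix, around 50 or lower for the disordered tail) can be applied per residue to split a model into confidence bands. The sketch below assumes that a list of per-residue pLDDT values is already available; the toy scores, the band labels, and the function names `band` and `plddt_segments` are invented for illustration and are not an official AlphaFold classification.

```python
from itertools import groupby


def band(score, high=70.0, low=50.0):
    """Assign a pLDDT score to a band using the two cut-offs from the text."""
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


def plddt_segments(scores, high=70.0, low=50.0):
    """Group consecutive residues (1-based positions) that share a band."""
    segments = []
    position = 1
    for label, run in groupby(scores, key=lambda s: band(s, high, low)):
        length = len(list(run))
        segments.append((position, position + length - 1, label))
        position += length
    return segments


# Hypothetical per-residue scores for a ten-residue toy model.
toy_scores = [85, 88, 91, 74, 66, 58, 49, 42, 38, 45]

print(plddt_segments(toy_scores))
# [(1, 4, 'high'), (5, 6, 'medium'), (7, 10, 'low')]
```

Such segments describe model confidence for one static structure only; they do not enumerate alternative folds, which is the information the PFVM is meant to supply.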
Multiple conformational structures of Alpha-synuclein were predicted by various methods. The superpositions of the five structures predicted by FiveFold, I-TASSER, and Robetta were separately displayed in Fig. 7. With PFSC alignment, the conformation comparisons were shown in sections C, D, E, and F. First, for the intrinsically disordered regions (23–47 and 100–140), the PFSC alignment clearly exposed the disorder features in the structures predicted by the FiveFold, I-TASSER, and Robetta methods. Second, for the relatively stable regions, I-TASSER predicted rigid helix fragments in the regions 1–22 and 48–99, and Robetta predicted too many beta-strand features in the region 48–99, whereas FiveFold predicted reasonable folding features in these regions. Third, with the folding shape for five amino acids, FiveFold provided a more detailed description for investigating protein intrinsic disorder. Furthermore, compared with NMR and other methods, the PFSC, PFVM, and FiveFold better revealed the intrinsically disordered folding patterns based on five amino acid residues for the Alpha-synuclein protein.
3.5. Protamine-2 protein
Protamine-2 (PRM2_HUMAN), with 102 amino acid residues, is a largely disordered protein in solution; it binds to DNA and helps to condense and stabilize the DNA in sperm cells. Protamine-2 does not have any structure available in PDB, but a structure predicted by AlphaFold has several alpha-helical fragments. The model confidence (pLDDT) for the AlphaFold structure is lower than 70; several alpha-helical regions score between 50 and 70, and other regions score less than 50. However, the intrinsic disorder of Protamine-2 could be further exposed by its PFVM in section B of Fig. 8. First, both intrinsically disordered regions and relatively stable regions of the Protamine-2 protein could be distinguished in the PFVM. The number of PFSC letters in each column indicated the possible changes in folding patterns. A column with a higher number of PFSC letters might indicate an intrinsically disordered region; conversely, a column with fewer PFSC letters might relate to a stable region. In Fig. 8, the intrinsically disordered regions were marked in blue on the sequence and the relatively stable regions in yellow. Second, many alpha-helical folding patterns in PFSC letters (red or pink) for five amino acids were shown in the PFVM of Protamine-2, particularly in the first row of the PFVM (PFVM-01). Third, a set of multiple conformations, which represent the flexible conformations arising from protein intrinsic disorder, was formed from the PFVM. The first row of the PFVM (PFVM-01) was one of the most probable conformations for the Protamine-2 protein. Based on PFVM-01, more PFSC strings could be obtained by replacing PFSC letters with other letters from the same column. Five PFSC strings among the multiple conformations were listed in section C of Fig. 8. With the FiveFold approach, five conformational structures were separately predicted according to each PFSC string, and they were shown in the bottom section.
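The idea of deriving further PFSC strings from PFVM-01 by swapping letters within a column can be illustrated with a small sketch. It assumes a simplified, hypothetical representation in which the PFVM is a list of columns and each column is a string of candidate letters ordered from most to least probable. The toy matrix and the function names are invented for illustration, and the sketch ignores any compatibility constraints between neighbouring five-residue windows that the actual method may impose.

```python
from math import prod


def top_row(columns):
    """Return the string formed by the first (most probable) letter of each column."""
    return "".join(column[0] for column in columns)


def count_conformations(columns):
    """Count all letter combinations if every column could vary independently."""
    return prod(len(column) for column in columns)


def single_substitutions(columns):
    """Return the strings that differ from the top row at exactly one column."""
    base = list(top_row(columns))
    variants = []
    for index, column in enumerate(columns):
        for letter in column[1:]:
            variant = base.copy()
            variant[index] = letter
            variants.append("".join(variant))
    return variants


# Hypothetical five-column toy matrix; the letters are arbitrary placeholders.
toy_pfvm = ["A", "AV", "A", "WSB", "B"]

print(top_row(toy_pfvm))               # AAAWB
print(count_conformations(toy_pfvm))   # 6
print(single_substitutions(toy_pfvm))  # ['AVAWB', 'AAASB', 'AAABB']
```

Because the count is a product over columns, it grows very quickly with sequence length, which is one reason a small representative set such as five strings is convenient to work with.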
For comparison, five structures each were predicted by the I-TASSER and Robetta methods; the superpositions of these structures and the one structure predicted by AlphaFold were displayed in Fig. 8, and their conformation alignments in PFSC strings were listed in sections D, E, and F. First, in contrast to AlphaFold, the FiveFold, I-TASSER, and Robetta methods provided multiple conformational structures. Second, for the relatively stable regions, I-TASSER predicted too many unstable folding features in the regions 15–31, 50–60, and 65–76, and Robetta predicted unstable folding features in the region 50–60, whereas FiveFold predicted stable folding features for these regions. Third, for the intrinsically disordered regions, all methods revealed folding variations to different degrees. Fourth, compared with FiveFold, Robetta produced longer helical fragments. With the PFVM, the FiveFold approach predicted multiple conformational structures for the intrinsic disorder of the Protamine-2 protein. Thus, starting from the Protamine-2 protein sequence alone, the PFVM was able to expose the protein's disordered features, and it was a significant platform to reveal intrinsic disorder for a protein without a given structure. Furthermore, multiple conformational structures could be predicted on the basis of the PFVM.
Fig. 8.
The intrinsically disordered features of the Protamine-2 (PRM2_HUMAN) protein. A: the sequence and the ruler for residue position. Blue on the sequence marked the intrinsically disordered regions and yellow the relatively stable regions. B: the PFVM for the Protamine-2 protein. C: five possible conformations in PFSC strings obtained from the PFVM. D: five conformations in PFSC strings for the structures predicted by Robetta (based on RoseTTAFold). E: five conformations in PFSC strings for the structures predicted by I-TASSER. F: the conformation in PFSC string for the structure predicted by AlphaFold. Bottom section: the AlphaFold predicted structure, the FiveFold predicted structures, the I-TASSER predicted structures, and the Robetta predicted structures. The PFSC letters in red and pink were for typical helix and helix-like local folds; blue and light blue were for beta strand and beta-strand-like local folds; and black was for irregular folds.
4. Discussion
4.1. Prediction of disordered proteins
Intrinsic disorder exists widely in most proteins. A protein is a polypeptide composed of a series of amino acid residues, and as a polypeptide it is not a rigid structural system like a gear or other mechanical component. No protein has only a single static conformation; its instantaneous conformation depends on the physiological environment and on interactions with other proteins and small molecules (Niazi, 2025). On the time scale, some proteins may stay in a stable conformation at a minimum energy state, whereas others may continually fluctuate between several different conformations. On the molecular scale, some proteins may have folding flexibility along the entire sequence, whereas others may have only a few fragments with intrinsic disorder. Thus, intrinsic disorder is an essential property of protein structure.
Because of intrinsic disorder in proteins, the assessment of the quality of protein structure prediction should not focus only on structural accuracy. In particular, it should not emphasize accuracy against only one of the given structures when many given structures with different conformations are available in PDB. As protein conformation is flexible owing to intrinsic disorder, a judicious criterion for protein structure prediction should reflect the reality that a protein structure has multiple conformations. The assessment of the quality of protein structure prediction should therefore consider the authenticity of the protein structure, including both accuracy and flexibility (Yang et al., 2025). Ideally, a protein structure prediction method should provide an ensemble of multiple conformations that embraces all given conformational structures and also provides other possible conformations, even unknown ones.
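One simple way to make the notion of an ensemble that "embraces all given conformational structures" concrete is to ask, for each known conformation, how closely the best-matching member of a predicted ensemble resembles it. The sketch below uses the fraction of identical PFSC letters as a stand-in similarity measure. This metric, the function names `identity` and `ensemble_coverage`, and the toy strings are assumptions made purely for illustration; they are not the assessment used by the authors.

```python
def identity(a, b):
    """Return the fraction of positions at which two equal-length strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def ensemble_coverage(known, predicted):
    """For each known conformation, return its best identity to any predicted one."""
    return {
        name: max(identity(reference, candidate) for candidate in predicted)
        for name, reference in known.items()
    }


# Hypothetical toy data; the letters are arbitrary placeholders.
known_states = {"state_1": "AAAAWB", "state_2": "AAVVWB"}
predicted_ensemble = ["AAAAWB", "AAVAWB", "BBBBWB"]

coverage = ensemble_coverage(known_states, predicted_ensemble)
for name, score in coverage.items():
    print(name, round(score, 3))
```

Under this toy measure, a single-structure prediction can score well against only one known state at a time, whereas an ensemble can score well against several.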
The investigation of intrinsically disordered proteins involves the study of the protein folding problem. Theoretically, the protein folding problem involves an astronomical number of conformations, so a solution to the protein folding problem helps in understanding the essence of intrinsically disordered proteins. However, the study of intrinsically disordered proteins may focus on a limited number of folding conformations. In addition, the aim of studying intrinsically disordered proteins is to understand their biological functions and related diseases. So far, many methods and databases have been developed to determine intrinsically disordered regions and proteins. However, once the intrinsically disordered regions or proteins are known, the challenge is to determine what folding patterns are possible for them and, furthermore, how to display those folding patterns. The PFSC-PFVM algorithm provided a solution to reveal possible folding patterns and structures for intrinsically disordered proteins (Yang et al., 2023).
4.2. Characterize disordered proteins
The protein structure fingerprint technology, including PFSC, PFVM, and FiveFold, provided an effective tool to characterize disordered proteins. Indeed, any native protein has intrinsically disordered features that cannot be exposed by a single-state conformation. A set of protein structures with multiple conformations may represent the folding fluctuations of an intrinsically disordered protein. Unfortunately, for most proteins, multiple 3D structures are neither available in PDB nor directly obtainable from the AlphaFold approach. This obstacle, however, has been overcome by the protein structure fingerprint technology. First, the alignment of PFSC strings can explicitly reveal the similarities and differences between multiple conformations for the study of intrinsically disordered proteins. Second, based on sequence alone, the PFVM provided the local folding variations along a protein sequence, which displayed the landscape of an intrinsically disordered protein. For any protein, the local folding variations are fully exposed by its PFVM, which may reveal the features of intrinsic disorder in the protein. For example, the PFSC letters in each column represent the possible folding patterns and their number, so a higher number of PFSC letters in a column may indicate a location with folding flexibility. Also, the most probable folding patterns are listed at the top of each column, so the PFSC string in the top row (PFVM-01) is one of the most probable conformations. Generally, the regions with secondary structure fragments in PFVM-01 are relatively stable, whereas other regions may relate to intrinsic disorder. Significantly, the PFVM is able not only to determine the intrinsically disordered regions but also to provide the most probable folding patterns for an intrinsically disordered protein. Third, with the PFVM, FiveFold can predict multiple conformational structures for the investigation of the flexible structural features of intrinsically disordered proteins.
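The rule of thumb that columns with more PFSC letters point to flexible locations can be expressed as a simple per-column count. The sketch below assumes a hypothetical representation in which each PFVM column is a string of candidate letters. The toy matrix, the function names, and in particular the threshold `max_stable` are illustrative assumptions, since the text does not state a numerical cut-off.

```python
def letters_per_column(columns):
    """Return the number of distinct candidate letters in each column."""
    return [len(set(column)) for column in columns]


def flexibility_mask(columns, max_stable=2):
    """Mark each column '*' if it has more than max_stable letters, else '-'."""
    return "".join(
        "*" if count > max_stable else "-" for count in letters_per_column(columns)
    )


# Hypothetical seven-column toy matrix; the letters are arbitrary placeholders.
toy_columns = ["A", "A", "AV", "VWSB", "WSBJE", "B", "B"]

print(letters_per_column(toy_columns))  # [1, 1, 2, 4, 5, 1, 1]
print(flexibility_mask(toy_columns))    # ---**--
```

The resulting mask plays the same role as the yellow and blue sequence markings in the figures, although the published markings also take the secondary structure fragments in PFVM-01 into account.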
The structure of a native protein under physiological conditions is flexible, so it essentially has multiple conformations. Some methods, such as Robetta and I-TASSER, have been developed to predict 3D structures with multiple conformations. However, these methods are limited to sampling homologous fragments in PDB. In comparison, the protein structure fingerprint technology uses PFSC letters to describe the folding pattern of each set of five amino acid residues, and it collects all possible folding patterns for each set of five amino acid residues in the PFVM to expose the multiple conformations and structures of a protein. Furthermore, the protein structure fingerprint technology provided a better tool to compare multiple folding conformations and to reveal structural similarities and differences for the investigation of intrinsically disordered proteins.
4.3. PFVM server
The PFVM can be generated through a link at http://www.micropht.com/. Users can enter any amino acid sequence as input, and the resulting PFVM is displayed on screen or saved to a file.
CRediT authorship contribution statement
Jiaan Yang: designed and managed the overall project and prepared the manuscript. Wenxin Ji: performed data operation and prepared the draft manuscript. Wen Xiang Cheng and Gang Wu: prepared input data. Si Tong Sheng: deployed software code and maintained the web server. Peng Zhang: performed output data collection and analysis. Jun Lin, Xiaojia Chen and Qiong Shi: verified final data and result.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by the Guangdong Key R&D Program (2023B1111030002); Guangdong S&T Program No. 2022B111070007 and No. 2024B1111160006.
Handling Editor: Dr A Wlodawer
Footnotes
This article is part of a special issue entitled: Protein intrinsic disorder published in Current Research in Structural Biology.
Data availability
Data will be made available on request.
References
- Baek Minkyung, DiMaio Frank, Anishchenko Ivan, Dauparas Justas, Ovchinnikov Sergey, Lee Gyu Rie, Wang Jue, Cong Qian, Kinch Lisa N., Dustin Schaeffer R., Millán Claudia, Park Hahnbeom, Adams Carson, Glassman Caleb R., DeGiovanni Andy, Pereira Jose H., Rodrigues Andria V., van Dijk Alberdina A., Ebrecht Ana C., Opperman Diederik J., Sagmeister Theo, Buhlheller Christoph, Pavkov-Keller Tea, Rathinaswamy Manoj K., Dalwadi Udit, Yip Calvin K., Burke John E., Christopher Garcia K., Grishin Nick V., Adams Paul D., Read Randy J., Baker David. Accurate prediction of protein structures and interactions using a 3-track network. Science. 2021 doi: 10.1126/science.abj8754.
- Basu S., Gsponer J., Kurgan L. DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction. Nucleic Acids Res. 2023;51:W141–W147. doi: 10.1093/nar/gkad330.
- Brooks B.R., Brooks C.L., Mackerell A.D., et al. CHARMM: the biomolecular simulation program. J. Comput. Chem. 2009;30(10):1545–1614. doi: 10.1002/jcc.21287.
- Bu Z., Callaway D.J. Proteins move! Protein dynamics and long-range allostery in cell signaling. Protein structure and diseases. Adv. Protein Chem. Struct. Biol. 2011;83:163–221. doi: 10.1016/B978-0-12-381262-9.00005-7.
- Carugo O. pLDDT values in AlphaFold2 protein models are unrelated to globular protein local flexibility. Crystals. 2023;13:1560.
- Collins M.O., Yu L., Campuzano I., Grant S.G., Choudhary J.S. Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder. Mol. Cell. Proteomics. 2008;7(7):1331–1348. doi: 10.1074/mcp.M700564-MCP200.
- Dunker A.K., Lawson J.D., Brown C.J., Williams R.M., Romero P., Oh J.S., Oldfield C.J., Campen A.M., Ratliff C.M., Hipps K.W., et al. Intrinsically disordered protein. J. Mol. Graph. Model. 2001;19:26–59. doi: 10.1016/s1093-3263(00)00138-8.
- Dyson H.J., Wright P.E. Insights into the structure and dynamics of unfolded proteins from nuclear magnetic resonance. Adv. Protein Chem. 2002;62:311–340. doi: 10.1016/s0065-3233(02)62012-1.
- Dyson H.J., Wright P.E. Unfolded proteins and protein folding studied by NMR. Chem. Rev. 2004;104:3607–3622. doi: 10.1021/cr030403s.
- Erdős G., et al. IUPred3: prediction of protein disorder enhanced with unambiguous experimental annotation and visualization of evolutionary conservation. Nucleic Acids Res. 2021;49(W1):297–303. doi: 10.1093/nar/gkab408.
- Fukuchi S., Amemiya T., Sakamoto S., Nobe Y., Hosoda K., Kado Y., Murakami S.D., Koike R., Hiroaki H., Ota M. IDEAL in 2014 illustrates interaction networks composed of intrinsically disordered proteins and their binding partners. Nucleic Acids Res. 2014;42:D320–D325. doi: 10.1093/nar/gkt1010.
- Galzitskaya O.V., Garbuzynskiy S.O., Lobanov M.Y. FoldUnfold: web server for the prediction of disordered regions in protein chain. Bioinformatics. 2006;(23):2948–2949. doi: 10.1093/bioinformatics/btl504.
- Guo H.B., Perminov A., Bekele S., et al. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci. Rep. 2022;12 doi: 10.1038/s41598-022-14382-9.
- Huang F., Oldfield C.J., Xue B., et al. Improving protein order-disorder classification using charge-hydropathy plots. BMC Bioinf. 2014;15(Suppl. 17) doi: 10.1186/1471-2105-15-S17-S4.
- Iakoucheva L.M., Brown C.J., Lawson J.D., Obradović Z., Dunker A.K. Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol. 2002;323(3):573–584. doi: 10.1016/s0022-2836(02)00969-5.
- Iqbal S., Hoque M.T. DisPredict: a predictor of disordered protein using optimized RBF kernel. PLoS One. 2015;10(10) doi: 10.1371/journal.pone.0141551.
- Ishida T., Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 2007;35(Web Server issue):W460–W464. doi: 10.1093/nar/gkm363.
- Jones D.T., Cozzetto D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics. 2015;31(6):857–863. doi: 10.1093/bioinformatics/btu744.
- Jumper J., Evans R., Pritzel A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
- Kamerlin S.C., Warshel A. At the dawn of the 21st century: is dynamics the missing link for understanding enzyme catalysis? Proteins. 2010;78(6):1339–1375. doi: 10.1002/prot.22654.
- Langella E., Buonanno M., De Simone G., Monti S.M. Intrinsically disordered features of carbonic anhydrase IX proteoglycan-like domain. Cell. Mol. Life Sci. 2021;78:2059–2067. doi: 10.1007/s00018-020-03697-3.
- Levinthal C. How to fold graciously. Mossbauer spectroscopy in biological systems. 1969;67(41):22–24.
- Linding R., Russell R.B., Neduva V., Gibson T.J. GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res. 2003;31(13):3701–3708. doi: 10.1093/nar/gkg519.
- Linding R., Jensen L.J., Diella F., Bork P., Gibson T.J., Russell R.B. Protein disorder prediction: implications for structural proteomics. Structure. 2003;11(11) doi: 10.1016/j.str.2003.10.002.
- Liu J., Rost B. NORSp: predictions of long regions without regular secondary structure. Nucleic Acids Res. 2003;31(13):3833–3835. doi: 10.1093/nar/gkg515.
- Meszaros B., Erdos G., Dosztanyi Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018;46:W329–W337. doi: 10.1093/nar/gky384.
- Mizianty M.J., Peng Z., Kurgan L. MFDp2: accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles. Intrinsically Disord. Proteins. 2013;1(1) doi: 10.4161/idp.24428.
- Monzon A.M., Necci M., Quaglia F., Walsh I., Zanotti G., Piovesan D., Tosatto S.C.E. Experimentally determined long intrinsically disordered protein regions are now abundant in the protein data bank. Int. J. Mol. Sci. 2020;21:4496. doi: 10.3390/ijms21124496.
- Niazi S.K. Quantum mechanics paradox in protein structure prediction: intrinsically linked to sequence yet independent of it. Computational and Structural Biotechnology Reports. 2025;2
- Oates M.E., Romero P., Ishida T., Ghalwash M., Mizianty M.J., Xue B., Dosztányi Z., Uversky V.N., Obradovic Z., Kurgan L., Dunker A.K., Gough J. D2P2: database of disordered protein predictions. Nucleic Acids Res. 2013;41(Database issue):D508–D516. doi: 10.1093/nar/gks1226.
- Obradovic Z., Peng K., Vucetic S., Radivojac P., Brown C.J., Dunker A.K. Predicting intrinsic disorder from amino acid sequence. Proteins Struct. Funct. Genet. 2003;53:566–572. doi: 10.1002/prot.10532.
- Oldfield C.J., Peng Z., Kurgan L. Disordered RNA-Binding region prediction with DisoRDPbind. Methods Mol. Biol. 2020;2106:225–239. doi: 10.1007/978-1-0716-0231-7_14.
- Peng Z., Mizianty M.J., Kurgan L.A. Genome-scale prediction of proteins with long intrinsically disordered regions. Proteins: Struct., Funct., Bioinf. 2014;82(1):145–158. doi: 10.1002/prot.24348.
- Piovesan D., et al. MobiDB: intrinsically disordered proteins. Nucleic Acids Res. 2021;49(D1):D361–D367. doi: 10.1093/nar/gkaa1058.
- Piovesan D., Del Conte A., Clementel D., Monzon A.M., Bevilacqua M., Aspromonte M.C., Iserte J.A., Orti F.E., Marino-Buslje C., Tosatto S.C.E. MobiDB: 10 years of intrinsically disordered proteins. Nucleic Acids Res. 2023;51(D1):D438–D444. doi: 10.1093/nar/gkac1065.
- Prilusky J., et al. FoldIndex©: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics. 2005;21(16):3435–3438. doi: 10.1093/bioinformatics/bti537.
- Quaglia F., Mészáros B., Salladini E., Hatos A., Pancsa R., Chemes L.B., Pajkos M., Lazar T., Peña-Díaz S., Santos J., Ács V., Farahi N., Fichó E., Aspromonte M.C., Bassot C., Chasapi A., et al. DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation. Nucleic Acids Res. 2022;50(D1):D480–D487. doi: 10.1093/nar/gkab1082.
- Ruff K.M., Pappu R.V. AlphaFold and implications for intrinsically disordered proteins. J. Mol. Biol. 2021;433(20) doi: 10.1016/j.jmb.2021.167208.
- Sandhu K.S. Intrinsic disorder explains diverse nuclear roles of chromatin remodeling proteins. J. Mol. Recogn. 2009;22(1):1–8. doi: 10.1002/jmr.915.
- Schlessinger A., et al. Natively unstructured regions in proteins identified from contact predictions. Bioinformatics. 2007;23(18):2376–2384. doi: 10.1093/bioinformatics/btm349.
- Sormanni P., Camilloni C., Fariselli P., Vendruscolo M. The s2D method: simultaneous sequence-based prediction of the statistical populations of ordered and disordered regions in proteins. J. Mol. Biol. 2015;427(4):982–996. doi: 10.1016/j.jmb.2014.12.007.
- Spillantini M.G., Schmidt M.L., Lee V.M., Trojanowski J.Q., Jakes R., Goedert M. α-Synuclein in Lewy bodies. Nature. 1997;388(6645):839–840. doi: 10.1038/42166.
- Tunyasuvunakool K., Adler J., Wu Z., et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021;596:590–596. doi: 10.1038/s41586-021-03828-1.
- Uversky V.N., Oldfield C.J., Dunker A.K. Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu. Rev. Biophys. 2008;37:215–246. doi: 10.1146/annurev.biophys.37.032807.125924.
- van der Lee R., Lang B., Kruse K., Gsponer J., Sánchez de Groot N., Huynen M.A., Matouschek A., Fuxreiter M., Babu M.M. Intrinsically disordered segments affect protein half-life in the cell and during evolution. Cell Rep. 2014;8(6):1832–1844. doi: 10.1016/j.celrep.2014.07.055.
- van Rooijen B.D., van Leijenhorst-Groener K.A., Claessens M.M., Subramaniam V. Tryptophan fluorescence reveals structural features of alpha-synuclein oligomers. J. Mol. Biol. 2009;394(5):826–833. doi: 10.1016/j.jmb.2009.10.021.
- Vullo A., Bortolami O., Pollastri G., Tosatto S.C. Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res. 2006;34(Web Server issue):W164–W168. doi: 10.1093/nar/gkl166.
- Wilson C.J., Choy W.Y., Karttunen M. AlphaFold2: a role for disordered protein/region prediction? Int. J. Mol. Sci. 2022;23(9):4591. doi: 10.3390/ijms23094591.
- Xue B., Dunbrack R.L., Williams R.W., Dunker A.K., Uversky V.N. PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta. 2010;1804(4):996–1010. doi: 10.1016/j.bbapap.2010.01.011.
- Yan Z., Bai X., Yan C., Wu J., Li Z., Xie T., Peng W., Yin C., Li X., Scheres S.H.W., et al. Structure of the rabbit ryanodine receptor ryr1 at near-atomic resolution. Nature. 2015;517:50–55. doi: 10.1038/nature14063.
- Yang Jiaan. Comprehensive description of protein structures using protein folding shape code. Proteins. 2008;71(3):1497–1518. doi: 10.1002/prot.21932.
- Yang Z.R., et al. RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics. 2005;21(6):3369–3376. doi: 10.1093/bioinformatics/bti534.
- Yang J., Yan R., Roy A., Xu D., Poisson J., Zhang Y. The I-TASSER suite: protein structure and function prediction. Nat. Methods. 2015;12:7–8. doi: 10.1038/nmeth.3213.
- Yang Jiaan, Wen Xiang Cheng, Xiao Fei Zhao, Wu Gang, Si Tong Sheng, Qi Yue Hu, Zhang Peng. Comprehensive folding variations for protein folding. Proteins. 2022;90(11):1851–1872. doi: 10.1002/prot.26381.
- Yang J., Cheng Wx, Wu G., et al. Prediction of folding patterns for intrinsic disordered protein. Sci. Rep. 2023;13 doi: 10.1038/s41598-023-45969-5.
- Yang J., Cheng W.X., Zhang Gang Wu, Sheng Si Tong, Yang Junjie, Zhao Suwen, Hu Qiyue, Ji Wenxin, Shi Qiong. Conformational ensembles for protein structure prediction. Sci. Rep. 2025;15:8513. doi: 10.1038/s41598-024-84066-z.
- Zhao B., Kurgan L. Surveying over 100 predictors of intrinsic disorder in proteins. Expert Rev. Proteomics. 2021;18(12):1019–1029. doi: 10.1080/14789450.2021.2018304.