Abstract
Recent research in predicting protein secondary structure populations (SSP) based on Nuclear Magnetic Resonance (NMR) chemical shifts has helped quantitatively characterise the structural conformational properties of intrinsically disordered proteins and regions (IDP/IDR). Different from protein secondary structure (SS) prediction, SSP prediction assumes a dynamic assignment of secondary structures that seems to correlate with disordered states. In this study, we designed a single-task deep learning framework to predict IDP/IDR and SSP, respectively, and multitask deep learning frameworks to allow quantitative predictions of IDP/IDR evidenced by the simultaneously predicted SSP. According to independent test results, single-task deep learning models improve on the prediction performance of shallow models for SSP and IDP/IDR. The performance of IDP/IDR prediction was further improved when SSP was predicted simultaneously in multitask models. With p53 as a use case, we demonstrate how predicted SSP is used to explain the IDP/IDR predictions for each functional region.
Introduction
According to the sequence-structure-function paradigm1, protein function has been closely associated with a unique, well-defined three-dimensional structure. However, this paradigm has been challenged by the discovery of intrinsically disordered proteins and regions (IDP/IDR). It is experimentally difficult to characterise IDP/IDR since they do not show stable electron densities in crystal structure analysis2. Various computational methods have been developed to predict IDP/IDR directly from amino acid (AA) sequences3-6. Among these methods, the δ2D method7 and the s2D method8 took a different perspective, characterising IDP/IDR in terms of their protein secondary structure populations (SSP) calculated from NMR chemical shifts. The former calculates the SSP directly from the six chemical shifts obtained from NMR experiments, while the latter trains a machine learning model to automatically predict SSP from amino acid sequences, which is further used to characterise IDP/IDR. Several other works also explore the relation between protein structure and NMR chemical shifts9-11.
Given the close correlation between SSP and IDP/IDR, we propose to predict the two properties directly from amino acid sequences using multitask learning. We first designed a single-task deep learning framework for predicting SSP and IDP/IDR, respectively, which are referred to as DeepS2P-P and DeepS2P-D. With DeepS2P-P, we characterised the secondary structure populations of ordered and disordered proteins and regions, from which we observed the quantitative correlation between protein SSP and IDP/IDR. Based on this observation, we then designed the multitask frameworks for predicting SSP and IDP/IDR simultaneously, with hard parameter sharing and cross-stitch-based soft parameter sharing; namely, multitask-D and Cross-stitch-D, respectively.
The contributions of this paper are summarised as follows. Firstly, it achieves state-of-the-art performance for SSP prediction. Note that SSP prediction is different from protein secondary structure (SS) prediction because SSP models are trained on population labels ranging from 0 to 1, while SS prediction models, whether generating binary outputs or probabilistic results, are trained on fixed SS assignments represented as binary values. Secondly, it takes the δ2D/s2D methods one step further to IDP/IDR prediction, filling the gap between IDP/IDR characterisation and IDP/IDR prediction. A detailed comparison between the δ2D/s2D methods and our methods is shown in Fig. 1. Finally, for the first time, it automatically generates the quantitative correlation between two protein structural prediction tasks, here SSP prediction and IDP/IDR prediction. This feature of Cross-stitch-D can be extended to explore the quantitative correlation between any other pair of protein structural properties.
Materials and methods
Datasets
Protein SSP prediction. The training dataset for SSP prediction was obtained from the s2D method8. It contains 2,671 proteins with 362,702 residues, among which 2,223 proteins were obtained from the BMRB database12 and the remaining 448 proteins were obtained from the PDB database13 in the form of X-ray structures. For independent tests, we constructed a novel benchmark dataset, namely Bmr2018, from the 12,018 entries of the latest BMRB database (downloaded in March 2018). The following filtering procedure was applied. Firstly, we kept entries that have the same experimental conditions as described in the s2D method, i.e. with pH between 5.5 and 8 and with temperature between 10 and 42 °C. Secondly, we only kept the entries that have the sample type labelled as 'solution' and removed any entry containing the amino acid 'X'. Thirdly, we extracted the annotated values for the six backbone chemical shifts, i.e. CA, CB, CO(C), N, HA, and HN, and removed entries that lack at least one of the six backbone chemical shifts. Finally, we removed entries that appear in the s2D training dataset. To obtain the SSP annotation for each entry, we used the δ2D method to generate the populations for helix, strands and coils from the extracted chemical shifts. Since PSI-BLAST failed to find a matching hit for 1,009 of the 2,293 entries, we ended up with a dataset of 1,284 BMRB entries. The resulting Bmr2018 dataset represents the first independent test benchmark dataset for protein SSP prediction.
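The filtering procedure can be sketched as a single predicate over candidate entries. This is only an illustration under stated assumptions: the dictionary keys (`ph`, `temperature_c`, `sample_type`, `sequence`, `shift_types`, `id`), the shift label `CO`, and the inclusive range bounds are hypothetical and do not mirror the BMRB file format or the authors' scripts.

```python
REQUIRED_SHIFTS = {"CA", "CB", "CO", "N", "HA", "HN"}


def keep_entry(entry, training_ids):
    """Return True if an entry passes all four filtering steps.

    entry is a plain dict; its keys are hypothetical and do not mirror
    the BMRB file format.
    """
    # Step 1: same experimental conditions as the s2D training data.
    same_conditions = (5.5 <= entry["ph"] <= 8.0
                       and 10.0 <= entry["temperature_c"] <= 42.0)
    # Step 2: solution samples only, no unknown residue 'X'.
    clean_sample = (entry["sample_type"] == "solution"
                    and "X" not in entry["sequence"])
    # Step 3: all six backbone chemical shifts must be present.
    complete_shifts = REQUIRED_SHIFTS <= set(entry["shift_types"])
    # Step 4: no overlap with the training dataset.
    unseen = entry["id"] not in training_ids
    return same_conditions and clean_sample and complete_shifts and unseen


ALL_SHIFTS = ["CA", "CB", "CO", "N", "HA", "HN"]
entries = [
    {"id": "a", "ph": 6.5, "temperature_c": 25.0, "sample_type": "solution",
     "sequence": "MKV", "shift_types": ALL_SHIFTS},
    {"id": "b", "ph": 4.0, "temperature_c": 25.0, "sample_type": "solution",
     "sequence": "MKV", "shift_types": ALL_SHIFTS},       # pH out of range
    {"id": "c", "ph": 7.0, "temperature_c": 30.0, "sample_type": "solution",
     "sequence": "MXV", "shift_types": ALL_SHIFTS},       # contains 'X'
    {"id": "d", "ph": 7.0, "temperature_c": 30.0, "sample_type": "solution",
     "sequence": "MKV", "shift_types": ["CA", "N", "HN"]},  # shifts missing
]
kept = [e["id"] for e in entries if keep_entry(e, training_ids={"z"})]
```

Only the toy entry `a` survives all four steps here; the other three each violate one criterion.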
IDP/IDR prediction. The training dataset for IDP/IDR prediction was downloaded from the web server of Spine-D14. We performed 10-fold cross validation over the 4,229 proteins. For independent tests, we used the 117 targets in the Casp9 benchmark15 and the 94 targets in the Casp10 benchmark16.
Feature calculation. For both SSP and IDP/IDR prediction, we calculated the position-specific scoring matrix (PSSM)17 for each residue as the only source of information for prediction. Combined with the position and residue type, the feature vector of each residue is composed of 23 real values: the first value represents the position of the residue in the protein sequence, the second value indicates the residue type, ranging from -10.0 to 10.0, and the remaining 21 values are taken from the PSSM.
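The 23-value layout described above can be sketched as follows. This is a minimal illustration only: the function name `residue_features` is hypothetical, and the text does not specify how the position or the residue type is encoded beyond the stated range, so both are simply passed in as numbers.

```python
def residue_features(position, type_code, pssm_row):
    """Build the 23-value feature vector for one residue.

    position  -- position of the residue in the sequence (encoding assumed)
    type_code -- residue-type value, expected in [-10.0, 10.0]
    pssm_row  -- the 21 PSSM values for this residue
    """
    if len(pssm_row) != 21:
        raise ValueError("expected 21 PSSM values per residue")
    if not -10.0 <= type_code <= 10.0:
        raise ValueError("residue-type code must lie in [-10.0, 10.0]")
    return [float(position), float(type_code)] + [float(v) for v in pssm_row]


# Toy example with made-up PSSM scores.
vec = residue_features(position=5, type_code=-2.5, pssm_row=[0.0] * 21)
```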
Deep learning framework for single tasks
The single-task frameworks for predicting SSP and IDP/IDR are essentially the same except for the different numbers of units in the output layer and the use of different activation and loss functions. We refer to the general single-task framework as DeepS2P. It is designed based on the deep convolutional neural network (DCNN)18, a deep feed-forward neural network where individual neurons in hidden layers are only connected with a restricted set of neighbouring neurons in the previous layer.
The DeepS2P framework uses DCNN to sequentially label amino acid sequences, i.e. to assign a categorical or numeric value to each residue in the amino acid sequence. Similar to any DCNN-based model, the DeepS2P framework is composed of an input layer, convolutional layers (with max-pooling layers), fully-connected layers and an output layer. While the convolutional layers and fully-connected layers are designed the same as those in generic DCNN architectures, the input layer and the output layer are designed specifically for protein sequence modelling, as follows.
The input layer I: We used a sliding window of size L to extract the neighbouring residues of the target residue r_t (where t represents the position of the residue in the sequence). The feature vectors of the sequence segment r_{t-w}, ..., r_t, ..., r_{t+w} of size L = 2 * w + 1 were therefore combined to form an input vector v_I of size L x 23, where 23 is the number of real values in the constructed feature vector. Correspondingly, the input layer was designed to have L x 23 neurons distributed in two dimensions.
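The window extraction can be illustrated with a short sketch. The function name `window_input` is hypothetical, and zero-padding at the sequence ends is an assumption, since the text does not say how residues near the termini are handled.

```python
def window_input(features, t, w):
    """Return the L x 23 input for residue t, with L = 2 * w + 1.

    features -- list of per-residue feature vectors (each of length 23)
    Positions outside the sequence are zero-padded (an assumption; the
    text does not say how sequence ends are handled).
    """
    n_feat = len(features[0])
    rows = []
    for pos in range(t - w, t + w + 1):
        if 0 <= pos < len(features):
            rows.append(list(features[pos]))
        else:
            rows.append([0.0] * n_feat)
    return rows


# Toy sequence of 4 residues; residue i is filled with the value i + 1.
seq = [[float(i + 1)] * 23 for i in range(4)]
x = window_input(seq, t=0, w=2)  # window of L = 5 rows centred on residue 0
```

For the first residue, the two rows to the left of the centre fall outside the sequence and are padded, so the centre row of `x` holds the features of residue 0.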
The output layer O: For protein SSP prediction, this layer used three real-valued neurons representing the populations of secondary structure elements: Helix (H), Strands (E) and Coil (C), with values ranging from 0 to 1. The outputs of H, E and C neurons were restricted so that they sum to 1. Accordingly, the sigmoid function was used as the activation function and the mean squared error (MSE) as the loss function. For IDP/IDR prediction, two binary neurons indicating the ordered/disordered states were added in the output layer. The cross entropy (CE) was used as the loss function and the softmax function as the activation function.
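The two output heads and their losses can be written out in a few lines. This is a sketch rather than the authors' implementation: in particular, rescaling the three sigmoid activations so that they sum to 1 is an assumption about how the sum-to-one restriction could be enforced, and all input values are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ssp_output(logits):
    """Sigmoid activations for H, E and C, rescaled so that they sum to 1.

    The rescaling step is an assumption about how the sum-to-one
    restriction could be enforced.
    """
    acts = [sigmoid(z) for z in logits]
    total = sum(acts)
    return [a / total for a in acts]


def mse(pred, target):
    """Mean squared error between predicted and target populations."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability of the true class (0 = ordered, 1 = disordered)."""
    return -math.log(probs[label])


populations = ssp_output([2.0, -1.0, 0.5])     # helix, strand, coil
ssp_loss = mse(populations, [0.7, 0.1, 0.2])   # made-up target populations
order_probs = softmax([0.3, 1.2])              # ordered, disordered
idr_loss = cross_entropy(order_probs, label=1)
```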
The hard-parameter-sharing multitask deep learning framework
Based on the observation of the correlation between protein SSP and IDP/IDR8, we propose to combine the prediction of these two into one multitask framework. Reviewing the design of DeepS2P-P and DeepS2P-D, the architectures of their input layers and convolutional layers are very much the same. Therefore, it is straightforward to hard share the weights in convolutional layers and split fully-connected layers and output layers for task-specific weights.
Alternate training was performed. Specifically, we trained the model with mini-batches acquired from the δ2D dataset and the Spine-D dataset in an alternating manner. For each mini-batch, only one of the two loss functions, MSE or CE, was used to optimise the model. As a result, the shared parameters in convolutional layers were updated in each mini-batch, while the task-specific weights were only updated in alternate mini-batches for their respective tasks.
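The alternating schedule can be made concrete with a toy loop that only counts which parameter groups would be updated. The names `alternate_batches` and `train_epoch` are hypothetical, no real optimiser is involved, and the handling of unequal numbers of mini-batches (the longer dataset simply continues alone) is an assumption.

```python
from itertools import zip_longest


def alternate_batches(ssp_batches, idr_batches):
    """Yield (task, batch) pairs, switching task after every mini-batch."""
    for ssp, idr in zip_longest(ssp_batches, idr_batches):
        if ssp is not None:
            yield "ssp", ssp
        if idr is not None:
            yield "idr", idr


def train_epoch(ssp_batches, idr_batches):
    """Count the updates each parameter group would receive in one epoch."""
    updates = {"shared": 0, "ssp_head": 0, "idr_head": 0}
    for task, _batch in alternate_batches(ssp_batches, idr_batches):
        # One loss per mini-batch: MSE for SSP, cross entropy for IDP/IDR.
        # The shared convolutional weights receive gradients either way,
        # the task-specific layers only for their own task.
        updates["shared"] += 1
        updates[task + "_head"] += 1
    return updates


counts = train_epoch(ssp_batches=["s1", "s2", "s3"], idr_batches=["d1", "d2"])
```

With three SSP mini-batches and two IDP/IDR mini-batches, the shared layers are updated five times, while each task-specific head is updated only for its own mini-batches.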
The cross-stitch multitask deep learning framework
To automatically learn the quantitative correlation between SSP and IDP/IDR, we further explore the application of the cross-stitch architecture19, a soft-parameter-sharing model where both tasks have their own neural network models. Between corresponding layers in the two models, a cross-stitch unit is added to linearly combine the outputs of the two hidden layers, producing inputs for their respective next hidden layers. Here, hidden layers refer to both convolutional layers and fully-connected layers in DCNN, and a layer-specific cross-stitch unit is added on top of each hidden layer. Fig. 2 (a) illustrates the change from hard-parameter-sharing to full soft-parameter-sharing, while (b) demonstrates the architecture of the cross-stitch units.
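According to Fig. 2 (b), the cross-stitch unit between task A and task B for layer $h_i$ (where $i$ indicates the $i$-th hidden layer) is composed of a matrix $M_i$ (formula (1)), indicating the linear relation between the contributions of the outputs $x_A^{i}$ and $x_B^{i}$ to the next layers $h_{i+1}$. As a result, the inputs of layer $h_{i+1}$ for task A and task B, i.e. $\tilde{x}_A^{i+1}$ and $\tilde{x}_B^{i+1}$, can be represented as shown in formulas (2) and (3), respectively. We added a layer-specific cross-stitch unit $M_i$ after each hidden layer and the fully-connected layer, yielding four cross-stitch units altogether.
| $M_i = \begin{pmatrix} \alpha_{AA}^{i} & \alpha_{BA}^{i} \\ \alpha_{AB}^{i} & \alpha_{BB}^{i} \end{pmatrix}$ | 1 |
| $\tilde{x}_A^{i+1} = \alpha_{AA}^{i} x_A^{i} + \alpha_{BA}^{i} x_B^{i}$ | 2 |
| $\tilde{x}_B^{i+1} = \alpha_{AB}^{i} x_A^{i} + \alpha_{BB}^{i} x_B^{i}$ | 3 |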
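A cross-stitch unit amounts to an element-wise linear mix of the two tasks' activations. The sketch below assumes that a weight `a_xy` scales the contribution of task X's output to task Y's next layer, which matches the later reading of αAB as the support SSP prediction receives from IDP/IDR prediction; the function name and the numeric weights are illustrative, not learned values.

```python
def cross_stitch(x_a, x_b, a_aa, a_ba, a_ab, a_bb):
    """Linearly mix the layer outputs of task A and task B.

    a_xy weights the contribution of task X's output to task Y's next
    layer (index convention assumed here).
    """
    next_a = [a_aa * xa + a_ba * xb for xa, xb in zip(x_a, x_b)]
    next_b = [a_ab * xa + a_bb * xb for xa, xb in zip(x_a, x_b)]
    return next_a, next_b


# Toy activations; the weights are illustrative, not learned values.
in_a, in_b = cross_stitch([1.0, 2.0], [10.0, 20.0],
                          a_aa=0.9, a_ba=0.1, a_ab=0.2, a_bb=0.8)
```

Setting the diagonal weights to 1 and the off-diagonal weights to 0 recovers two independent single-task networks, which is why large αAA and αBB values indicate that each task relies mainly on its own features.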
Results
Results for SSP prediction
In this section, we compare our proposed model DeepS2P-P to s2D and Linear Regression (LR) in SSP prediction. The performance for SSP prediction was evaluated using three measures: the Pearson correlation coefficient (R; the higher the better), the mean squared error (MSE) and the mean absolute error (MAE) (for both, the lower the better)8. Characterising IDP/IDR using predicted SSP provided a new avenue for quantitatively analysing IDP/IDR8. Among the three compared models, linear regression (LR) is a generic linear model for capturing the relationship between a scalar dependent variable and one or more explanatory variables. In contrast, the s2D method represents non-linear shallow models and the DeepS2P-P method represents non-linear deep learning models. A comparison of the three approaches illustrates how increasing model complexity benefits the prediction performance.
Table 1 shows their respective average performance (%) and confidence intervals (%) for predicting the populations of Helix (H), Strand (E) and Coil (C) on the s2D validation set, with a significance level of 5%. According to these results, DeepS2P-P improved the Pearson correlation coefficient for helix, strands and coils by 3.5, 3.6 and 3.2 points, respectively, compared to s2D predictions. The LR model performed significantly worse than the other two methods, indicating that non-linear models are better suited to SSP prediction. Similar trends were observed for MSE and MAE, where the DeepS2P-P method achieved the lowest scores among the three methods.
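For reference, the three measures can be computed per secondary structure class as in the sketch below. The function names and the toy helix populations are made up for illustration; they are not values from any of the datasets.

```python
import math


def pearson_r(pred, true):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


def mse(pred, true):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


# Made-up helix populations for four residues.
true_helix = [0.1, 0.4, 0.8, 0.9]
pred_helix = [0.2, 0.3, 0.7, 1.0]

r = pearson_r(pred_helix, true_helix)
err_sq = mse(pred_helix, true_helix)
err_abs = mae(pred_helix, true_helix)
```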
Table 1: SSP prediction results using LR, s2D and DeepS2P-P on the s2D validation set.
| | LR | | | s2D | | | DeepS2P-P | | |
| | Helix | Strands | Coil | Helix | Strands | Coil | Helix | Strands | Coil |
| R | 33.7/5.2 | 37.9/4.7 | 43.1/4.7 | 81.7/1.4 | 77.0/3.9 | 71.0/3.9 | 85.2/0.8 | 80.6/1.4 | 74.2/2.7 |
| MSE | 10.7/0.6 | 5.1/0.4 | 6.7/0.4 | 3.8/0.4 | 2.4/0.2 | 4.1/0.2 | 3.2/0.4 | 2.2/0.2 | 3.6/0.2 |
| MAE | 27.8/0.8 | 19.0/0.8 | 21.8/0.9 | 14.0/0.8 | 11.3/0.6 | 15.8/0.6 | 11.9/0.8 | 10.1/0.8 | 14.3/0.6 |
To validate the performance of DeepS2P-P and the s2D method, we conducted an independent test by further applying DeepS2P-P and the s2D method to the constructed benchmark dataset Bmr2018. Table 2 shows the prediction performance (%) of DeepS2P-P and the s2D method for protein SSP prediction on Bmr2018.
Table 2: Performance of the DeepS2P-P and the s2D method on the independent benchmark Bmr2018.
| | s2D | | | DeepS2P-P | | |
| | Helix | Strands | Coils | Helix | Strands | Coils |
| R | 79.8 | 76.1 | 63.0 | 83.2 | 79.6 | 65.2 |
| MSE | 4.7 | 3.3 | 4.9 | 4.0 | 2.9 | 4.8 |
| MAE | 15.3 | 13.2 | 17.5 | 12.9 | 11.9 | 17.1 |
According to the results shown above, DeepS2P-P improved the respective Pearson correlation coefficients by 3.4, 3.5, and 2.2 points for helix, strand and coil populations compared to those achieved by the s2D method. Correspondingly, DeepS2P-P decreased the MSE and MAE for helix, strand and coil populations as well. Its performance represents the current state of the art for this task.
IDR/IDP independent test evaluations
For IDP/IDR prediction, the performance of the proposed DeepS2P-D, multitask-D and Cross-stitch-D methods was evaluated on the Casp9 and Casp1020 targets in terms of true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), balanced accuracy (BACC), Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC). Performance was compared with benchmark models comprising the four best-performing models in Casp9, namely PrDOS221, DisoPred3C22, MultiCom6 and Spine-D14, and the four best-performing models in Casp10, namely Prdos-CNF21, DISOPRED35, Biomine-dr-mixed and Biomine-dr-pdb-c23. In addition, two recent deep learning models, DeepCNF-D6 and AUCPreD24, were included in the evaluation.
According to the evaluation on Casp9 in Table 3, DeepS2P-D improved the AUC score by 0.3 points compared to PrDOS2 and DeepCNF-D. Cross-stitch-D and multitask-D further improved the AUC score, by 0.5 and 1.2 points respectively over the same baseline. In terms of BACC, PrDOS2 still performed best among all models with a score of 75.4. In comparison, the three models introduced in this study (DeepS2P-D, Cross-stitch-D and multitask-D) achieved BACC scores of 66.0, 67.7 and 68.2, respectively. As for MCC, multitask-D achieved the best score (51.4) and Cross-stitch-D (50.5) followed closely behind DisoPred3C (50.8); they improved on the MCC score of the DeepCNF-D model by 2.8 and 1.9 points, respectively. According to the prediction statistics based on TP, FP, TN and FN, PrDOS2 achieved the best sensitivity by correctly predicting 1,468 of 2,417 positive examples, while DisoPred3C achieved the best precision, with 839 of its 1,019 positive predictions being correct. In comparison, the performance of Cross-stitch-D and multitask-D lies somewhere in between, with a tendency to make more positive but cautious predictions. Specifically, they correctly predicted more positive examples than DisoPred3C and obtained an FP to TP ratio below 1:3.
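The threshold-based measures follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions of sensitivity, specificity, precision, BACC and MCC; the function name is hypothetical, and the counts are the multitask-D values reported for Casp9 in Table 3.

```python
import math


def disorder_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of disordered residues found
    specificity = tn / (tn + fp)   # fraction of ordered residues found
    precision = tp / (tp + fp)     # fraction of positive predictions that are correct
    bacc = (sensitivity + specificity) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "bacc": bacc,
        "mcc": mcc,
    }


# Counts reported for multitask-D on the Casp9 targets (Table 3).
m = disorder_metrics(tp=905, fp=244, tn=23414, fn=1512)
```

With these counts the sketch gives a BACC of about 0.682, matching the reported 68.2, and an MCC close to the reported 51.4.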
Table 3: Performance comparison for IDP/IDR prediction in independent test evaluations.
| | TP | FP | TN | FN | BACC | MCC | AUC |
| Casp9 (117 targets) | |||||||
| PrDOS215 | 1,468 | 2,340 | 21,318 | 949 | 75.4 | 41.8 | 85.5 |
| DisoPred3C15 | 839 | 180 | 23,478 | 1,578 | 67.0 | 50.8 | 85.4 |
| MultiCom15 | 953 | 934 | 21,695 | 1,310 | 69.0 | 41.3 | 85.3 |
| Spine-D15 | 1,399 | 2,774 | 20,884 | 1,018 | 73.1 | 36.5 | 83.2 |
| DeepCNF-D6 | - | - | - | - | 75.2 | 48.6 | 85.5 |
| AUCPreD * | 1,010 | 538 | 23,118 | 1,417 | 69.7 | 48.4 | 85.0 |
| DeepS2P-D | 796 | 202 | 23,456 | 1,621 | 66.0 | 48.5 | 85.8 |
| Cross-stitch-D | 883 | 248 | 23,410 | 1,534 | 67.7 | 50.5 | 86.0 |
| multitask-D | 905 | 244 | 23,414 | 1,512 | 68.2 | 51.4 | 86.7 |
| Casp10 (94 targets) | |||||||
| Prdos-CNF16 | 657 | 287 | 22,401 | 845 | 71.2 | 52.9 | 90.7 |
| DISOPRED316 | 607 | 201 | 22,487 | 895 | 69.8 | 53.1 | 89.7 |
| Biomine-dr-mixed16 | 628 | 368 | 22,320 | 874 | 70.1 | 48.8 | 89.0 |
| Biomine-dr-pdb-c16 | 579 | 290 | 22,398 | 923 | 68.6 | 48.3 | 88.6 |
| DeepCNF-D6 | - | - | - | - | 76.4 | 47.4 | 89.8 |
| AUCPreD * | 673 | 485 | 22,203 | 829 | 71.3 | 48.2 | 88.0 |
| DeepS2P-D | 561 | 171 | 22,517 | 941 | 68.3 | 51.6 | 89.5 |
| Cross-stitch-D | 613 | 179 | 22,509 | 889 | 70.0 | 54.3 | 89.8 |
| multitask-D | 603 | 178 | 22,510 | 899 | 69.7 | 53.7 | 90.2 |
When applied to Casp10 proteins, the prediction performance of most models improved, keeping the FP to TP ratio under 1:2. Among all methods, multitask-D and Cross-stitch-D performed second and third best in terms of AUC, with scores of 90.2 and 89.8, while Cross-stitch-D and multitask-D achieved the best and second-best MCC, with scores of 54.3 and 53.7, respectively. Prdos-CNF achieved the best AUC score of 90.7 and DeepCNF-D achieved the best BACC with a score of 76.4. In summary, when applied to Casp10 proteins, generative models such as Prdos-CNF and deep learning models including DeepCNF-D, Cross-stitch-D and multitask-D achieved superior performance compared to the other models.
Correlation between SSP and IDP/IDR prediction
Besides simultaneously predicting IDP/IDR and SSP, the Cross-stitch-D model also automatically learned the linear correlation between each pair of corresponding layers for the two tasks. The cross-stitch units M in Fig. 2 and formula (1) were populated with real-valued correlations during the learning process, which are normalised and illustrated as a heatmap in Fig. 3, where darker blue indicates stronger dependence. The heatmap in Fig. 3 is divided into four areas: cross_stitch_unit(AA), cross_stitch_unit(BA), cross_stitch_unit(AB) and cross_stitch_unit(BB), corresponding respectively to the values of αAA, αBA, αAB and αBB in the cross-stitch units. Here, task A represents IDP/IDR prediction and task B represents SSP prediction.
Discussion
According to the correlation heatmap, αAA and αBB are darker than αBA and αAB, with maximum weights achieved at layer conv2, indicating that both tasks rely mainly on the outputs of their own previous layer. According to the weights in αAB, SSP prediction receives the most support from IDP/IDR prediction at layer conv1, showing that features extracted for short chains in IDP/IDR prediction can also be reused in SSP prediction. In comparison, the maximum weights in αBA were achieved at layer conv3, indicating that features for longer chains in SSP prediction are better reused in IDP/IDR prediction.
To demonstrate how to explain IDP/IDR prediction using the simultaneously predicted SSP, we plotted the predicted results of DeepS2P-D, multitask-D, s2D8, PSIPRED25, PrDOS221 and DISOPRED35 for target T0520 from Casp9 in Fig. 4.
Fig. 4 (a) shows the prediction results from DeepS2P-D and multitask-D, with IDR labels in Casp9 indicated as 'D' on the top axis. Both DeepS2P-D and multitask-D predicted the first short IDR (residues 1-2) with one false positive prediction at residue 3 and failed to predict the second short IDR (residues 23-26). The third long IDR (residues 174-189) was predicted by multitask-D with full accuracy, while DeepS2P-D predicted only 12 of the 16 disordered residues. This difference can be explained by the additional support obtained from the simultaneously predicted higher coil populations for residues 172-189 (indicated by 'grey' bars), which is only available in the multitask-D model. Altogether, this shows the benefits of using a multitask framework over a single-task framework.
In Fig. 4 (b), results from multiple other methods are plotted, including the secondary structure predicted by PSIPRED25 (indicated as 'H', 'E' and 'C' on the top axis), the IDR predicted by DISOPRED3 and PrDOS2 (indicated by 'red' and 'black' lines, respectively) and the SSP predicted by the s2D method (with H, E and C populations indicated by 'blue', 'green' and 'grey' bars, respectively). The SSP prediction results of s2D, multitask-D and PSIPRED generally agree with each other. In IDP/IDR prediction, DISOPRED3 missed the first short IDR and the first two residues in the third long IDR, while PrDOS2 missed the first three residues in the third long IDR.
We further applied DeepS2P-D and multitask-D to protein p53 from the DisProt 7.0 database26, demonstrating how simultaneously predicted SSP can be used to qualitatively explain the predicted IDP/IDR states. The prediction results are plotted in Fig. 5.
The p53 protein is composed of three functional regions: the N-terminal region, the core DNA-binding region and the C-terminal region27. The N-terminal region is further divided into a transactivation domain (TAD, residues 1-63) and a proline-rich area (residues 64-93)27. The C-terminal region is further divided into a tetramerization domain (residues 320-356) and a regulatory domain (residues 363-393)28. multitask-D predicted four disordered regions, where region 1 (residues 1-15) and region 2 (residues 56-92) are located in the N-terminal region, and region 3 (residues 287-322) and region 4 (residues 359-393) in the C-terminal region.
According to27, a) the whole N-terminal region p53(1-93) is disordered and b) residues 21-25 form a residual α-helical segment, which is consistent with the known propensity of residues 18-25 to form an α-helix when binding to MDM229. The first two IDRs predicted by multitask-D do not cover the whole N-terminal region, but the predicted SSP in the gap region 18-23 reveals a larger helix population, which is consistent with b). According to the correlation between IDP/IDR and SSP, a higher helix population may explain the predicted structure states in this region. Another observation in the N-terminal region is that the second IDR (residues 56-92) predicted by multitask-D corresponds to the proline-rich domain (residues 64-93). According to30, no significant chemical shifts were observed in this proline-rich domain, but resonances undergoing significant chemical shift changes were observed in the segment (residues 18-57), which corresponds to the structured region (residues 16-55) that was predicted by multitask-D. These corresponding regions suggest that segments with significant chemical shift changes are less likely to be predicted as an IDR, which in turn explains the predicted structure states in residues 16-55.
The third predicted IDR (residues 287-322) was validated by the crystal structure of the core domain of p53 introduced in31. According to this crystal structure of p53, residues 278-289 form an α-helix segment H2, for which the multitask-D model predicted an increase of helix population from 0.137 to 0.668 followed by a decrease to 0.471. With the decrease of the helix population and the increase of the coil population, the H2 segment ends and, according to31, residues up to Thr-312 are disordered. This disordered region, ranging from the end of H2 at residue 289 to residue Thr-312, overlaps with the third predicted IDR (residues 287-322). Finally, the whole C-terminal region of p53 was annotated as a disordered region in DisProt 7.0. However, the multitask-D model predicted the regulatory domain (residues 363-393) to be disordered and the tetramerization domain (residues 320-356) to be populated with strands and helices. This observation is consistent with the results in27, showing that the tetramerization domain of p53 adopts a well-defined conformation and is a folded domain. Our predictions agree with these results.
Conclusion
In this study, we simultaneously predicted IDP/IDR and SSP by exploring the mutual correlation between these two tasks, using multitask deep learning neural networks. The cross-validation and independent test results demonstrate that the deep learning model DeepS2P-P outperforms the s2D method for predicting protein SSP and that the representations learned for SSP and IDP/IDR prediction are mutually supportive. With the multitask framework, it is possible to explain the IDP/IDR predictions for proteins such as p53 using the simultaneously predicted SSP.
Despite the improved performance in both IDP/IDR and SSP prediction using multitask deep learning frameworks, the frameworks presented here can be extended in several ways. First, additional protein features can be incorporated to improve the prediction performance. Other features that have proved useful for IDP/IDR prediction include physicochemical properties, structural features, and evolution-based features. Second, the multitask framework can be modified to automatically learn non-linear correlations among multiple tasks. The current cross-stitch framework only explores linear relations, which is a relatively naive assumption about the correlation among different protein sequence-based predictions. Third, other related tasks can be added to the multitask framework, including protein-ATP site prediction, protein-nucleotide binding residue prediction, phosphorylation site prediction, contact map prediction, and protein fold pattern prediction.
Figures & Table
Figure 1: Relation with the δ2D method and the s2D method.
Figure 2: Framework for a cross-stitch multitask deep convolutional neural network.

Figure 3: Cross stitch units indicating a linear correlation between IDP/IDR prediction (task A) and secondary structure population prediction (task B).

Figure 4: Prediction results for target T0520 in Casp9 by DeepS2P-D, multitask-D, PSIPRED, DISOPRED, PrDOS2 and s2D.
Figure 5: Prediction results for p53 by DeepS2P-D and multitask-D.
References
- 1. Wright Peter E, Dyson H Jane. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. Journal of molecular biology. 1999;293(2):321–331. doi: 10.1006/jmbi.1999.3110.
- 2. Dyson H Jane, Wright Peter E. Intrinsically unstructured proteins and their functions. Nature reviews Molecular cell biology. 2005;6(3):197–208. doi: 10.1038/nrm1589.
- 3. Hanson Jack, Paliwal Kuldip, Zhou Yaoqi. Accurate single-sequence prediction of protein intrinsic disorder by an ensemble of deep recurrent and convolutional architectures. Journal of chemical information and modeling. 2018;58(11):2369–2376. doi: 10.1021/acs.jcim.8b00636.
- 4. Hanson Jack, Yang Yuedong, Paliwal Kuldip, Zhou Yaoqi. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks. Bioinformatics. 2017;33(5):685–692. doi: 10.1093/bioinformatics/btw678.
- 5. Jones David T, Domenico Cozzetto. Disopred3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics. 2015;31(6):857–863. doi: 10.1093/bioinformatics/btu744.
- 6. Wang Sheng, Weng Shunyan, Ma Jianzhu, Tang Qingming. Deepcnf-d: predicting protein order/disorder regions by weighted deep convolutional neural fields. International journal of molecular sciences. 2015;16(8):17315–17330. doi: 10.3390/ijms160817315.
- 7. Camilloni Carlo, Simone Alfonso De, Vranken Wim F, Vendruscolo Michele. Determination of secondary structure populations in disordered states of proteins using nuclear magnetic resonance chemical shifts. Biochemistry. 2012;51(11):2224–2231. doi: 10.1021/bi3001825.
- 8. Sormanni Pietro, Camilloni Carlo, Fariselli Piero, Vendruscolo Michele. The s2d method: simultaneous sequence-based prediction of the statistical populations of ordered and disordered regions in proteins. Journal of molecular biology. 2015;427(4):982–996. doi: 10.1016/j.jmb.2014.12.007.
- 9. Lin Hao, Ding Chen, Song Qiang, Yang Ping, Ding Hui, Deng Ke-Jun, Chen Wei. The prediction of protein structural class using averaged chemical shifts. Journal of Biomolecular Structure and Dynamics. 2012;29(6):1147–1153. doi: 10.1080/07391102.2011.672628.
- 10. Wishart David S, Sykes Brian D, Richards Frederic M. Relationship between nuclear magnetic resonance chemical shift and protein secondary structure. Journal of molecular biology. 1991;222(2):311–333. doi: 10.1016/0022-2836(91)90214-q.
- 11. Wishart David S, Sykes Brian D, Richards Fredric M. The chemical shift index: a fast and simple method for the assignment of protein secondary structure through nmr spectroscopy. Biochemistry. 1992;31(6):1647–1651. doi: 10.1021/bi00121a010.
- 12. Ulrich Eldon L, Akutsu Hideo, Doreleijers Jurgen F, Harano Yoko, Ioannidis Yannis E, Lin Jundong, Livny Miron, Mading Steve, Maziuk Dimitri, Miller Zachary, et al. Biomagresbank. Nucleic acids research. 2007;36(suppl_1):D402–D408. doi: 10.1093/nar/gkm957.
- 13. Bernstein Frances C, Koetzle Thomas F, Williams Graheme JB, Meyer Edgar F, Jr, Brice Michael D, Rodgers John R, Kennard Olga, Shimanouchi Takehiko, Tasumi Mitsuo. The protein data bank: A computer-based archival file for macromolecular structures. European journal of biochemistry. 1977;80(2):319–324. doi: 10.1111/j.1432-1033.1977.tb11885.x.
- 14. Zhang Tuo, Faraggi Eshel, Xue Bin, Dunker A Keith, Uversky Vladimir N, Zhou Yaoqi. Spine-d: accurate prediction of short and long disordered regions by a single neural-network based method. Journal of Biomolecular Structure and Dynamics. 2012;29(4):799–813. doi: 10.1080/073911012010525022.
- 15. Monastyrskyy Bohdan, Fidelis Krzysztof, Moult John, Tramontano Anna, Kryshtafovych Andriy. Evaluation of disorder predictions in casp9. Proteins: Structure, Function, and Bioinformatics. 2011;79(S10):107–118. doi: 10.1002/prot.23161.
- 16. Monastyrskyy Bohdan, Kryshtafovych Andriy, Moult John, Tramontano Anna, Fidelis Krzysztof. Assessment of protein disorder region predictions in casp10. Proteins: Structure, Function, and Bioinformatics. 2014;82:127–137. doi: 10.1002/prot.24391.
- 17. Altschul Stephen F, Madden Thomas L, Schaffer Alejandro A, Zhang Jinghui, Zhang Zheng, Miller Webb, Lipman David J. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic acids research. 1997;25(17):3389–3402. doi: 10.1093/nar/25.17.3389.
- 18. LeCun Yann, Bengio Yoshua, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks. 1995;3361(10).
- 19. Misra Ishan, Shrivastava Abhinav, Gupta Abhinav, Hebert Martial. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 3994–4003.
- 20. Piovesan Damiano, Tabaro Francesco, Paladin Lisanna, Necci Marco, Micetic Ivan, Camilloni Carlo, Davey Norman, Dosztanyi Zsuzsanna, Meszaros Balint, Monzon Alexander M, et al. Mobidb 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins. Nucleic acids research. 2018;46(D1):D471–D476. doi: 10.1093/nar/gkx1071.
- 21. Ishida Takashi, Kinoshita Kengo. Prdos: prediction of disordered protein regions from amino acid sequence. Nucleic acids research. 2007;35(suppl_2):W460–W464. doi: 10.1093/nar/gkm363.
- 22. Wang Sheng, Peng Jian, Ma Jianzhu, Xu Jinbo. Protein secondary structure prediction using deep convolutional neural fields. Scientific reports. 2016;6(1):1–11. doi: 10.1038/srep18962.
- 23. Mizianty Marcin J, Peng Zhenling, Kurgan Lukasz. Mfdp2: Accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles. Intrinsically Disordered Proteins. 2013;1(1):e24428. doi: 10.4161/idp.24428.
- 24. Wang Sheng, Ma Jianzhu, Xu Jinbo. Aucpred: proteome-level protein disorder prediction by auc-maximized deep convolutional neural fields. Bioinformatics. 2016;32(17):i672–i679. doi: 10.1093/bioinformatics/btw446.
- 25. Buchan Daniel WA, Minneci Federico, Nugent Tim CO, Bryson Kevin, Jones David T. Scalable web services for the psipred protein analysis workbench. Nucleic acids research. 2013;41(W1):W349–W357. doi: 10.1093/nar/gkt381.
- 26. Sickmeier Megan, Hamilton Justin A, LeGall Tanguy, Vacic Vladimir, Cortese Marc S, Tantos Agnes, Szabo Beata, Tompa Peter, Chen Jake, Uversky Vladimir N, et al. Disprot: the database of disordered proteins. Nucleic acids research. 2007;35(suppl_1):D786–D793. doi: 10.1093/nar/gkl893.
- 27. Wells Mark, Tidow Henning, Rutherford Trevor J, Markwick Phineus, Jensen Malene Ringkjobing, Mylonas Efstratios, Svergun Dmitri I, Blackledge Martin, Fersht Alan R. Structure of tumor suppressor p53 and its intrinsically disordered n-terminal transactivation domain. Proceedings of the National academy of Sciences. 2008;105(15):5762–5767. doi: 10.1073/pnas.0801353105.
- 28. Kannan Srinivasaraghavan, Lane David P, Verma Chandra S. Long range recognition and selection in idps: the interactions of the c-terminus of p53. Scientific reports. 2016;6(1):1–13. doi: 10.1038/srep23750.
- 29. Kussie Paul H, Gorina Svetlana, Marechal Vincent, Elenbaas Brian, Moreau Jacque, Levine Arnold J, Pavletich Nikola P. Structure of the mdm2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science. 1996;274(5289):948–953. doi: 10.1126/science.274.5289.948.
- 30. Rowell John P, Simpson Kathryn L, Stott Katherine, Watson Matthew, Thomas Jean O. Hmgb1-facilitated p53 dna binding occurs via hmg-box/p53 transactivation domain interaction, regulated by the acidic tail. Structure. 2012;20(12):2014–2024. doi: 10.1016/j.str.2012.09.004.
- 31. Joerger Andreas C, Allen Mark D, Fersht Alan R. Crystal structure of a superstable mutant of human p53 core domain insights into the mechanism of rescuing oncogenic mutations. Journal of Biological Chemistry. 2004;279(2):1291–1296. doi: 10.1074/jbc.M309732200.



