2019 Nov 7;29(1):184–200. doi: 10.1002/pro.3756

Table 2.

Comparison of the per‐protein AUC values for the test proteins produced by the 12 disorder predictors, the oracle method that selects the predictor with the highest AUC, the selection based on the highest estimated AUC produced by DISOselect, and four different residue‐level consensus‐based disorder predictors that use logistic regression (LR) and support vector regression (SVR) models

| Category | Predictor | Mean per‐protein AUC | Per‐protein AUC at the worst quartile of proteins | Significance of differences compared to DISOselect |
|---|---|---|---|---|
| Hypothetical method | Oracle | 0.983 | 0.984 | p‐value < .01 (significantly better) |
| Proposed model | DISOselect | 0.974 | 0.971 | |
| Consensus models | Top2Predictor SVR | 0.947 | 0.938 | p‐value < .01 (significantly worse) |
| Consensus models | Top2Predictor LR | 0.947 | 0.936 | p‐value < .01 (significantly worse) |
| Consensus models | 12Predictor SVR | 0.942 | 0.929 | p‐value < .01 (significantly worse) |
| Consensus models | 12Predictor LR | 0.940 | 0.921 | p‐value < .01 (significantly worse) |
| Individual predictors | SPOT‐disorder | 0.940 | 0.927 | p‐value < .01 (significantly worse) |
| Individual predictors | DISOPRED3 | 0.935 | 0.921 | p‐value < .01 (significantly worse) |
| Individual predictors | ESpritz‐Xray | 0.880 | 0.832 | p‐value < .01 (significantly worse) |
| Individual predictors | ESpritz‐NMR | 0.865 | 0.809 | p‐value < .01 (significantly worse) |
| Individual predictors | VSL2B | 0.864 | 0.816 | p‐value < .01 (significantly worse) |
| Individual predictors | disEMBL‐465 | 0.853 | 0.768 | p‐value < .01 (significantly worse) |
| Individual predictors | IUPred‐short | 0.843 | 0.768 | p‐value < .01 (significantly worse) |
| Individual predictors | disEMBL‐HL | 0.816 | 0.719 | p‐value < .01 (significantly worse) |
| Individual predictors | ESpritz‐DisProt | 0.772 | 0.649 | p‐value < .01 (significantly worse) |
| Individual predictors | JRONN | 0.733 | 0.603 | p‐value < .01 (significantly worse) |
| Individual predictors | IUPred‐long | 0.718 | 0.584 | p‐value < .01 (significantly worse) |
| Individual predictors | GlobPlot | 0.646 | 0.537 | p‐value < .01 (significantly worse) |

Note: We compared the mean per‐protein AUCs computed over the test proteins and the AUCs at the worst (least accurately predicted) quartile of the test proteins (i.e., the 25% point in Figure 6). Methods are sorted by their mean per‐protein AUCs. The significance of the differences between the per‐protein AUCs of the predictions selected by DISOselect and those generated by each of the other methods (including the oracle) was assessed as follows. We sampled 50% of the proteins in the test data set at random 10 times and compared the corresponding 10 pairs of AUCs, using the t test when the measurements were normally distributed and the Wilcoxon test otherwise. Normality was tested with the Anderson‐Darling test at the .05 significance level. The resulting p‐values are listed in the last column.
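
The quantities described in the note can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions that the note does not settle: the worst-quartile value is taken as the 25th percentile of the per‐protein AUCs, each of the 10 subsamples is summarized by its mean per‐protein AUC, and the comparison is paired. All function names (`summarize`, `oracle`, `subsample_pairs`, `paired_t_statistic`) are hypothetical. The Anderson‐Darling normality check and the Wilcoxon fallback are not shown and would normally come from a statistics library.

```python
import random
import statistics


def summarize(aucs):
    """Return (mean AUC, AUC at the 25% point) for a list of per-protein AUCs.

    Assumption: the "worst quartile" value is the first quartile cut point.
    """
    q1 = statistics.quantiles(aucs, n=4, method="inclusive")[0]
    return statistics.mean(aucs), q1


def oracle(per_method):
    """For each protein, keep the highest AUC achieved by any method.

    per_method maps a method name to its list of per-protein AUCs;
    all lists must follow the same protein order.
    """
    return [max(values) for values in zip(*per_method.values())]


def subsample_pairs(a, b, repeats=10, fraction=0.5, seed=0):
    """Draw `repeats` random protein subsets and return paired mean AUCs.

    Assumption: each subsample is summarized by the mean per-protein AUC
    of method `a` and of method `b` over the same proteins.
    """
    rng = random.Random(seed)
    n = len(a)
    k = max(1, int(n * fraction))
    pairs = []
    for _ in range(repeats):
        idx = rng.sample(range(n), k)  # sampling without replacement
        pairs.append((statistics.mean(a[i] for i in idx),
                      statistics.mean(b[i] for i in idx)))
    return pairs


def paired_t_statistic(pairs):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in pairs]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / len(diffs) ** 0.5)
```

With 10 pairs the paired t statistic has 9 degrees of freedom, so under these assumptions a two-sided p‐value below .01 corresponds to an absolute value above roughly 3.25.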