2019 Nov 7;29(1):184–200. doi: 10.1002/pro.3756

Table 2.

Comparison of the per‐protein AUC values for the test proteins produced by the 12 disorder predictors, the oracle method that selects, for each protein, the predictor with the highest actual AUC, the selection of the predictor with the highest AUC estimated by DISOselect, and four residue‐level consensus‐based disorder predictors that use logistic regression (LR) and support vector regression (SVR) models

Category	Predictor	Mean per‐protein AUC	Per‐protein AUC at the worst quartile of proteins (25% point)	Significance of the difference from DISOselect
Hypothetical method	Oracle	0.983	0.984	p‐value < .01 (significantly better)
Proposed model	DISOselect	0.974	0.971	Reference method
Consensus models	Top2Predictor SVR	0.947	0.938	p‐value < .01 (significantly worse)
	Top2Predictor LR	0.947	0.936	p‐value < .01 (significantly worse)
	12Predictor SVR	0.942	0.929	p‐value < .01 (significantly worse)
	12Predictor LR	0.940	0.921	p‐value < .01 (significantly worse)
Individual predictors	SPOT‐disorder	0.940	0.927	p‐value < .01 (significantly worse)
	DISOPRED3	0.935	0.921	p‐value < .01 (significantly worse)
	ESpritz‐Xray	0.880	0.832	p‐value < .01 (significantly worse)
	ESpritz‐NMR	0.865	0.809	p‐value < .01 (significantly worse)
	VSL2B	0.864	0.816	p‐value < .01 (significantly worse)
	disEMBL‐465	0.853	0.768	p‐value < .01 (significantly worse)
	IUPred‐short	0.843	0.768	p‐value < .01 (significantly worse)
	disEMBL‐HL	0.816	0.719	p‐value < .01 (significantly worse)
	ESpritz‐DisProt	0.772	0.649	p‐value < .01 (significantly worse)
	JRONN	0.733	0.603	p‐value < .01 (significantly worse)
	IUPred‐long	0.718	0.584	p‐value < .01 (significantly worse)
	GlobPlot	0.646	0.537	p‐value < .01 (significantly worse)

Note: We compared the mean per‐protein AUCs computed over the test proteins and the per‐protein AUC at the worst (least accurately predicted) quartile of the test proteins (i.e., the 25% point in Figure 6). Methods are sorted by their mean per‐protein AUCs. To assess the significance of the differences between the per‐protein AUCs of the predictions selected by DISOselect and those of the predictions generated by each of the other methods (including the oracle), we randomly sampled 50% of the proteins in the test data set 10 times and compared the resulting 10 pairs of AUCs. We used the t test when the measurements were normally distributed and the Wilcoxon test otherwise; normality was assessed with the Anderson‐Darling test at the .05 significance level. The resulting p‐values are listed in the last column.
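The resampling comparison described in the note can be sketched in Python as follows. This is a minimal, standard‐library‐only sketch under assumptions that the note does not state: each of the 10 comparisons uses the mean per‐protein AUC over the sampled proteins; the t test and the Wilcoxon test are paired (signed‐rank) tests; normality is checked on both sets of 10 values with the Anderson‐Darling statistic using Stephens' small‐sample correction and its 5% critical value of 0.752; and the Wilcoxon p‐value is computed exactly by enumerating all sign patterns. The function names (`compare_to_disoselect` and its helpers) and the input lists of per‐protein AUCs are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def anderson_darling_is_normal(x):
    """Anderson-Darling normality check at the .05 level, with the mean and
    standard deviation estimated from x (assumed: Stephens' correction)."""
    n = len(x)
    mu, sd = mean(x), stdev(x)
    if sd == 0:
        return False  # degenerate sample; treat as non-normal
    cdf = NormalDist(mu, sd).cdf
    y = sorted(x)
    s = sum((2 * i - 1) * (math.log(cdf(y[i - 1])) + math.log(1 - cdf(y[n - i])))
            for i in range(1, n + 1))
    a2_star = (-n - s / n) * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2_star < 0.752  # 5% critical value for the corrected statistic


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t for integer df (closed-form CDF)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    k = 1 if df % 2 else 0
    coef, total = 1.0, 0.0
    while k <= df - 2:  # finite cosine series for integer df
        total += coef * c ** k
        coef *= (k + 1) / (k + 2)
        k += 2
    a = 2 / math.pi * (theta + s * total) if df % 2 else s * total
    return max(0.0, 1 - a)  # a = P(|T| < |t|)


def paired_t_p(a, b):
    """Paired t test on the differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    sd = stdev(d)
    if sd == 0:
        return 1.0 if mean(d) == 0 else 0.0
    return t_two_sided_p(mean(d) / (sd / math.sqrt(len(d))), len(d) - 1)


def wilcoxon_exact_p(a, b):
    """Exact two-sided Wilcoxon signed-rank p-value; zero differences are
    dropped and tied magnitudes receive average ranks."""
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(sum(r for r, di in zip(ranks, d) if di > 0) - centre)
    hits = sum(
        1
        for mask in range(2 ** n)  # every assignment of signs to the ranks
        if abs(sum(ranks[k] for k in range(n) if (mask >> k) & 1) - centre)
        >= observed - 1e-12
    )
    return hits / 2 ** n


def compare_to_disoselect(auc_disoselect, auc_other, n_samples=10, frac=0.5, seed=0):
    """Hypothetical helper: resample proteins, pair the mean per-protein AUCs,
    and return (test used, two-sided p-value)."""
    rng = random.Random(seed)
    n = len(auc_disoselect)
    pairs = []
    for _ in range(n_samples):
        idx = rng.sample(range(n), int(n * frac))
        pairs.append((mean(auc_disoselect[i] for i in idx),
                      mean(auc_other[i] for i in idx)))
    a, b = zip(*pairs)
    if anderson_darling_is_normal(a) and anderson_darling_is_normal(b):
        return "t test", paired_t_p(a, b)
    return "Wilcoxon", wilcoxon_exact_p(a, b)
```

SciPy's `scipy.stats.anderson`, `scipy.stats.ttest_rel`, and `scipy.stats.wilcoxon` provide these tests directly; the sketch avoids them only to stay dependency‐free. Because the proteins are sampled at random, the resulting p‐values depend on the seed.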