2018 Jan 3;18:580. doi: 10.1186/s12859-017-1995-z

Table 4.

Predictive performance on the TESTsmall dataset

Metric	Predictor	Material production (MF)			Purification (PF)			Crystallization (CF)			Diffraction-quality crystallization (CR)
		Average	±std	p-value	Average	±std	p-value	Average	±std	p-value	Average	±std	p-value
AUC	fDETECT	0.68	±0.11		0.64	±0.11		0.55	±0.11		0.64	±0.07
	PPCpred	0.64	±0.11	0.004	0.67	±0.12	<0.001	0.60	±0.12	<0.001	0.66	±0.08	0.054
	Crysalis	0.67	±0.11	0.392	0.59	±0.11	<0.001	0.56	±0.10	0.366	0.60	±0.08	<0.001
	PredPPCrys	0.62	±0.11	<0.001	0.59	±0.11	0.002	0.48	±0.12	<0.001	0.62	±0.08	0.001
	XtalPred	NA			NA			NA			0.59	±0.09	<0.001
	XtalPred-RF	NA			NA			NA			0.65	±0.08	0.392
	TargetCrys	NA			NA			NA			0.64	±0.07	0.734
	CRYSTALP2	NA			NA			NA			0.63	±0.08	0.419
MCC	fDETECT	0.21	±0.17		0.15	±0.19		0.16	±0.18		0.20	±0.13
	PPCpred	0.19	±0.20	0.039	0.26	±0.21	<0.001	0.20	±0.18	0.752	0.23	±0.15	0.253
	Crysalis	0.20	±0.18	0.115	0.11	±0.19	0.018	0.07	±0.16	<0.001	0.15	±0.15	0.005
	PredPPCrys	0.12	±0.17	<0.001	0.06	±0.18	<0.001	0.00	±0.19	<0.001	0.19	±0.15	0.314
	XtalPred	NA			NA			NA			0.18	±0.18	0.580
	XtalPred-RF	NA			NA			NA			0.24	±0.14	0.039
	TargetCrys	NA			NA			NA			0.21	±0.12	0.889
	CRYSTALP2	NA			NA			NA			0.23	±0.14	0.297
Accuracy (%)	fDETECT	78.0	±4.9		72.7	±6.2		68.5	±6.9		59.9	±6.7
	PPCpred	77.4	±5.4	0.040	76.2	±6.7	<0.001	69.9	±6.6	0.770	61.4	±7.6	0.211
	Crysalis	77.7	±4.9	0.104	71.4	±6.0	0.018	65.1	±6.1	<0.001	57.5	±7.3	0.004
	PredPPCrys	75.6	±4.8	<0.001	70.0	±6.0	<0.001	61.9	±7.5	<0.001	59.3	±7.5	0.348
	XtalPred	NA			NA			NA			58.7	±8.8	0.315
	XtalPred-RF	NA			NA			NA			62.2	±7.2	0.117
	TargetCrys	NA			NA			NA			60.3	±6.2	1.000
	CRYSTALP2	NA			NA			NA			61.4	±6.7	0.256

We report the average AUC, MCC, and accuracy, with the corresponding standard deviations, over 100 bootstrap tests (each test uses a randomly chosen 25% of the proteins). The statistical significance of the differences between fDETECT and each other method was measured with a paired t-test; the measured values are normally distributed, which we verified with the Anderson-Darling test at the 0.05 significance level. For each outcome, the best results that are not significantly different from each other (p-value > 0.05) are given in bold font. NA means that a given method does not provide this type of prediction.
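The evaluation protocol in the note above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`auc`, `mcc`, `bootstrap_auc`, `paired_t_statistic`) are hypothetical, the 25% subsets are drawn without replacement (the note does not say which sampling scheme was used), and only the paired t statistic is computed. A p-value would come from the t distribution with n − 1 degrees of freedom, for instance via `scipy.stats.ttest_rel`, and the normality check via `scipy.stats.anderson`.

```python
import math
import random
import statistics


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(labels, preds):
    """Matthews correlation coefficient for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_auc(labels, scores_a, scores_b, n_tests=100, fraction=0.25, seed=0):
    """Return paired AUC lists for two methods scored on the same random subsets."""
    rng = random.Random(seed)
    n = len(labels)
    k = max(2, round(fraction * n))
    aucs_a, aucs_b = [], []
    while len(aucs_a) < n_tests:
        # Assumption: each test samples the subset without replacement.
        idx = rng.sample(range(n), k)
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip subsets that contain a single class
        aucs_a.append(auc(y, [scores_a[i] for i in idx]))
        aucs_b.append(auc(y, [scores_b[i] for i in idx]))
    return aucs_a, aucs_b


def paired_t_statistic(xs, ys):
    """t statistic of the paired t-test: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Synthetic data only: method A is less noisy than method B.
    rng = random.Random(1)
    labels = [rng.randint(0, 1) for _ in range(400)]
    scores_a = [y + rng.gauss(0, 1.0) for y in labels]
    scores_b = [y + rng.gauss(0, 2.0) for y in labels]
    a, b = bootstrap_auc(labels, scores_a, scores_b)
    print(f"A: {statistics.mean(a):.2f} ±{statistics.stdev(a):.2f}")
    print(f"B: {statistics.mean(b):.2f} ±{statistics.stdev(b):.2f}")
    print(f"paired t = {paired_t_statistic(a, b):.2f}")
```

Scoring both methods on the same subsets is what makes the test paired: each bootstrap test contributes one difference, so variation between subsets cancels out of the comparison.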