Table 2.
| Test dataset (prediction target) | Method | MCC value | MCC sig | ACC value | ACC sig | SPEC | SENS | AUC |
|---|---|---|---|---|---|---|---|---|
| DB_CRYS (propensity of the diffraction-quality crystallization success) | ParCrys | 0.108 | + | 47.5 | + | 31.8 | 78.6 | 0.561 |
| | OBScore | 0.124 | + | 47.8 | + | 31.4 | **80.3** | 0.572 |
| | BLAST-based | 0.188 | + | 65.6 | + | 79.5 | 38.0 | N/A |
| | CRYSTALP2 | 0.195 | + | 55.3 | + | 45.7 | 74.4 | 0.648 |
| | MetaPPCP | 0.195 | + | 59.9 | + | 59.0 | 61.7 | 0.620 |
| | SVMCrys | 0.213 | + | 56.3 | + | 46.7 | 75.2 | N/A |
| | XtalPred | 0.278 | + | 63.9 | + | 62.3 | 67.0 | 0.683 |
| | SVM_POLY | 0.398 | + | 74.6 | + | **88.1** | 47.9 | 0.779 |
| | max-based | 0.467 | + | 76.1 | + | 81.6 | 65.3 | **0.793** |
| | PPCpred | **0.471** | | **76.8** | | 84.8 | 61.2 | 0.789 |
| DB_MF (propensity of the material production failure) | BLAST-based | 0.014 | + | 55.4 | + | 35.3 | 66.0 | N/A |
| | max-based | 0.339 | + | 71.6 | + | 45.4 | **85.5** | 0.621 |
| | SVM_RBF | 0.423 | + | 74.6 | + | 56.1 | 84.5 | **0.791** |
| | PPCpred | **0.462** | | **75.0** | | **69.2** | 78.0 | 0.755 |
| DB_PF (propensity of the purification failure) | BLAST-based | 0.102 | + | 60.0 | + | 43.2 | 67.4 | N/A |
| | max-based | 0.246 | + | 70.8 | + | 34.4 | 86.9 | 0.609 |
| | SVM_POLY | 0.290 | + | **73.2** | − | 30.8 | **91.8** | **0.741** |
| | PPCpred | **0.324** | | 72.0 | | **50.1** | 81.6 | 0.697 |
| DB_CF (propensity of the crystallization failure) | BLAST-based | 0.060 | + | 60.9 | + | 37.0 | 69.4 | N/A |
| | SVM_POLY | 0.346 | + | **77.0** | = | 40.1 | **90.0** | **0.814** |
| | PPCpred | 0.457 | | 76.6 | | **70.8** | 78.7 | 0.811 |
| | max-based | **0.461** | − | 76.9 | − | 70.5 | 79.2 | 0.813 |

The proposed PPCpred is compared against OBScore, XtalPred, ParCrys, CRYSTALP2, MetaPPCP and SVMCrys on the DB_CRYS dataset, and against the maximum-based aggregation method (max-based), the best-performing SVM classifier (SVM_POLY or SVM_RBF) and the BLAST-based predictor on all four datasets. Within each dataset the methods are sorted in ascending order of their MCC scores, and the highest value of each quality index for each dataset is shown in bold. The BLAST-based predictor and SVMCrys provide only binary predictions, so we could not compute their AUC (N/A). The ‘sig’ columns give the results of significance tests of the differences in MCC and ACC between PPCpred and each of the other methods; the tests compare values over 100 bootstrapping repetitions. ‘+’ and ‘−’ mean that PPCpred is statistically significantly better or worse, respectively, at P < 0.01, and ‘=’ means that the difference is not significant.
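
For readers who wish to reproduce indices of this kind, the following is a minimal Python sketch of how MCC, ACC, SPEC and SENS can be derived from binary predictions, and of how paired per-replicate MCC values can be generated over 100 bootstrap repetitions. It is an illustration under stated assumptions rather than the procedure used to produce the table: the function names (`confusion_counts`, `quality_indices`, `bootstrap_mcc`) are hypothetical, ACC, SPEC and SENS are scaled to percentages on the assumption that this matches the table's units, and the caption does not name the statistical test applied to the bootstrap values, so the sketch stops at producing the paired values.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def quality_indices(y_true, y_pred):
    """Compute MCC plus ACC, SPEC and SENS (the latter three in percent)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # MCC is conventionally set to 0 when a marginal count is zero.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "ACC": 100.0 * (tp + tn) / total,
        "SPEC": 100.0 * tn / (tn + fp) if tn + fp else 0.0,
        "SENS": 100.0 * tp / (tp + fn) if tp + fn else 0.0,
    }


def bootstrap_mcc(y_true, pred_a, pred_b, repetitions=100, seed=0):
    """Return a list of (mcc_a, mcc_b) pairs, one per bootstrap replicate.

    Each replicate resamples the test set with replacement and scores both
    methods on the same resampled proteins, so the values are paired.
    Assumption: the source does not specify its resampling scheme or test.
    """
    rng = random.Random(seed)
    n = len(y_true)
    pairs = []
    for _ in range(repetitions):
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        mcc_a = quality_indices(truth, [pred_a[i] for i in idx])["MCC"]
        mcc_b = quality_indices(truth, [pred_b[i] for i in idx])["MCC"]
        pairs.append((mcc_a, mcc_b))
    return pairs


# Toy data (invented for illustration only, not taken from the datasets).
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = quality_indices(toy_truth, toy_pred)  # MCC 0.5; ACC, SPEC, SENS 75.0
toy_pairs = bootstrap_mcc(toy_truth, toy_pred, toy_truth, repetitions=100)
```

The paired values returned by `bootstrap_mcc` could then be passed to whichever paired significance test is appropriate. AUC is omitted here because it requires real-valued prediction scores, which is also why it is unavailable for the methods that output only binary predictions.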