Table S1.
Ten-fold cross-validation performance of the models
| Model | CA | SE | SP | AUC | MCC |
|---|---|---|---|---|---|
| NB-ExtFP | 0.8272 | 0.8876 | 0.7669 | 0.8631 | 0.6593 |
| KNN-ExtFP | 0.8708 | 0.9298 | 0.8118 | 0.9245 | 0.7468 |
| RF-ExtFP | 0.8820 | 0.8848 | 0.8792 | 0.9539 | 0.7641 |
| SVM-ExtFP | 0.8833 | 0.9298 | 0.8371 | 0.9423 | 0.7702 |
| NB-MACCSFP | 0.7850 | 0.8006 | 0.7697 | 0.8533 | 0.5705 |
| KNN-MACCSFP | 0.8806 | 0.9045 | 0.8567 | 0.9287 | 0.7621 |
| RF-MACCSFP | 0.8903 | 0.9185 | 0.8624 | 0.9541 | 0.7821 |
| SVM-MACCSFP | 0.8679 | 0.9438 | 0.7921 | 0.9212 | 0.7446 |
| NB-PubChemFP | 0.7992 | 0.8455 | 0.7528 | 0.8531 | 0.6009 |
| KNN-PubChemFP | 0.8736 | 0.9129 | 0.8343 | 0.9173 | 0.7495 |
| RF-PubChemFP | 0.8736 | 0.9213 | 0.8258 | 0.9480 | 0.7506 |
| SVM-PubChemFP | 0.8524 | 0.9354 | 0.7697 | 0.9133 | 0.7149 |
| NB-AP2D | 0.7681 | 0.8399 | 0.6966 | 0.8112 | 0.5421 |
| KNN-AP2D | 0.8427 | 0.8624 | 0.8230 | 0.8964 | 0.6859 |
| RF-AP2D | 0.8287 | 0.9213 | 0.7360 | 0.9031 | 0.6689 |
| SVM-AP2D | 0.8131 | 0.8876 | 0.7388 | 0.8473 | 0.6335 |
Abbreviations: NB, naïve Bayes; KNN, k-nearest neighbor; RF, random forest; SVM, support vector machine; Ext, extended; MACCS, Molecular ACCess System; AP2D, 2D atom pairs; FP, fingerprints; CA, classification accuracy; SE, sensitivity; SP, specificity; AUC, area under the receiver operating characteristic curve; MCC, Matthews correlation coefficient.