Table 2.
Performance of different feature descriptors with various machine learning algorithms on the main dataset, evaluated using 5-fold cross-validation.
| Machine learning algorithm | Feature descriptor | |||||
|---|---|---|---|---|---|---|
| | PP | EI | BP + NBP + PP | BP + NBP + EI | PP + EI | BP + NBP + PP + EI |
| Accuracy (%) | ||||||
| SVM-SMO | 83.2 | 85.7 | 85.6 | 87.3 | 87.3 | 89.6 |
| Simple logistic regression | 81.8 | 84.2 | 84.2 | 87.0 | 85.7 | 88.3 |
| Random forest | 81.3 | 84.3 | 83.5 | 86.7 | 85.9 | 88.1 |
| Naive Bayes | 78.6 | 77.2 | 82.8 | 82.3 | 82.6 | 84.3 |
| Decision tree | 80.2 | 82.5 | 82.6 | 84.4 | 84.1 | 86.2 |
| Sensitivity (%) | ||||||
| SVM-SMO | 82.4 | 84.9 | 84.4 | 86.5 | 85.8 | 88.4 |
| Simple logistic regression | 80.7 | 83.1 | 82.3 | 84.4 | 85.6 | 86.7 |
| Random forest | 81.1 | 83.6 | 82.8 | 86.0 | 85.3 | 86.2 |
| Naive Bayes | 76.9 | 76.1 | 79.4 | 80.8 | 81.1 | 82.6 |
| Decision tree | 78.6 | 80.4 | 81.7 | 82.7 | 82.5 | 84.7 |
| Specificity (%) | ||||||
| SVM-SMO | 84.6 | 86.3 | 86.7 | 88.2 | 88.6 | 90.8 |
| Simple logistic regression | 82.9 | 85.5 | 86.0 | 88.8 | 85.9 | 90.2 |
| Random forest | 81.6 | 85.2 | 84.1 | 87.5 | 86.3 | 90.0 |
| Naive Bayes | 80.2 | 78.5 | 85.6 | 83.8 | 84.7 | 86.0 |
| Decision tree | 81.8 | 84.7 | 83.5 | 86.2 | 85.7 | 87.7 |
| Matthews correlation coefficient | ||||||
| SVM-SMO | 0.55 | 0.58 | 0.62 | 0.66 | 0.66 | 0.67 |
| Simple logistic regression | 0.56 | 0.55 | 0.64 | 0.62 | 0.64 | 0.66 |
| Random forest | 0.55 | 0.56 | 0.60 | 0.62 | 0.63 | 0.66 |
| Naive Bayes | 0.52 | 0.49 | 0.56 | 0.53 | 0.54 | 0.59 |
| Decision tree | 0.53 | 0.55 | 0.61 | 0.63 | 0.62 | 0.64 |
| AUC | ||||||
| SVM-SMO | 0.83 | 0.86 | 0.86 | 0.88 | 0.87 | 0.90 |
| Simple logistic regression | 0.83 | 0.84 | 0.85 | 0.86 | 0.85 | 0.88 |
| Random forest | 0.81 | 0.84 | 0.84 | 0.86 | 0.85 | 0.87 |
| Naive Bayes | 0.78 | 0.76 | 0.80 | 0.79 | 0.80 | 0.82 |
| Decision tree | 0.80 | 0.82 | 0.83 | 0.84 | 0.84 | 0.86 |
BP: binding propensity feature; NBP: nonbinding propensity feature; PP: physicochemical property feature; EI: evolutionary information feature.
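
The table reports accuracy, sensitivity, specificity and the Matthews correlation coefficient without restating how they are computed. As a minimal sketch, assuming the standard confusion-matrix definitions for a binary classifier, the four threshold-based measures could be derived as follows. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity (in %) and MCC.

    Assumes the standard confusion-matrix definitions and that both
    classes are present (tp + fn > 0 and tn + fp > 0).
    """
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not from the main dataset).
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

Under 5-fold cross-validation, one common convention is to compute these measures on each held-out fold and average them; whether the values in the table were averaged per fold or pooled across folds is not stated here.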