Table 2.
Validation on test sets using the hybrid fingerprint. "Without" = RF model on the dataset without upsampling; "With" = RF model on the dataset with upsampling.

| Class | Validation | Without: TPR (%) | Without: TNR (%) | Without: PPV (%) | Without: ACC (%) | Without: MCC | With: TPR (%) | With: TNR (%) | With: PPV (%) | With: ACC (%) | With: MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EC1 | 10-fold CV | 62.41 | 97.1 | 65.11 | 95.9 | 0.61 | 97.03 | 99.82 | 97.17 | 99.67 | 0.97 |
| EC1 | Splitting and Testing | 82.14 | 97.02 | 75.71 | 95.26 | 0.75 | 97.09 | 99.82 | 97.02 | 99.66 | 0.97 |
| EC1 | Blind Set | 87.62 | 98.18 | 95.07 | 97.5 | 0.89 | 83.17 | 98.73 | 83.67 | 97.3 | 0.81 |
| EC2 | 10-fold CV | 55.65 | 93.39 | 56.96 | 89.27 | 0.50 | 86.67 | 98.52 | 86.26 | 97.33 | 0.85 |
| EC2 | Splitting and Testing | 66.94 | 94.23 | 64.75 | 90.34 | 0.6 | 85.36 | 98.44 | 85.83 | 97.17 | 0.84 |
| EC2 | Blind Set | 81.76 | 94.36 | 81.75 | 91.02 | 0.76 | 83.09 | 96.12 | 80.95 | 92.86 | 0.77 |
| EC3 | 10-fold CV | 65.38 | 97.38 | 78.82 | 96.3 | 0.68 | 97.04 | 99.63 | 97 | 99.34 | 0.97 |
| EC3 | Splitting and Testing | 88.77 | 96.56 | 93.91 | 93.93 | 0.87 | 95.4 | 99.42 | 95.3 | 98.96 | 0.95 |
| EC3 | Blind Set | 95 | 97.5 | 94.44 | 96.3 | 0.92 | 95 | 97.5 | 94.44 | 96.3 | 0.92 |
| EC4 | 10-fold CV | 59.21 | 88.86 | 69.42 | 86.27 | 0.53 | 91.86 | 98.84 | 91.47 | 97.96 | 0.9 |
| EC4 | Splitting and Testing | 70 | 76.98 | 48.91 | 73.15 | 0.34 | 88.83 | 98.48 | 89.06 | 97.26 | 0.87 |
| EC4 | Blind Set | 78.57 | 82.5 | 80.55 | 83.33 | 0.63 | 80.83 | 91.22 | 75 | 86.54 | 0.67 |
| EC5* | 10-fold CV | 76.67 | 83.54 | 85 | 91.16 | 0.7 | 95.62 | 98.91 | 95.93 | 98.25 | 0.95 |
| EC5* | Splitting and Testing | 88.89 | 75 | 93.75 | 86.36 | 0.62 | 97.77 | 99.39 | 97.5 | 99 | 0.97 |
| EC6* | 10-fold CV | 95.74 | 93.77 | 92.61 | 95.16 | 0.89 | 97.78 | 99.44 | 97.79 | 99.11 | 0.97 |
| EC6* | Splitting and Testing | 95 | 95 | 90 | 92.86 | 0.85 | 98 | 99.46 | 97.78 | 99.11 | 0.97 |

*For the EC5 and EC6 classes, validation on a blind set could not be performed because too few molecules were available in these classes. For the RF model with upsampling, the average accuracy for cross-validation, splitting and testing, and the blind set was 98.61, 98.52 and 93.25%, respectively.
TPR = True Positive Rate or Sensitivity, TNR = True Negative Rate or Specificity, PPV = Positive Predictive Value or Precision, ACC = Accuracy, MCC = Matthews correlation coefficient.
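
The five metrics defined above can all be derived from the counts of a binary confusion matrix. The sketch below is illustrative only: it assumes each EC class is scored one-vs-rest (the table does not state this explicitly), and the function name `binary_metrics` and the example counts are hypothetical, not taken from the study.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute TPR, TNR, PPV, ACC (as percentages) and MCC from confusion counts.

    tp, tn, fp, fn are the true-positive, true-negative, false-positive and
    false-negative counts for one class (assumed here to be one-vs-rest).
    """

    def pct(num, den):
        # Return a percentage, or NaN when the denominator is zero.
        return 100.0 * num / den if den else float("nan")

    tpr = pct(tp, tp + fn)                      # sensitivity
    tnr = pct(tn, tn + fp)                      # specificity
    ppv = pct(tp, tp + fp)                      # precision
    acc = pct(tp + tn, tp + tn + fp + fn)       # accuracy

    # Matthews correlation coefficient; conventionally set to 0 when any
    # marginal total is zero and the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"TPR": tpr, "TNR": tnr, "PPV": ppv, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only (not data from Table 2).
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because MCC uses all four counts, it can stay low even when accuracy looks high on an imbalanced class, which is why it is reported alongside ACC in the table.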