Table 4. Performance Metrics for the External Validation of the Three ML Algorithms Trained on the Original and Extended Datasetᵃ
training set | test set | GNB TNR | GNB TPR | GNB r² | KNB TNR | KNB TPR | KNB r² | GDF-SVM TNR | GDF-SVM TPR | GDF-SVM r² |
---|---|---|---|---|---|---|---|---|---|---|
original | small | 0.68 | 0.38 | 0.50 | 0.59 | 0.42 | 0.45 | 0.69 | 0.40 | 0.11 |
original | large | 0.67 | 0.41 | 0.64 | 0.58 | 0.44 | 0.61 | 0.68 | 0.42 | 0.53 |
extended | small | 0.77 | 0.31 | 0.56 | 0.65 | 0.37 | 0.71 | 0.68 | 0.37 | 0.18 |
ᵃ TNR (states A) and TPR (states I) are extracted from 96 000 and 102 000 MD frames for each state in the small and large test sets, respectively. The r² values obtained from linear regression analyses are also shown for each set of predictions, with the best values highlighted in red.
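
For readers who want to reproduce metrics of this kind, the following is a minimal sketch of how TNR, TPR, and r² can be computed from per-frame predictions. It treats state A as the negative class and state I as the positive class, as the footnote indicates. The function names, the toy labels, and the choice of which two quantities enter the linear regression are assumptions made for illustration; the table does not specify them, and none of the toy numbers come from the study.

```python
from statistics import mean


def tnr_tpr(true_states, predicted_states, negative="A", positive="I"):
    """Return (TNR, TPR) for per-frame state labels.

    TNR = correctly predicted negative frames / all negative frames
    TPR = correctly predicted positive frames / all positive frames
    """
    pairs = list(zip(true_states, predicted_states))
    tn = sum(t == negative and p == negative for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    n_neg = sum(t == negative for t in true_states)
    n_pos = sum(t == positive for t in true_states)
    return tn / n_neg, tp / n_pos


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y).

    For simple linear regression this equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy data: four frames of state A, four frames of state I.
true_states = ["A", "A", "A", "A", "I", "I", "I", "I"]
pred_states = ["A", "A", "A", "I", "I", "I", "A", "A"]
tnr, tpr = tnr_tpr(true_states, pred_states)
print(tnr, tpr)  # 0.75 0.5

# Hypothetical paired values for the regression (perfectly linear here).
r2 = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r2, 6))  # 1.0
```

Computing TNR and TPR separately per state, rather than a single accuracy, keeps the two states' error rates visible even when one state is predicted much better than the other, as is the case in the table.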