Table 4.
| Classifiers | Enzyme Identification (EC L0): Ten-fold Accuracy* (%) | Enzyme Identification (EC L0): Testing Accuracy (%) | Enzyme Classification (EC L1): Ten-fold Accuracy* (%) | Enzyme Classification (EC L1): Testing Accuracy (%) |
|---|---|---|---|---|
| DS | 66.39 | 66.39 | 39.12 | 39.31 |
| NBC | 92.60 | 92.46 | 96.11 | 95.88 |
| KNN | 94.38 | 94.38 | 97.80 | 97.56 |
| SVM | 95.69 | 94.86 | 98.34 | 98.39 |
| RFC | 98.42 | 94.60 | 97.50 | 97.28 |
*Ten-fold cross-validation accuracy. The table reports accuracies at EC L0 and EC L1 for the ML classifiers Decision Stump (DS), Naïve Bayes Classifier (NBC), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest Classifier (RFC). At EC L0, the train and test sets contain 154,592 and 38,648 sequences, respectively, whereas at EC L1 they contain 50,139 and 12,535 sequences, respectively.