Table 6. Dataset 4 (Date vs. Party hubs) predictions from classifiers trained using machine learning methods.
| Approach | Best k | Accuracy (%) | F1 Score (%) | Precision | Recall | C.C. | AUC |
|---|---|---|---|---|---|---|---|
| NB k-gram | 3 | 67.1 | 59.8 | .54 | **.67** | .33 | **.71** |
| NB(k) | 3 | 65.1 | 58.0 | .53 | .64 | .29 | .65 |
| Decision Tree | 1 | 53.5 | 55.4 | .50 | .62 | .08 | .53 |
| SVM | 3 | 62.1 | 59.0 | .59 | .59 | .24 | .66 |
| ANN | 2 | 66.2 | 55.5 | .70 | .46 | .30 | .69 |
| Naive Bayes | 1 | 65.2 | 57.5 | .66 | .51 | .29 | .70 |
| Domain-based | N/A | 59.1 | 30.2 | .62 | .20 | .14 | .57 |
| Homology-based | N/A | 29.8 | 22.0 | .22 | .22 | −.43 | N/A |
| HybSVM | N/A | **69.2** | **62.6** | **.71** | .56 | **.37** | .68 |
Accuracy, F-measure (F1 score), precision, recall, correlation coefficient (C.C.), and area under the receiver operating characteristic curve (AUC) are presented for classification on the date versus party hubs dataset. Accuracy and F-measure are reported as percentages. For each machine learning approach, values of k ranged from 1 to 4; only the classifier with the best-performing k value (defined as the one with the highest correlation coefficient) is shown. Performance of our methods was estimated by cross-validation. The highest value(s) for each performance measure are highlighted in bold.
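For reference, a minimal sketch of the standard confusion-matrix definitions of these measures is given below, assuming C.C. denotes the Matthews correlation coefficient (an assumption; the text says only "correlation coefficient"). Here TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.

```latex
% Standard confusion-matrix definitions; C.C. assumed to be the Matthews correlation coefficient.
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, &
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
\text{C.C.}      &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}.
\end{align*}
```

As a quick consistency check using the rounded values in the table, HybSVM's precision of .71 and recall of .56 give F1 = 2(.71)(.56)/(.71 + .56) ≈ 0.626, matching the reported 62.6%.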