2019 Mar 11;9:4071. doi: 10.1038/s41598-019-40561-2

Table 2.

Comparison to state-of-the-art classifiers in terms of accuracy and model complexity.

| Dataset         | SCMb        | CARTb       | L1-logistic*   | L2-logistic* | Poly-SVM    | Naive Bayes | Majority |
|-----------------|-------------|-------------|----------------|--------------|-------------|-------------|----------|
| A. baumannii    | 0.849 (2.7) | 0.864 (3.4) | 0.880 (3980.5) | 0.885 (all)  | 0.886 (all) | 0.822 (all) | 0.644    |
| E. coli         | 0.818 (4.6) | 0.808 (7.0) | 0.792 (3727.2) | 0.789 (all)  | 0.779 (all) | 0.634 (all) | 0.697    |
| E. faecium      | 1.000 (1.0) | 1.000 (1.0) | 1.000 (142.0)  | 1.000 (all)  | 0.996 (all) | 0.808 (all) | 0.588    |
| K. pneumoniae   | 0.950 (3.9) | 0.949 (4.3) | 0.952 (7607.4) | 0.948 (all)  | 0.943 (all) | 0.760 (all) | 0.571    |
| M. tuberculosis | 0.963 (4.5) | 0.962 (4.7) | 0.962 (2242.2) | 0.941 (all)  | 0.934 (all) | 0.789 (all) | 0.658    |
| N. gonorrhoeae  | 0.935 (3.0) | 0.936 (3.3) | 0.942 (6095.6) | 0.915 (all)  | 0.906 (all) | 0.736 (all) | 0.529    |
| P. aeruginosa   | 0.939 (1.2) | 0.942 (1.1) | 0.937 (87.8)   | 0.828 (all)  | 0.773 (all) | 0.768 (all) | 0.588    |
| P. difficile    | 0.982 (1.0) | 0.982 (1.0) | 0.957 (121.8)  | 0.936 (all)  | 0.949 (all) | 0.887 (all) | 0.599    |
| S. aureus       | 0.987 (1.0) | 0.987 (1.0) | 0.988 (230.6)  | 0.987 (all)  | 0.987 (all) | 0.868 (all) | 0.544    |
| S. enterica     | 0.913 (1.0) | 0.913 (1.0) | 0.925 (991.2)  | 0.929 (all)  | 0.920 (all) | 0.759 (all) | 0.709    |
| S. haemolyticus | 0.925 (1.0) | 0.925 (1.0) | 0.925 (279.1)  | 0.838 (all)  | 0.829 (all) | 0.758 (all) | 0.629    |
| S. pneumoniae   | 0.960 (1.0) | 0.960 (1.0) | 0.948 (1391.5) | 0.949 (all)  | 0.946 (all) | 0.910 (all) | 0.654    |

For each dataset, the accuracy is shown along with the number of k-mers used by the model (in parentheses). Results are shown for Set Covering Machines (SCM), classification trees (CART), logistic regression with L1 and L2 regularization and χ² feature selection (L1-logistic, L2-logistic), polynomial kernel Support Vector Machines (Poly-SVM), Naive Bayes, and a baseline predictor that always predicts the most abundant class in the data (Majority). Accuracies within 1% of the maximum value are shown in bold. Results are averaged over ten repetitions of the experiment.
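The Majority baseline, the averaging over repetitions, and the "within 1% of the maximum" rule can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, the baseline is assumed to take the most common label from the training set, and "within 1%" is assumed to mean an absolute accuracy difference of at most 0.01 (a relative reading is also possible). The example row reuses the A. baumannii accuracies from the table.

```python
from collections import Counter
from statistics import mean


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


def average_over_repetitions(accuracies):
    """Mean accuracy over repeated runs of the experiment."""
    return mean(accuracies)


def within_tolerance_of_best(row, tolerance=0.01):
    """Names of the methods whose accuracy is within `tolerance` of the row maximum.

    Assumption: "within 1%" is read as an absolute difference of 0.01.
    """
    best = max(row.values())
    return {name for name, acc in row.items() if best - acc <= tolerance}


# A. baumannii row of the table (accuracies only, k-mer counts omitted).
a_baumannii = {
    "SCMb": 0.849, "CARTb": 0.864, "L1-logistic": 0.880, "L2-logistic": 0.885,
    "Poly-SVM": 0.886, "Naive Bayes": 0.822, "Majority": 0.644,
}
print(sorted(within_tolerance_of_best(a_baumannii)))
```

Under the absolute-difference assumption, rows with a gap of exactly 0.010 (such as SCMb versus CARTb for E. coli) sit on the boundary, so the outcome there depends on the chosen reading and on floating-point rounding.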

[*] For scalability reasons, these algorithms were trained using feature selection to select the one million k-mers that were most associated with the phenotypes; all other k-mers were discarded (see Methods).
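
The feature selection step described in this footnote can be sketched as follows. This is a hedged illustration only: it assumes each k-mer is encoded as presence/absence per genome, that the phenotype is binary, and that association is scored with the Pearson χ² statistic of the resulting 2×2 contingency table. The paper's exact procedure is given in its Methods and may differ. The function names and the toy data are hypothetical.

```python
def chi2_score(presence, labels):
    """Pearson chi-squared statistic of one binary k-mer against binary labels."""
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for present, positive in zip(presence, labels):
        if present and positive:
            a += 1          # k-mer present, positive phenotype
        elif present:
            b += 1          # k-mer present, negative phenotype
        elif positive:
            c += 1          # k-mer absent, positive phenotype
        else:
            d += 1          # k-mer absent, negative phenotype
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # constant k-mer or constant label: no measurable association
    return n * (a * d - b * c) ** 2 / denom


def select_top_kmers(kmer_presence, labels, k):
    """Keep the k k-mers with the highest score; ties are broken alphabetically."""
    scores = {kmer: chi2_score(col, labels) for kmer, col in kmer_presence.items()}
    ranked = sorted(scores, key=lambda kmer: (-scores[kmer], kmer))
    return ranked[:k]


# Toy data (made up): four genomes, three k-mers, binary phenotype.
toy_labels = [1, 1, 0, 0]
toy_kmers = {
    "ACGT": [1, 1, 0, 0],  # perfectly associated with the phenotype
    "TTTT": [1, 0, 1, 0],  # independent of the phenotype
    "GGGG": [1, 1, 1, 1],  # present in every genome
}
print(select_top_kmers(toy_kmers, toy_labels, k=1))
```

In the setting of the table, `k` would be one million and all remaining k-mers would be discarded before training the L1- and L2-regularized logistic regression models.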