2017 Dec 1;17:155. doi: 10.1186/s12911-017-0556-8

Table 3.

Top five best-performing interpretable shallow classifiers in iDASH and MGH datasets

| Data | Feature | Vector | Algorithm | F1 | AUC | p-value |
|---|---|---|---|---|---|---|
| iDASH | Bag-of-words + UMLS (5SG) | Tf-idf | SVM-Lin | 0.932 | 0.957 | <0.01 |
| iDASH | Bag-of-words + UMLS (All) | Tf-idf | SVM-Lin | 0.931 | 0.957 | <0.01 |
| iDASH | Bag-of-words + UMLS (15ST) | Tf-idf | SVM-Lin | 0.930 | 0.957 | <0.01 |
| iDASH | Bag-of-words + UMLS (All) | Tf-idf | SVM-Lin-SGD | 0.928 | 0.955 | <0.01 |
| iDASH | Bag-of-words | Tf-idf | SVM-Lin | 0.927 | 0.955 | <0.01 |
| iDASH | Bag-of-words | Tf | NB | 0.893 | 0.935 | Baseline |
| MGH | Bag-of-words + UMLS (5SG) | Tf-idf | SVM-Lin | 0.934 | 0.964 | <0.01 |
| MGH | Bag-of-words + UMLS (15ST) | Tf-idf | SVM-Lin | 0.931 | 0.962 | <0.01 |
| MGH | Bag-of-words + UMLS (All) | Tf-idf | SVM-Lin | 0.930 | 0.962 | <0.01 |
| MGH | Bag-of-words | Tf-idf | SVM-Lin | 0.924 | 0.958 | <0.01 |
| MGH | Bag-of-words + UMLS (5SG) | Tf | LR-L1 | 0.915 | 0.953 | <0.01 |
| MGH | Bag-of-words | Tf | NB | 0.755 | 0.867 | Baseline |

Abbreviations: AUC Area under the curve, UMLS Unified Medical Language System, SG Semantic groups, ST Semantic types, Tf Term frequency, Tf-idf Term frequency-inverse document frequency weighting, SVM-Lin Linear support vector machine, SVM-Lin-SGD Linear support vector machine with stochastic gradient descent training, LR-L1 L1-regularized multinomial logistic regression, NB Multinomial naïve Bayes. Baseline combinations (Bag-of-words, Tf, NB) are labeled "Baseline" in the p-value column.
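The Vector column distinguishes raw term-frequency (Tf) bag-of-words vectors from Tf-idf-weighted ones. As a rough illustration only, the Python sketch below builds both from tokenized text. It assumes whitespace tokenization and the textbook inverse document frequency idf = ln(N / df), where N is the number of documents and df the number of documents containing a term; the table does not state which idf variant or normalization was used, and the function names and toy notes are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Bag-of-words Tf vectors: raw term counts for each tokenized document."""
    return [Counter(doc) for doc in docs]


def tfidf(docs):
    """Tf-idf vectors using the unsmoothed idf = ln(N / df).

    Assumption: no smoothing and no row normalization; real pipelines
    often apply both, so absolute weights will differ.
    """
    n_docs = len(docs)
    tf = term_frequencies(docs)
    # Document frequency: each term is counted once per document it appears in.
    df = Counter(term for counts in tf for term in counts)
    idf = {term: math.log(n_docs / d) for term, d in df.items()}
    return [
        {term: count * idf[term] for term, count in counts.items()}
        for counts in tf
    ]


# Hypothetical toy notes; "smoker" occurs in every note.
notes = [
    "patient is a current smoker smoker".split(),
    "patient is a former smoker".split(),
    "never smoker".split(),
]
weights = tfidf(notes)
```

Because "smoker" occurs in all three toy notes, its idf is ln(3/3) = 0 and Tf-idf zeroes it out, while rarer terms such as "current" keep a positive weight (ln 3). Library implementations such as scikit-learn's `TfidfVectorizer` apply idf smoothing and L2 row normalization by default, so their values differ from this sketch.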