Table 3.
Data | Feature | Vector | Algorithm | F1 | AUC | p-value |
---|---|---|---|---|---|---|
iDASH | Bag-of-words + UMLS (5SG) | Tf-idf | SVM-Lin | 0.932 | 0.957 | <0.01 |
iDASH | Bag-of-words + UMLS (All) | Tf-idf | SVM-Lin | 0.931 | 0.957 | <0.01 |
iDASH | Bag-of-words + UMLS (15ST) | Tf-idf | SVM-Lin | 0.930 | 0.957 | <0.01 |
iDASH | Bag-of-words + UMLS (All) | Tf-idf | SVM-Lin-SGD | 0.928 | 0.955 | <0.01 |
iDASH | Bag-of-words | Tf-idf | SVM-Lin | 0.927 | 0.955 | <0.01 |
iDASH | Bag-of-words | Tf | NB | 0.893 | 0.935 | Baseline |
MGH | Bag-of-words + UMLS (5SG) | Tf-idf | SVM-Lin | 0.934 | 0.964 | <0.01 |
MGH | Bag-of-words + UMLS (15ST) | Tf-idf | SVM-Lin | 0.931 | 0.962 | <0.01 |
MGH | Bag-of-words + UMLS (All) | Tf-idf | SVM-Lin | 0.930 | 0.962 | <0.01 |
MGH | Bag-of-words | Tf-idf | SVM-Lin | 0.924 | 0.958 | <0.01 |
MGH | Bag-of-words + UMLS (5SG) | Tf | LR-L1 | 0.915 | 0.953 | <0.01 |
MGH | Bag-of-words | Tf | NB | 0.755 | 0.867 | Baseline |
Abbreviations: SG Semantic groups, ST Semantic types, Tf Term frequency, Tf-idf Term frequency-inverse document frequency weighting, SVM-Lin Linear support vector machine, SVM-Lin-SGD Linear support vector machine with stochastic gradient descent training, LR-L1 L1-regularized multinomial logistic regression, NB Multinomial naïve Bayes. Baseline combinations, shown in bold face in the original table, are marked "Baseline" in the p-value column.
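
To make the difference between the Tf and Tf-idf vector representations in the table concrete, the following is a minimal sketch of both weightings over a bag-of-words. It is an illustration under stated assumptions, not the implementation used in the study: the idf formula ln(N / df) is one common variant (the table does not specify which variant was used), the whitespace tokenizer is a simplification, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Raw term-frequency (Tf) bag-of-words vectors, one Counter per document."""
    # Assumption: lowercase + whitespace tokenization, for illustration only.
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_vectors(docs):
    """Tf-idf vectors with idf(t) = ln(N / df(t)), one common variant among several."""
    tfs = tf_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        for tf in tfs
    ]


# Toy documents, invented for illustration; not drawn from iDASH or MGH.
docs = ["chest pain and chest pressure", "knee pain after surgery"]
tf = tf_vectors(docs)
tfidf = tfidf_vectors(docs)
```

Under this variant, a term that occurs in every document (here "pain") receives zero weight, while a term concentrated in one document (here "chest") keeps a weight proportional to its count. This is the general intuition for why Tf-idf can separate classes better than raw counts.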