Table 1.
Features | Classifier | AUC | Precision | Recall | F1 Score | Specificity | Youden's J |
---|---|---|---|---|---|---|---|
BoW | Random Forest | 0.866 | 0.516 | 0.348 | 0.415 | 0.974 | 0.322 |
BoW | SGDClassifier_ENP | 0.845 | 0.363 | 0.468 | 0.409 | 0.935 | 0.403 |
BoW | SGDClassifier_L2 | 0.842 | 0.378 | 0.461 | 0.415 | 0.940 | 0.401 |
BoW | SGDClassifier_L1 | 0.840 | 0.317 | 0.560 | 0.405 | 0.905 | 0.466 |
BoW | BernoulliNB | 0.836 | 0.275 | 0.631 | 0.383 | 0.869 | 0.500 |
BoW | Logistic Regression | 0.823 | 0.439 | 0.333 | 0.379 | 0.967 | 0.300 |
BoW | LinearSVC_L2 | 0.822 | 0.532 | 0.177 | 0.266 | 0.988 | 0.165 |
BoW | MultinomialNB | 0.815 | 0.247 | 0.638 | 0.356 | 0.847 | 0.486 |
BoW | LinearSVC_L1 | 0.808 | 0.515 | 0.241 | 0.329 | 0.982 | 0.223 |
BoW+str | Random Forest | 0.874 | 0.515 | 0.362 | 0.425 | 0.973 | 0.335 |
BoW+str | BernoulliNB | 0.836 | 0.276 | 0.631 | 0.384 | 0.870 | 0.501 |
BoW+str | SGDClassifier_L2 | 0.829 | 0.335 | 0.433 | 0.378 | 0.933 | 0.365 |
BoW+str | SGDClassifier_ENP | 0.820 | 0.252 | 0.589 | 0.352 | 0.862 | 0.451 |
BoW+str | Logistic Regression | 0.815 | 0.423 | 0.312 | 0.359 | 0.967 | 0.279 |
BoW+str | LinearSVC_L1 | 0.800 | 0.550 | 0.156 | 0.243 | 0.990 | 0.146 |
BoW+str | LinearSVC_L2 | 0.793 | 0.680 | 0.121 | 0.205 | 0.996 | 0.116 |
BoW+str | MultinomialNB | 0.791 | 0.232 | 0.617 | 0.337 | 0.839 | 0.456 |
BoW+str | SGDClassifier_L1 | 0.763 | 0.714 | 0.035 | 0.068 | 0.999 | 0.034 |
BoW+CUI | Random Forest | 0.881 | 0.444 | 0.454 | 0.449 | 0.955 | 0.409 |
BoW+CUI | SGDClassifier_ENP | 0.868 | 0.686 | 0.170 | 0.273 | 0.994 | 0.164 |
BoW+CUI | LinearSVC_L2 | 0.866 | 0.707 | 0.206 | 0.319 | 0.993 | 0.199 |
BoW+CUI | SGDClassifier_L2 | 0.866 | 0.700 | 0.199 | 0.309 | 0.993 | 0.192 |
BoW+CUI | Logistic Regression | 0.862 | 0.652 | 0.319 | 0.429 | 0.987 | 0.306 |
BoW+CUI | SGDClassifier_L1 | 0.858 | 0.850 | 0.121 | 0.211 | 0.998 | 0.119 |
BoW+CUI | LinearSVC_L1 | 0.853 | 0.875 | 0.099 | 0.178 | 0.999 | 0.098 |
BoW+CUI | BernoulliNB | 0.834 | 0.348 | 0.574 | 0.433 | 0.915 | 0.490 |
BoW+CUI | MultinomialNB | 0.827 | 0.417 | 0.426 | 0.421 | 0.953 | 0.379 |
BoW+CUIsem | Random Forest | 0.875 | 0.398 | 0.468 | 0.430 | 0.957 | 0.412 |
BoW+CUIsem | Logistic Regression | 0.859 | 0.418 | 0.418 | 0.418 | 0.967 | 0.373 |
BoW+CUIsem | SGDClassifier_ENP | 0.851 | 0.767 | 0.163 | 0.269 | 0.996 | 0.159 |
BoW+CUIsem | LinearSVC_L2 | 0.849 | 0.765 | 0.184 | 0.297 | 0.967 | 0.180 |
BoW+CUIsem | SGDClassifier_L2 | 0.846 | 0.774 | 0.170 | 0.279 | 0.996 | 0.166 |
BoW+CUIsem | SGDClassifier_L1 | 0.845 | 0.714 | 0.142 | 0.237 | 0.996 | 0.137 |
BoW+CUIsem | BernoulliNB | 0.833 | 0.324 | 0.567 | 0.412 | 0.907 | 0.474 |
BoW+CUIsem | LinearSVC_L1 | 0.832 | 0.714 | 0.106 | 0.185 | 1.000 | 0.103 |
BoW+CUIsem | MultinomialNB | 0.727 | 0.319 | 0.475 | 0.382 | 0.964 | 0.395 |
BoW+CUI+str | Random Forest | 0.877 | 0.442 | 0.461 | **0.451** | 0.954 | 0.415 |
BoW+CUI+str | LinearSVC_L1 | 0.855 | 0.808 | 0.149 | 0.251 | 0.997 | 0.146 |
BoW+CUI+str | LinearSVC_L2 | 0.849 | 0.732 | 0.213 | 0.330 | 0.994 | 0.207 |
BoW+CUI+str | Logistic Regression | 0.838 | 0.258 | 0.617 | 0.364 | 0.861 | 0.478 |
BoW+CUI+str | BernoulliNB | 0.834 | 0.348 | 0.574 | 0.433 | 0.915 | 0.490 |
BoW+CUI+str | MultinomialNB | 0.830 | 0.397 | 0.383 | 0.390 | 0.954 | 0.337 |
BoW+CUI+str | SGDClassifier_L1 | 0.816 | 0.696 | 0.113 | 0.195 | 0.996 | 0.110 |
BoW+CUI+str | SGDClassifier_ENP | 0.797 | 0.684 | 0.092 | 0.163 | 0.997 | 0.089 |
BoW+CUI+str | SGDClassifier_L2 | 0.774 | 0.750 | 0.064 | 0.118 | 0.998 | 0.062 |
Within each feature combination, results are ranked by descending AUC; the best F1 score across all feature combinations is highlighted in bold. Precision, recall (sensitivity), F1 score, specificity, and Youden's J statistic are reported only for the positive class (distant recurrence).
Abbreviations: BoW: Bag-of-Words model; +str: combined with structured data; +CUI: combined with Bag-of-CUIs; +CUIsem: combined with Bag-of-CUIs after semantic selection; BernoulliNB: naïve Bayes using the Bernoulli model; MultinomialNB: naïve Bayes using the multinomial model; LinearSVC: support vector machine with a linear kernel; SGDClassifier: stochastic gradient descent classifier; L1, L2, ENP: L1, L2, and elastic net penalty regularization; AUC: area under the receiver operating characteristic curve.
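
For readers who want to check the derived columns, the following is a minimal sketch using the standard definitions: F1 is the harmonic mean of precision and recall, and Youden's J is sensitivity (recall) plus specificity minus one. The function names `f1_score` and `youden_j` are hypothetical helpers written for this illustration, not code from the study, and the inputs are the rounded values printed in the first row of Table 1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def youden_j(recall: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity (recall) + specificity - 1."""
    return recall + specificity - 1


# First row of Table 1 (BoW, Random Forest), values as printed.
precision, recall, specificity = 0.516, 0.348, 0.974
print(f"F1 = {f1_score(precision, recall):.3f}")
print(f"Youden's J = {youden_j(recall, specificity):.3f}")
```

Because the table entries are rounded to three decimals, values recomputed this way can differ from the printed ones in the last digit.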