Author manuscript; available in PMC: 2021 Nov 1.
Published in final edited form as: Artif Intell Med. 2020 Nov 1;110:101977. doi: 10.1016/j.artmed.2020.101977

Table 1.

Results for Machine Learning Configurations

Features Classifiers AUC Precision Recall F1 Score Specificity Youden
BoW Random Forest 0.866 0.516 0.348 0.415 0.974 0.322
SGDClassifier_ENP 0.845 0.363 0.468 0.409 0.935 0.403
SGDClassifier_L2 0.842 0.378 0.461 0.415 0.940 0.401
SGDClassifier_L1 0.840 0.317 0.560 0.405 0.905 0.466
BernoulliNB 0.836 0.275 0.631 0.383 0.869 0.500
Logistic Regression 0.823 0.439 0.333 0.379 0.967 0.300
LinearSVC_L2 0.822 0.532 0.177 0.266 0.988 0.165
MultinomialNB 0.815 0.247 0.638 0.356 0.847 0.486
LinearSVC_L1 0.808 0.515 0.241 0.329 0.982 0.223

BoW+str Random Forest 0.874 0.515 0.362 0.425 0.973 0.335
BernoulliNB 0.836 0.276 0.631 0.384 0.870 0.501
SGDClassifier_L2 0.829 0.335 0.433 0.378 0.933 0.365
SGDClassifier_ENP 0.820 0.252 0.589 0.352 0.862 0.451
Logistic Regression 0.815 0.423 0.312 0.359 0.967 0.279
LinearSVC_L1 0.800 0.550 0.156 0.243 0.990 0.146
LinearSVC_L2 0.793 0.680 0.121 0.205 0.996 0.116
MultinomialNB 0.791 0.232 0.617 0.337 0.839 0.456
SGDClassifier_L1 0.763 0.714 0.035 0.068 0.999 0.034

BoW+CUI Random Forest 0.881 0.444 0.454 0.449 0.955 0.409
SGDClassifier_ENP 0.868 0.686 0.170 0.273 0.994 0.164
LinearSVC_L2 0.866 0.707 0.206 0.319 0.993 0.199
SGDClassifier_L2 0.866 0.700 0.199 0.309 0.993 0.192
Logistic Regression 0.862 0.652 0.319 0.429 0.987 0.306
SGDClassifier_L1 0.858 0.850 0.121 0.211 0.998 0.119
LinearSVC_L1 0.853 0.875 0.099 0.178 0.999 0.098
BernoulliNB 0.834 0.348 0.574 0.433 0.915 0.490
MultinomialNB 0.827 0.417 0.426 0.421 0.953 0.379

BoW+CUIsem Random Forest 0.875 0.398 0.468 0.430 0.957 0.412
Logistic Regression 0.859 0.418 0.418 0.418 0.967 0.373
SGDClassifier_ENP 0.851 0.767 0.163 0.269 0.996 0.159
LinearSVC_L2 0.849 0.765 0.184 0.297 0.967 0.180
SGDClassifier_L2 0.846 0.774 0.170 0.279 0.996 0.166
SGDClassifier_L1 0.845 0.714 0.142 0.237 0.996 0.137
BernoulliNB 0.833 0.324 0.567 0.412 0.907 0.474
LinearSVC_L1 0.832 0.714 0.106 0.185 1.000 0.103
MultinomialNB 0.727 0.319 0.475 0.382 0.964 0.395

BoW+CUI+str Random Forest 0.877 0.442 0.461 0.451 0.954 0.415
LinearSVC_L1 0.855 0.808 0.149 0.251 0.997 0.146
LinearSVC_L2 0.849 0.732 0.213 0.330 0.994 0.207
Logistic Regression 0.838 0.258 0.617 0.364 0.861 0.478
BernoulliNB 0.834 0.348 0.574 0.433 0.915 0.490
MultinomialNB 0.830 0.397 0.383 0.390 0.954 0.337
SGDClassifier_L1 0.816 0.696 0.113 0.195 0.996 0.110
SGDClassifier_ENP 0.797 0.684 0.092 0.163 0.997 0.089
SGDClassifier_L2 0.774 0.750 0.064 0.118 0.998 0.062

Within each feature combination, results are ranked by descending AUC; the best F1 score across all feature combinations (0.451, Random Forest with BoW+CUI+str) is highlighted in bold. Precision, recall (sensitivity), F1 score, specificity, and Youden’s J statistic are reported for the positive class only, i.e., distant recurrence.
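
The positive-class metrics reported in the table can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `positive_class_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def positive_class_metrics(tp, fp, tn, fn):
    """Compute positive-class metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. The counts used below are hypothetical; the
    table does not report raw counts.
    """
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): fraction of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Youden's J statistic: sensitivity + specificity - 1.
    youden = recall + specificity - 1.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "youden": youden,
    }


# Hypothetical counts, for illustration only.
metrics = positive_class_metrics(tp=40, fp=10, tn=90, fn=60)
```

As a consistency check against the table itself, the first BoW row (Random Forest) has recall 0.348 and specificity 0.974, so J = 0.348 + 0.974 − 1 = 0.322, which matches the Youden column.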

Abbreviations: BoW: Bag-of-Words model; +str: combined with structured data; +CUI: combined with Bag-of-CUIs; +CUIsem: combined with semantically selected Bag-of-CUIs; BernoulliNB: naïve Bayes with a Bernoulli model; MultinomialNB: naïve Bayes with a multinomial model; LinearSVC: support vector machine with a linear kernel; SGDClassifier: stochastic gradient descent classifier; L1, L2, ENP: L1, L2, and elastic net penalty regularization, respectively; AUC: area under the receiver operating characteristic curve.
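
The classifier labels in the table coincide with scikit-learn class names, so one plausible reading is that each row corresponds to a scikit-learn estimator with the indicated penalty. The sketch below encodes that reading as plain data plus a small builder. It is an assumption, not a configuration confirmed by the source: the mapping `CONFIGS`, the helper `build`, and every keyword argument shown (including leaving all other hyperparameters at library defaults) are hypothetical.

```python
from importlib import import_module

# Hypothetical mapping from table labels to scikit-learn estimators.
# Hyperparameters other than the penalty are left at library defaults,
# which may differ from what the authors used.
CONFIGS = {
    "Random Forest": ("sklearn.ensemble.RandomForestClassifier", {}),
    "Logistic Regression": ("sklearn.linear_model.LogisticRegression", {}),
    "BernoulliNB": ("sklearn.naive_bayes.BernoulliNB", {}),
    "MultinomialNB": ("sklearn.naive_bayes.MultinomialNB", {}),
    # LinearSVC supports the L1 penalty only in the primal form (dual=False).
    "LinearSVC_L1": ("sklearn.svm.LinearSVC", {"penalty": "l1", "dual": False}),
    "LinearSVC_L2": ("sklearn.svm.LinearSVC", {"penalty": "l2"}),
    "SGDClassifier_L1": ("sklearn.linear_model.SGDClassifier", {"penalty": "l1"}),
    "SGDClassifier_L2": ("sklearn.linear_model.SGDClassifier", {"penalty": "l2"}),
    "SGDClassifier_ENP": ("sklearn.linear_model.SGDClassifier", {"penalty": "elasticnet"}),
}


def build(name):
    """Instantiate the estimator for a table label.

    scikit-learn is imported lazily, so it is only required when an
    estimator is actually built.
    """
    path, kwargs = CONFIGS[name]
    module_name, class_name = path.rsplit(".", 1)
    estimator_class = getattr(import_module(module_name), class_name)
    return estimator_class(**kwargs)
```

With scikit-learn installed, `build("SGDClassifier_ENP")` would return an unfitted estimator. Keeping the configuration as data makes it straightforward to loop over all nine classifiers for each feature combination.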