Author manuscript; available in PMC: 2021 Nov 1.

Published in final edited form as: Artif Intell Med. 2020 Nov 1;110:101977. doi: 10.1016/j.artmed.2020.101977

Table 1.

Results for Machine Learning Configurations

Features	Classifiers	AUC	Precision	Recall	F1 Score	Specificity	Youden
BoW	Random Forest	0.866	0.516	0.348	0.415	0.974	0.322
	SGDClassifier_ENP	0.845	0.363	0.468	0.409	0.935	0.403
	SGDClassifier_L2	0.842	0.378	0.461	0.415	0.940	0.401
	SGDClassifier_L1	0.840	0.317	0.560	0.405	0.905	0.466
	BernoulliNB	0.836	0.275	0.631	0.383	0.869	0.500
	Logistic Regression	0.823	0.439	0.333	0.379	0.967	0.300
	LinearSVC_L2	0.822	0.532	0.177	0.266	0.988	0.165
	MultinomialNB	0.815	0.247	0.638	0.356	0.847	0.486
	LinearSVC_L1	0.808	0.515	0.241	0.329	0.982	0.223

BoW+str	Random Forest	0.874	0.515	0.362	0.425	0.973	0.335
	BernoulliNB	0.836	0.276	0.631	0.384	0.870	0.501
	SGDClassifier_L2	0.829	0.335	0.433	0.378	0.933	0.365
	SGDClassifier_ENP	0.820	0.252	0.589	0.352	0.862	0.451
	Logistic Regression	0.815	0.423	0.312	0.359	0.967	0.279
	LinearSVC_L1	0.800	0.550	0.156	0.243	0.990	0.146
	LinearSVC_L2	0.793	0.680	0.121	0.205	0.996	0.116
	MultinomialNB	0.791	0.232	0.617	0.337	0.839	0.456
	SGDClassifier_L1	0.763	0.714	0.035	0.068	0.999	0.034

BoW+CUI	Random Forest	0.881	0.444	0.454	0.449	0.955	0.409
	SGDClassifier_ENP	0.868	0.686	0.170	0.273	0.994	0.164
	LinearSVC_L2	0.866	0.707	0.206	0.319	0.993	0.199
	SGDClassifier_L2	0.866	0.700	0.199	0.309	0.993	0.192
	Logistic Regression	0.862	0.652	0.319	0.429	0.987	0.306
	SGDClassifier_L1	0.858	0.850	0.121	0.211	0.998	0.119
	LinearSVC_L1	0.853	0.875	0.099	0.178	0.999	0.098
	BernoulliNB	0.834	0.348	0.574	0.433	0.915	0.490
	MultinomialNB	0.827	0.417	0.426	0.421	0.953	0.379

BoW+CUIsem	Random Forest	0.875	0.398	0.468	0.430	0.957	0.412
	Logistic Regression	0.859	0.418	0.418	0.418	0.967	0.373
	SGDClassifier_ENP	0.851	0.767	0.163	0.269	0.996	0.159
	LinearSVC_L2	0.849	0.765	0.184	0.297	0.967	0.180
	SGDClassifier_L2	0.846	0.774	0.170	0.279	0.996	0.166
	SGDClassifier_L1	0.845	0.714	0.142	0.237	0.996	0.137
	BernoulliNB	0.833	0.324	0.567	0.412	0.907	0.474
	LinearSVC_L1	0.832	0.714	0.106	0.185	1.000	0.103
	MultinomialNB	0.727	0.319	0.475	0.382	0.964	0.395

BoW+CUI+str	Random Forest	0.877	0.442	0.461	0.451	0.954	0.415
	LinearSVC_L1	0.855	0.808	0.149	0.251	0.997	0.146
	LinearSVC_L2	0.849	0.732	0.213	0.330	0.994	0.207
	Logistic Regression	0.838	0.258	0.617	0.364	0.861	0.478
	BernoulliNB	0.834	0.348	0.574	0.433	0.915	0.490
	MultinomialNB	0.830	0.397	0.383	0.390	0.954	0.337
	SGDClassifier_L1	0.816	0.696	0.113	0.195	0.996	0.110
	SGDClassifier_ENP	0.797	0.684	0.092	0.163	0.997	0.089
	SGDClassifier_L2	0.774	0.750	0.064	0.118	0.998	0.062

Results within each feature combination are ranked by descending AUC; the best F1 score across all feature combinations is highlighted in bold. Precision, recall (sensitivity), F1 score, specificity, and Youden's J statistic are reported for the positive class only (distant recurrence).
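The F1 score and Youden's J columns can be derived from the precision, recall, and specificity columns using the standard definitions: F1 is the harmonic mean of precision and recall, and J = sensitivity + specificity − 1. The following is a minimal sketch of that arithmetic, not the authors' code; the function names are illustrative, and the inputs are the rounded values from the first row of the table (BoW, Random Forest), so the recomputed values may differ from the reported ones in the last decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def youden_j(recall: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity (recall) + specificity - 1."""
    return recall + specificity - 1


# Rounded values from the first table row (BoW, Random Forest), positive class.
precision, recall, specificity = 0.516, 0.348, 0.974

f1 = f1_score(precision, recall)
j = youden_j(recall, specificity)
print(f"F1 = {f1:.3f}, Youden's J = {j:.3f}")
```

Because the table reports values rounded to three decimals, a recomputation like this is best used as a consistency check within a small tolerance rather than as an exact reproduction.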

Abbreviations: BoW: Bag-of-Words model; +str: combined with structured data; +CUI: combined with Bag-of-CUIs; +CUIsem: combined with Bag-of-CUIs filtered by semantic selection; BernoulliNB: naïve Bayes using the Bernoulli model; MultinomialNB: naïve Bayes using the multinomial model; LinearSVC: support vector machine with a linear kernel; SGDClassifier: stochastic gradient descent classifier; L1, L2, ENP: L1, L2, and elastic net penalty regularization; AUC: area under the receiver operating characteristic curve.
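To illustrate the Bag-of-Words terminology, the sketch below builds the two feature views usually associated with the naïve Bayes variants listed above: term counts (the multinomial model) and binary term presence (the Bernoulli model). It is a simplified illustration under stated assumptions, not the study's preprocessing: the whitespace tokenizer, the helper name `bag_of_words`, and the example sentence are all hypothetical.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


# Hypothetical note text for illustration; not taken from the study data.
note = "bone metastasis noted ; no liver metastasis"

counts = bag_of_words(note)            # multinomial-style features: term frequencies
presence = {tok: 1 for tok in counts}  # Bernoulli-style features: term present or not

print(counts["metastasis"], presence["metastasis"])
```

The same idea extends to Bag-of-CUIs by counting concept identifiers instead of word tokens.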