Author manuscript; available in PMC: 2016 Jan 31.

Published in final edited form as: J Biomed Inform. 2014 Nov 8;53:196–207. doi: 10.1016/j.jbi.2014.11.002

Table 3.

Paired classification performance (all instances) over the three data sets. ADR F-scores, non-ADR F-scores, accuracies, and 95% confidence intervals (CI) are shown for each train-test set combination.

Test Data	Training Data	ADR F-score	non-ADR F-score	Accuracy (%)	95% CI

ADE	ADE	0.812	0.914	88.2	87.3 – 89.1
	ADE+DS_ALL	0.789	0.904	86.9	85.9 – 87.8
	ADE+TW_ALL	0.800	0.912	87.7	86.8 – 88.7

TW	TW	0.538	0.919	86.2	84.7 – 87.6
	TW+ADE_ALL	0.545	0.941	88.6	87.2 – 89.7
	TW+DS_ALL	0.597*	0.943	90.1	88.7 – 91.3

DS	DS	0.678	0.890	83.8	82.2 – 85.0
	DS+ADE_ALL	0.674	0.891	83.5	81.6 – 84.8
	DS+TW_ALL	0.704*	0.899	85.0	83.3 – 86.5

* Indicates a statistically significant improvement in performance over the highest score achieved in the binary classification task.