Author manuscript; available in PMC: 2016 Jan 31.

Published in final edited form as: J Biomed Inform. 2014 Nov 8;53:196–207. doi: 10.1016/j.jbi.2014.11.002

Table 2.

Paired classification performances over the three data sets when only ADR instances from a different corpus are added. ADR F-scores, non-ADR F-scores, Accuracies and 95% Confidence Intervals (CI) for each of the train-test set combinations are shown.

Test Data	Training Data	ADR F-score	non-ADR F-score	Accuracy (%)	95% CI

ADE	ADE	0.812	0.914	88.2	87.3 – 89.1
	ADE+DS_ADR	0.802	0.912	87.8	86.8 – 88.7
	ADE+TW_ADR	0.802	0.909	87.5	86.5 – 88.4

TW	TW	0.538	0.919	86.2	84.7 – 87.6
	TW+ADE_ADR	0.549	0.946^*	90.3^*	89.0 – 91.5
	TW+DS_ADR	0.565^*	0.939^*	89.3^*	87.9 – 90.5

DS	DS	0.678	0.890	83.8	82.2 – 85.0
	DS+ADE_ADR	0.682	0.886	83.2	82.7 – 85.8
	DS+TW_ADR	0.695^*	0.897	84.6	82.8 – 86.0

^* indicates a statistically significant improvement in performance over the highest score achieved in the single-corpus binary classification task.
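
For readers who want to reproduce quantities of the kind reported in the table, the following is a minimal sketch of how a per-class F-score and an accuracy with a 95% confidence interval could be computed from classification counts. It assumes the standard F1 definition (harmonic mean of precision and recall) and a normal-approximation (Wald) interval; the table does not state which interval method or test-set sizes were used, so both are assumptions. The function names `f_score` and `accuracy_ci` and all counts are hypothetical and are not taken from the paper.

```python
import math


def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    # Return 0.0 when the class has no true or predicted instances.
    return 2 * tp / denom if denom else 0.0


def accuracy_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Accuracy with a normal-approximation interval (z = 1.96 for ~95%).

    Assumption: the paper's interval method is not stated in this table;
    the Wald interval is used here only as a common default.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    acc = correct / total
    half_width = z * math.sqrt(acc * (1 - acc) / total)
    # Clamp the bounds to the valid range [0, 1].
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(f"ADR F-score: {f_score(tp=60, fp=30, fn=40):.3f}")
    acc, low, high = accuracy_ci(correct=850, total=1000)
    print(f"Accuracy: {100 * acc:.1f}% (95% CI {100 * low:.1f} - {100 * high:.1f})")
```

The Wald interval narrows as the test set grows, which is one reason intervals from corpora of different sizes are not directly comparable in width.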