Author manuscript; available in PMC: 2016 Apr 1.

Published in final edited form as: J Biomed Inform. 2015 Feb 23;54:202–212. doi: 10.1016/j.jbi.2015.02.004

Table 3.

Comparison of system recalls, precisions and F-scores when manually annotated data is used for evaluation.

Study	Size	Recall	Precision	F-score

Leaman et al. [17]*	3,150	0.70	0.78	0.74
Nikfarjam and Gonzalez [34]*	1,200	0.66	0.70	0.68
Hadzi-puric and Grmusa [43]	990	0.65	0.75	0.70
Liu and Chen [46]	200	0.80	0.87	0.84
Yates and Goharian [50]	125	0.89	0.69	0.78
Freifeld et al. [57]	437	0.86	0.72	0.78
Segura-Bedmar et al. [58]	400	0.56	0.85	0.68
O'Connor et al. [35]	1,873	0.62	0.54	0.58
Sampathkumar et al. [62]	2,000	0.74	0.79	0.76
Nikfarjam et al. (DailyStrength) [65]	1,559	0.78	0.86	0.82
Nikfarjam et al. (Twitter) [65]	444	0.68	0.77	0.72

* indicates systems using the same (or subsets of the same) data set.
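
The table does not state how the F-score column was computed. A minimal sketch, assuming it is the balanced F-score (the harmonic mean of recall and precision), which appears consistent with the reported rows; the function name `f_score` is hypothetical and used only for illustration:

```python
def f_score(recall: float, precision: float) -> float:
    """Balanced F-score: harmonic mean of recall and precision.

    Assumption: the table reports this balanced form; the source
    does not say so explicitly.
    """
    if recall + precision == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Recall and precision taken from the Leaman et al. [17] row.
print(round(f_score(0.70, 0.78), 2))  # prints 0.74, matching the table
```

Because the published recall and precision values are already rounded to two decimals, a recomputed F-score may differ from the reported one in the last digit for some rows.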