Author manuscript; available in PMC: 2017 Oct 30.
Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2017;2017:299–309. doi: 10.18653/v1/P17-1028

Table 4.

Biomedical IE results for Task 1: aggregating sequential crowd labels to induce consensus labels. Rows 1–3 are non-sequential baselines. Results are averaged over 100 bootstrap re-samples. Because this dataset has fewer gold labels for evaluation, we also report the standard deviation of F1 (std).

Method                        Precision  Recall  F1     std
Majority Vote                 91.89      48.03   63.03  2.6
MACE                          45.01      88.49   59.63  1.7
Dawid-Skene                   77.85      66.77   71.84  1.7

Dawid-Skene then HMM          72.49      58.77   64.86  2.0
ID HMM (Huang et al., 2015)   78.99      68.10   73.11  1.9
HMM-Crowd                     72.81      75.14   73.93  1.8
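
The caption of Table 4 says the scores are averaged over 100 bootstrap re-samples and that the standard deviation of F1 is reported. A minimal sketch of that kind of evaluation follows, under stated assumptions: labels are binary (1 marks a positive token), re-sampling is done over individual items with replacement, and the function names `precision_recall_f1` and `bootstrap_scores` are hypothetical. The source does not say which unit the authors re-sampled (tokens, sentences, or abstracts), so this is an illustration rather than their procedure.

```python
import random
import statistics


def precision_recall_f1(gold, pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_scores(gold, pred, n_resamples=100, seed=0):
    """Mean precision/recall/F1 and the std of F1 over bootstrap re-samples.

    Assumption: items are re-sampled independently with replacement.
    """
    rng = random.Random(seed)
    n = len(gold)
    rows = []
    for _ in range(n_resamples):
        # Draw n indices with replacement and score the re-sampled pairs.
        idx = [rng.randrange(n) for _ in range(n)]
        rows.append(
            precision_recall_f1([gold[i] for i in idx], [pred[i] for i in idx])
        )
    precisions, recalls, f1s = zip(*rows)
    return {
        "precision": statistics.mean(precisions),
        "recall": statistics.mean(recalls),
        "f1": statistics.mean(f1s),
        "f1_std": statistics.stdev(f1s),
    }
```

Because each re-sample is scored separately and the scores are then averaged, the mean F1 need not equal the harmonic mean of the mean precision and mean recall, which is consistent with small differences of that kind in the table.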