
Table 3.

NER results on Task 2: predicting sequences on unannotated text when trained on crowd labels. Rows 1–4 train the predictive model directly on individual crowd labels, while Rows 5–8 first aggregate the crowd labels and then train the model on the induced consensus labels. The last row gives an upper bound from training on gold labels. LSTM-Crowd and LSTM-Crowd-cat are described in Section 3.

Method                               Precision   Recall      F1
CRF-MA (Rodrigues et al., 2014)          49.40    85.60    62.60
LSTM (Lample et al., 2016)               83.19    57.12    67.73
LSTM-Crowd                               82.38    62.10    70.82
LSTM-Crowd-cat                           79.61    62.87    70.26

Majority Vote then CRF                   45.50    80.90    58.20
Dawid-Skene then LSTM                    72.30    61.17    66.27
HMM-Crowd then CRF                       77.40    61.40    68.50
HMM-Crowd then LSTM                      76.19    66.24    70.87

LSTM on Gold Labels (upper bound)        85.27    83.19    84.22
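For concreteness, below is a minimal sketch of the aggregate-then-train pipeline in Rows 5–8, using the simplest aggregator, per-token majority vote (Row 5). The function name and the toy crowd-label format are illustrative assumptions, not the authors' released code; the consensus sequence it produces would then be handed to a standard tagger such as a CRF or the LSTM of Lample et al. (2016).

```python
# Sketch (under assumed data format): per-token majority vote over
# crowd label sequences, the aggregation step of Row 5.
from collections import Counter

def majority_vote(annotations):
    """Aggregate crowd labels for one sentence.

    annotations: list of label sequences, one per annotator, all of the
                 same length, e.g.
                 [["B-PER", "O"], ["B-PER", "B-LOC"], ["O", "O"]]
    Returns one consensus label sequence (ties broken by Counter order).
    """
    n_tokens = len(annotations[0])
    consensus = []
    for t in range(n_tokens):
        votes = Counter(seq[t] for seq in annotations)
        consensus.append(votes.most_common(1)[0][0])
    return consensus

# Toy example: three annotators label a two-token sentence.
crowd_labels = [["B-PER", "O"], ["B-PER", "B-LOC"], ["O", "O"]]
print(majority_vote(crowd_labels))  # ['B-PER', 'O']

# The induced consensus labels are then used to train a sequence
# tagger exactly as if they were gold annotations.
```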