Table 5. Biomedical IE results for Task 2. Rows 1–3 train directly on all crowd labels; Rows 4–7 first aggregate the crowd labels and then train the sequence labeling model on the consensus annotations.
| Method | Precision | Recall | F1 | Std. dev. |
|---|---|---|---|---|
| LSTM (Lample et al., 2016) | 77.43 | 61.13 | 68.27 | 1.9 |
| LSTM-Crowd | 73.83 | 63.93 | 68.47 | 1.6 |
| LSTM-Crowd-cat | 68.08 | 68.41 | 68.20 | 1.8 |
| Majority Vote then CRF | 93.71 | 33.16 | 48.92 | 2.8 |
| Dawid-Skene then LSTM | 70.21 | 65.26 | 67.59 | 1.7 |
| HMM-Crowd then CRF | 79.54 | 54.76 | 64.81 | 2.0 |
| HMM-Crowd then LSTM | 73.65 | 64.64 | 68.81 | 1.9 |
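
The first stage of Row 4, token-level majority voting, is the simplest of the aggregation schemes above. The following Python sketch illustrates that step only, under stated assumptions: annotations are pre-aligned per token, the BIO tag names are illustrative, and ties fall to `Counter`'s ordering. It is a minimal illustration, not the authors' implementation; worker-weighted schemes such as Dawid-Skene instead model annotator reliability rather than counting votes equally.

```python
from collections import Counter

def majority_vote(crowd_labels):
    """Aggregate per-token crowd labels by majority vote.

    crowd_labels: one sentence's annotations, a list with one entry per
    worker, each entry a list of BIO tags of equal length (one per token).
    Returns a single consensus tag sequence.

    Ties are broken by Counter.most_common order (an assumption here);
    a real pipeline might prefer 'O' or weight workers by quality.
    """
    n_tokens = len(crowd_labels[0])
    consensus = []
    for t in range(n_tokens):
        votes = Counter(annotation[t] for annotation in crowd_labels)
        consensus.append(votes.most_common(1)[0][0])
    return consensus

# Example: three workers annotate a 4-token sentence (tags illustrative).
workers = [
    ["B-ADR", "I-ADR", "O", "O"],
    ["B-ADR", "O",     "O", "O"],
    ["B-ADR", "I-ADR", "O", "B-ADR"],
]
print(majority_vote(workers))  # ['B-ADR', 'I-ADR', 'O', 'O']
```

The consensus sequences produced this way are then used as ordinary gold labels for the second-stage CRF or LSTM, which is why Row 4's high precision comes at the cost of recall: tokens without majority agreement tend to collapse to 'O'.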