Table 3.
Model | Strict P | Strict R | Strict F1 | Overlap P | Overlap R | Overlap F1 |
---|---|---|---|---|---|---|
Baseline | 0.749 | 0.816 | 0.781 | 0.800 | 0.871 | 0.834 |
+ linguistic features | 0.817 | 0.764 | 0.789 | 0.868 | 0.812 | 0.839 |
+ linguistic features + KFs | 0.776 | **0.845** | 0.809 | 0.820 | **0.893** | 0.855 |
+ linguistic features + ELMo | **0.826** | 0.801 | 0.813 | **0.874** | 0.847 | 0.860 |
+ linguistic features + KFs + ELMo (ours) | 0.815 | 0.812 | **0.814** | 0.873 | 0.869 | **0.871** |
Under the strict match criterion, a predicted entity must match the gold-standard annotation exactly at the byte offsets; under the overlap match criterion, a prediction counts as a match if it overlaps the gold annotation at all. The highest scores are highlighted in bold. We tune the hyper-parameters on the validation set and use the official evaluation script to assess the performance of the final chosen model on the test set.
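The two matching criteria can be illustrated with a minimal sketch. This is not the official evaluation script; it assumes entities are given as `(start, end)` offset pairs with an exclusive end, ignores entity types, and uses hypothetical helper names (`strict_match`, `overlap_match`, `prf`) and toy spans chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Assumption: a span is (start, end) in offsets, with end exclusive.
Span = Tuple[int, int]


def strict_match(pred: Span, gold: Span) -> bool:
    """Strict criterion: both boundaries must be identical."""
    return pred == gold


def overlap_match(pred: Span, gold: Span) -> bool:
    """Overlap criterion: the spans share at least one position."""
    return pred[0] < gold[1] and gold[0] < pred[1]


def prf(
    preds: List[Span],
    golds: List[Span],
    match: Callable[[Span, Span], bool],
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 under the given matching criterion."""
    # Predictions that match at least one gold span count towards precision.
    matched_preds = sum(any(match(p, g) for g in golds) for p in preds)
    # Gold spans matched by at least one prediction count towards recall.
    matched_golds = sum(any(match(p, g) for p in preds) for g in golds)
    precision = matched_preds / len(preds) if preds else 0.0
    recall = matched_golds / len(golds) if golds else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): one exact hit, one partial hit, one miss.
gold = [(0, 5), (10, 18), (25, 30)]
pred = [(0, 5), (12, 18), (40, 45)]

strict_scores = prf(pred, gold, strict_match)
overlap_scores = prf(pred, gold, overlap_match)
print("strict :", strict_scores)
print("overlap:", overlap_scores)
```

In this toy example the partially aligned prediction `(12, 18)` is rejected under the strict criterion but accepted under the overlap criterion, which is why the overlap scores in the table are uniformly higher than the strict ones.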