Table 11. Evaluation results on the test portion of the SPL coreference dataset using concept recognition/normalization with MetaMap.
| | Precision | Recall | F1 | F1 difference from using gold entities |
|---|---|---|---|---|
| *With the best configuration for gold entity mentions* | | | | |
| Bio-SCoRes | 61.1 | 41.3 | 49.2 | |
| - Anaphora | 61.7 | 43.1 | 50.7 | |
| - Cataphora | 66.7 | 28.4 | 39.8 | |
| - Appositive | 43.9 | 43.9 | 43.9 | |
| - PredicateNominative | 88.1 | 52.1 | 65.5 | |
| *With the best configuration for end-to-end coreference resolution* | | | | |
| Bio-SCoRes | 62.7 | 43.9 | 51.7 | -1.8 |
| - Anaphora | 59.6 | 44.9 | 51.2 | -2.1 |
| - Cataphora | 74.0 | 32.4 | 45.1 | -1.4 |
| - Appositive | 58.7 | 50.5 | 54.3 | -1.0 |
| - PredicateNominative | 86.4 | 53.5 | 66.1 | -3.5 |
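The F1 column is presumably the balanced F-measure, i.e. the harmonic mean of precision and recall; the table itself does not state this, so treat it as an assumption. Under that assumption, the minimal Python sketch below recomputes F1 from the reported precision and recall of each row and compares it with the reported F1. Because all three values are rounded to one decimal place, agreement to within roughly 0.1 point is the most that can be expected. The names `REPORTED`, `f1`, and `discrepancies`, and the row labels, are hypothetical and chosen only for illustration. The sketch does not check the last column, because the reference F1 scores obtained with gold entities are not part of this table.

```python
# Sketch: recompute F1 = 2PR / (P + R) from the rounded precision/recall in
# Table 11 and compare it with the reported F1 (all values in percent).
# Assumes F1 is the standard balanced F-measure; row keys are hypothetical labels.

REPORTED = {
    ("gold-mention config", "Bio-SCoRes"): (61.1, 41.3, 49.2),
    ("gold-mention config", "Anaphora"): (61.7, 43.1, 50.7),
    ("gold-mention config", "Cataphora"): (66.7, 28.4, 39.8),
    ("gold-mention config", "Appositive"): (43.9, 43.9, 43.9),
    ("gold-mention config", "PredicateNominative"): (88.1, 52.1, 65.5),
    ("end-to-end config", "Bio-SCoRes"): (62.7, 43.9, 51.7),
    ("end-to-end config", "Anaphora"): (59.6, 44.9, 51.2),
    ("end-to-end config", "Cataphora"): (74.0, 32.4, 45.1),
    ("end-to-end config", "Appositive"): (58.7, 50.5, 54.3),
    ("end-to-end config", "PredicateNominative"): (86.4, 53.5, 66.1),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recomputed minus reported F1 for every row of the table.
discrepancies = {
    key: f1(p, r) - reported for key, (p, r, reported) in REPORTED.items()
}

if __name__ == "__main__":
    for (config, row), diff in discrepancies.items():
        print(f"{config:20s} {row:20s} {diff:+.2f}")
```

On these values every recomputed score falls within 0.1 point of the reported F1, which is consistent with rounding of the published precision and recall.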