Table 4.
Level | Model | P (%) | R (%) | F (%) |
---|---|---|---|---|
Innermost | CRF | **77.19** | 68.78 | 72.74a |
Innermost | BiLSTM-CRF | 73.93 | **73.38** | **73.56**a |
Innermost | Layered BiLSTM-CRF | 69.79 | 70.41 | 70.10 |
Outermost | CRF | 73.63 | 66.41 | 69.83a |
Outermost | BiLSTM-CRF | **75.61** | 67.35 | 71.24a |
Outermost | Layered BiLSTM-CRF | 74.00 | **74.54** | **74.27** |
All | CRF | 75.44 | 67.61 | 71.31a |
All | BiLSTM-CRF | 74.71 | 70.42 | 72.50a |
All | Layered BiLSTM-CRF | **77.02** | **75.45** | **76.23** |
Note: For each level, the best precision (P), recall (R), and F-score (F) among the 3 models are shown in bold.
Abbreviations: NER: named entity recognition; CRF: conditional random field.
a: Significant difference between the CRF and (flat) BiLSTM-CRF models at P < .05. Because the layered BiLSTM-CRF takes different entities as input than the baseline models (ie, all entities vs only innermost or only outermost entities), we did not apply significance testing between the layered and flat models.
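
The relationship between the three reported columns can be sketched as follows, assuming that F denotes the balanced F-score (the harmonic mean of precision and recall), which the table does not state explicitly. The function name `f_score` is hypothetical and used only for illustration; the input values are taken from the "All" row for the layered BiLSTM-CRF.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the table's F column is the balanced (beta = 1) F-score.
    Inputs and output share the same unit (here, percentages).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# P and R (in %) for the layered BiLSTM-CRF on all entities, from the table.
print(round(f_score(77.02, 75.45), 2))  # 76.23
```

Under this assumption, the computed value agrees with the F reported in that row.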