2019 Jun 14;27(1):22–30. doi: 10.1093/jamia/ocz075

Table 2.

Performance of CRF and NN models on the development set. For each model, the best lenient precision, recall, and F-score are shown in bold.

| Model | Precision | Recall | F-score |
|---|---|---|---|
| CRF | | | |
| Baseline (lexical and syntactic features) | 0.9525 | 0.8825 | 0.9162 |
| Baseline + word shape (ws) | 0.9527 | 0.8815 | 0.9157 |
| Baseline + dictionary features (df) | 0.9511 | 0.8829 | 0.9157 |
| Baseline + cluster features (cf)* | 0.9504 | 0.8902 | 0.9193 |
| Baseline + ws + df | 0.9523 | 0.8821 | 0.9158 |
| Baseline + ws + cf | 0.9491 | 0.8898 | 0.9185 |
| Baseline + df + cf | 0.9494 | 0.8903 | 0.9189 |
| Baseline + ws + df + cf | 0.9486 | 0.8900 | 0.9184 |
| Neural Network | | | |
| Baseline (word + characters) | 0.9476 | 0.8995 | 0.9230 |
| Csub (characters + subword) | 0.9502 | 0.9042 | 0.9266 |
| Wsub (word + subword) | 0.9496 | 0.9044 | 0.9264 |
| Wcsub (word + subword + characters)* | 0.9498 | 0.9066 | 0.9277 |
| Ensemble | | | |
| Inter-CRF | 0.9466 | 0.8935 | 0.9193 |
| Intra-csub | 0.9656 | 0.8981 | 0.9306 |
| Intra-wsub | 0.9638 | 0.9013 | 0.9315 |
| Intra-wcsub | 0.9641 | 0.9010 | 0.9315 |
| Inter-NN | 0.9591 | 0.9084 | 0.9331 |
| NN-CRF | 0.9401 | 0.9209 | 0.9304 |
* Indicates significance at P < .05 with the approximate randomization significance test.39

Abbreviations: CRF, conditional random fields; NN, neural network.
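
The F-score column can be cross-checked against the precision and recall columns. The sketch below assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall); the table does not state this explicitly, but the rows checked here are consistent with it. The function name `f_score` and the selection of rows are illustrative choices, not part of the original work.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the table's F-score is F1; this is not stated in the table.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) copied from a few rows of Table 2
rows = {
    "CRF baseline": (0.9525, 0.8825, 0.9162),
    "NN Wcsub": (0.9498, 0.9066, 0.9277),
    "Ensemble Inter-NN": (0.9591, 0.9084, 0.9331),
    "Ensemble NN-CRF": (0.9401, 0.9209, 0.9304),
}

for name, (p, r, reported) in rows.items():
    computed = f_score(p, r)
    # Inputs are rounded to 4 decimals, so allow a small tolerance.
    status = "ok" if abs(computed - reported) < 1e-4 else "mismatch"
    print(f"{name}: computed={computed:.4f} reported={reported:.4f} {status}")
```

The same check makes the trade-off in the ensemble rows easier to see: NN-CRF has the highest recall (0.9209) but the lowest precision (0.9401) among the ensembles, while Inter-NN has the highest F-score (0.9331).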