Table 2.
| Corpus and model | F1 (%), mean (SD) | Improvement compared with MetaMap or DNorm^a |
|---|---|---|
| MADE^b (gold entities^c) | | |
| BERT^d | 67.87 (0.25) | N/A^e |
| BioBERT | 68.22 (0.11) | N/A |
| EhrBERT500k^f | 68.74 (0.14) | N/A |
| EhrBERT1M^g | 68.82 (0.29) | N/A |
| MADE (predicted entities^h) | | |
| MetaMap [19] | 38.59 (0) | N/A |
| BERT | 40.81 (0.08) | +2.22 |
| BioBERT | 40.87 (0.06) | +2.28 |
| EhrBERT500k | 40.95 (0.04) | +2.36 |
| EhrBERT1M | 40.95 (0.07) | +2.36 |
| NCBI^i | | |
| DNorm [1] | 88.37 (0) | N/A |
| BERT | 89.43 (0.99) | +1.06 |
| EhrBERT500k | 90.00 (0.48) | +1.63 |
| EhrBERT1M | 90.35 (1.12) | +1.98 |
| BioBERT | 90.71 (0.37) | +2.34 |
| CDR^j | | |
| DNorm [1] | 89.92 (0) | N/A |
| BERT | 93.11 (0.54) | +3.19 |
| BioBERT | 93.42 (0.10) | +3.50 |
| EhrBERT500k | 93.45 (0.09) | +3.53 |
| EhrBERT1M | 93.82 (0.15) | +3.90 |

^a DNorm: disease name normalization.
^b MADE: Medication, Indication, and Adverse Drug Events.
^c We used gold entity mentions as input.
^d BERT: bidirectional encoder representations from transformers.
^e N/A: not applicable.
^f EhrBERT500k: BERT-based model that was trained using 500,000 electronic health record notes.
^g EhrBERT1M: BERT-based model that was trained using 1 million electronic health record notes.
^h We used MetaMap-predicted entity mentions as input.
^i NCBI: National Center for Biotechnology Information.
^j CDR: Chemical-Disease Relations.
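
The improvement column appears to be the difference, in F1 percentage points, between each model's mean F1 and the mean F1 of the corpus baseline (MetaMap for MADE with predicted entities, DNorm for NCBI and CDR). The following is a minimal sketch under that assumption; it only recomputes the column from the mean F1 values reported in the table, and the names `BASELINES`, `MODELS`, `improvement`, and `improvements` are hypothetical, introduced here for illustration.

```python
# Mean F1 (%) values copied from the table; SDs are not needed for this check.
# Assumption: improvement = model mean F1 - baseline mean F1, rounded to 2 decimals.
BASELINES = {
    "MADE (predicted entities)": ("MetaMap", 38.59),
    "NCBI": ("DNorm", 88.37),
    "CDR": ("DNorm", 89.92),
}

MODELS = {
    "MADE (predicted entities)": {
        "BERT": 40.81, "BioBERT": 40.87, "EhrBERT500k": 40.95, "EhrBERT1M": 40.95,
    },
    "NCBI": {
        "BERT": 89.43, "EhrBERT500k": 90.00, "EhrBERT1M": 90.35, "BioBERT": 90.71,
    },
    "CDR": {
        "BERT": 93.11, "BioBERT": 93.42, "EhrBERT500k": 93.45, "EhrBERT1M": 93.82,
    },
}


def improvement(model_f1: float, baseline_f1: float) -> float:
    """Difference in F1 percentage points, rounded as in the table."""
    return round(model_f1 - baseline_f1, 2)


def improvements() -> dict:
    """Recompute the improvement column for every corpus that has a baseline."""
    result = {}
    for corpus, scores in MODELS.items():
        _, baseline_f1 = BASELINES[corpus]
        result[corpus] = {
            model: improvement(f1, baseline_f1) for model, f1 in scores.items()
        }
    return result


if __name__ == "__main__":
    for corpus, deltas in improvements().items():
        baseline_name, _ = BASELINES[corpus]
        for model, delta in deltas.items():
            print(f"{corpus}: {model} vs {baseline_name}: {delta:+.2f}")
```

The MADE (gold entities) rows are omitted because the table reports no baseline for that setting, which is why their improvement cells read N/A.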