2019 Sep 12;7(3):e14830. doi: 10.2196/14830

Table 2.

F1 scores and standard deviations.

| Corpus and model | F1 (%), mean (SD) | Improvement compared with MetaMap or DNorm^a |
|---|---|---|
| MADE^b (gold entities^c) | | |
| BERT^d | 67.87 (0.25) | N/A^e |
| BioBERT | 68.22 (0.11) | N/A |
| EhrBERT500k^f | 68.74 (0.14) | N/A |
| EhrBERT1M^g | 68.82 (0.29) | N/A |
| MADE (predicted entities^h) | | |
| MetaMap [19] | 38.59 (0) | N/A |
| BERT | 40.81 (0.08) | +2.22 |
| BioBERT | 40.87 (0.06) | +2.28 |
| EhrBERT500k | 40.95 (0.04) | +2.36 |
| EhrBERT1M | 40.95 (0.07) | +2.36 |
| NCBI^i | | |
| DNorm [1] | 88.37 (0) | N/A |
| BERT | 89.43 (0.99) | +1.06 |
| EhrBERT500k | 90.00 (0.48) | +1.63 |
| EhrBERT1M | 90.35 (1.12) | +1.98 |
| BioBERT | 90.71 (0.37) | +2.34 |
| CDR^j | | |
| DNorm [1] | 89.92 (0) | N/A |
| BERT | 93.11 (0.54) | +3.19 |
| BioBERT | 93.42 (0.10) | +3.50 |
| EhrBERT500k | 93.45 (0.09) | +3.53 |
| EhrBERT1M | 93.82 (0.15) | +3.90 |

^a DNorm: disease name normalization.

^b MADE: Medication, Indication, and Adverse Drug Events.

^c We used gold entity mentions as input.

^d BERT: bidirectional encoder representations from transformers.

^e N/A: not applicable.

^f EhrBERT500k: BERT-based model that was trained using 500,000 electronic health record notes.

^g EhrBERT1M: BERT-based model that was trained using 1 million electronic health record notes.

^h We used MetaMap-predicted entity mentions as input.

^i NCBI: National Center for Biotechnology Information.

^j CDR: Chemical-Disease Relations.
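
The improvement column appears to be the difference, in percentage points, between each model's mean F1 and the mean F1 of the baseline for that corpus (MetaMap for MADE with predicted entities, DNorm for NCBI and CDR). This reading is an assumption, but it matches every value in the table. The short Python sketch below recomputes the column from the reported means; the dictionary layout and the function name `improvements` are illustrative and not taken from the paper.

```python
from decimal import Decimal

# Mean F1 (%) of the baseline system per corpus, copied from Table 2.
BASELINE_F1 = {
    "MADE (predicted entities)": Decimal("38.59"),  # MetaMap
    "NCBI": Decimal("88.37"),                       # DNorm
    "CDR": Decimal("89.92"),                        # DNorm
}

# Mean F1 (%) of each BERT-based model per corpus, copied from Table 2.
MODEL_F1 = {
    "MADE (predicted entities)": {
        "BERT": "40.81", "BioBERT": "40.87",
        "EhrBERT500k": "40.95", "EhrBERT1M": "40.95",
    },
    "NCBI": {
        "BERT": "89.43", "EhrBERT500k": "90.00",
        "EhrBERT1M": "90.35", "BioBERT": "90.71",
    },
    "CDR": {
        "BERT": "93.11", "BioBERT": "93.42",
        "EhrBERT500k": "93.45", "EhrBERT1M": "93.82",
    },
}


def improvements(corpus):
    """Return model -> (mean F1 - baseline mean F1) in percentage points.

    Assumes the improvement is a plain difference of means, which is
    consistent with the values reported in the table.
    """
    baseline = BASELINE_F1[corpus]
    return {
        model: Decimal(f1) - baseline
        for model, f1 in MODEL_F1[corpus].items()
    }


if __name__ == "__main__":
    for corpus in MODEL_F1:
        print(corpus)
        for model, diff in improvements(corpus).items():
            print(f"  {model}: {diff:+.2f}")
```

`Decimal` is used instead of `float` so that the two-decimal values subtract exactly and can be compared directly with the printed table entries.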