J Am Med Inform Assoc. 2019 Jul 2;26(11):1297–1304. doi: 10.1093/jamia/ocz096

Table 3. Test set comparison in exact F1 of embedding methods across tasks

| Method | i2b2 2010 (General) | i2b2 2010 (MIMIC) | i2b2 2012 (General) | i2b2 2012 (MIMIC) | SemEval 2014 Task 7 (General) | SemEval 2014 Task 7 (MIMIC) | SemEval 2015 Task 14 (General) | SemEval 2015 Task 14 (MIMIC) |
|---|---|---|---|---|---|---|---|---|
| word2vec | 80.38 | 84.32 | 71.07 | 75.09 | 72.2 | 77.48 | 73.09 | 76.42 |
| GloVe | 84.08 | 85.07 | 74.95 | 75.27 | 70.22 | 77.73 | 72.13 | 76.68 |
| fastText | 83.46 | 84.19 | 73.24 | 74.83 | 69.87 | 76.47 | 72.67 | 77.85 |
| ELMo | 83.83 | 87.8 | 76.61 | 80.5 | 72.27 | 78.58 | 75.15 | 80.46 |
| BERT_BASE | 84.33 | 89.55 | 76.62 | 80.34 | 76.76 | 80.07 | 77.57 | 80.67 |
| BERT_LARGE | 85.48 | 90.25^b | 78.14 | 80.91^b | 78.75 | 80.74^b | 77.97 | 81.65^b |
| BioBERT | 84.76 | | 77.77 | | 77.91 | | 79.97 | |
| Prior SOTA | 88.60 [34] | | ^a [42] | | 80.3 [39] | | 81.3 [43] | |

BioBERT and the prior SOTA have no separate General/MIMIC variants, so each reports a single value per task; these values are listed once per task pair.

i2b2: Informatics for Integrating Biology and the Bedside; MIMIC: Medical Information Mart for Intensive Care; SOTA: state-of-the-art.

^a The SOTA on the i2b2 2012 task is reported only in partial-matching F1. That result, 92.29 [42], is below the equivalent partial-matching F1 we achieve with BERT_LARGE (MIMIC), 93.18.

^b The best-performing result in the respective task.
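
The table reports exact-match F1, while footnote a compares against partial-matching F1. As a rough illustration of the difference between the two metrics, the sketch below scores predicted concept spans against gold spans both ways. The span representation, overlap rule, and function names are assumptions made for illustration; this is not the official i2b2 or SemEval evaluation code.

```python
# Illustrative sketch (not the official shared-task scorers): spans are assumed
# to be (start, end, label) tuples with an exclusive end offset.

def _f1(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def exact_f1(pred, gold):
    # A prediction counts only if boundaries and label match a gold span exactly.
    tp = len(set(pred) & set(gold))
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return _f1(precision, recall)

def partial_f1(pred, gold):
    # A prediction gets credit if it overlaps any gold span with the same label.
    def overlaps(p, g):
        return p[2] == g[2] and p[0] < g[1] and g[0] < p[1]
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(overlaps(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return _f1(precision, recall)

if __name__ == "__main__":
    gold = [(0, 12, "problem"), (30, 38, "treatment")]
    pred = [(0, 12, "problem"), (31, 38, "treatment")]  # second span off by one character
    print(f"exact F1:   {exact_f1(pred, gold):.2f}")    # 0.50
    print(f"partial F1: {partial_f1(pred, gold):.2f}")  # 1.00
```

Under exact matching, the off-by-one treatment span is scored as an error, whereas under partial matching it still receives credit, which is why partial-matching F1 values (such as the 92.29 and 93.18 in footnote a) run higher than the exact F1 values in the table.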