J Am Med Inform Assoc. 2019 Jul 2;26(11):1297–1304. doi: 10.1093/jamia/ocz096

Table 3. Test set comparison in exact F1 of embedding methods across tasks

| Method | i2b2 2010 (General) | i2b2 2010 (MIMIC) | i2b2 2012 (General) | i2b2 2012 (MIMIC) | SemEval 2014 Task 7 (General) | SemEval 2014 Task 7 (MIMIC) | SemEval 2015 Task 14 (General) | SemEval 2015 Task 14 (MIMIC) |
|---|---|---|---|---|---|---|---|---|
| word2vec | 80.38 | 84.32 | 71.07 | 75.09 | 72.2 | 77.48 | 73.09 | 76.42 |
| GloVe | 84.08 | 85.07 | 74.95 | 75.27 | 70.22 | 77.73 | 72.13 | 76.68 |
| fastText | 83.46 | 84.19 | 73.24 | 74.83 | 69.87 | 76.47 | 72.67 | 77.85 |
| ELMo | 83.83 | 87.8 | 76.61 | 80.5 | 72.27 | 78.58 | 75.15 | 80.46 |
| BERT_BASE | 84.33 | 89.55 | 76.62 | 80.34 | 76.76 | 80.07 | 77.57 | 80.67 |
| BERT_LARGE | 85.48 | 90.25^b | 78.14 | 80.91^b | 78.75 | 80.74^b | 77.97 | 81.65^b |
| BioBERT | 84.76 | | 77.77 | | 77.91 | | 79.97 | |
| Prior SOTA | 88.60 [34] | | ^a [42] | | 80.3 [39] | | 81.3 [43] | |

BioBERT and the prior SOTA have no separate General/MIMIC variants, so each reports a single value per task; these values are listed once per task pair.

i2b2: Informatics for Integrating Biology and the Bedside; MIMIC: Medical Information Mart for Intensive Care; SOTA: state-of-the-art.

^a The SOTA on the i2b2 2012 task is reported only in partial-matching F1. That result, 92.29 [42], is below the equivalent partial-matching F1 we achieve with BERT_LARGE (MIMIC), 93.18.

^b The best-performing result in the respective task.
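
The table reports exact-match F1, while footnote a compares against partial-matching F1. As a rough illustration of the difference between the two metrics, the sketch below scores predicted concept spans against gold spans both ways. The span representation, overlap rule, and function names are assumptions made for illustration; this is not the official i2b2 or SemEval evaluation code.

```python
# Illustrative sketch (not the official shared-task scorers): spans are assumed
# to be (start, end, label) tuples with an exclusive end offset.

def _f1(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def exact_f1(pred, gold):
    # A prediction counts only if boundaries and label match a gold span exactly.
    tp = len(set(pred) & set(gold))
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return _f1(precision, recall)

def partial_f1(pred, gold):
    # A prediction gets credit if it overlaps any gold span with the same label.
    def overlaps(p, g):
        return p[2] == g[2] and p[0] < g[1] and g[0] < p[1]
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(overlaps(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return _f1(precision, recall)

if __name__ == "__main__":
    gold = [(0, 12, "problem"), (30, 38, "treatment")]
    pred = [(0, 12, "problem"), (31, 38, "treatment")]  # second span off by one character
    print(f"exact F1:   {exact_f1(pred, gold):.2f}")    # 0.50
    print(f"partial F1: {partial_f1(pred, gold):.2f}")  # 1.00
```

Under exact matching, the off-by-one treatment span is scored as an error, whereas under partial matching it still receives credit, which is why partial-matching F1 values (such as the 92.29 and 93.18 in footnote a) run higher than the exact F1 values in the table.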