J Am Med Inform Assoc. 2019 Jul 23;27(1):47–55. doi: 10.1093/jamia/ocz120

Table 2.

The pretrained word embeddings used in the study

| Name | Corpus | Dimension | Vocab. size |
| --- | --- | --- | --- |
| GloVe^a | Wikipedia and English Gigaword | 200 | 400 000 |
| fastText^b | Wikipedia | 300 | 2 519 370 |
| nlplab^c | PubMed and PMC | 200 | 2 231 684 |
| word2vecGN | Google News | 300 | 3 000 000 |
| Numberbatch^d | Hybrid of ConceptNet, word2vecGN, and GloVe | 300 | 417 194 |
| BioWordVec^e | PubMed and MIMIC-III | 200 | 16 545 451 |
| word2vecMIMIC | MIMIC-III | 300 | 320 313 |
| ConcatenatedVec | Hybrid of GloVe, fastText, and word2vecMIMIC | 700 | 228 763 |
| AddedVec | Hybrid of fastText and word2vecMIMIC | 300 | 46 404 |
| PurifiedVec | Postprocessed GloVe vectors | 200 | 400 000 |
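For context, most of the embeddings listed above (GloVe, fastText, word2vec-family) ship in the common plain-text format of one token per line followed by its vector components; the table's "Dimension" and "Vocab. size" columns correspond to the vector length and the number of such lines. A minimal sketch of parsing that format, using a tiny made-up sample rather than any of the actual files:

```python
import io

# Toy data in the GloVe/word2vec text format; the vector values are
# made up for illustration and do not come from any real embedding file.
sample = """\
the 0.1 0.2 0.3
of 0.4 0.5 0.6
embedding 0.7 0.8 0.9
"""

def load_text_embeddings(fh):
    """Parse "token v1 v2 ... vd" lines into a dict of token -> vector."""
    vectors = {}
    for line in fh:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

emb = load_text_embeddings(io.StringIO(sample))
vocab_size = len(emb)        # analogous to the "Vocab. size" column
dimension = len(emb["the"])  # analogous to the "Dimension" column
print(vocab_size, dimension)  # 3 3
```

For the real files (e.g. the 400 000-word, 200-dimensional GloVe vectors), the same loop applies, just over an opened file handle instead of the in-memory sample.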