Author manuscript; available in PMC: 2023 Jan 1.
Published in final edited form as: J Biomed Inform. 2021 Dec 14;125:103971. doi: 10.1016/j.jbi.2021.103971

Table 1:

Summary of the training algorithms, text corpora, and vector dimensions used for training word embeddings. For the training algorithm, word-level denotes training on whole tokens in the training corpus, while sub-word denotes training on character n-grams.

| Training Algorithm     | Text Corpus                                           | Vector Dimension |
|------------------------|-------------------------------------------------------|------------------|
| word2vec (word-level)  | MIMIC-III (MIMIC)                                     | 100              |
| fastText (sub-word)    | PMC Open Access Subset - All manuscripts (OA-All)     | 300              |
| GloVe (word-level)     | PMC Open Access Subset - Case reports only (OA-CR)    | 600              |
|                        | University of Pennsylvania Health System (UPHS)       | 1200             |
|                        | Wikipedia - English (Wiki)                            |                  |
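To illustrate the word-level vs. sub-word distinction in the caption, the sketch below (an illustrative example, not code from the study) shows the character n-grams a fastText-style sub-word model trains on, versus the single whole token that word-level models such as word2vec and GloVe see. The function name and the `<`/`>` word-boundary markers follow fastText's convention but are otherwise hypothetical.

```python
def char_ngrams(token, n_min=3, n_max=6):
    """Return the character n-grams a fastText-style model trains on,
    with < and > marking the word boundaries (fastText convention)."""
    marked = f"<{token}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

# Word-level training (word2vec, GloVe) sees only the whole token:
print(["embedding"])
# Sub-word training (fastText) additionally decomposes it into n-grams:
print(char_ngrams("embedding", n_min=3, n_max=4))
```

Because the sub-word vocabulary is built from n-grams, fastText can compose vectors for out-of-vocabulary tokens (e.g. rare clinical terms) from the n-grams they share with known words, which word-level models cannot do.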