Table 1: Summary of the training algorithms, text corpora, and vector dimensions used for training word embeddings. Each column lists its options independently; rows do not align across columns. For the training algorithm, word-level denotes training on whole tokens in the corpus, while sub-word denotes training on character n-grams.
| Training Algorithm | Text Corpus | Vector Dimension |
|---|---|---|
| word2vec (word-level) | MIMIC-III (MIMIC) | 100 |
| fastText (sub-word) | PMC Open Access Subset, all manuscripts (OA-All) | 300 |
| GloVe (word-level) | PMC Open Access Subset, case reports only (OA-CR) | 600 |
| | University of Pennsylvania Health System (UPHS) | 1200 |
| | English Wikipedia (Wiki) | |
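
As a concrete illustration of these configurations, the sketch below trains a word-level (word2vec) and a sub-word (fastText) embedding at two of the listed dimensions. This is a minimal sketch, assuming gensim as the training library and a plain-text corpus with one tokenized sentence per line; the corpus path and preprocessing are hypothetical and not part of the table.

```python
# Minimal sketch: train word-level and sub-word embeddings with gensim.
# Assumptions (not from the table): gensim 4.x, and a plain-text corpus
# file with one whitespace-tokenized sentence per line.
from gensim.models import Word2Vec, FastText

def read_corpus(path):
    """Yield tokenized sentences, one per line of a plain-text file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.strip().split()
            if tokens:
                yield tokens

sentences = list(read_corpus("corpus.txt"))  # hypothetical corpus file

# word2vec (word-level): each whole token receives its own vector.
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

# fastText (sub-word): vectors are composed from character n-grams
# (gensim defaults: 3- to 6-grams), so rare or unseen words still get vectors.
ft = FastText(sentences, vector_size=300, window=5, min_count=5, workers=4)

# Example query: nearest neighbors by cosine similarity (token must be in vocab).
print(w2v.wv.most_similar("patient", topn=5))
```

GloVe has no trainer in gensim; in practice it would be trained separately (e.g., with the Stanford GloVe toolkit) at the same dimensions and the resulting vectors loaded for comparison.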