Table 2. Pre-trained word embeddings and language models: training resources, embedding size, and model architecture.
Method | Training corpus (tokens / vocab) | Embedding size | Language model |
---|---|---|---|
word2vec | Google News (100B / 3M) | 300 | NA |
GloVe | Gigaword5 + Wikipedia2014 (6B / 0.4M) | 300 | NA |
fastText | Wikipedia 2017 + UMBC corpus + statmt.org news (16B / 1M) | 300 | NA |
ELMo | WMT 2008-2012 + Wikipedia (5.5B / 0.7M) | 512 | 2-layer, 4096-hidden, 93.6M parameters |
BERT-Base | BooksCorpus + English Wikipedia (3.3B / 0.03M^a) | 768 | 12-layer, 768-hidden, 12 heads, 110M parameters |
BERT-Large | BooksCorpus + English Wikipedia (3.3B / 0.03M^a) | 1024 | 24-layer, 1024-hidden, 16 heads, 340M parameters |
B: billion; M: million; NA: Not Applicable.
^a Vocabulary size calculated after WordPiece tokenization.
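As a minimal sketch (not part of the source), the BERT-Base figures in Table 2 can be checked against a public checkpoint with the Hugging Face transformers library; the checkpoint name `bert-base-uncased` is an assumption about which released model corresponds to BERT-Base here.

```python
# Sketch: verify the BERT-Base row of Table 2 against a public checkpoint.
# Assumes the "bert-base-uncased" checkpoint matches the table's BERT-Base entry.
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Embedding size (768), depth (12 layers), and attention heads (12).
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)

# WordPiece vocabulary size, roughly 0.03M tokens.
print(config.vocab_size)

# Total parameter count, on the order of the 110M reported in the table.
print(sum(p.numel() for p in model.parameters()))
```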