2019 Dec 23;19(Suppl 7):274. doi: 10.1186/s12911-019-0981-y

Table 2.

The corpora used to generate the embeddings

                          Swedish                     Spanish
Corpora                   Size      Vocabulary size   Size      Vocabulary size
Out-of-domain (gen)       2.89 GB   1 040 025         8.3 GB    1 000 655
General medical (genMed)  130 MB    118 683           176 MB    168 500
EHR                       1.2 GB    300 825           1.1 GB    286 986