2024 Jul 11;2024:baae057. doi: 10.1093/database/baae057

Table 10.

Sizes of the raw text entries that we used to perform knowledge base lookup and the corresponding embedding sizes.

Knowledge base           Raw size    Embedding size
NCBI Gene^a (10)         3.9 GB      5.5 GB
CTD diseases (16, 17)    6 MB        376 MB
MeSH (42)                46 MB       2.6 GB
dbSNP^b (64)             -           -
NCBI Taxonomy (63)       317 MB      16 GB
Cellosaurus (5)          6.3 MB      595 MB
Total                    4.28 GB     25 GB

a We only embedded the genes for the most frequent species.

b As mentioned, we use LitVar2 to perform lookups on dbSNP.