. 2021 Mar 16;4(1):ooab022. doi: 10.1093/jamiaopen/ooab022

Table 2.

Clustering scores by embedding method

Model    | Embedding | Embedding dimension | NMI (200) | NMI (400) | AMI (200) | AMI (400) | Silhouette
---------|-----------|---------------------|-----------|-----------|-----------|-----------|-----------
Med2Vec  | Sep       | 200                 | 0.2011    | 0.2400    | 0.0498    | 0.0480    | −0.5822
CBOW     | Sep       | 10                  | 0.4254    | 0.4774    | 0.1493    | 0.1408    | −0.4647
CBOW     | Sep       | 50                  | 0.4261    | 0.4635    | 0.1776    | 0.1679    | −0.3897
CBOW     | Sep       | 100                 | 0.4011    | 0.4335    | 0.1702    | 0.1622    | −0.3884
SG       | Sep       | 10                  | 0.5221    | 0.5737    | 0.2052    | 0.1837    | −0.3161
SG       | Sep       | 50                  | 0.5500    | 0.5905    | 0.2754    | 0.2572    | −0.1981
SG       | Sep       | 100                 | 0.5288    | 0.5751    | 0.2852    | 0.2797    | −0.2001
CBOW     | Co        | 10                  | 0.4313    | 0.4773    | 0.1590    | 0.1429    | −0.4639
CBOW     | Co        | 50                  | 0.4576    | 0.4935    | 0.2287    | 0.2133    | −0.3549
CBOW     | Co        | 100                 | 0.4478    | 0.4825    | 0.2323    | 0.2154    | −0.3448
SG       | Co        | 10                  | 0.5220    | 0.5798    | 0.2035    | 0.1913    | −0.3197
SG       | Co        | 50                  | 0.5726    | 0.6144    | 0.2979    | 0.2864    | −0.1648
SG       | Co        | 100                 | 0.5605    | 0.6134    | 0.2963    | 0.3107    | 0.1615
Med2Vec* | N/A       | 200                 | 0.2755    | 0.3472    | 0.0524    | 0.0437    | −0.5001
SG*      | Co        | 100                 | 0.5843    | 0.6448    | 0.2559    | 0.2598    | −0.1722
* Models trained on a subsample of codes that occurred in the translated Med2Vec comparison.

Note that "Co" in the Embedding column indicates a model that trained category and code embeddings jointly, whereas "Sep" indicates that these embeddings were trained separately.
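
For readers unfamiliar with the NMI columns, the following is a minimal sketch of how a normalized mutual information score between cluster assignments and reference categories could be computed. It is not the authors' evaluation code. The function names (`entropy`, `mutual_information`, `nmi`) and the toy labels are hypothetical, and the arithmetic-mean normalization is an assumption, since this table does not state which normalization was used.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label assignments."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        # Both assignments put everything in one group: treat as identical.
        return 1.0
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)


# Hypothetical toy data: reference categories and two candidate clusterings.
categories = ["A", "A", "B", "B"]
matching_clusters = [0, 0, 1, 1]     # same partition as the categories
unrelated_clusters = [0, 1, 0, 1]    # statistically independent partition

score_matching = nmi(categories, matching_clusters)
score_unrelated = nmi(categories, unrelated_clusters)
```

Under this definition, a clustering that reproduces the reference partition scores 1 and a statistically independent one scores 0, so higher NMI values in the table indicate embeddings whose clusters agree more closely with the reference categories.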