Figure 2.
Analogy completion task performance of the embeddings, separated by semantic category, for (a) top-1 accuracy and (b) top-3 accuracy. Differences in category performance between Radiopaedia and WG embeddings of a given dimension were tested with McNemar’s test with continuity correction. Significance is denoted * for BH-adjusted p < 0.05, ** for BH-adjusted p < 0.01, and *** for BH-adjusted p < 0.001; no marking means no statistical significance. Abbreviations: WG = Wikipedia 2014 + Gigaword 5 embeddings; d = embedding dimensions; BH = Benjamini-Hochberg.
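For readers unfamiliar with the procedure named in the caption, the following is a minimal sketch of McNemar's test with continuity correction followed by Benjamini-Hochberg adjustment. It is not the authors' code; the function names and the example counts are hypothetical. Here `b` and `c` are assumed to be the discordant counts for one category: analogies answered correctly by one embedding but not the other, and vice versa.

```python
from math import erfc, sqrt


def mcnemar_cc(b, c):
    """Return (statistic, p-value) for McNemar's test with continuity correction.

    b, c: discordant pair counts (hypothetical inputs for illustration).
    """
    if b + c == 0:
        # No discordant pairs: no evidence of a difference.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = erfc(sqrt(stat / 2))
    return stat, p


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stars(p_adj):
    """Map an adjusted p-value to the caption's significance marking."""
    if p_adj < 0.001:
        return "***"
    if p_adj < 0.01:
        return "**"
    if p_adj < 0.05:
        return "*"
    return ""


# Made-up discordant counts for three categories, for illustration only.
example_counts = [(10, 25), (12, 14), (3, 30)]
raw_p = [mcnemar_cc(b, c)[1] for b, c in example_counts]
marks = [stars(p) for p in benjamini_hochberg(raw_p)]
```

The adjustment is applied across all category comparisons together, so a category's marking depends on the full set of p-values, not only its own.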