TABLE IV:
Pearson correlation coefficients between similarity scores from human judgments and similarity scores computed from word embeddings on four measurement datasets. An asterisk indicates that the difference between the word embeddings trained on EHR and those trained on each of the other resources is statistically significant according to a t-test (p < 0.01).
| Dataset | EHR | MedLit | GloVe | Google News |
|---|---|---|---|---|
| Pedersen’s | 0.632* | 0.569 | 0.403 | 0.357 |
| Hliaoutakis’s | 0.482* | 0.311 | 0.247 | 0.243 |
| MayoSRS | 0.412* | 0.300 | 0.082 | 0.084 |
| UMNSRS | 0.440* | 0.404 | 0.177 | 0.154 |
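Each cell above is a Pearson correlation between two score lists over the term pairs of one dataset: the human similarity ratings and the similarities derived from an embedding model. A minimal sketch of that evaluation follows, assuming the embedding similarity is the cosine of the two term vectors and that pairs with an out-of-vocabulary term are skipped; both choices are assumptions, not necessarily the procedure used for this table. The function names (`cosine_similarity`, `pearson_r`, `evaluate_embeddings`) and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evaluate_embeddings(pairs, human_scores, embeddings):
    """Correlate human ratings with embedding similarities.

    Assumption: a pair is skipped if either term lacks a vector.
    Returns (r, number_of_pairs_used).
    """
    gold, predicted = [], []
    for (w1, w2), score in zip(pairs, human_scores):
        if w1 in embeddings and w2 in embeddings:
            gold.append(score)
            predicted.append(cosine_similarity(embeddings[w1], embeddings[w2]))
    return pearson_r(gold, predicted), len(gold)


# Toy example with made-up 2-d vectors and ratings (illustration only).
embeddings = {
    "heart": [1.0, 0.0],
    "cardiac": [0.9, 0.1],
    "lung": [0.0, 1.0],
    "kidney": [0.1, 0.9],
}
pairs = [("heart", "cardiac"), ("heart", "lung"), ("lung", "kidney"), ("heart", "xyz")]
human_scores = [4.0, 1.0, 3.0, 2.0]

r, used = evaluate_embeddings(pairs, human_scores, embeddings)
print(f"Pearson r = {r:.3f} over {used} pairs")
```

Running the same function once per embedding model and dataset would fill one cell of a table like this; how out-of-vocabulary pairs are treated can change both r and the number of pairs compared, so it is worth reporting alongside the coefficient.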