Table 1. Quality metrics for the embeddings
Data | Dim. | Acc. (%) | RMSE (years) | Recall (%) |
---|---|---|---|---|
PubMedBERT | 768 | 69.7 | 8.4 | – |
TF-IDF | 4,679,130 | 65.2 | 8.8 | – |
t-SNE (PubMedBERT) | 2 | 62.6 | 10.2 | 6.2 |
t-SNE (TF-IDF) | 2 | 50.6 | 11.2 | 0.7 |
Chance | – | 4.3 | 12.4 | 0.0 |
Acc., kNN accuracy of label prediction; RMSE, root-mean-squared error (in years) of the kNN prediction of publication year; Recall, overlap between the k nearest neighbors in the 2D embedding and those in the high-dimensional space. See Experimental procedures for details.
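To make the three metrics in the legend concrete, here is a minimal sketch. It is not the authors' code, and it rests on several assumptions not stated in the table. Distances are Euclidean. Each point is excluded from its own neighbor list (a leave-one-out search). Labels are predicted by majority vote among neighbors, and the year is predicted as the mean of the neighbors' years. Recall is the average fraction of shared neighbors. The value of k is left as a parameter. All function names (`knn_indices`, `knn_accuracy`, `knn_year_rmse`, `knn_recall`) are hypothetical.

```python
import math
from collections import Counter


def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of point i, excluding i itself.

    Assumption: Euclidean distance; brute-force search (fine for a toy example).
    """
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def knn_accuracy(X, labels, k):
    """Percentage of points whose label matches the majority label of their k neighbours."""
    correct = 0
    for i in range(len(X)):
        votes = Counter(labels[j] for j in knn_indices(X, i, k))
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return 100.0 * correct / len(X)


def knn_year_rmse(X, years, k):
    """RMSE (in years) when each year is predicted as the mean year of the k neighbours."""
    squared_error = 0.0
    for i in range(len(X)):
        neighbours = knn_indices(X, i, k)
        predicted = sum(years[j] for j in neighbours) / k
        squared_error += (predicted - years[i]) ** 2
    return math.sqrt(squared_error / len(X))


def knn_recall(X_high, X_low, k):
    """Average percentage of each point's k high-dimensional neighbours that are
    also among its k neighbours in the low-dimensional (e.g. 2D) embedding."""
    shared = 0
    for i in range(len(X_high)):
        high = set(knn_indices(X_high, i, k))
        low = set(knn_indices(X_low, i, k))
        shared += len(high & low)
    return 100.0 * shared / (k * len(X_high))


# Toy data: two well-separated clusters with distinct labels and years.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]
years = [2000, 2000, 2000, 2020, 2020, 2020]
print(knn_accuracy(X, labels, k=2), knn_year_rmse(X, years, k=2), knn_recall(X, X, k=2))
```

An embedding that preserves neighborhoods perfectly (here, `X` compared with itself) gets 100% recall. In the table, the low recall of both t-SNE rows reflects how few exact high-dimensional neighbors survive in 2D, even though label accuracy stays comparatively high.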