2024 Apr 9;5(6):100968. doi: 10.1016/j.patter.2024.100968

Table 1.

Quality metrics for the embeddings

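| Data | Dim. | Acc. (%) | RMSE (years) | Recall (%) |
|---|---|---|---|---|
| PubMedBERT | 768 | 69.7 | 8.4 | n/a |
| TF-IDF | 4,679,130 | 65.2 | 8.8 | n/a |
| t-SNE(BERT) | 2 | 62.6 | 10.2 | 6.2 |
| t-SNE(TF-IDF) | 2 | 50.6 | 11.2 | 0.7 |
| Chance | n/a | 4.3 | 12.4 | 0.0 |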

Dim., number of dimensions; Acc., kNN accuracy (k = 10) of label prediction; RMSE, root-mean-square error of kNN prediction of publication year; Recall, overlap between the k nearest neighbors of each point in the 2D embedding and in the high-dimensional space. See experimental procedures for details.
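The three metrics in the footnote can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: Euclidean distance in both spaces (the experimental procedures may use another metric, e.g. cosine for TF-IDF), a leave-one-out scheme in which each point is predicted from its k nearest other points (the paper may use a train/test split instead), majority voting for labels, the neighbor mean for publication year, and recall averaged over points. All function names are hypothetical.

```python
import math
from collections import Counter


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    # Sort by distance; exact ties are broken by index so results are deterministic.
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]


def knn_accuracy(points, labels, k=10):
    """Leave-one-out kNN accuracy (%): predict each label by majority vote of its k neighbors."""
    correct = 0
    for i in range(len(points)):
        votes = Counter(labels[j] for j in knn_indices(points, i, k))
        # most_common(1) returns the first maximal entry, so vote ties go to the
        # label that appears first in distance order.
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return 100.0 * correct / len(points)


def knn_rmse(points, years, k=10):
    """Leave-one-out RMSE of predicting each year as the mean year of its k neighbors."""
    squared_errors = []
    for i in range(len(points)):
        neighbors = knn_indices(points, i, k)
        predicted = sum(years[j] for j in neighbors) / len(neighbors)
        squared_errors.append((predicted - years[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def knn_recall(high_dim, low_dim, k=10):
    """Mean overlap (%) between each point's k-neighbor sets in the two spaces."""
    overlaps = []
    for i in range(len(high_dim)):
        hd = set(knn_indices(high_dim, i, k))
        ld = set(knn_indices(low_dim, i, k))
        overlaps.append(len(hd & ld) / k)
    return 100.0 * sum(overlaps) / len(overlaps)


# Toy usage on hypothetical data: two well-separated clusters in 3D.
points = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (10, 10, 10), (10, 10, 11), (10, 11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]
years = [2000, 2002, 2004, 2020, 2022, 2024]
print(knn_accuracy(points, labels, k=2))  # 100.0
print(knn_rmse(points, years, k=2))       # sqrt(6), about 2.449

# Recall between a 1D "high-dimensional" space and a 2D embedding (hypothetical).
high = [(0.0,), (1.0,), (3.0,), (7.0,)]
low = [(0.0, 0.0), (1.0, 0.0), (6.0, 0.0), (10.0, 0.0)]
print(knn_recall(high, low, k=1))  # 75.0
```

The brute-force neighbor search here is quadratic in the number of points, so at the scale of a large corpus an exact or approximate nearest-neighbor index would be needed instead.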