Table 3:
Comparison of embedding clustering quality between ClinVar-BERT and BioBERT in the d-dimensional embedding space. For each attribute, we report Silhouette scores (higher is better) and Davies-Bouldin indices (lower is better) as mean ± standard deviation over 100 independent, randomly sampled runs.
Attribute | ClinVar-BERT Silhouette | ClinVar-BERT Davies-Bouldin | BioBERT Silhouette | BioBERT Davies-Bouldin |
---|---|---|---|---|
Clinical Significance | 0.4376 ± 0.0027 | 1.4352 ± 0.0091 | 0.0991 ± 0.0026 | 4.4854 ± 0.0555 |
Submission Classification | 0.6446 ± 0.0020 | 0.5278 ± 0.0026 | 0.1170 ± 0.0029 | 3.2261 ± 0.0237 |
Submitter | −0.2696 ± 0.0510 | 2.8505 ± 0.0883 | 0.1381 ± 0.0200 | 1.5892 ± 0.0453 |
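
As an illustration of how mean ± standard deviation figures of this kind can be produced, the following is a minimal sketch that computes both metrics on repeated random subsamples of labeled points. It is not the procedure used for the table: the subsampling scheme, the sample size, the Euclidean distance, the use of the sample standard deviation, and the synthetic two-cluster data are all assumptions, and the function names (`silhouette_score`, `davies_bouldin_index`, `repeated_scores`) are hypothetical helpers written for this example.

```python
import math
import random
import statistics


def _group(points, labels):
    """Group points by their attribute label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance, needs >= 2 clusters)."""
    clusters = _group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Davies-Bouldin index (assumes distinct cluster centroids)."""
    clusters = _group(points, labels)
    centroids = {
        lab: [sum(c) / len(m) for c in zip(*m)] for lab, m in clusters.items()
    }
    # scatter: mean distance of a cluster's members to its centroid
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in m) / len(m)
        for lab, m in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


def repeated_scores(points, labels, n_runs=100, sample_size=200, seed=0):
    """Return (mean, std) of each metric over n_runs random subsamples."""
    rng = random.Random(seed)
    sil, dbi = [], []
    for _ in range(n_runs):
        idx = rng.sample(range(len(points)), sample_size)
        pts = [points[i] for i in idx]
        labs = [labels[i] for i in idx]
        if len(set(labs)) < 2:
            continue  # both metrics are undefined for a single cluster
        sil.append(silhouette_score(pts, labs))
        dbi.append(davies_bouldin_index(pts, labs))
    return (
        (statistics.mean(sil), statistics.stdev(sil)),
        (statistics.mean(dbi), statistics.stdev(dbi)),
    )


# Synthetic stand-in for embeddings: two well-separated 2-D clusters.
_rng = random.Random(42)
points = [(_rng.gauss(0, 0.5), _rng.gauss(0, 0.5)) for _ in range(100)]
points += [(_rng.gauss(5, 0.5), _rng.gauss(5, 0.5)) for _ in range(100)]
labels = [0] * 100 + [1] * 100

(sil_mean, sil_std), (dbi_mean, dbi_std) = repeated_scores(
    points, labels, n_runs=20, sample_size=60
)
print(f"Silhouette:     {sil_mean:.4f} ± {sil_std:.4f}")
print(f"Davies-Bouldin: {dbi_mean:.4f} ± {dbi_std:.4f}")
```

With real embeddings, `points` would hold the d-dimensional vectors and `labels` the values of one attribute (for example, clinical significance). A negative Silhouette mean, as in the Submitter row for ClinVar-BERT, indicates that points lie on average closer to another label's cluster than to their own.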