[Preprint]. 2025 Apr 17:2024.12.31.24319792. Originally published 2024 Dec 31. [Version 2] doi: 10.1101/2024.12.31.24319792

Table 3:

Comparison of ClinVar-BERT and BioBERT on the clustering quality of their embeddings in the d-dimensional embedding space. For each attribute, we report Silhouette scores (higher is better) and Davies-Bouldin indices (lower is better) as mean ± standard deviation over 100 independent, randomly sampled runs.

| Attribute | ClinVar-BERT Silhouette | ClinVar-BERT Davies-Bouldin | BioBERT Silhouette | BioBERT Davies-Bouldin |
| --- | --- | --- | --- | --- |
| Clinical Significance | 0.4376 ± 0.0027 | 1.4352 ± 0.0091 | 0.0991 ± 0.0026 | 4.4854 ± 0.0555 |
| Submission Classification | 0.6446 ± 0.0020 | 0.5278 ± 0.0026 | 0.1170 ± 0.0029 | 3.2261 ± 0.0237 |
| Submitter | −0.2696 ± 0.0510 | 2.8505 ± 0.0883 | 0.1381 ± 0.0200 | 1.5892 ± 0.0453 |