
Table 2:

Evaluation results of BioBERT-base models trained with three different text processing and sampling methods (evidence-only, rule-based filtered, and raw ClinVar data), compared with the pre-trained BioBERT-base baseline and evaluated on orthogonally generated DMS data. Ground-truth labels are derived from functional scores: the 27.1st percentile serves as the P/LP cut-off, the 72.5th percentile as the B/LB cut-off, and variants between the two thresholds are labeled VUS.

| Model | Pair-wise AUC (P/LP vs B/LB) | Pair-wise AUC (P/LP vs VUS) | Pair-wise AUC (B/LB vs VUS) | Avg AUC-ROC | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|---|
| BioBERT-base + ClinVar (evidence-only) | 0.9272 | 0.8043 | 0.5470 | 0.7595 | 0.4753 | 0.4930 | 0.4753 | 0.4219 |
| BioBERT-base + ClinVar (rule-based) | 0.9096 | 0.7938 | 0.5377 | 0.7470 | 0.4891 | 0.5098 | 0.4891 | 0.4399 |
| BioBERT-base + ClinVar (raw-data) | 0.9037 | 0.7882 | 0.5826 | 0.7582 | 0.4840 | 0.5306 | 0.4840 | 0.4192 |
| BioBERT-base [28] | 0.3953 | 0.5428 | 0.3953 | 0.4503 | 0.2713 | 0.0736 | 0.2713 | 0.1158 |
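The caption's percentile-based labeling can be illustrated with a minimal sketch, not taken from the authors' code: it assumes a hypothetical array of DMS functional scores and that lower scores correspond to P/LP (the score direction is an assumption, not stated in the table).

```python
import numpy as np

def label_from_scores(scores, lower_pct=27.1, upper_pct=72.5):
    """Assign P/LP, VUS, or B/LB labels from functional scores using the
    percentile cut-offs given in the caption (direction is assumed)."""
    scores = np.asarray(scores, dtype=float)
    lo = np.percentile(scores, lower_pct)   # 27.1st-percentile threshold
    hi = np.percentile(scores, upper_pct)   # 72.5th-percentile threshold
    labels = np.full(scores.shape, "VUS", dtype=object)
    labels[scores <= lo] = "P/LP"           # assumed: low functional score = damaging
    labels[scores >= hi] = "B/LB"           # assumed: high functional score = tolerated
    return labels

# Example usage with synthetic scores
rng = np.random.default_rng(0)
example_scores = rng.normal(size=1000)
labels = label_from_scores(example_scores)
print({lab: int((labels == lab).sum()) for lab in ("P/LP", "VUS", "B/LB")})
```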