Table 2:
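Evaluation results of BioBERT-base models trained on ClinVar data prepared with three different text-processing and sampling methods (evidence-only, rule-based filtered, and raw data), compared with the pre-trained BioBERT-base baseline and evaluated on orthogonally generated deep mutational scanning (DMS) data. Ground-truth labels are derived from functional scores: the 27.1st percentile is the P/LP threshold, the 72.5th percentile is the B/LB threshold, and all remaining variants are labeled VUS (P/LP: pathogenic/likely pathogenic; B/LB: benign/likely benign; VUS: variant of uncertain significance).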
| Model | Pairwise AUC (P/LP vs B/LB) | Pairwise AUC (P/LP vs VUS) | Pairwise AUC (B/LB vs VUS) | Avg AUC-ROC | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|---|
| BioBERT-base + ClinVar (evidence-only) | 0.9272 | 0.8043 | 0.5470 | 0.7595 | 0.4753 | 0.4930 | 0.4753 | 0.4219 |
| BioBERT-base + ClinVar (rule-based) | 0.9096 | 0.7938 | 0.5377 | 0.7470 | 0.4891 | 0.5098 | 0.4891 | 0.4399 |
| BioBERT-base + ClinVar (raw-data) | 0.9037 | 0.7882 | 0.5826 | 0.7582 | 0.4840 | 0.5306 | 0.4840 | 0.4192 |
| BioBERT-base (pre-trained) [28] | 0.3953 | 0.5428 | 0.3953 | 0.4503 | 0.2713 | 0.0736 | 0.2713 | 0.1158 |
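The labeling and metrics in Table 2 can be sketched in outline with the minimal Python example below. It is not the authors' evaluation code, and several details are assumptions: that a lower functional score means loss of function and therefore P/LP, that percentile thresholds use linear interpolation, that the model emits one probability per class and the pairwise AUC for "A vs B" ranks only variants labeled A or B by their probability of A, and that precision, recall, and F1 are support-weighted averages. That last assumption is consistent with Accuracy equaling Recall in every row, because support-weighted recall always equals accuracy. For the three ClinVar-trained rows, Avg AUC-ROC matches the mean of the three pairwise AUCs to four decimals. The function names (`label_variants`, `pairwise_aucs`, `weighted_metrics`) are hypothetical.

```python
import math
from statistics import mean

# Class names follow the table; the ordering is arbitrary.
CLASSES = ("P/LP", "B/LB", "VUS")
PAIRS = (("P/LP", "B/LB"), ("P/LP", "VUS"), ("B/LB", "VUS"))


def percentile(values, q):
    """Percentile with linear interpolation between sorted values (assumed convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def label_variants(scores, p_cut=27.1, b_cut=72.5):
    """Assign ground-truth labels from DMS functional scores.

    Assumption: a low functional score indicates loss of function, so the
    lowest tail is P/LP and the highest tail is B/LB; everything else is VUS.
    """
    p_thr = percentile(scores, p_cut)
    b_thr = percentile(scores, b_cut)
    labels = []
    for s in scores:
        if s <= p_thr:
            labels.append("P/LP")
        elif s >= b_thr:
            labels.append("B/LB")
        else:
            labels.append("VUS")
    return labels


def auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pairwise_aucs(labels, probs):
    """AUC for each class pair, using only variants whose true label is in the pair.

    Assumption: probs[i] maps each class name to the model's probability for
    variant i, and variants are ranked by the probability of the first class.
    """
    out = {}
    for a, b in PAIRS:
        pos = [p[a] for y, p in zip(labels, probs) if y == a]
        neg = [p[a] for y, p in zip(labels, probs) if y == b]
        out[(a, b)] = auc(pos, neg)
    return out


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    prec = rec = f1 = 0.0
    for c in CLASSES:
        support = sum(t == c for t in y_true)
        if support == 0:
            continue  # absent classes get zero weight anyway
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        pc = tp / predicted if predicted else 0.0
        rc = tp / support
        fc = 2 * pc * rc / (pc + rc) if pc + rc else 0.0
        w = support / n
        prec += w * pc
        rec += w * rc
        f1 += w * fc
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


if __name__ == "__main__":
    scores = [i / 10 for i in range(1, 11)]  # toy functional scores 0.1 to 1.0
    print(label_variants(scores))  # 3 x P/LP, 4 x VUS, 3 x B/LB

    # Avg AUC-ROC as the mean of the three pairwise AUCs (evidence-only row).
    print(round(mean([0.9272, 0.8043, 0.5470]), 4))  # 0.7595

    y_true = ["P/LP", "P/LP", "VUS", "B/LB"]
    y_pred = ["P/LP", "VUS", "VUS", "VUS"]
    m = weighted_metrics(y_true, y_pred)
    print(m["accuracy"], m["recall"])  # 0.5 0.5
```

The AUC here is computed by direct pair counting (the Mann-Whitney formulation), which costs O(n·m) but is easy to verify by hand; for full-size DMS sets a rank-based or library implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.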