[Preprint]. 2025 Apr 17:2024.12.31.24319792. Originally published 2024 Dec 31. [Version 2] doi: 10.1101/2024.12.31.24319792

Table 2:

Evaluation results for BioBERT-base models trained with three text processing and sampling methods (evidence-only, rule-based filtered, and raw ClinVar data), compared with the pre-trained BioBERT-base baseline, on orthogonally generated deep mutational scanning (DMS) data. Ground-truth labels are derived from functional scores, with the 27.1st percentile as the P/LP cutoff, the 72.5th percentile as the B/LB cutoff, and the remaining variants labeled VUS.

Model	Pair-wise AUC (P/LP vs B/LB)	Pair-wise AUC (P/LP vs VUS)	Pair-wise AUC (B/LB vs VUS)	Avg AUC-ROC	Accuracy	Precision	Recall	F1 Score
BioBERT-base + ClinVar (evidence-only)	0.9272	0.8043	0.5470	0.7595	0.4753	0.4930	0.4753	0.4219
BioBERT-base + ClinVar (rule-based)	0.9096	0.7938	0.5377	0.7470	0.4891	0.5098	0.4891	0.4399
BioBERT-base + ClinVar (raw-data)	0.9037	0.7882	0.5826	0.7582	0.4840	0.5306	0.4840	0.4192
BioBERT-base ²⁸	0.3953	0.5428	0.3953	0.4503	0.2713	0.0736	0.2713	0.1158
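
The labeling and pair-wise AUC scheme described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code. It rests on several assumptions that the caption does not state: that lower functional scores correspond to more damaging variants, that the cutoffs are inclusive, that each pair-wise AUC ranks variants by the predicted probability of the first class in the pair, and that the Avg AUC-ROC column is the unweighted mean of the three pair-wise values. The function names (`label_by_percentile`, `pairwise_auc`, `average_pairwise_auc`) are hypothetical.

```python
import numpy as np

def label_by_percentile(scores, low_pct=27.1, high_pct=72.5):
    """Assign P/LP, B/LB, or VUS labels from functional scores.

    Assumption: lower scores are more damaging, so scores at or below the
    low percentile become P/LP and scores at or above the high percentile
    become B/LB. Everything in between is VUS.
    """
    scores = np.asarray(scores, dtype=float)
    low_cut, high_cut = np.percentile(scores, [low_pct, high_pct])
    labels = np.full(scores.shape, "VUS", dtype="U4")
    labels[scores <= low_cut] = "P/LP"
    labels[scores >= high_cut] = "B/LB"
    return labels

def pairwise_auc(pos_scores, neg_scores):
    """AUC as P(pos > neg) + 0.5 * P(pos == neg) over all pos/neg pairs."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def average_pairwise_auc(labels, probs, classes=("P/LP", "B/LB", "VUS")):
    """Compute the three pair-wise AUCs and their unweighted mean.

    Assumption: for a pair (a, b), variants are ranked by the predicted
    probability of class a, taken from the matching column of `probs`.
    """
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    pairs = [("P/LP", "B/LB"), ("P/LP", "VUS"), ("B/LB", "VUS")]
    aucs = {}
    for a, b in pairs:
        col = classes.index(a)
        aucs[f"{a} vs {b}"] = pairwise_auc(
            probs[labels == a, col], probs[labels == b, col]
        )
    aucs["avg"] = float(np.mean(list(aucs.values())))
    return aucs
```

Under these assumptions, `label_by_percentile` would be applied to the DMS functional scores and `average_pairwise_auc` to the resulting labels together with a model's three-column class probabilities. The rank-based AUC here stands in for whatever AUC routine the study actually used.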