2025 Feb 5;17:18. doi: 10.1186/s13321-025-00952-2

Table 1.

Benchmarking BarlowDTI against other models using Kang et al. splits [41]

Dataset	Model	ROC AUC	PR AUC
BioSNAP	BarlowDTI	0.9599 ± 0.0004	0.9670 ± 0.0004
	XGBoost	0.9142	0.9229
	MolTrans [42]	0.895 ± 0.002	0.901 ± 0.004
	Kang et al. [41]	0.914 ± 0.006	0.900 ± 0.007
	DLM-DTI [17]	0.914 ± 0.003	0.914 ± 0.006
	ConPLex [43]	–	0.897 ± 0.001
BindingDB	BarlowDTI	0.9364 ± 0.0003	0.7344 ± 0.0018
	XGBoost	0.9261	0.6948
	MolTrans [42]	0.914 ± 0.001	0.622 ± 0.007
	Kang et al. [41]	0.922 ± 0.001	0.623 ± 0.010
	DLM-DTI [17]	0.912 ± 0.004	0.643 ± 0.006
	ConPLex [43]	–	0.628 ± 0.012
DAVIS	BarlowDTI	0.9480 ± 0.0008	0.5524 ± 0.0011
	XGBoost	0.9285	0.4782
	MolTrans [42]	0.907 ± 0.002	0.404 ± 0.016
	Kang et al. [41]	0.920 ± 0.002	0.395 ± 0.007
	DLM-DTI [17]	0.895 ± 0.003	0.373 ± 0.017
	ConPLex [43]	–	0.458 ± 0.016

Performance was evaluated on three established benchmarks; values are the mean ± standard deviation over five replicates. For each benchmark, results that are both the best and statistically significant (two-sided Welch’s t-test [52, 53], $\alpha = 0.001$, with Benjamini-Hochberg [54] multiple-testing correction) are highlighted in bold.
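
The significance criterion in the caption can be illustrated with a minimal sketch that works from the summary statistics in Table 1. It is not the authors' analysis code. It assumes that the reported deviations are sample standard deviations, that every baseline was also run with five replicates, and that the Benjamini-Hochberg family consists only of the baselines for one benchmark and one metric (here BioSNAP PR AUC). The function names `welch_t_test`, `two_sided_p`, and `benjamini_hochberg` are hypothetical, and the p-value is computed with a standard-library incomplete beta routine rather than a statistics package. XGBoost is omitted because the table reports no deviation for it.

```python
from math import exp, lgamma, log, sqrt


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Two-sided p-value of Student's t: P(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(m1, s1, n1, m2, s2, n2):
    """Welch's unequal-variance t-test from means, sample SDs and sample sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    ALPHA = 0.001
    N = 5  # assumption: five replicates for every model
    barlow = (0.9670, 0.0004)  # BioSNAP PR AUC from Table 1
    baselines = {
        "MolTrans": (0.901, 0.004),
        "Kang et al.": (0.900, 0.007),
        "DLM-DTI": (0.914, 0.006),
        "ConPLex": (0.897, 0.001),
    }
    raw = [welch_t_test(*barlow, N, mean, sd, N)[2]
           for mean, sd in baselines.values()]
    for name, p_adj in zip(baselines, benjamini_hochberg(raw)):
        print(f"{name}: adjusted p = {p_adj:.2e}, significant = {p_adj < ALPHA}")
```

Working from summary statistics keeps the sketch independent of the raw replicate scores, which the table does not list. With the per-replicate values available, the same test could be run on the raw samples directly.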