[Preprint]. 2024 Aug 18:2024.08.16.608288. [Version 1] doi: 10.1101/2024.08.16.608288

Table 2:

AUC results for binary sequence classification tasks involving multiple species: promoter region prediction (first four rows), human vs. worm classification, and mouse transcription factor binding site (TFBS) identification. The mouse TFBS results are averaged over 5 independent datasets, each focusing on a different TFBS. All results use the sentence-level summary token pooling method. **: DeLong test significance < 0.01. Bolded value: DeLong test significance < 0.05.

| Data | DNABERT-2 | NT-v2 | HyenaDNA |
|---|---|---|---|
| Promoter B_amyloliquefaciens | 0.856** | 0.797 | 0.688 |
| Promoter R_capsulatus | 0.661 | 0.668 | 0.602 |
| Promoter Arabidopsis NonTATA | 0.891** | 0.85 | 0.814 |
| Promoter Arabidopsis TATA | 0.903** | 0.855 | 0.82 |
| Human vs worm | 0.946** | 0.919 | 0.837 |
| Mouse TFBS | 0.700 | 0.722 | 0.624 |
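
The significance markers in the caption come from a DeLong test, which compares the AUCs of two models scored on the same test examples. The preprint does not describe its implementation, so the following is only a minimal sketch of the standard nonparametric DeLong procedure, written with the Python standard library. The function name `delong_test` and its helpers are hypothetical, labels are assumed to be 0/1, and the quadratic-time pairwise comparison is chosen for readability rather than speed.

```python
import math
from statistics import mean


def _psi(x, y):
    """Kernel: 1 if the positive score outranks the negative, 0.5 on ties, else 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _placements(pos, neg):
    """Per-positive (V10) and per-negative (V01) structural components for one model."""
    v10 = [mean(_psi(x, y) for y in neg) for x in pos]
    v01 = [mean(_psi(x, y) for x in pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance (ddof=1) of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two_sided_p) for two models scored on the same examples."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    if len(pos_idx) < 2 or len(neg_idx) < 2:
        raise ValueError("need at least two positives and two negatives")

    comps = []
    for scores in (scores_a, scores_b):
        pos = [scores[i] for i in pos_idx]
        neg = [scores[i] for i in neg_idx]
        comps.append(_placements(pos, neg))
    (v10_a, v01_a), (v10_b, v01_b) = comps

    # The AUC is the mean of the per-positive components.
    auc_a, auc_b = mean(v10_a), mean(v10_b)

    # Variance of the AUC difference, accounting for the correlation
    # induced by evaluating both models on the same examples.
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(pos_idx)
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(neg_idx)
    )
    if var <= 0:
        # Degenerate case (e.g. identical rankings): the statistic is undefined.
        return auc_a, auc_b, float("nan"), float("nan")

    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

Under these assumptions, calling `delong_test(labels, scores_model_a, scores_model_b)` on one test set yields both AUCs and a two-sided p-value that could be compared against the 0.01 and 0.05 thresholds named in the caption.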