Table 2.
The top 21 6-mers: Z-DNABERT attention rank versus the 6-mer frequency rank in the experimental datasets tested for tuning the model.
| Attention rank | 6-mer (hg38, Kouzine et al.) | Frequency rank (hg38, Kouzine et al.) | 6-mer (hg38, Shin et al.) | Frequency rank (hg38, Shin et al.) |
|---|---|---|---|---|
| 1 | GCGCGC | 1 | TGTGTG | 1 |
| 2 | GTGTGT | 5 | GTGTGT | 2 |
| 3 | CGCGCG | 2 | CGCGCG | 4 |
| 4 | ACACAC | 6 | GCGCGC | 3 |
| 5 | TGTGTG | 3 | CACACA | 5 |
| 6 | GCGCGG | 7 | ACACAC | 6 |
| 7 | CACACA | 4 | GGGGAA | 40 |
| 8 | CCGCGC | 10 | AAAAAA | 17 |
| 9 | GGGCGC | 11 | CAGGGA | 43 |
| 10 | GCGCCC | 12 | GTGCGC | 11 |
| 11 | GTGCGC | 17 | TGGGGA | 331 |
| 12 | GGCGCG | 9 | GGGGGA | 39 |
| 13 | GTGTGC | 14 | GCTGGG | 9 |
| 14 | GCGCAC | 19 | GTGTGC | 7 |
| 15 | GCACAC | 15 | TGCGCG | 8 |
| 16 | GCCCGC | 20 | TGCATG | 21 |
| 17 | GCGGGC | 16 | GGGAAG | 33 |
| 18 | CGCGCC | 8 | AGGGAG | 429 |
| 19 | GCGTGC | 25 | GGGAGC | 458 |
| 20 | GCACGC | 26 | AGAAAG | 38 |
| 21 | CCCGCG | 18 | GGGAAA | 80 |
The model tuned on the experimental Kouzine et al. data was used in the paper, rather than a model tuned on the much smaller, 150 bp resolution ChIP-seq dataset of Shin et al.
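One way to summarize the agreement between attention rank and frequency rank in Table 2 is to count how many of the top 21 attention 6-mers also fall within the top 21 by frequency, and to compute a Spearman rank correlation on the listed rows. The following is a minimal sketch that uses only the frequency ranks transcribed from the table. The function names `top_k_overlap`, `rerank`, and `spearman` are illustrative assumptions and are not part of Z-DNABERT, and the correlation formula assumes there are no tied ranks, which holds for both columns here.

```python
# Frequency ranks from Table 2, listed in order of attention rank 1..21.
kouzine_freq = [1, 5, 2, 6, 3, 7, 4, 10, 11, 12, 17,
                9, 14, 19, 15, 20, 16, 8, 25, 26, 18]
shin_freq = [1, 2, 4, 3, 5, 6, 40, 17, 43, 11, 331,
             39, 9, 7, 8, 21, 33, 429, 458, 38, 80]


def top_k_overlap(freq_ranks, k):
    """Count attention-ranked 6-mers whose frequency rank is within the top k."""
    return sum(1 for r in freq_ranks if r <= k)


def rerank(values):
    """Map values to ranks 1..n by ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    n = len(xs)
    rx, ry = rerank(xs), rerank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


attention = list(range(1, 22))  # attention ranks 1..21

for name, freq in [("Kouzine et al.", kouzine_freq), ("Shin et al.", shin_freq)]:
    overlap = top_k_overlap(freq, k=21)
    rho = spearman(attention, freq)
    print(f"{name}: {overlap}/21 in frequency top 21, Spearman rho = {rho:.2f}")
```

On the values in the table, 19 of the 21 top-attention 6-mers are also in the frequency top 21 for the Kouzine et al. data, against 12 of 21 for the Shin et al. data. The correlation is computed over the 21 listed rows only, so it describes the top of the ranking rather than the full 6-mer distribution.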