2020 Oct 2;22(4):bbaa229. doi: 10.1093/bib/bbaa229

Table 1.

Different parameters for k-mers

Length  Window  Tokenized                      Vectorization
3       3       ATC GCG TAC GAT CCG            0321 3412 4532 4214
4       4       ATCG CGTA CGAT                 0123 3412 4532
5       5       ATCGC GTACG ATCCG              4124 5124 2134
4       2       ATCG CGCG CGTA TACG CGAT ATCC  2563 3124 4236 3578 2145
4       3       ATCG GCGT TACG GATC            4252 5134 2136 3451 2411

The table shows how the DNA sequence ‘ATCGCGTACGATCCG’ is cut into k-mers, together with the vector for each k-mer, when the length is (3, 4, 5, 4, 4) and the window (the step between the start positions of consecutive k-mers) is (3, 4, 5, 2, 3).
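The Tokenized column can be reproduced with a simple sliding-window split. The sketch below is an illustration, not the authors' code. It assumes that "window" is the step between consecutive k-mer start positions and that a trailing fragment shorter than the k-mer length is discarded; both assumptions are consistent with every Tokenized row above. The function name `kmer_tokenize` is hypothetical, and the Vectorization column is not reproduced because the table does not state how those vectors are derived.

```python
def kmer_tokenize(seq: str, length: int, window: int) -> list[str]:
    """Split seq into k-mers of `length` bases, advancing the start
    position by `window` bases each time (assumed meaning of "window").
    A trailing fragment shorter than `length` is dropped (assumption)."""
    if length <= 0 or window <= 0:
        raise ValueError("length and window must be positive")
    # Last valid start index is len(seq) - length, so the range stops there.
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, window)]


SEQ = "ATCGCGTACGATCCG"
# (length, window) pairs taken from the rows of Table 1.
PARAMS = [(3, 3), (4, 4), (5, 5), (4, 2), (4, 3)]

for length, window in PARAMS:
    print(length, window, " ".join(kmer_tokenize(SEQ, length, window)))
```

For example, with length 4 and window 2 the start positions are 0, 2, 4, 6, 8 and 10, giving ATCG CGCG CGTA TACG CGAT ATCC; the final start at 12 would leave only CCG, so it is skipped.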