2021 Feb 9;7:e365. doi: 10.7717/peerj-cs.365

Table 1. Input dimensions and training time for corresponding encoding techniques and k-mer sizes.

k-mer   Technique                      Training time (min)   Input vector size   Acc    Sn     Sp     MCC
1-mer   One Hot Encoding               28                    4 × 1000            0.95   0.98   0.90   0.90
1-mer   Frequency Based Tokenization   14                    1 × 1000            0.97   0.98   0.99   0.97
2-mer   One Hot Encoding               240                   16 × 999            0.96   0.97   0.95   0.93
2-mer   Frequency Based Tokenization   14.3                  1 × 999             0.96   0.98   0.93   0.89
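
The input vector sizes in Table 1 are consistent with overlapping k-mers taken at stride 1 from a 1000-base sequence: such a sequence yields 1000 - k + 1 k-mers, and a one-hot vocabulary over four nucleotides has 4^k entries. The following is a minimal sketch of that arithmetic, not the authors' implementation. The function names, the `ACGT` alphabet, the stride of 1, and the per-sequence frequency ranking (a real pipeline would more plausibly rank k-mers over the whole training corpus) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet (no ambiguity codes)


def kmers(seq, k):
    """Overlapping k-mers at stride 1: length L gives L - k + 1 tokens."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot_encode(seq, k):
    """Return a 4**k x (L - k + 1) matrix with a single 1 per column."""
    # Hypothetical vocabulary: every possible k-mer, in lexicographic order.
    vocab = {"".join(p): row
             for row, p in enumerate(product(ALPHABET, repeat=k))}
    tokens = kmers(seq, k)
    matrix = [[0] * len(tokens) for _ in vocab]
    for col, tok in enumerate(tokens):
        matrix[vocab[tok]][col] = 1
    return matrix


def frequency_tokenize(seq, k):
    """Return a 1 x (L - k + 1) row of integer ids ranked by frequency.

    Assumption: id 1 is the most frequent k-mer. Ranks are computed on
    this one sequence only, to keep the sketch self-contained.
    """
    tokens = kmers(seq, k)
    ranked = Counter(tokens).most_common()
    ids = {tok: rank for rank, (tok, _) in enumerate(ranked, start=1)}
    return [[ids[tok] for tok in tokens]]


def shape(matrix):
    """(rows, columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))


if __name__ == "__main__":
    seq = "ACGT" * 250  # toy 1000-base sequence
    for k in (1, 2):
        print(k, shape(one_hot_encode(seq, k)),
              shape(frequency_tokenize(seq, k)))
```

Run on the toy sequence, the script reports shapes (4, 1000) and (1, 1000) for k = 1, and (16, 999) and (1, 999) for k = 2, matching the "Input vector size" column. The one-hot input grows by a factor of four with each increment of k while the tokenized input stays a single row, which is in line with the gap in training time reported for 2-mers (240 min versus 14.3 min).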