2021 Feb 9;7:e365. doi: 10.7717/peerj-cs.365

Table 1. Input dimensions and training time for corresponding encoding techniques and k-mer sizes.

Techniques	Training time (min)	Input vector size	Acc	Sn	Sp	MCC
	1-mer
One Hot Encoding	28	4 × 1000	0.95	0.98	0.90	0.90
Frequency Based Tokenization	14	1 × 1000	0.97	0.98	0.99	0.97
	2-mer
One Hot Encoding	240	16 × 999	0.96	0.97	0.95	0.93
Frequency Based Tokenization	14.3	1 × 999	0.96	0.98	0.93	0.89
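
The input vector sizes in Table 1 follow from the sequence length and the k-mer size: a sequence of length 1000 split into overlapping k-mers with stride 1 gives 1000 - k + 1 tokens, and one-hot encoding needs 4^k rows for a four-letter nucleotide alphabet. The Python sketch below reproduces those shapes only. It is not the authors' implementation: the function names are hypothetical, and the frequency-based tokenization is assumed to mean replacing each k-mer with an integer index ranked by how often it occurs, which the table itself does not specify.

```python
from collections import Counter
from itertools import product

import numpy as np


def kmers(seq, k):
    """Overlapping k-mers with stride 1: a length-L sequence yields L - k + 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot_encode(seq, k):
    """Return a (4**k, L - k + 1) binary matrix with one column per k-mer.

    Assumes the sequence contains only A, C, G and T.
    """
    vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    tokens = kmers(seq, k)
    mat = np.zeros((len(vocab), len(tokens)), dtype=np.uint8)
    for col, tok in enumerate(tokens):
        mat[vocab[tok], col] = 1
    return mat


def frequency_tokenize(seq, k):
    """Return a (1, L - k + 1) integer matrix.

    Assumption: index 1 is the most frequent k-mer in the sequence,
    index 2 the next most frequent, and so on.
    """
    tokens = kmers(seq, k)
    ranked = [tok for tok, _ in Counter(tokens).most_common()]
    index = {tok: rank for rank, tok in enumerate(ranked, start=1)}
    return np.array([[index[tok] for tok in tokens]], dtype=np.int32)


# Random 1000-nucleotide sequence, used only to illustrate the shapes.
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), size=1000))
for k in (1, 2):
    print(k, one_hot_encode(seq, k).shape, frequency_tokenize(seq, k).shape)
```

Under these assumptions the one-hot input grows by a factor of four with each increment of k, while the tokenized input stays a single row, which is consistent with the much longer one-hot training time reported for 2-mers.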