2020 Sep 4;8:1032. doi: 10.3389/fbioe.2020.01032

TABLE 1.

Common ways of encoding DNA sequences.

Encoding method | Features
Sequential encoding | Encodes each base as a number. For example, [A,T,G,C] is mapped to [0.25, 0.5, 0.75, 1.0], and any other character is recorded as zero.
One-hot encoding | Widely used in deep learning. For example, [A,T,G,C] becomes [0,0,0,1], [0,0,1,0], [0,1,0,0], [1,0,0,0]. The resulting vectors can be concatenated or stacked into a two-dimensional array.
K-mer encoding | A longer biological sequence is decomposed into overlapping fragments of length k. For example, with k = 6, “ATGCATGCA” becomes “ATGCAT”, “TGCATG”, “GCATGC”, “CATGCA”.
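
The three encodings in Table 1 can be illustrated with a minimal Python sketch. The numeric codes and one-hot vectors are taken from the table's examples. The function names are hypothetical, and mapping unknown characters to an all-zero one-hot vector is an assumption, since the table specifies the zero fallback only for sequential encoding.

```python
# Codes copied from the examples in Table 1.
SEQUENTIAL_CODES = {"A": 0.25, "T": 0.5, "G": 0.75, "C": 1.0}
ONE_HOT_CODES = {
    "A": (0, 0, 0, 1),
    "T": (0, 0, 1, 0),
    "G": (0, 1, 0, 0),
    "C": (1, 0, 0, 0),
}


def sequential_encode(seq):
    """Map each base to one number; any other character becomes 0.0."""
    return [SEQUENTIAL_CODES.get(base, 0.0) for base in seq.upper()]


def one_hot_encode(seq):
    """Return a two-dimensional list with one 4-element row per base.

    Assumption: characters outside A/T/G/C become an all-zero row.
    """
    unknown = (0, 0, 0, 0)
    return [list(ONE_HOT_CODES.get(base, unknown)) for base in seq.upper()]


def kmer_encode(seq, k):
    """Split a sequence into overlapping fragments of length k (stride 1)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    print(sequential_encode("ATGCN"))   # [0.25, 0.5, 0.75, 1.0, 0.0]
    print(one_hot_encode("AT"))         # [[0, 0, 0, 1], [0, 0, 1, 0]]
    print(kmer_encode("ATGCATGCA", 6))  # ['ATGCAT', 'TGCATG', 'GCATGC', 'CATGCA']
```

A sequence of length n yields n - k + 1 overlapping k-mers, which is why the 9-base example in the table produces four 6-mers.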