Skip to main content
. 2002 Jul;12(7):1106–1111. doi: 10.1101/gr.224502

Figure 5.

Figure 5

Binary encoding of DNA sequences. (A) Examples of how a DNA sequence, using 8-mers as an example, can be represented by two binary strings (row and column) or their corresponding integers. As an example, ATGCCTAT will be represented by the combination of two integers, 162 and 154, where 162 is the decimal-based value of the binary string 10100010 and 154 is that of 10011010; the two binary strings together encode the sequence ATGCCTAT because A is coded as (1,1), T as (0,0), G as (1,0), and C as (0,1) (see text). The first two sequences in the figure are neighboring 8-mers. (B) The data structure, sorted by row integers, used to store these fixed-length DNA sequences.