2021 Apr 28;11:9134. doi: 10.1038/s41598-021-88708-4

Figure 4.


K-mer sequence decomposition and reconstruction. (a) Sequences can be decomposed into all possible masked 3-mers (i.e., X1X2X3 is separated into X1X2_ and X1_X3), as shown for PTX7. Each masked 3-mer is counted, generating an 882-dimensional vector (non-zero elements shown). Vectors are normalized (divided by 15 in the case of PTX7) and multiplied by their enrichment score. Principal component analysis (PCA) identifies enriched k-mers, which allow a sequence to be reconstructed. An arbitrary cut-off (0.1) can be used to minimize noise and facilitate assembly. (b) The sequence reconstruction of the first PCA component calculated from the second library selection.
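The decomposition and weighting steps in panel (a) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the 21-symbol alphabet (20 standard amino acids plus a placeholder "Z", chosen so that 2 × 21² = 882 matches the caption), the default normalization by sequence length (the caption states only that the PTX7 vector is divided by 15), and the names `ALPHABET`, `FEATURES`, `masked_3mers`, and `encode`. The PCA and reconstruction steps are not shown.

```python
from collections import Counter
from itertools import product

# Assumption: 20 standard amino acids plus one placeholder symbol ("Z"),
# chosen only so that 2 * 21**2 = 882 matches the caption.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYZ"

# Fixed feature order: all X1X2_ patterns, then all X1_X3 patterns.
FEATURES = [a + b + "_" for a, b in product(ALPHABET, repeat=2)] + \
           [a + "_" + b for a, b in product(ALPHABET, repeat=2)]
INDEX = {kmer: i for i, kmer in enumerate(FEATURES)}


def masked_3mers(seq):
    """Yield both masked forms (X1X2_ and X1_X3) of every 3-residue window."""
    for i in range(len(seq) - 2):
        x1, x2, x3 = seq[i:i + 3]
        yield x1 + x2 + "_"
        yield x1 + "_" + x3


def encode(seq, enrichment=1.0, norm=None):
    """Count masked 3-mers, normalize, and scale by an enrichment score.

    Assumption: when `norm` is not given, counts are divided by the
    sequence length; the caption does not say how the divisor is derived.
    """
    if norm is None:
        norm = len(seq)
    counts = Counter(masked_3mers(seq))
    vec = [0.0] * len(FEATURES)
    for kmer, n in counts.items():
        vec[INDEX[kmer]] = n / norm * enrichment
    return vec


if __name__ == "__main__":
    # Hypothetical peptide and enrichment score, for illustration only.
    vec = encode("ACDEFG", enrichment=2.0)
    nonzero = {FEATURES[i]: v for i, v in enumerate(vec) if v}
    print(len(vec), nonzero)
```

Stacking such vectors for every sequence in a selection gives the matrix on which PCA would be run. The 0.1 cut-off mentioned in the caption would then be applied to the component loadings before assembling overlapping k-mers.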