2021 Feb 5;9:e10805. doi: 10.7717/peerj.10805

Figure 1. Closed syncmers.

Construction of k = 5, s = 2 closed syncmers with lexicographic coding. A k-mer is a closed syncmer if its smallest s-mer is at the beginning or end of the k-mer sequence. Consider a window of three k-mers (length L = 2k − s − 1 = 7 letters) with the sequence shown in (A). The smallest s-mer is AA (orange background). (B) Shows the six s-mers in the sequence in (A). Each s-mer is shown with a gray background in the k-mer where it appears in the first or last position. (B) Illustrates that every s-mer in the sequence shown in (A) appears at the start or end of a k-mer. Therefore, regardless of which s-mer has the smallest value, there is a k-mer in the window for which this s-mer appears at the first or last position. In this example, AA appears at the end of GGCAA, marked with an asterisk (*), and GGCAA is therefore a syncmer. This shows that every window of length L must contain at least one syncmer. Note that while flanking sequence is shown in the figure, GGCAA is recognized as a syncmer from its sequence alone because its smallest 2-mer appears at the end. Closed syncmers tend to form pairs spaced at the maximum possible distance (k − s), as illustrated in (C). (D) Illustrates how k = 5, s = 2 closed syncmers are identified in a longer string. The smallest s-mer in each k-mer is shaded with a color. A blue background indicates that the smallest s-mer is not at the start or end; if it does appear at the start or end, it has an orange background and the k-mer is a closed syncmer (indicated by an asterisk).
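
The rule described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names `is_closed_syncmer` and `closed_syncmers` are hypothetical, the test string `GGCAATGC` is made up around the k-mer GGCAA rather than copied from the figure, and ties between equal s-mers are assumed to count as a match whenever the smallest value occurs at either end.

```python
def is_closed_syncmer(kmer: str, s: int) -> bool:
    """Return True if the smallest s-mer of `kmer` is its first or last s-mer.

    s-mers are compared lexicographically (plain string order).
    Assumption: if the smallest value occurs more than once, the k-mer
    qualifies as long as one occurrence is at the start or end.
    """
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    smallest = min(smers)
    return smers[0] == smallest or smers[-1] == smallest


def closed_syncmers(seq: str, k: int, s: int) -> list[tuple[int, str]]:
    """List (start position, k-mer) for every closed syncmer in `seq`."""
    hits = []
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        if is_closed_syncmer(kmer, s):
            hits.append((pos, kmer))
    return hits


if __name__ == "__main__":
    # GGCAA: its 2-mers are GG, GC, CA, AA; the smallest (AA) is at the end.
    print(is_closed_syncmer("GGCAA", 2))
    # Hypothetical 8-letter string containing GGCAA.
    print(closed_syncmers("GGCAATGC", 5, 2))
```

On the hypothetical string, the two syncmers found (GGCAA at position 0 and AATGC at position 3) share the 2-mer AA, once as a suffix and once as a prefix, so they sit k − s = 3 positions apart, which matches the pairing behavior the caption attributes to panel (C).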