Genome Research. 2024 Oct;34(10):1661–1673. doi: 10.1101/gr.279449.124

© 2024 Derelle et al.; Published by Cold Spring Harbor Laboratory Press

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

Figure 1. Overview of functions and methods in SKA2. Split k-mers allow matching at variant positions, whereas contiguous k-mers mismatch at any variation. ska build creates split k-mer dictionaries from input sequence data. The example shows four sequences that are aligned and on the same strand for clarity, but real input data need be neither. Split k-mers are used as keys, and their middle bases are stored in lists. This dictionary is compressed using snappy to make split k-mer files (SKFs). ska align makes reference-free alignments with no coordinate system by writing out the middle bases, applying filters on the frequency of missing data, constant sites, and ambiguous sites. ska map makes reference-based mappings as ALN or VCF, with the same coordinate system as the reference. In both modes, the conserved sites are also written out but are omitted from the figure for clearer visualization. ska cov counts k-mers and fits a mixture model to find a count threshold when using reads as input to ska build. ska distance calculates SNP distances and mismatches between samples by multiplying the middle base matrix by its transpose. The cluster_dists.py script can be run on this distance matrix to make a phylogeny, single-linkage clusters with a provided threshold, and a Microreact visualization. Operations to merge SKFs, delete samples and split k-mers, and write out the contents of SKFs are also implemented but are not shown.
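
The split k-mer dictionary described in the caption can be illustrated with a minimal Python sketch. This is not SKA2's implementation: the function names (`split_kmers`, `build`, `snp_distance`), the choice of k = 7, and the two toy sequences are hypothetical. The sketch reads the forward strand only, lets a repeated key overwrite an earlier middle base rather than recording an ambiguity, and compares samples pairwise with a direct loop instead of the matrix product the caption mentions.

```python
from collections import defaultdict

def split_kmers(seq, k=7):
    """Yield (key, middle_base) for each split k-mer on the forward strand."""
    half = k // 2
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # The key is the two flanks joined; the middle base is stored separately.
        yield kmer[:half] + kmer[half + 1:], kmer[half]

def build(samples, k=7):
    """Map each split k-mer key to a list of middle bases, one per sample.

    '-' marks a key that is absent from a sample (an assumed placeholder).
    """
    names = list(samples)
    table = defaultdict(lambda: ["-"] * len(names))
    for col, name in enumerate(names):
        for key, mid in split_kmers(samples[name], k):
            table[key][col] = mid  # simplification: later repeats overwrite
    return names, dict(table)

def snp_distance(table, a, b):
    """Count keys present in both samples whose middle bases differ."""
    return sum(
        1
        for mids in table.values()
        if "-" not in (mids[a], mids[b]) and mids[a] != mids[b]
    )

# Two toy sequences that differ by a single G -> C substitution.
names, table = build({"s1": "ACGTACGTTAGC", "s2": "ACGTACCTTAGC"})
print(table["TACTTA"])            # ['G', 'C']
print(snp_distance(table, 0, 1))  # 1
```

Only the split k-mer centred on the substitution shares a key between the two samples, so the variant is matched and counted once. Every k-mer that carries the variant in a flank produces a different key in each sample, which is the mismatch behaviour the caption attributes to contiguous k-mers.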