Table 1.
Category | Analysis | Tool | Primary features | Implementation | Reference | URL |
---|---|---|---|---|---|---|
Mapping | Transcript quantification | kallisto | Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets) | Software (C++) | [69] | https://pachterlab.github.io/kallisto/ |
Mapping | Transcript quantification | Sailfish | Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based) | Software (C++) | [67] | http://www.cs.cmu.edu/~ckingsf/software/sailfish/ |
Mapping | Transcript quantification | Salmon | Quantification of the expression of transcripts using RNA-seq data (uses k-mers) | Software (C++) | [70] | https://combine-lab.github.io/salmon/ |
Mapping | Transcript quantification | RNA-Skim | Transcript-level RNA-seq quantification (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mer) | Software (C++) | [68] | http://www.csbio.unc.edu/rs/ |
Mapping | Variant calling | ChimeRScope | Fusion transcript prediction using gene k-mer profiles of paired-end RNA-seq reads | Software (Java) | [74] | https://github.com/ChimeRScope/ChimeRScope/wiki |
Mapping | Variant calling | FastGT | Genotyping of known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers | Software (C) | [73] | https://github.com/bioinfo-ut/GenomeTester4/ |
Mapping | Variant calling | Phy-Mer | Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based) | Software (Python) | [157] | https://github.com/danielnavarrogomez/phy-mer |
Mapping | Variant calling | LAVA | Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based) | Software (C) | [71] | http://lava.csail.mit.edu/ |
Mapping | Variant calling | MICADo | Detection of mutations in targeted third-generation NGS data (can distinguish patient-specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs) | Software (Python) | [72] | http://github.com/cbib/MICADo |
Mapping | General mapper | Minimap | Lightweight and fast read mapper and read overlap detector (uses the concept of “minimizers”, a special type of k-mer) | Software (C) | [77] | https://github.com/lh3/minimap |
Assembly | De novo genome assembly | MHAP | Produces highly continuous assemblies (fully resolved chromosome arms) from long and noisy third-generation reads (10 kbp) using the MinHash dimensionality-reduction technique | Software (Java) | [76] | https://github.com/marbl/MHAP |
Assembly | De novo genome assembly | Miniasm | Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout-Consensus (OLC) approach without requiring an error correction stage (uses minimap) | Software (C) | [77] | https://github.com/lh3/miniasm |
Assembly | De novo genome assembly | LINKS | Scaffolding of genome assemblies with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes) | Software (Perl) | [75] | https://github.com/warrenlr/LINKS/ |
Assembly | Read clustering | afcluster | Clustering of reads from different genes and different species based on k-mer counts | Software (C++) | [158] | https://github.com/luscinius/afcluster |
Assembly | Read clustering | QCluster | Clustering of reads with alignment-free measures (k-mer based) and quality values | Software (C++) | [159] | http://www.dei.unipd.it/~ciompin/main/qcluster.html |
Assembly | Read error correction | Lighter | Correction of sequencing errors in raw, whole genome sequencing reads (k-mer based) | Software (C++) | [94] | https://github.com/mourisl/Lighter |
Assembly | Read error correction | QuorUM | Error corrector for Illumina reads using k-mers | Software (C++) | [93] | https://github.com/gmarcais/Quorum |
Assembly | Read error correction | Trowel | Error corrector for Illumina reads using k-mers | Software (C++) | [95] | https://sourceforge.net/projects/trowel-ec/ |
Metagenomics | Assembly-free phylogenomics | AAF | Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based) | Software (Python) | [78] | https://github.com/fanhuan/AAF |
Metagenomics | Assembly-free phylogenomics | kSNP v3 | Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis) | Software (C) | [80, 81] | https://sourceforge.net/projects/ksnp/files/ |
Metagenomics | Assembly-free phylogenomics | NGS-MC | Phylogeny of species based on NGS reads using the alignment-free sequence dissimilarity measures d2* and d2S under different Markov chain models (using k-words) | R package | [79, 160] | http://www-rcf.usc.edu/~fsun/Programs/NGS-MC/NGS-MC.html |
Metagenomics | Species identification/taxonomic profiling | CLARK | Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment | Software (C++) | [84] | http://clark.cs.ucr.edu/ |
Metagenomics | Species identification/taxonomic profiling | FOCUS | Reports organisms present in metagenomic samples and profiles their abundances (uses a composition-based approach and non-negative least squares for prediction) | Web service; Software (Python) | [161] | http://edwards.sdsu.edu/FOCUS/ |
Metagenomics | Species identification/taxonomic profiling | GSM | Estimation of abundances of microbial genomes in metagenomic samples (k-mer based) | Software (Go) | [162] | https://github.com/pdtrang/GSM |
Metagenomics | Species identification/taxonomic profiling | Mash | Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on the MinHash dimensionality-reduction technique) | Software (C++) | [163] | https://github.com/marbl/mash |
Metagenomics | Species identification/taxonomic profiling | Kraken | Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database | Software (C++) | [83] | https://ccb.jhu.edu/software/kraken/ |
Metagenomics | Species identification/taxonomic profiling | LMAT | Assignment of taxonomic labels to reads by k-mer searches in a precomputed database | Software (C++/Python) | [82] | https://sourceforge.net/projects/lmat/ |
Metagenomics | Species identification/taxonomic profiling | stringMLST | k-mer-based tool for MLST directly from genome sequencing reads | Software (Python) | [86] | http://jordan.biology.gatech.edu/page/software/stringMLST |
Metagenomics | Species identification/taxonomic profiling | Taxonomer | k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples | Web service | [164] | http://taxonomer.iobio.io/ |
Metagenomics | Other | d2-tools | Word-based (k-tuple) comparison (pairwise dissimilarity matrix using the d2S measure) of metatranscriptomic samples from NGS reads | Software (Python/R) | [56, 165] | https://code.google.com/p/d2-tools/ |
Metagenomics | Other | VirHostMatcher | Prediction of hosts from metagenomic viral sequences based on ONF using various distance measures (e.g., d2) | Software (C++) | [153] | https://github.com/jessieren/VirHostMatcher |
Metagenomics | Other | MetaFast | Calculation of statistics for metagenome sequences and of the distances between them, based on assembly using de Bruijn graphs and the Bray–Curtis dissimilarity measure | Software (Java) | [166] | https://github.com/ctlab/metafast |
An up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/ (accessed 23 August 2017).
LCA, lowest common ancestor; NGS, next-generation sequencing; SNP, single-nucleotide polymorphism; SNV, single-nucleotide variant
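
Several tools in the table (MHAP, Mash) are described as relying on the MinHash dimensionality-reduction technique applied to k-mers. As a rough illustration of the general idea only, and not of how any listed tool is implemented, the following minimal Python sketch builds a fixed-size "bottom-s" sketch from the k-mer set of each sequence and uses it to estimate the Jaccard similarity of the two k-mer sets. The function names, the SHA-1-based hash, and the parameters `k` and `s` are assumptions chosen for the example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is illustrative)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(seq, k, s):
    """Keep the s smallest k-mer hash values as a fixed-size sketch."""
    return sorted(stable_hash(x) for x in kmers(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches.

    Takes the s smallest hashes of the merged sketches and reports the
    fraction of them that occur in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    a = bottom_sketch("ACGTA", k=3, s=10)
    b = bottom_sketch("CGTAC", k=3, s=10)
    print(jaccard_estimate(a, b, s=10))
```

When `s` is at least as large as the number of distinct k-mers, the estimate equals the exact Jaccard similarity; for smaller `s` it is an approximation whose accuracy grows with the sketch size.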