Genome Biol. 2017 Oct 3;18:186. doi: 10.1186/s13059-017-1319-7

Table 2.

Alignment-free sequence comparison tools available for research purposes

| Category | Name | Features | Implementation | Reference | URL |
|---|---|---|---|---|---|
| Pairwise and multiple sequence comparison | ALF | Calculation of pairwise similarity scores (using N2 measure) for sequences in a FASTA file | Software (C++) | [101] | https://github.com/seqan/seqan/tree/master/apps/alf |
| | Alfree | 25 word-based measures, 8 IT-based measures, 3 graph-based measures, W-metric | Web service; software (Python) | This article | http://www.combio.pl/alfree |
| | decaf+py | 13 word-based measures, Lempel–Ziv complexity-based measure, average common substring distance, W-metric | Software (Python) | [52, 53] | http://bioinformatics.org.au/tools/decaf+py/ |
| | multiAlignFree | Multiple alignment-free sequence comparison using five word-based statistics | R package | [167] | http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/ |
| | NASC | Non-aligned sequence comparison: four word-based measures and two IT-based measures | Matlab framework | [38] | http://web.ist.utl.pt/susanavinga/NASC/ |
| Whole-genome phylogeny | ALFRED, ALFRED-G | Phylogenetic tree reconstruction based on the average common substring approach | Software (C++) | [168, 169] | http://alurulab.cc.gatech.edu/phylo |
| | andi | Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes | Software (C) | [170] | https://github.com/evolbioinf/andi/ |
| | CAFE | Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) | Software (C) | [171] | https://github.com/younglululu/CAFE |
| | CVTree3 | Phylogeny reconstruction from whole genome sequences based on word composition | Web service | [172, 173] | http://tlife.fudan.edu.cn/cvtree3 |
| | DLTree | Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method | Web service | [174] | http://dltree.xtu.edu.cn |
| | FFP | Feature frequency profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) | Software (C/Perl) | [34, 55, 112] | https://sourceforge.net/projects/ffp-phylogeny/ |
| | jD2Stat (JIWA) | Generation of the distance matrix using D2 statistics to extract k-mers from large-scale unaligned genome sequences | Software (Java) | [54] | http://bioinformatics.org.au/tools/jD2Stat/ |
| | kr | Efficient word-based estimation of mutation distances from unaligned genomes | Software (C) | [175] | http://guanine.evolbio.mpg.de/cgi-bin/kr2/kr.cgi.pl |
| | FSWM/kmacs/Spaced | Three tools for alignment-free sequence comparison based on inexact word matches | Software (C++); web service | [36, 176] | Software currently unavailable |
| | SlopeTree | Whole genome phylogeny that corrects for HGT | Software (C++) | | http://prodata.swmed.edu/download/pub/slopetree_v1/ |
| | Underlying Approach | Phylogeny of whole genomes using composition of subwords | Software (Java) | [139] | http://www.dei.unipd.it/~ciompin/main/underlying.html |
| Sequence similarity search tool | RAFTS3 | Searches for similar protein sequences against a protein database (>300 times faster than BLAST) | Matlab | [177] | https://sourceforge.net/projects/rafts3/ |
| Annotation of long non-coding RNA | FEELnc | Prediction of lncRNAs from RNA-seq samples based on word frequencies and relaxed open reading frames | Software (Perl/R) | [178] | https://github.com/tderrien/FEELnc |
| | lncScore | Identification of long non-coding RNA from assembled novel transcripts | Software (Python) | [152] | https://github.com/WGLab/lncScore |
| Horizontal gene transfer | alfy | Alignment-free local homology calculation for detecting horizontal gene transfer | Software (C) | [104, 109] | http://guanine.evolbio.mpg.de/alfy/ |
| | rush | Detection of recombination between two unaligned DNA sequences | Software (C) | [105] | http://guanine.evolbio.mpg.de/rush/ |
| | Smash | Identification and visualization of DNA rearrangements between pairs of sequences | Software (C) | [179] | http://bioinformatics.ua.pt/software/smash/ |
| | TF-IDF | Detection of HGT regions and the transfer direction in nucleotide/protein sequences | Software (C++) | [110, 180] | https://github.com/congyingnan/TF-IDF |
| Regulatory elements | D2Z | Identification of functionally related homologous regulatory elements | Software (Perl) | [102] | http://veda.cs.uiuc.edu/d2z/ |
| | MatrixREDUCE | Prediction of functional regulatory targets of transcription factors (TFs) by predicting the total affinity of each promoter and orthologous promoters | Software (Python) | [181] | https://systemsbiology.columbia.edu/matrixreduce |
| | RRS | Detection of functionally similar groups of enhancers and their regions | Software (Perl/C) | [182] | http://goo.gl/7gW578 |
| Sequence clustering | d2_cluster | Word-based clustering of EST and full-length cDNA sequences | Software (C) | [123] | https://github.com/shaze/wcdest/ |
| | d2-vlmc | Word-based clustering of metatranscriptomic samples using variable length Markov chains | Software (Python) | [183] | https://d2vlmc.codeplex.com/ |
| | mBKM | Clustering of DNA sequences using Shannon entropy and Euclidean distance | Software (Java) | [124] | https://github.com/Huiyang520/DMk-BKmeans |
| | kClust | Large-scale clustering of protein sequences (down to 20–30% sequence identity) | Software (C++) | [125] | https://github.com/soedinglab/kClust |
| Other | COMET | Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression | Web service | [184] | https://comet.lih.lu/ |
| | PPI | Identification of protein–protein interaction by coevolution analysis using discrete Fourier transform | Software (Python) | [185] | https://github.com/cyinbox/PPI |
| | VaxiJen | Antigen prediction based on uniform vectors of principal amino acid properties | Web service | [127] | http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html |

The up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/. Accessed 23 August 2017

HGT horizontal gene transfer, IT information theory
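
Many of the tools listed above rely on "word-based" measures, which compare sequences through the counts of their overlapping words (k-mers) and need no alignment. The following is a minimal sketch of two such measures, the D2 statistic and the Euclidean distance between word-count vectors. The function names and the example sequences are illustrative assumptions; they are not taken from any of the listed programs, which may differ in normalization and in how they handle ambiguous characters.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count all overlapping words (k-mers) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(x, y, k):
    """D2 statistic: sum over shared words of count_x(w) * count_y(w).

    Larger values indicate more shared word content (a similarity score).
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())


def euclidean_distance(x, y, k):
    """Euclidean distance between the word-count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for words that are absent from one sequence.
    return sqrt(sum((cx[w] - cy[w]) ** 2 for w in cx.keys() | cy.keys()))


if __name__ == "__main__":
    x, y = "ATGTGTG", "CATGTG"  # hypothetical example sequences
    print(d2_statistic(x, y, 3))                  # 5
    print(round(euclidean_distance(x, y, 3), 3))  # 1.732
```

With k = 3, the first sequence contains ATG once and TGT and GTG twice each, while the second contains CAT, ATG, TGT and GTG once each, which gives a D2 value of 5 and a Euclidean distance of the square root of 3.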