Table 2.
Category | Name | Features | Implementation | Reference | URL |
---|---|---|---|---|---|
Pairwise and multiple sequence comparison | ALF | Calculation of pairwise similarity scores (using the N2 measure) for sequences in a FASTA file | Software (C++) | [101] | https://github.com/seqan/seqan/tree/master/apps/alf |
Pairwise and multiple sequence comparison | Alfree | 25 word-based measures, 8 IT-based measures, 3 graph-based measures, W-metric | Web service; Software (Python) | This article | http://www.combio.pl/alfree |
Pairwise and multiple sequence comparison | decaf + py | 13 word-based measures, Lempel–Ziv complexity-based measure, average common substring distance, W-metric | Software (Python) | [52, 53] | http://bioinformatics.org.au/tools/decaf+py/ |
Pairwise and multiple sequence comparison | multiAlignFree | Multiple alignment-free sequence comparison using five word-based statistics | R package | [167] | http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/ |
Pairwise and multiple sequence comparison | NASC | Non-aligned sequence comparison: four word-based measures and two IT-based measures | Matlab framework | [38] | http://web.ist.utl.pt/susanavinga/NASC/ |
Whole-genome phylogeny | ALFRED, ALFRED-G | Phylogenetic tree reconstruction based on the average common substring approach | Software (C++) | [168, 169] | http://alurulab.cc.gatech.edu/phylo |
Whole-genome phylogeny | andi | Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes | Software (C) | [170] | https://github.com/evolbioinf/andi/ |
Whole-genome phylogeny | CAFE | Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) | Software (C) | [171] | https://github.com/younglululu/CAFE |
Whole-genome phylogeny | CVTree3 | Phylogeny reconstruction from whole genome sequences based on word composition | Web service | [172, 173] | http://tlife.fudan.edu.cn/cvtree3 |
Whole-genome phylogeny | DLTree | Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method | Web service | [174] | http://dltree.xtu.edu.cn |
Whole-genome phylogeny | FFP | Feature frequency profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) | Software (C/Perl) | [34, 55, 112] | https://sourceforge.net/projects/ffp-phylogeny/ |
Whole-genome phylogeny | jD2Stat (JIWA) | Generation of the distance matrix using D2 statistics to extract k-mers from large-scale unaligned genome sequences | Software (Java) | [54] | http://bioinformatics.org.au/tools/jD2Stat/ |
Whole-genome phylogeny | kr | Efficient word-based estimation of mutation distances from unaligned genomes | Software (C) | [175] | http://guanine.evolbio.mpg.de/cgi-bin/kr2/kr.cgi.pl |
Whole-genome phylogeny | FSWM/kmacs/Spaced | Three tools for alignment-free sequence comparison based on inexact word matches | Software (C++); Web service | [36, 176] | Software currently unavailable |
Whole-genome phylogeny | SlopeTree | Whole genome phylogeny that corrects for HGT | Software (C++) | | http://prodata.swmed.edu/download/pub/slopetree_v1/ |
Whole-genome phylogeny | Underlying Approach | Phylogeny of whole genomes using composition of subwords | Software (Java) | [139] | http://www.dei.unipd.it/~ciompin/main/underlying.html |
Sequence similarity search tool | RAFTS3 | Searches for similar protein sequences against a protein database (>300 times faster than BLAST) | Matlab | [177] | https://sourceforge.net/projects/rafts3/ |
Annotation of long non-coding RNA | FEELnc | Prediction of lncRNAs from RNA-seq samples based on word frequencies and relaxed open reading frames | Software (Perl/R) | [178] | https://github.com/tderrien/FEELnc |
Annotation of long non-coding RNA | lncScore | Identification of long non-coding RNA from assembled novel transcripts | Software (Python) | [152] | https://github.com/WGLab/lncScore |
Horizontal gene transfer | alfy | Alignment-free local homology calculation for detecting horizontal gene transfer | Software (C) | [104, 109] | http://guanine.evolbio.mpg.de/alfy/ |
Horizontal gene transfer | rush | Detection of recombination between two unaligned DNA sequences | Software (C) | [105] | http://guanine.evolbio.mpg.de/rush/ |
Horizontal gene transfer | Smash | Identification and visualization of DNA rearrangements between pairs of sequences | Software (C) | [179] | http://bioinformatics.ua.pt/software/smash/ |
Horizontal gene transfer | TF-IDF | Detection of HGT regions and the transfer direction in nucleotide/protein sequences | Software (C++) | [110, 180] | https://github.com/congyingnan/TF-IDF |
Regulatory elements | D2Z | Identification of functionally related homologous regulatory elements | Software (Perl) | [102] | http://veda.cs.uiuc.edu/d2z/ |
Regulatory elements | MatrixREDUCE | Prediction of functional regulatory targets of transcription factors (TFs) by predicting the total affinity of each promoter and orthologous promoters | Software (Python) | [181] | https://systemsbiology.columbia.edu/matrixreduce |
Regulatory elements | RRS | Detection of functionally similar groups of enhancers and their regions | Software (Perl/C) | [182] | http://goo.gl/7gW578 |
Sequence clustering | d2_cluster | Word-based clustering of EST and full-length cDNA sequences | Software (C) | [123] | https://github.com/shaze/wcdest/ |
Sequence clustering | d2-vlmc | Word-based clustering of metatranscriptomic samples using variable length Markov chains | Software (Python) | [183] | https://d2vlmc.codeplex.com/ |
Sequence clustering | mBKM | Clustering of DNA sequences using Shannon entropy and Euclidean distance | Software (Java) | [124] | https://github.com/Huiyang520/DMk-BKmeans |
Sequence clustering | kClust | Large-scale clustering of protein sequences (down to 20–30% sequence identity) | Software (C++) | [125] | https://github.com/soedinglab/kClust |
Other | COMET | Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression | Web service | [184] | https://comet.lih.lu/ |
Other | PPI | Identification of protein–protein interaction by coevolution analysis using discrete Fourier transform | Software (Python) | [185] | https://github.com/cyinbox/PPI |
Other | VaxiJen | Antigen prediction based on uniform vectors of principal amino acid properties | Web service | [127] | http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html |
The up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/. Accessed 23 August 2017.
Abbreviations: HGT, horizontal gene transfer; IT, information theory.
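
Many of the tools in Table 2 are described as offering "word-based measures". As a rough illustration of what that means, the following minimal sketch counts overlapping words (k-mers) in two sequences and derives two classic quantities from the counts: the D2 statistic (the sum, over shared words, of the product of their counts) and the Euclidean distance between the count vectors. The function names `kmer_counts`, `d2_statistic` and `euclidean_distance` are hypothetical and chosen for this example only; they are not the interface of any program listed above, and the listed tools may normalize or weight the counts differently.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word (k-mer) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(x: str, y: str, k: int) -> int:
    """D2 similarity: sum over shared words of the product of their counts."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items() if w in cy)


def euclidean_distance(x: str, y: str, k: int) -> float:
    """Euclidean distance between the raw k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    words = set(cx) | set(cy)  # union of words seen in either sequence
    # Counter returns 0 for a word that is absent from one sequence.
    return sqrt(sum((cx[w] - cy[w]) ** 2 for w in words))


if __name__ == "__main__":
    x, y = "ATGTGTG", "CATGTG"  # toy sequences, illustration only
    print(d2_statistic(x, y, 3))
    print(round(euclidean_distance(x, y, 3), 3))
```

Note that D2 is a similarity (larger values mean more shared words), whereas the Euclidean measure is a dissimilarity; both depend on the chosen word length k.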