. 2016 Dec 14;2016:7983236. doi: 10.1155/2016/7983236

Table 1.

Computational tools Description Website References
Alignment tools
Burrows-Wheeler Aligner (BWA) Performs short read alignment against a reference genome using the Burrows-Wheeler transform (BWT), allowing for gaps and mismatches. http://bio-bwa.sourceforge.net/ [8]
Bowtie (1 & 2) Performs short read alignment using the Burrows-Wheeler index in order to be memory efficient, while still maintaining an alignment speed of over 25 million 35 bp reads per hour. http://bowtie-bio.sourceforge.net/index.shtml [9, 10]
ELAND Short read aligner that achieves speed by splitting reads into equal lengths and applying seed templates to guarantee hits with only 2 mismatches. http://www.illumina.com/ Illumina, Inc.
GEM Short read aligner using string matching instead of BWT to deliver precision and speed. http://algorithms.cnag.cat/wiki/The_GEM_library [11]
GSNAP Performs short and long read alignment, detects long- and short-distance splicing and SNPs, and can handle bisulfite-treated DNA for methylation studies. http://research-pub.gene.com/gmap/ [12]
MAQ Short read aligner compatible with Illumina-Solexa and ABI SOLiD data, performs ungapped alignment allowing 2-3 mismatches for single-end reads and one mismatch for paired-end reads. http://maq.sourceforge.net/ [13]
mrFAST Performs short read alignment allowing for INDELs up to 8 bp, for Illumina generated data. Paired-end mapping using a one end anchored algorithm allows for detection of novel insertions. http://mrfast.sourceforge.net/ [14]
Novoalign Alignment done on paired-end or single-end sequences, also capable of doing methylation studies. Allows for a mismatch up to 50% of a read length and has built-in adapter and base quality trimming. http://www.novocraft.com/products/novoalign/ http://www.novocraft.com/
SOAP (1 & 2) SOAP2 improved speed by an order of magnitude over SOAP1 and can align a wide range of read lengths at the speed of 2 minutes for one million single-end reads using a two-way BWT algorithm. http://soap.genomics.org.cn/ [15, 16]
SSAHA Uses a hashing algorithm to find exact or close to exact matching in DNA and protein databases, analogous to doing a BLAST search for each read. https://www.vectorbase.org/glossary/ssaha-sequence-search-and-alignment-hashing-algorithm/ [17]
Stampy Aligns Illumina reads for genome, RNA, and ChIP sequencing using a hashing algorithm and a statistical model, allowing for a large number of variations including insertions and deletions. http://www.well.ox.ac.uk/project-stampy [18]
YOABS Uses an O(n) algorithm that combines hash- and trie-based methods; it is effective at aligning sequences over 200 bp, using 3 times less memory and running ten times faster than SSAHA. Available by request for noncommercial use [19]
HTSeq Python based package with many functions to facilitate several aspects of sequencing studies. http://www-huber.embl.de/HTSeq/doc/overview.html
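
The BWA, Bowtie, and SOAP2 entries above all mention a Burrows-Wheeler (BWT) index. The following is a minimal sketch of exact matching by backward search on such an index, assuming a tiny in-memory reference; the function names `bwt_index` and `exact_match` are hypothetical, and real aligners use compressed indexes and also handle mismatches and gaps, which this sketch does not.

```python
def bwt_index(text):
    """Build a naive suffix array, BWT string, C table and occurrence table."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character is the one preceding each sorted suffix (wraps to '$')
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, bwt, C, occ


def exact_match(pattern, index):
    """Return sorted 0-based positions where pattern occurs exactly."""
    sa, bwt, C, occ = index
    lo, hi = 0, len(bwt)  # half-open interval of matching suffix array rows
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("ACGTACGTTACG")
hits = exact_match("ACG", index)
```

Each step of the search touches only the C and occurrence tables, so the work per read depends on the read length rather than the reference length, which is the property the BWT-based aligners exploit.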

Auxiliary tools
FastUniq Imports, sorts, and identifies PCR duplicates of short sequences from sequencing data. https://sourceforge.net/projects/fastuniq/ [23]
Picard Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. http://picard.sourceforge.net/
SAMtools Suite of tools capable of viewing, indexing, editing, writing, and reading SAM, BAM, and CRAM formatted files. http://www.htslib.org/ [7]

SNV and SV calling
GATK Variant calling of SNPs and small INDELs; can also be used on nonhuman and nondiploid organisms. https://www.broadinstitute.org/gatk/ [46]
SAMtools Suite of tools capable of viewing, indexing, editing, writing, and reading SAM, BAM, and CRAM formatted files. http://www.htslib.org/ [7]
VCMM Detection of SNVs and INDELs using the multinomial probabilistic method in WES and WGS data. http://emu.src.riken.jp/VCMM/ [25]
FreeBayes Detection of SNPs, MNPs, INDELs, and structural variants (SVs) from sequencing alignments using Bayesian statistical methods. https://github.com/ekg/freebayes [27]
indelMINER Splitread algorithm to identify breakpoint in INDELs from paired-end sequencing data. https://github.com/aakrosh/indelMINER [32]
Pindel Detection of INDELs using a pattern growth approach with anchor points to provide nucleotide-level resolution. http://gmt.genome.wustl.edu/packages/pindel/ [30]
Platypus Detection of SNPs, MNPs, INDELs, replacements, and structural variants (SVs) from sequencing alignments using local realignment and local assembly to achieve high specificity and sensitivity. http://www.well.ox.ac.uk/platypus [26]
Splitread Detection of INDELs less than 50 bp long from WES or WGS data, using a split-read algorithm. http://splitread.sourceforge.net/ [31]
Sprites Detection of INDELs is done using a split-read and soft-clipping approach that is especially sensitive in datasets with low coverage. https://github.com/zhangzhen/sprites [33]
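
Several callers in this group (for example FreeBayes and VCMM) are described as Bayesian or probabilistic. As a rough illustration of the general idea only, the sketch below computes posterior probabilities for three diploid genotypes from reference and alternate read counts under a simple binomial read model. The function name, the fixed error rate, and the flat priors are all assumptions made for this example; it is not the model implemented by any tool in the table.

```python
from math import comb


def genotype_posteriors(ref_count, alt_count, error_rate=0.01, priors=None):
    """Posterior over genotypes RR, RA, AA given allele read counts."""
    # Assumed probability that one read shows the alternate allele
    p_alt = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    if priors is None:
        # Flat priors are an assumption; real callers use population priors
        priors = {"RR": 1 / 3, "RA": 1 / 3, "AA": 1 / 3}
    n = ref_count + alt_count
    unnormalized = {}
    for genotype, p in p_alt.items():
        # Binomial likelihood of seeing alt_count alternate reads out of n
        likelihood = comb(n, alt_count) * p ** alt_count * (1 - p) ** ref_count
        unnormalized[genotype] = priors[genotype] * likelihood
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


het = genotype_posteriors(ref_count=10, alt_count=10)
hom_ref = genotype_posteriors(ref_count=20, alt_count=0)
```

A balanced mix of reference and alternate reads favors the heterozygous genotype, while a pile of reference-only reads favors homozygous reference.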

VCF annotation
ANNOVAR Provides up-to-date annotation of VCF files by gene, region, and filters from several other databases. http://annovar.openbioinformatics.org/ [34]
MuTect Postprocesses variants to eliminate artifacts from hybrid capture, short read alignment, and next-generation sequencing. http://www.broadinstitute.org/cancer/cga/mutect [35]
SnpEff Uses 38,000 genomes to predict and annotate the effects of variants on genes. http://snpeff.sourceforge.net/ [36]
SnpSift Tools to manipulate VCF files including filtering, annotation, case controls, transition, and transversion rates and more. http://snpeff.sourceforge.net/SnpSift.html [37]
VAT Annotation of variants by functionality in a cloud computing environment. http://vat.gersteinlab.org/ [38]

Database filtration
1000 Genomes Project Genotype information from a population of 1000 healthy individuals. http://www.1000genomes.org/ [41]
dbSNP Database of genomic variants from 53 organisms. https://www.ncbi.nlm.nih.gov/projects/SNP/ [39]
LOVD Open source database of freely available gene-centered collection of DNA variants and storage of patient and NGS data. http://www.lovd.nl/3.0/home [40]
COSMIC Database containing somatic mutations from human cancers separated into expert curated data and genome-wide screen published in scientific literature. http://cancer.sanger.ac.uk/cosmic [42]
NHLBI GO Exome Sequencing Project (ESP) Database of genes and mechanisms that contribute to blood, lung, and heart disorders through NGS data in various populations. http://evs.gs.washington.edu/EVS/
Exome Aggregation Consortium (ExAC) Database of 60,706 unrelated individuals from disease and population exome sequencing studies. http://exac.broadinstitute.org/ [3]
SeattleSeq Annotation Part of the NHLBI sequencing project; this database contains novel and known SNVs and INDELs, including accession number, function of the variant, HapMap frequencies, clinical association, and PolyPhen predictions. http://snp.gs.washington.edu/SeattleSeqAnnotation137/

Functional predictors
CADD Machine learning algorithm that scores all 8.6 billion possible substitutions in the human reference genome from 1 to 99, based on known and simulated functional variants. http://cadd.gs.washington.edu/info [49]
FATHMM Uses Hidden Markov Models to predict the functional consequences of SNVs in coding and noncoding variants through a web server. http://fathmm.biocompute.org.uk/ [46]
LRT Uses the Likelihood Ratio statistical test to compare a variant to known variants and determine if they are predicted to be benign, deleterious, or unknown. http://genome.cshlp.org/content/19/9/1553.long [45]
PolyPhen-2 Predicts potential impact of a nonsynonymous variant using comparative and physical characteristics. http://genetics.bwh.harvard.edu/pph2/ [44]
SIFT By using PSI-BLAST, a prediction can be made on the effect of a nonsynonymous mutation within a protein. http://sift.jcvi.org/ [43]
VEST Machine learning approach to determine the probability that a missense mutation will impair the functionality of a protein. http://karchinlab.org/apps/appVest.html [48]
MetaSVM & MetaLR Integration of a Support Vector Machine and Logistic Regression to integrate nine deleterious prediction scores of missense mutations. https://sites.google.com/site/jpopgen/dbNSFP [47]

Significant somatic mutations
SomaticSniper Using two BAM files as input, this tool uses the genotype likelihood model of MAQ to calculate the probability that the tumor and normal samples are different, thus identifying somatic variants. http://gmt.genome.wustl.edu/packages/somatic-sniper/ [50]
MuTect Uses statistical analysis with two Bayesian approaches to predict the likelihood of a somatic mutation. https://www.broadinstitute.org/cancer/cga/mutect [35]
VarSim By leveraging previously reported mutations, a random mutation simulation is performed to predict somatic mutations. http://bioinform.github.io/varsim/ [51]
SomVarIUS Identification of somatic variants from unpaired tissue samples with a sequencing depth of 150x and 67% precision, implemented in Python. https://github.com/kylessmith/SomVarIUS [52]

Copy number alteration
Control-FREEC Detects copy number changes and loss of heterozygosity (LOH) from paired SAM/BAM files by computing and normalizing copy number and beta allele frequency. http://bioinfo-out.curie.fr/projects/freec/ [59]
CNV-seq Mapped read count is calculated over a sliding window in Perl and R to determine copy number from HTS studies. http://tiger.dbs.nus.edu.sg/cnv-seq/ [53]
SegSeq Using 14 million aligned sequence reads from cancer cell lines, equal copy number alterations are calculated from sequencing data. https://www.broadinstitute.org/cancer/cga/segseq [54]
VarScan2 Determines copy number changes in matched or unmatched samples using read ratios, which are then postprocessed with a circular binary segmentation algorithm. http://dkoboldt.github.io/varscan/using-varscan.html [61]
ExomeAI Detects allele imbalance including LOH in unmatched tumor samples using a statistical approach that is capable of handling low-quality datasets. http://gqinnovationcenter.com/index.aspx [64]
CNVseeqer Exon coverage between matched sequences was calculated using log2 ratios followed by the circular binary segmentation algorithm. http://icb.med.cornell.edu/wiki/index.php?title=Elementolab/CNVseeqer&redirect=no [60]
EXCAVATOR Detects copy number variants from WES data in 3 steps using a Hidden Markov Model algorithm. https://sourceforge.net/projects/excavatortool/ [57]
ExomeCNV R package used to detect copy number variants and loss of heterozygosity from WES data. https://secure.genome.ucla.edu/index.php/ExomeCNV_User_Guide [58]
ADTEx Detection of aberrations in tumor exomes by detecting B-allele frequencies and implemented in R. http://adtex.sourceforge.net/ [55]
CONTRA Uses normalized depth of coverage to detect copy number changes from targeted resequencing data including WES. https://sourceforge.net/projects/contra-cnv/ [56]
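
Several entries in this group (CNV-seq, CNVseeqer, CONTRA) describe comparing read depth between samples, often as a log2 ratio over windows. The sketch below illustrates that idea under simplifying assumptions: reads are given as 0-based start positions on one chromosome, windows are fixed and non-overlapping rather than sliding, and normalization is by total read count only. The function names and the pseudocount are hypothetical, and the segmentation step (for example circular binary segmentation) is not shown.

```python
from math import log2


def window_counts(read_starts, chrom_length, window):
    """Count reads whose 0-based start falls in each fixed-size window."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions off the chromosome
            counts[pos // window] += 1
    return counts


def log2_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-window log2(tumor / normal) after library-size normalization."""
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        # Pseudocount (an assumption) avoids division by zero in empty windows
        t_norm = (t + pseudocount) / tumor_total
        n_norm = (n + pseudocount) / normal_total
        ratios.append(log2(t_norm / n_norm))
    return ratios


counts = window_counts([0, 5, 10, 19], chrom_length=20, window=10)
ratios = log2_ratios([100, 200, 100, 100], [100, 100, 100, 100])
```

In this toy input the second window has twice the tumor depth of its neighbors, so its log2 ratio sits about one unit above theirs, which is the kind of shift a segmentation algorithm would then look for.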

Driver prediction tools
CHASM Machine learning method that predicts the functional significance of somatic mutations. http://karchinlab.org/apps/appChasm.html [65]
Dendrix De novo drivers are discovered from cancer only mutational data including genes, nucleotides, or domains that have high exclusivity and coverage. http://compbio.cs.brown.edu/projects/dendrix/ [66]
MutSigCV Gene-specific and patient-specific mutation frequencies are incorporated to find mutations in genes that are mutated more often than would be expected by chance. http://www.broadinstitute.org/cancer/software/genepattern/modules/docs/MutSigCV [67]

Pathway analysis tools and resources
KEGG Database using maps of known biological processes that allows searching for genes and color coding of results. http://www.genome.jp/kegg/ [68]
DAVID Allows for users to input a large set of genes and discover the functional annotation of the gene list including pathways, gene ontology terms, and more. https://david.ncifcrf.gov/ [69]
STRING Network visualization of protein-protein interactions of over 2,031 organisms. http://string-db.org/ [70]
BEReX Uses biomedical knowledge to allow users to search for relationships between biomedical entities. http://infos.korea.ac.kr/berex/ [71]
DAPPLE Uses a list of genes to determine physical connectivity among proteins according to protein-protein interactions. http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1001273 [72]
SNPsea Uses linkage disequilibrium to determine pathways and cell types that are likely to be affected based on SNP data. http://www.broadinstitute.org/mpg/snpsea/ [73]

Tools and resources for linking variants to therapeutics
cBioPortal Database that allows the download, analysis, and visualization of cancer sequencing studies, including providing patient and clinical data for samples. http://www.cbioportal.org/ [78]
My Cancer Genome Database for cancer research that provides linkage of mutational status to therapies and available clinical trials. https://www.mycancergenome.org/ http://www.mycancergenome.org/
ClinVar Database of relationships between human variations and phenotypes, showing known implications for health status. https://www.ncbi.nlm.nih.gov/clinvar/ [74]
DSigDB Database of drug signatures that includes 19,531 genes and 17,389 compounds that can in part help identify compounds for drug repurposing studies in translational research. http://tanlab.ucdenver.edu/DSigDB [77]
PharmGKB Knowledge base allowing visualization of a variety of drug-gene knowledge. https://www.pharmgkb.org/ [75]
DrugBank Contains detailed drug information with comprehensive drug target information for 8,206 drugs. http://www.drugbank.ca/ [76]

WES data analysis pipelines
fastq2vcf Whole exome sequencing pipeline that starts with raw sequencing (FASTQ) files and ends with a VCF file, suitable for both novice and expert users. http://fastq2vcf.sourceforge.net/ [80]
SeqMule WES or WGS pipeline that combines the information from over ten alignment and analysis tools to arrive at a VCF file that can be used in both Mendelian and cancer studies. http://seqmule.openbioinformatics.org/en/latest/ [79]
IMPACT WES data analysis pipeline that starts with raw sequencing reads and analyzes SNVs and CNAs and links this data to a list of prioritized drugs from clinical trials and DSigDB. http://tanlab.ucdenver.edu/IMPACT/ [81]
Genomes on the Cloud (GotCloud) Automated sequencing pipeline that performs in part alignment, variant calling, and quality control that can be run on Amazon Web Services EC2 as well as local machines and clusters. http://genome.sph.umich.edu/wiki/GotCloud