Alignment tools
|
Burrows-Wheeler Aligner (BWA) |
Performs short read alignment against a reference genome using a Burrows-Wheeler transform (BWT) approach, allowing for gaps and mismatches. |
http://bio-bwa.sourceforge.net/
|
[8] |
Bowtie (1 & 2) |
Performs short read alignment using the Burrows-Wheeler index in order to be memory efficient, while still maintaining an alignment speed of over 25 million 35 bp reads per hour. |
http://bowtie-bio.sourceforge.net/index.shtml
|
[9, 10] |
ELAND |
Short read aligner that achieves speed by splitting reads into equal lengths and applying seed templates to guarantee hits with only 2 mismatches. |
http://www.illumina.com/
|
Illumina, Inc. |
GEM |
Short read aligner using string matching instead of BWT to deliver precision and speed. |
http://algorithms.cnag.cat/wiki/The_GEM_library
|
[11] |
GSNAP |
Performs short and long read alignment; detects long- and short-distance splicing and SNPs, and can handle bisulfite-treated DNA for methylation studies. |
http://research-pub.gene.com/gmap/
|
[12] |
MAQ |
Short read aligner compatible with Illumina-Solexa and ABI SOLiD data; performs ungapped alignment allowing 2-3 mismatches for single-end reads and one mismatch for paired-end reads. |
http://maq.sourceforge.net/
|
[13] |
mrFAST |
Performs short read alignment of Illumina-generated data, allowing for INDELs up to 8 bp. Paired-end mapping using a one-end-anchored algorithm allows for detection of novel insertions. |
http://mrfast.sourceforge.net/
|
[14] |
Novoalign |
Aligns paired-end or single-end sequences and also supports methylation studies. Allows mismatches in up to 50% of a read's length and has built-in adapter and base quality trimming. |
http://www.novocraft.com/products/novoalign/
|
http://www.novocraft.com/
|
SOAP (1 & 2) |
SOAP2 improved speed by an order of magnitude over SOAP1 and can align a wide range of read lengths at the speed of 2 minutes for one million single-end reads using a two-way BWT algorithm. |
http://soap.genomics.org.cn/
|
[15, 16] |
SSAHA |
Uses a hashing algorithm to find exact or near-exact matches in DNA and protein databases, analogous to doing a BLAST search for each read. |
https://www.vectorbase.org/glossary/ssaha-sequence-search-and-alignment-hashing-algorithm/
|
[17] |
Stampy |
Aligns Illumina reads for genome, RNA, and ChIP sequencing using a hashing algorithm and a statistical model, allowing for a large number of variations including insertions and deletions. |
http://www.well.ox.ac.uk/project-stampy
|
[18] |
YOABS |
Uses an O(n) algorithm combining hash- and trie-based methods that is effective in aligning sequences over 200 bp, using 3 times less memory and running ten times faster than SSAHA. |
Available by request for noncommercial use |
[19] |
HTSeq |
Python based package with many functions to facilitate several aspects of sequencing studies. |
http://www-huber.embl.de/HTSeq/doc/overview.html
|
|
|
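Several of the aligners in this group (BWA, Bowtie, SOAP2) are described as BWT-based. As a rough illustration of the idea only, and not the implementation of any listed tool, the following Python sketch builds a Burrows-Wheeler transform naively and counts exact occurrences of a read by backward search. The function names and the toy reference sequence are assumptions made for this example; real aligners use compressed indexes and tolerate mismatches and gaps.

```python
from collections import Counter


def bwt_from_text(text):
    """Return the BWT of text, using a naive sort of all rotations.

    A '$' terminator (which sorts before A, C, G, T) is appended first.
    """
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt):
    """Build the two lookup tables needed for backward search."""
    counts = Counter(bwt)
    # first[ch] = row of the first sorted rotation that starts with ch
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of times ch occurs in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ


def count_read(reference, read):
    """Count exact occurrences of read in reference via backward search."""
    bwt = bwt_from_text(reference)
    first, occ = build_fm_index(bwt)
    lo, hi = 0, len(bwt)  # half-open range of matching sorted rotations
    for ch in reversed(read):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    toy_reference = "ACGTACGTTACG"  # hypothetical reference sequence
    for toy_read in ("ACG", "CGT", "GTT", "TTT"):
        print(toy_read, count_read(toy_reference, toy_read))
```

The naive rotation sort is quadratic in memory and is used here only to keep the sketch short; it is not suitable for genome-sized references.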
Auxiliary tools
|
FastUniq |
Imports, sorts, and identifies PCR duplicates of short sequences from sequencing data. |
https://sourceforge.net/projects/fastuniq/
|
[23] |
Picard |
Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. |
http://picard.sourceforge.net/
|
|
SAMtools |
Suite of tools capable of viewing, indexing, editing, writing, and reading SAM, BAM, and CRAM formatted files. |
http://www.htslib.org/
|
[7] |
|
SNV and SV calling
|
GATK |
Variant calling of SNPs and small INDELs; can also be used on nonhuman and nondiploid organisms. |
https://www.broadinstitute.org/gatk/
|
[4–6] |
SAMtools |
Suite of tools capable of viewing, indexing, editing, writing, and reading SAM, BAM, and CRAM formatted files. |
http://www.htslib.org/
|
[7] |
VCMM |
Detection of SNVs and INDELs using the multinomial probabilistic method in WES and WGS data. |
http://emu.src.riken.jp/VCMM/
|
[25] |
FreeBayes |
Detection of SNPs, MNPs, INDELs, and structural variants (SVs) from sequencing alignments using Bayesian statistical methods. |
https://github.com/ekg/freebayes
|
[27] |
indelMINER |
Split-read algorithm to identify breakpoints of INDELs from paired-end sequencing data. |
https://github.com/aakrosh/indelMINER
|
[32] |
Pindel |
Detection of INDELs using a pattern growth approach with anchor points to provide nucleotide-level resolution. |
http://gmt.genome.wustl.edu/packages/pindel/
|
[30] |
Platypus |
Detection of SNPs, MNPs, INDELs, replacements, and structural variants (SVs) from sequencing alignments using local realignment and local assembly to achieve high specificity and sensitivity. |
http://www.well.ox.ac.uk/platypus
|
[26] |
Splitread |
Detection of INDELs less than 50 bp long from WES or WGS data, using a split-read algorithm. |
http://splitread.sourceforge.net/
|
[31] |
Sprites |
Detection of INDELs using a split-read and soft-clipping approach that is especially sensitive in low-coverage datasets. |
https://github.com/zhangzhen/sprites
|
[33] |
|
VCF annotation
|
ANNOVAR |
Provides up-to-date annotation of VCF files by gene, region, and filters from several other databases. |
http://annovar.openbioinformatics.org/
|
[34] |
MuTect |
Postprocesses variants to eliminate artifacts from hybrid capture, short read alignment, and next-generation sequencing. |
http://www.broadinstitute.org/cancer/cga/mutect
|
[35] |
SnpEff |
Uses 38,000 genomes to predict and annotate the effects of variants on genes. |
http://snpeff.sourceforge.net/
|
[36] |
SnpSift |
Tools to manipulate VCF files, including filtering, annotation, case-control analysis, transition/transversion rates, and more. |
http://snpeff.sourceforge.net/SnpSift.html
|
[37] |
VAT |
Annotation of variants by functionality in a cloud computing environment. |
http://vat.gersteinlab.org/
|
[38] |
|
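One of the summary statistics mentioned for SnpSift above is the transition/transversion rate. As a minimal sketch of what that statistic measures, and not SnpSift's own code or interface, the Python below classifies single-nucleotide variants and computes a Ti/Tv ratio. The function names and the input format (a list of REF/ALT pairs) are assumptions made for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snv(ref, alt):
    """Return 'transition', 'transversion', or None if not a simple SNV."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        return None  # INDELs, MNPs, and non-ACGT alleles are skipped
    pair = {ref, alt}
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ti_tv_ratio(variants):
    """Compute Ti/Tv from an iterable of (ref, alt) pairs.

    Returns None when no transversions are present, since the ratio
    is undefined in that case.
    """
    transitions = transversions = 0
    for ref, alt in variants:
        kind = classify_snv(ref, alt)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    if transversions == 0:
        return None
    return transitions / transversions


if __name__ == "__main__":
    # Hypothetical REF/ALT pairs, as might be read from VCF records
    toy_variants = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("AT", "A")]
    print(ti_tv_ratio(toy_variants))
```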
Database filtration
|
1000 Genomes Project |
Genotype information from a population of 1000 healthy individuals. |
http://www.1000genomes.org/
|
[41] |
dbSNP |
Database of genomic variants from 53 organisms. |
https://www.ncbi.nlm.nih.gov/projects/SNP/
|
[39] |
LOVD |
Open source database providing a freely available, gene-centered collection of DNA variants, along with storage of patient and NGS data. |
http://www.lovd.nl/3.0/home
|
[40] |
COSMIC |
Database containing somatic mutations from human cancers, separated into expert-curated data and genome-wide screens published in the scientific literature. |
http://cancer.sanger.ac.uk/cosmic
|
[42] |
NHLBI GO Exome Sequencing Project (ESP) |
Database of genes and mechanisms that contribute to blood, lung, and heart disorders through NGS data in various populations. |
http://evs.gs.washington.edu/EVS/
|
|
Exome Aggregation Consortium (ExAC) |
Database of 60,706 unrelated individuals from disease and population exome sequencing studies. |
http://exac.broadinstitute.org/
|
[3] |
SeattleSeq Annotation |
Part of the NHLBI sequencing project; this database contains novel and known SNVs and INDELs, including accession number, function of the variant, HapMap frequencies, clinical association, and PolyPhen predictions. |
http://snp.gs.washington.edu/SeattleSeqAnnotation137/
|
|
|
Functional predictors
|
CADD |
Machine learning algorithm that scores all 8.6 billion possible substitutions in the human reference genome from 1 to 99 based on known and simulated functional variants. |
http://cadd.gs.washington.edu/info
|
[49] |
FATHMM |
Uses Hidden Markov Models to predict the functional consequences of coding and noncoding SNVs through a web server. |
http://fathmm.biocompute.org.uk/
|
[46] |
LRT |
Uses the Likelihood Ratio statistical test to compare a variant to known variants and determine whether it is predicted to be benign, deleterious, or unknown. |
http://genome.cshlp.org/content/19/9/1553.long
|
[45] |
PolyPhen-2 |
Predicts potential impact of a nonsynonymous variant using comparative and physical characteristics. |
http://genetics.bwh.harvard.edu/pph2/
|
[44] |
SIFT |
Uses PSI-BLAST to predict the effect of a nonsynonymous mutation within a protein. |
http://sift.jcvi.org/
|
[43] |
VEST |
Machine learning approach to determine the probability that a missense mutation will impair the functionality of a protein. |
http://karchinlab.org/apps/appVest.html
|
[48] |
MetaSVM & MetaLR |
Use a Support Vector Machine and Logistic Regression, respectively, to integrate nine deleteriousness prediction scores of missense mutations. |
https://sites.google.com/site/jpopgen/dbNSFP
|
[47] |
|
Significant somatic mutations
|
SomaticSniper |
Takes two BAM files as input and uses the genotype likelihood model of MAQ to calculate the probability that the tumor and normal samples differ, thus identifying somatic variants. |
http://gmt.genome.wustl.edu/packages/somatic-sniper/
|
[50] |
MuTect |
Uses statistical analysis with two Bayesian approaches to predict the likelihood of a somatic mutation. |
https://www.broadinstitute.org/cancer/cga/mutect
|
[35] |
VarSim |
Leverages previously reported mutations to perform a random mutation simulation for predicting somatic mutations. |
http://bioinform.github.io/varsim/
|
[51] |
SomVarIUS |
Identification of somatic variants from unpaired tissue samples with a sequencing depth of 150x and 67% precision, implemented in Python. |
https://github.com/kylessmith/SomVarIUS
|
[52] |
|
Copy number alteration
|
Control-FREEC |
Detects copy number changes and loss of heterozygosity (LOH) from paired SAM/BAM files by computing and normalizing copy number and B-allele frequency. |
http://bioinfo-out.curie.fr/projects/freec/
|
[59] |
CNV-seq |
Calculates mapped read counts over a sliding window to determine copy number from HTS studies; implemented in Perl and R. |
http://tiger.dbs.nus.edu.sg/cnv-seq/
|
[53] |
SegSeq |
Using 14 million aligned sequence reads from cancer cell lines, equal copy number alterations are calculated from sequencing data. |
https://www.broadinstitute.org/cancer/cga/segseq
|
[54] |
VarScan2 |
Determines copy number changes in matched or unmatched samples using read ratios, which are then postprocessed with a circular binary segmentation algorithm. |
http://dkoboldt.github.io/varscan/using-varscan.html
|
[61] |
ExomeAI |
Detects allele imbalance including LOH in unmatched tumor samples using a statistical approach that is capable of handling low-quality datasets. |
http://gqinnovationcenter.com/index.aspx
|
[64] |
CNVseeqer |
Calculates exon coverage between matched sequences using log2 ratios, followed by the circular binary segmentation algorithm. |
http://icb.med.cornell.edu/wiki/index.php?title=Elementolab/CNVseeqer&redirect=no
|
[60] |
EXCAVATOR |
Detects copy number variants from WES data in 3 steps using a Hidden Markov Model algorithm. |
https://sourceforge.net/projects/excavatortool/
|
[57] |
ExomeCNV |
R package used to detect copy number variants and loss of heterozygosity from WES data. |
https://secure.genome.ucla.edu/index.php/ExomeCNV_User_Guide
|
[58] |
ADTEx |
Detects aberrations in tumor exomes using B-allele frequencies; implemented in R. |
http://adtex.sourceforge.net/
|
[55] |
CONTRA |
Uses normalized depth of coverage to detect copy number changes from targeted resequencing data including WES. |
https://sourceforge.net/projects/contra-cnv/
|
[56] |
|
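Several tools in this group (for example CNV-seq, CNVseeqer, and CONTRA) are described as comparing read counts or depth between samples, often as log2 ratios over windows. The Python sketch below illustrates that general idea under simplifying assumptions: reads are given as start positions on a single chromosome, library size is normalized by total read count, and a pseudocount avoids division by zero. The function names, parameters, and pseudocount are assumptions for this example and do not reproduce any listed tool; no segmentation or statistical testing is performed.

```python
import math


def window_counts(positions, chrom_length, window, step):
    """Count read start positions falling in each sliding window."""
    starts = list(range(0, max(chrom_length - window, 0) + 1, step))
    counts = [
        sum(1 for p in positions if start <= p < start + window)
        for start in starts
    ]
    return starts, counts


def log2_ratios(test_pos, ref_pos, chrom_length, window, step, pseudocount=0.5):
    """Return (window_start, log2 ratio) pairs for test versus reference.

    Counts are divided by each sample's total read count so that a
    difference in sequencing depth alone does not look like a copy
    number change. pseudocount must be positive if any window may be empty.
    """
    if not test_pos or not ref_pos:
        raise ValueError("both samples need at least one read")
    starts, test_counts = window_counts(test_pos, chrom_length, window, step)
    _, ref_counts = window_counts(ref_pos, chrom_length, window, step)
    results = []
    for start, t, r in zip(starts, test_counts, ref_counts):
        test_frac = (t + pseudocount) / len(test_pos)
        ref_frac = (r + pseudocount) / len(ref_pos)
        results.append((start, math.log2(test_frac / ref_frac)))
    return results


if __name__ == "__main__":
    # Hypothetical data: uniform reference, test sample with extra reads
    # in the region [10, 20), mimicking a copy number gain there.
    reference_reads = list(range(40))
    test_reads = list(range(40)) + list(range(10, 20))
    for start, ratio in log2_ratios(test_reads, reference_reads, 40, 10, 10):
        print(start, round(ratio, 3))
```

Setting `step` equal to `window` gives non-overlapping bins; a smaller `step` gives overlapping sliding windows.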
Driver prediction tools
|
CHASM |
Machine learning method that predicts the functional significance of somatic mutations. |
http://karchinlab.org/apps/appChasm.html
|
[65] |
Dendrix |
De novo drivers are discovered from cancer only mutational data including genes, nucleotides, or domains that have high exclusivity and coverage. |
http://compbio.cs.brown.edu/projects/dendrix/
|
[66] |
MutSigCV |
Gene-specific and patient-specific mutation frequencies are incorporated to find mutations in genes that are mutated more often than would be expected by chance. |
http://www.broadinstitute.org/cancer/software/genepattern/modules/docs/MutSigCV
|
[67] |
|
Pathway analysis tools and resources
|
KEGG |
Database using maps of known biological processes that allows searching for genes and color coding of results. |
http://www.genome.jp/kegg/
|
[68] |
DAVID |
Allows for users to input a large set of genes and discover the functional annotation of the gene list including pathways, gene ontology terms, and more. |
https://david.ncifcrf.gov/
|
[69] |
STRING |
Network visualization of protein-protein interactions of over 2,031 organisms. |
http://string-db.org/
|
[70] |
BEReX |
Uses biomedical knowledge to allow users to search for relationships between biomedical entities. |
http://infos.korea.ac.kr/berex/
|
[71] |
DAPPLE |
Uses a list of genes to determine physical connectivity among proteins according to protein-protein interactions. |
http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1001273
|
[72] |
SNPsea |
Uses linkage disequilibrium to determine pathways and cell types that are likely to be affected based on SNP data. |
http://www.broadinstitute.org/mpg/snpsea/
|
[73] |
|
Tools and resources for linking variants to therapeutics
|
cBioPortal |
Database that allows the download, analysis, and visualization of cancer sequencing studies, including providing patient and clinical data for samples. |
http://www.cbioportal.org/
|
[78] |
My Cancer Genome |
Database for cancer research that provides linkage of mutational status to therapies and available clinical trials. |
https://www.mycancergenome.org/
|
http://www.mycancergenome.org/
|
ClinVar |
Database of relationships between human variations and phenotypes, showing how variations relate to health status and their known implications. |
https://www.ncbi.nlm.nih.gov/clinvar/
|
[74] |
DSigDB |
Database of drug signatures that includes 19,531 genes and 17,389 compounds that can in part help identify compounds for drug repurposing studies in translational research. |
http://tanlab.ucdenver.edu/DSigDB
|
[77] |
PharmGKB |
Knowledge base allowing visualization of a variety of drug-gene knowledge. |
https://www.pharmgkb.org/
|
[75] |
DrugBank |
Contains detailed drug information with comprehensive drug target information for 8,206 drugs. |
http://www.drugbank.ca/
|
[76] |
|
WES data analysis pipelines
|
fastq2vcf |
Whole exome sequencing pipeline that starts with raw sequencing (fastq) files and ends with a VCF file, and is suitable for both novice and expert users. |
http://fastq2vcf.sourceforge.net/
|
[80] |
SeqMule |
WES or WGS pipeline that combines the information from over ten alignment and analysis tools to arrive at a VCF file that can be used in both Mendelian and cancer studies. |
http://seqmule.openbioinformatics.org/en/latest/
|
[79] |
IMPACT |
WES data analysis pipeline that starts with raw sequencing reads and analyzes SNVs and CNAs and links this data to a list of prioritized drugs from clinical trials and DSigDB. |
http://tanlab.ucdenver.edu/IMPACT/
|
[81] |
Genomes on the Cloud (GotCloud) |
Automated sequencing pipeline that performs in part alignment, variant calling, and quality control that can be run on Amazon Web Services EC2 as well as local machines and clusters. |
http://genome.sph.umich.edu/wiki/GotCloud
|
|