. 2016 Dec 4;10:267–289. doi: 10.4137/BBI.S38427

Table 2.

Software and tools for epigenomic data analysis.

SOFTWARE/TOOL DESCRIPTION URL REFS
1. DNA methylation
1.1. Mapping BS-seq reads
1.1.1. General aligners with a BS-Seq module
GSNAP A wild-card bisulfite aligner included in a general-purpose alignment tool (Genomic Short-read Nucleotide Alignment Program) http://share.gene.com/gmap 323
LAST A wild-card bisulfite aligner included in a general-purpose alignment tool http://last.cbrc.jp 161
RMAP A wild-card bisulfite aligner included in a general-purpose alignment tool http://rulai.cshl.edu/rmap/ 6
segemehl A wild-card bisulfite aligner included in a general-purpose alignment tool http://www.bioinf.uni-leipzig.de/Software/segemehl 304
1.1.2 Specific BS-Seq aligners that use a three-letter approach
Bismark A widely used three-letter bisulfite aligner based on Bowtie/Bowtie2 http://www.bioinformatics.babraham.ac.uk/projects/bismark 165
BRAT A bisulfite-treated reads tool using three-letter alignment http://compbio.cs.ucr.edu/brat 166
BS-Seeker A three-letter bisulfite aligner based on Bowtie https://github.com/BSSeeker/Bsseeker2 324
MethylCoder A three-letter bisulfite aligner based on Bowtie/GSNAP https://github.com/brentp/methylcode 168
1.1.3 Specific BS-Seq aligners that use a wild-card approach
BSMAP A widely used wild-card aligner for bisulfite sequencing reads http://code.google.com/p/bsmap 325
Pash A wild-card bisulfite aligner using gapped k-mers and a multi-positional hash table http://brl.bcm.tmc.edu/pash 170–172
1.1.4 Other BS-seq aligners
BISMA Mapping and clustering of bisulfite sequencing data for individual clones from unique and repetitive sequences http://biochem.jacobs-university.de/BDPC/BISMA/ 326
BRAT-BW A fast, accurate and memory-efficient BS aligner using the FM-index (Burrows-Wheeler transform) http://compbio.cs.ucr.edu/brat/ 304
B-SOLANA An aligner for bisulfite-sequencing data from ABI SOLiD sequencers http://code.google.com/p/bsolana 327
RRBSMAP A wild-card aligner for RRBS reads http://rrbsmap.computational-epigenetics.org 328
1.2. Detecting differentially methylated regions (DMRs)
1.2.1 Software for DMR calling only
BiSeq An R package for detecting differentially methylated regions (DMRs) in BS data https://www.bioconductor.org/packages/release/bioc/html/BiSeq.html 175
bumphunter Bump hunting to identify differentially methylated regions http://bioconductor.org/packages/release/bioc/html/bumphunter.html 177
DMRcate An R package for detecting differentially methylated regions (DMRs) based on tunable kernel smoothing www.bioconductor.org/packages/release/bioc/html/DMRcate.html 178
IMA An R package for high-throughput analysis of Illumina’s 450K Infinium methylation data http://www.rforge.net/IMA 329
M3D An R package for detecting differentially methylated regions (DMRs) using a non-parametric, kernel-based method https://www.bioconductor.org/packages/release/bioc/html/M3D.html 330
methylSig An R package for detecting differentially methylated sites (DMCs) or regions (DMRs) using a beta-binomial model https://github.com/sartorlab/methylSig 331
metilene A fast and sensitive tool for detecting DMRs by a binary segmentation algorithm combined with a two-dimensional statistical test http://www.bioinf.uni-leipzig.de/Software/metilene/ 185
MOABS A tool for detecting differentially methylated sites (DMCs) or regions (DMRs) based on a beta-binomial hierarchical model, suited to relatively low CpG coverage (~10X) https://code.google.com/archive/p/moabs/ 332
NHMMfdr An R package for detecting differential DNA methylation based on a non-homogeneous hidden Markov model (NHMM) by estimating false discovery rates (FDRs) http://www.ams.sunysb.edu/~pfkuan/NHMMfdr/ 182
QDMR A tool for detecting DMRs based on Shannon entropy http://bioinfo.hrbmu.edu.cn/qdmr 333
1.2.2 Pipeline for both BS-seq mapping and DMR calling
Bsmooth Bsmooth is a pipeline for analyzing whole genome bisulfite sequencing (WGBS) data. It includes tools for aligning the data, quality control, and identifying differentially methylated regions (DMRs). http://rafalab.jhsph.edu/bsmooth/ 304
MethPipe A computational pipeline for analyzing bisulfite sequencing data (WGBS and RRBS), including BS mapping (Wild-Card aligner) and DMR calling http://smithlabresearch.org/software/methpipe/ 334
RefFreeDMA Mapping for RRBS reads and DMR calling without a reference genome https://github.com/jklughammer/RefFreeDMA 335
2. Histone Modifications and DNA-binding Proteins
2.1 Short-read Alignment
BWA A fast, efficient and lightweight tool that aligns short sequences to a sequence database; based on the Burrows–Wheeler transform http://bio-bwa.sourceforge.net 233
Bowtie Ultrafast, memory-efficient short read aligner. Uses a Burrows-Wheeler-Transformed (BWT) index http://bowtie-bio.sourceforge.net 232
ELAND Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_ReferenceFiles.htm Illumina
GenomeMapper GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads with either ungapped or gapped alignments http://1001genomes.org/software/genomemapper.html 336
GNUMAP Genomic Next-generation Universal MAPper is a program designed to accurately map sequence data obtained from next-generation sequencing machines back to a genome of any size. It seeks to align reads from nonunique repeats using statistics http://dna.cs.byu.edu/gnumap/ 323
HiCUP A tool for mapping and performing quality control on Hi-C data http://www.bioinformatics.babraham.ac.uk/projects/hicup/ 337
GSNAP Considers a set of variant allele inputs to better align to heterozygous sites http://research-pub.gene.com/gmap 160
MAQ Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data http://maq.sourceforge.net/ 230
SOAP SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences http://soap.genomics.org.cn/ 229
SOAP2 SOAP2 replaces the seed strategy with a Burrows–Wheeler transform (BWT) compressed index of the reference sequence held in main memory http://soap.genomics.org.cn/soapaligner.html 234
ZOOM ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads generated by next-generation sequencing technology back to reference genomes and to carry out post-analysis http://omictools.com/zoom-tool 231
2.2 Peak Detection
2.2.1 Peak Caller
BroadPeak A novel algorithm for identifying broad peaks in diffuse ChIP-seq datasets http://jordan.biology.gatech.edu/page/software/broadpeak/ 237
MACS MACS fits data to a dynamic Poisson distribution; works with and without control data http://liulab.dfci.harvard.edu/MACS 238
PeakSeq PeakSeq takes into account differences in mappability of genomic regions; enrichment based on FDR calculation http://info.gersteinlab.org/PeakSeq 338
SICER A clustering approach for identification of enriched domains from histone modification ChIP-Seq data http://home.gwu.edu/~wpeng/Software.htm 236
SISSRS A novel algorithm for precise identification of binding sites from short reads generated from ChIP-Seq experiments http://sissrs.rajajothi.com/ 239
ZINBA ZINBA can incorporate multiple genomic factors, such as mappability and GC content; can work with point-source and broad-source peak data http://code.google.com/p/zinba 339
2.2.2 Differential Peak Caller
baySeq An R package that uses an empirical Bayes approach to identify significant differences; assumes a negative binomial distribution of the data http://www.bioconductor.org/packages/release/bioc/html/baySeq.html 340
ChIPDiff A toolkit for genome-wide comparison of histone modification sites identified by ChIP-seq and identification of differential histone modification sites (DHMSs); uses a binomial distribution, the Baum-Welch expectation-maximization (EM) algorithm and the forward-backward algorithm http://cmb.gis.a-star.edu.sg/ChIPSeq/paperChIP-Diff.htm 341
edgeR An R package that uses a negative binomial distribution to model differences in tag counts; uses replicates to better estimate significant differences http://www.bioconductor.org/packages/2.9/bioc/html/edgeR.html 257
DESeq DESeq uses a negative binomial distribution but differs from edgeR in how the mean and variance of the distribution are calculated http://www-huber.embl.de/users/anders/DESeq 253
SAMSeq SAMSeq is based on the popular SAM software; a non-parametric method that uses resampling to normalize for differences in sequencing depth http://www.stanford.edu/~junli07/research.html#SAM 342
3. ncRNAs
3.1 ncRNAs detection and quantification
miRDeep miRDeep was developed to discover active known or novel miRNAs from deep sequencing data after adapter removal; it includes a number of scripts to preprocess and score the mapped data https://www.mdc-berlin.de/8551903/en/ 248
miRDeep2 miRDeep2 identifies known and novel miRNAs more sensitively and robustly by evaluating the structure and signature of each precursor, quantifies known miRNAs based on the annotation in miRBase, and predicts secondary structure with the RNAfold tool https://www.mdc-berlin.de/8551903/en/ 252
miRDeep* miRDeep* is an integrated standalone miRNA identification application with a user-friendly graphical interface for sequence alignment, pre-miRNA secondary structure calculation and graphical display; it has low memory requirements http://www.australianprostatecentre.org/research/software/mirdeep-star 249
DARIO DARIO is a web service for studying short read data from small RNA-seq experiments. It provides a wide range of analysis features, including quality control, read normalization, ncRNA quantification and prediction of putative ncRNA candidates http://dario.bioinf.uni-leipzig.de/index.py 343
ncPRO-seq ncPRO-seq is a tool for annotation and profiling of ncRNAs from small-RNA sequencing data. It aims to interrogate and perform detailed analysis on small RNAs derived from annotated non-coding regions in miRBase, piRBase, Rfam and repeatMasker, and regions defined by users. The ncPRO pipeline also has a module to identify regions significantly enriched with short reads that cannot be classified as known ncRNA families https://sourceforge.net/projects/ncproseq/ 344
CoRAL CoRAL is a machine-learning package that can predict the precursor class of small RNAs present in a high-throughput RNA-sequencing dataset and produces information about the features that are most important for discriminating different populations of small non-coding RNAs http://wanglab.pcbi.upenn.edu/coral/ 345
RNA-CODE RNA-CODE is designed for ncRNA identification in NGS data that lack high-quality reference genomes. Given a set of short reads, it classifies the reads into different ncRNA families. The classification results can be used to quantify the expression levels of different types of ncRNAs in RNA-seq data and to build ncRNA composition profiles in metagenomic data http://www.cse.msu.edu/~chengy/RNA_CODE/ 346
CAP-miRSeq A comprehensive analysis pipeline for deep microRNA sequencing that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding regions, and flexible differential expression analysis between experimental conditions http://bioinformaticstools.mayo.edu/research/capmirseq/ 256
iMir A modular pipeline for comprehensive analysis of small RNA-seq data, comprising specific tools for adapter trimming, quality filtering, differential expression analysis, biological target prediction and other useful options, integrating multiple open-source modules and resources in an automated workflow http://www.labmedmolge.unisa.it/inglese/research/imir 250
UEA sRNA workbench UEA sRNA workbench performs complete analysis of single or multiple-sample small RNA datasets to identify novel microRNA sequences and to profile small RNA expression patterns in genetic data http://srna-workbench.cmp.uea.ac.uk/ 260
omiRas omiRas is a web server for annotation, comparison and visualization of interaction networks of non-coding RNAs derived from small RNA-Sequencing http://tools.genxpro.net/omiras/ 259
sRNAtoolbox sRNAtoolbox provides several tools, including sRNAbench for sRNA expression profiling and prediction of novel microRNAs, sRNAde for differential expression analysis, miRNA-consTarget for prediction of miRNA targets, sRNAjBrowserDE for visualizing differential expression as a function of read length, and sRNAfuncTerms for determining over-represented functional annotations in target gene sets http://bioinfo5.ugr.es/srnatoolbox 347
iSeeRNA iSeeRNA is a support vector machine (SVM)-based classifier for the identification of lincRNAs http://137.189.133.71/software.html 261
Sebnif Sebnif is an integrated bioinformatics pipeline for the identification of novel large intergenic noncoding RNAs (lincRNAs) based on iSeeRNA http://137.189.133.71/sebnif/ 262
LncRNA2Function LncRNA2Function – a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data http://mlg.hit.edu.cn/lncrna2function/ 264
3.2 RIP-seq and CLIP-seq
3.2.1 Differential Peak Caller and Binding Site Detector for CLIP-seq
Novoalign An accurate NGS short reads aligner for aligning to reference genome http://www.novocraft.com/products/novoalign/ 267
PIPE-CLIP A Galaxy framework-based comprehensive online pipeline for reliable analysis of data generated by three types of CLIP-seq protocol http://pipeclip.qbrc.org/ 270
PARalyzer It uses the characteristic T-to-C nucleotide substitutions in PAR-CLIP reads in a kernel density estimate classifier to generate a high-resolution set of protein–RNA interaction sites https://ohlerlab.mdc-berlin.de/software/PARalyzer_85/ 271
Piranha Piranha is a peak finding and differential binding detection algorithm http://smithlabresearch.org/software/piranha/ 266
wavClusteR An integrated pipeline for the analysis of PAR-CLIP data https://bioconductor.org/packages/release/bioc/html/wavClusteR.html 272
dCLIP dCLIP is designed for quantitative comparative analysis of CLIP-seq data and effectively identified differential binding regions of RBPs in four CLIP-seq datasets http://qbrc.swmed.edu/software/ 273
3.2.2 Motif Discovery
GraphProt GraphProt is a machine learning framework for learning the sequence and structure binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data http://www.bioinf.uni-freiburg.de/Software/GraphProt/ 280
MEME Performs motif discovery on DNA, RNA or protein datasets http://meme-suite.org/ 348
cERMIT cERMIT is a computationally efficient motif discovery tool based on analyzing genome-wide quantitative regulatory evidence https://ohlerlab.mdc-berlin.de/software/cERMIT_82/ 276
GLAM2 (Gapped Local Alignment of Motifs) GLAM2 is a motif detection tool for discovering motifs allowing indels in a fully general manner from DNA, RNA and protein datasets http://bioinformatics.org.au/glam2 277
MatrixREDUCE A motif discovery tool for genome-wide ChIP-seq and CLIP-seq data analysis http://www.bussemakerlab.org/ 278
RNA Bind-n-Seq A quantitative assessment of the sequence and structural binding specificity of RNA-binding proteins 349
CapR An efficient algorithm that calculates the probability that each RNA base position is located within each secondary structural context https://sites.google.com/site/fukunagatsu/software/capr 281
RNAcontext An efficient motif finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures http://www.cs.toronto.edu/~hilal/rnacontext/ 279
ViennaRNA Package 2.0 A widely used collection of programs for RNA secondary structure prediction and analysis http://www.tbi.univie.ac.at/RNA/ 279
4. Storing, retrieving and visualizing epigenomics data
4.1 Genome browser for visualizing DNA methylation
Ensembl A widely used Web-based genome browser with various epigenome data sets http://www.ensembl.org 283
IGV A widely used graphical genome browser that is run locally on the user’s computer http://www.broadinstitute.org/igv 286
UCSC Genome Browser Widely used Web-based genome browser hosting all ENCODE data http://genome.ucsc.edu 282
BDPC Web-based tool for bisulfite sequencing data presentation and compilation http://biochem.jacobs-university.de/BDPC 350
DaVIE A database with an intuitive user interface for performing visual comparisons across large DNA methylation data sets https://github.com/apfejes/epigenetics-software 285
EpiExplorer A web server that provides an interactive gateway for exploring large-scale epigenetic datasets of the human and mouse genomes http://epiexplorer.mpi-inf.mpg.de 351
EpiGRAPH User-friendly software for advanced (epi)genome analysis and prediction using powerful machine learning algorithms http://epigraph.mpi-inf.mpg.de 352
WashU Epigenome Browser Web-based genome browser focusing on the human epigenome http://epigenomegateway.wustl.edu 353
4.2 Specialized DNA methylation databases
MethBase A central reference methylome database created from public BS-seq datasets http://smithlabresearch.org/software/methbase/ 334
MethDB A database for DNA methylation and environmental epigenetic effects http://www.methdb.de 288
MethyCancer Database of cancer DNA methylation data http://methycancer.psych.ac.cn 354
PubMeth Database of DNA methylation literature http://www.pubmeth.org 290
4.3 Specialized histone modification databases
ChromatinDB A database of genome-wide histone modification patterns for Saccharomyces cerevisiae http://integbio.jp/dbcatalog/en/record/nbdc00939?jtpl=56 294
CR Cistrome A ChIP-Seq database for chromatin regulators and histone modification linkages in human and mouse http://cistrome.org/cr/ 293
Histome A relational knowledgebase of human histone proteins and histone modifying enzymes http://www.actrec.gov.in/histome/ 292
HHMD The human histone modification database http://202.97.205.78/hhmd/ 291
4.4 Specialized ncRNA and RBP interaction databases
starBase V2.0 starBase is designed for decoding ncRNA and RNA–protein interaction networks and predicting their functions, especially in cancer samples http://starbase.sysu.edu.cn/ 296,297
CLIPZ CLIPZ supports the automatic functional annotation and visualization of CLIP-seq identified binding sites http://www.clipz.unibas.ch/ 298
doRiNA A database of RNA interactions in post-transcriptional regulation http://dorina.mdc-berlin.de/ 300
CLIPdb An integrated resource for characterizing the regulatory networks between RBPs and various RNA transcript classes http://lulab.life.tsinghua.edu.cn/clipdb/ 301

Note:


The descriptions are adapted from the software/tools website descriptions.
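The three-letter strategy listed under 1.1.2 can be illustrated with a small sketch. This is a toy example under stated assumptions, not the implementation of Bismark or any other listed tool: it considers only the forward (C-to-T converted) strand, allows exact matches only, scans the reference naively instead of using a Bowtie-style index, and its function names (`c_to_t`, `three_letter_align`, `call_methylation`) are hypothetical. Both the read and the reference are converted in silico so that C and T become indistinguishable. Once the read is placed, each reference C it covers is called methylated if the original read still shows C (protected from bisulfite conversion) and unmethylated if it shows T.

```python
# Toy sketch of three-letter bisulfite alignment and methylation calling.
# Assumptions (hypothetical, for illustration only): forward strand only,
# exact matches only, naive linear scan instead of a BWT/FM-index aligner.


def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of the forward strand: C -> T."""
    return seq.upper().replace("C", "T")


def three_letter_align(read: str, reference: str) -> list[int]:
    """Return all 0-based positions where the C->T converted read matches
    the C->T converted reference exactly (three-letter alphabet A/G/T)."""
    conv_read = c_to_t(read)
    conv_ref = c_to_t(reference)
    hits = []
    for start in range(len(conv_ref) - len(conv_read) + 1):
        if conv_ref[start:start + len(conv_read)] == conv_read:
            hits.append(start)
    return hits


def call_methylation(read: str, reference: str, start: int) -> dict[int, str]:
    """For every reference C covered by the read, report 'methylated' if the
    original (unconverted) read shows C, or 'unmethylated' if it shows T."""
    calls = {}
    for offset, read_base in enumerate(read.upper()):
        ref_pos = start + offset
        if reference[ref_pos].upper() != "C":
            continue  # only cytosines in the reference are informative
        if read_base == "C":
            calls[ref_pos] = "methylated"
        elif read_base == "T":
            calls[ref_pos] = "unmethylated"
    return calls


if __name__ == "__main__":
    reference = "ATCGGACGTTACGA"
    # Simulated bisulfite read: the reference C at position 2 stayed C
    # (methylated); the reference C at position 6 was read as T (unmethylated).
    read = "TCGGATG"
    hits = three_letter_align(read, reference)
    print(hits)  # [1]
    if len(hits) == 1:  # use only uniquely placed reads for calling
        print(call_methylation(read, reference, hits[0]))
        # {2: 'methylated', 6: 'unmethylated'}
```

A wild-card aligner (1.1.3) would instead leave the reference unconverted and let a T in the read match either C or T in the reference; the per-cytosine call after alignment works the same way. Real aligners also handle the reverse strand (G-to-A conversion), mismatches and non-unique hits, all of which this sketch omits.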