Table 1.
Step | Tool | Description | Ease of Use | Ref. |
---|---|---|---|---|
1 | FastQC | Provides a quality report for raw reads. Implemented in Java and accepts BAM, SAM, and FastQ file formats. Available at https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ | GUI* | 152 |
1 | FastX | Part of the FASTX-Toolkit. Run either through Galaxy or from the command line. Pre-compiled binaries are available for Linux and Mac OS X. | Command line | 153 |
1 | PRINSeq | Used to check quality of RNA-seq data. Can also filter, reformat, and trim reads. Provides summary statistics of the reads in both graphical and tabular format. | GUI | 154 |
2 | Trimmomatic | A flexible trimmer that can handle paired-end data. Implemented in Java and available at www.usadellab.org/cms/index.php?page=trimmomatic. Works only with Illumina-generated data. Does not automatically detect the PHRED score. | Command line | 155 |
2 | AdapterRemoval | Trimming tool that can remove adaptor sequences. Implemented in C++. Useful in processing large data sets, with longer reads, on a desktop machine. | Command line | 156 |
2 | TagCleaner | Automatically detects an adaptor sequence. Implemented in Perl 5.8 with a web user interface. Available at http://edwards.sdsu.edu/tagcleaner | GUI | 157 |
3 | BWA | The Burrows-Wheeler Aligner (BWA) can align both short and long reads. It allows for mismatches and gaps. It is faster than other aligners such as MAQ. Available at http://bio-bwa.sourceforge.net | Command line | 158,159 |
3 | Bowtie | Aligns short reads and requires less memory, allowing it to run on a desktop computer. It is faster than comparable programs. Available at http://bowtie.cbcb.umd.edu | Command line | 160 |
3 | STAR | Aligns non-contiguous sequences directly to the reference genome. Is able to detect splice junctions, multiple mismatches, and indels. Benefits include its ability to accurately align long reads, having the lowest false-positive rate while maintaining high sensitivity, and being fast. Implemented in C++. | Command line | 161 |
4 | RNA-SeQC | Provides important measures of alignment quality including: yield, alignment and duplicate rates, GC bias, rRNA content, regions of alignment, continuity of coverage, 3’/5’ bias, and count of detectable transcripts. Implemented through Java or through the GenePattern web interface (www.GenePattern.org). | GUI | 162 |
4 | RSeQC | Can evaluate sequence quality, GC bias, PCR bias, nucleotide composition bias, sequencing depth, strand specificity, coverage uniformity, and read distribution over the genome structure. It is the most comprehensive and efficient program. | Command line | 163 |
4 | Qualimap 2 | Can compare multiple sequencing data sets and includes a novel mode that aids in the discovery of biases and problems specific to RNA-seq technology. It is available in a user-friendly interface at http://qualimap.bioinfo.cipf.es | GUI | 164 |
5 | Flux Capacitor | Quantifies the abundance of annotated alternatively spliced transcripts by distributing the reads mapping to a given splice junction among the transcripts including the exon. Written in Java; requires a Java Virtual Machine; platform independent. | Command line | 165 |
5 | Cufflinks | Allows for the probabilistic deconvolution of RNA-seq fragment densities and accounts for cases in which genome alignments of fragments do not uniquely correspond to source transcripts. It is an open-source C++ program that runs on Linux and Mac OS X. | Command line | 166 |
5 | HTSeq | Its htseq-count script counts the overlap between reads and genes, counting only reads that map unambiguously to a single gene. Implemented in Python. | Command line | 167 |
6 | EBSeq | Uses an empirical Bayes hierarchical model to identify differentially expressed isoforms. It can compare two or more biological conditions. It is a robust method for identifying differentially expressed genes. Implemented in R and also usable through a user-friendly interface available at https://www.biostat.wisc.edu/~ningleng/EBSeq_Package/EBSeq_Interface/ | GUI | 168 |
6 | DESeq2 | Uses shrinkage estimators for dispersion and fold change, which improves its stability and reproducibility. Ideal for analysis of small studies with few replicates. Allows for a more quantitative analysis focused on the strength rather than the mere presence of differential expression. Implemented in R. | Command line | 169 |
6 | Limma+Voom | Transforms the normalized counts to the log2 scale and adds a precision weight for each observation. This allows the data to be modeled with a normal (Gaussian) distribution and tested statistically. It is computationally fast and can be used with small sample sizes, with a minimum of two replicates per group. Implemented in R. | Command line | 170 |
*GUI = graphical user interface.
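
The counting rule described for HTSeq in step 5 (a read is counted only when it maps unambiguously to a single gene) can be illustrated with a minimal Python sketch. This is a toy model and not HTSeq's own code or API: the function names, the `no_feature` and `ambiguous` labels, the half-open coordinate convention, and the example genes and reads are all assumptions made for illustration, and strand, chromosome, and exon structure are ignored.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def count_unambiguous(reads, genes):
    """Count reads per gene, keeping only reads that overlap exactly one gene.

    reads: list of (start, end) tuples (hypothetical, single chromosome).
    genes: dict mapping gene name -> (start, end).
    Reads overlapping no gene or more than one gene are tallied separately.
    """
    counts = {name: 0 for name in genes}
    counts["no_feature"] = 0   # read overlaps no gene
    counts["ambiguous"] = 0    # read overlaps more than one gene
    for r_start, r_end in reads:
        hits = [
            name
            for name, (g_start, g_end) in genes.items()
            if overlaps(r_start, r_end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical example: two genes that overlap between positions 180 and 200.
genes = {"geneA": (100, 200), "geneB": (180, 300)}
reads = [(110, 160), (185, 195), (250, 290), (400, 450)]
result = count_unambiguous(reads, genes)
print(result)
```

In this example only the first and third reads contribute to gene counts; the read falling in the region shared by both genes is set aside as ambiguous rather than assigned to either gene.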