Author manuscript; available in PMC: 2020 Sep 1.
Published in final edited form as: Am J Reprod Immunol. 2019 Jun 26;82(3):e13157. doi: 10.1111/aji.13157

Table 1.

Programs used to complete the steps of the RNA-seq data analysis outlined in Figure 3.

| Step | Tool | Description | Ease of Use | Ref. |
|---|---|---|---|---|
| 1 | FastQC | Provides a report of raw read quality. Implemented in Java and accepts BAM, SAM, and FastQ file formats. Available at https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. | GUI* | 152 |
| | FastX | Part of the FASTX-Toolkit. Implemented either through Galaxy or through the command line. Pre-compiled binaries are available for Linux and Mac OS X platforms. | Command line | 153 |
| | PRINSeq | Used to check the quality of RNA-seq data. Can also filter, reformat, and trim reads. Provides summary statistics of the reads in both graphical and tabular format. | GUI | 154 |
| 2 | Trimmomatic | A flexible trimmer that can handle paired-end data. Implemented in Java and available at www.usadellab.org/cms/index.php?page=trimmomatic. Only works with Illumina-generated data. Does not automatically detect the PHRED score. | Command line | 155 |
| | AdapterRemoval | Trimming tool that can remove adaptor sequences. Implemented in C++. Useful for processing large data sets, with longer reads, on a desktop machine. | Command line | 156 |
| | TagCleaner | Automatically detects an adaptor sequence. Available at http://edwards.sdsu.edu/tagcleaner and implemented using Perl 5.8, through a user web interface. | GUI | 157 |
| 3 | BWA | The Burrows-Wheeler Aligner (BWA) is able to align both short and long reads. It allows for mismatches and gaps. Performance is faster compared to other aligners such as MAQ. Available at http://bio-bwa.sourceforge.net | Command line | 158,159 |
| | Bowtie | Aligns short reads and requires less memory, allowing implementation on a desktop computer. It is faster than comparable programs. Available at http://bowtie.cbcb.umd.edu | Command line | 160 |
| | STAR | Aligns non-contiguous sequences directly to the reference genome. Is able to detect splice junctions, multiple mismatches, and indels. Benefits include its ability to accurately align long reads, having the lowest false-positive rate while maintaining high sensitivity, and being fast. Implemented in C++. | Command line | 161 |
| 4 | RNA-SeQC | Provides important measures of alignment quality, including yield, alignment and duplicate rates, GC bias, rRNA content, regions of alignment, continuity of coverage, 3’/5’ bias, and count of detectable transcripts. Implemented through Java or through the GenePattern web interface (www.GenePattern.org). | GUI | 162 |
| | RSeQC | Can evaluate sequence quality, GC bias, PCR bias, nucleotide composition bias, sequencing depth, strand specificity, coverage uniformity, and read distribution over the genome structure. It is the most comprehensive and efficient program. | Command line | 163 |
| | Qualimap 2 | Can compare multiple sequencing data sets and includes a novel mode that aids in the discovery of biases and problems specific to RNA-seq technology. Available with a user-friendly interface at http://qualimap.bioinfo.cipf.es | GUI | 164 |
| 5 | Flux Capacitor | Quantifies the abundance of annotated alternatively spliced transcripts by distributing the reads mapping to a given splice junction among the transcripts including the exon. Written in Java; requires a Java Virtual Machine; platform independent. | Command line | 165 |
| | Cufflinks | Allows for the probabilistic deconvolution of RNA-seq fragment densities and accounts for cases in which genome alignments of fragments do not uniquely correspond to source transcripts. It is an open-source C++ program and can be run on Linux and Mac OS X. | Command line | 166 |
| | HTSeq | Using the htseq-count function, it counts the overlap between reads and genes, and counts only reads that map unambiguously to a single gene. Implemented in Python. | Command line | 167 |
| 6 | EBSeq | Uses an empirical Bayes hierarchical model approach to identify differentially expressed isoforms. It can compare two or more biological conditions. It is a robust method for identifying differentially expressed genes. Implemented in R and can be run through a user-friendly interface available at https://www.biostat.wisc.edu/~ningleng/EBSeq_Package/EBSeq_Interface/ | GUI | 168 |
| | DESeq2 | Uses shrinkage estimators for dispersion and fold change, which improves its stability and reproducibility. Ideal for analysis of small studies with few replicates. Allows for a more quantitative analysis focused on the strength rather than the mere presence of differential expression. Implemented in R. | Command line | 169 |
| | Limma+Voom | Transforms the normalized counts to a log2 scale and adds a precision weight for each observation. Can model the data with a normal (Gaussian) distribution, thus allowing the data to be tested statistically. It is computationally fast and can be used with small sample sizes, with a minimum of two replicates per group. Implemented in R. | Command line | 170 |
* GUI = Graphical user interface.
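
The counting rule described for HTSeq in step 5 (count a read only when it maps unambiguously to a single gene) can be illustrated with a minimal Python sketch. This is a toy illustration under stated assumptions, not HTSeq's implementation or API: the gene coordinates, the read intervals, the function name `count_unambiguous`, and the labels `ambiguous` and `no_feature` are all hypothetical, and reads are reduced to simple half-open intervals on a single chromosome.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (start, end), half-open coordinates.
GENES = {
    "geneA": (100, 200),
    "geneB": (180, 300),  # overlaps geneA between positions 180 and 200
    "geneC": (500, 600),
}


def count_unambiguous(reads, genes):
    """Count reads overlapping exactly one gene; tally all other reads separately."""
    counts = Counter({name: 0 for name in genes})
    skipped = Counter()
    for start, end in reads:
        # A read overlaps a gene when the two half-open intervals intersect.
        hits = [
            name
            for name, (g_start, g_end) in genes.items()
            if start < g_end and g_start < end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1          # unambiguous: assign to the single gene
        elif not hits:
            skipped["no_feature"] += 1    # read overlaps no annotated gene
        else:
            skipped["ambiguous"] += 1     # read overlaps more than one gene
    return counts, skipped


# Hypothetical aligned reads as (start, end) intervals.
reads = [(110, 160), (185, 195), (250, 290), (520, 570), (700, 750)]
counts, skipped = count_unambiguous(reads, GENES)
print(dict(counts))
print(dict(skipped))
```

In this sketch, the read spanning positions 185 to 195 falls in the region shared by geneA and geneB and is therefore left uncounted, which is the behaviour the table attributes to HTSeq. Real data would also require handling of strand, multiple chromosomes, and spliced alignments, none of which is modelled here.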