Author manuscript; available in PMC: 2020 Sep 1.

Published in final edited form as: Am J Reprod Immunol. 2019 Jun 26;82(3):e13157. doi: 10.1111/aji.13157

Table 1.

Programs used to complete the steps in the RNA-seq data analysis outlined in Figure 3.

Step	Tool	Description	Ease of Use	Ref.
1	FastQC	Provides a report of raw read quality. Implemented in Java and accepts BAM, SAM, and FastQ file formats. Available at (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).	GUI*	152
	FastX	Part of the FASTX-Toolkit. Implemented either through Galaxy or through the command line. Pre-compiled binaries are available for Linux and Mac OS X platforms.	Command line	153
	PRINSeq	Used to check quality of RNA-seq data. Can also filter, reformat, and trim reads. Provides summary statistics of the reads in both graphical and tabular format.	GUI	154
2	Trimmomatic	A flexible trimmer that can handle paired-end data. It is implemented in Java and is available at www.usadellab.org/cms/index.php?page=trimmomatic. Only works with Illumina-generated data. Does not automatically detect the PHRED score.	Command line	155
	AdapterRemoval	Trimming tool that can remove adaptor sequences. Implemented in C++. Useful in processing large data sets, with longer reads, on a desktop machine.	Command line	156
	TagCleaner	Automatically detects an adaptor sequence. It is available at (http://edwards.sdsu.edu/tagcleaner) and is implemented using Perl 5.8, through a user web-interface.	GUI	157
3	BWA	The Burrows-Wheeler Aligner (BWA) is able to align both short and long reads. It allows for mismatches and gaps. Performance is faster compared to other aligners such as MAQ. Available at http://bio-bwa.sourceforge.net	Command line	158,159
	Bowtie	Aligns short reads and requires less memory, allowing implementation on a desktop computer. It is faster than comparable programs. It is available at http://bowtie.cbcb.umd.edu	Command line	160
	STAR	Aligns non-contiguous sequences directly to the reference genome. Is able to detect splice junctions, multiple mismatches, and indels. Benefits include its ability to accurately align long reads, having the lowest false-positive rate while maintaining high sensitivity, and being fast. Implemented in C++.	Command line	161
4	RNA-SeQC	Provides important measures of alignment quality including: yield, alignment and duplicate rates, GC bias, rRNA content, regions of alignment, continuity of coverage, 3’/5’ bias, and count of detectable transcripts. Implemented through Java or through the GenePattern web interface (www.GenePattern.org).	GUI	162
	RSeQC	Can evaluate sequence quality, GC bias, PCR bias, nucleotide composition bias, sequencing depth, strand specificity, coverage uniformity, and read distribution over the genome structure. It is the most comprehensive and efficient program.	Command line	163
	Qualimap 2	Can compare multiple sequencing data sets and includes a novel mode that aids in the discovery of biases and problems specific to RNA-seq technology. It is available in a user-friendly interface at http://qualimap.bioinfo.cipf.es	GUI	164
5	Flux Capacitor	Quantifies the abundance of annotated alternatively spliced transcripts by distributing the reads mapping to a given splice junction among the transcripts including the exon. Written in Java; requires a Java Virtual Machine; platform independent.	Command line	165
	Cufflinks	Allows for the probabilistic deconvolution of RNA-seq fragment densities and accounts for cases in which genome alignments of fragments do not uniquely correspond to source transcripts. It is an open-source C++ program and can be run on Linux and Mac OS X.	Command line	166
	HTSeq	Using the htseq-count function, it counts the overlap between reads and genes, and counts only reads that map unambiguously to a single gene. Implemented in Python.	Command line	167
6	EBSeq	Uses an empirical Bayes hierarchical model approach to identify differentially expressed isoforms. It can compare two or more biological conditions. It is a robust method for identifying differentially expressed genes. Implemented in R and can be run through a user-friendly interface available at https://www.biostat.wisc.edu/~ningleng/EBSeq_Package/EBSeq_Interface/	GUI	168
	DESeq2	Uses shrinkage estimators for dispersion and fold change, which improves its stability and reproducibility. Ideal for analysis of small studies with few replicates. Allows for a more quantitative analysis focused on the strength rather than the mere presence of differential expression. Implemented in R.	Command line	169
	Limma+Voom	Transforms the normalized counts to the log2 scale and adds a precision weight for each observation. Can model the data with a normal (Gaussian) distribution, thus allowing the data to be tested statistically. It is computationally fast and can be used with small sample sizes, with a minimum of two replicates per group. Implemented in R.	Command line	170

*GUI = Graphical user interface.
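
To make the counting rule described for HTSeq (step 5) and the log2 transformation described for Limma+Voom (step 6) more concrete, the following is a minimal Python sketch and not the code of either tool. The gene coordinates, the reads, and the function names are hypothetical and chosen only for illustration. The counting function keeps a read only when it overlaps exactly one gene, which loosely mirrors the "unambiguous" rule in the table. The transformation uses the log2 counts-per-million form with an offset of 0.5 that is commonly given for voom; the precision weights that voom also estimates are not computed here.

```python
import math
from collections import Counter

# Hypothetical toy annotation: gene -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
    "geneC": ("chr2", 50, 150),
}

# Hypothetical aligned reads as (chromosome, start, end).
READS = [
    ("chr1", 110, 150),  # overlaps geneA only
    ("chr1", 185, 195),  # overlaps geneA and geneB -> ambiguous
    ("chr1", 250, 290),  # overlaps geneB only
    ("chr2", 60, 100),   # overlaps geneC only
    ("chr3", 1, 50),     # overlaps no annotated gene
]


def overlapping_genes(chrom, start, end, genes):
    """Return the set of genes whose interval overlaps the read interval."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_end and end >= g_start
    }


def count_unambiguous(reads, genes):
    """Count a read only if it overlaps exactly one gene."""
    counts = Counter({name: 0 for name in genes})
    skipped = Counter()
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            skipped["no_feature"] += 1
        else:
            skipped["ambiguous"] += 1
    return dict(counts), dict(skipped)


def log2_cpm(counts, offset=0.5):
    """Log2 counts per million: log2((count + offset) / (library size + 1) * 1e6)."""
    lib_size = sum(counts.values())
    return {
        gene: math.log2((c + offset) / (lib_size + 1.0) * 1e6)
        for gene, c in counts.items()
    }


counts, skipped = count_unambiguous(READS, GENES)
logged = log2_cpm(counts)
print(counts)
print(skipped)
print(logged)
```

In a real analysis the counts would come from htseq-count run on an alignment file and the transformation, weighting, and testing would be done in R with limma; the sketch only illustrates why ambiguous reads are dropped before counting and why a small offset is added before taking logarithms (it avoids log2 of zero).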