Abstract
Robust methods have been developed that leverage next-generation sequencing (NGS) to measure abundance of all mRNAs (RNA-seq) in samples as small as individual cells in order to study the testicular transcriptome in mammals. In this chapter, we present robust options for implementing bioinformatics workflows for the analysis of bulk RNA-seq from aggregate samples of hundreds to millions of cells and single-cell RNA-seq from individual cells. We also provide detailed protocols for using the R packages DESeq2 and Seurat, important parameters for successful implementation, and considerations for drawing conclusions from the results.
Keywords: Single-cell RNA-seq, Bulk RNA-seq, Bioinformatics, Spermatogenesis
1. Introduction
Mammalian spermatogenesis is the complex cell lineage that begins with spermatogonial stem cells (SSCs) and culminates in the production of spermatozoa. Understanding the fundamental biology underlying the myriad of cellular changes during this developmental process, such as SSC fate regulation, spermatogonial differentiation, male meiosis, and spermiogenesis, relies heavily on inclusive measurements of all expressed genes at the mRNA level. Initially, this was done with gene expression microarrays [1, 2] that are necessarily biased by the repertoire of probes on the arrays, but now, transcriptomes are typically produced with next-generation sequencing measurements of cDNA libraries [3-5]. In either case, measuring mRNA abundance patterns during spermatogenesis is performed using RNA extracted from either whole testes or enriched populations of cells [6-8]. Investigating mRNA changes temporally across the first wave of mouse spermatogenesis permits discovery of transcripts that change as new spermatogenic cell types first emerge during the initial spermatogenic wave [6-9]. Ultimately, such experiments permit identification of mRNAs that exhibit statistically significant changes in abundance (termed “differential gene expression” from here forward) between distinct enriched populations, developmental time-points, or experimental conditions [8, 10, 11]. For instance, bulk RNA-seq has been previously used to compare SSC-enriched/depleted populations isolated from developing mouse testes by fluorescence-activated cell sorting (FACS) selection for TSPAN8 cell-surface labeling or ID4-EGFP epifluorescence intensity [10-12]. Invariably, though, enrichment methods like StaPut gravity sedimentation and FACS fail to purify biologically distinct cell types (hence the term “enrichment”) and, instead, produce mixtures of cells with varying purity of the cell types of interest.
Thus, results from bulk RNA-seq investigating cell mixtures constitute ensemble averages derived from aggregates of multiple spermatogenic and/or somatic cell types, which can mask the phenotypes of less prevalent cell types such as SSCs and ignore heterogeneity among cells. To overcome these limitations, transcriptome profiling at the single-cell level is necessary. Recently, a plethora of single-cell RNA-seq studies have begun to comprehensively characterize the full extent of cellular heterogeneity in the testis, revealing rare, transitional, or novel cell types [4, 5, 9, 13, 14]. In the following sections, we will outline the typical steps taken to produce transcriptomes from bulk and single-cell samples and their subsequent use for differential gene expression analyses.
The basic workflow for the generation and analysis of both bulk and single-cell RNA-seq data consists of the following steps as shown in Fig. 1: (1) RNA from the cells of interest is reverse transcribed to complementary DNA (cDNA), which is then sequenced by high-throughput (typically short-read) platforms; (2) quality control of the raw sequencing data is performed based on metrics generated by the sequencing platform or calculated from the raw reads; (3) the raw reads are aligned to a reference genome or transcriptome; (4) expression quantification is performed to generate a matrix of gene expression values (columns = samples or cells, rows = genes); (5) gene expression values are normalized to account for technical variation/biases; and (6) differential gene expression analysis is performed between samples.
Fig. 1.
A generalized bulk and single-cell RNA sequencing workflow is shown
1.1. cDNA Library Preparation
Generation of cDNA libraries consists of RNA capture, fragmentation, reverse transcription into first strand cDNA, second strand synthesis, and cDNA amplification. Most methods for library preparation begin with capture of poly(A)-tailed RNA to enrich for protein-coding transcripts and exclude abundant RNA types such as rRNA and tRNA [15]. Poly(A)+ RNA selection using magnetic or cellulose beads coated with oligo-dT molecules is favored due to its relatively low cost and ease of use. Alternatively, oligo-dT priming-based methods combine poly(A)+ selection and reverse transcription (RT) into one step. However, this method can exhibit 3′ bias, resulting in sequencing reads enriched for the 3′ portion of the transcript. Furthermore, internal poly(A) priming, in which oligo-dT primes at internal A-rich sequences of the transcripts, can generate a high frequency of truncated cDNAs [16,17]. Following capture, RNA samples are fragmented to a specific size range before RT based on the size limitation of the sequencing platform (<600 bp for Illumina sequencers). When the goal is to sequence full-length RNA transcripts, intact (i.e., unfragmented) RNA is first reverse-transcribed into cDNA and then subjected to fragmentation. Fragmentation can be accomplished enzymatically (with tagmentase or DNase), chemically (with zinc, KOAc, or MgOAc), or using mechanical forces (e.g., ultrasound sonication).
Following poly(A)+ selection, RNA is reverse transcribed into stable cDNA. During this step, most scRNA-seq methods add single-cell specific barcodes within the oligo-dT molecules to allow multiplexed processing of pooled samples. In methods that utilize 3′ end-counting chemistry, random nucleotide sequences in oligo-dT molecules serve as unique molecular identifiers (UMIs), or molecular tags used to detect and quantify unique RNA molecules. Incorporating UMIs can help correct for amplification bias and reduce technical noise [18]. An engineered version of the Moloney murine leukemia virus reverse transcriptase with increased thermostability and low RNase H activity is commonly used for first strand synthesis [19]. Second strand cDNA synthesis can be achieved by poly(A) tailing or by a template-switching mechanism (SMART-Seq) [20-23]. The latter approach has the advantage of uniform coverage without loss of strand specificity. The synthesized cDNA is then further amplified using conventional PCR or in vitro transcription (IVT). For this purpose, adaptor sequences or T7 polymerase promoter sequences are included in the oligonucleotides for PCR and IVT, respectively [24]. Full-length or 3′ transcript end sequencing can be performed for transcriptome profiling based on the objectives of the experiment. Full-length transcript analysis facilitates discovery of alternative-splicing events. In comparison, protocols for digital counting of 3′ transcript ends can be combined with UMIs.
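The way UMIs correct for amplification bias can be illustrated with a minimal Python sketch. It assumes reads have already been parsed into hypothetical (cell barcode, UMI, gene) tuples, and it ignores the sequencing-error correction of UMIs that production pipelines typically perform; the barcodes and gene assignments are toy values.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads into molecule counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, a
    hypothetical pre-parsed form of aligned, barcode-tagged reads.
    """
    molecules = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        # Reads sharing a barcode, gene, and UMI are treated as PCR copies
        # of one original molecule, so the set keeps each UMI only once.
        molecules[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}

# Toy input: the first two reads are PCR duplicates of the same molecule.
reads = [
    ("AAAC", "GGTA", "Id4"),
    ("AAAC", "GGTA", "Id4"),
    ("AAAC", "CCTG", "Id4"),
    ("TTTG", "GGTA", "Tspan8"),
]
counts = count_umis(reads)
print(counts)
```

Three reads for Id4 in the first cell collapse to two molecules, which is the sense in which UMI counting is less sensitive to uneven amplification than raw read counting.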
2. Bioinformatic Analysis of Bulk RNA-Seq Data
2.1. Quality Control
Quality control is based on quality metrics generated by the sequencing platform or calculated from raw sequencing reads. FastQC is a widely used tool to assess the sequencing quality of the raw reads (pre-adapter removal) and trimmed reads (post-adapter removal) [25]. FastQC is run independently for every FASTQ file and provides a summary of the per base and per sequence quality scores, per sequence GC content, adapter content, sequence length distribution, and overrepresented sequences. After running FastQC, quality trimming and adapter sequence removal is an optional step that can be omitted if the preliminary QC does not reveal any quality bias or adapter sequences. Trimmomatic is a popular tool that performs both quality trimming and adapter sequence removal, but other tools like CutAdapt and NGS QC Toolkit are also available [26-28]. In addition, RSeQC can be used to assess post-alignment quality [29]. RSeQC calculates the distribution of insert size and enables junction annotation. Table 1 provides a list of the commonly used quality control software.
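To make the notion of per-base quality concrete, the following Python sketch decodes a FASTQ quality string and trims low-quality bases from the 3′ end. It assumes the common Phred+33 encoding and uses a deliberately simplified trailing trim with an illustrative threshold of 20; it is not the algorithm implemented by Trimmomatic or CutAdapt.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]

def mean_quality(quality_string):
    """Average Phred score of a read; 0.0 for an empty read."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) if scores else 0.0

def trim_trailing(sequence, quality_string, min_quality=20):
    """Remove bases from the 3' end until one meets min_quality."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]

# Toy read: six high-quality bases followed by two low-quality bases.
seq, qual = trim_trailing("ACGTACGT", "IIIIII#!")
print(seq, qual, mean_quality(qual))
```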
Table 1.
Analysis steps and relevant software tools for generating bulk RNA-seq transcriptomes
| Analysis step | Software tool | Reference |
|---|---|---|
| Quality control/trimming | FastQC | [25] |
| | RSeQC | [29] |
| | Trimmomatic | [26] |
| | CutAdapt | [27] |
| | NGS QC Toolkit | [28] |
| Read alignment | TopHat2 | [32] |
| | STAR | [33] |
| | HISAT2 | [34] |
| | Bowtie2 | [30] |
| | BWA | [31] |
| | Salmon | [36] |
| | Kallisto | [37] |
| Expression quantification | Cufflinks | [41] |
| | RSEM | [42] |
| | StringTie | [43] |
| | featureCounts | [44] |
| | HTSeq | [82] |
2.2. Read Alignment
Following quality control, reads are assigned to transcripts by mapping to either a reference genome or a reference transcriptome. Reference-based alignment tools are broadly categorized as “splice-aware” and “splice-unaware” aligners. When a transcriptome is used as the reference, splice-unaware tools including Bowtie2 and the Burrows–Wheeler Aligner (BWA) can be used [30, 31]. When a genome is used as the reference, splice-aware tools such as TopHat2, STAR, and HISAT2 are advantageous because they can handle gapped (or spliced) alignments [32-34]. While all three tools are generally considered fast, they differ in computational speed and memory usage. Among these tools, TopHat2 has been reported to achieve higher alignment rates than STAR when a genome annotation file is supplied [35]. Alignment-free mappers such as Salmon and Kallisto, which assign reads directly to a set of transcript sequences (including transcripts assembled de novo), are also useful [36, 37]. Although they have been shown to perform as well as the reference-based tools on simulated data, Salmon and Kallisto are still under development and are prone to spurious alignments, especially for lowly expressed genes [38]. Most of the tools discussed above are available on the open source, web-based Galaxy platform [39].
2.3. Transcriptome Reconstruction and Expression Quantification
Transcriptome reconstruction, or the identification of all transcripts in a sample, can be performed using two approaches: reference-based assembly or de novo assembly. Reference-based approaches rely on the availability and accuracy of a reference but are less computationally intensive. In comparison, de novo assembly is applied to organisms that lack a reference genome and overcomes the challenge of mapping uncertainty. However, this strategy requires more computational resources and deeper sequencing, making it typically unsuitable for large and complex mammalian transcriptomes [40]. Reference-guided expression quantification tools such as Cufflinks [41], RSEM [42], and Stringtie [43] use a transcript-based approach. A major drawback of transcript-based quantification is that estimating the expression of individual isoforms is more difficult. The union exon-based approach, used by featureCounts, assigns reads to genes with higher confidence compared with assigning reads to isoforms, which allows for a more accurate estimation of gene expression [44].
2.4. Normalization
Normalization of raw read counts is a critical step that corrects for non-biological (technical) variation introduced between the time that samples were isolated and sequencing data were produced. Such variation may be between and within samples and may arise due to the library preparation, sample sequencing read depth, gene length, read mapping bias, gene sequence composition, and sequence similarity [45]. Table 2 provides a summary of the computational tools used to model technical variation in bulk RNA-seq. One source of technical variation between samples is differences in sequencing depth (often called library size), which represents the total number of NGS reads generated for a given sample. Normalization methods apply various global scaling factors to raw read counts to make library sizes comparable across samples. Global scaling normalization methods include quantile (Q), upper quartile (UQ), median (Med), and per-sample total count (TC). Methods implemented in the DESeq2 and edgeR packages include relative log expression (RLE) and trimmed mean of M-values (TMM), respectively [46-50]. For full-length transcriptomes, gene length can also impact abundance estimates, as longer genes yield a higher read count than shorter genes expressed at equal levels. The most widely used approaches for gene length normalization include FPKM (fragments per kilobase per million reads), RPKM (reads per kilobase per million reads), and TPM (transcripts per million), which is a slight modification of RPKM [51].
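The difference between RPKM and TPM lies in the order of the two scaling steps, which the following Python sketch makes explicit. The counts and gene lengths are toy values, and the functions are an illustration of the standard formulas rather than the implementation of any particular package.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase per million reads: depth is scaled using raw reads."""
    total_millions = sum(counts) / 1e6
    return [c / (length / 1e3) / total_millions
            for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]

# Toy sample: three genes whose read counts are proportional to their length,
# i.e., the genes are expressed at equal levels.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]
print(rpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
```

Because TPM values always sum to one million within a sample, they are convenient for comparing the relative composition of samples, whereas the sum of RPKM values varies from sample to sample.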
Among these methods, RPKM and TPM are generally not used directly for differential expression analysis. TPM is useful for qualitative comparison such as clustering analysis, but should not be used for comparisons across samples when the total RNA content and distribution are very different [52]. In addition to correcting for known artifacts, numerous methods can identify potential latent factors that capture sources of technical variation. These methods include remove unwanted variation (RUV), surrogate variable analysis (SVA), and principal component analysis (PCA). These are used with statistical approaches (e.g., linear regression model or ComBat) to generate normalized data. Although the correct usage of these approaches can potentially increase statistical power in differential analysis, distinguishing latent factors from biological factors of interest is crucial [53].
2.5. Differential Gene Expression Analysis
Identification of differentially expressed genes is crucial for interpreting the biological differences between the compared conditions. Computational tools for differential gene expression analysis estimate the magnitude of differential expression between two or more conditions based on normalized read counts (i.e., calculate a fold change value). These tools also estimate the statistical significance of the difference and correct for multiple testing (i.e., calculate a p-value or adjusted p-value for each gene). Thresholds for the fold change and p-value can be chosen to improve the conciseness of top markers for defining groups while mitigating the risk of discarding useful genes. Typically, a cutoff of fold changes greater than 1.5 and p-value less than 0.05 is used. Methods for differential expression analysis can be grouped into parametric and non-parametric. Parametric methods capture all information about the data within the parameters, making it possible to predict the value of unknown data from observing the adopted model. Parametric methods are preferred when repeated biological measures from a defined population follow a normal distribution when sampled several times. However, these models assume a particular data distribution. In contrast, non-parametric methods capture more details about the data distribution by not imposing a rigid model to be fitted. If the modeling assumption is valid, parametric approaches are preferred. When the sample size is small, non-parametric methods are advised.
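As an illustration of multiple-testing correction and thresholding, the sketch below implements the Benjamini-Hochberg adjustment in plain Python and applies the cutoffs mentioned above. Applying the 0.05 cutoff to the adjusted rather than the raw p-value, and treating the 1.5-fold cutoff as symmetric for up- and down-regulation, are assumptions made for this example; the gene names and values are invented.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(genes, log2_fold_changes, pvalues,
                min_fold_change=1.5, max_padj=0.05):
    """Keep genes passing both the fold change and adjusted p-value cutoffs."""
    padj = benjamini_hochberg(pvalues)
    min_abs_lfc = math.log2(min_fold_change)
    return [g for g, lfc, q in zip(genes, log2_fold_changes, padj)
            if abs(lfc) > min_abs_lfc and q < max_padj]

genes = ["A", "B", "C", "D"]
log2_fold_changes = [2.0, 1.0, -0.2, 3.0]
pvalues = [0.001, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(pvalues)
selected = select_degs(genes, log2_fold_changes, pvalues)
print(adjusted, selected)
```

Gene B illustrates why the adjustment matters: its raw p-value is below 0.05, but it no longer passes once the four tests are accounted for.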
Computational tools can be further classified into three categories based on the statistical approaches used (Table 2). First, negative binomial model-based methods are implemented in edgeR, DESeq2, Cuffdiff2, baySeq, and EBSeq. Second, non-parametric methods, which do not assume any distribution, are used in SAMseq and NOIseq. Third, a generalized linear model is used in limma, which tests differential expression using moderated t-statistics. The performance of 11 differential expression (DE) methods including edgeR, DESeq, baySeq, EBSeq, SAMseq, and limma was evaluated using both simulation studies and real RNA-Seq data [54]. All of these methods demonstrated low power with small sample sizes. DESeq2 and limma tended to be more conservative than edgeR, with better control of false positives. Limma-voom was robust to outliers and computationally efficient, but performed worse when the variances were unequal between the groups. Previous experimental validation of selected differentially expressed genes (DEGs) from three methods (Cuffdiff2, edgeR, and DESeq2) reported high false discovery rates using the Cuffdiff2 method [55]. A review of eight DE analysis methods (baySeq, DESeq, DESeq2, EBSeq, edgeR, voom, NOIseq, and SAMseq) evaluated precision, accuracy, and sensitivity using qRT-PCR data as reference [56]. According to this study, voom, NOIseq, and DESeq2 showed more consistent results than the other methods. In addition, SAMseq required larger sample sizes to detect significant DE genes. Overall, DESeq2, edgeR, and limma are the best performing tools [57], but none is infallible. One strategy to maximize the robustness of DEG sets is to identify consensus DEGs obtained from multiple approaches, which can improve accuracy and reduce false discovery rates; however, it is worth noting that this approach may also increase the false negative rate.
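The consensus strategy amounts to intersecting the gene lists returned by each method. A minimal Python sketch follows; the method names are real tools, but the gene lists are invented for illustration, and the option to accept genes called by only a subset of methods (min_methods) is an assumption of this example rather than a recommendation from the studies cited above.

```python
def consensus_degs(deg_sets, min_methods=None):
    """Return genes called by at least min_methods methods (default: all)."""
    if min_methods is None:
        min_methods = len(deg_sets)
    tally = {}
    for genes in deg_sets.values():
        for gene in set(genes):
            tally[gene] = tally.get(gene, 0) + 1
    return sorted(g for g, n in tally.items() if n >= min_methods)

# Invented DEG calls from three methods.
calls = {
    "DESeq2": {"Id4", "Tspan8", "Kit"},
    "edgeR": {"Id4", "Tspan8", "Stra8"},
    "limma": {"Id4", "Kit", "Tspan8"},
}
strict = consensus_degs(calls)
relaxed = consensus_degs(calls, min_methods=2)
print(strict, relaxed)
```

Requiring agreement among all methods gives the shortest and most conservative list; relaxing the requirement recovers more genes at the cost of admitting calls that one method did not support, which mirrors the trade-off between false discoveries and false negatives noted above.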
Table 2.
Normalization and differential gene expression analysis tools for bulk RNA-seq
| Software | Normalization method | Read count distribution model | Differential expression test | Reference |
|---|---|---|---|---|
| edgeR | TMM/Upper quartile/RLE/None (all scaling factors set to one) | Negative binomial | Exact test | [83, 84] |
| DESeq2 | DESeq size factors | Negative binomial | Wald test | [85] |
| baySeq | Scaling factors (quantile/TMM/total) | Negative binomial | Posterior probability through Bayesian approach | [86] |
| EBSeq | DESeq median normalization | Negative binomial | Posterior probability through Bayesian approach | [87] |
| Cuffdiff2 | Geometric/quartile/FPKM | Negative binomial/ Beta | t-test analogical method | [88] |
| NOIseq | RPKM/TMM/Upper quartile | Non-parametric | Corresponding log of fold change and absolute expression differences have a higher probability than noise values | [89,90] |
| SAMseq | Based on mean read count over null features of dataset | Non-parametric | Wilcoxon rank statistics-based permutation test | [91] |
| limma | TMM | Voom transformation of counts (linear) | Moderated t-test | [92] |
2.6. Analysis of Bulk RNA-Seq Data Using DESeq2
This section describes a basic workflow for the analysis of bulk RNA-seq matrix data (transcriptomes) using the R package DESeq2, which is included in the Bioconductor collection of R libraries. The code provided below can be directly used in R.
2.6.1. Resources Required
Software: R version 3.6.2 or newer: https://www.r-project.org/.
Hardware: Computer with Windows, Unix, Mac OS X, or Linux Operating systems with sufficient RAM and storage space for output files.
2.6.2. Initialization, Quality Control, and Filtering
To start using DESeq2, download and install the DESeq2 package within the R environment using the first two commands. Once installed, load the packages using the library command.
Load the matrix of read counts which we named cts from the file counts.csv and the group assignments (e.g., Tspan8high or Tspan8Low sample) which we named coldata for each sample from the file metadata.csv. Examine the count matrix and column data using the head function. Fix the order of the samples in the read count matrix to be consistent with the sample annotation file. Verify that the columns of the count matrix and rows of the column data are in the same order.
Construct a DESeqDataSet object in DESeq2 named dds using the count matrix cts, the sample information coldata, and a design formula. The design formula specifies which variables in the sample metadata coldata we will use for differential expression testing. The design formula can also include batch information to account for batch effects (see Note 1).
Perform pre-filtering to include only rows that have at least 10 read counts total.
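The logic of the sample-order check and the pre-filtering step can be sketched in a language-neutral way. The Python example below is not DESeq2 code; the sample identifiers, gene names, and counts are hypothetical, and the count matrix is represented as a dictionary of gene rows purely for illustration.

```python
def reorder_columns(sample_names, rows, wanted_order):
    """Reorder count-matrix columns to follow the metadata sample order."""
    if sorted(sample_names) != sorted(wanted_order):
        raise ValueError("count matrix and metadata list different samples")
    index = [sample_names.index(s) for s in wanted_order]
    return {gene: [values[i] for i in index] for gene, values in rows.items()}

def prefilter(rows, min_total=10):
    """Keep genes whose read counts summed over all samples reach min_total."""
    return {gene: values for gene, values in rows.items()
            if sum(values) >= min_total}

# Hypothetical inputs: columns of the count matrix are out of order
# relative to the rows of the sample metadata.
count_samples = ["s2", "s1", "s3", "s4"]
metadata_samples = ["s1", "s2", "s3", "s4"]
cts = {
    "GeneA": [5, 12, 7, 9],
    "GeneB": [0, 1, 0, 2],
    "GeneC": [3, 3, 2, 2],
}
cts_ordered = reorder_columns(count_samples, cts, metadata_samples)
cts_filtered = prefilter(cts_ordered)
print(cts_filtered)
```

Raising an error when the two sample lists differ is a design choice made here: silently proceeding with mismatched samples would assign counts to the wrong experimental group.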
2.6.3. Differential Gene Expression Analysis
Set the factor levels to specify the comparison to make during differential expression analysis.
Perform differential expression analysis using the function DESeq on raw read counts. Create a results table for a comparison of the last level over the first level specified in the previous step using the function results and inspect it. The first column, baseMean, contains the average of the normalized count values (raw counts divided by size factors) taken over all samples. The remaining columns are log2FoldChange (log2 fold change estimates), lfcSE (standard error estimate for the log2 fold change), stat (Wald statistic), pvalue (Wald test p-value), and padj (p-value adjusted for multiple testing).
Perform log fold change shrinkage using the function lfcShrink and specify the apeglm (adaptive t prior shrinkage estimator) method. Shrunken log fold changes are useful for ranking of genes and visualization (see Note 2).
Reorder the results table by the smallest p-values and view a summary of the results. Export the results to a CSV file, which can be opened in Excel, using the function write.csv.
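To clarify what the size factors and the baseMean column represent, the following Python sketch computes median-of-ratios size factors, normalized counts, and per-gene base means for a toy matrix. It illustrates the arithmetic only and is not the DESeq2 implementation, which additionally fits a negative binomial model to obtain fold changes and test statistics.

```python
import math
from statistics import median

def size_factors(matrix):
    """Median-of-ratios size factors; matrix rows are genes, columns samples."""
    n_samples = len(matrix[0])
    # Genes with a zero in any sample have no finite geometric mean: skip them.
    usable = [row for row in matrix if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples
                     for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalized_counts(matrix, factors):
    """Divide each raw count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in matrix]

def base_means(norm):
    """Average normalized count per gene over all samples."""
    return [sum(row) / len(row) for row in norm]

# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
raw = [[10, 20], [100, 200], [50, 100], [0, 3]]
sf = size_factors(raw)
norm = normalized_counts(raw, sf)
print(sf, base_means(norm))
```

After normalization the first three genes have identical values in both samples, showing that the twofold difference in depth has been removed without being mistaken for differential expression.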
2.6.4. Sample Clustering and Visualization
Create an MA-plot, which is a scatter plot of log2 fold changes (on the y-axis) versus mean of normalized counts (on the x-axis) (Fig. 2). To examine read counts for a single gene across the samples, plot normalized counts plus a pseudocount of 0.5 using the plotCounts function.
Although differential expression testing is done on raw counts, it is useful to extract transformed values for visualization and clustering. The following commands apply the shifted logarithm transformation using the normTransform function, variance stabilizing transformation (VST) using the vst function, or regularized logarithm transformation (rlog) using the rlog function. Using the assay function, extract a matrix of normalized values. These transformation approaches produce data on the log2 scale which has been normalized with respect to the library size or other normalization factors specified.
Create a principal component analysis (PCA) plot using the transformed data showing the first two principal components, which capture the greatest variance between samples (Fig. 3).
To generate a heatmap of the transformed count data generated above, the R package gplots is used. Select and order the genes to be included in the heatmap using the order function. Plot a heatmap of the shifted logarithm transformed values using the heatmap.2 function (Fig. 4).
The list of differentially expressed genes exported from the results table can be used for gene set enrichment analysis using a variety of tools including DAVID, Enrichr, and Ingenuity Pathway Analysis (IPA).
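The shifted logarithm transformation and the gene selection that precedes the heatmap can be sketched as follows. This Python example assumes the transformation is log2 of the normalized count plus a pseudocount of 1, and it ranks genes by mean normalized count; both the ranking criterion and the toy values are assumptions for illustration, not a reproduction of the R commands.

```python
import math

def shifted_log(normalized_rows, pseudocount=1.0):
    """Apply log2(value + pseudocount) to every normalized count."""
    return {gene: [math.log2(v + pseudocount) for v in values]
            for gene, values in normalized_rows.items()}

def top_genes_by_mean(normalized_rows, n):
    """Return the n genes with the highest mean normalized count."""
    means = {gene: sum(v) / len(v) for gene, v in normalized_rows.items()}
    return sorted(means, key=means.get, reverse=True)[:n]

# Toy normalized counts for three genes across three samples.
normalized = {
    "GeneA": [7, 15, 31],
    "GeneB": [0, 1, 3],
    "GeneC": [63, 127, 255],
}
selected = top_genes_by_mean(normalized, 2)
transformed = shifted_log(normalized)
heatmap_rows = [transformed[gene] for gene in selected]
print(selected, heatmap_rows)
```

The pseudocount keeps genes with zero counts defined on the log scale, which is why GeneB maps to 0 rather than to negative infinity in its first sample.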
Fig. 2.
An example of an MA plot, a two-dimensional scatter plot that visualizes gene expression changes between two conditions, is shown. The plot shows each gene as a point, with log2 fold changes on the y-axis and mean of normalized counts on the x-axis. Genes with similar expression values in both conditions cluster around the M = 0 line. Points are colored red if the adjusted p-value is less than 0.1. Points that fall outside the plot limits are drawn as open triangles pointing either up or down
Fig. 3.
Example of a principal component analysis (PCA) plot showing six samples along the first two principal components (PC1 and PC2). Points are colored by group (condition)
Fig. 4.
Heatmap of expression values of 10 differentially expressed genes across individual samples is shown. The columns represent samples, and the rows represent genes. Colors indicate expression values (log-transformed counts) for each gene
3. Bioinformatic Analysis of Single-Cell RNA-Seq Data
Single-cell transcriptome profiling enables an unbiased, high-resolution view of cellular heterogeneity. This allows investigators to identify cell types, investigate rare cell populations, define cellular states, and predict developmental or differentiation trajectories, each of which is hidden by ensemble averaging in bulk RNA sequencing. Two of the most widely used methodologies for assaying individual cells are microfluidic capture SMART-Seq using the Fluidigm C1 platform and droplet-digital RNA-seq using the 10× Genomics platform. The SMART-Seq protocol offers higher sensitivity and generates full-length transcriptomes, which makes it suitable for the detection of splice variants [23]. In addition, the Fluidigm C1 platform allows a visual quality control check by microscopic examination. However, this method precludes multiplexing of samples and has a much lower cell throughput (tens to hundreds of cells) at a considerable expense (>$50/cell). The increased read depth of the SMART-Seq protocol enables the study of low-abundance transcripts. In comparison, droplet-digital RNA-seq utilizes 3′ end-counting, providing a larger throughput of cells (thousands to tens of thousands of cells) at a lower sequencing cost per cell (>$1/cell). The ability to profile large numbers of cells, albeit at a reduced RNA capture efficiency, facilitates identification of rare cell populations [58]. Because cell barcodes are introduced randomly, this approach does not allow visual detection of doublets or association of cell properties such as fluorescent signals with transcriptome profiles. Other droplet-based methods include inDrops and Drop-seq, both of which employ 3′ end sequencing, making them cost efficient and suitable for analyzing a larger number of cells. However, these have a lower capture efficiency compared with 10× [59].
In contrast with the methods above, split-pool barcoding-based approaches, including sci-RNA-seq and SPLiT-seq, are highly scalable, with low-cost cDNA library preparation and longer sample storage [60,61]. A systematic comparison of some scRNA-seq technologies including SMART-seq, SMART-seq2, CEL-seq2, SCRB-seq, and MARS-seq was performed using mouse embryonic stem cells [62]. Overall, SMART-seq2 was found to be the most sensitive and accurate method to analyze full-length transcriptomes and detect alternative splice forms [62]. However, SMART-seq2 is limited by lower throughput and considerable costs. Although less sensitive, SCRB-seq, CEL-seq2, and MARS-seq quantified mRNA levels with less amplification bias due to the use of UMIs [62], while SCRB-seq, MARS-seq, and SMART-seq2 are preferred for analyzing fewer cells [62].
3.1. Quality Control
In droplet-based methods, quality filtering must be performed to exclude cell barcodes that are unlikely to represent intact single cells. A commonly used approach is to calculate a dataset-specific threshold for the minimum number of transcripts or UMIs per barcode that is required to consider transcripts associated with a particular cell barcode as actually arising from an individual, live cell. Small quantities of transcripts detected per cell barcode may indicate the presence of a dead cell or ambient RNA molecules, i.e., cell-free transcripts in solution (arising from previously dead/lysed cells). Conversely, a large number of transcripts detected for a barcode may indicate doublets. Alternatively, the EmptyDrops method (implemented in the R package DropletUtils) detects the presence of cells by estimating background levels of RNA present in empty droplets and identifying barcodes with significant deviation from the background. EmptyDrops outperforms methods using UMI count thresholds by recovering cells with low total RNA content. To identify cell multiplets in droplets, tools such as Scrublet and DoubletFinder define a threshold to distinguish the inferred doublets from the assumed singlets. Multiplet rates can be determined empirically and addressed using bioinformatics [63]. Another round of quality control is necessary to distinguish viable single cells from damaged or dying cells based on the number of detected genes, the proportion of RNA derived from the mitochondrial genome, and the proportions of unmapped or multi-mapped reads [64]. Cells with high proportions of mitochondrially derived transcripts, few detected genes, or high proportions of unmapped or multi-mapped reads are often damaged or dying cells. The threshold used for excluding cells with a high mitochondrial content is specific for each dataset, with cutoffs typically ranging from 5% to 25% mitochondrial gene representation depending on sample type and processing method.
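A threshold-based filter of the kind described above can be sketched in a few lines of Python. All thresholds here (500 to 50,000 UMIs, at least 200 detected genes, at most 10% mitochondrial UMIs) and the per-barcode summaries are hypothetical; as noted above, appropriate cutoffs are dataset-specific and should be chosen after inspecting the distributions.

```python
def passes_qc(cell, min_umis=500, max_umis=50000,
              min_genes=200, max_mito_fraction=0.10):
    """Apply simple per-barcode thresholds; all defaults are illustrative."""
    if cell["total_umis"] == 0:
        return False
    mito_fraction = cell["mito_umis"] / cell["total_umis"]
    return (min_umis <= cell["total_umis"] <= max_umis
            and cell["genes_detected"] >= min_genes
            and mito_fraction <= max_mito_fraction)

# Hypothetical per-barcode summaries.
cells = {
    "AAAC": {"total_umis": 8000, "genes_detected": 2500, "mito_umis": 400},
    "CCCG": {"total_umis": 150, "genes_detected": 90, "mito_umis": 10},
    "GGGT": {"total_umis": 6000, "genes_detected": 1800, "mito_umis": 1800},
    "TTTA": {"total_umis": 90000, "genes_detected": 7000, "mito_umis": 2000},
}
kept = [barcode for barcode, cell in cells.items() if passes_qc(cell)]
print(kept)
```

In this toy set, one barcode is rejected for too few UMIs (possible empty droplet or ambient RNA), one for a high mitochondrial fraction (possible damaged cell), and one for an unusually high UMI count (possible doublet).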
3.2. Normalization
Normalization aims to remove the influence of unwanted technical effects from scRNA-seq data while preserving true heterogeneity. Technical biases such as differences in PCR amplification, mRNA capture, and reverse transcription efficiency must be accounted for prior to downstream analysis. Although bulk and scRNA-seq data share similar features, such as overdispersion of gene expression, scRNA-seq data also have unique features, such as a high proportion of zero read counts. Consequently, bulk-based normalization methods may be unsuitable for single-cell transcriptomics because they may cause overcorrection of lowly expressed genes. For example, the simplest and most common normalization strategy for bulk RNA-seq data is to calculate a quantity related to the sequencing depth of each sample, known as the “size factor,” and divide the expression of all genes by this value [48]. A similar method specifically tailored to single-cell data, called scran, clusters the cells and computes cell-specific size factors more robustly in the presence of zero inflation [65]. This approach yields more accurate estimates of scaling factors with a low runtime. Instead of using cell-specific factors, SCnorm and sctransform use gene group-specific factors to perform normalization [66]. The SCnorm method uses quantile regression to group genes with similar dependence on sequencing depth and estimates scaling factors for each group. Despite its longer runtime compared to scran, SCnorm has better performance, especially for low-throughput, high sequencing depth data. For single-cell UMI datasets, sctransform is the preferred statistical approach to perform normalization and variance stabilizing transformation [67]. The sctransform method applies gene-specific negative binomial regression to effectively mitigate sequencing depth-dependent differences [66].
Alternatively, spike-in RNAs from the External RNA Control Consortium or housekeeping genes can be used to estimate size factors [68].
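The simplest form of the size-factor idea, applied per cell, is shown in the Python sketch below: each count is divided by the total count of its cell, multiplied by a fixed scale factor, and log-transformed. The scale factor of 10,000 and the use of log(1 + x) are assumptions chosen for illustration; this is the basic global scaling approach, not the pooling strategy of scran or the regression model of sctransform.

```python
import math

def log_normalize(cell_counts, scale_factor=10_000):
    """Scale one cell's counts by its total, then apply log(1 + x)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {gene: math.log1p(count / total * scale_factor)
            for gene, count in cell_counts.items()}

# Two toy cells with the same composition but a tenfold difference in depth.
cell_a = {"Id4": 2, "Kit": 0, "Actb": 18}
cell_b = {"Id4": 20, "Kit": 0, "Actb": 180}
norm_a = log_normalize(cell_a)
norm_b = log_normalize(cell_b)
print(norm_a, norm_b)
```

The two cells receive the same normalized values, and zero counts remain zero, which also shows the limitation discussed above: global scaling cannot distinguish a true absence of expression from a dropout.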
3.3. Differential Gene Expression Analysis
The goal of differential expression analysis is to identify genes with statistically significant differences in expression between two or more groups. This analysis can be used to quantify differences in single-cell transcriptomes across different developmental states, treatments, or disease conditions. In contrast to bulk RNA-Seq data, scRNA-seq data have a high proportion of zero or low read counts, which may reflect either cellular expression levels or failure to detect present transcripts (known as technical dropout). Moreover, gene expression distributions show multimodality, reflecting the existence of multiple cell states within the population [11]. Although bulk methods do not account for the aforementioned characteristics, they have been widely used for single-cell data [57].
Differential expression analysis tools developed specifically to address technical dropout in scRNA-seq data include MAST, SCDE, DEsingle, and Monocle2. In addition, Seurat provides several tests for differential expression testing. MAST uses a bimodal distribution and proposes a generalized linear model (GLM) to fit the data [69]. It also uses the fraction of genes detected in each cell as a proxy for technical or biological variation. MAST has better performance and computational efficiency than numerous bulk and single-cell methods [57]. Alternatively, SCDE uses a Bayesian approach to calculate differential expression and accounts for dropout and amplification events [70]. DEsingle employs a zero-inflated negative binomial regression model to estimate the proportion of the real and dropout zeros [71]. Monocle2 proposes a generalized additive model (GAM) to fit the data and a Tobit model to account for dropouts [72-74]. However, all of the aforementioned tools ignore multimodal distributions of scRNA-seq data.
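Two summary quantities underlie much of the discussion above: the fraction of zeros per gene and the fraction of genes detected per cell. The Python sketch below computes both for a toy genes-by-cells matrix; it only illustrates these descriptive statistics and does not implement the MAST, SCDE, or DEsingle models.

```python
def detection_rate(matrix, cell_index):
    """Fraction of genes with at least one count in the given cell."""
    detected = sum(1 for row in matrix if row[cell_index] > 0)
    return detected / len(matrix)

def zero_fraction(gene_counts):
    """Fraction of cells in which a gene has zero counts."""
    return sum(1 for c in gene_counts if c == 0) / len(gene_counts)

# Toy matrix: rows are genes, columns are cells.
matrix = [
    [0, 3, 0, 1],
    [5, 0, 0, 2],
    [0, 0, 0, 4],
]
n_cells = len(matrix[0])
rates = [detection_rate(matrix, j) for j in range(n_cells)]
zeros = [zero_fraction(row) for row in matrix]
print(rates, zeros)
```

A per-cell detection rate computed this way could then be supplied as a covariate to a model of the user's choice; how it is incorporated differs between tools.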
A recently developed tool called D3E specifically addresses the multimodal distributions of single-cell data, but does not consider dropout events [75]. D3E fits the bursting model of transcriptional regulation [76, 77] to the data and compares the gene expression distribution in one group with that in another, giving estimates of burst size, duty cycle, frequency, and mean of transcription.
3.4. Single Cell Trajectory Inference
Single-cell transcriptome data can be used to model dynamic biological processes such as spermatogenesis by computationally inferring the order of cells along developmental trajectories. Trajectory inference, also called pseudotime analysis, places cells along a trajectory based on similarities in their expression patterns. One of the first trajectory inference tools was Monocle (now updated to Monocle3), which uses the reversed graph embedding machine learning technique to order single cells in an unsupervised manner without a priori knowledge of marker genes [72]. In addition, Monocle can perform clustering and differential gene expression analysis to reveal novel cell states and their relationships. However, pseudotime analysis is limited by the use of steady-state mRNA abundances, which are static snapshots of gene expression. Numerous trajectory inference tools have been developed, and several published reviews have compared a portion of them in detail [78-80]. RNA velocity analysis is an alternative approach that utilizes the relative abundance of nascent (unspliced) and mature (spliced) mRNA to derive a high-dimensional vector that empirically predicts the future states of individual cells [81]. The ratio of spliced to unspliced transcripts at a single time point can also be used to infer the dynamics of expression for individual genes.
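The idea behind RNA velocity can be sketched for a single gene in base R. The example assumes a simplified steady-state model in which unspliced counts are proportional to spliced counts (u = gamma * s), fits gamma by regression through the origin, and takes the residual as the velocity. The counts are invented and the variable names are hypothetical; published implementations fit gamma more robustly and combine many genes.

```r
# Toy spliced (s) and unspliced (u) counts for one gene across six cells.
s <- c(2, 4, 6, 8, 10, 12)
u <- c(1, 2, 3, 4, 8, 3)

# Steady-state assumption: u = gamma * s, with gamma fitted as the slope
# of a least-squares regression through the origin.
fit <- lm(u ~ 0 + s)
gamma <- unname(coef(fit)["s"])

# Velocity: a positive value (excess unspliced mRNA) suggests the gene is
# being induced; a negative value suggests it is being repressed.
velocity <- u - gamma * s
print(round(velocity, 2))
```

In this toy example the fifth cell carries more unspliced mRNA than the fitted ratio predicts and the sixth cell carries less, so their velocities have opposite signs.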
3.5. Analysis of Single-Cell RNA-Seq Data Using Seurat
Numerous single-cell RNA-seq platforms are available that employ different protocols to generate a gene expression matrix. The computational tools used for bulk RNA-seq read mapping and expression quantification can also be employed for scRNA-seq analysis (Table 1). This section covers a basic computational workflow for analysis of scRNA-seq data from the widely used 10× Genomics platform. Cell Ranger is used for processing sequencing data (FASTQ files) into read counts. The R package Seurat provides a method for integration of multiple datasets across technologies using a normalization method called SCTransform.
3.5.1. Generation of a Gene Expression Count Matrix for Seurat Analysis
Install Cell Ranger from the 10× Genomics website in a Unix/Linux environment.
Make a directory to store the raw sequencing data using the mkdir command. Download the FASTQ files (.tar files) and extract using the tar command. This outputs two sets of FASTQ files with the following naming convention: Sample_S1_L00X_R1_001.fastq.gz.
Download a prebuilt reference transcriptome from the 10× Genomics support site using wget command and decompress it using the tar command (see Note 3).
To generate read counts from the FASTQ files and reference transcriptome, run the cellranger count command for each sample's FASTQ files (see Note 4). This outputs a folder called //RUN_NAME/outs/filtered_feature_bc_matrix, which contains the matrix.mtx, genes.tsv (or features.tsv), and barcodes.tsv files that can be used as input for the Seurat package.
Install and load the packages Seurat, ggplot2, and patchwork within the R environment. Specify the working directory to save outputs from R using setwd command.
Import single-cell expression data into Seurat using the Read10X function by specifying the directory containing the matrix.mtx, genes.tsv, and barcodes.tsv files generated in step 1.
Create a Seurat object using CreateSeuratObject, which contains the non-normalized expression values (raw counts or TPM). The min.cells argument retains only genes that are expressed in at least three cells, and the min.features argument retains only cells with at least 200 expressed genes. Repeat steps 3 and 4 for all datasets to be analyzed, providing unique names for each Seurat object.
To create a multisample dataset, merge all data into one single Seurat object (named “testis” herein) using merge by passing a vector of multiple Seurat objects to the y parameter. Specify which dataset a cell comes from by setting the add.cell.ids parameter. The original cell ID is automatically stored in the object meta data under orig.ident.
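The effect of the min.cells and min.features arguments can be sketched on a toy matrix in base R. The thresholds are scaled down to suit the toy data, the variable names are hypothetical, and the order of operations (cells first, then genes) reflects our understanding of Seurat's behavior rather than a guarantee.

```r
# Toy raw count matrix: 5 genes (rows) x 4 cells (columns).
counts <- matrix(
  c(3, 0, 1, 2,
    0, 0, 5, 0,
    1, 0, 2, 4,
    0, 0, 0, 0,
    7, 0, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Hypothetical thresholds, scaled down for the toy data
# (the protocol uses min.cells = 3 and min.features = 200).
min_cells <- 3
min_features <- 2

# Keep cells in which at least `min_features` genes are detected.
features_per_cell <- colSums(counts > 0)
keep_cells <- features_per_cell >= min_features

# Among the retained cells, keep genes detected in at least `min_cells` cells.
cells_per_gene <- rowSums(counts[, keep_cells, drop = FALSE] > 0)
keep_genes <- cells_per_gene >= min_cells

filtered <- counts[keep_genes, keep_cells, drop = FALSE]
print(filtered)
```

Here the empty cell2 is removed first, and gene2 and gene4 are then dropped because they are detected in fewer than three of the remaining cells.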
3.5.2. Quality Control and Filtering Cells
Determine the percentage of reads mapped to mitochondrial genome and store as metadata in the testis R object. Note that gene nomenclature differs among species. Thus, identification of mouse mitochondrial genes, for example, requires setting pattern = "^mt-".
Visualize QC metrics such as the number of expressed genes (nFeature_RNA), the number of UMIs (nCount_RNA), and the percentage of mitochondrial counts as violin plots using the VlnPlot function (Fig. 5). Scatter plots of feature-feature relationships are generated using the FeatureScatter function (see Notes 5 and 6).
We can remove outliers, doublets, or potentially apoptotic cells based on read counts, feature counts, or mitochondrial gene content. These values are different for each dataset and should be adjusted appropriately to remove outliers while retaining higher-quality cells. The violin plots and feature scatters generated in the previous step are used to determine appropriate filtering parameters for each dataset. For example, here we use the subset function to remove cells that have unique feature counts over 2500 or less than 200 and cells that have greater than 5% mitochondrial counts.
Fig. 5.
Violin plots showing quality control metrics including the number of expressed genes (nFeature_RNA), the number of UMI (nCount_RNA) and the percentage of mitochondrial counts (percent.mt) for the datasets. Color indicates the identity of the dataset the cells originate from
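The QC metrics and the filtering logic of this subsection can be reproduced on a toy matrix in base R, which may help when choosing cutoffs. The counts, the scaled-down thresholds, and the variable names are assumptions for illustration; in practice the metrics come from the Seurat object metadata.

```r
# Toy count matrix with two mitochondrial genes (human "MT-" prefix).
counts <- matrix(
  c(90, 40, 10,
     5, 30,  0,
     3, 20,  1,
     2, 10,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("DDX4", "SOX9", "MT-CO1", "MT-ND1"),
                  c("cellA", "cellB", "cellC"))
)

# Per-cell QC metrics, mirroring nCount_RNA, nFeature_RNA and percent.mt.
is_mt <- grepl("^MT-", rownames(counts))
qc <- data.frame(
  nCount = colSums(counts),
  nFeature = colSums(counts > 0),
  percent.mt = 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
)

# Hypothetical cutoffs scaled to the toy data; the protocol's example
# uses 200 < nFeature_RNA < 2500 and percent.mt < 5.
keep <- qc$nFeature > 2 & qc$nFeature < 5 & qc$percent.mt < 10
kept_cells <- colnames(counts)[keep]
print(qc)
print(kept_cells)
```

In this example cellB fails the mitochondrial cutoff and cellC fails the minimum feature count, so only cellA is retained.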
3.5.3. SCTransform Normalization and Integration of Multiple Datasets
Set up a list of Seurat objects by splitting the merged object from step 5 into the original datasets using the SplitObject function (see Note 7).
Perform SCTransform on each of the datasets individually. This single command performs normalization, variance stabilization, and selection of variable genes. We can regress out uninteresting sources of variation in the expression values in order to improve dimensionality reduction and clustering. Here, we regress out the mitochondrial mapping percentage using the vars.to.regress argument (see Notes 8 and 9).
Select the features to use when integrating multiple datasets (n = 3000) using the SelectIntegrationFeatures function (see Note 10). Prepare the object list for integration by calculating the Pearson residuals using the PrepSCTIntegration function.
Identify anchors using the FindIntegrationAnchors function and integrate the datasets using the IntegrateData function, specifying the SCTransform normalization method in both functions.
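To give some intuition for the Pearson residuals mentioned above, the base R sketch below computes negative binomial Pearson residuals for a toy matrix. It is a deliberate simplification: the expected count of each gene is taken to be proportional to the cell's total count, and the overdispersion parameter theta is fixed at an assumed value, whereas sctransform fits and regularizes a regression model per gene.

```r
# Toy UMI count matrix: 3 genes (rows) x 4 cells (columns).
counts <- matrix(
  c(10, 20,  5, 40,
     0,  2,  1,  3,
     5, 10,  2, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Expected count if each gene made up a constant share of every cell's
# total: mu[g, c] = (gene total * cell total) / grand total.
mu <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Assumed, fixed overdispersion (illustrative value only).
theta <- 100

# Negative binomial Pearson residual: (observed - expected) / standard deviation.
pearson_resid <- (counts - mu) / sqrt(mu + mu^2 / theta)
print(round(pearson_resid, 2))
```

Residuals near zero indicate counts that are explained by sequencing depth alone, while large positive or negative residuals flag cells in which a gene is over- or under-represented.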
3.5.4. Cell Clustering and Visualization
Using the integrated object (“testis.integrated”) generated in the previous step, perform linear dimensional reduction using PCA and non-linear dimensional reduction using UMAP (Fig. 6) or tSNE embedding. When using sctransform, a higher number of principal components can be specified using the dims argument compared to the standard normalization in Seurat without introducing undue variation.
Perform shared nearest neighbor (SNN) graph construction using the function FindNeighbors, followed by cluster determination using FindClusters. The number of clusters can be increased or decreased using the resolution parameter. The appropriate number of clusters depends upon sample type, biological context, and goals of the study. The resulting data can be visualized using PCA, UMAP, or tSNE plots generated using the function DimPlot and specifying the reduction method in the reduction argument. To generate UMAP plots for each individual dataset, use the split.by parameter in the function DimPlot.
Fig. 6.
Unsupervised clustering of all cells in UMAP plot colored by cluster identity
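The shared nearest neighbor idea underlying graph-based clustering can be sketched in base R: find each cell's k nearest neighbors in the reduced space, then weight each pair of cells by the Jaccard overlap of their neighbor sets. The six-cell embedding, the value of k, and the helper names are assumptions for illustration; Seurat's implementation is more elaborate and much faster.

```r
# Toy 2-D embedding (e.g., two principal components) for six cells
# forming two well-separated groups.
emb <- rbind(
  c1 = c(0.0, 0.0), c2 = c(0.1, 0.0), c3 = c(0.0, 0.1),
  c4 = c(5.0, 5.0), c5 = c(5.1, 5.0), c6 = c(5.0, 5.1)
)

k <- 2  # hypothetical neighborhood size, far smaller than typical defaults

# k nearest neighbors of each cell by Euclidean distance; each cell is
# also counted as a member of its own neighborhood.
d <- as.matrix(dist(emb))
knn <- lapply(seq_len(nrow(d)), function(i) {
  union(i, order(d[i, ])[seq_len(k + 1)])
})

# Shared nearest neighbor weight = Jaccard overlap of the neighbor sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
snn <- outer(seq_along(knn), seq_along(knn),
             Vectorize(function(i, j) jaccard(knn[[i]], knn[[j]])))
dimnames(snn) <- dimnames(d)
print(snn)
```

Cells within the same group share all of their neighbors and receive a weight of 1, whereas cells from different groups share none and receive a weight of 0; community detection on such a graph is what separates the clusters.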
3.5.5. Differential Gene Expression Analysis
Differential expression analysis is performed on the RNA assay object, which can be changed using the DefaultAssay function. Identify markers (differentially expressed genes) for each cluster compared to all remaining clusters using the function FindAllMarkers (see Note 11). The resulting table is a ranked list of cluster markers with associated statistics including p-value (p_val), adjusted p-value (p_val_adj), the percentage of cells where the gene is detected in the first group (pct.1), the percentage of cells where the gene is detected in the second group (pct.2), and log fold-change of the average expression between two groups.
Identify the top 10 markers for each cluster and generate an Excel file of the results.
To visualize how gene expression changes across different clusters, various plotting functions are available in Seurat to generate dot plots (Fig. 7), feature plots (Fig. 8), or violin plots (Fig. 9). Input a gene or gene list into the features parameter.
Following differential gene expression analysis, assign cell type identity to the clusters based on the expression of known marker genes as well as identify potential novel cell types. First select the clusters using the WhichCells function and change the identity of these cells using the SetIdent function (see Note 12).
For focused analyses of a particular cell type, we can select cells based on the identity assigned in the previous step and subset the Seurat object using the subset function. The same downstream analysis shown in steps 4–18 can then be rerun on this dataset.
Fig. 7.
Dot plot showing the expression levels of marker genes in each cluster. The size of the dots represents the percentage of cells which express the given gene within a cluster. The color of the dot encodes the average expression level across all cells within a cluster (blue denotes high expression)
Fig. 8.
UMAP plot colored by the expression level of the germ cell marker DDX4 (blue denotes high expression)
Fig. 9.
Violin plot showing the expression levels of selected markers of germ (DDX4), Sertoli (SOX9, WT1), Leydig (INSL3), peritubular myoid (ACTA2), and endothelial cells (PECAM1)
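The statistics reported in the marker table can be illustrated for a single gene in base R. The expression values and the number of tested genes are invented, and the fold-change shown is a simplified difference of mean log-expression values, not Seurat's exact calculation, which differs in detail between versions.

```r
# Toy log-normalized expression for one gene in two groups of cells.
cluster_cells <- c(2.1, 1.8, 0.0, 2.5, 1.9, 2.2)  # cluster of interest
other_cells   <- c(0.0, 0.3, 0.0, 0.0, 0.5, 0.0)  # all remaining cells

# Detection rates, analogous to pct.1 and pct.2.
pct.1 <- mean(cluster_cells > 0)
pct.2 <- mean(other_cells > 0)

# Simplified log fold-change: difference of mean log-expression values.
log_fc <- mean(cluster_cells) - mean(other_cells)

# Wilcoxon rank-sum test, the default test listed in Note 11.
p_val <- wilcox.test(cluster_cells, other_cells, exact = FALSE)$p.value

# Bonferroni adjustment across an assumed number of tested genes.
n_genes <- 2000
p_val_adj <- min(1, p_val * n_genes)

print(c(pct.1 = pct.1, pct.2 = pct.2, log_fc = log_fc,
        p_val = p_val, p_val_adj = p_val_adj))
```

With only six cells per group, the adjusted p-value in this example is capped at 1 even though the gene looks like a clear marker, which illustrates why marker calls from very small clusters should be interpreted cautiously.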
4. Notes
1. In DESeq2, the design formula should be simplified to include only the most important covariates.
2. When using the lfcShrink function with type = "apeglm", a contrast cannot be specified; a coefficient must be specified instead using the coef argument.
3. 10× Genomics provides pre-built references for the human and mouse genomes. For additional species, a custom reference genome can be built using the cellranger mkref command.
4. (Optional) To aggregate or combine two cellranger count runs, use the cellranger aggr command. Specify a .csv file containing the paths to the HDF5 files.
cellranger aggr --id=1k_10K_human_aggr --csv=human_aggr.csv
5. In Seurat, visualize QC metrics including nCount_RNA, nFeature_RNA, and percent.mt to choose appropriate cutoffs for the data. Alternatively, percent.mt and cell cycle effects can be regressed out during the SCTransform step.
6. Following QC, check the dimensions of the expression matrix to calibrate the stringency of the cutoffs before proceeding with downstream analysis.
7. testis@raw.data is a slot in the Seurat object that stores the original gene expression matrix. testis@ident stores the sample IDs of the cells and can be used to store sample metadata.
8. To determine the number of statistically significant PCs, use the JackStraw function. Alternatively, an elbow plot (ElbowPlot, named PCElbowPlot in Seurat v2) is recommended for larger datasets to reduce computational time. When using SCTransform, a greater number of PCs can be used without affecting the results.
9. To evaluate the effects of the cell cycle, run the function CellCycleScoring prior to SCTransform.
10. By default, SCTransform outputs the 3000 most variable genes. It may be beneficial to increase this number for larger datasets using the variable.features.n argument.
11. The differential expression test can be specified in the FindMarkers function using the test.use argument. The tests supported are wilcox (default), bimod, roc, t, poisson, negbinom, LR, MAST, and DESeq2.
12. After assigning cell type identity to the clusters, differential expression analysis can be performed on the user-defined groups using the FindMarkers function by specifying the groups using the ident.1 and ident.2 arguments.
Box 1. Install BiocManager and load required packages.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# Package names must be passed as a single character vector
BiocManager::install(c("DESeq2", "ggplot2", "apeglm", "gplots"))
library(DESeq2)
library(ggplot2)
library(apeglm)
library(gplots)
Box 2. Load read count matrix.
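# Read the count matrix (assumes the first column of counts.csv holds gene identifiers)
cts <- read.csv('counts.csv', header = TRUE, sep = ",", row.names = 1)
head(cts)
# Read the sample metadata; row names are the sample names
coldata <- read.csv('metadata.csv', row.names = 1)
head(coldata)
coldata$condition <- factor(coldata$condition)
# Reorder the count columns to match the metadata rows, then confirm the match
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))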
Box 3. Construct DESeqDataSet object in DESeq2.
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds
Box 4. Perform data filtering.
# Keep genes with at least 10 reads in total across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
Box 5. Differential expression testing.
# Set the reference level (listed first) for the comparison
dds$condition <- factor(dds$condition, levels = c("Tspan8low", "Tspan8high"))
Box 6. Summary of differential gene expression results.
dds <- DESeq(dds)
res <- results(dds)
res
resultsNames(dds)
Box 7. Log fold change shrinkage.
# The coef name must match one of the names returned by resultsNames(dds)
resLFC <- lfcShrink(dds, coef = "condition_Tspan8high_vs_Tspan8low", type = "apeglm")
resLFC
Box 8. Sort DE results by p-value.
resOrdered <- res[order(res$pvalue), ]
summary(res)
write.csv(as.data.frame(resOrdered), file = "results_orderedbypadj.csv")
Box 9. Plot MA-plot and counts.
plotMA(res, ylim = c(-2, 2), labels = FALSE)
plotCounts(dds, gene = which.min(res$padj), intgroup = "condition", labels = FALSE)
Box 10. Data transformation.
ntd <- normTransform(dds)
vsd <- vst(dds, blind = FALSE)
rld <- rlog(dds, blind = FALSE)
head(assay(vsd), 3)
Box 11. PCA Plot.
plotPCA(vsd, intgroup=c("condition"))
Box 12. Plot Heatmap.
# Select the 100 genes with the highest mean normalized counts
select <- order(rowMeans(counts(dds, normalized = TRUE)), decreasing = TRUE)[1:100]
heatmap.2(as.matrix(assay(ntd)[select, ]),
          scale = "row",
          hclustfun = function(x) hclust(x, method = "average"),
          distfun = function(x) as.dist((1 - cor(t(x))) / 2),
          trace = "none", density.info = "none",
          labRow = "", cexCol = 0.7)
Box 13. Download FASTQ files.
mkdir ~/human_cellranger_counts
cd ~/human_cellranger_counts
tar -xvf human_1k_v3_fastqs.tar
Box 14. Install Seurat.
# Package names must be passed as a single character vector
install.packages(c("Seurat", "ggplot2", "patchwork"))
library(Seurat)
library(ggplot2)
library(patchwork)
library(future)
Box 15. Load single cell gene expression data in Seurat.
Sample1_1.data <- Read10X(data.dir = "/work/hermannlab/sample1_1/filtered_feature_bc_matrix/")
Sample1_2.data <- Read10X(data.dir = "/work/hermannlab/sample1_2/filtered_feature_bc_matrix/")
Sample1_3.data <- Read10X(data.dir = "/work/hermannlab/sample1_3/filtered_feature_bc_matrix/")
Box 16. Create SeuratObject.
# orig.ident is taken from the project name, so each sample is given its own
# project name here; splitting by orig.ident later (Box 21) relies on this
Sample1_1 <- CreateSeuratObject(counts = Sample1_1.data, project = "Sample1_1", min.cells = 3, min.features = 200)
Sample1_2 <- CreateSeuratObject(counts = Sample1_2.data, project = "Sample1_2", min.cells = 3, min.features = 200)
Sample1_3 <- CreateSeuratObject(counts = Sample1_3.data, project = "Sample1_3", min.cells = 3, min.features = 200)
Box 17. Merge datasets.
testis <- merge(Sample1_1, y = c(Sample1_2, Sample1_3),
                add.cell.ids = c("Sample1_1", "Sample1_2", "Sample1_3"),
                project = "HumanscRNAseq")
Box 18. Calculate mitochondrial quality control metrics.
testis[["percent.mt"]] <-PercentageFeatureSet(testis, pattern = "^MT-")
Box 19. Visualize QC metrics.
VlnPlot(testis, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
FeatureScatter(testis, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
FeatureScatter(testis, feature1 = "nCount_RNA", feature2 = "percent.mt")
FeatureScatter(testis, feature1 = "nFeature_RNA", feature2 = "percent.mt")
Box 20. Data filtering using QC metrics.
testis <-subset(testis, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
Box 21. Create list of Seurat objects.
testis.list <- SplitObject(testis, split.by = "orig.ident")
Box 22. Data transformation using SCTransform.
for (i in 1:length(testis.list)) {
testis.list[[i]] <-SCTransform(testis.list[[i]],
verbose = FALSE, vars.to.regress=c("percent.mt"))
}
Box 23. Feature selection for Data Integration.
testis.features <- SelectIntegrationFeatures(object.list = testis.list, nfeatures = 3000)
# Raise the memory limit for globals exported by the future package
options(future.globals.maxSize = 9000000000)
testis.list <- PrepSCTIntegration(object.list = testis.list, anchor.features = testis.features, verbose = FALSE)
Box 24. Perform data integration.
testis.anchors <- FindIntegrationAnchors(object.list = testis.list, normalization.method = "SCT", anchor.features = testis.features, verbose = FALSE)
testis.integrated <- IntegrateData(anchorset = testis.anchors, normalization.method = "SCT", verbose = FALSE)
Box 25. Perform dimensional reduction.
testis.integrated <- RunPCA(testis.integrated, verbose = FALSE)
# Labels are added at the plotting stage (DimPlot), not during embedding
testis.integrated <- RunUMAP(testis.integrated, dims = 1:50)
testis.integrated <- RunTSNE(testis.integrated, dims = 1:50)
Box 26. Perform cell clustering.
testis.integrated <- FindNeighbors(testis.integrated, dims = 1:50)
testis.integrated <- FindClusters(testis.integrated)
# Inspect the cluster assignments of the first five cells
head(Idents(testis.integrated), 5)
# Reduction names are lowercase in Seurat ("pca", "umap", "tsne")
DimPlot(testis.integrated, reduction = "pca", label = TRUE)
DimPlot(testis.integrated, reduction = "umap", label = TRUE)
DimPlot(testis.integrated, reduction = "tsne", label = TRUE)
baseplot <- DimPlot(testis.integrated, reduction = "umap", pt.size = 0.3, split.by = "orig.ident")
baseplot + NoLegend()
Box 27. Perform differential expression testing.
DefaultAssay(testis.integrated) <- "RNA"
testis.markers <- FindAllMarkers(object = testis.integrated, only.pos = TRUE, min.pct = 0.25)
Box 28. Sort DE results by log fold-change.
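library(dplyr)  # provides %>%, group_by, and top_n
# avg_logFC is the column name in Seurat v3; Seurat v4 and later name it avg_log2FC
top10 <- testis.markers %>% group_by(cluster) %>% top_n(10, avg_logFC)
# Write a tab-delimited text file that can be opened in Excel
write.table(top10, "testis.top10.txt", sep = "\t", quote = FALSE, row.names = FALSE)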
Box 29. Visualize differential gene expression using Dot plot, Violin plot and Feature plot.
DotPlot(object = testis.integrated, features = c("DDX4", "ACTB", "GATA4", "SOX9", "WT1", "INSL3", "ACTA2", "PECAM1"))
FeaturePlot(testis.integrated, features = c("DDX4"), pt.size = 1)
# pt.size = 0 hides the individual points on the violin plot
VlnPlot(testis.integrated, features = c("DDX4"), pt.size = 0)
Box 30. Assign cell type identity to clusters.
cells.use1 <- WhichCells(object = testis.integrated, idents = c("10", "13", "15"))
testis.integrated <- SetIdent(object = testis.integrated, cells = cells.use1, value = 'Germ cells')
Box 31. Subset data by cell type identity.
GC <- subset(testis.integrated, idents = "Germ cells")
Acknowledgments
The authors graciously acknowledge computational support received from UTSA’s HPC cluster SHAMU, operated by the Research Computing Support Group, Office of Information Technology. Some data used in this chapter were generated in the UTSA Genomics Core, which is supported by NIH grant G12 MD007591 and NSF grants DBI-1337513 and DBI-2018408. This work was supported by NIH grants R01 HD90007 and U01 DA054179.
References
- 1. Sha J, Zhou Z, Li J, Yin L, Yang H, Hu G, Luo M, Chan HC, Zhou K, Zhu H, Zhu H, Shan Y, Lin M, Wang L, Cheng L, Zhou Y, Wang Y (2002) Identification of testis development and spermatogenesis-related genes in human and mouse testes using cDNA arrays. Mol Hum Reprod 8:511. 10.1093/molehr/8.6.511
- 2. Feig C, Kirchhoff C, Ivell R, Naether O, Schulze W, Spiess AN (2007) A new paradigm for profiling testicular gene expression during normal and disturbed human spermatogenesis. Mol Hum Reprod 13:33. 10.1093/molehr/gal097
- 3. Green CD, Ma Q, Manske GL, Shami AN, Zheng X, Marini S, Moritz L, Sultan C, Gurczynski SJ, Moore BB, Tallquist MD, Li JZ, Hammoud SS (2018) A comprehensive roadmap of murine spermatogenesis defined by single-cell RNA-Seq. Dev Cell 46:651–667.e10. 10.1016/j.devcel.2018.07.025
- 4. Hermann BP, Cheng K, Singh A, Roa-De La Cruz L, Mutoji KN, Chen I-C, Gildersleeve H, Lehle JD, Mayo M, Westernstroer B, Law NC, Oatley MJ, Velte EK, Niedenberger BA, Fritze D, Silber S, Geyer CB, Oatley JM, McCarrey JR (2018) The mammalian spermatogenesis single-cell transcriptome, from spermatogonial stem cells to spermatids. Cell Rep 25:1650–1667.e8. 10.1016/j.celrep.2018.10.026
- 5. Guo J, Grow EJ, Mlcochova H, Maher GJ, Lindskog C, Nie X, Guo Y, Takei Y, Yun J, Cai L, Kim R, Carrell DT, Goriely A, Hotaling JM, Cairns BR (2018) The adult human testis transcriptional cell atlas. Cell Res 28:1141–1157. 10.1038/s41422-018-0099-2
- 6. Shima JE, McLean DJ, McCarrey JR, Griswold MD (2004) The murine testicular transcriptome: characterizing gene expression in the testis during the progression of spermatogenesis. Biol Reprod 71:319–330. 10.1095/biolreprod.103.026880
- 7. Ikami K, Tokue M, Sugimoto R, Noda C, Kobayashi S, Hara K, Yoshida S (2015) Hierarchical differentiation competence in response to retinoic acid ensures stem cell maintenance during mouse spermatogenesis. Development 142:1582–1592. 10.1242/dev.118695
- 8. Laiho A, Kotaja N, Gyenesei A, Sironen A (2013) Transcriptome profiling of the murine testis during the first wave of spermatogenesis. PLoS One 8:e61558. 10.1371/journal.pone.0061558
- 9. Chen Y, Zheng Y, Gao Y, Lin Z, Yang S, Wang T, Wang Q, Xie N, Hua R, Liu M, Sha J, Griswold MD, Li J, Tang F, Tong M-hH (2018) Single-cell RNA-seq uncovers dynamic processes and critical regulators in mouse spermatogenesis. Cell Res 28:879–896. 10.1038/s41422-018-0074-y
- 10. Helsel AR, Yang Q-E, Oatley MJ, Lord T, Sablitzky F, Oatley JM (2017) ID4 levels dictate the stem cell state in mouse spermatogonia. Development 144:624–634. 10.1242/dev.146928
- 11. Mutoji K, Singh A, Nguyen T, Gildersleeve H, Kaucher AV, Oatley MJ, Oatley JM, Velte EK, Geyer CB, Cheng K, McCarrey JR, Hermann BP (2016) TSPAN8 expression distinguishes spermatogonial stem cells in the prepubertal mouse testis. Biol Reprod 95:117. 10.1095/biolreprod.116.144220
- 12. Cheng K, Chen IC, Cheng CHE, Mutoji K, Hale BJ, Hermann BP, Geyer CB, Oatley JM, McCarrey JR (2020) Unique epigenetic programming distinguishes regenerative spermatogonial stem cells in the developing mouse testis. iScience 23:101596. 10.1016/j.isci.2020.101596
- 13. Sohni A, Tan K, Song HW, Burow D, de Rooij DG, Laurent L, Hsieh TC, Rabah R, Hammoud SS, Vicini E, Wilkinson MF (2019) The neonatal and adult human testis defined at the single-cell level. Cell Rep 26:1501–1517. 10.1016/j.celrep.2019.01.045
- 14. Suzuki S, McCarrey JR, Hermann BP (2021) An mTORC1-dependent switch orchestrates the transition between mouse spermatogonial stem cells and clones of progenitor spermatogonia. Cell Rep 34:108752. 10.1016/j.celrep.2021.108752
- 15. Cockrum C, Kaneshiro KR, Rechtsteiner A, Tabuchi TM, Strome S (2020) A primer for generating and using transcriptome data and gene sets. Development 147:dev193854. 10.1242/dev.193854
- 16. Nam DK, Lee S, Zhou G, Cao X, Wang C, Clark T, Chen J, Rowley JD, Wang SM (2002) Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription. Proc Natl Acad Sci U S A 99:6152. 10.1073/pnas.092140899
- 17. Adiconis X, Borges-Rivera D, Satija R, Deluca DS, Busby MA, Berlin AM, Sivachenko A, Thompson DA, Wysoker A, Fennell T, Gnirke A, Pochet N, Regev A, Levin JZ (2013) Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods 10:623. 10.1038/nmeth.2483
- 18. Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lonnerberg P, Linnarsson S (2014) Quantitative single-cell RNA-seq with unique molecular identifiers. Nat Methods 11:163. 10.1038/nmeth.2772
- 19. Arezi B, Hogrefe H (2009) Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer. Nucleic Acids Res 37:473–481. 10.1093/nar/gkn952
- 20. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA (2009) mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 6:377. 10.1038/nmeth.1315
- 21. Sasagawa Y, Nikaido I, Hayashi T, Danno H, Uno KD, Imai T, Ueda HR (2013) Quartz-Seq: a highly reproducible and sensitive single-cell RNA sequencing method, reveals nongenetic gene-expression heterogeneity. Genome Biol 14:1–17. 10.1186/gb-2013-14-4-r31
- 22. Islam S, Kjällquist U, Moliner A, Zajac P, Fan JB, Lönnerberg P, Linnarsson S (2011) Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res 21:1160–1167. 10.1101/gr.110882.110
- 23. Ramsköld D, Luo S, Wang YC, Li R, Deng Q, Faridani OR, Daniels GA, Khrebtukova I, Loring JF, Laurent LC, Schroth GP, Sandberg R (2012) Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol 30:777. 10.1038/nbt.2282
- 24. Hashimshony T, Wagner F, Sher N, Yanai I (2012) CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep 2:666–673. 10.1016/j.celrep.2012.08.003
- 25. Andrews S (2010) FastQC, Babraham Bioinforma
- 26. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for illumina sequence data. Bioinformatics 30:2114–2120. 10.1093/bioinformatics/btu170
- 27. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10–12. 10.14806/ej.17.1.200
- 28. Williams CR, Baccarella A, Parrish JZ, Kim CC (2016) Trimming of sequence reads alters RNA-Seq gene expression estimates. BMC Bioinf 17:103. 10.1186/s12859-016-0956-2
- 29. Wang L, Wang S, Li W (2012) RSeQC: quality control of RNA-seq experiments. Bioinformatics 28:2184–2185. 10.1093/bioinformatics/bts356
- 30. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357. 10.1038/nmeth.1923
- 31. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. 10.1093/bioinformatics/btp324
- 32. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105–1111. 10.1093/bioinformatics/btp120
- 33. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21. 10.1093/bioinformatics/bts635
- 34. Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12:357–360. 10.1038/nmeth.3317
- 35. Engström PG, Steijger T, Sipos B, Grant GR, Kahles A, Rätsch G, Goldman N, Hubbard TJ, Harrow J, Guigó R, Bertone P, Alioto T, Behr J, Bohnert R, Campagna D, Davis CA, Dobin A, Gingeras TR, Jean G, Kosarev P, Li S, Liu J, Mason CE, Molodtsov V, Ning Z, Ponstingl H, Prins JF, Ribeca P, Seledtsov I, Solovyev V, Valle G, Vitulo N, Wang K, Wu TD, Zeller G (2013) Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods 10:1185–1191. 10.1038/nmeth.2722
- 36. Patro R, Duggal G, Kingsford C (2015) Salmon: accurate, versatile and ultrafast quantification from RNA-seq data using lightweight-alignment. BioRxiv
- 37. Bray NL, Pimentel H, Melsted P, Pachter L (2016) Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34:525. 10.1038/nbt.3519
- 38. Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, Love MI, Kingsford C, Patro R (2019) Alignment and mapping methodology influence transcript abundance estimation. BioRxiv. 10.1101/657874
- 39. Goecks J, Nekrutenko A, Taylor J, Afgan E, Ananda G, Baker D, Blankenberg D, Chakrabarty R, Coraor N, Goecks J, Von Kuster G, Lazarus R, Li K, Taylor J, Vincent K (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11:1–13. 10.1186/gb-2010-11-8-r86
- 40. Benjamin AM, Nichols M, Burke TW, Ginsburg GS, Lucas JE (2014) Comparing reference-based RNA-Seq mapping methods for non-human primate data. BMC Genomics 15:570. 10.1186/1471-2164-15-570
- 41. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, Van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28:511–515. 10.1038/nbt.1621
- 42. Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf 12:323. 10.1186/1471-2105-12-323
- 43.Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33: 290–295. 10.1038/nbt.3122 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Liao Y, Smyth GK, Shi W (2014) Feature-Counts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30:923–930. 10.1093/bioinformatics/btt656 [DOI] [PubMed] [Google Scholar]
- 45.Bullard JH, Purdom E, Hansen KD, Dudoit S (2010) Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinf 11:1–13. 10.1186/1471-2105-11-94 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Robinson MD, Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11:1–9. 10.1186/gb-2010-11-3-r25 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Vallejos CA, Risso D, Scialdone A, Dudoit S, Marioni JC (2017) Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat Methods 14:565. 10.1038/nmeth.4292 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11. 10.1186/gb-2010-11-10-r106 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Smyth GK (2005) Limma: linear models for microarray data. Bioinforma Comput Biol Solut Using R Bioconductor. 10.1007/0-387-29362-0_23 [DOI] [Google Scholar]
- 50.Bolstad BM, Irizarry RA, Åstrand M, Speed TP (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19: 185–193. 10.1093/bioinformatics/19.2.185 [DOI] [PubMed] [Google Scholar]
- 51.Wagner GP, Kin K, Lynch VJ (2012) Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci 131:281–285. 10.1007/s12064-012-0162-3 [DOI] [PubMed] [Google Scholar]
- 52.Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17:1–19. 10.1186/s13059-016-0881-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Abbas-Aghababazadeh F, Li Q, Fridley BL (2018) Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing. PLoS One 13:e0206312. 10.1371/journal.pone.0206312 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Soneson C, Delorenzi M (2013) A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinf 14:91. 10.1186/1471-2105-14-91 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Rajkumar AP, Qvist P, Lazarus R, Lescai F, Ju J, Nyegaard M, Mors O, Børglum AD, Li Q, Christensen JH (2015) Experimental validation of methods for differential gene expression analysis and sample pooling in RNA-seq. BMC Genomics 16:548. 10.1186/s12864-015-1767-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Costa-Silva J, Domingues D, Lopes FM (2017) RNA-Seq differential expression analysis: an extended review and a software tool. PLoS One 12:e0190152. 10.1371/journal.pone.0190152 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Van den Berge K, Perraudeau F, Soneson C, Love MI, Risso D, Vert JP, Robinson MD, Dudoit S, Clement L (2018) Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications. Genome Biol 19:24. 10.1186/s13059-018-1406-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, Ziraldo SB, Wheeler TD, McDermott GP, Zhu J, Gregory MT, Shuga J, Montesclaros L, Underwood JG, Masquelier DA, Nishimura SY, Schnall-Levin M, Wyatt PW, Hindson CM, Bharadwaj R, Wong A, Ness KD, Beppu LW, Deeg HJ, McFarland C, Loeb KR, Valente WJ, Ericson NG, Stevens EA, Radich JP, Mikkelsen TS, Hindson BJ, Bielas JH (2017) Massively parallel digital transcriptional profiling of single cells. Nat Commun 8:14049. 10.1038/ncomms14049 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Zhang X, Li T, Liu F, Chen Y, Yao J, Li Z, Huang Y, Wang J (2019) Comparative analysis of droplet-based ultra-high-throughput single-cell RNA-Seq systems. Mol Cell 73:130. 10.1016/j.molcel.2018.10.020 [DOI] [PubMed] [Google Scholar]
- 60.Cao J, Packer JS, Ramani V, Cusanovich DA, Huynh C, Daza R, Qiu X, Lee C, Furlan SN, Steemers FJ, Adey A, Waterston RH, Trapnell C, Shendure J (2017) Comprehensive single-cell transcriptional profiling of a multi-cellular organism. Science 357:661–667. 10.1126/science.aam8940 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Rosenberg AB, Roco CM, Muscat RA, Kuchina A, Sample P, Yao Z, Graybuck LT, Peeler DJ, Mukherjee S, Chen W, Pun SH, Sellers DL, Tasic B, Seelig G (2018) Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360:176–182. 10.1126/science.aam8999 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Ziegenhain C, Vieth B, Parekh S, Reinius B, Guillaumet-Adkins A, Smets M, Leonhardt H, Heyn H, Hellmann I, Enard W (2017) Comparative analysis of single-cell RNA sequencing methods. Mol Cell 65:631–643.e4. 10.1016/j.molcel.2017.01.023 [DOI] [PubMed] [Google Scholar]
- 63.Bloom JD (2018) Estimating the frequency of multiplets in single-cell RNA sequencing from cell-mixing experiments. PeerJ 6:e5578. 10.7717/peerj.5578 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Ilicic T, Kim JK, Kolodziejczyk AA, Bagger FO, McCarthy DJ, Marioni JC, Teichmann SA (2016) Classification of low quality cells from single-cell RNA-seq data. Genome Biol 17:1–15. 10.1186/s13059-016-0888-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Lun ATL, McCarthy DJ, Marioni JC (2016) A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research 5:2122. 10.12688/f1000research.9501.2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Hafemeister C, Satija R (2019) Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20:296. 10.1186/s13059-019-1874-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Zhang Y, Ma Y, Huang Y, Zhang Y, Jiang Q, Zhou M, Su J (2020) Benchmarking algorithms for pathway activity transformation of single-cell RNA-seq data. Comput Struct Biotechnol J 18:2953–2961. 10.1016/j.csbj.2020.10.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Brennecke P, Anders S, Kim JK, Kolodziejczyk AA, Zhang X, Proserpio V, Baying B, Benes V, Teichmann SA, Marioni JC, Heisler MG (2013) Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods 10:1093. 10.1038/nmeth.2645 [DOI] [PubMed] [Google Scholar]
- 69.Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, Slichter CK, Miller HW, McElrath MJ, Prlic M, Linsley PS, Gottardo R (2015) MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol 16:278. 10.1186/s13059-015-0844-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Kharchenko PV, Silberstein L, Scadden DT (2014) Bayesian approach to single-cell differential expression analysis. Nat Methods 11:740. 10.1038/nmeth.2967 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Miao Z, Deng K, Wang X, Zhang X (2018) DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics 34:3223–3224. 10.1093/bioinformatics/bty332 [DOI] [PubMed] [Google Scholar]
- 72.Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn JL (2014) The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 32:381–386. 10.1038/nbt.2859 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Qiu X, Mao Q, Tang Y, Wang L, Chawla R, Pliner HA, Trapnell C (2017) Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 14:979. 10.1038/nmeth.4402 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Qiu X, Hill A, Packer J, Lin D, Ma YA, Trapnell C (2017) Single-cell mRNA quantification and differential analysis with Census. Nat Methods 14:309. 10.1038/nmeth.4150 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Delmans M, Hemberg M (2016) Discrete distributional differential expression (D3E) - a tool for gene expression analysis of single-cell RNA-seq data. BMC Bioinf 17:110. 10.1186/s12859-016-0944-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Chubb JR, Trcek T, Shenoy SM, Singer RH (2006) Transcriptional pulsing of a developmental gene. Curr Biol 16:1018–1025. 10.1016/j.cub.2006.03.092 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S (2006) Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4:1707–1719. 10.1371/journal.pbio.0040309 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Saelens W, Cannoodt R, Todorov H, Saeys Y (2019) A comparison of single-cell trajectory inference methods. Nat Biotechnol 37:547. 10.1038/s41587-019-0071-9 [DOI] [PubMed] [Google Scholar]
- 79.Chen G, Ning B, Shi T (2019) Single-cell RNA-seq technologies and related computational data analysis. Front Genet 10:317. 10.3389/fgene.2019.00317 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Cannoodt R, Saelens W, Saeys Y (2016) Computational methods for trajectory inference from single-cell transcriptomics. Eur J Immunol 46:2496. 10.1002/eji.201646347 [DOI] [PubMed] [Google Scholar]
- 81.La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, Lidschreiber K, Kastriti ME, Lönnerberg P, Furlan A, Fan J, Borm LE, Liu Z, van Bruggen D, Guo J, He X, Barker R, Sundstrom E, Castelo-Branco G, Cramer P, Adameyko I, Linnarsson S, Kharchenko PV (2018) RNA velocity of single cells. Nature 560:494–498. 10.1038/s41586-018-0414-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Anders S, Pyl PT, Huber W (2015) HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics 31:166–169. 10.1093/bioinformatics/btu638 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Robinson MD, McCarthy DJ, Smyth GK (2009) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–140. 10.1093/bioinformatics/btp616 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.McCarthy DJ, Chen Y, Smyth GK (2012) Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res 40: 4288–4297. 10.1093/nar/gks042 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:1–21. 10.1186/s13059-014-0550-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Hardcastle TJ, Kelly KA (2010) baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinf 11:1–14. 10.1186/1471-2105-11-422 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BMG, Haag JD, Gould MN, Stewart RM, Kendziorski C (2013) EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29:1035–1043. 10.1093/bioinformatics/btt087 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L (2013) Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31:46–53. 10.1038/nbt.2450 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Tarazona S, Garcia-Alcalde F, Dopazo J, Ferrer A, Conesa A (2011) Differential expression in RNA-seq: a matter of depth. Genome Res 21:2213–2223. 10.1101/gr.124321.111 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Tarazona S, Furió-Tarí P, Turra D, Di Pietro A, Nueda MJ, Ferrer A, Conesa A (2015) Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res 43:e140. 10.1093/nar/gkv711 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Li J, Tibshirani R (2013) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Stat Methods Med Res 22:519–536. 10.1177/0962280211428386 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Law CW, Chen Y, Shi W, Smyth GK (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15:1–17. 10.1186/gb-2014-15-2-r29 [DOI] [PMC free article] [PubMed] [Google Scholar]









