Author manuscript; available in PMC: 2012 Nov 3.
Published in final edited form as: IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1281–1292. doi: 10.1109/TCBB.2012.83

CEDER: Accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq

Lin Wan 1, Fengzhu Sun 2
PMCID: PMC3488134  NIHMSID: NIHMS407820  PMID: 22641709

Abstract

RNA-Seq is widely used in transcriptome studies, and the detection of differentially expressed genes (DEGs) between two classes of individuals, e.g. cases vs controls, using RNA-Seq is of fundamental importance. Many statistical methods for DEG detection based on RNA-Seq data have been developed and most of them are based on the read counts mapped to individual genes. On the other hand, genes are composed of exons and the distribution of reads for the different exons can be heterogeneous. We hypothesize that the detection accuracy of differentially expressed genes can be increased by analyzing individual exons within a gene and then combining the results of the exons. We therefore developed a novel program, termed CEDER, to accurately detect DEGs by combining the significance of the exons. CEDER first tests for differentially expressed exons yielding a p-value for each, and then gives a score indicating the potential for a gene to be differentially expressed by integrating the p-values of the exons in the gene. We showed that CEDER can significantly increase the accuracy of existing methods for detecting DEGs on two benchmark RNA-Seq datasets and simulated datasets.

Index Terms: RNA-Seq, gene expression, differentially expressed gene, high-throughput sequencing, combined p-value statistic

1 Introduction

The detection of differentially expressed genes (DEGs) between different classes of samples, e.g. cases vs controls, treated vs untreated individuals, samples from different tissues, or samples from different locations in the same tissue, is a fundamental and important problem for transcriptome studies. Many downstream biological analyses depend on accurate information about DEGs. Microarrays were originally widely used for detecting DEGs genome-wide, and many statistical methods have been developed for detecting DEGs using microarrays [1]. Recently, with the rapid development of next generation high throughput sequencing technologies including Illumina Solexa, Roche 454 and Applied Biosystems’ SOLiD, RNA-Seq is being widely used in transcriptome studies as an attractive alternative to microarrays [2], [3]. Compared to microarrays, RNA-Seq has shown superior accuracy in the measurement of gene expression levels [4], [5], and in the study of alternative splicing [6], [7]. Although great achievements have been made with RNA-Seq, it raises great challenges in computational and statistical approaches for the normalization and analysis of RNA-Seq data [8]. Among these challenges, accurate detection of differentially expressed genes using RNA-Seq data is essential and many statistical methods have been developed to achieve this objective (see Oshlack et al. for an excellent review [9]).

Microarrays give continuous measurements of transcript abundances using probe intensities. On the other hand, RNA-Seq gives discrete counts of the numbers of reads mapped to the transcripts. Such a difference makes the underlying statistical models for RNA-Seq signals (reads) different from those for microarray signals (probe intensities) [8], eventually resulting in different statistics used for detecting DEGs for RNA-Seq and microarrays [9]. The Poisson process was first used to model the distribution of RNA-Seq reads along the genome [4], [10] and software based on the Poisson distribution has been developed to detect DEGs [11]. However, the Poisson distribution assumption on the number of reads cannot capture the variability of the RNA-Seq data well, leading to high false positive rates in the detection of DEGs [9]. The negative binomial distribution was then used to model the RNA-Seq read count data in order to account for the high variability of the read count data [12]. The edgeR [13] and DESeq [14] programs, both of which are based on the negative binomial distribution for the number of reads mapped to the genes, are the most widely used programs in DEG detection. The two methods make different assumptions on the variance of the negative binomial distribution and use different strategies to estimate the parameters for the dispersion term of the negative binomial model. In addition, Srivastava and Chen used the generalized Poisson model to model RNA-Seq data, aiming to account for the non-uniform distribution of the reads along the genome, and developed a tool, GPSeq, for the detection of DEGs [15].

In addition to the problem of modeling the read distribution in RNA-Seq, one major difficulty raised in the accurate detection of DEGs based on RNA-Seq data is that most RNA-Seq profiles have no or a limited number of replicates. Due to the many parameters of the statistical models for the read distribution in RNA-Seq, the statistical methods of DEG detection generally require a large number of replicates to accurately estimate the parameters and hence achieve high accuracy [16], [17]. However, in many biological studies, the number of replicates is generally low due to the high cost of RNA-Seq assay, as well as sample collection and preparation, leading, in turn, to low accuracy of DEG detection [16]. To overcome this problem, read counts in all the genes from the genome were proposed to estimate the parameters in the model and these estimated parameters were then used to detect DEGs [13], [14], [17]. The edgeR [13] and DESeq [16] are applicable as long as one class of samples has replicates and give the p-value that a gene is differentially expressed between the two classes. As p-value decreases with the number of reads and longer genes tend to have more reads, the p-value based approaches tend to find long genes. On the other hand, the ASC addresses the problem of detecting DEGs for RNA-Seq based on fold changes and is applicable to samples without replicates [17]. The ASC borrows information across genes to establish a prior distribution of sample variation, and uses the empirical Bayes method to report the posterior distribution of the log-fold-change for each gene. Instead of giving the significant result (p-value) of the statistical test that the gene is differentially expressed as other methods, ASC uses the posterior distribution of the log-fold-changes to rank genes for differential expression. Thus, ASC is more likely to select genes with large fold changes as DEGs. 
However, in real biological systems, the key genes of gene networks, such as transcription factors, exert their functions with great efficiency, so that they often show only a moderate or mild fold change between their different functional states. Thus, using the log-fold-change as the sole criterion to select DEGs may miss many important genes of interest. We think that both p-value based approaches and fold change based approaches have merits when identifying biologically functional genes.

Most genes consist of multiple exons and the read counts in the exons can have different distributions. In addition, the numbers of reads in different exons within a gene can be very different. It is well known that RNA degradation renders the number of reads in the 3′ end of a gene significantly higher than that for the 5′ end. These observations prompt us to hypothesize that the accuracy of detecting DEGs can be increased by first testing for differentially expressed exons and then combining the significance of the exons within a gene to detect differentially expressed genes. Based on these ideas, we developed an efficient strategy, termed Combining the significance of the Exons to detect Differentially Expressed gene for RNA-Seq (CEDER), to detect DEGs. The CEDER borrows information from the exons of each gene and checks whether the exons of each gene are consistently differentially expressed. Thus, CEDER makes the testing of the differential expression of each gene more robust than checking the gene as one unit. The CEDER is easy to implement based on existing statistical methods of DEG detection for RNA-Seq data.

The techniques of combining p-values have been extensively studied in meta-analysis across many different fields [18]. These techniques have also been applied to various biological problems to integrate information of different statistical tests and have achieved excellent results. For example, the combined p-value statistics have been used to integrate the genetic association testing results of multiple SNPs in a gene or genomic region [19]. In addition, the combined p-value statistics have been used to combine the p-values of the probes on microarrays, and increased the accuracy of detecting DEGs [20]; the combined p-value statistics have also been used to combine the p-values of the sliding windows in tiling arrays and achieved high accuracies in the detection of genomic regions enriched for transcription factor binding [21].

In this study, we applied the combined p-value statistics to CEDER for the detection of differentially expressed genes based on RNA-Seq data. In the implementation of CEDER, we first used a negative binomial distribution based method, DESeq [14], for DEG detection to detect differentially expressed exons. To combine the p-values of exons from DESeq, we studied three widely-used methods for combining the p-values: Fisher’s method [22], Stouffer’s Z-score method [23] and the minimum p-value method [24]. When applying the combined p-values by the three methods to select DEGs on the gold standard MicroArray Quality Control (MAQC) datasets [25] as well as simulated datasets, we showed that CEDER implemented using the Fisher’s and the minimum p-value methods can significantly increase the accuracy of DESeq for the detection of DEGs. Thus, CEDER provides a powerful alternative for the detection of differentially expressed genes.

The paper is organized as follows. In Section 2, we provide details on CEDER and the MAQC datasets for validation. In Section 3, we compare CEDER with several widely-used methods for detecting DEGs and show that CEDER can significantly increase the accuracy of detecting DEGs. The paper concludes with some discussion and conclusions.

2 Materials and Methods

We developed a novel strategy, CEDER, that can significantly increase the accuracy of the statistical methods for DEG detection based on RNA-Seq data. The CEDER integrates the significance of differential expression for different exons within a gene using existing DEG detection methods (e.g. DESeq [14]) for accurate detection of DEGs. The method is particularly useful when the distributions of reads in the exons within a gene are so different that the read counts within a gene cannot be modeled with a relatively simple distribution such as the negative binomial distribution, which has been widely used in most available methods [13], [14]. In the following subsections, we introduce the techniques of combining p-value statistics; relevant methods of DEG detection for RNA-Seq; simulation studies; and the MAQC datasets and data processing procedures used in this study.

2.1 Different methods to combine p-values of differential expression of exons

The CEDER first tests each exon for differential expression with DESeq [14] using exon read counts as input. Consider a gene having k exons. For the i-th exon, i = 1, 2, …, k, we first test the null hypothesis $H_{i0}$ that the exon has the same expression level in the two classes of samples versus the alternative hypothesis $H_{i1}$ that the exon is differentially expressed between the two classes using some testing procedure, e.g. DESeq [14]. Let $p_i$ be the resulting p-value.

In this study, we only consider reads mapped completely to the exons and discard those mapped to exon junctions. For given samples from the two classes, the numbers of reads mapped to the exons can be assumed to be independent since the exons do not overlap. We emphasize though that when the individual samples are considered random, the numbers of reads mapped to the exons are no longer independent because they relate to each other through the samples.

Since the p-values $p_1, p_2, \ldots, p_k$ can be assumed independent for given samples, several methods for combining p-value statistics [26] for the exons within a gene can be used to detect differentially expressed genes. In this study, three widely used methods for combining p-values, Fisher’s method [22], Stouffer’s Z-score method [23] and the minimum p-value method [24], are studied.

2.1.1 The Fisher’s method

The Fisher’s method combines information across multiple tests using the statistic

$$\chi^2 = -2\sum_{i=1}^{k} \ln p_i, \tag{1}$$

where $p_i$ is the p-value of the i-th exon. If none of the exons are differentially expressed within the given samples, the $p_i$ are independent and follow a uniform distribution on the unit interval [0, 1]. Hence $\chi^2$ has a chi-square distribution with 2k degrees of freedom, where k is the number of exons tested, when no exons are differentially expressed. We use the score $S_{\mathrm{Fisher}} = -\ln P(X \ge \chi^2)$ to rank the genes for differential expression, where X has a chi-square distribution with 2k degrees of freedom and $\chi^2$ is the value calculated from the observed data.
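The Fisher score can be sketched in a few lines of Python. This is an illustrative sketch, not the CEDER implementation: the function name `fisher_score`, the flooring of zero p-values, and the use of the closed-form chi-square tail for even degrees of freedom are assumptions made here; the cap of 20 follows the upper bound on scores described at the end of Section 2.1.

```python
import math

def fisher_score(p_values, cap=20.0):
    """Return -ln P(X >= chi2), X ~ chi-square with 2k df, capped at `cap`."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one exon p-value")
    # half_stat = chi2 / 2 = -sum(ln p_i); zero p-values are floored (assumption)
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    if half_stat == 0.0:
        return 0.0  # all p-values equal 1, so the tail probability is 1
    # For even degrees of freedom 2k the tail has a closed form:
    #   P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    # The sum is evaluated in log space to avoid overflow for large statistics.
    log_terms = [j * math.log(half_stat) - math.lgamma(j + 1) for j in range(k)]
    m = max(log_terms)
    log_series = m + math.log(sum(math.exp(t - m) for t in log_terms))
    log_tail = -half_stat + log_series
    return min(max(-log_tail, 0.0), cap)
```

For a gene with a single exon the score reduces to −ln of that exon's p-value, which gives a quick sanity check of the formula.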

2.1.2 The Stouffer’s Z-score method

The Stouffer’s Z-score method [23] first transforms $p_i$ into $Z_i = \Phi^{-1}(1 - p_i)$, where $\Phi$ is the cumulative distribution function of the standard normal distribution, and then combines the $Z_i$ using

$$Z = \frac{\sum_{i=1}^{k} Z_i}{\sqrt{k}}. \tag{2}$$

When none of the exons are differentially expressed within the samples, Z has the standard normal distribution.

A weighted version of the Stouffer’s method was proposed as follows [23]

$$Z_w = \frac{\sum_{i=1}^{k} w_i Z_i}{\sqrt{\sum_{i=1}^{k} w_i^2}}, \tag{3}$$

where the $w_i$ are non-negative weights. $Z_w$ still has the standard normal distribution when none of the exons are differentially expressed. It was shown that the square root of the sample size for each test can be used as an appropriate weight [23]. We thus choose $w_i = \sqrt{N_i}$, where $N_i$ is the number of reads mapped to the i-th exon across the samples. Similar to the Fisher’s method, we use the score $S_{\mathrm{Stouffer}} = -\ln P(Z_w \ge z_w)$ to rank the genes for differential expression, where the value $z_w$ is obtained from the observed data.
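A minimal Python sketch of the weighted Stouffer score follows. It assumes the weights are the square roots of the exon read counts, as suggested by the text; the function name `stouffer_score`, the clamping of extreme p-values, and the cap of 20 (taken from the upper bound on scores stated later in this section) are illustrative choices rather than details of the CEDER program.

```python
import math
from statistics import NormalDist

def stouffer_score(p_values, read_counts, cap=20.0):
    """Return -ln P(Z_w >= z_w) for the weighted Stouffer statistic, capped."""
    std_normal = NormalDist()
    # Phi^{-1}(1 - p) equals -Phi^{-1}(p); the latter is more accurate for small p.
    # p-values are clamped away from 0 and 1 (assumption) so inv_cdf is defined.
    zs = [-std_normal.inv_cdf(min(max(p, 1e-300), 1.0 - 1e-12)) for p in p_values]
    ws = [math.sqrt(n) for n in read_counts]  # w_i = sqrt(N_i)
    denom = math.sqrt(sum(w * w for w in ws))
    if denom == 0.0:
        raise ValueError("at least one exon must have a positive read count")
    z_w = sum(w * z for w, z in zip(ws, zs)) / denom
    # Upper-tail probability of the standard normal distribution at z_w
    tail = 0.5 * math.erfc(z_w / math.sqrt(2.0))
    return min(-math.log(max(tail, 1e-300)), cap)
```

An exon with zero reads receives zero weight and therefore does not influence the combined score.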

2.1.3 The minimum p-value method

Let $p_{[1]}$ be the minimum of $p_1, p_2, \ldots, p_k$. When none of the exons are differentially expressed, $p_{[1]}$ has a beta distribution with parameters 1 and k. Similar to the above two methods, we use the score $S_{\mathrm{minimum}} = -\ln P(X \le p_{[1]})$, where X has the beta distribution with parameters 1 and k, to rank the genes for differential expression [24].
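Because the Beta(1, k) distribution function has the closed form 1 − (1 − x)^k, the minimum p-value score needs no special functions. The sketch below is illustrative only; the name `min_p_score` and the handling of the boundary cases are assumptions, and the cap of 20 follows the upper bound on scores described in the next paragraph.

```python
import math

def min_p_score(p_values, cap=20.0):
    """Return -ln P(X <= p_min) for X ~ Beta(1, k), capped at `cap`."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one exon p-value")
    p_min = min(p_values)
    if p_min <= 0.0:
        return cap   # tail probability is 0, so the score is unbounded
    if p_min >= 1.0:
        return 0.0   # tail probability is 1
    # P(X <= x) = 1 - (1 - x)^k, evaluated stably with log1p/expm1
    prob = -math.expm1(k * math.log1p(-p_min))
    return min(-math.log(prob), cap)
```

With k = 1 the score again reduces to −ln of the single exon p-value.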

Note that, for all three methods, we do not refer to the scores as joint p-values because the individual p-values $p_1, p_2, \ldots, p_k$ may be dependent when the samples are random. Thus, the scores defined above do not have the same meaning as the concept of a log p-value in statistics. These scores are only used to rank the genes for differential expression and do not provide statistical significance for differential expression. Thus, we evaluate these methods by comparing the set of truly differentially expressed genes with the top ranked genes based on these scores as well as other methods for detecting DEGs. We set the upper bound of all scores to 20, meaning that we assign a score of 20 to all genes with a score over 20.

2.2 Methods for the detection of DEGs

Many statistical methods have been developed to detect DEGs based on RNA-Seq read count data [9]. Among these methods, we compared CEDER with two widely used programs edgeR [13] and DESeq [14]. The edgeR and DESeq methods were implemented with “edgeR” and “DESeq” packages in Bioconductor (Release Version 2.8) with their default settings. The inputs of edgeR and DESeq are the read counts of the genes. We compared edgeR and DESeq on MAQC benchmark data and found DESeq to be more accurate in detecting differentially expressed genes (see Fig. 3). We thus used DESeq to detect differentially expressed exons for CEDER. When using DESeq to detect the differentially expressed exons, the input to DESeq is the read counts of the exons. We studied extensively the performance of CEDER and DESeq using both simulated and the MAQC datasets.

Fig. 3. The distributions of p-values by edgeR and DESeq for the non-differentially expressed genes based on the MAQC data.

We also compared CEDER with two other methods, GPSeq [15] and ASC [17], using the MAQC datasets. We did not implement GPSeq in this study but directly compared the CEDER results with the GPSeq results on the MAQC benchmark data (dataset I) in [15]. Note that the input of GPSeq is the number of mapped reads at each position of the gene. ASC [17] was run using updated code provided by its authors. The input of ASC is the read counts of the genes. Note that ASC reports the log-fold-changes of the genes.

2.3 Simulation Studies

We carried out extensive simulation studies to see if CEDER can significantly increase the accuracy of detecting DEGs compared to the original method DESeq, on which CEDER is based. The simulations were carried out as follows. For a given gene g, we first generated its base expression level (mean value) $\lambda_g$, drawn from an exponential distribution with rate 1/250. We then generated its log-fold-change $\delta_g$ between two classes A and B, drawn from a normal distribution with mean 0 and standard deviation σ. Its expression level $\lambda_{gj}$ in sample j was defined as $2^{\delta_g/2}\lambda_g$ if sample j was from class A, or $2^{-\delta_g/2}\lambda_g$ if sample j was from class B. We next generated the read count $N_{gj}$ of gene g in sample j from a negative binomial distribution as

$$N_{gj} \sim \mathrm{NB}(\mathrm{mu} = \alpha_j \lambda_{gj},\ \mathrm{size} = 1/0.2),$$

where $\alpha_j$ is the size factor, which is related to the sequencing depth for sample j. This simulation strategy is the same as in [14]. We simulated the read count data of 5,000 genes for each sample. Genes with $|\delta_g| < 0.2$ were defined as non-differentially expressed and genes with $|\delta_g|$ no smaller than a given Threshold for the LOg-Fold-change (TLOF) were defined as differentially expressed.
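As a hedged reading of the notation above: assuming NB(mu, size) follows the mean/size parameterization used, for example, by R's `rnbinom`, the variance is mu + mu²/size, so size = 1/0.2 = 5 corresponds to a dispersion of 0.2:

$$\mathrm{E}[N_{gj}] = \alpha_j \lambda_{gj}, \qquad \mathrm{Var}[N_{gj}] = \alpha_j \lambda_{gj} + 0.2\,(\alpha_j \lambda_{gj})^2 .$$

For example, a gene with $\alpha_j \lambda_{gj} = 250$ would have variance $250 + 0.2 \times 250^2 = 12{,}750$, i.e. a standard deviation of about 113, compared with about 15.8 under a Poisson model with the same mean.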

To generate the exon read count data, we allocated each of the $N_{gj}$ reads of gene g in sample j to one of the gene’s exons with probabilities $p_{1g}, p_{2g}, \ldots, p_{kg}$, where k is the number of exons in gene g and $\sum_{i=1}^{k} p_{ig} = 1$. Due to various biases present in RNA-Seq, the reads are generally not uniformly distributed across exons. One major bias in RNA-Seq comes from RNA degradation [27], with the result that exons near the 3′ end of the gene generally have significantly more reads mapped than the other exons (see Table 5 and [28]). To capture this major effect of RNA degradation, we generated the $p_{ig}$ by first sampling $P_{1g}, \ldots, P_{kg}$ from an exponential distribution with rate 1, and then letting $p_{ig} = P_{ig} / \sum_{j=1}^{k} P_{jg}$. In this way, our simulation is more likely to allocate the majority (> 75%) of the reads to only 1 or 2 exons of the gene when k ≥ 5.
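The allocation step can be sketched as follows, using only the Python standard library. The function names `exon_probabilities` and `allocate_reads` and the explicit random generator argument are illustrative assumptions; the sketch covers the exon allocation only and takes the gene-level read count as given.

```python
import random

def exon_probabilities(k, rng):
    """Draw k Exp(1) variables and normalize them to sum to 1."""
    raw = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(raw)
    return [x / total for x in raw]

def allocate_reads(n_reads, probs, rng):
    """Assign each of n_reads reads to one exon according to probs."""
    counts = [0] * len(probs)
    for exon in rng.choices(range(len(probs)), weights=probs, k=n_reads):
        counts[exon] += 1
    return counts

# Example: one gene with 5 exons and 1,000 reads in a given sample
rng = random.Random(0)
probs = exon_probabilities(5, rng)
counts = allocate_reads(1000, probs, rng)
```

In a full simulation the same exon probabilities would presumably be reused for a given gene across samples, while the gene-level count changes from sample to sample.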

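TABLE 5.

Read counts in gene LHX6 (NM_199160) and its exons from the 3′ end to the 5′ end.

| Exon | Dataset I: # hbr | Dataset I: # uhr | Dataset I: p-value by DESeq | Dataset II: # hbr | Dataset II: # uhr | Dataset II: p-value by DESeq |
| --- | --- | --- | --- | --- | --- | --- |
| 1 (3′) | 436 | 292 | 0.343 | 336 | 209 | 0.345 |
| 2 | 44 | 47 | 0.507 | 22 | 28 | 0.608 |
| 3 | 15 | 3 | 0.157 | 6 | 1 | 0.227 |
| 4 | 22 | 0 | 0.019 | 12 | 0 | 0.037 |
| 5 | 35 | 2 | 0.044 | 24 | 1 | 0.052 |
| 6 | 14 | 1 | 0.107 | 13 | 0 | 0.033 |
| 7 | 36 | 5 | 0.079 | 24 | 2 | 0.070 |
| 8 | 8 | 0 | 0.061 | 5 | 0 | 0.084 |
| 9 (5′) | 13 | 2 | 0.148 | 8 | 1 | 0.177 |
| Total (gene level) | 623 | 352 | 0.304 | 450 | 242 | 0.312 |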

We carried out two types of simulations. Firstly, as a direct comparison between CEDER and DESeq, we did a relatively simple simulation using the same parameters as in [14]. Two samples from class A (A1 and A2) and three samples from class B (B1, B2 and B3) were simulated with σ = 2. The size factors, αj, for the five simulated samples are: 1 (A1), 1.3 (A2), 0.7 (B1), 0.9 (B2) and 1.6 (B3). We generated two sets of independent gene/exon read count data: (1) one set with all the genes having k = 5 exons, and (2) the other set with all the genes having k = 10 exons. In this simple simulation, we let TLOF = 3.

Secondly, we carried out more extensive simulations to study the effects of the following factors on the performance of CEDER: a) the standard deviation of the log-fold-changes, σ = 1.5, 2.0, 2.5; b) the threshold for the log-fold-changes for selecting the differentially expressed genes, TLOF = 1.5, 2.0, 2.5, 3.0; c) the ratio of the size factors between the two classes of samples, λ = αB/αA = 1.1, 1.2, 1.6, 1.8 (we fixed αA = 1); and d) the number of exons in a gene, k = 5, 10. To see the potential variation in the performance of CEDER among different samples, we generated 10 samples from class A and 10 samples from class B for each parameter combination (σ, λ, k). Each A sample was compared to each B sample, resulting in a total of 100 comparisons. We present the mean and standard error of the resulting AUC scores for the 100 comparisons.

2.4 The MAQC RNA-Seq data processing

We used two RNA-Seq datasets generated from the MAQC samples [29] to compare CEDER with other methods for detecting DEGs. Each dataset covers two samples: the human brain reference (hbr) sample and the universal human reference (uhr) sample. The samples were sequenced on the Illumina GA2 platform by two different laboratories, that of Dr. Dudoit at the University of California at Berkeley and that of Dr. Wu at Genentech.

Dataset I from Dr. Dudoit’s group contains RNA-Seq reads of length 35 bp from the two samples with just one biological replicate. Each sample was prepared with an individual library and was sequenced in seven lanes on two flow cells. The details of the data are given in [25]. Note that biological differences are confounded with the differences introduced by the library preparation methods, as noted in [25]. This dataset was downloaded from the NCBI Sequence Read Archive (SRA) with ID SRX016359 (MAQC Brain exp 2 using phi X control lane) and SRX016367 (MAQC UHR exp 2 using phi X control lane).

Dataset II from Dr. Wu’s group contains RNA-Seq reads of length 50 bp from the same samples, also with just one replicate. Each sample was sequenced in seven lanes on one flow cell. The details of the dataset can be found in [30]. This dataset was downloaded from the NCBI Gene Expression Omnibus (GEO) with ID GSE24284 (GSM597210 for hbr and GSM597211 for uhr). The reads from the seven lanes of each sample were merged together. Although each sample was sequenced in seven lanes, the data from individual lanes cannot be considered as replicates (see the DESeq manual).

We used Bowtie (version 0.12.5) [31] as the mapping tool and mapped all reads to the human genome (hg19). For the two RNA-Seq datasets on the hbr and uhr samples, we allowed 2 mismatches for the 35 bp reads and 3 mismatches for the 50 bp reads. We only kept the uniquely mapped reads and did not consider the junction reads, as in [28]. The RefSeq annotation (downloaded from the UCSC Genome Browser on July 1, 2011) was used as the gene annotation.

2.5 Processing of the qRT-PCR data

For the hbr and uhr samples, the expression levels of 997 genes were quantitatively measured with TaqMan Gene Expression Assay by the MicroArray Quality Control (MAQC) project [32]. We used the qRT-PCR results as gold standard to evaluate the statistical methods for detecting DEGs based on RNA-Seq data. The qRT-PCR data were downloaded from NCBI GEO with series ID GSE5350. The four replicates of uhr sample were downloaded from GSM129638 to GSM129641, and the four replicates of brain sample were downloaded from GSM129642 to GSM129645.

We only kept the genes with a uniquely matched RefSeq ID (due to the annotation differences between RefSeq and MAQC) and filtered out genes with zero read counts in all samples of the RNA-Seq data, resulting in a total of 946 genes. Following Bullard et al. [25], we defined the expression level of gene j in the i-th replicate as $Y_{i,j} \equiv \Delta C_{i,j} \times \log 2$, where $\Delta C_{i,j} = C_{i,\mathrm{POLR2A}} - C_{i,j}$, with C being the raw value of the cycle number and POLR2A being the house-keeping gene. The log-fold-change was defined as $\bar{Y}_{\mathrm{uhr},j} - \bar{Y}_{\mathrm{hbr},j}$, the difference of the average values across the 4 replicates. Genes with an absolute log-fold-change > 2 are considered differentially expressed (positive set), and genes with an absolute log-fold-change < 0.2 are considered non-differentially expressed (negative set). Among the 946 genes we kept, a total of 304 genes are in the positive set of differentially expressed genes and a total of 141 genes are in the negative set of non-differentially expressed genes.
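The construction of the gold-standard sets can be illustrated with a short Python sketch. The function names and the use of `None` for genes that fall between the two cutoffs are assumptions made here for illustration; the cutoffs 2 and 0.2 are those stated above, and the inputs are taken to be the per-replicate expression levels already computed for one gene.

```python
def log_fold_change(y_uhr, y_hbr):
    """Difference of the replicate means (uhr minus hbr) for one gene."""
    return sum(y_uhr) / len(y_uhr) - sum(y_hbr) / len(y_hbr)

def label_gene(lfc, de_cutoff=2.0, non_de_cutoff=0.2):
    """Assign a gene to the positive set, the negative set, or neither."""
    magnitude = abs(lfc)
    if magnitude > de_cutoff:
        return "DEG"        # positive set
    if magnitude < non_de_cutoff:
        return "non-DEG"    # negative set
    return None             # ambiguous genes are left out of the evaluation
```

Genes labelled `None` are excluded, which is consistent with only 445 of the 946 genes entering the evaluation.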

2.6 Evaluation of different methods for detecting DEGs

We used the receiver operating characteristic (ROC) curve to evaluate the different methods for detecting DEGs. For a given method of detecting DEGs, we rank the genes in descending order according to the scores of the genes with high scores corresponding to high probability of being differentially expressed. For a given threshold value, genes with score above the threshold are predicted as differentially expressed and genes with score below the threshold are predicted as non-differentially expressed. By comparing the prediction results with the gold standard partition of the 445 genes based on qRT-PCR, we have Table 1. The true positives (TP) are the genes that are validated as differentially expressed by qRT-PCR and predicted to be differentially expressed by the score based on the RNA-Seq data. The false negatives (FN), false positives (FP), and true negatives (TN) are similarly defined as in standard classification problems and are given in Table 1. The false positive rate (FPR) and the true positive rate (TPR) are defined as

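TABLE 1.

Definition of true and false positives by comparing the predicted differentially expressed genes with the gold standard from qRT-PCR.

| qRT-PCR | RNA-Seq prediction: DEG | RNA-Seq prediction: non-DEG | Total |
| --- | --- | --- | --- |
| DEG | TP | FN | TP + FN |
| non-DEG | FP | TN | FP + TN |
| Total | TP + FP | FN + TN | T |

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{TPR} = \frac{TP}{TP + FN}. \tag{4}$$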

The receiver operating characteristic (ROC) curve depicts the relationship between TPR and FPR. We used the area under the ROC curve (AUC) to evaluate the different methods for detecting DEGs. We calculated two kinds of AUC: (1) AUC1 is the area under the ROC curve in the full range of FPR 0 ≤ FPR ≤ 1; (2) AUC2 is the area under the ROC curve in the range of 0 ≤ FPR ≤ 0.05.
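The two AUC measures can be computed from ranked scores as sketched below. This is an illustrative Python sketch rather than the evaluation code used in the study: the function names, the grouping of tied scores into a single ROC point, and the trapezoidal integration with linear interpolation at the FPR cutoff are assumptions. Note that under this definition AUC2 is not rescaled, so its maximum possible value is 0.05.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR); labels are True for DEG, False for non-DEG."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one DEG and one non-DEG")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])  # high score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Genes with tied scores cross the threshold together
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points

def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for 0 <= FPR <= max_fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For p-value based methods such as DESeq, a score such as −ln(p) can be passed in so that higher scores indicate stronger evidence; AUC1 is then `partial_auc(points)` and AUC2 is `partial_auc(points, 0.05)`.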

2.7 Availability

The CEDER program and the supplementary materials are available from http://www-rcf.usc.edu/%7Efsun/programs.html.

3 Results

We present the results first for simulation studies, then the MAQC benchmark dataset, and finally an example of a novel differentially expressed gene identified using CEDER.

3.1 Results from the simulation studies

We first studied the performance of CEDER using the simulated RNA-Seq data. Since edgeR is only applicable to situations where at least one class of samples has replicates [13] and the p-values output by edgeR are biased toward 0 when the data have no replicates (see Fig. 3), we implemented CEDER with DESeq [14] as the basic tool for detecting differentially expressed exons.

The programs DESeq and CEDER were used to detect DEGs for the six comparisons (A1-B1, A1-B2, A1-B3, A2-B1, A2-B2 and A2-B3) based on the first set of relatively simple simulations. Among the 5,000 simulated genes, a total of 661 (k = 5 exons)/656 (k = 10 exons) genes are in the positive set of differentially expressed genes and a total of 393 (k = 5 exons)/394 (k = 10 exons) genes are in the negative set of non-differentially expressed genes.

By setting different thresholds for the p-values from DESeq and the different scores from CEDER, we calculated the true positive rate (TPR) and the false positive rate (FPR) and plotted the ROC curves for DESeq and CEDER (Fig. 1 for the case k = 5 and Fig. 2 for the case k = 10). Note that in the simulated scenarios, the AUC1 scores of DESeq are low, at most 0.7 except for the pair A1-B2. In these situations, CEDER with the minimum p-value method has the highest AUC1 score and the increase in AUC1 score from DESeq to CEDER can be substantial. For example, for the pair A2-B3 when k = 5, the AUC1 score for DESeq is only 0.69, while the AUC1 score for CEDER with the minimum p-value method is 0.83, a 20% increase. When restricted to the region with a low false positive rate (FPR ≤ 0.05), the AUC2 score for CEDER with the Fisher’s method is the highest among all the methods studied, except for the pair A2-B1 when k = 5, 10 and the pair A1-B2 when k = 10.

Fig. 1. ROC curves for DESeq and CEDER on simulated data for genes with 5 exons.

Fig. 2. ROC curves for DESeq and CEDER on simulated data for genes with 10 exons.

We next extensively studied the factors affecting the performance of CEDER by changing the parameters (σ, TLOF, λ, k). The complete results for all the parameters are given in the supplementary materials. The results for k = 5 are similar to the results for k = 10. Thus, we just present the results for k = 5 in the main text. Table 2 gives the average AUC scores together with their standard errors for DESeq and CEDER estimated from the 100 comparisons for various values of σ = 1.5, 2, 2.5 and TLOF = 1.5, 2, 2.5, 3 when the size factor ratio λ = 1.1. It is interesting to note that the AUC scores (both AUC1 and AUC2) for DESeq decrease rapidly with σ. When σ ≥ 2.5, the p-values output by DESeq perform worse than random guessing, with AUC1 scores below 0.5. An explanation for this observation is that DESeq assumes that most genes are non-differentially expressed when it fits the negative binomial distribution to the read count data. When σ is high, most genes are differentially expressed, violating the underlying assumptions of DESeq. We first look at the AUC1 scores (Table 2). When σ = 2.0, the AUC1 score for DESeq is around 0.75, which is low. On the other hand, CEDER with the Fisher’s method increases the AUC1 score to 0.79 when TLOF = 1.5, and the increase is largest when TLOF = 3.0, from 0.73 to 0.87. CEDER with the Stouffer’s and the minimum p-value methods also increases the AUC1 score of DESeq in most cases, but by less than the Fisher’s method. When σ = 1.5, DESeq has decent performance with AUC1 scores of 0.83 or higher. Even in this situation, the AUC1 score for CEDER with the Fisher’s method is still consistently higher than that for DESeq. When looking at the AUC2 scores (Table 2), in most cases the Fisher’s method gives the largest increase over DESeq when the AUC2 score of DESeq is relatively high (e.g. > 0.01), whereas the Stouffer’s Z-score method gives the largest increase when the AUC2 score of DESeq is between 0.005 and 0.01.

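TABLE 2.

The average AUC score and its standard error (Std) of DESeq and CEDER with Fisher’s, Stouffer’s and the minimum p-value methods for different σ (the standard deviation of the log-fold-changes) and different thresholds of log-fold change (TLOF) for declaring significant differential expression.

AUC1 (0 ≤ FPR ≤ 1)

| σ | TLOF | DESeq Mean | DESeq Std | CEDER Fisher Mean | CEDER Fisher Std | CEDER Stouffer Mean | CEDER Stouffer Std | CEDER Minimum Mean | CEDER Minimum Std |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.5 | 1.5 | 0.83 | 0.01 | 0.85 | 0.01 | 0.80 | 0.01 | 0.83 | 0.01 |
| 1.5 | 2.0 | 0.87 | 0.01 | 0.91 | 0.01 | 0.85 | 0.01 | 0.88 | 0.01 |
| 1.5 | 2.5 | 0.90 | 0.01 | 0.94 | 0.01 | 0.88 | 0.01 | 0.93 | 0.01 |
| 1.5 | 3.0 | 0.91 | 0.01 | 0.96 | 0.01 | 0.91 | 0.01 | 0.95 | 0.01 |
| 2.0 | 1.5 | 0.73 | 0.01 | 0.79 | 0.01 | 0.77 | 0.01 | 0.73 | 0.02 |
| 2.0 | 2.0 | 0.75 | 0.01 | 0.83 | 0.01 | 0.80 | 0.01 | 0.78 | 0.02 |
| 2.0 | 2.5 | 0.75 | 0.02 | 0.85 | 0.02 | 0.82 | 0.01 | 0.81 | 0.02 |
| 2.0 | 3.0 | 0.73 | 0.02 | 0.87 | 0.02 | 0.83 | 0.02 | 0.84 | 0.02 |
| 2.5 | 1.5 | 0.49 | 0.02 | 0.43 | 0.02 | 0.48 | 0.02 | 0.46 | 0.02 |
| 2.5 | 2.0 | 0.48 | 0.02 | 0.42 | 0.02 | 0.48 | 0.02 | 0.46 | 0.02 |
| 2.5 | 2.5 | 0.46 | 0.02 | 0.41 | 0.02 | 0.46 | 0.02 | 0.46 | 0.02 |
| 2.5 | 3.0 | 0.45 | 0.03 | 0.40 | 0.02 | 0.45 | 0.02 | 0.45 | 0.02 |

AUC2 (0 ≤ FPR ≤ 0.05)

| σ | TLOF | DESeq Mean | DESeq Std | CEDER Fisher Mean | CEDER Fisher Std | CEDER Stouffer Mean | CEDER Stouffer Std | CEDER Minimum Mean | CEDER Minimum Std |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.5 | 1.5 | 0.017 | 0.002 | 0.019 | 0.002 | 0.018 | 0.002 | 0.013 | 0.002 |
| 1.5 | 2.0 | 0.021 | 0.003 | 0.025 | 0.003 | 0.023 | 0.002 | 0.017 | 0.002 |
| 1.5 | 2.5 | 0.027 | 0.003 | 0.031 | 0.003 | 0.028 | 0.003 | 0.023 | 0.003 |
| 1.5 | 3.0 | 0.031 | 0.003 | 0.036 | 0.003 | 0.032 | 0.003 | 0.028 | 0.003 |
| 2.0 | 1.5 | 0.005 | 0.002 | 0.007 | 0.002 | 0.009 | 0.002 | 0.002 | 0.001 |
| 2.0 | 2.0 | 0.007 | 0.002 | 0.009 | 0.002 | 0.011 | 0.002 | 0.003 | 0.001 |
| 2.0 | 2.5 | 0.008 | 0.002 | 0.012 | 0.003 | 0.013 | 0.003 | 0.004 | 0.002 |
| 2.0 | 3.0 | 0.010 | 0.003 | 0.014 | 0.003 | 0.015 | 0.003 | 0.005 | 0.002 |
| 2.5 | 1.5 | 0.001 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2.5 | 2.0 | 0.001 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2.5 | 2.5 | 0.002 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2.5 | 3.0 | 0.002 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 |

The size factor ratio λ = 1.1, k = 5 exons per gene, and each sample contains 5,000 genes.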

We also studied the effect of the size factor ratio λ on the AUC scores of DESeq and CEDER. Table 3 gives the mean AUC scores and their standard errors for DESeq and CEDER for different values of λ when σ = 2, TLOF = 3, and k = 5. The AUC1 score for DESeq decreases with λ, indicating that the program does not fully account for the size factor ratio. In contrast, the AUC2 score for DESeq increases with λ when σ ≥ 2. In both situations, CEDER can increase the AUC1 and AUC2 scores.

TABLE 3.

The average AUC score and its standard error (Std) of DESeq and CEDER with Fisher’s, Stouffer’s and the minimum p-value methods for different size factor ratio λ = αB/αA.

AUC1 (0 ≤ FPR ≤ 1)

       DESeq          CEDER-Fisher   CEDER-Stouffer  CEDER-Minimum
λ      Mean   Std     Mean   Std     Mean   Std      Mean   Std
1.1    0.73   0.02    0.87   0.02    0.83   0.02     0.84   0.02
1.2    0.71   0.02    0.83   0.02    0.79   0.02     0.84   0.01
1.6    0.64   0.01    0.68   0.02    0.67   0.01     0.75   0.01
1.8    0.62   0.01    0.64   0.01    0.64   0.01     0.70   0.01

AUC2 (0 ≤ FPR ≤ 0.05)

       DESeq          CEDER-Fisher   CEDER-Stouffer  CEDER-Minimum
λ      Mean   Std     Mean   Std     Mean   Std      Mean   Std
1.1    0.010  0.003   0.014  0.003   0.015  0.003    0.005  0.002
1.2    0.013  0.002   0.016  0.002   0.014  0.002    0.010  0.003
1.6    0.015  0.002   0.018  0.002   0.016  0.002    0.016  0.002
1.8    0.016  0.002   0.019  0.001   0.017  0.002    0.018  0.002

The other parameters are σ = 2, TLOF = 3, k = 5 exons per gene, and each sample contains 5,000 genes.

When looking at the AUC1 score, it is important to note that in situations where the AUC1 score of DESeq is below 0.70, CEDER with the minimum p-value method performs better than CEDER with Fisher’s method. For example, when the size factor ratio λ is 1.6, the AUC1 score for DESeq is 0.64 and the AUC1 score of CEDER with the minimum p-value method is 0.75, an increase of 0.11. In contrast, the AUC1 scores of CEDER with Fisher’s and Stouffer’s methods are 0.68 and 0.67, respectively. When looking at the AUC2 score, it is also important to note that in situations where the AUC2 score of DESeq is ≤ 0.01, Stouffer’s method performs better than Fisher’s method. Besides these two cases, CEDER with Fisher’s method performs better than CEDER with Stouffer’s and the minimum p-value methods. The complete data in the supplementary material show that this observation is generally true, although there are a few exceptions.

Finally, although it requires more computation time than the original DESeq, CEDER runs in a reasonable amount of time. For example, CEDER takes less than 10 minutes to process 5,000 genes with 10 exons each on a PC with a 3.33 GHz CPU.

3.2 Detecting DEGs using the MAQC benchmark datasets

We evaluated the performance of CEDER using the benchmark RNA-Seq data of the human brain reference (hbr) and universal human reference (uhr) samples. The expression levels of 997 genes in these two samples were previously measured with qRT-PCR by the MAQC project and have been widely used as a standard [32]. We followed Bullard et al. [25] to preprocess the qRT-PCR data, and selected 304 genes as a positive set of differentially expressed genes and 141 genes as a negative set of non-differentially expressed genes from the 946 genes (see Section 2.5 for details). It is worth noting that the 304 + 141 = 445 test genes we selected have only a single isoform according to the RefSeq annotation, and thus we can exclude differential expression in the form of differentially expressed isoforms.

We used two independent RNA-Seq datasets, dataset I from [25] and dataset II from [30], of the hbr and uhr samples to test CEDER separately. Neither of the two datasets had biological replicates for the samples. We mapped the reads of each sample in each dataset to the human genome (hg19) with Bowtie [31] and only kept the uniquely mapped reads (see Section 2.4 for details). We did not consider the junction reads to avoid dependency of read counts for adjacent exons.

We first compared two widely used programs, edgeR [13] and DESeq [14], for detecting DEGs between hbr and uhr using datasets I and II, respectively. The input to the two programs is the read counts of all the genes annotated in RefSeq. We found that the p-values given by edgeR are usually small, and the histogram of p-values for the 141 non-DEGs is highly skewed toward 0 (Fig. 3), which is not reasonable for genes that are not differentially expressed. In contrast, the histogram of the p-values of the non-DEGs given by DESeq is more reasonable, with a moderate shift toward 1 (Fig. 3). Robinson et al. [13] also pointed out in their original paper that edgeR is only applicable to situations in which at least one sample has replicates. We thus used DESeq to detect differentially expressed exons for CEDER when there are no replicates.
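
The check described above can be sketched as a simple binning of the p-values of the known non-DEGs. This is only an illustration; the function name `pvalue_histogram` and the choice of ten equal-width bins are assumptions, not details from the paper. For a well-calibrated test, the p-values of non-DEGs should be roughly uniform, so each bin should hold about the same number of genes, whereas a pile-up in the first bin corresponds to the skew toward 0 reported for edgeR.

```python
def pvalue_histogram(pvals, n_bins=10):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        # p = 1.0 would index one past the end, so clamp it into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Toy p-values for a handful of hypothetical non-DEGs.
counts = pvalue_histogram([0.01, 0.02, 0.5, 1.0], n_bins=10)
```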

Based on the above observations, we used DESeq to detect differentially expressed exons in CEDER. For datasets I and II separately, DESeq outputs a p-value for each exon indicating the statistical significance of differential expression between the hbr and the uhr samples. For a given gene, we then combined the p-values of its exons using Fisher’s, Stouffer’s (unweighted version), and the minimum p-value methods, respectively, to give a score for each gene. We then varied the score threshold, calculated the true positive rate and false positive rate at each threshold, and plotted the ROC curves for Fisher’s method, Stouffer’s Z-score method and the minimum p-value method (Fig. 4). The three versions of CEDER outperform DESeq in both datasets, except for the AUC2 score of the minimum p-value method. We first look at the AUC1 score. In dataset I, the AUC1 score of 0.81 for the original DESeq was increased to 0.89, 0.9, and 0.82 for Fisher’s, the minimum p-value, and Stouffer’s methods, respectively; in dataset II, the AUC1 score of 0.81 for the original DESeq was increased to 0.92, 0.91, and 0.86 for Fisher’s, Stouffer’s, and the minimum p-value methods, respectively. In both datasets, the AUC1 score was increased by at least 10% for Fisher’s and the minimum p-value methods. When looking at the AUC2 score, in dataset I, the AUC2 score of 0.025 for the original DESeq was increased to 0.028 and 0.027 for Fisher’s and Stouffer’s Z-score methods, respectively; in dataset II, the AUC2 score of 0.025 for the original DESeq was increased to 0.029 for both Fisher’s and Stouffer’s methods; the minimum p-value method did not increase the AUC2 score of DESeq in either dataset. Note that the calculated AUC2 scores are not accurate in the MAQC datasets because the data points in the neighborhood of FPR = 0.05 are sparse. We also tried the weighted version of Stouffer’s Z-score method, but the AUC score of the weighted version is lower than that of the unweighted version (data not shown).
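
The three integration rules named above can be sketched with the standard textbook forms of Fisher’s, unweighted Stouffer’s, and the minimum p-value combinations, all of which assume independent p-values. This is a minimal illustration rather than the CEDER implementation: the function names are hypothetical, and the exact transformation from combined p-value to gene score is defined in the methods section and is not reproduced here.

```python
import math
from statistics import NormalDist


def chi2_sf_even(x, k):
    """P(X > x) for a chi-square variable with 2k degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_p(pvals):
    """Fisher: -2 * sum(ln p_i) follows chi-square with 2k df under the null."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return chi2_sf_even(stat, len(pvals))


def stouffer_p(pvals):
    """Unweighted Stouffer: Z = sum(z_i) / sqrt(k), with z_i = -Phi^{-1}(p_i)."""
    nd = NormalDist()
    # Clamp to keep inv_cdf inside its open domain (0, 1); an assumption of this sketch.
    clamped = [min(max(p, 1e-300), 1.0 - 1e-12) for p in pvals]
    z = sum(-nd.inv_cdf(p) for p in clamped) / math.sqrt(len(clamped))
    return nd.cdf(-z)


def min_p(pvals):
    """Minimum p-value (Tippett): 1 - (1 - p_min)^k."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


# Toy exon p-values for one hypothetical gene.
exon_pvals = [0.05, 0.20, 0.40, 0.03]
combined = (fisher_p(exon_pvals), stouffer_p(exon_pvals), min_p(exon_pvals))
```

Any monotone decreasing transform of the combined p-value, such as its negative logarithm, produces the same gene ranking and therefore the same ROC curve.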

Fig. 4.

ROC curves of different DEG detection methods on the MAQC datasets.

To further compare DESeq and CEDER, we ranked the genes by their scores from DESeq and CEDER and counted, among the top-ranked genes, how many are true positives (defined as log-fold change (LOF) > 2 by qRT-PCR) and how many are false positives (defined as LOF < 0.2 by qRT-PCR). The results are given in Table 4. For dataset I, when the top 150 ranked genes are selected, the number of true positives selected by DESeq (141) is slightly higher than that selected by CEDER-Fisher (137), and the number of false positives is small (2) for both methods. On the other hand, when the top 300 ranked genes are selected, the number of true positives selected by CEDER-Fisher (224) is about 7% higher than that selected by DESeq (209), and the number of false positives selected by CEDER-Fisher (6) is slightly smaller than that selected by DESeq (8). The same conclusions hold for dataset II. Thus, the advantage of CEDER-Fisher over DESeq may lie in the range where between 20% and 30% of the genes are selected.

TABLE 4.

The number of genes with large log-fold change (LOF>2) and the number of genes with small log-fold change (LOF <0.2) in the score ranked genes selected by different methods.

Dataset I

            DESeq              CEDER-Fisher       CEDER-Stouffer     CEDER-Minimum
#Top Rank   #LOF>2  #LOF<0.2   #LOF>2  #LOF<0.2   #LOF>2  #LOF<0.2   #LOF>2  #LOF<0.2
50          50      0          48      0          48      0          48      0
100         98      1          97      1          92      1          95      1
150         141     2          137     2          130     2          136     2
200         177     4          175     3          166     2          174     4
250         198     7          207     3          194     3          206     6
300         209     8          224     6          206     7          239     9

Dataset II

            DESeq              CEDER-Fisher       CEDER-Stouffer     CEDER-Minimum
#Top Rank   #LOF>2  #LOF<0.2   #LOF>2  #LOF<0.2   #LOF>2  #LOF<0.2   #LOF>2  #LOF<0.2
50          50      0          47      1          48      0          47      1
100         98      1          97      1          93      1          96      1
150         142     2          138     2          134     2          141     2
200         178     4          179     3          171     2          178     3
250         199     5          210     3          201     3          211     5
300         212     7          230     5          219     5          241     11

The LOF of the gene was calculated based on the qRT-PCR data (see Section 2.5 for details).
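
The counting behind Table 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `top_n_counts` is hypothetical, higher scores are taken to mean stronger evidence, and the thresholds are applied to the absolute value of the qRT-PCR log-fold change, which is an assumption about how LOF is signed.

```python
def top_n_counts(scores, lof, n, high=2.0, low=0.2):
    """Among the n top-scoring genes, count those with |LOF| > high and |LOF| < low.

    scores: per-gene scores from a detection method (higher = more significant).
    lof:    per-gene qRT-PCR log-fold changes, in the same gene order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:n]
    n_high = sum(1 for i in top if abs(lof[i]) > high)  # counted as true positives
    n_low = sum(1 for i in top if abs(lof[i]) < low)    # counted as false positives
    return n_high, n_low


# Toy example with five hypothetical genes.
scores = [5.0, 4.0, 3.0, 2.0, 1.0]
lof = [3.0, -2.5, 0.1, 1.0, 0.05]
counts_top3 = top_n_counts(scores, lof, 3)
```

Genes with intermediate log-fold changes fall into neither count, which is why the two columns for a method in Table 4 do not sum to the number of selected genes.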

Since GPSeq [15] requires positional read count data, we did not run GPSeq in this study but instead compared the results from CEDER with GPSeq’s published Fig. 3 in [15] for dataset I. Comparing Fig. 4 (left plot for dataset I) in this paper with Fig. 3 in [15] shows that GPSeq slightly outperforms CEDER with Fisher’s or the minimum p-value methods. However, the numbers of genes used by GPSeq for its ROC curve (218 for the positive set and 74 for the negative set) are much smaller than those we include here (304 for the positive set and 141 for the negative set), although both studies used the same criteria to select test genes based on the qRT-PCR data. The reduction in the number of genes in GPSeq is due to the fact that the GPSeq program does not converge for about 1/3 or more of the genes and thus is not applicable to those genes.

We also applied ASC with its default settings to analyze datasets I and II with the same gene count data as input. Unlike the other methods, which report the statistical significance of differential expression for each gene, ASC reports log-fold-changes of the genes between the hbr and the uhr samples. By setting different thresholds on the log-fold-change, we can also obtain the ROC curve for ASC. The AUC score for ASC is higher than the AUC scores for CEDER, DESeq, edgeR and GPSeq (data not shown). The better performance of ASC compared to the other methods may be due to the fact that the genes defined as the gold standard positives and negatives were selected using fold-changes in the qRT-PCR data, which biases the test gene set in favor of ASC. The DEGs detected by ASC tend to have high fold changes, and ASC is more likely to miss genes that have moderate or low fold changes but are in fact differentially expressed. The methods based on statistical significance, such as CEDER, DESeq, edgeR and GPSeq, are therefore not directly comparable to methods based on fold changes such as ASC.

3.3 CEDER identifies a novel gene LHX6 as differentially expressed between the hbr and uhr samples

The gene LHX6 (NM_199160) functions as a transcriptional regulator and is involved in the control of differentiation and development of neural and lymphoid cells. It has 9 exons with a total length of 3,330 bp based on the RefSeq annotation. The qRT-PCR assay by the MAQC project [32] showed that LHX6 is differentially expressed between the hbr and uhr samples with a large log-fold-change of 3.58, which ranks 191st highest among the 946 genes.

We applied DESeq to the gene-level read count data using datasets I and II and found that the p-value of LHX6 given by DESeq is not significant (p-value = 0.303, score 1.16, rank 323/946 for dataset I, and p-value = 0.312, score 1.19, rank 315/946 for dataset II). Meanwhile, when ASC is applied to the gene-level read count data, the calculated log-fold-change of LHX6 is 0.654 (rank 654/946) for dataset I and 0.628 (rank 669/946) for dataset II. We studied the read counts mapped to the exons of LHX6 for the two RNA-Seq datasets more carefully. The read counts differ greatly among the exons, and the read counts in the hbr sample are higher than those in the uhr sample for 8 out of the 9 exons in both datasets. Thus, we looked at the CEDER results more carefully.

We noted that most of the mapped reads in gene LHX6 are located in the first two exons closest to the 3′ end (see Table 5; the first two exons are indexed by “1” and “2”, respectively). The first two exons account for 77% of all the reads for the hbr sample and 96% of all the reads for the uhr sample in dataset I. In dataset II, the first two exons account for 80% of all the reads for the hbr sample and 98% of the reads for the uhr sample. Even though the numbers of reads in the remaining exons are small, they differ markedly between the two samples across all the remaining exons.

The program CEDER was then used to analyze the data. The p-values of the exons given by DESeq are listed in Table 5 and they range from 0.019 to 0.608. None of the exons is significantly differentially expressed after correcting for multiple tests. We then applied the three integration methods to combine the p-values of the exons in LHX6. Fisher’s method gives a score of 5.91 with rank 176/946 based on dataset I and a score of 6.27 with rank 169/946 based on dataset II. Stouffer’s Z-score method gives a score of 7.65 with rank 161/946 based on dataset I and a score of 8.26 with rank 159/946 based on dataset II. The minimum p-value method gives a score of 1.33 with rank 204/946 based on dataset I and a score of 1.84 with rank 165/946 based on dataset II. Thus, the score rank of LHX6 given by CEDER is much closer to the rank of the log-fold-change by qRT-PCR (191/946) than the ranks given by DESeq and ASC. Meanwhile, previous studies showed that 30% or more of the genes are differentially expressed between different biological samples. Thus, when 20–30% of the genes are selected as DEGs in a genomic study, LHX6 will be selected by CEDER but is likely to be missed by DESeq and ASC.

4 Conclusions and Discussion

In this paper, we developed a novel method, CEDER, to rank genes for differential expression by first testing for differentially expressed exons and then integrating the significance of differential expression of the exons within a gene. Three methods for integrating the significance of the exons were considered: Fisher’s, Stouffer’s, and the minimum p-value methods. Through extensive simulations and two MAQC benchmark datasets, we showed that CEDER significantly increases the accuracy of detecting differentially expressed genes. In general, when looking at the AUC1 score, CEDER with Fisher’s method works best when the AUC1 score for DESeq is above 0.7, and CEDER with the minimum p-value method works best when the AUC1 score of DESeq is between 0.5 and 0.7. In some situations the AUC1 score of DESeq is below 0.5, and CEDER does not work in these situations either. When looking at the AUC2 score, CEDER with Fisher’s method works best when the AUC2 score for DESeq is above 0.01, and CEDER with Stouffer’s method works best when the AUC2 score of DESeq is between 0.005 and 0.01, except for a few situations. Thus, different methods have their own optimal parameter ranges. A possible explanation is that the minimum p-value method uses only the information from the one exon with the minimum p-value, while Fisher’s method combines the information from the p-values of all exons within a gene. Although theoretically the power of Stouffer’s method is nearly identical to the power of Fisher’s method, the difference in their performance may be due to the fact that the p-values of the exons given by DESeq under the null model are not uniform. CEDER with Fisher’s method is the recommended choice when no prior knowledge about the data is available.

In this study, we used DESeq as the basic tool for detecting differentially expressed exons. However, this is not a restriction; any appropriate method for DEG detection based on RNA-Seq data can be used. CEDER is also not restricted to testing for differential expression at the exons of each gene. For example, when a gene has very few exons, or only one exon, we can implement our strategy by first segmenting the gene into several non-overlapping fragments and then testing for differentially expressed fragments of each gene. However, the length of the segments needs to be chosen carefully to achieve high detection accuracy. If the segments are too short, there is little power to detect differential expression for the segments, resulting in low detection accuracy for the differentially expressed genes. If the segments are too long, the read distribution within the segments may be heterogeneous, again resulting in low detection accuracy. This is a topic for further study.
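
The segmentation idea above can be sketched as follows. This is only an illustration of the proposal, not a tested part of CEDER: the function names, the fixed segment length, the half-open coordinate convention, and the use of read start positions to assign reads are all assumptions.

```python
def segment_gene(start, end, seg_len):
    """Split the half-open interval [start, end) into consecutive non-overlapping segments.

    The last segment may be shorter than seg_len.
    """
    if seg_len <= 0 or end <= start:
        raise ValueError("need seg_len > 0 and end > start")
    return [(s, min(s + seg_len, end)) for s in range(start, end, seg_len)]


def count_reads(read_starts, segments):
    """Count reads per segment, assigning each read by its start position."""
    counts = [0] * len(segments)
    for pos in read_starts:
        for idx, (s, e) in enumerate(segments):
            if s <= pos < e:
                counts[idx] += 1
                break  # segments do not overlap, so at most one match
    return counts


# Toy example: a 10-bp single-exon gene cut into segments of length 4.
segments = segment_gene(0, 10, 4)
counts = count_reads([0, 3, 4, 9, 12], segments)  # the read at 12 lies outside the gene
```

The per-segment counts from the two samples could then be tested and combined in the same way as exon counts, with the segment length controlling the trade-off between power and within-segment heterogeneity discussed above.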

Although the three p-value integration methods we studied are non-parametric and make no assumptions about the underlying distribution of the data, they rely on the independence of the p-values. For given samples, the reads mapped to the exons can be assumed independent. However, the p-values of the exons are dependent when the samples are random. Thus, the scores we defined in this study do not have the same meaning as p-values in statistics. These scores are only used to rank the genes for differential expression and do not provide statistical significance for differential expression. As an alternative, more advanced methods such as those in Kechris et al. [21] can be used to combine the p-values of the exons when the dependence among the p-values is known.

In this study, we only considered genes as single transcripts. However, in most eukaryotes, a large percentage of genes have multiple isoforms, and RNA-Seq is showing great promise in the study of alternative splicing [6], [7]. CEDER can also be applied to genes with multiple isoforms. Since the detection of differentially expressed isoforms is more complicated [28] and we currently lack benchmark RNA-Seq datasets with experimentally validated isoform expression levels, we do not address this topic here and leave it for future studies.

Acknowledgments

This research was supported by NIH grants No. P50 HG 002790 and 1 U01 HL108634. FS is also supported by National Natural Science Foundation of China (60928007 and 60805010) and Tsinghua National Laboratory for Information Science and Technology (TNLIST) Cross-discipline Foundation.

Biographies

Lin Wan received the BS degree in Physics from Nanjing University in 2003, and the PhD degree in Probability and Mathematical Statistics from Peking University in 2009. He is a member of the National Center for Mathematics and Interdisciplinary Sciences, and the Key Laboratory of Systems and Control, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences. He is also a postdoctoral research associate at the Molecular and Computational Biology Program, University of Southern California. His current research interests include Computational Biology and Systems Biology.

Fengzhu Sun received the BS degree in Mathematics from Shandong University, the MS degree in Probability and Statistics from Peking University, and the PhD degree in Applied Mathematics from University of Southern California. He is a Professor of Molecular and Computational Biology, University of Southern California. He is also a faculty member of the Bioinformatics and Systems Biology Chair Professor Team, Tsinghua University. Dr Sun works in the area of Computational Biology and Bioinformatics, Statistical Genetics, and Mathematical Modeling. His recent research interests include protein interaction networks, gene expression, single nucleotide polymorphisms (SNP), linkage disequilibrium (LD) and their applications in predicting protein functions, gene regulation networks, and disease gene identification. He is also interested in metagenomics, in particular, marine genomics.

Contributor Information

Lin Wan, The Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA. He is also with the National Center for Mathematics and Interdisciplinary Sciences, and the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China.

Fengzhu Sun, Email: fsun@usc.edu, The Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA. He is also with the TNLIST/Department of Automation, Tsinghua University, Beijing 100084, P.R. China.

References

  • 1. Speed T. Statistical Analysis of Gene Expression Microarray Data. Chapman and Hall/CRC; 2003.
  • 2. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63. doi: 10.1038/nrg2484.
  • 3. Haas BJ, Zody MC. Advancing RNA-Seq analysis. Nat Biotechnol. 2010;28(5):421–423. doi: 10.1038/nbt0510-421.
  • 4. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-Seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18(9):1509–1517. doi: 10.1101/gr.079558.108.
  • 5. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–628. doi: 10.1038/nmeth.1226.
  • 6. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456(7221):470–476. doi: 10.1038/nature07509.
  • 7. Hawkins RD, Hon GC, Ren B. Next-generation genomics: an integrative approach. Nat Rev Genet. 2010;11(7):476–486. doi: 10.1038/nrg2795.
  • 8. Pachter L. Models for transcript quantification from RNA-Seq. arXiv. 2011. Available: http://arxiv.org/abs/1104.3889.
  • 9. Oshlack A, Robinson MD, Young MD. From RNA-Seq reads to differential expression results. Genome Biol. 2010;11(12):220. doi: 10.1186/gb-2010-11-12-220.
  • 10. Jiang H, Wong WH. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics. 2009;25(8):1026–1032. doi: 10.1093/bioinformatics/btp113.
  • 11. Wang L, Feng Z, Wang X, Wang X, Zhang X. DEGseq: an R package for identifying differentially expressed genes from RNA-Seq data. Bioinformatics. 2010;26(1):136–138. doi: 10.1093/bioinformatics/btp612.
  • 12. Robinson MD, Smyth GK. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008;9(2):321–332. doi: 10.1093/biostatistics/kxm030.
  • 13. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140. doi: 10.1093/bioinformatics/btp616.
  • 14. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. doi: 10.1186/gb-2010-11-10-r106.
  • 15. Srivastava S, Chen L. A two-parameter generalized Poisson model to improve the analysis of RNA-Seq data. Nucleic Acids Res. 2010;38(17):e170. doi: 10.1093/nar/gkq670.
  • 16. Auer PL, Doerge RW. Statistical design and analysis of RNA sequencing data. Genetics. 2010;185(2):405–416. doi: 10.1534/genetics.110.114983.
  • 17. Wu Z, Jenkins BD, Rynearson TA, Dyhrman ST, Saito MA, Mercier M, Whitney LP. Empirical Bayes analysis of sequencing-based transcriptional profiling without replicates. BMC Bioinformatics. 2010;11(1):564. doi: 10.1186/1471-2105-11-564.
  • 18. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. Academic Press; 1985.
  • 19. Chapman J, Whittaker J. Analysis of multiple SNPs in a candidate gene or region. Genetic Epidemiology. 2008. doi: 10.1002/gepi.20330.
  • 20. Hess A, Iyer H. Fisher’s combined p-value for detecting differentially expressed genes using Affymetrix expression arrays. BMC Genomics. 2007;8:96. doi: 10.1186/1471-2164-8-96.
  • 21. Kechris KJ, Biehs B, Kornberg TB. Generalizing moving averages for tiling arrays using combined p-value statistics. Stat Appl Genet Mol Biol. 2010;9(1):Article29. doi: 10.2202/1544-6115.1434.
  • 22. Fisher RA. Statistical Methods for Research Workers. Oliver & Boyd; 1970.
  • 23. Liptak T. On the combination of independent tests. Magyar Tud Akad Mat Kutato Int Kozl. 1958;3:171–197.
  • 24. Tippett LHC. The Methods of Statistics. Williams and Norgate; 1931.
  • 25. Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010;11:94. doi: 10.1186/1471-2105-11-94.
  • 26. Krishnaiah PR, Sen PK, editors. Handbook of Statistics 4: Nonparametric Methods. Elsevier; 1984. pp. 113–121. ch. Combination of Independent Tests.
  • 27. Pepke S, Wold B, Mortazavi A. Computation for ChIP-seq and RNA-Seq studies. Nat Methods. 2009;6(11 Suppl):S22–S32. doi: 10.1038/nmeth.1371.
  • 28. Wu Z, Wang X, Zhang X. Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq. Bioinformatics. 2011;27(4):502–508. doi: 10.1093/bioinformatics/btq696.
  • 29. Shi L, Reid L, Jones W, Shippy R, Warrington J, Baker S, Collins P, De Longueville F, Kawasaki E, Lee K, et al. The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006;24(9):1151–1161. doi: 10.1038/nbt1239.
  • 30. Nacu S, Yuan W, Kan Z, Bhatt D, Rivers CS, Stinson J, Peters BA, Modrusan Z, Jung K, Seshagiri S, Wu TD. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples. BMC Med Genomics. 2011;4:11. doi: 10.1186/1755-8794-4-11.
  • 31. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25.
  • 32. Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Lee KY, Ma Y, Maqsodi B, Papallo A, Peters EH, Poulter K, Ruppel PL, Samaha RR, Shi L, Yang W, Zhang L, Goodsaid FM. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol. 2006;24(9):1115–1122. doi: 10.1038/nbt1236.
