Bioinformatics and Biology Insights. 2008 Feb 29;2:145–156. doi: 10.4137/bbi.s473

A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments

Akihiro Hirakawa 1, Yasunori Sato 2, Chikuma Hamada 3, Isao Yoshimura 3
PMCID: PMC2735945  PMID: 19812772

Abstract

Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.

Keywords: differentially expressed genes, false discovery rate, microarray, shrunken sample variance, significance analysis of microarray, t-type score

Introduction

In recent biological studies, the development of DNA microarray technology allows simultaneous measurement of the expression levels of thousands of genes and identification of genes that are differentially expressed across cells or tissues under different conditions such as normal or disease conditions. Identifying differentially expressed genes poses complex multiple testing problems and reduces the power for statistical tests because thousands of genes are evaluated simultaneously in typical microarray experiments. In such cases, it is difficult to precisely identify differentially expressed genes using traditional statistical methods, and this difficulty is exacerbated when the sample size is small.

The traditional test statistics for comparing the gene expression levels under different conditions are the Student’s t-statistic, Welch t-statistic, and Mann-Whitney statistic. These test statistics have been used since the beginning of microarray experiments, although their performances have not been sufficiently evaluated. Since then, many test statistics suitable for microarray data have been proposed, e.g. Golub’s discrimination score (Golub et al. 1999), the SAM-statistic (Tusher et al. 2001), the t-type score, and the samroc statistic (Broberg, 2003). Golub’s discrimination score is a statistic wherein the mean difference between two groups is divided by the sum of the standard deviations of both groups. The SAM-statistic is a modified Student’s t-statistic with a correction term added to its denominator. The samroc statistic is a ROC-based SAM-statistic whose performance is almost the same as that of the SAM-statistic. Recently, some new test statistics have been proposed (Chen et al. 2007; Opgen-Rhein and Strimmer, 2007; Sartor et al. 2006).

Choosing an appropriate test statistic is essential for devising an effective method for identifying differentially expressed genes. We need to suppress both false positives, which are genes that are incorrectly identified as differentially expressed genes, and false negatives, which are genes that are incorrectly identified as nondifferentially expressed genes. Both false positives and false negatives mainly arise due to the underestimation and overestimation of the variance of gene expression levels, respectively. The Student’s t-statistic, Welch t-statistic, and Golub’s discrimination score leave both the underestimation and overestimation of variance uncontrolled, resulting in an increased risk of both false positives and false negatives (Broberg, 2003; Pan, 2002). The SAM-statistic and t-type score (Pan et al. 2003) with a correction term added to the denominator of the Welch t-statistic can suppress false positives by controlling the underestimation of variance, but they leave the overestimation uncontrolled. To control the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type (Cui et al. 2005; Stein, 1956; Stein, 1964) in the denominator of the t-type score. The shrunken sample variances, which can borrow information across genes using the James-Stein shrinkage concept, can control the overestimation of variance. The variance stabilized t-type score can suppress both false positives and false negatives by adding the correction term and placing the shrunken sample variances, respectively.

According to the article published by Dupuy and Simon (2007), from among the many test statistics, the SAM statistic, Golub’s discrimination score, the t-statistic, and the Mann-Whitney statistic have been frequently used in actual microarray data analysis. We, therefore, conducted a simulation study to compare the performances of the Mann-Whitney statistic, Golub’s discrimination score, the Welch t-statistic, the t-type score, and the variance stabilized t-type score in small microarray experiments (Simulation study 1).

The false discovery rate (FDR) introduced by Benjamini and Hochberg (1995) as a criterion for optimizing the identifiability of differentially expressed genes is also essential for devising an effective method. The FDR is defined as the expected proportion of false positives among total positives, which represent all identified genes. Using the FDR, we can identify differentially expressed genes using a cut-off value corresponding to the target FDR, e.g. FDR <0.05. Recently, many FDR estimation methods have been proposed, such as Significance Analysis of Microarrays (SAM) (Tusher et al. 2001), the empirical Bayes (EB) method (Efron, 2001), and the mixture model method (MMM) (Pan et al. 2003). Among these, SAM is widely used (Dupuy and Simon, 2007) and estimates the number of false positives based on a permutation procedure. Note that SAM provides two FDRs, i.e. the mean FDR and median FDR. In the original description of SAM, the mean FDR is obtained by using the mean of the estimated number of false positives in each permutation, while the SAM software in the R package (Chu et al. 2005) provides the median FDR using the median of the estimated number of false positives in each permutation. Although the values of both FDRs are quite different in the actual microarray data analysis, the relative superiority of the two FDRs is unclear. Therefore, we examined the accuracy of both the mean and the median FDRs by using SAM in small microarray experiments (Simulation study 2).

Additionally, SAM using the variance stabilized t-type score was applied to colorectal cancer data comprising approximately 22,000 probesets of 3 cells from the primary tumor and 3 cells from a lymph node metastasis (http://www.ncbi.nlm.nih.gov/projects/geo/gds/; Provenzani et al. 2006) in order to demonstrate the practical application of both FDRs using the variance stabilized t-type score.

Materials and Methods

Test statistics

For each gene $i$, $i = 1, 2, \ldots, g$, the expression levels are $X_{i1}, \ldots, X_{im}$ from $m$ samples collected from cells or tissues under Condition 1 and $Y_{i1}, \ldots, Y_{in}$ from $n$ samples collected from cells or tissues under Condition 2. $X_{i1}, \ldots, X_{im}$ are normal random variables with true mean $\mu_{Xi}$ and true variance $\sigma_{Xi}^2$, and $Y_{i1}, \ldots, Y_{in}$ are normal random variables with true mean $\mu_{Yi}$ and true variance $\sigma_{Yi}^2$. If the true means of the two conditions are different, the gene is defined as being differentially expressed, and if the true means of the two conditions are the same, the gene is defined as being nondifferentially expressed.

The Mann-Whitney statistic is a nonparametric test statistic that does not assume specific distributions. For gene $i$, all the expression levels are ranked without regard to which sample they are in, being arranged into a single ranked series. Let $u_i$ denote the Mann-Whitney statistic for gene $i$; $u_i$ can be written as

$$u_i = \frac{12mn\,(\bar{R}_{Xi} - \bar{R}_{Yi})^2}{(m+n)^2\,(m+n+1)\,f_i}, \tag{1}$$

where $\bar{R}_{Xi}$ is the mean rank of the $m$ samples in Condition 1, and $\bar{R}_{Yi}$ is the mean rank of the $n$ samples in Condition 2. Also, let $T$ be the number of groups of tied expression levels across both conditions and $t_k$ the number of tied expression levels in the $k$th group; $f_i$ can be written as $f_i = 1 - \sum_{k=1}^{T} t_k (t_k - 1)(t_k + 1) / \{(m + n)(m + n - 1)(m + n + 1)\}$.
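Equation (1) and the tie correction can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `midranks` and `mann_whitney_u` are hypothetical, tied values are assumed to receive the average of the ranks they occupy, and the degenerate case in which every value is tied (so that $f_i = 0$) is not handled.

```python
from collections import Counter


def midranks(values):
    """Ranks 1..N; tied values share the average of the ranks they occupy (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next ordered value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Statistic u_i of Equation (1) for one gene, with tie correction f_i."""
    m, n = len(x), len(y)
    total = m + n
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    mean_rank_x = sum(ranks[:m]) / m
    mean_rank_y = sum(ranks[m:]) / n
    # t_k is the size of the k-th group of equal values; groups of size 1 contribute 0.
    tie_sizes = Counter(pooled).values()
    f = 1.0 - sum(t * (t - 1) * (t + 1) for t in tie_sizes) / (
        total * (total - 1) * (total + 1)
    )
    return 12.0 * m * n * (mean_rank_x - mean_rank_y) ** 2 / (
        total ** 2 * (total + 1) * f
    )


u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])   # no ties, complete separation
u_with_ties = mann_whitney_u([1, 2, 2], [2, 3, 4])   # one tie group of size 3
```

Because the statistic depends on the squared difference of mean ranks, swapping the two conditions leaves it unchanged.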

Golub’s discrimination score is a test statistic that is similar to the Welch t-statistic. Let $d_i$ denote Golub’s discrimination score for gene $i$; $d_i$ can be written as

$$d_i = \frac{\bar{X}_i - \bar{Y}_i}{\sqrt{s_{Xi}^2} + \sqrt{s_{Yi}^2}}, \tag{2}$$

where $\bar{X}_i = \sum_{j=1}^{m} X_{ij}/m$ and $\bar{Y}_i = \sum_{j=1}^{n} Y_{ij}/n$ are the sample means for gene $i$ under Conditions 1 and 2, respectively, and $s_{Xi}^2 = \sum_{j=1}^{m} (X_{ij} - \bar{X}_i)^2/(m - 1)$ and $s_{Yi}^2 = \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_i)^2/(n - 1)$ are the sample variances for gene $i$ under Conditions 1 and 2, respectively.

The Welch t-statistic is a typical test statistic for comparing gene expression levels under two conditions. Let $w_i$ denote the Welch t-statistic for gene $i$; $w_i$ can be written as

$$w_i = \frac{\bar{X}_i - \bar{Y}_i}{\sqrt{s_{Xi}^2/m + s_{Yi}^2/n}}. \tag{3}$$

When the Welch t-statistic is used, thousands of genes are evaluated simultaneously, so some of them have sample variances that are underestimated under both conditions by chance. The absolute test statistic of such a gene becomes large even though its mean difference is not meaningfully large, and the total positives then contain many false positives. For suppressing false positives, the t-type score with a correction term added to the denominator of the Welch t-statistic has been proposed based on the basic idea of the SAM-statistic. Let $z_i$ denote the t-type score for gene $i$; $z_i$ can be written as

$$z_i = \frac{\bar{X}_i - \bar{Y}_i}{\sqrt{s_{Xi}^2/m + s_{Yi}^2/n} + a_0}, \tag{4}$$

where $a_0$ is a correction term and is the 90th percentile of $\{\sqrt{s_{Xi}^2/m + s_{Yi}^2/n} : i = 1, \ldots, g\}$. The correction term serves as a control for the underestimation of variance, and the t-type score can suppress false positives when it is used. However, the t-type score leaves the overestimation of variance uncontrolled, resulting in an increased risk of false negatives.
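The effect of the correction term in Equation (4) can be seen in a small sketch that computes the Welch t-statistic and the t-type score side by side. The function names and the toy data are hypothetical, and the `percentile` helper uses linear interpolation between order statistics, which is an assumption because the text does not say how the 90th percentile is computed. Genes are stored as rows, and every gene is assumed to have a nonzero standard error.

```python
import math
import statistics


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics (assumption)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def welch_standard_error(x, y):
    """sqrt(s_X^2/m + s_Y^2/n) for one gene, using unbiased sample variances."""
    return math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))


def welch_and_t_type(X, Y, q=90):
    """Return (Welch t-statistics, t-type scores) for genes stored as rows of X and Y."""
    diffs = [statistics.fmean(x) - statistics.fmean(y) for x, y in zip(X, Y)]
    ses = [welch_standard_error(x, y) for x, y in zip(X, Y)]
    a0 = percentile(ses, q)  # correction term shared by all genes
    welch = [d / se for d, se in zip(diffs, ses)]
    t_type = [d / (se + a0) for d, se in zip(diffs, ses)]
    return welch, t_type


# Toy data: the third gene has a tiny mean difference (0.1) and a tiny variance.
X = [[1.0, 1.1, 0.9], [2.0, 3.0, 1.0], [0.5, 0.5001, 0.4999]]
Y = [[0.0, 0.2, -0.2], [0.0, 1.0, -1.0], [0.4, 0.4001, 0.3999]]
welch, t_type = welch_and_t_type(X, Y)
```

In this toy example the third gene receives a very large Welch t-statistic purely because its variance is small, whereas its t-type score stays modest once $a_0$ is added to the denominator.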

We devised the variance stabilized t-type score by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score to control the overestimation of variance. Let $v_i$ denote the variance stabilized t-type score for gene $i$; $v_i$ can be written as

$$v_i = \frac{\bar{X}_i - \bar{Y}_i}{\sqrt{\tilde{s}_{Xi}^2/m + \tilde{s}_{Yi}^2/n} + \tilde{a}_0}, \tag{5}$$

where

$$\tilde{s}_{Xi}^2 = s_{Xi}^2 - \frac{s_{Xi}^2}{s_{Xi}^2 + \bar{s}_X^2}\,(s_{Xi}^2 - \bar{s}_X^2), \qquad \bar{s}_X^2 = \sum_{i=1}^{g} s_{Xi}^2 / g,$$

and

$$\tilde{s}_{Yi}^2 = s_{Yi}^2 - \frac{s_{Yi}^2}{s_{Yi}^2 + \bar{s}_Y^2}\,(s_{Yi}^2 - \bar{s}_Y^2), \qquad \bar{s}_Y^2 = \sum_{i=1}^{g} s_{Yi}^2 / g,$$

are the shrunken sample variances for gene $i$ under the two conditions, and $\tilde{a}_0$ is the 90th percentile of $\{\sqrt{\tilde{s}_{Xi}^2/m + \tilde{s}_{Yi}^2/n} : i = 1, \ldots, g\}$. Based on Equation (5), when $s_{Xi}^2$ ($s_{Yi}^2$) is larger than $\bar{s}_X^2$ ($\bar{s}_Y^2$), $s_{Xi}^2$ ($s_{Yi}^2$) is shrunken toward $\bar{s}_X^2$ ($\bar{s}_Y^2$). On the other hand, when $s_{Xi}^2$ ($s_{Yi}^2$) is smaller than $\bar{s}_X^2$ ($\bar{s}_Y^2$), $s_{Xi}^2$ ($s_{Yi}^2$) is not shrunken toward $\bar{s}_X^2$ ($\bar{s}_Y^2$) and remains as it is. These behaviors of the shrunken sample variances control the overestimation of variance. After the overestimation is controlled, the correction term controls the underestimation of variance, as in the t-type score. Thus, the variance stabilized t-type score can suppress false positives and false negatives simultaneously.
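The shrinkage step and Equation (5) can be sketched as follows. This is an illustrative sketch, not the authors' code, and it rests on stated assumptions: the function names are hypothetical, the `percentile` helper interpolates linearly, and the `one_sided` flag is introduced here to reflect the description that variances below the across-gene mean are left unchanged. Applied literally to a small variance, the printed shrinkage formula would raise it toward the mean; setting `one_sided=False` gives that literal reading.

```python
import math
import statistics


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics (assumption)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def shrink_variances(variances, one_sided=True):
    """Shrink per-gene sample variances toward their across-gene mean."""
    mean_var = statistics.fmean(variances)
    shrunken = []
    for s2 in variances:
        excess = s2 - mean_var
        if one_sided and excess < 0:
            # Assumption: variances below the mean stay as they are, as the text describes.
            excess = 0.0
        shrunken.append(s2 - s2 / (s2 + mean_var) * excess)
    return shrunken


def variance_stabilized_scores(X, Y, q=90, one_sided=True):
    """Scores v_i of Equation (5) for genes stored as rows of X and Y."""
    m, n = len(X[0]), len(Y[0])  # every gene is assumed to have m and n samples
    sx = shrink_variances([statistics.variance(x) for x in X], one_sided)
    sy = shrink_variances([statistics.variance(y) for y in Y], one_sided)
    ses = [math.sqrt(a / m + b / n) for a, b in zip(sx, sy)]
    a0 = percentile(ses, q)  # correction term computed from the shrunken variances
    return [
        (statistics.fmean(x) - statistics.fmean(y)) / (se + a0)
        for x, y, se in zip(X, Y, ses)
    ]


# With variances 1, 1 and 4 the mean is 2: only the variance of 4 is pulled down.
example = shrink_variances([1.0, 1.0, 4.0])
```

When a variance exceeds the mean, the shrunken value equals $2 s^2 \bar{s}^2 / (s^2 + \bar{s}^2)$, which always lies between the mean and the original variance.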

False discovery rate

The FDR is a popular criterion for identifying differentially expressed genes in microarray data analysis. In this paper, we use the FDR definition given by Storey and Tibshirani (2003). For a fixed cut-off value, c, for a test statistic, we can obtain the true FDR and its estimator as

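$$\mathrm{FDR}(c) = \frac{\pi_0\,\mathrm{FP}(c)}{\mathrm{TP}(c)}, \qquad \widehat{\mathrm{FDR}}(c) = \frac{\hat{\pi}_0\,\widehat{\mathrm{FP}}(c)}{\widehat{\mathrm{TP}}(c)}, \tag{6}$$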

where π0 is the proportion of true nondifferentially expressed genes among the total candidate genes, and π̂0 is its estimator. For a fixed cut-off value, c, FP(c) is defined as the number of true false positives, and F̂P(c) is defined as the estimated number of false positives. Similarly, we define T̂P(c) as the estimated number of total positives. This definition of the FDR has been widely used in recent microarray data analysis, although the FDR is defined as F̂P(c)/T̂P(c) in the original SAM description. Based on Equation (6), it is necessary to estimate the proportion of true nondifferentially expressed genes, π0, in order to calculate the FDR. Although many statistical methods have been proposed in recent studies to estimate π0 (Chu et al. 2005; Efron et al. 2001; Lee et al. 2000; Storey, 2002), the precise estimation is very difficult (Xie et al. 2005) and is itself an unresolved research problem. Since our study does not focus on the estimation of π0, we use true π0 in the simulation study. However, in the actual data analysis, we estimate π0 using the method of Chu et al. (2005).

Mean and median FDRs

For calculating the estimated number of total positives, we calculate any test statistic $t_i$ for gene $i$, $i = 1, \ldots, g$, from raw data, corresponding to the ordered statistics $t_{(1)} \le t_{(2)} \le \cdots \le t_{(g)}$. For the fixed cut-off value, $c$, we identify gene $i$ that satisfies $|t_{(i)}| > c$ as a differentially expressed gene. The estimated number of total positives is defined as $\widehat{\mathrm{TP}}(c) = \#\{i : |t_{(i)}| > c\}$.

We also need to estimate the number of false positives (FP). SAM estimates the number of false positives by permuting all samples a total of $B$ times. For the $b$th permuted data set, we calculate the test statistics $t_i^b$, $b = 1, \ldots, B$ and $i = 1, \ldots, g$, and the corresponding ordered statistics $t_{(1)}^b \le t_{(2)}^b \le \cdots \le t_{(g)}^b$. After the permutation, we obtain the number of genes that satisfy $|t_{(i)}^b| > c$, $\#\{i : |t_{(i)}^b| > c\}$, $i = 1, \ldots, g$, in each permutation. The mean of these numbers of genes is defined as the mean $\widehat{\mathrm{FP}}(c)$, and the median of these numbers of genes is defined as the median $\widehat{\mathrm{FP}}(c)$. The mean $\widehat{\mathrm{FP}}(c)$ and median $\widehat{\mathrm{FP}}(c)$ can be represented as follows:

$$\text{mean}\,\widehat{\mathrm{FP}}(c) = \frac{1}{B} \sum_{b=1}^{B} \#\{i : |t_{(i)}^b| > c\} \tag{7}$$

and

$$\text{median}\,\widehat{\mathrm{FP}}(c) = \operatorname{median}\left(\#\{i : |t_{(i)}^1| > c\},\ \#\{i : |t_{(i)}^2| > c\},\ \ldots,\ \#\{i : |t_{(i)}^B| > c\}\right). \tag{8}$$

We place T̂P(c), mean F̂P(c), and median F̂P(c) into Equation (6) to obtain the mean FD̂R and median FD̂R for the fixed cut-off value, c, as follows

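$$\text{mean}\,\widehat{\mathrm{FDR}} = \frac{\hat{\pi}_0 \cdot \text{mean}\,\widehat{\mathrm{FP}}(c)}{\widehat{\mathrm{TP}}(c)} \tag{9}$$

and

$$\text{median}\,\widehat{\mathrm{FDR}} = \frac{\hat{\pi}_0 \cdot \text{median}\,\widehat{\mathrm{FP}}(c)}{\widehat{\mathrm{TP}}(c)}. \tag{10}$$

Similarly, the 75th percentile FD̂R and 90th percentile FD̂R are defined as

$$\text{75th perc.}\,\widehat{\mathrm{FDR}} = \frac{\hat{\pi}_0 \cdot \text{75th perc.}\,\widehat{\mathrm{FP}}(c)}{\widehat{\mathrm{TP}}(c)}$$

and

$$\text{90th perc.}\,\widehat{\mathrm{FDR}} = \frac{\hat{\pi}_0 \cdot \text{90th perc.}\,\widehat{\mathrm{FP}}(c)}{\widehat{\mathrm{TP}}(c)},$$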

respectively. Note that SAM uses the ordered statistics $t_{(1)}^b \le t_{(2)}^b \le \cdots \le t_{(g)}^b$ to determine the cut-off value, $c$, corresponding to a fixed threshold $\Delta$ on the difference between the observed ordered statistic $t_{(i)}$ and its average over the permutations (Chu et al. 2005; Tusher et al. 2001). However, in this paper, we determined the cut-off value, $c$, corresponding to the target FDR because the procedure for determining the cut-off value is not relevant to the accuracy of the estimated FDR.
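Equations (7) to (10) can be sketched as below. This is a hedged illustration, not the SAM software: all function names are hypothetical, the sample labels are assumed to be permuted jointly for all genes, and `statistic` stands for any function that maps two genes-by-samples matrices to one score per gene.

```python
import random
import statistics


def count_exceeding(scores, c):
    """Number of genes whose absolute score exceeds the cut-off c."""
    return sum(1 for t in scores if abs(t) > c)


def permutation_fp_counts(X, Y, statistic, c, B=200, seed=0):
    """False positive count #{i : |t_(i)^b| > c} for each of B label permutations."""
    rng = random.Random(seed)
    m = len(X[0])
    pooled = [list(x) + list(y) for x, y in zip(X, Y)]
    columns = list(range(len(pooled[0])))
    counts = []
    for _ in range(B):
        rng.shuffle(columns)  # assumption: one shuffle of sample labels shared by all genes
        Xp = [[row[k] for k in columns[:m]] for row in pooled]
        Yp = [[row[k] for k in columns[m:]] for row in pooled]
        counts.append(count_exceeding(statistic(Xp, Yp), c))
    return counts


def mean_and_median_fdr(observed_scores, fp_counts, c, pi0):
    """Equations (9) and (10); returns None when no gene passes the cut-off."""
    tp = count_exceeding(observed_scores, c)
    if tp == 0:
        return None
    mean_fdr = pi0 * statistics.fmean(fp_counts) / tp
    median_fdr = pi0 * statistics.median(fp_counts) / tp
    return mean_fdr, median_fdr


def mean_difference(X, Y):
    """Placeholder statistic for illustration: difference of sample means per gene."""
    return [statistics.fmean(x) - statistics.fmean(y) for x, y in zip(X, Y)]


# Hand-made counts: most permutations give 0 false positives, one gives 10.
observed = [3.0, -2.6, 0.1, 2.7]
mean_fdr, median_fdr = mean_and_median_fdr(observed, [0, 0, 0, 0, 10], c=2.5, pi0=0.9)
```

With the hand-made counts, a single permutation with many exceedances raises the mean estimate while the median estimate stays at zero, which is the kind of gap between the two FDRs that the paper examines.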

Simulation studies

We conducted Simulation study 1 to compare the performance of the five statistics, i.e. the Mann-Whitney statistic, Golub’s discrimination score, the Welch t-statistic, the t-type score, and the variance stabilized t-type score. The criterion for examining the performance is the receiver operating characteristic (ROC) curve in which the proportions of both false positives and false negatives are used (Broberg, 2003). Based on the ROC curve, we consider that a test statistic whose ROC curve lies below that of another test statistic has better performance. Simulation study 1 is designed to have 4,000 (i = 1, …, 4,000) genes in total, including s differentially expressed genes (i = 1, …, s) and 4,000-s nondifferentially expressed genes (i = s + 1, …, 4,000). Each condition has an equal sample size N (N = m = n). For j = 1, …, N, we generate

$$X_{ij} \sim \mathrm{Normal}(\mu_i, \sigma_i^2), \quad i = 1, \ldots, s,$$
$$X_{ij} \sim \mathrm{Normal}(0.0, \sigma_i^2), \quad i = s + 1, \ldots, 4{,}000,$$

and

$$Y_{ij} \sim \mathrm{Normal}(0.0, \sigma_i^2), \quad i = 1, \ldots, 4{,}000.$$

Since each true mean of the expression levels of differentially expressed genes is different, we assume a random effect model, i.e. $\mu_i \sim \mathrm{Normal}(1.0, 0.1^2)$, $i = 1, \ldots, s$. We focus on small sample size experiments under two conditions, and the sample size ($N$) is set as 3, 5, or 10. The number of differentially expressed genes ($s$) corresponding to 1% or 10% is set as 40 or 400, respectively. In the case of constant variance, we set $\sigma_i^2 = 0.5^2$. For the random-variance case, we also generate a random standard deviation for each gene from the normal distribution with mean 0.5 and variance $0.1^2$. Additionally, the number of simulation replications is set as 1,000, and the ROC curve is drawn using the average of 1,000 values of the estimated proportions of false positives and false negatives.
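One replicate of the data-generating model described above could be drawn as in the following sketch. The function name and its parameters are hypothetical, and taking the absolute value of a randomly drawn standard deviation is an added safeguard that the text does not mention (a negative draw from Normal(0.5, 0.1^2) is extremely unlikely).

```python
import random


def simulate_expression(g=4000, s=400, N=3, random_variance=False, seed=0):
    """Draw X and Y (genes as rows) for one replicate of the simulation design."""
    rng = random.Random(seed)
    X, Y = [], []
    for i in range(g):
        # Constant-variance case: sd = 0.5. Random case: sd ~ Normal(0.5, 0.1^2).
        sd = abs(rng.gauss(0.5, 0.1)) if random_variance else 0.5
        # The first s genes are differentially expressed: mu_i ~ Normal(1.0, 0.1^2).
        mu = rng.gauss(1.0, 0.1) if i < s else 0.0
        X.append([rng.gauss(mu, sd) for _ in range(N)])
        Y.append([rng.gauss(0.0, sd) for _ in range(N)])
    return X, Y


# A reduced example: 200 genes, 20 of them differentially expressed, N = 3 per condition.
X, Y = simulate_expression(g=200, s=20, N=3, seed=1)
```

Repeating such a draw 1,000 times with different seeds and averaging the resulting proportions would correspond to the replication scheme described in the text.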

We also conducted Simulation study 2 to examine the accuracy of the mean FDR, the median FDR, the 75th percentile FDR, and the 90th percentile FDR when the variance stabilized t-type score is used. The criterion for examining the accuracy is the scatter plot of the true FDR versus the estimated FDRs. In the scatter plot, a line above the diagonal shows overestimation, while a line below the diagonal shows underestimation. The simulated data and simulation conditions are the same as those of Simulation study 1. As additional conditions, the number of permutations, B, is set as 200 in SAM and the proportion of true nondifferentially expressed genes, π0, is set as 1 - s/4,000 for each condition in order to estimate the FDR in Equation (6). The scatter plot is drawn using the average of 1,000 values of the true FDR and of each estimated FDR.

Application to colorectal cancer data

Colorectal cancer (CRC) data were measured for comparing the polysomal RNA from isogenic cell lines established from a CRC patient (http://www.ncbi.nlm.nih.gov/projects/geo/gds/; Provenzani et al. 2006). In the CRC data, 3 cells each were derived from a primary tumor and from a lymph node metastasis. Each single RNA sample was subjected to microarray data analysis on Affymetrix DNA chips bearing approximately 22,000 probe sets, which collectively interrogate 14,500 human genes. Details are given in Provenzani et al. (2006). SAM using the three statistics, i.e. the Welch t-statistic, the t-type score, and the variance stabilized t-type score, was applied to these data. The number of permutations, B, is set as 200, and the proportion of true nondifferentially expressed genes, π0, is estimated by the method of Chu et al. (2005).

Results

Results of simulation studies

We discuss only the constant variance cases because the ROC curves in Simulation study 1 and the scatter plots in Simulation study 2 are almost the same for the constant variance cases and the random variance cases. Figure 1 shows the performance of each test statistic, based on the ROC curve. The t-type score and the variance stabilized t-type score outperformed the other three test statistics, irrespective of the sample size and the proportion of differentially expressed genes. The difference in performance between them became large when the sample size or the proportion of differentially expressed genes decreased. The variance stabilized t-type score outperformed the t-type score when N = 3 or 5, but it was slightly better than or as good as the t-type score when N = 10. The difference in the performance between the variance stabilized t-type score and the t-type score became large when the sample size or the proportion of differentially expressed genes decreased. The performances of the Mann-Whitney statistic, Golub’s discrimination score, and the Welch t-statistic are almost the same when the sample size is greater than 5.

Figure 1. Performance of each test statistic in Simulation study 1.

Figure 2 shows the accuracy of the mean FDR, median FDR, 75th percentile FDR, and 90th percentile FDR based on the scatter plot when the true FDR was smaller than 0.2. Each estimated FDR was calculated using the true proportion of nondifferentially expressed genes, π0. The biases of the mean FDR, 75th percentile FDR, and 90th percentile FDR were almost the same, irrespective of the sample size and the proportion of differentially expressed genes. When s = 40, the mean FDR, 75th percentile FDR, and 90th percentile FDR were constantly overestimated, whereas the median FDR was overestimated or underestimated depending on the true FDR. In particular, the median FDR was underestimated when the true FDR was low. When s = 400, the mean FDR, 75th percentile FDR, and 90th percentile FDR were overestimated, whereas the median FDR was almost unbiased.

Figure 2. Accuracy of each FDR in Simulation study 2.

Results of colorectal cancer data analysis

Figure 3 shows the relationship between the three statistics, the Welch t-statistic, the t-type score, and the variance stabilized t-type score, and the standard error of the Welch t-statistic, $\sqrt{s_{Xi}^2/m + s_{Yi}^2/n}$, in the CRC data. The results shown in Fig. 3 (a) suggest that the absolute Welch t-statistics of the majority of genes became large due to the underestimation of variance, resulting in an increased risk of false positives. In Fig. 3 (b), it appears that the absolute t-type score suppressed false positives by controlling the underestimation of variance, and the t-type score contributed to the precise identification of differentially expressed genes. Fig. 3 (c) suggests that the variance stabilized t-type score controlled the underestimation of variance as well as the t-type score did and also controlled the overestimation of variance, because the ranks of the genes with sample variances that were not small became high when compared with Fig. 3 (b). This result indicates that the overestimation of variance was left uncontrolled in the t-type score, while the variance stabilized t-type score could control both the overestimation and underestimation of variance in the CRC data. Thus, the variance stabilized t-type score made a greater contribution to the precise identification of differentially expressed genes than the t-type score.

Figure 3. Relationships between the three statistics and the standard error of the Welch t-statistic.

Table 1 shows the estimated TP, the estimated mean FDR, and the estimated median FDR using the three statistics, the Welch t-statistic, the t-type score, and the variance stabilized t-type score. The estimated proportions of the true nondifferentially expressed genes, π̂0, for the Welch t-statistic, the t-type score, and the variance stabilized t-type score were 0.636, 0.646, and 0.606, respectively. Each value of π̂0 was almost the same in the CRC data. The estimated number of total positives of the Welch t-statistic was extremely large and would contain many false positives. Since the T̂P of both the t-type score and the variance stabilized t-type score was smaller than that of the Welch t-statistic, many false positives were suppressed. The T̂P of the variance stabilized t-type score was larger than that of the t-type score because the variance stabilized t-type score controlled both the overestimation and underestimation of variance, resulting in a decreased risk of both false positives and false negatives.

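Table 1. Results of the CRC data analysis. Columns are grouped by statistic: Welch t-statistic (π̂0 = 0.636), t-type score (π̂0 = 0.646), and variance stabilized (VS) t-type score (π̂0 = 0.606).

| Cut-off value | Welch T̂P | Welch mean FD̂R | Welch median FD̂R | T-type T̂P | T-type mean FD̂R | T-type median FD̂R | VS t-type T̂P | VS t-type mean FD̂R | VS t-type median FD̂R |
|---|---|---|---|---|---|---|---|---|---|
| 2.0 | 8221 | 0.220 | 0.142 | 376 | 0.106 | 0.012 | 811 | 0.115 | 0.028 |
| 2.1 | 7880 | 0.211 | 0.131 | 338 | 0.103 | 0.010 | 708 | 0.111 | 0.025 |
| 2.2 | 7576 | 0.202 | 0.121 | 308 | 0.102 | 0.008 | 629 | 0.108 | 0.019 |
| 2.3 | 7294 | 0.194 | 0.112 | 278 | 0.100 | 0.005 | 578 | 0.103 | 0.015 |
| 2.4 | 6994 | 0.186 | 0.103 | 251 | 0.099 | 0.003 | 522 | 0.100 | 0.010 |
| 2.5 | 6722 | 0.180 | 0.097 | 226 | 0.098 | 0.000 | 469 | 0.097 | 0.008 |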

The estimated median FDR was smaller than the estimated mean FDR irrespective of the test statistic. Based on the results of Simulation study 2, the median FDR was almost unbiased, whereas the mean FDR was overestimated when N = 3 and s = 400. Therefore, the median FDR is recommended as the criterion for identifying differentially expressed genes in the CRC data. When the cut-off value was 2.5, the estimated median FDR of the t-type score was 0, i.e. the 226 identified genes contained no false positives, while the estimated median FDR of the variance stabilized t-type score was 0.008, i.e. the 469 identified genes contained hardly any false positives. Several hundred genes are probably identified as nondifferentially expressed when the t-type score is used, even though the t-type scores of such genes are underestimated due to the overestimation of variance.
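As a rough reading of Table 1 (an illustration added here, not a calculation reported in the original analysis), multiplying an estimated FDR by the corresponding T̂P gives π̂0 · F̂P(c) by Equation (6), i.e. the implied number of false positives among the identified genes. At the cut-off value 2.5, the variance stabilized t-type score gives

$$0.008 \times 469 \approx 3.8 \ \text{(median FDR)}, \qquad 0.097 \times 469 \approx 45.5 \ \text{(mean FDR)},$$

and the t-type score gives

$$0.000 \times 226 = 0 \ \text{(median FDR)}, \qquad 0.098 \times 226 \approx 22.1 \ \text{(mean FDR)}.$$

The choice between the two FDRs therefore changes the implied number of false positives from a handful to several dozen genes.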

Discussion

Utility of variance stabilized t-type score

In this paper, we devised the variance stabilized t-type score using the shrunken sample variances of the James-Stein type and examined its performance. The results of both Simulation study 1 and CRC data analysis revealed the characteristics of the five test statistics, i.e. the Mann-Whitney statistic, Golub’s discrimination score, the Welch t-statistic, the t-type score, and the variance stabilized t-type score, in small microarray experiments. The Mann-Whitney statistic, Golub’s discrimination score, and the Welch t-statistic cannot control both the overestimation and underestimation of variance, thereby resulting in an increased risk of both false positives and false negatives. In the case where these test statistics are used, many false positives are identified as differentially expressed genes even if the cut-off value corresponding to a small target FDR is determined to suppress false positives. However, the Mann-Whitney statistic and Welch t-statistic can provide the p value as another criterion for identifying differentially expressed genes. Since the t-type score and the variance stabilized t-type score cannot provide the p value, we may be able to use the Mann-Whitney statistic or the Welch t-statistic if the sample size is sufficiently large. Based on the results of the additional simulation study and a recent study in terms of the usual t-statistic (Pan, 2002), the identification of differentially expressed genes using the Welch t-statistic is reliable when the sample size is more than 30, irrespective of the proportion of differentially expressed genes.

The basic idea of adding a correction term to the denominator of the Welch t-statistic in terms of the t-type score had been introduced by Tusher et al. (2001). The utility of a modified t-statistic such as the SAM-statistic or the t-type score has been demonstrated by some researchers (Baldi and Long, 2001; Broberg, 2003), and we evaluated the utility of the t-type score in small microarray experiments. The t-type score, which can control the underestimation of variance in particular, showed a better performance than the Welch t-statistic irrespective of the sample size because the sample variances are underestimated by chance in small microarray experiments. On the other hand, we noted that the t-type score leaves the overestimation of variance uncontrolled. For precise identification of differentially expressed genes, the overestimation of variance should be controlled because both overestimation and underestimation of variance occur simultaneously in actual microarray data analysis.

The shrunken sample variances of the James-Stein type in the variance stabilized t-type score borrow information across genes using the James-Stein shrinkage concept. This is useful for identifying differentially expressed genes because the variance is not estimated precisely when the sample size is small. We demonstrated that the variance stabilized t-type score was better than or at least as good as the t-type score by using simulated and actual data. In particular, the variance stabilized t-type score outperformed the t-type score when the sample size or the proportion of differentially expressed genes was small, and the relative superiority of the variance stabilized t-type score was almost the same when the sample variances were either constant or random. Thus, the variance stabilized t-type score is effective and robust for identifying differentially expressed genes in small microarray experiments.

Characteristics of the mean FDR and median FDR

The original description of SAM provides the mean FDR, while the SAM software of the R package provides the median FDR. Because both estimated FDRs are quite different in the actual microarray data analysis, biological researchers are confused with regard to which FDR is a suitable criterion for actual individual microarray data. Indeed, the difference between the estimated mean FDR and estimated median FDR was approximately 0.1 when the variance stabilized t-type score was used in the CRC data analysis. This difference is meaningfully large because approximately 22,000 genes were evaluated simultaneously, although the magnitude of the difference was small. To our knowledge, no studies have been conducted in which the accuracy of the median FDR was examined, although some studies have examined the accuracy of the mean FDR (Efron et al. 2001; Pan, 2003). The result of Simulation study 2 revealed the characteristics of the four FDRs as determined by SAM. As pointed out by Pan et al. (2003) in terms of the mean FDR, the estimated distribution of nondifferentially expressed genes based on the permutation (null distribution) was more dispersed than the distribution of true nondifferentially expressed genes. In other words, the variation of distribution that consists of the estimated number of false positives for the fixed cut-off value in each permutation was very large. The mean of such a distribution became large, resulting in overestimation of the FDR. This disadvantage was exacerbated when the sample size was small or the proportion of differentially expressed genes was large based on the results of Simulation study 2. The median FDR was almost unbiased when the proportion of differentially expressed genes was large even if the sample size was small. This feature of the median FDR, i.e. the accuracy does not vary according to the sample size, is attractive in small microarray experiments. 
However, the median FDR was underestimated when the true FDR and the proportion of differentially expressed genes were small. The magnitude of underestimation increased when the sample size decreased. The reason for the underestimation of the median FDR is that the distribution of the estimated numbers of false positives across permutations becomes very sparse for a large cut-off value when the sample size or the proportion of differentially expressed genes is small. Specifically, the estimated number of false positives in each permutation becomes almost zero when a large cut-off value is used and the sample size or proportion of differentially expressed genes is small. In such a case, although the true number of false positives is more than 1, the median of the distribution becomes 0, thereby resulting in underestimation of the FDR. Further, through the additional simulation study, we confirmed that the median FDR also outperformed the mean FDR when s = 200, corresponding to 5% as the proportion of differentially expressed genes, and that the five FDRs based on SAM using the t-type score show a performance similar to that obtained with the variance stabilized t-type score. Our results indicate that the choice between the mean FDR and the median FDR should mainly be based on the estimated proportion of differentially expressed genes. Based on the result of Simulation study 2, we recommend the use of the median FDR as the criterion when the estimated proportion of differentially expressed genes is more than 1%, irrespective of the sample size. We also recommend that robustness and reliability be evaluated based on both the mean FDR and the median FDR when the estimated proportion of differentially expressed genes is less than 1%. In the case of the CRC data analysis, we should identify the differentially expressed genes based on the estimated median FDR.

Concluding remarks

We devised the variance stabilized t-type score based on shrunken sample variances and examined both its performance and the accuracy of the median FDR provided by SAM. The utility of the variance stabilized t-type score and the characteristics of the median FDR were demonstrated through their application to simulated and actual data. Our results indicated that, in small microarray experiments, the variance stabilized t-type score combined with the median FDR provided by SAM is an effective and robust method for identifying differentially expressed genes when the proportion of differentially expressed genes is more than 1%.

Footnotes

Authors’ Contribution

Akihiro Hirakawa, Chikuma Hamada and Isao Yoshimura contributed equally to this work.

References

  1. Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics. 2001;17:509–19. doi: 10.1093/bioinformatics/17.6.509.
  2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B. 1995;57:289–300.
  3. Broberg P. Statistical methods for ranking differentially expressed genes. Genome Biol. 2003;4(6):R41. doi: 10.1186/gb-2003-4-6-r41.
  4. Chen JJ, Tsai CA, Tzeng S, et al. Gene selection with multiple ordering criteria. BMC Bioinformatics. 2007;8:74. doi: 10.1186/1471-2105-8-74.
  5. Chen Y, Dougherty ER, Bittner ML. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics. 1997;2:364–7. doi: 10.1117/12.281504.
  6. Chu G, Narasimhan B, Tibshirani R, et al. SAM “Significance Analysis of Microarray” Users Guide and Technical Document. 2005. [Accessed 1 September 2007]. URL: http://www-stat.stanford.edu/~tibs/SAM/
  7. Comander J, Natarajan S, Gimbrone MA, et al. Improving the statistical detection of regulated genes from microarray data using intensity-based variance estimation. BMC Genomics. 2004;5(1):17. doi: 10.1186/1471-2164-5-17.
  8. Cui X, Hwang G, Qiu J, et al. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics. 2005;6:59–75. doi: 10.1093/biostatistics/kxh018.
  9. Dupuy A, Simon RM. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J. Natl. Cancer Inst. 2007;99:147–57. doi: 10.1093/jnci/djk018.
  10. Efron B, Tibshirani R, Storey JD, et al. Empirical Bayes analysis of a microarray experiment. J. Am. Statist. Ass. 2001;96(456):1151–60.
  11. Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–7. doi: 10.1126/science.286.5439.531.
  12. Kohane I, Kho A, Butte A. Microarrays for an integrative genomics. The MIT Press; 2002.
  13. Kooperberg C, Aragaki A, Strand AD, et al. Significance testing for small microarray experiments. Stat. Med. 2005;24:2281–98. doi: 10.1002/sim.2109.
  14. Lee M, Kuo F, Whitmore G, et al. Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. U.S.A. 2000;97(18):9834–9. doi: 10.1073/pnas.97.18.9834.
  15. McLachlan G, Do K, Ambroise C. Analyzing microarray gene expression data. New York: Wiley; 2004.
  16. National Center for Biotechnology Information. Gene Expression Omnibus. Colorectal cancer progression: polysomal mRNA profiles. [Accessed 1 September 2007]. Accession No: GDS1780. URL: http://www.ncbi.nlm.nih.gov/projects/geo/gds/
  17. Opgen-Rhein R, Strimmer K. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Stat. Appl. Genet. Mol. Biol. 2007;6(1):9. doi: 10.2202/1544-6115.1252.
  18. Pan W. A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics. 2002;18(4):546–56. doi: 10.1093/bioinformatics/18.4.546.
  19. Pan W, Lin J, Le C. A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genomics. 2003;3:117–24. doi: 10.1007/s10142-003-0085-7.
  20. Pan W. On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics. 2003;19(11):1333–40. doi: 10.1093/bioinformatics/btg167.
  21. Pawitan Y, Murthy K, Michiels S, et al. Bias in the estimation of false discovery rate in microarray studies. Bioinformatics. 2005;21(20):3865–72. doi: 10.1093/bioinformatics/bti626.
  22. Pounds S, Cheng C. Robust estimation of the false discovery rate. Bioinformatics. 2006;22(16):1979–87. doi: 10.1093/bioinformatics/btl328.
  23. Provenzani A, Fronza R, Loreni F, et al. Global alterations in mRNA polysomal recruitment in a cell model of colorectal cancer progression to metastasis. Carcinogenesis. 2006;27(7):1323–33. doi: 10.1093/carcin/bgi377.
  24. Reiner A, Yekutieli D, Benjamini Y. Identifying differentially expressed genes using false discovery rate procedure. Bioinformatics. 2003;19(3):368–75. doi: 10.1093/bioinformatics/btf877.
  25. Sartor MA, Tomlinson CR, Wesselkamper SC, et al. Intensity-based hierarchical bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics. 2006;7:538. doi: 10.1186/1471-2105-7-538.
  26. Schena M, Sharon D, Heller R, et al. Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. U.S.A. 1997;93(20):10614–9. doi: 10.1073/pnas.93.20.10614.
  27. Simon R. New challenges for 21st century clinical trials. Clin. Trials. 2007;4:167–9. doi: 10.1177/1740774507076800.
  28. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; 1956. pp. 197–206.
  29. Stein C. Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 1964;16:155–60.
  30. Storey JD. A direct approach to false discovery rates. J. R. Stat. Soc. B. 2002;64:479–98.
  31. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U.S.A. 2003;100(16):9440–45. doi: 10.1073/pnas.1530509100.
  32. Tusher V, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U.S.A. 2001;98(9):5116–21. doi: 10.1073/pnas.091062498.
  33. Wu B, Guan Z, Zhao H. Parametric and nonparametric FDR estimation revisited. Biometrics. 2006;62:735–44. doi: 10.1111/j.1541-0420.2006.00531.x.
  34. Xie Y, Pan W, Khodursky AB. A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics. 2005;21(23):4280–88. doi: 10.1093/bioinformatics/bti685.
  35. Zhao Y, Pan W. Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray data. Bioinformatics. 2003;19(9):1046–54. doi: 10.1093/bioinformatics/btf879.

Articles from Bioinformatics and Biology Insights are provided here courtesy of SAGE Publications
