Author manuscript; available in PMC: 2011 Aug 17.
Published in final edited form as: Stat Med. 2008 May 20;27(11):1960–1972. doi: 10.1002/sim.3237

Practical guidelines for assessing power and false discovery rate for a fixed sample size in microarray experiments

Tiejun Tong 1, Hongyu Zhao 2,3,*
PMCID: PMC3157366  NIHMSID: NIHMS66639  PMID: 18338314

SUMMARY

One major goal in microarray studies is to identify genes having different expression levels across different classes/conditions. To achieve this goal, a study needs an adequate sample size to ensure the desired power. Given the importance of this topic, a number of approaches to sample size calculation have been developed. However, because of the cost and/or experimental difficulty of obtaining sufficient biological materials, it may be hard to attain the required sample size. In this article we address more practical questions: assessing power and false discovery rate (FDR) for a fixed sample size. The relationships between power, sample size and FDR are explored. We also conduct simulations and a real data study to evaluate the proposed findings.

Keywords: false discovery rate, gene expression data, power, sample size, T-statistic

1. INTRODUCTION

The development of microarray technologies has revolutionized biomedical research. Microarrays allow scientists to simultaneously estimate the expression levels of thousands of genes in a given sample. Such high-dimensional data demand and motivate novel statistical approaches for experimental design, data analysis and interpretation. One main objective of microarray studies is to identify genes with different expression levels between two or more conditions. Because thousands of tests are conducted simultaneously, there is a significant multiple comparison issue, and it is generally not appropriate to control the false positives on a per-comparison basis. Instead, the errors need to be controlled at a more stringent level. Two commonly used approaches are controlling the family-wise error rate (FWER) and controlling the false discovery rate (FDR). Control of FWER in microarray studies can be conservative and usually has low power. In contrast, FDR measures the proportion of false positives among the identified genes; it is less conservative and more commonly used in microarray data analysis.

The power of a specific study is affected by the following factors: the proportion of differentially expressed genes, the distribution of effect sizes, the variation across samples in each condition, and the sample size [2]. Sample size is probably the most important factor in study design, and a number of approaches to sample size calculation have been proposed in the literature [3–12]. These studies have found that, to achieve good power while controlling the false positives at a low level (e.g. a low FDR), a large sample size is usually required. For example, under the rule that the number of genes called significant equals the number of non-null genes in the population, Tibshirani [13] observed that the sample size may need to be increased to 100 in order to get the FDR down to 5%, depending on the proportion of genes truly changed at two-fold. Similarly large sample sizes were also reported by Jung [5], Pounds and Cheng [9] and others. However, due to the cost and/or experimental difficulties in obtaining biological materials, it might be difficult to attain such a large sample size.

In this article, in addition to re-addressing the sample size calculation problem, we present some practical guidelines for assessing power and FDR for a fixed sample size (or for the largest sample size available within the budget) in microarray experiments by studying the following two questions:

  1. For a fixed sample size and a desired FDR level, what is the maximum power achievable?

  2. For a fixed sample size and a desired power level, what is the minimum FDR achievable?

In addition, we address the relationship between power and FDR and the determination of an appropriate FDR level to employ in practice. As an alternative to the common practice of declaring all genes with corresponding test statistics above a given threshold as significant, we introduce the concept of quantile thresholding where a fixed proportion of the genes that have the largest test statistics are declared significant. The rest of this paper is organized as follows. In Section 2, we introduce the notation, describe the model and briefly review the history of FDR and its new developments. We re-address the sample size calculation problem in Section 3, and study practical questions for assessing power and FDR in Sections 4 and 5 with extensive simulations. Finally, we analyze a real data set to evaluate the proposed findings in Section 6, and conclude the article in Section 7 with some discussion.

2. T-STATISTICS AND FALSE DISCOVERY RATE

Given a microarray experiment with m genes, one major goal is to identify differentially expressed genes between different conditions. Let xij (j = 1, …, n1) and yij (j = 1, …, n2) denote the observed expression levels of gene i under conditions 1 and 2, respectively. With proper normalization, we assume that xij and yij are normally distributed with means μi1 and μi2 and standard deviations σi1 and σi2. Identifying differentially expressed genes is then equivalent to testing Hi0: μi1 = μi2 versus Hi1: μi1 ≠ μi2. We consider the following two-sample t-tests,

T_i = \frac{\bar{x}_i - \bar{y}_i}{s_i \sqrt{1/n_1 + 1/n_2}}, \qquad i = 1, \ldots, m,

where \bar{x}_i = \sum_{j=1}^{n_1} x_{ij}/n_1 and \bar{y}_i = \sum_{j=1}^{n_2} y_{ij}/n_2, and s_i is the pooled standard deviation defined as s_i = \left( \left( \sum_{j=1}^{n_1} (x_{ij} - \bar{x}_i)^2 + \sum_{j=1}^{n_2} (y_{ij} - \bar{y}_i)^2 \right) / (n_1 + n_2 - 2) \right)^{1/2}. Note that although we have assumed a common variance for simplicity of exposition, the following analysis applies to unequal variances and to other t-tests (e.g. paired t-tests) as well.

Let n = n1 + n2 denote the total number of arrays and λ = n1/n the allocation proportion for condition 1, so that λ = 1/2 represents a balanced design. When n is large, Ti is approximately normally distributed with mean √(nλ(1−λ)) δi and variance 1, where δi = (μi1 − μi2)/σi is the so-called effect size. Thus, assuming for the moment a reasonably large n (for example, n ≥ 10) so that the approximation holds well, we can restate the hypotheses as Hi0: δi = 0 versus Hi1: δi ≠ 0. When n is very small, we recommend using the exact t-distribution for the T-statistic rather than the normal approximation.
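The statistic above is straightforward to compute for all genes at once. The following Python sketch is our own illustration (the function name and the genes-in-rows matrix layout are our choices, not the authors'):

```python
import numpy as np

def pooled_t_statistics(x, y):
    """Two-sample t-statistics T_i with the pooled SD of Section 2.
    x: (m, n1) expression matrix for condition 1; y: (m, n2) for condition 2."""
    n1, n2 = x.shape[1], y.shape[1]
    xbar, ybar = x.mean(axis=1), y.mean(axis=1)
    ss = ((x - xbar[:, None]) ** 2).sum(axis=1) + ((y - ybar[:, None]) ** 2).sum(axis=1)
    s = np.sqrt(ss / (n1 + n2 - 2))                    # pooled standard deviation s_i
    return (xbar - ybar) / (s * np.sqrt(1.0 / n1 + 1.0 / n2))

# All-null example: m = 2000 genes, n1 = n2 = 10 arrays per condition.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(2000, 10))
y = rng.normal(0.0, 1.0, size=(2000, 10))
t = pooled_t_statistics(x, y)
print(t.shape)  # one statistic per gene
```

Under the null, these statistics follow a t-distribution with n1 + n2 − 2 degrees of freedom, which is close to N(0, 1) at this sample size.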

Table I summarizes the various outcomes of testing m hypotheses, of which m0 represents the total number of true null hypotheses and m1 represents the total number of false null hypotheses. The quantity V is the number of false positives and R is the total number of rejections. In multiple testing, the proportion of false discoveries, V/R, is often of interest. Benjamini and Hochberg [1] defined the false discovery rate (FDR) as

\mathrm{FDR} = E\!\left( \frac{V}{R} \,\Big|\, R > 0 \right) P(R > 0). \quad (1)

Table I.

Outcomes when testing m hypotheses.

                    Accept   Reject   Total
Null true             U        V       m0
Alternative true      T        S       m1
Total                 W        R       m = m0 + m1

Noting that in microarray studies scientists are rarely interested in the situation where no genes are selected (i.e., R = 0), Storey [15] introduced the positive false discovery rate (pFDR) by removing the term P(R > 0) in (1). In the asymptotic setting, FDR and pFDR are equivalent and possess the same asymptotic properties, which can be seen by noting that lim_{m→∞} P(R > 0) = 1 for any nontrivial threshold. Both FDR and pFDR consider the expectation of the ratio V/R. Recently, the quantity E(V)/E(R), called the proportion of false positives by Fernando et al. [16] and the decisive false discovery rate (dFDR) by Bickel [17], has also been introduced; we refer to it as dFDR for the rest of the article. dFDR is meaningful as long as P(R > 0) ≠ 0. Although dFDR fails to describe the simultaneous fluctuations in V and R, it has some desirable properties; for example, it can be optimized using decision theory without the independence assumption [17]. It is also interesting to note that under weak dependence, such as finite-block and ergodic dependence, there is little difference between FDR and dFDR because lim_{m→∞} (FDR − dFDR) = 0 [18].

Let M0 (of size m0) denote the set of true null hypotheses and M1 (of size m1) the set of false null hypotheses. Let π0 = m0/m denote the proportion of true null hypotheses. Let Hi = 0 if the ith null hypothesis is true, and Hi = 1 otherwise. We reject the null hypothesis Hi0 if its corresponding p-value, pi, is smaller than or equal to a given threshold α ∈ (0, 1). Then it is easy to see that

\mathrm{dFDR}(\alpha) = \frac{\sum_{i \in M_0} P(p_i \le \alpha \mid H_i = 0)}{\sum_{i \in M_0} P(p_i \le \alpha \mid H_i = 0) + \sum_{i \in M_1} P(p_i \le \alpha \mid H_i = 1)}. \quad (2)

Note that (2) holds under any dependence structure among the test statistics, as long as the marginal distribution for each test statistic is maintained. As mentioned above, since there is little practical difference between FDR, pFDR and dFDR when m is large and the dependence between genes is not strong, we will not distinguish them in this article, as microarray data usually contain thousands of genes. Let βi = P(pi ≤ α | Hi = 1) denote the power for the ith hypothesis test, and β = ∑_{i∈M1} βi/m1 the average power. Because the p-values corresponding to the true null hypotheses are uniformly distributed, we have

\mathrm{FDR}(\alpha) = \frac{m_0 \alpha}{m_0 \alpha + \sum_{i \in M_1} \beta_i} = \frac{\pi_0 \alpha}{\pi_0 \alpha + (1 - \pi_0)\beta}. \quad (3)
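Equation (3) is easy to verify by simulation. The sketch below is our own Python illustration under the normal approximation of Section 2 (all names and parameter values are our choices): it generates null and non-null test statistics and compares the realized V/R with the right-hand side of (3).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, pi0, alpha = 200_000, 0.8, 0.01
m0 = int(m * pi0)
s = np.sqrt(20 * 0.5 * 0.5)                        # sqrt(n*lambda*(1-lambda)), n = 20, balanced
delta = rng.uniform(0.0, 2.0, size=m - m0)         # effect sizes of the non-null genes

t = np.concatenate([rng.normal(0.0, 1.0, m0),      # null statistics, N(0, 1)
                    rng.normal(s * delta, 1.0)])   # non-null statistics
p = 2.0 * norm.cdf(-np.abs(t))                     # two-sided p-values
reject = p <= alpha
V, R = int(reject[:m0].sum()), int(reject.sum())

z = norm.ppf(alpha / 2.0)
beta = float(np.mean(norm.cdf(s * delta + z) + norm.cdf(-s * delta + z)))  # eq. (5)
print(V / R, pi0 * alpha / (pi0 * alpha + (1 - pi0) * beta))               # eq. (3)
```

With m this large, the two printed values agree to roughly two decimal places.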

3. SAMPLE SIZE CALCULATION

In this section, we derive the required sample size for achieving the desired power β while controlling the level of FDR at γ. Based on equation (3), we have

\alpha = \frac{\gamma (1 - \pi_0)}{(1 - \gamma)\pi_0}\, \beta. \quad (4)

Recall that for a reasonably large n, Ti is approximately normally distributed with mean √(nλ(1−λ)) δi and variance 1. Therefore, Ti ~ N(0, 1) for i ∈ M0, and Ti ~ N(√(nλ(1−λ)) δi, 1) for i ∈ M1. For two-sided tests, we reject Hi0 if |Ti| > z1−α/2, where zα is the αth quantile of N(0, 1). Let Φ(·) denote the cumulative distribution function of N(0, 1). Using the fact that z1−α = −zα, we have βi(α) = Φ(√(nλ(1−λ)) δi + z_{α/2}) + Φ(−√(nλ(1−λ)) δi + z_{α/2}) and thus

\beta(\alpha) = \frac{1}{m_1} \sum_{i \in M_1} \left( \Phi\!\left( \sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha/2} \right) + \Phi\!\left( -\sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha/2} \right) \right). \quad (5)

Note that the second term in equation (5) is minor as long as the quantity √(nλ(1−λ)) δi is nontrivial. Combining (5) and (4), we have

\alpha = \frac{\gamma (1 - \pi_0)}{(1 - \gamma)\pi_0} \cdot \frac{1}{m_1} \sum_{i \in M_1} \left( \Phi\!\left( \sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha/2} \right) + \Phi\!\left( -\sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha/2} \right) \right). \quad (6)

The required sample size can then be obtained by solving equation (6) numerically, for example with the bisection method [5] or a simple grid search. The R code implementing the bisection method is available from the authors upon request. In practice, to calculate the required sample size, we need to estimate the proportion of true null hypotheses and the corresponding effect sizes in the set M1 (see the next section for more details). Throughout the article, we round the solution up to the nearest integer whenever necessary.

Equation (6) suggests that the required sample size is inversely proportional to λ(1 − λ). This implies that the most efficient design to achieve a desired power is the balanced design, i.e., λ = 1/2. It is also easy to see that the required sample size decreases as the effect sizes increase. For the special case that |δi| ≡ δ > 0 for all i ∈ M1, ignoring the minor term in equation (5) gives n ≈ (z_β − z_{α/2})² / (λ(1−λ)δ²), which is the same result found in Jung [5].
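The authors' own bisection implementation is in R and available upon request; the following Python sketch of the same idea (function names, the fixed 60-step bisection and the parameter values are our choices) finds, for each candidate n, the threshold at which FDR equals γ and returns the smallest n whose power reaches the target:

```python
import numpy as np
from scipy.stats import norm

def avg_power(alpha, n, lam, delta):
    """Average power beta(alpha) of eq. (5) under the normal approximation."""
    s = np.sqrt(n * lam * (1 - lam)) * delta
    z = norm.ppf(alpha / 2.0)
    return float(np.mean(norm.cdf(s + z) + norm.cdf(-s + z)))

def required_sample_size(beta_target, gamma, pi0, delta, lam=0.5, n_max=500):
    """Smallest n whose maximum power at FDR level gamma reaches beta_target."""
    c = gamma * (1 - pi0) / ((1 - gamma) * pi0)      # eq. (4): alpha = c * beta
    for n in range(4, n_max + 1):
        lo, hi = 0.0, 1.0                            # bisection for the root of eq. (6)
        for _ in range(60):
            mid = (lo + hi) / 2.0
            lo, hi = (lo, mid) if mid > c * avg_power(mid, n, lam, delta) else (mid, hi)
        if avg_power(hi, n, lam, delta) >= beta_target:
            return n
    return None

# Constant effect size |delta_i| = 1; compare with the closed-form
# approximation n ~ (z_beta - z_{alpha/2})^2 / (lam (1 - lam) delta^2).
print(required_sample_size(beta_target=0.8, gamma=0.05, pi0=0.8, delta=np.full(400, 1.0)))
```

The bisection exploits Lemma 1 of Section 4: β(α)/α is strictly decreasing, so α − cβ(α) has a single sign change on (0, 1].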

4. POWER CALCULATION

In this section, we calculate the maximum power achievable, denoted by βmax, when the sample size n and the level of FDR γ are given. For β(α) in equation (5), we show in Appendix A that

Lemma 1

(i) β(α) is a strictly increasing function of α ∈ [0, 1]; and (ii) β(α)/α is a strictly decreasing function of α ∈ [0, 1].

By (ii) and the fact that FDR(α) = π0/[π0 + (1 − π0)β(α)/α], FDR(α) is a strictly increasing function of α ∈ [0, 1] as long as π0 ≠ 1. This suggests that the assigned level of FDR should not be larger than π0. Because both β(α) and FDR(α) are strictly increasing functions of α, the power is also a strictly increasing function of FDR. Therefore, for any given level of FDR γ, there exists a unique βmax such that βmax = β (αmax), where αmax satisfies FDR(αmax) = γ. This implies that, to find βmax, it suffices to find the unique maximum threshold αmax. From equation (3), we have

\frac{\pi_0 \alpha_{\max}}{\pi_0 \alpha_{\max} + (1 - \pi_0)\beta(\alpha_{\max})} = \gamma, \quad (7)

where

\beta(\alpha_{\max}) = \frac{1}{m_1} \sum_{i \in M_1} \left( \Phi\!\left( \sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha_{\max}/2} \right) + \Phi\!\left( -\sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha_{\max}/2} \right) \right).

By solving for αmax in equation (7) using the numerical methods mentioned in Section 3, the maximum power achievable can be estimated as

\hat{\beta}_{\max} = \beta(\hat{\alpha}_{\max}) = \frac{(1 - \gamma)\hat{\pi}_0}{\gamma (1 - \hat{\pi}_0)}\, \hat{\alpha}_{\max}, \quad (8)

where π̂0 is an estimate of the proportion of true null hypotheses.
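A minimal numerical sketch of this calculation (in Python; the bisection depth and function names are our own, not the authors' R code) solves the fixed point of equation (7) and then applies equation (8):

```python
import numpy as np
from scipy.stats import norm

def beta_max(n, lam, delta, pi0, gamma):
    """Maximum achievable average power at FDR level gamma, via eqs. (7)-(8)."""
    c = gamma * (1 - pi0) / ((1 - gamma) * pi0)      # so that FDR(alpha) = gamma
    s = np.sqrt(n * lam * (1 - lam)) * delta
    def power(a):
        z = norm.ppf(a / 2.0)
        return float(np.mean(norm.cdf(s + z) + norm.cdf(-s + z)))
    lo, hi = 0.0, 1.0
    for _ in range(60):                              # bisection for alpha_max
        mid = (lo + hi) / 2.0
        lo, hi = (lo, mid) if mid > c * power(mid) else (mid, hi)
    alpha_max = (lo + hi) / 2.0
    return (1 - gamma) * pi0 / (gamma * (1 - pi0)) * alpha_max   # eq. (8)

rng = np.random.default_rng(1)
delta = rng.uniform(0.0, 2.0, size=400)              # plug-in effect-size estimates
print(round(beta_max(n=20, lam=0.5, delta=delta, pi0=0.8, gamma=0.05), 3))
```

In practice delta would be replaced by the estimated effect sizes δ̂i and pi0 by π̂0, as in the simulation study below.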

4.1. Simulation study

To explore the effects of n, λ, π0 and the δi on the relationship between βmax and FDR, we consider the following four scenarios (plotted in Figure 1): (A) to explore the effect of n, we consider n = 10, 20, 40, 80 with λ = 0.5, π0 = 0.8 and {δi, i ∈ M1} ~ U[0, 2]; (B) to explore the effect of λ, we consider λ = 0.1, 0.2, 0.3, 0.5 with n = 20, π0 = 0.8 and {δi, i ∈ M1} ~ U[0, 2]; (C) to explore the effect of π0, we consider π0 = 0.4, 0.6, 0.8, 0.95 with n = 20, λ = 0.5 and {δi, i ∈ M1} ~ U[0, 2]; and (D) to explore the effect of the δi, we draw {δi, i ∈ M1} from U[0, 2], U[1, 2], δ ≡ 1 and δ ≡ 1.5 with n = 20, λ = 0.5 and π0 = 0.8. Note that, without loss of generality, by symmetry we have assumed that δi > 0 for any i ∈ M1. Further, we set m = 2000 throughout the simulations since m has little impact on the relationship between βmax and FDR.

Figure 1.

Plots of the power versus FDR. Four curves (solid, dashed, dotted, dash-dotted) correspond to four different values of n (10, 20, 40, 80) in panel A, four different values of λ (0.1, 0.2, 0.3, 0.5) in panel B, four different values of π0 (0.4, 0.6, 0.8, 0.95) in panel C, and four different sets of {δi, i ∈ M1} (U[0, 2], δ ≡ 1, U[1, 2], δ ≡ 1.5) in panel D, respectively.

In all settings, βmax increases as FDR increases, as expected, because power is an increasing function of FDR. For the same FDR level, panel A shows that although βmax increases with the sample size as expected, the increase in power levels off as n becomes larger, which implies that there is little benefit in further increasing the sample size beyond a certain point. Panel B restates the fact that a balanced design is always more efficient. Panels C and D indicate that βmax increases when the proportion of true null hypotheses decreases or when the effect sizes increase. Another interesting finding is that most curves of βmax against FDR are concave with a clear elbow, especially when the power is nontrivial. To balance the trade-off between power and FDR, we suggest using an FDR near the elbow point, which provides effective power while controlling FDR at a low level. Accurate determination of the FDR level can be critical. One approach based on decision theory is to maximize the quantity c1βmax(FDR) − c2FDR, where c1 ≥ 0 is the benefit of the achieved power and c2 ≥ 0 is the cost paid for falsely discovered hypotheses. In addition, although m has little impact on the relationship between βmax and FDR, in practice a larger FDR might have to be chosen to accommodate typical microarray studies with on the order of 54000 genes. More research is required to determine the FDR level for an efficient control.
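The decision-theoretic rule can be sketched as a simple grid search. The Python illustration below is ours (the grid range, the cost choices c1 = 1 − π0 and c2 = π0, and the compact beta_max helper are our own assumptions):

```python
import numpy as np
from scipy.stats import norm

def beta_max(n, lam, delta, pi0, gamma):
    """Maximum average power at FDR level gamma (bisection on eq. (7))."""
    c = gamma * (1 - pi0) / ((1 - gamma) * pi0)
    s = np.sqrt(n * lam * (1 - lam)) * delta
    power = lambda a: float(np.mean(norm.cdf(s + norm.ppf(a / 2)) +
                                    norm.cdf(-s + norm.ppf(a / 2))))
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        lo, hi = (lo, mid) if mid > c * power(mid) else (mid, hi)
    return power(hi)

rng = np.random.default_rng(3)
delta = rng.uniform(0.0, 2.0, size=400)
pi0 = 0.8
c1, c2 = 1 - pi0, pi0                    # benefit of power vs cost of false discoveries
grid = np.arange(0.005, 0.300, 0.005)    # candidate FDR levels
util = [c1 * beta_max(20, 0.5, delta, pi0, g) - c2 * g for g in grid]
best = float(grid[int(np.argmax(util))])
print(best)                              # FDR level maximizing c1*beta_max(FDR) - c2*FDR
```

Because the βmax curve is concave, the maximizer of the utility sits near the elbow discussed above.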

To check the validity of (8) in practice, we conduct simulations to evaluate the performance of β̂max by reporting β̂max − βmax as a measure of accuracy. Set m = 2000 as before. Our first simulation study investigates the effect of δ̂i, assuming for the moment that a consistent estimate of π0 is available [19]. Noting that the effect size estimate δ̂i = (x̄i − ȳi)/si is approximately normally distributed with mean δi and variance 1/n1 + 1/n2 (see Section 2), we simulate δ̂i from N(δi, 1/n1 + 1/n2), where the true effect size δi is drawn from U[0, 2]. We consider a balanced design for simplicity and three different values of n (10, 30 and 50). We further consider two different levels of FDR (0.01 and 0.05) and three different values of π0 (0.6, 0.8 and 0.95). We run 5000 simulations and report the mean values of β̂max − βmax and their standard deviations in Table II. In general, the FDR level has little impact on the accuracy of β̂max, while the accuracy decreases as the sample size decreases. We observe little difference between β̂max and βmax when the sample size is at least moderately large. For a small sample size, e.g. n = 10, β̂max is slightly larger than βmax. In addition, for each given sample size, β̂max becomes more variable as π0 approaches 1.

Table II.

Simulation results for the mean values of β̂max − βmax and their standard deviations (in parentheses) in various settings.

π0 FDR n = 50 n = 30 n = 10
0.6 0.01 0.0001 (0.007) 0.004 (0.008) 0.075 (0.009)
0.05 0.0000 (0.007) 0.002 (0.009) 0.066 (0.011)
0.8 0.01 0.0003 (0.010) 0.007 (0.012) 0.069 (0.011)
0.05 −0.0001 (0.010) 0.003 (0.012) 0.074 (0.014)
0.95 0.01 0.0004 (0.020) 0.013 (0.024) 0.051 (0.018)
0.05 0.0003 (0.021) 0.006 (0.024) 0.070 (0.022)

Our second simulation study does not assume the existence of a consistent estimate of π0. We set π0 = 0.8 for illustration. Noting that most existing estimators of π0 in the literature are conservative and overestimate π0 [20, 23], we consider two different values of π̂0: 0.85 (i.e., m̂1 = 300) and 0.9 (i.e., m̂1 = 200). In each simulation, we draw m̂1 samples from {δ̂i, i ∈ M1} without replacement. All other settings are the same as before. Let β̃max denote the estimated power; we report the mean values of β̃max − βmax and their standard deviations in Table III. We also list in Table III the corresponding results for π̂0 = 0.8 from the previous simulation study for comparison. As before, the FDR level has little impact on the accuracy of β̃max. We observe that as the bias of π̂0 increases (i.e., π̂0 − π0 increases), the mean difference between β̃max and βmax stays similar but the variation increases. Overall, β̃max is fairly robust to the choice of π̂0.

Table III.

Simulation results for the mean values of β̃max − βmax and their standard deviations (in parentheses) in various settings.

π̂0 FDR n = 50 n = 30 n = 10
0.8 0.01 0.0003 (0.010) 0.007 (0.012) 0.069 (0.011)
0.05 −0.0001 (0.010) 0.003 (0.012) 0.074 (0.014)
0.85 0.01 −0.0003 (0.016) 0.006 (0.017) 0.069 (0.013)
0.05 −0.0002 (0.016) 0.002 (0.018) 0.074 (0.017)
0.9 0.01 −0.0001 (0.024) 0.006 (0.025) 0.069 (0.017)
0.05 −0.0007 (0.023) 0.002 (0.025) 0.075 (0.022)

5. FALSE DISCOVERY RATE CALCULATION

In this section, we calculate the minimum level of FDR achievable, denoted by FDRmin, when the sample size n and the desired power β are given. Similar arguments to those in Section 4 indicate that FDR is also a strictly increasing function of power, and thus there exists a unique FDRmin such that the desired power β is achieved. An easy way to calculate FDRmin is through FDRmin = FDR(αβ) = π0αβ/[π0αβ + (1 − π0)β], where αβ is the unique solution of

\sum_{i \in M_1} \left( \Phi\!\left( \sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha/2} \right) + \Phi\!\left( -\sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{\alpha/2} \right) \right) = m_1 \beta.

Under this setting, αβ can be solved for numerically as before. The relationship between FDRmin and power can also be read from Figure 1. In general, to achieve a desired power, FDRmin increases with the proportion of true null hypotheses, but decreases with the effect sizes and the sample size.
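Because the average power is strictly increasing in α, αβ can be found by a plain bisection; the sketch below (our Python illustration, minor term included; all names are ours) then plugs αβ into the expression for FDRmin:

```python
import numpy as np
from scipy.stats import norm

def fdr_min(n, lam, delta, pi0, beta):
    """Minimum achievable FDR for a desired average power beta (Section 5)."""
    s = np.sqrt(n * lam * (1 - lam)) * delta
    power = lambda a: float(np.mean(norm.cdf(s + norm.ppf(a / 2)) +
                                    norm.cdf(-s + norm.ppf(a / 2))))
    lo, hi = 0.0, 1.0
    for _ in range(60):                  # power(alpha) is strictly increasing in alpha
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if power(mid) < beta else (lo, mid)
    a_beta = (lo + hi) / 2.0             # threshold with average power exactly beta
    return pi0 * a_beta / (pi0 * a_beta + (1 - pi0) * beta)

print(round(fdr_min(n=20, lam=0.5, delta=np.full(400, 1.0), pi0=0.8, beta=0.8), 3))
```

Doubling n in this example drops FDRmin dramatically, mirroring the Figure 1 pattern.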

Let αi, i ∈ M1, denote the threshold of the ith hypothesis test giving detection power β, and let αβ denote the common threshold such that the average power equals β. For the special case that |δi| ≡ δ > 0, the αi are all equal and thus αβ = ∑_{i∈M1} αi/m1. In general, this relationship does not hold between αβ and {αi, i ∈ M1} when the |δi| are not all equal. Ignoring the minor term in equation (5) (or similarly for a one-sided test), αi has the explicit form αi/2 = Φ(−√(nλ(1−λ)) δi + z_β). Define ᾱ = ∑_{i∈M1} αi/m1, βi(ᾱ) = Φ(√(nλ(1−λ)) δi + z_{ᾱ/2}) and β(ᾱ) = ∑_{i∈M1} βi(ᾱ)/m1. In Appendix B we show that

Lemma 2

(i) When β + ᾱ/2 ≤ 1, we have β(ᾱ) ≥ β and thus ᾱ ≥ αβ; (ii) when β + ᾱ/2 > 1, we have β(ᾱ) < β and thus ᾱ < αβ.

Furthermore, since FDR(α) is an increasing function of α, we have FDR(ᾱ) ≥ FDR(αβ) if β + ᾱ/2 ≤ 1, and FDR(ᾱ) < FDR(αβ) if β + ᾱ/2 > 1. In practice, scientists are often more interested in validating a small number of genes (e.g., the top 10 or top 50 genes) than all the genes inferred to be differentially expressed, so a quantile threshold can also be of interest. Let α(k) denote the kth smallest value in {αi, i ∈ M1} and βi(α(k)) = Φ(√(nλ(1−λ)) δi + z_{α(k)/2}) + Φ(−√(nλ(1−λ)) δi + z_{α(k)/2}); then #{i: βi(α(k)) ≥ β} = k. That is, when α(k) serves as a common threshold for all the tests, the detection power is at least β for each of the top k significant genes, namely those with αi ≤ α(k). For the special case αmed = median{αi, i ∈ M1}, half of the βi(αmed) are at least β, which implies that the detection power is at least β for the half of the genes with δi ≥ median{δi, i ∈ M1}.
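The per-gene thresholds and the quantile threshold α(k) can be computed directly. This Python sketch is our own illustration under the normal approximation (the effect-size range is deliberately bounded away from zero so that every αi < 1, and all names are ours):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, lam, beta = 20, 0.5, 0.8
delta = rng.uniform(0.5, 2.0, size=1000)   # effect sizes bounded away from 0 so alpha_i < 1
s = np.sqrt(n * lam * (1 - lam))

# Per-gene threshold alpha_i giving power beta for gene i (minor term ignored):
alpha_i = 2.0 * norm.cdf(-s * delta + norm.ppf(beta))

alpha_k = np.sort(alpha_i)[99]             # quantile threshold alpha_(100)
alpha_med = np.median(alpha_i)             # proxy for the common threshold alpha_beta

# With alpha_(100) as common threshold, the top 100 genes (alpha_i <= alpha_(100))
# each retain detection power at least beta:
zk = norm.ppf(alpha_k / 2.0)
power_i = norm.cdf(s * delta + zk) + norm.cdf(-s * delta + zk)
print(int((power_i >= beta).sum()))        # count of genes with power >= beta
```

The printed count matches k (up to the negligible minor term), illustrating the #{i: βi(α(k)) ≥ β} = k property.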

6. REAL STUDY

We use a well-studied data set, the colon cancer data set [21], containing n1 = 22 normal colon tissue samples and n2 = 40 colon tumor samples with expression levels of 2000 genes in each sample, to evaluate the proposed findings. The main objective of this colon cancer study is to identify important genes that can distinguish colon tumors from normal tissues. We follow the same normalization steps as Huang and Pan [22]: to remove possible array effects, we standardize the data in each array by subtracting the median expression level of the array and then dividing by the difference between the third and first quartiles of its expression levels.

We use Storey's method [23] to estimate the proportion of true null hypotheses. Noting that the alternative p-values are more likely to be small, for a reasonably large p0 ∈ (0, 1), the majority of p-values larger than p0 should correspond to the true null hypotheses. This suggests the conservative estimate π̂0 = #{pi > p0}/(m(1 − p0)). In this study, p0 is chosen as the median of all p-values, as in Ge et al. [24]. Since the sample sizes for both tissue types are at least moderate, for simplicity we calculate the p-values using the normal approximation. From the data set, we get π̂0 = 0.616 and thus m̂1 = 768. We then treat the largest 768 values of the observed |δ̂i| as the true {δi, i ∈ M1}, where δ̂i = (x̄i − ȳi)/si.
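Storey's estimator with p0 set to the median p-value is a one-line computation; the following Python sketch is our own illustration on synthetic data (not the colon data set), showing its conservative behavior:

```python
import numpy as np
from scipy.stats import norm

def storey_pi0(pvals, p0=None):
    """Storey's estimator pi0_hat = #{p_i > p0} / (m (1 - p0)),
    with p0 defaulting to the median p-value as in the text."""
    p0 = float(np.median(pvals)) if p0 is None else p0
    return (pvals > p0).sum() / (len(pvals) * (1.0 - p0))

# Sanity check on synthetic data: 80% null genes (uniform p-values), 20% non-null.
rng = np.random.default_rng(5)
m, pi0 = 20_000, 0.8
t = np.concatenate([rng.normal(0.0, 1.0, int(m * pi0)),
                    rng.normal(3.0, 1.0, m - int(m * pi0))])
p = 2.0 * norm.cdf(-np.abs(t))
print(round(float(storey_pi0(p)), 3))   # conservative estimate of pi0
```

The estimator is biased upward because a fraction of the alternative p-values also exceeds p0, consistent with the overestimation noted in [20, 23].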

The plots of β̂max against FDR for various combinations of (n1, n2) are presented in Figure 2. It is clear that the power increases as the total sample size increases when the ratio n1/n2 stays the same. The dash-dotted line with n1 = n2 = 31 is always above the solid line, demonstrating again that a balanced design is most efficient. It is also interesting to note that at the same level of FDR, the power with (n1, n2) = (6, 56) is even lower than that with (n1, n2) = (11, 20), which indicates that an extremely unbalanced design is not recommended in practice unless necessary. As an illustration of determining the FDR level for the solid line, if we choose c1 = 1 − π̂0 and c2 = π̂0, the decision-theoretic rule of Section 4 suggests an FDR level of 0.098.

Figure 2.

Plots of the power versus FDR for the colon cancer data set. Five curves (solid, dashed, dotted, dash-dotted and long-dashed) correspond to five different pairs of (n1, n2): (22, 40), (11, 20), (6, 10), (31, 31) and (6, 56), respectively.

Figure 3 displays FDR against power when the quantile threshold α(k) is employed. We consider α(100), αmed, α(500) and αβ, where αβ is the threshold at which the average power is exactly β (Section 5). As expected, FDR(α(k)) increases with k for any given power level. It is interesting to see that αmed performs similarly to αβ, which suggests that αmed can serve as a proxy for αβ in practice. In addition, when only a small number of top genes, e.g. k = 100, are of interest to scientists, the FDR can be controlled at a very satisfactory level.

Figure 3.

Plots of FDR versus the power when the quantile thresholds are used. Four curves (solid, dashed, dotted and dash-dotted) correspond to four different thresholds (αβ, αmed, α(100) and α(500)).

7. DISCUSSION

Our work is motivated by the fact that the sample size required for achieving good power and low FDR in microarray studies is usually large and hard to obtain, due to the cost and/or other experimental difficulties. Instead of sample size calculation under FDR control, we have studied several practical questions for assessing power and FDR for a fixed sample size. The relationships between power, sample size and FDR are explored. We hope our methods can help with microarray study designs. Our methods can be further used in the post-experimental stage, such as to determine the appropriate level of FDR control or to use a quantile threshold to follow up the top genes in the validation study.

For simplicity of exposition, we have focused on experiments with two conditions. Our methods can be generalized to more than two conditions or to other settings. We have further assumed that there is little difference between FDR, pFDR and dFDR, and the relationship between power and FDR is explored through the concept of dFDR. We note that the results in Sections 3, 4 and 5 remain valid when the test statistics are correlated with each other, since (2) holds in general. When the weak dependence condition of Storey et al. [18] does not hold, we cannot claim that FDR and dFDR are asymptotically equivalent, and the stated relationship between power and FDR may no longer hold. Further research is needed to explore the relationship between power and FDR (or pFDR) when the data are strongly correlated. In addition, we have assumed that the sample size is reasonably large (e.g. n > 10) so that the T-statistics are approximately normally distributed. When only a small sample size is available, we recommend using the exact t-distribution for the T-statistic rather than the normal approximation; simulations (not shown) indicate that the patterns are similar.

Acknowledgments

This work was supported in part by NIH grants GM-59507, N01-HV-28186 and P30-DA-18343, and NSF grant DMS-0241160. The authors thank the editor, the associate editor, two reviewers, and Matthew Holford for their constructive comments and suggestions that have led to a substantial improvement in the article.

APPENDIX A: PROOF OF LEMMA 1

  1. The first result follows by noting that both Φ(√(nλ(1−λ)) δi + z_{α/2}) and Φ(−√(nλ(1−λ)) δi + z_{α/2}) are increasing functions of α ∈ [0, 1].

  2. Let φ(·) denote the density function of N(0, 1). By Hung et al. [25] or Sackrowitz and Samuel-Cahn [26], it is easy to see that the density function of pi is
    f(p_i) = \frac{\varphi\!\left( \sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{p_i/2} \right) + \varphi\!\left( -\sqrt{n\lambda(1-\lambda)}\,\delta_i + z_{p_i/2} \right)}{2\varphi(z_{p_i/2})} = \frac{e^{-n\lambda(1-\lambda)\delta_i^2/2}}{2} \left( \exp\!\left( \sqrt{n\lambda(1-\lambda)}\,\delta_i z_{p_i/2} \right) + \exp\!\left( -\sqrt{n\lambda(1-\lambda)}\,\delta_i z_{p_i/2} \right) \right).

Note that f(pi) ≡ 1 for any i ∈ M0. For i ∈ M1, f(pi) is a strictly decreasing function of pi ∈ [0, 1], since z_{pi/2} is a strictly increasing function of pi ∈ [0, 1] and g(x) = e^x + e^{−x} is strictly decreasing on x ∈ (−∞, 0). This implies that βi(α)/α is a strictly decreasing function of α ∈ [0, 1] for i ∈ M1, and so is β(α)/α.

APPENDIX B: PROOF OF LEMMA 2

We prove result (i) first. For ease of notation, denote ξi = αi/2 and ξ̄ = ∑_{i∈M1} ξi/m1. Result (i) is then equivalent to showing that ∑_{i∈M1} Φ(√(nλ(1−λ)) δi + z_{ξ̄}) ≥ m1β when β + ξ̄ ≤ 1. Write ξi = ξ̄ + di with di ∈ (−ξ̄, 1 − ξ̄). Noting that √(nλ(1−λ)) δi = z_β − z_{ξi}, to prove (i) it suffices to show that

\sum_{i \in M_1} \Phi\!\left( z_\beta + z_{\bar{\xi}} - z_{\bar{\xi} + d_i} \right) \ge m_1 \beta.

When m1 = 2, let g(d) = Φ(z_β + z_{ξ̄} − z_{ξ̄+d}) + Φ(z_β + z_{ξ̄} − z_{ξ̄−d}) − 2β; we need to show that g(d) ≥ 0 for any d ∈ (−ξ̄, 1 − ξ̄). Without loss of generality, assume d ≥ 0. Denote z_{ξ̄+d} = y, so that d = Φ(y) − ξ̄. We have

\frac{\partial d}{\partial y} = \frac{\partial}{\partial y} \int_{-\infty}^{y} \varphi(t)\, dt = \varphi(y).

This implies that (/∂d)zξ̄+d = 1/φ(zξ̄+d) and similarly (/∂d)zξ̄d = −1/φ(zξ̄d). Thus

\frac{\partial}{\partial d} g(d) = -\frac{\varphi(z_\beta + z_{\bar{\xi}} - z_{\bar{\xi}+d})}{\varphi(z_{\bar{\xi}+d})} + \frac{\varphi(z_\beta + z_{\bar{\xi}} - z_{\bar{\xi}-d})}{\varphi(z_{\bar{\xi}-d})} = \frac{e^{-B_1/2} - e^{-B_2/2}}{2\pi\, \varphi(z_{\bar{\xi}+d})\, \varphi(z_{\bar{\xi}-d})},

where B_1 = (z_\beta + z_{\bar{\xi}} - z_{\bar{\xi}-d})^2 + z_{\bar{\xi}+d}^2 and B_2 = (z_\beta + z_{\bar{\xi}} - z_{\bar{\xi}+d})^2 + z_{\bar{\xi}-d}^2. Noting that z_β + z_{ξ̄} ≤ 0 since β + ξ̄ ≤ 1, we have

B_1 - B_2 = 2(z_\beta + z_{\bar{\xi}})(z_{\bar{\xi}+d} - z_{\bar{\xi}-d}) \le 0

since z_{ξ̄+d} ≥ z_{ξ̄−d} for any d ≥ 0. Therefore, (∂/∂d)g(d) ≥ 0 and thus g(d) ≥ g(0) = 0.

For m1 = 3, we define g(d1, d2) = Φ(z_β + z_{ξ̄} − z_{ξ̄+d1}) + Φ(z_β + z_{ξ̄} − z_{ξ̄+d2}) + Φ(z_β + z_{ξ̄} − z_{ξ̄−(d1+d2)}) − 3β. Without loss of generality, we may assume (d1 ≥ 0, d2 ≥ 0) or (d1 < 0, d2 < 0): when d1d2 < 0, we can reduce to the first case by reordering the terms of g(d1, d2) if d1 + d2 > 0, or to the second case otherwise. Arguments similar to those above lead to g(d1, d2) ≥ 0 in both cases.

Denote M_1^+ = \{i: d_i \ge 0, i \in M_1\} and M_1^- = \{i: d_i < 0, i \in M_1\}, and let L^+ = \#M_1^+ and L^- = \#M_1^-, so that L^+ + L^- = m_1. Then, by the fact that \sum_{i \in M_1^+} d_i + \sum_{i \in M_1^-} d_i = 0, we have

\sum_{i \in M_1} \Phi\!\left( z_\beta + z_{\bar{\xi}} - z_{\bar{\xi}+d_i} \right) = \sum_{i \in M_1^+} \Phi\!\left( z_\beta + z_{\bar{\xi}} - z_{\bar{\xi}+d_i} \right) + \sum_{i \in M_1^-} \Phi\!\left( z_\beta + z_{\bar{\xi}} - z_{\bar{\xi}+d_i} \right) \ge (L^+ - 1)\beta + \Phi\!\left( z_\beta + z_{\bar{\xi}} - z_{\bar{\xi} + \sum_{i \in M_1^+} d_i} \right) + (L^- - 1)\beta + \Phi\!\left( z_\beta + z_{\bar{\xi}} - z_{\bar{\xi} + \sum_{i \in M_1^-} d_i} \right) \ge m_1 \beta.

Therefore, β(ᾱ) ≥ β.

The proof of result (ii) is essentially the same as that of (i), noting that z_β + z_{ξ̄} > 0 when β + ξ̄ > 1, and is thus omitted.

References

  • 1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57:289–300.
  • 2. Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A. False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics. 2005;21:3017–3024. doi: 10.1093/bioinformatics/bti448.
  • 3. Yang MCK, Yang JJ, McIndoe RA, She JX. Microarray experimental design: power and sample size considerations. Physiol Genomics. 2003;16:24–28. doi: 10.1152/physiolgenomics.00037.2003.
  • 4. Gadbury GL, Page GP, Edwards J, Kayo T, Prolla TA, Weindruch R, Permana PA, Mountz JD, Allison DB. Power and sample size estimation in high dimensional biology. Stat Methods Med Res. 2004;13:325–338.
  • 5. Jung SH. Sample size for FDR-control in microarray data analysis. Bioinformatics. 2005;21:3097–3104. doi: 10.1093/bioinformatics/bti456.
  • 6. Dobbin K, Simon R. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics. 2005;6:27–38. doi: 10.1093/biostatistics/kxh015.
  • 7. Li SS, Bigler J, Lampe JW, Potter JD, Feng Z. FDR-controlling testing procedures and sample size determination for microarrays. Stat Med. 2005;24:2267–2280. doi: 10.1002/sim.2119.
  • 8. Hu J, Zou F, Wright FA. Practical FDR-based sample size calculations in microarray experiments. Bioinformatics. 2005;21:3264–3272. doi: 10.1093/bioinformatics/bti519.
  • 9. Pounds S, Cheng C. Sample size determination for the false discovery rate. Bioinformatics. 2005;21:4263–4271. doi: 10.1093/bioinformatics/bti699.
  • 10. Tsai CA, Wang SJ, Chen DT, Chen JJ. Sample size for gene expression microarray experiments. Bioinformatics. 2005;21:1502–1508. doi: 10.1093/bioinformatics/bti162.
  • 11. Ferreira JA, Zwinderman A. Approximate sample size calculations with microarray data: an illustration. Stat Appl Genet Mol Biol. 2006;5: Article 25. doi: 10.2202/1544-6115.1227.
  • 12. Liu P, Hwang JTG. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics. 2007;23:739–746. doi: 10.1093/bioinformatics/btl664.
  • 13. Tibshirani R. A simple method for assessing sample sizes in microarray experiments. BMC Bioinformatics. 2006;7:106. doi: 10.1186/1471-2105-7-106.
  • 14. Jung SH. Sample size calculation for multiple testing in microarray data analysis. Biostatistics. 2005;6:157–169. doi: 10.1093/biostatistics/kxh026.
  • 15. Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics. 2003;31:2013–2035.
  • 16. Fernando RL, Nettleton D, Southey BR, Dekkers JC, Rothschild MF, Soller M. Controlling the proportion of false positives in multiple dependent tests. Genetics. 2004;166:611–619. doi: 10.1534/genetics.166.1.611.
  • 17. Bickel DR. Error-rate and decision-theoretic methods of multiple testing: which genes have high objective probabilities of differential expression? Stat Appl Genet Mol Biol. 2004;3: Article 8. doi: 10.2202/1544-6115.1043.
  • 18. Storey JD, Taylor JE, Siegmund D. Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rate: a unified approach. J R Stat Soc Ser B. 2004;66:187–205.
  • 19. Zhang CH, Tang W. Bayes and empirical Bayes approaches to controlling the false discovery rate. Technical Report 2005-004, Department of Statistics, Rutgers University.
  • 20. Langaas M, Lindqvist BH, Ferkingstad E. Estimating the proportion of true null hypotheses, with application to DNA microarray data. J R Stat Soc Ser B. 2005;67:555–572.
  • 21. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA. 1999;96:6745–6750. doi: 10.1073/pnas.96.12.6745.
  • 22. Huang X, Pan W. Comparing three methods for variance estimation with duplicated high density oligonucleotide arrays. Funct Integr Genomics. 2002;2:126–133. doi: 10.1007/s10142-002-0066-2.
  • 23. Storey JD. A direct approach to false discovery rates. J R Stat Soc Ser B. 2002;64:479–498.
  • 24. Ge Y, Dudoit S, Speed TP. Resampling-based multiple testing for microarray data analysis. Test. 2003;12:1–77.
  • 25. Hung HM, O'Neill RT, Bauer P, Köhne K. The behavior of the P-value when the alternative hypothesis is true. Biometrics. 1997;53:11–22.
  • 26. Sackrowitz H, Samuel-Cahn E. P values as random variables – expected P values. The American Statistician. 1999;53:326–331.
