Genes. 2024 Mar 7;15(3):344. doi: 10.3390/genes15030344

Computing Power and Sample Size for the False Discovery Rate in Multiple Applications

Yonghui Ni 1, Anna Eames Seffernick 1, Arzu Onar-Thomas 1, Stanley B Pounds 1,*
Editor: Stefano Lonardi
PMCID: PMC10970028  PMID: 38540403

Abstract

The false discovery rate (FDR) is a widely used metric of statistical significance for genomic data analyses that involve multiple hypothesis testing. Power and sample size considerations are important in planning studies that perform these types of genomic data analyses. Here, we propose a three-rectangle approximation of a p-value histogram to derive a formula to compute the statistical power and sample size for analyses that involve the FDR. We also introduce the R package FDRsamplesize2, which incorporates these and other power calculation formulas to compute power for a broad variety of studies not covered by other FDR power calculation software. A few illustrative examples are provided. The FDRsamplesize2 package is available on CRAN.

Keywords: false discovery rate, power, sample size, multiple testing, proportion of true null hypotheses

1. Introduction

The false discovery rate (FDR) is now a widely used metric of statistical significance for exploratory data analyses that perform a very large number of hypothesis tests. Benjamini and Hochberg introduced the FDR and proposed a p-value adjustment procedure to control it under certain conditions [1]. Subsequently, many procedures to estimate or control the FDR and similar metrics of statistical significance have been developed [2]. Most of these later procedures improve power by incorporating an estimate π̂0 of the proportion π0 of hypothesis tests that have a true null hypothesis. Some compute π̂0 as the height of the rightmost bin of a histogram of all p-values [3,4,5]; Pounds and Cheng compute π̂0 as the minimum of 1 and twice the mean of all p-values [6]; others use more elaborate methods to compute π̂0. These FDR methods have found widespread application in exploratory analyses of genomic data that test the association of each of many genomic features with a phenotype or other biological condition of interest.
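For concreteness, the two simple π̂0 estimators just described can be sketched as follows for a vector p of p-values; this is an illustrative sketch rather than code from any of the cited packages, and the bin boundary lambda = 0.95 is an assumption.

pi0.histogram.height <- function(p, lambda = 0.95) {
  # height of the rightmost histogram bin: proportion of p-values above lambda,
  # rescaled by the bin width
  mean(p > lambda) / (1 - lambda)
}
pi0.double.mean <- function(p) {
  # Pounds and Cheng: minimum of 1 and twice the mean of all p-values
  min(1, 2 * mean(p))
}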

The widespread use of these methods has fueled the development of several methods to compute power and sample size for studies that use the FDR to determine statistical significance. Most of these methods compute power and sample size for two-group comparisons. Jung proposed a particularly elegant method based on a simple equation involving the desired FDR τ, the desired average power 1 − β of tests with a false null hypothesis [7], the proportion π0 of tests with a true null, and the unadjusted p-value threshold α used to declare statistical significance [8]. Jung solves this equation for α and then finds the sample size n that provides 1 − β average power to the tests with a false null hypothesis, providing specific details for using a t-test to compare the means of many variables across two groups [7].

Pounds and Cheng expand on Jung’s [7] approach by considering that, in practice, the estimate π̂0 is used instead of π0 itself to determine statistical significance [9]. The estimate π̂0 is a function of the data and thus depends on the sample size. Pounds and Cheng derive a formula for this relationship, incorporate it into Jung’s equation [7], and then develop an iterative procedure that incorporates this consideration into sample size calculations [9]. They also provide software to perform these calculations, using one-way ANOVA to compare the means of many variables across three or more groups. This iterative procedure can be computationally expensive. In some cases, it yields the same sample size as Jung’s equation because π̂0 converges to π0 as the sample size increases.

Here, we propose a p-value histogram framework to develop a simplified approach for considering the estimation of π0 in sample size calculations. We incorporate this approach into the FDRsamplesize2 R package (version 0.2.0) for power and sample size calculations. The FDRsamplesize2 package computes power and sample size for several types of scientific questions that are not covered by other packages. It can perform these calculations in settings where all hypotheses are evaluated with one of the following tests: the two-proportions z-test, a two-group comparison of Poisson data, a two-group comparison of negative binomial data, the one-sample or two-sample t-test, the t-test for non-zero correlation, Fisher’s exact test, the signed-rank test, the sign test, the rank-sum test, one-way ANOVA, or single-predictor Cox regression. Our software computes power and sample size using Jung’s formula [7] and our p-value histogram formulas. It also computes power for using the Benjamini and Hochberg p-value adjustment method [1], which does not estimate π0 for FDR control.

2. Background

Consider a setting that involves performing a very large number of hypothesis tests. A proportion π0 of these tests have a true null hypothesis, and the remaining proportion 1 − π0 have a false null hypothesis. We want to design a study with adequate statistical power to declare significance for a proportion 1 − β of the tests with a false null hypothesis (i.e., an average power of 1 − β across the tests with a false null hypothesis) while controlling the FDR at a fixed, pre-specified level τ.

For this problem, Jung noted that:

$$\tau = \frac{\pi_0\,\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)}\qquad(1)$$

accurately approximates the relationship among the FDR τ, the proportion π0 of tests with a true null hypothesis, the (unadjusted) p-value threshold α used to declare statistical significance, and the average power 1 − β of all tests with a false null hypothesis at the α level [7]. Solving for α yields a fixed p-value threshold:

$$\alpha = \frac{\tau\,(1-\pi_0)(1-\beta)}{\pi_0\,(1-\tau)}\qquad(2)$$

that achieves the desired average power 1 − β and the desired FDR τ, given the proportion π0 of tests with a true null hypothesis. One can then use the power calculation formula for the statistical method of interest to compute the sample size n necessary for the tests with a false null hypothesis to have an average power of 1 − β at the α level.
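As a concrete illustration of this two-step approach (a sketch, not the package's internal code), suppose every false null corresponds to a two-sample t-test with an effect size of one standard deviation (an assumption for illustration); Equation (2) gives α, and base R's power.t.test then gives the per-group sample size:

tau     <- 0.10   # desired FDR
pi0     <- 0.99   # proportion of tests with a true null
avg.pwr <- 0.80   # desired average power (1 - beta)
# Step 1: Equation (2) gives the fixed p-value threshold
alpha.jung <- tau * (1 - pi0) * avg.pwr / (pi0 * (1 - tau))
alpha.jung                     # 0.000898, matching the alpha_Jung entry of Table 1 below
# Step 2: per-group sample size giving the desired power at that threshold
power.t.test(delta = 1, sd = 1, sig.level = alpha.jung, power = avg.pwr)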

Pounds and Cheng noted that in practice:

$$\hat{\tau} = \frac{\hat{\pi}_0\,\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)}\qquad(3)$$

is used to determine statistical significance, where all observed p-values are used to compute an estimate π̂0 of π0 as well as FDR estimates τ̂ as a function of the p-value threshold α [9]. The estimate π̂0 appears only in the numerator of this equation because the FDR adjustment procedures use π̂0 only in the numerator; in those procedures, the empirical distribution function (EDF) of all p-values is used to estimate the denominator, so π̂0 is not needed there. In sample size calculations, the denominator is the average cumulative distribution function (CDF) of all p-values, which is a function of the conjectured value π0; it is written with separate terms for the proportion π0 of tests with a true null and the proportion 1 − π0 of tests with a false null to illuminate the relationship between the FDR and the average power 1 − β. Recall that the statistical power of a test at the α level is the CDF of its p-value evaluated at α. Therefore, π̂0 and τ̂ themselves also depend on the average power 1 − β of the tests with a false null hypothesis, and thus on the sample size. Pounds and Cheng derive a formula for the expected value E(π̂0 | n, θ) of π̂0 as a function of the sample size n and the parameter vector θ (the effect sizes of all tests), and then develop an iterative procedure that incorporates this consideration into the sample size calculations [9]. This iterative procedure adjusts for the fact that data-based estimates are used in practice. In some cases, it obtains larger sample sizes than Jung’s formula because E(π̂0 | n, θ) ≥ π0 for most methods of computing π̂0; in other cases, it obtains the same sample size as Jung’s formula because E(π̂0 | n, θ) → π0 as n → ∞. Nevertheless, the iterative procedure is computationally burdensome, so a more straightforward and less computationally complex approximation can be useful.

Following [7], let us consider the approximate relationship:

$$\tilde{\tau} = \frac{\tilde{\pi}_0\,\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)}\qquad(4)$$

where π̃0 is the value used to compute τ̃. For example, π̃0 = π0 in Jung’s equation above. Also, the Benjamini and Hochberg method operationally uses π̃0 = 1 to obtain τ̃ [1]. Using π̃0 = 1 in the equation above and solving for α yields:

$$\alpha_{BH} = \frac{\tilde{\tau}\,(1-\pi_0)(1-\beta)}{1 - \tilde{\tau}\,\pi_0}\qquad(5)$$

as the p-value threshold determined by using the Benjamini and Hochberg procedure to control the FDR at τ̃ in a setting with an average power of 1 − β for the tests with a false null hypothesis [1]. One can then use the power formula for the tests with false null hypotheses to determine the sample size needed to achieve an average power of 1 − β at the αBH level.
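Continuing the same illustrative setting as above (τ = 0.10, π0 = 0.99, 1 − β = 0.80; a sketch, not package code), Equation (5) gives a slightly smaller, more conservative threshold than Equation (2):

tau <- 0.10; pi0 <- 0.99; avg.pwr <- 0.80
alpha.bh <- tau * (1 - pi0) * avg.pwr / (1 - tau * pi0)
alpha.bh       # 0.000888, versus 0.000898 from Equation (2)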

Histograms of p-values are a useful component of several FDR control procedures. In the q-value procedure, Storey estimates π0 by the height of a histogram bar for p-values between 0.95 and 1.00 [3]. Nettleton et al. and Pounds et al. also use histograms to compute estimates of π0 [4,5]. This suggests that a histogram approximation to the p-value distribution may help in developing a useful framework for computing power and sample size for controlling the FDR in applications involving many hypothesis tests.

We now derive other expressions for π̃0 by considering a three-rectangle approximation of the histogram of p-values obtained by performing many hypothesis tests (Figure 1). A proportion π0 of tests have a true null hypothesis, and there is an average power of 1 − β to declare the other 1 − π0 tests significant at the threshold α. By the probability integral transform, the tests with a true null have uniformly distributed p-values. Among the tests with a false null, a proportion 1 − β yield p-values less than α and a proportion β yield p-values greater than α. This distribution can be approximated by three rectangles. The first rectangle has height π0 and a base extending from p = 0 to p = 1; it represents the uniform distribution of the proportion π0 of p-values with a true null hypothesis (shown in gray in Figure 1). The second rectangle has a base extending from p = 0 to p = α and a total area of (1 − π0)(1 − β); it represents the proportion 1 − β of the 1 − π0 p-values with a false null that are less than α (shown in gold in Figure 1). Its height is (1 − π0)(1 − β)/α and its center (mean) is at p = α/2. The third rectangle has a base extending from p = α to p = 1 and a total area of (1 − π0)β; it represents the tests with a false null hypothesis that fail to reach significance at the α level (i.e., Type II errors, or what some may call false negative results). Its height is (1 − π0)β/(1 − α) and its center (mean) is at p = (1 + α)/2 (shown in cyan in Figure 1).

Figure 1. A three-rectangle approximation to the p-value histogram. The gray rectangle at the bottom represents the uniform distribution of p-values with a true null hypothesis. The gold rectangle at the top left represents the significant p-values with false null hypotheses that are less than α. The cyan rectangle at the top right represents the non-significant p-values with false null hypotheses that are greater than or equal to α.

This three-rectangle approximation yields:

$$\tilde{\pi}_{hh} = \pi_0 + \frac{(1-\pi_0)\,\beta}{1-\alpha}\qquad(6)$$

for FDR methods, such as those of [3,4,5], that estimate π0 by the height of the p-value histogram near p = 1 (histogram height; HH). It also yields:

$$\tilde{\pi}_{hm} = \pi_0 + (1-\pi_0)(\alpha + \beta)\qquad(7)$$

for FDR methods, such as that of [6], that estimate π0 by 2p̄, that is, twice the mean of all p-values (histogram mean; HM). Substituting π̃hh or π̃hm into the formula for τ̃ yields a simple second-order polynomial in α, whose solution is given by the quadratic formula, as described in Appendix A. These formulas for α account for the dependency of the computed FDR values on the power of the tests with false null hypotheses without the computational burden of Pounds and Cheng’s iterative procedure [9].
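A quick simulation sketch can illustrate Equations (6) and (7); the values below are assumptions chosen to roughly match the π0 = 0.90, τ = 0.10, 1 − β = 0.8 row of Table 1, and the code is illustrative rather than part of the package.

set.seed(1)
pi0 <- 0.90; beta <- 0.20; alpha <- 0.0096; m <- 1e6
p <- c(runif(pi0 * m),                                # true nulls: uniform p-values
       runif((1 - pi0) * m * (1 - beta), 0, alpha),   # false nulls declared significant
       runif((1 - pi0) * m * beta, alpha, 1))         # false nulls missed (Type II errors)
mean(p > 0.95) / 0.05                   # histogram-height estimator
pi0 + (1 - pi0) * beta / (1 - alpha)    # Equation (6)
min(1, 2 * mean(p))                     # histogram-mean estimator
pi0 + (1 - pi0) * (alpha + beta)        # Equation (7)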

Thus, with the new framework, we have derived simple mathematical formulas for the p-value thresholds αBH, αJung, αhh, and αhm for the Benjamini and Hochberg procedure [1], Jung’s formula [7], methods that use the p-value histogram height (HH) at p = 1 to estimate π0 as in Equation (6), and methods that use the p-value histogram mean (HM) to estimate π0 as in Equation (7), respectively. These four formulas can yield different values of α (Table 1) and thus lead to different sample size estimates. Some settings in the table may seem unusual in comparison to the 80% power and 5% level calculations often performed for testing a single hypothesis. However, some applications may warrant these other settings. For example, one may wish to identify 100 genes with a 50% FDR (to have 50 "real" hits for follow-up) in a comparison of the expression of 10,000 genes between samples treated and untreated with a drug that alters the expression of 100 genes. This would correspond to τ̃ = 0.50 (desired FDR), 1 − β = 0.50 (power to detect 50 of the 100 genes), and π0 = 0.99 (9900 of the 10,000 genes unaltered by the drug).

Table 1.

Example calculations of α.

π0    τ    1 − β    αBH    αhh    αhm    αJung
0.90 0.05 0.5 0.002618 0.002762 0.002762 0.002924
0.95 0.05 0.5 0.001312 0.001348 0.001348 0.001385
0.99 0.05 0.5 0.000263 0.000264 0.000264 0.000266
0.90 0.10 0.5 0.005495 0.005812 0.005810 0.006173
0.95 0.10 0.5 0.002762 0.002841 0.002840 0.002924
0.99 0.10 0.5 0.000555 0.000558 0.000558 0.000561
0.90 0.50 0.5 0.045455 0.049740 0.049510 0.055556
0.95 0.50 0.5 0.023810 0.024968 0.024938 0.026316
0.99 0.50 0.5 0.004950 0.005000 0.005000 0.005051
0.90 0.05 0.8 0.004188 0.004571 0.004569 0.004678
0.95 0.05 0.8 0.002100 0.002192 0.002192 0.002216
0.99 0.05 0.8 0.000421 0.000424 0.000424 0.000425
0.90 0.10 0.8 0.008791 0.009636 0.009627 0.009877
0.95 0.10 0.8 0.004420 0.004624 0.004623 0.004678
0.99 0.10 0.8 0.000888 0.000896 0.000896 0.000898
0.90 0.50 0.8 0.072727 0.084772 0.083619 0.088889
0.95 0.50 0.8 0.038095 0.041201 0.041063 0.042105
0.99 0.50 0.8 0.007921 0.008048 0.008047 0.008081
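The thresholds in Table 1 can be reproduced directly from Equations (2) and (5) and the quadratic solutions in Appendix A; the helper below is an illustrative sketch, not a function exported by FDRsamplesize2.

alpha.thresholds <- function(pi0, tau, pwr) {
  beta <- 1 - pwr
  a.bh   <- tau * (1 - pi0) * pwr / (1 - tau * pi0)     # Equation (5)
  a.jung <- tau * (1 - pi0) * pwr / (pi0 * (1 - tau))   # Equation (2)
  # histogram height: smaller root of the Appendix A quadratic
  A <- (1 - tau) * pi0
  B <- tau * (1 - pi0) * pwr + (1 - tau) * pi0 + (1 - pi0) * beta
  C <- tau * pwr * (1 - pi0)
  a.hh <- (B - sqrt(B^2 - 4 * A * C)) / (2 * A)
  # histogram mean: positive root of the Appendix A quadratic
  A <- 1 - pi0
  B <- (1 - tau) * pi0 + (1 - pi0) * beta
  C <- -tau * (1 - pi0) * pwr
  a.hm <- (-B + sqrt(B^2 - 4 * A * C)) / (2 * A)
  c(BH = a.bh, HH = a.hh, HM = a.hm, Jung = a.jung)
}
round(alpha.thresholds(pi0 = 0.90, tau = 0.05, pwr = 0.5), 6)   # first row of Table 1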

3. Supported Tests

The section above describes how to determine a threshold α for unadjusted p-values that yields the desired FDR τ̃ and average power 1 − β for a given proportion π0 of tests with a true null hypothesis. The sample size needed to achieve the desired average power 1 − β at the p-value threshold α must still be determined using the power calculation formula for the particular statistical test of interest. Our R package performs these calculations for the following commonly applied statistical tests:

  • Two-sample t-test

  • One-sample t-test

  • Rank–sum test

  • Signed–rank test

  • Fisher’s exact test

  • t-test for correlation

  • Comparison of two Poisson distributions

  • Comparison of two negative binomial distributions

  • Two-proportions z-test

  • One-way ANOVA

  • Cox proportional hazards regression

4. Algorithmic Details

The FDRsamplesize2 package uses the following general algorithm to determine the sample size:

  1. Given the desired FDR τ, the desired average power 1 − β, the proportion π0 of tests with a true null, and the desired option for π̃0 (BH, HH, HM, or Jung), determine the p-value threshold α that achieves the desired FDR and average power.

  2. Compute the average power for each of the two initial sample sizes n0<n1 (default n0=3 and n1=6; other values can be specified by the user).

  3. If the average power for both of these initial sample sizes is greater than the desired average power, then report n0 and its average power as the result.

  4. If the average power for both of these initial sample sizes is less than desired, then set the new n0 = n1 and n1 = 2n1 and compute the average power for these new values of n0 and n1. Repeat until the average power for n1 is greater than the desired average power 1 − β and the average power for n0 is less than the desired average power.

  5. With n0 and n1 and their average powers as initial values, use bisection to determine the smallest n with an average power greater than or equal to the desired average power 1 − β.

  6. To avoid excessive computing time, stop iterative calculations of steps 4 and 5 after average power has been computed a specified maximum number of times and report the results achieved thus far.

The algorithm reports the input parameters; the computed sample size (or number of events) n; the computed average power; the computed α; the number of iterations (number of times average power was computed in steps 4 and 5); the bisection bounds for the sample size n0 and n1; and the maximum number of iterations. If the desired average power was not achieved, the user may continue the calculations by starting at the achieved bounds for n0 and n1 or repeating the calculations with a greater maximum number of iterations.
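The search in steps 2 through 5 can be sketched as follows; avepow.func(n, alpha) stands for any function that returns the average power at sample size n, and the argument names are illustrative rather than the package's exact interface.

find.n.sketch <- function(avepow.func, alpha, target.pwr, n0 = 3, n1 = 6, max.its = 50) {
  its <- 2
  p0 <- avepow.func(n0, alpha)
  p1 <- avepow.func(n1, alpha)
  if (p0 >= target.pwr) return(list(n = n0, avepow = p0, n.its = its))   # step 3
  while (p1 < target.pwr && its < max.its) {                             # step 4: double n1
    n0 <- n1; p0 <- p1
    n1 <- 2 * n1; p1 <- avepow.func(n1, alpha); its <- its + 1
  }
  while (n1 - n0 > 1 && its < max.its) {                                 # step 5: bisection
    mid <- floor((n0 + n1) / 2)
    pm  <- avepow.func(mid, alpha); its <- its + 1
    if (pm >= target.pwr) { n1 <- mid; p1 <- pm } else { n0 <- mid; p0 <- pm }
  }
  list(n = n1, avepow = p1, alpha = alpha, n0 = n0, n1 = n1, n.its = its, max.its = max.its)
}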

5. Examples

5.1. Example 1: Study Involving Many Sign Tests

Suppose we are planning a study that will collect 10,000 Bernoulli (yes/no, positive/negative, etc.) variables for each subject. For each of those variables, we will use the sign test to evaluate the null hypothesis that the probability θ of success equals θ0 = 0.5. We are interested in computing power and sample size for the setting in which 100 variables have a success probability of 0.8 and the remaining 9900 variables have a success probability of 0.5. In this setting, the null hypothesis H0: θ = 0.5 is true for 9900/10,000 (π0 = 0.99) of the variables. The package includes a function power.signtest that uses the formula of [10] to compute the power of the sign test. The power.signtest function is plugged into the average.power.signtest function, which computes the average power achieved by performing all tests at a fixed p-value level α for a given sample size and θ. The find.sample.size function determines the sample size needed to achieve a given average power while controlling the FDR at a specified level.

To determine the sample size needed to achieve the desired average power while controlling the FDR at τ = 0.1, we can use the n.fdr.signtest function directly; its output includes the sample size n, the computed average power, the fixed p-value threshold, the number of iterations, the bisection bounds for the sample size, and the input parameters.

theta = rep(c(0.8,0.5), c(100,9900))     # 9900 null; 100 non-null
res = n.fdr.signtest(fdr = 0.1, pwr = 0.8, p = theta, pi0.hat = "BH")
res
## $n
## [1] 45
##
## $computed.avepow
## [1] 0.8095842
##
## $desired.avepow
## [1] 0.8
##
## $desired.fdr
## [1] 0.1
##
## $input.pi0
## [1] 0.99
##
## $alpha
## [1] 0.0008879023
##
## $n0
## [1] 44
##
## $n1
## [1] 45
##
## $n.its
## [1] 8
##
## $max.its
## [1] 50

Thus, a sample size of 45 is necessary to have an average power of 81% while using the Benjamini and Hochberg procedure to control the FDR at 10% [1]. We can also obtain the sample size step by step; this helps users plug in the average power function of other statistical tests that are not available in FDRsamplesize2. As mentioned above, we first derive the p-value threshold α.

adj.p = alpha.power.fdr(fdr = 0.1, pwr = 0.8, pi0 = 0.99, method = "BH")
adj.p
## [1] 0.0008879023

Then, we use the find.sample.size function to find the sample size and obtain the same output as from the n.fdr.signtest function above.

find.sample.size(alpha = adj.p, pwr = 0.8, avepow.func = average.power.signtest, p = theta)
## $n
## [1] 45
##
## $computed.avepow
## [1] 0.8095842
##
## $desired.avepow
## [1] 0.8
##
## $alpha
## [1] 0.0008879023
##
## $n.its
## [1] 8
##
## $max.its
## [1] 50
##
## $n0
## [1] 44
##
## $n1
## [1] 45

To compute power for a method that uses the histogram height to estimate π0, we specify the option pi0.hat = "HH" in n.fdr.signtest.

n.fdr.signtest(fdr = 0.1, pwr = 0.8, p = theta, pi0.hat = "HH")
## $n
## [1] 45
##
## $computed.avepow
## [1] 0.810428
##
## $desired.avepow
## [1] 0.8
##
## $desired.fdr
## [1] 0.1
##
## $input.pi0
## [1] 0.99
##
## $alpha
## [1] 0.0008958549
##
## $n0
## [1] 44
##
## $n1
## [1] 45
##
## $n.its
## [1] 8
##
## $max.its
## [1] 50

These methods obtained the same sample size because π̃hh ≈ 1 in this setting, in which π̃hh ≥ π0 = 0.99. Let us try it again with π0 = 0.95:

theta = rep(c(0.8,0.5), c(500,9500))     # 9500 null; 500 non-null
n.fdr.signtest(fdr = 0.1, pwr = 0.8, p = theta, pi0.hat = "HH")
## $n
## [1] 35
##
## $computed.avepow
## [1] 0.815116
##
## $desired.avepow
## [1] 0.8
##
## $desired.fdr
## [1] 0.1
##
## $input.pi0
## [1] 0.95
##
## $alpha
## [1] 0.004624029
##
## $n0
## [1] 34
##
## $n1
## [1] 35
##
## $n.its
## [1] 8
##
## $max.its
## [1] 50

In this case, we obtain the same sample size with the "BH" and "HH" options. That is not always the case. By studying Equations (6) and (7), we can see that the methods may obtain different results with a smaller π0, greater power, and a reduced tolerance for Type I errors. For example, setting θ = 0.55 for tests with a false null, θ = 0.50 for tests with a true null, π0 = 0.90, 1 − β = 0.99, and τ = 0.01, we obtain n = 3143 with the "BH" option and n = 3109 with the "HH" option for this problem; the calls behind this comparison are sketched below.
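For reference, the comparison just described could be run as follows (outputs omitted here; the vector of 10,000 success probabilities is an assumption consistent with the earlier examples in this section):

theta = rep(c(0.55, 0.50), c(1000, 9000))   # 1000 false nulls, 9000 true nulls (pi0 = 0.90)
n.fdr.signtest(fdr = 0.01, pwr = 0.99, p = theta, pi0.hat = "BH")   # yields n = 3143
n.fdr.signtest(fdr = 0.01, pwr = 0.99, p = theta, pi0.hat = "HH")   # yields n = 3109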

5.2. Example 2: Design of a Clinical Trial to Find Prognostic Genes

The FDRsamplesize2 package can also determine the sample size necessary to identify genes whose expression values are associated with progression-free survival (PFS) in the context of a pediatric brain tumor clinical trial. A single-predictor Cox regression model will be used to test the association of each gene’s expression with PFS. The package includes the power.cox function, which implements the power formula of Hsieh and Lavori for Cox regression modeling [11]. This function computes power as a function of the number of events (disease progression or death). We are interested in determining the number of events necessary to identify 80% of the genes truly associated with PFS while controlling the FDR at 10% in a setting in which 1% of the genes have a hazard ratio of 2 and a variance of 1. The R code for this calculation is shown below:

log.HR = log(rep(c(1,2),c(9900,100)))   # log hazard ratio for each gene
v = rep(1,10000)           # variance of each gene
res = n.fdr.coxph(fdr = 0.1, pwr = 0.8,
    logHR = log.HR, v = v, pi0.hat = "BH")
res
## $n
## [1] 37
##
## $computed.avepow
## [1] 0.8139159
##
## $desired.avepow
## [1] 0.8
##
## $desired.fdr
## [1] 0.1
##
## $input.pi0
## [1] 0.99
##
## $alpha
## [1] 0.0008879023
##
## $n0
## [1] 36
##
## $n1
## [1] 37
##
## $n.its
## [1] 7
##
## $max.its
## [1] 50

This calculation shows that 37 events provide an average power of 81.4% when using the Benjamini and Hochberg procedure to control the FDR at 10% [1]. The calculations project that, with this number of events, the analysis will choose a p-value threshold of 0.00089 and achieve the desired average power of 80%.

5.3. Example 3: Differential Expression Analysis of RNA-Seq Data

Hart et al. derived a formula to compute the power of using a negative binomial model for differential expression analysis of RNA-seq data across two groups when all tests are performed at a common p-value threshold α [12]. They used that formula to compute the power for performing all tests at the α = 0.01 level for unadjusted p-values. They did not incorporate that formula into the procedure of Jung [7] or Pounds and Cheng [9].

The formula of Hart et al. expresses power as a function of the sample size, the log fold change of expression, the mean read depth per gene, and the coefficient of variation per gene [12]. The power.hart function computes this power formula. Below, we use this function for the average power calculation in n.fdr.negbin to compute the sample size needed to achieve the desired average power while controlling the FDR at 10% in a setting in which 99% of genes have no effect (log fold change of 0) and 1% of genes have a 2.7-fold change (log fold change = log(2.7) ≈ 0.99). All genes have an average read depth of 5 and a coefficient of variation of 0.60.

theta = log(rep(c(1,2.7),c(9900,100)))   # log fold change for each gene
mu = rep(5,10000)         # read depth
sig = rep(0.6,10000)          # coefficient of variation
res = n.fdr.negbin(fdr = 0.1, pwr = 0.8, log.fc = theta,
      mu = mu, sig = sig, pi0.hat = "BH")
res
## $n
## [1] 20
##
## $computed.avepow
## [1] 0.8087842
##
## $desired.avepow
## [1] 0.8
##
## $desired.fdr
## [1] 0.1
##
## $input.pi0
## [1] 0.99
##
## $alpha
## [1] 0.0008879023
##
## $n0
## [1] 19
##
## $n1
## [1] 20
##
## $n.its
## [1] 6
##
## $max.its
## [1] 50

In this setting, with a sample size of 20 subjects per group, the Benjamini and Hochberg procedure chooses α = 0.00089 and provides an average power of 80.9% [1].

5.4. Example 4: Computing Power of t-Tests for a Given Sample Size and Effect Sizes

The FDRsamplesize2 package includes a function fdr.avepow that computes the FDR and average power of a set of tests as a function of α for a given sample size and effect sizes. For example, this function can compute the FDR and average power of a set of two-sample t-tests as a function of the p-value threshold α with n=10 per group in a setting with π0=0.95 and effect size δ=2 for the remaining tests.

delta = rep(c(0,2),c(95,5)) # 95% null; 5% with delta = 2

res = fdr.avepow(10,                     # per-group sample size
        average.power.t.test,            # average power function
        null.hypo = "delta == 0",        # null hypothesis
        delta = delta)                   # effect size vector

head(res$res.tbl)
##        alpha   fdr     pwr
## [1,] 0.001 0.02800492 0.6951602
## [2,] 0.002 0.04869462 0.7834460
## [3,] 0.003 0.06773070 0.8288612
## [4,] 0.004 0.08567764 0.8577325
## [5,] 0.005 0.10276694 0.8780755
## [6,] 0.006 0.11912689 0.8933293

In this setting, with a sample size of n = 10 per group, a fixed p-value threshold of α = 0.005 yields an FDR of 10.3% and an average power of 87.8%.

6. Simulations

FDRsamplesize2 uses published power calculation formulas for specific tests; thus, its performance is dependent on the accuracy of those power calculation formulas at the p-value threshold α determined by solving the equation relating the average power, the FDR, and the p-value threshold α. We performed a series of simulations to evaluate the accuracy of the power and sample size calculations. The simulation script is provided as a Supplementary File (FDRsamplesize2-simulation-script.R). The results are provided in Supplementary Table S1 (FDRsamplesize2-simulation-results.xlsx). Generally, the average power, FDR, and p-value threshold averaged over simulation replications are very close to those computed from the formulas. The calculations yielded sample sizes providing the desired FDR and average power for the two-sample t-test, Wilcoxon rank–sum test, comparison of two Poisson variables, t-test for non-zero correlations, signed–rank test, Fisher’s exact test, and one-way ANOVA.

For the few tests for which FDRsamplesize2 yielded a sample size with inadequate FDR or average power, the power calculations were based on asymptotic normal approximations. This was the case for the two proportions z-test, the sign test, Cox regression, and the comparison of two negative binomial variables. The asymptotic normal approximations for these tests may not be sufficiently accurate for the very small p-value thresholds needed to maintain the desired FDR control. These results show that the accuracy of Jung’s equation relating the p-value threshold, average power, and FDR depends on the accuracy of the formula used to compute average power. Users should be aware of whether the underlying power calculation formulas are exact or approximate when using FDRsamplesize2. We designed the FDRsamplesize2 package so that users can easily incorporate improved power calculation formulas for specific tests as they become available. It may be advisable to double-check some sample size calculations with simulation. Still, FDRsamplesize2 provides a good starting point for using simulations to compute power and sample size.
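As an example of such a double-check (a minimal sketch, separate from the Supplementary simulation script), one can simulate data under the planned design, apply the Benjamini and Hochberg adjustment, and compare the realized false discovery proportion and average power with the targets:

check.design <- function(n, delta, fdr = 0.1, n.reps = 100) {
  m <- length(delta)
  res <- replicate(n.reps, {
    x <- matrix(rnorm(m * n), m, n)                 # group 1: no shift
    y <- matrix(rnorm(m * n, mean = delta), m, n)   # group 2: shifted by delta
    p <- sapply(seq_len(m), function(i) t.test(x[i, ], y[i, ])$p.value)
    hits <- p.adjust(p, method = "BH") <= fdr
    c(fdp = ifelse(any(hits), mean(delta[hits] == 0), 0),   # realized false discovery proportion
      pwr = mean(hits[delta != 0]))                          # realized average power
  })
  rowMeans(res)
}
set.seed(42)
delta <- rep(c(0, 2), c(950, 50))     # 95% true nulls, effect size 2 otherwise
check.design(n = 10, delta = delta)   # compare with the desired FDR and average power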

7. Discussion

The FDR is a widely used metric of statistical significance in multiple-testing applications. Power and sample size calculations are an important consideration in planning studies. Formulas that relate the p-value threshold, the FDR, the average power, and the proportion π0 of tests with a true null hypothesis are useful for performing these calculations. In some applications, it is important to consider how average power impacts the estimate π̂0 of π0 that is used to compute an FDR estimate τ̂ in practice. We proposed a simple three-rectangle approximation to the p-value histogram to derive two new power calculation formulas that incorporate this consideration. We also developed the FDRsamplesize2 R package, which performs sample size calculations using these formulas and Jung’s formula [7]. Additionally, it computes power for using the Benjamini and Hochberg procedure, which does not compute an estimate π̂0 of π0 but instead effectively uses π̃0 = 1 [1]. We incorporated power calculation formulas for several statistical tests not covered by other published FDR power calculation software. The FDRsamplesize2 package also conveniently gathers some individual methods scattered across different software resources, such as the method of Hart et al. to compare many negative binomial (read-count) variables across two groups [12]. Also, FDRsamplesize2 provides a framework to compute the p-value threshold α needed for the desired average power and FDR; users can then apply traditional single-test power calculation methods to determine the sample size needed to achieve the desired average power at that α level. This provides additional flexibility and scope for users. Thus, FDRsamplesize2 should be a valuable resource for researchers as they develop grant proposals and plan their experiments.

In our simulations, FDRsamplesize2 provided very accurate sample size calculations for all tests that used exact power calculation formulas. In a few cases where asymptotic normal approximations were used to compute power, FDRsamplesize2 yielded sample sizes that did not satisfy pre-specified requirements for the FDR or average power. The inadequate power was observed for all four methods of using π0 or π^0: the Benjamini–Hochberg approach, Jung’s equation, and the two histogram-based approaches proposed here. Thus, we encourage users to use simulation to double-check calculations based on asymptotic normal approximations for power formulas. Still, even in these cases, FDRsamplesize2 calculations will save users time by providing useful guidance for a reasonable starting point for computationally intensive simulation-based power calculations. The Supplementary Simulation Script (FDRsamplesize2-simulation-script.R) also provides a useful template for performing those simulations.

There are other valuable resources to compute power and sample size for the FDR as well. The ssize.fdr package implements the method of Liu and Hwang for two-sample t-tests and some ANOVA tests [13,14]. The PROPER method was designed specifically for planning differential expression analyses of RNA-seq data [15]. Hu et al. developed a method using the expectation–maximization algorithm to compute power and sample size [16]. Shao and Tseng developed a method to carefully account for correlation among tests in computing power and sample size [17]. Pawitan et al. developed a method for performing many two-sample t-tests [18]. The scPower framework computes power for multi-sample single-cell transcriptomic experiments [19]. Ching et al. provided a simulation-based framework to compute power for RNA-seq differential expression analysis [20]. Jung et al. developed a method to compute sample size for a one-way blocked design [21]. Lee and Whitmore provided a framework to compute power for elaborate linear models [22]. Li et al. developed a method based on exact tests for bulk RNA-seq data [23]. All of these methods are approximations that work well when a very large number (hundreds or more) of hypothesis tests are performed. Glueck et al. developed a method to compute the exact power of the Benjamini and Hochberg procedure [1] for applications involving a relatively small number of hypothesis tests (such as evaluating multiple co-primary endpoints in a clinical trial) [24]. Numerous tools are now available for many specific study designs and scientific objectives. Certainly, there are still other applications for which power calculation methods need to be developed. These provide opportunities for those interested in developing power calculation methods for planning genomic studies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes15030344/s1, Table S1: Results of Simulation Studies for desired power = 0.5 and desired FDR = 0.05.

genes-15-00344-s001.zip (18.3KB, zip)

Appendix A

This appendix provides details on solving the τ̃ equation for α with π̃hh (histogram height) and π̃hm (histogram mean).

  1. Histogram Height

With π̃hh = π0 + (1 − π0)β/(1 − α), and dropping the tilde notation on τ for simplicity, we have:

$$\tau = \frac{\left[\pi_0 + \frac{(1-\pi_0)\,\beta}{1-\alpha}\right]\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)}$$

which can be equivalently expressed as:

$$\tau\,\pi_0\,\alpha + \tau\,(1-\pi_0)(1-\beta) = \left[\pi_0 + \frac{(1-\pi_0)\,\beta}{1-\alpha}\right]\alpha$$

and then as:

$$\tau\,\pi_0\,\alpha\,(1-\alpha) + \tau\,(1-\pi_0)(1-\beta)(1-\alpha) = \pi_0\,\alpha\,(1-\alpha) + (1-\pi_0)\,\beta\,\alpha$$

and finally, as:

$$(1-\tau)\,\pi_0\,\alpha^2 - \left[\tau(1-\pi_0)(1-\beta) + (1-\tau)\,\pi_0 + (1-\pi_0)\,\beta\right]\alpha + \tau(1-\beta)(1-\pi_0) = 0$$

This quadratic has the form aα² − bα + c = 0 and is solved as:

$$\alpha = \frac{b - \sqrt{b^2 - 4ac}}{2a}$$

with:

a = (1 − τ)π0, b = τ(1 − π0)(1 − β) + (1 − τ)π0 + (1 − π0)β, and c = τ(1 − β)(1 − π0), taking the smaller root so that the resulting α lies in (0, 1).
  2. Histogram Mean

Similarly, with π̃hm = π0 + (1 − π0)(α + β), we obtain:

$$\tau = \frac{\left[\pi_0 + (1-\pi_0)(\alpha+\beta)\right]\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)}$$

which can be expressed as:

$$\tau\,\pi_0\,\alpha + \tau\,(1-\pi_0)(1-\beta) = \pi_0\,\alpha + (1-\pi_0)(\alpha+\beta)\,\alpha$$

and then as:

$$(1-\pi_0)\,\alpha^2 + \left[(1-\tau)\,\pi_0 + (1-\pi_0)\,\beta\right]\alpha - \tau(1-\pi_0)(1-\beta) = 0$$

which is solved as:

$$\alpha = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$$

with:

a = 1 − π0, b = (1 − τ)π0 + (1 − π0)β, and c = −τ(1 − π0)(1 − β); because c < 0, this positive root is the one lying in (0, 1).

Author Contributions

Conceptualization, A.O.-T. and S.B.P.; Software, Y.N. and S.B.P.; Simulation Studies, Y.N., A.E.S. and S.B.P.; Writing—original draft, A.O.-T. and S.B.P.; Writing—review & editing, Y.N., A.E.S. and S.B.P. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article and Supplementary Materials.

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

This research was funded by American Lebanese Syrian Associated Charities (ALSAC).

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Benjamini Y., Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B. 1995;57:289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x. [DOI] [Google Scholar]
  • 2.Storey J.D. False Discovery Rate. Int. Encycl. Stat. Sci. 2011;1:504–508. [Google Scholar]
  • 3.Storey J.D. A Direct Approach to False Discovery Rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 2002;64:479–498. doi: 10.1111/1467-9868.00346. [DOI] [Google Scholar]
  • 4.Nettleton D., Hwang J.G., Caldo R.A., Wise R.P. Estimating the Number of True Null Hypotheses from a Histogram of p Values. J. Agric. Biol. Environ. Stat. 2006;11:337–356. doi: 10.1198/108571106X129135. [DOI] [Google Scholar]
  • 5.Pounds S.B., Gao C.L., Zhang H. Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data. Stat. Appl. Genet. Mol. Biol. 2012;11 doi: 10.1515/1544-6115.1773. [DOI] [PubMed] [Google Scholar]
  • 6.Pounds S., Cheng C. Robust Estimation of the False Discovery Rate. Bioinformatics. 2006;22:1979–1987. doi: 10.1093/bioinformatics/btl328. [DOI] [PubMed] [Google Scholar]
  • 7.Jung S.-H. Sample Size for FDR-Control in Microarray Data Analysis. Bioinformatics. 2005;21:3097–3104. doi: 10.1093/bioinformatics/bti456. [DOI] [PubMed] [Google Scholar]
  • 8.Gadbury G.L., Page G.P., Edwards J., Kayo T., Prolla T.A., Weindruch R., Permana P.A., Mountz J.D., Allison D.B. Power and Sample Size Estimation in High Dimensional Biology. Stat. Methods Med. Res. 2004;13:325–338. doi: 10.1191/0962280204sm369ra. [DOI] [Google Scholar]
  • 9.Pounds S., Cheng C. Sample Size Determination for the False Discovery Rate. Bioinformatics. 2005;21:4263–4271. doi: 10.1093/bioinformatics/bti699. [DOI] [PubMed] [Google Scholar]
  • 10.Noether G.E. Sample Size Determination for Some Common Nonparametric Tests. J. Am. Stat. Assoc. 1987;82:645–647. doi: 10.1080/01621459.1987.10478478. [DOI] [Google Scholar]
  • 11.Hsieh F.Y., Lavori P.W. Sample-Size Calculations for the Cox Proportional Hazards Regression Model with Nonbinary Covariates. Control. Clin. Trials. 2000;21:552–560. doi: 10.1016/S0197-2456(00)00104-5. [DOI] [PubMed] [Google Scholar]
  • 12.Hart S.N., Therneau T.M., Zhang Y., Poland G.A., Kocher J.P. Calculating Sample Size Estimates for RNA Sequencing Data. J. Comput. Biol. 2013;20:970–978. doi: 10.1089/cmb.2012.0283. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Liu P., Hwang J.G. Quick Calculation for Sample Size While Controlling False Discovery Rate with Application to Microarray Analysis. Bioinformatics. 2007;23:739–746. doi: 10.1093/bioinformatics/btl664. [DOI] [PubMed] [Google Scholar]
  • 14.Orr M., Liu P. Sample Size Estimation While Controlling False Discovery Rate for Microarray Experiments Using the Ssize. Fdr Package. R J. 2009;1:47. doi: 10.32614/RJ-2009-019. [DOI] [Google Scholar]
  • 15.Wu H., Wang C., Wu Z. PROPER: Comprehensive Power Evaluation for Differential Expression Using RNA-Seq. Bioinformatics. 2015;31:233–241. doi: 10.1093/bioinformatics/btu640. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Hu J., Zou F., Wright F.A. Practical FDR-Based Sample Size Calculations in Microarray Experiments. Bioinformatics. 2005;21:3264–3272. doi: 10.1093/bioinformatics/bti519. [DOI] [PubMed] [Google Scholar]
  • 17.Shao Y., Tseng C.H. Sample Size Calculation with Dependence Adjustment for FDR-Control in Microarray Studies. Stat. Med. 2007;26:4219–4237. doi: 10.1002/sim.2862. [DOI] [PubMed] [Google Scholar]
  • 18.Pawitan Y., Michiels S., Koscielny S., Gusnanto A., Ploner A. False Discovery Rate, Sensitivity and Sample Size for Microarray Studies. Bioinformatics. 2005;21:3017–3024. doi: 10.1093/bioinformatics/bti448. [DOI] [PubMed] [Google Scholar]
  • 19.Schmid K.T., Höllbacher B., Cruceanu C., Böttcher A., Lickert H., Binder E.B., Theis F.J., Heinig M. scPower Accelerates and Optimizes the Design of Multi-Sample Single Cell Transcriptomic Studies. Nat. Commun. 2021;12:6625. doi: 10.1038/s41467-021-26779-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Ching T., Huang S., Garmire L.X. Power Analysis and Sample Size Estimation for RNA-Seq Differential Expression. RNA. 2014;20:1684–1696. doi: 10.1261/rna.046011.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Jung S.-H., Sohn I., George S.L., Feng L., Leppert P.C. Sample Size Calculation for Microarray Experiments with Blocked One-Way Design. BMC Bioinform. 2009;10:164. doi: 10.1186/1471-2105-10-164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Lee ML T., Whitmore G.A. Power and Sample Size for DNA Microarray Studies. Stat. Med. 2002;21:3543–3570. doi: 10.1002/sim.1335. [DOI] [PubMed] [Google Scholar]
  • 23.Li C.I., Su P.F., Shyr Y. Sample Size Calculation Based on Exact Test for Assessing Differential Expression Analysis in RNA-Seq Data. BMC Bioinform. 2013;14:357. doi: 10.1186/1471-2105-14-357. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Glueck D.H., Mandel J., Karimpour-Fard A., Hunter L., Muller K.E. Exact Calculations of Average Power for the Benjamini-Hochberg Procedure. Int. J. Biostat. 2008;4:11. doi: 10.2202/1557-4679.1103. [DOI] [PMC free article] [PubMed] [Google Scholar]
