Abstract
Exact analytic expressions are developed for the average power of the Benjamini and Hochberg false discovery rate controlling procedure. The result is based on explicit computation of the joint probability distribution of the total number of rejections and the number of false rejections, and is expressed in terms of the cumulative distribution functions of the p-values of the hypotheses. An example of analytic evaluation of the average power is given. The result is confirmed by numerical experiments and applied to a meta-analysis of three clinical studies in mammography.
Keywords: hypothesis testing, multiple comparisons, false discovery, distribution of rejections, meta-analysis
1 Introduction
The Benjamini and Hochberg (1995) procedure (BH procedure) for simultaneous testing of multiple independent hypotheses controls the False Discovery Rate, defined as the expected ratio of the number of false rejections to the total number of rejections.
The power of a test of a single hypothesis is the probability of rejecting the null hypothesis. This rejection is correct when the alternative hypothesis is true, but is a Type I error when the null hypothesis is true. The average power (Benjamini and Liu, 1999) of a multiple hypothesis test is defined as the ratio of the expectation of the number of correct rejections (a random variable) to the number of hypotheses for which the alternative holds (assumed to be known).
In this paper, we provide exact analytic results for the average power of the BH procedure. The results presented here are based on analytic formulas for the joint distribution of the total number of rejections and the number of false rejections, given the number of tests for which the null hypothesis holds.
In Glueck et al. (2008b), the average power of the BH procedure was studied by decomposing the rejection region into a large number of elementary regions, given by simple inequalities, and integrating probability densities over the elementary regions. In this paper, a much more efficient method is developed, based on cumulative distribution functions rather than integrals. The results of this paper were made possible by recent work on computing the joint cumulative distribution function of order statistics of samples from two populations (Glueck et al., 2008a).
Many papers describe extensions of the power theory for the false discovery rate or related statistics (e.g., Curran-Everett, 2000; Benjamini and Liu, 1999; Benjamini and Yekutieli, 2001; Efron et al., 2001; Genovese and Wasserman, 2002, 2004; Storey, 2002; Finner and Roters, 2002; Sarkar, 2002, 2004, 2006). Power has been studied almost entirely via simulation (Benjamini and Liu, 1999; Storey, 2002; Keselman et al., 2002; Lee and Whitmore, 2002). Jung (2005) derived analytic power results for the Storey procedure, and Ferreira and Zwinderman (2006) used asymptotic methods for an adaptive version of the BH procedure. In Section 2, we review the BH procedure and find its exact average power. In Section 3, we give explicit formulae for a simple example with three hypotheses. In Section 4, we confirm the accuracy of the result by comparison to simulations. In Section 5, we give an application to a meta-analysis of clinical trials in mammography. Section 6 contains the discussion and conclusions.
2 Exact power analysis
2.1 The BH procedure
Given α* ∈ [0, 1] and null hypotheses H0i, i = 1, 2, …, m, with independent but not necessarily identically distributed p-values Pi, let P(i) denote the corresponding order statistics (the p-values Pi sorted in non-decreasing order). The BH procedure forms the nondecreasing sequence of rejection bounds bi = iα*/m ∈ (0, 1), finds the largest k ≤ m such that P(k) ≤ bk, rejects the k null hypotheses H0(i), i = 1, 2, …, k, and does not reject any other hypothesis. Given that the null hypothesis H0i is true, the p-value Pi is defined as the probability of obtaining a realization of the data which is as or more unlikely than the observed data.
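For concreteness, the step-up rule can be sketched in a few lines of Python (a minimal illustration; the function name and interface are ours, not from any standard package):

```python
import numpy as np

def benjamini_hochberg(p_values, alpha_star):
    """Apply the BH step-up rule; return a boolean rejection indicator per hypothesis."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                            # indices that sort the p-values
    bounds = alpha_star * np.arange(1, m + 1) / m    # b_i = i * alpha* / m
    below = p[order] <= bounds                       # is P_(i) <= b_i ?
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max() + 1           # largest i with P_(i) <= b_i
        reject[order[:k]] = True                     # reject the k smallest p-values
    return reject
```

For example, with p-values (0.01, 0.04, 0.30) and α* = 0.05, the bounds are (0.0167, 0.0333, 0.05), so only the first hypothesis is rejected: 0.01 ≤ 0.0167, but 0.04 > 0.0333 and 0.30 > 0.05.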
2.2 Notation and assumptions
For the purposes of power analyses, we define alternative hypotheses H1i. Assume that the true state of nature is known, i.e., whether the null or the alternative hypothesis is true. Without loss of generality, suppose that H0i is true for i ≤ n, and that H1i is true for i > n. Let Fi be the cumulative distribution function of the p-value Pi. Assume that Fi is continuous.
Denote by K the number of null hypotheses rejected by the BH procedure. Because the data are stochastic, K is a random variable, and the rejection of exactly k out of m hypotheses is the event
$$\{K = k\} = \left\{P_{(k)} \le b_k\right\} \cap \bigcap_{i=k+1}^{m} \left\{P_{(i)} > b_i\right\}. \qquad (1)$$
When the null hypothesis is true, the rejection of the null is a false rejection. In the notation here, a false rejection occurs when i ≤ n and Pi ≤ bk. Denote by J the number of false rejections, and by G(k, j) the event that exactly j of the p-values corresponding to true null hypotheses fall at or below bk:

$$G(k, j) = \left\{\#\{i \le n : P_i \le b_k\} = j\right\}. \qquad (2)$$
The occurrence of exactly j false rejections out of a total of exactly k rejections is the event
$$\{K = k\} \cap \{J = j\} = \{K = k\} \cap G(k, j). \qquad (3)$$
2.3 Distribution of the number of total and false rejections
To compute the power, we need to find the probability of the events {K = k} ∩ {J = j}, that is, the joint probability distribution of the total number of rejections K and the number of false rejections J. From (1) and (3),

$$\Pr\left(\{K = k\} \cap \{J = j\}\right) = \Pr\left(\left\{P_{(k)} \le b_k\right\} \cap \bigcap_{i=k+1}^{m} \left\{P_{(i)} > b_i\right\} \cap G(k, j)\right). \qquad (4)$$
It is shown in Appendix A that (4) yields
$$\Pr\left(\{K = k\} \cap \{J = j\}\right) = \sum_{e=1}^{m-k+1} (-1)^{e-1} \sum_{k = n_1 < n_2 < \cdots < n_e \le m} F_{(n_1)(n_2)\cdots(n_e)}\left[b_{n_1}, \ldots, b_{n_e};\, G(k, j)\right], \qquad (5)$$
where
$$F_{(n_1)(n_2)\cdots(n_e)}\left[p_{n_1}, \ldots, p_{n_e};\, G(k, j)\right] = \Pr\left(\left\{P_{(n_1)} \le p_{n_1}\right\} \cap \cdots \cap \left\{P_{(n_e)} \le p_{n_e}\right\} \cap G(k, j)\right) \qquad (6)$$
denotes the joint cumulative distribution function of the order statistics, jointly with the event G (k, j).
To evaluate the distribution functions in (5), we use an extension of the method developed by Glueck et al. (2008a). Define $n_1 = k$, so that $p_{n_1} = b_k$, and let $p_{n_0} = 0$ and $p_{n_{e+1}} = 1$. Define the index vector $\mathbf{i} = (i_0, i_1, \ldots, i_{e+1})$, the summation index set

$$\mathcal{I} = \left\{\mathbf{i} : i_0 = 0 \le i_1 \le \cdots \le i_{e+1} = m,\ i_h \ge n_h,\ h = 1, \ldots, e\right\}, \qquad (7)$$
the index matrix $\mu = [\mu_{ih}]$, $i = 1, \ldots, m$, $h = 0, \ldots, e$, and the index set $\mathcal{M}(\mathbf{i})$, consisting of all $\mu$ such that

$$\mu_{ih} \in \{0, 1\}, \qquad (8)$$

$$\sum_{i=1}^{n} \mu_{i0} = j, \qquad \sum_{i=1}^{m} \mu_{i0} = i_1, \qquad (9)$$

$$\sum_{i=1}^{m} \mu_{ih} = i_{h+1} - i_h, \qquad h = 1, \ldots, e, \qquad (10)$$

$$\sum_{h=0}^{e} \mu_{ih} = 1, \qquad i = 1, \ldots, m. \qquad (11)$$
It is proven in Appendix B that
$$F_{(n_1)\cdots(n_e)}\left[p_{n_1}, \ldots, p_{n_e};\, G(k, j)\right] = \sum_{\mathbf{i} \in \mathcal{I}} \sum_{\mu \in \mathcal{M}(\mathbf{i})} \prod_{i=1}^{m} \prod_{h=0}^{e} \left[F_i(p_{n_{h+1}}) - F_i(p_{n_h})\right]^{\mu_{ih}}. \qquad (12)$$
Note that the sum over $\mathcal{I}$ in (12) can be written as

$$\sum_{\mathbf{i} \in \mathcal{I}} = \sum_{i_e = n_e}^{m}\; \sum_{i_{e-1} = n_{e-1}}^{i_e} \cdots \sum_{i_1 = n_1}^{i_2}, \qquad (13)$$
which is readily implemented as a collection of nested loops. The sum in (12) could also be written as a sum over a subset of permutations of the set {1, …, m}, much as a sum over all permutations can be written as a sum over all square zero-one matrices with unit row and column sums.
The sum (12) is in fact a selection of certain terms in the expression of the joint distribution function of order statistics as a sum of permanents, following Bapat and Beg (1989). However, the present form appears to be simpler and thus easier to program. It avoids the repetition of terms that occur in the formulation with permutations.
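Under the definition of $\mathcal{I}$ in (7), this enumeration can be sketched in a few lines of Python (an illustration only; the generator name is ours, and a direct nested-loop implementation following (13) would avoid the filtering):

```python
from itertools import product

def index_vectors(n_sub, m):
    """Enumerate the index set I of (7) for subscripts n_1 < ... < n_e.

    Yields vectors (i_0, i_1, ..., i_{e+1}) with i_0 = 0, i_{e+1} = m,
    i_1 <= ... <= i_e, and i_h >= n_h for h = 1, ..., e.
    """
    e = len(n_sub)
    for i in product(range(m + 1), repeat=e):
        if all(i[h] >= n_sub[h] for h in range(e)) and \
                all(i[h] <= i[h + 1] for h in range(e - 1)):
            yield (0,) + i + (m,)
```

For example, `list(index_vectors((1, 3), 3))` gives `[(0, 1, 3, 3), (0, 2, 3, 3), (0, 3, 3, 3)]`.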
2.4 Average power of the BH procedure
The following theorem provides a computable expression for the average power.
Theorem 2.1
For m ≠ n, i.e. when the null is not true for every hypothesis, the average power of the BH procedure is
$$\text{average power} = \frac{1}{m - n} \sum_{k=1}^{m} \sum_{j=0}^{n} (k - j)\, \Pr\left(\{K = k\} \cap \{J = j\}\right), \qquad (14)$$
where Pr ({K = k} ∩ {J = j}) is given by (5) with (12).
When m = n, all the hypotheses are null, and the average power is not defined.
It should be noted that the sum over j from 0 to n in (14) can be restricted to j from max{0, k − (m − n)} to min{n, k}, since all other terms are zero.
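For small m, Theorem 2.1 can also be checked without the machinery of (12) by brute force: K and J depend on the p-values only through the counts of p-values in the intervals (b_h, b_{h+1}], h = 0, …, m (with b_0 = 0 and b_{m+1} = 1), so one can sum the probability of every one of the (m + 1)^m interval assignments. The sketch below is ours, for illustration only; it scales far worse than the method of Section 2.3. It takes the p-value CDFs F_i as callables, with the first n corresponding to true nulls:

```python
import itertools
import numpy as np

def joint_pmf_and_average_power(cdfs, n, alpha_star):
    """Exact joint pmf of (K, J) under the BH procedure, and the average power (14).

    cdfs : list of m callables, cdfs[i](x) = F_i(x); the first n are true nulls.
    Requires n < m. Brute-force enumeration; feasible only for small m.
    """
    m = len(cdfs)
    cuts = np.concatenate(([0.0], alpha_star * np.arange(1, m + 1) / m, [1.0]))
    # cell[i][h] = Pr{ P_i falls in (cuts[h], cuts[h+1]] }
    cell = [[F(cuts[h + 1]) - F(cuts[h]) for h in range(m + 1)] for F in cdfs]
    pmf = np.zeros((m + 1, n + 1))
    for assign in itertools.product(range(m + 1), repeat=m):
        prob = 1.0
        for i, h in enumerate(assign):
            prob *= cell[i][h]
        # #{p-values <= b_t} >= t holds exactly when P_(t) <= b_t
        k = max((t for t in range(1, m + 1)
                 if sum(h < t for h in assign) >= t), default=0)
        j = sum(h < k for h in assign[:n])   # nulls at or below b_k: false rejections
        pmf[k, j] += prob
    power = sum((k - j) * pmf[k, j]
                for k in range(m + 1) for j in range(n + 1)) / (m - n)
    return pmf, power
```

With the p-value CDFs of Section 3, the output can be compared against the exact columns of Tables 1 and 2.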
3 Example
Consider the computation of average power for the BH procedure with three hypotheses, so m = 3. Suppose that one null and two alternative hypotheses are true, so n = 1. From (14), the average power is

$$\text{average power} = \frac{1}{2}\left[\Pr(\{K=1\} \cap \{J=0\}) + 2\Pr(\{K=2\} \cap \{J=0\}) + \Pr(\{K=2\} \cap \{J=1\}) + 2\Pr(\{K=3\} \cap \{J=1\})\right], \qquad (15)$$
where, from expansion (5),

$$\Pr(\{K=1\} \cap \{J=0\}) = F_{(1)}[b_1; G(1,0)] - F_{(1)(2)}[b_1, b_2; G(1,0)] - F_{(1)(3)}[b_1, b_3; G(1,0)] + F_{(1)(2)(3)}[b_1, b_2, b_3; G(1,0)], \qquad (16)$$

$$\Pr(\{K=2\} \cap \{J=0\}) = F_{(2)}[b_2; G(2,0)] - F_{(2)(3)}[b_2, b_3; G(2,0)], \qquad (17)$$

$$\Pr(\{K=2\} \cap \{J=1\}) = F_{(2)}[b_2; G(2,1)] - F_{(2)(3)}[b_2, b_3; G(2,1)], \qquad (18)$$

$$\Pr(\{K=3\} \cap \{J=1\}) = F_{(3)}[b_3; G(3,1)], \qquad (19)$$

and each term on the right is evaluated through (12); for example, by independence,

$$F_{(3)}[b_3; G(3,1)] = F_1(b_3)\, F_2(b_3)\, F_3(b_3). \qquad (20)$$
We now need to evaluate F1, F2, and F3. The cumulative distribution functions of the p-values depend on the choice of hypothesis tests. Here, we test each hypothesis with a two sided one sample z test. Suppose that each test decides between the same null hypothesis H0 : μ = μ0 and the same alternative hypothesis H1 : μ = μ1, with σ² known. For convenience, assume that the sample size for each hypothesis test is the same, N. For θ = 1, 2, …, N, suppose ε_{iθ} is an observed data point for hypothesis test i. If

$$\varepsilon_{i\theta} \sim \mathcal{N}\left(\mu, \sigma^2\right), \quad \text{independently for } \theta = 1, \ldots, N, \qquad (21)$$
the test statistics are given by
$$Z_i = \frac{\sqrt{N}\left(\bar{\varepsilon}_i - \mu_0\right)}{\sigma}, \qquad \bar{\varepsilon}_i = \frac{1}{N} \sum_{\theta=1}^{N} \varepsilon_{i\theta}, \qquad (22)$$
and the two sided p-values are (Rosner, 2006, p. 244)
$$P_i = 2\left[1 - \Phi\left(\left|Z_i\right|\right)\right]. \qquad (23)$$
Then, for i ≤ n, the null hypothesis holds and F_i(p) = p, while for i > n,

$$F_i(p) = \Phi\left(-\Phi^{-1}\left(1 - \tfrac{p}{2}\right) + \frac{\sqrt{N}(\mu_1 - \mu_0)}{\sigma}\right) + \Phi\left(-\Phi^{-1}\left(1 - \tfrac{p}{2}\right) - \frac{\sqrt{N}(\mu_1 - \mu_0)}{\sigma}\right), \qquad (24)$$
where Φ is the cumulative distribution function for the standard normal distribution. Equation (24) is a special case of the result given in Ruppert et al. (2007, Equation (5)).
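In code, (24) is nearly a one-liner (a sketch using SciPy's norm for Φ; the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def p_value_cdf_z(p, N, mu0, mu1, sigma):
    """CDF F_i(p) of the two sided z-test p-value when the true mean is mu1.

    With mu1 = mu0 this reduces to F_i(p) = p, the uniform null distribution.
    """
    z_crit = norm.ppf(1.0 - p / 2.0)              # Phi^{-1}(1 - p/2)
    shift = np.sqrt(N) * (mu1 - mu0) / sigma      # standardized mean shift
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)
```

A list of such functions (with μ1 = μ0 for the n nulls) is exactly the `cdfs` argument expected by the enumeration sketch in Section 2.4.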
4 Computational results
We numerically examine our exact method by comparison to a simulation. We simulated m independent two sided one sample z tests. As in Section 3, each test decided between the same null hypothesis H0 : μ = μ0 and the same alternative hypothesis H1 : μ = μ1. The null hypothesis holds for the first n hypothesis tests, and the alternative hypothesis holds for the remaining m − n hypothesis tests. For each hypothesis test, we simulated N random variables, formed the sample mean and test statistic as in (22), and calculated the p-value as in (23). Using the sorted p-values, the BH procedure gave the number of rejections. The process was repeated 100,000 times, and gave empirical estimates of the joint probability distribution of K and J. A subset of the results is shown in Table 1.
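A condensed sketch of this simulation (ours; it reuses the `benjamini_hochberg` sketch from Section 2.1 and the configuration of Table 1):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2008)               # arbitrary seed
m, n, N = 5, 3, 5                               # tests, true nulls, per-test sample size
mu0, mu1, sigma, alpha_star = 0.0, 1.0, 1.0, 0.05
reps = 100_000

counts = np.zeros((m + 1, n + 1))               # empirical frequencies of (K, J)
true_means = np.where(np.arange(m) < n, mu0, mu1)      # nulls first, alternatives after
for _ in range(reps):
    xbar = rng.normal(true_means, sigma / np.sqrt(N))  # sample means, one per test
    z = np.sqrt(N) * (xbar - mu0) / sigma              # test statistics (22)
    p = 2.0 * (1.0 - norm.cdf(np.abs(z)))              # two sided p-values (23)
    rejected = benjamini_hochberg(p, alpha_star)
    counts[rejected.sum(), rejected[:n].sum()] += 1    # J counts rejected nulls
joint_pmf_hat = counts / reps                   # empirical estimate of Pr(K=k, J=j)
```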
Table 1.
Comparison of calculations of Pr ({K = k} ∩ {J = j}) via simulation and theory. The sample size per hypothesis was N = 5, with α* = 0.05, m = 5, n = 3, μ0 = 0, μ1 = 1, and σ² = 1

| k | j | Exact | Simulation |
|---|---|---|---|
| 0 | 0 | 0.37584 | 0.37718 |
| 1 | 0 | 0.36883 | 0.36838 |
| 1 | 1 | 0.00812 | 0.00788 |
| 2 | 0 | 0.19646 | 0.19515 |
| 2 | 1 | 0.02476 | 0.02540 |
| 2 | 2 | 0.00026 | 0.00027 |
| 3 | 1 | 0.02297 | 0.02286 |
| 3 | 2 | 0.00116 | 0.00122 |
| 4 | 2 | 0.00149 | 0.00157 |
| 4 | 3 | 0.00003 | 0.00001 |
| 5 | 3 | 0.00005 | 0.00008 |
The largest difference between the exact and simulated values was less than 0.0014. The half width of a 95% confidence interval for a simulated probability is at most 2[(0.5)²/100,000]^0.5 ≈ 0.003, showing that the differences were within sampling error. This affirms the exact calculation in (5).
The exact average power was calculated from the exact joint distribution of K and J, using (14). The simulated average power was calculated similarly, but using the empirical values of the joint distribution of K and J. The results are shown in Table 2. The error in the simulated power shown in Table 2 is slightly larger than the error in the simulated probabilities in Table 1, because errors accumulate across the sum over values of k and j.
Table 2.
Comparison of calculations of average power via simulation and exact theory. The sample size per hypothesis was N = 5, with α* = 0.05, μ0 = 0, μ1 = 1, and σ² = 1. The largest difference between the exact and simulated values was < 0.002

| m | n | Average power (exact) | Average power (simulated) |
|---|---|---|---|
| 2 | 0 | 0.56539 | 0.56371 |
| 2 | 1 | 0.50342 | 0.50455 |
| 3 | 0 | 0.54576 | 0.54466 |
| 3 | 1 | 0.49842 | 0.49813 |
| 3 | 2 | 0.44439 | 0.44557 |
| 4 | 0 | 0.53446 | 0.53352 |
| 4 | 1 | 0.49584 | 0.49494 |
| 4 | 2 | 0.45256 | 0.45322 |
| 4 | 3 | 0.40451 | 0.40684 |
| 5 | 0 | 0.52712 | 0.52650 |
| 5 | 1 | 0.49440 | 0.49619 |
| 5 | 2 | 0.45819 | 0.45931 |
| 5 | 3 | 0.41837 | 0.41717 |
| 5 | 4 | 0.37494 | 0.37439 |
Finally, we calculated and simulated average power for a range of values of μ1. The resulting average power curves are shown in Figure 1. Because the simulated values were so close to the exact values, the lines overlapped. For clarity, only the exact average power curves are shown.
Figure 1.
Average power for the Benjamini and Hochberg (1995) procedure applied to three independent two sided one sample z tests, with one true null hypothesis, so m = 3 and n = 1. The null hypothesis for each test is H0 : μ = μ0, and the alternative hypothesis for each test is H1 : μ = μ1. Here μ0 = 0, σ² = 1, and μ1 was varied between 0 and 2. The false discovery rate is bounded by α* = 0.05. Average power is shown as a function of μ1 for three values of the per-test sample size N. Average power is undefined at μ1 = 0, where all hypotheses are null, so the curves are undefined at that point. Notice that the average power is smaller for N = 3 than for N = 5 or N = 7.
5 Application to mammography
In power analysis for single hypothesis studies, the goal is to calculate the power for a given sample size and design. This is quite different from data analysis, where we observe data, construct a test statistic, test a hypothesis, and draw conclusions. The BH procedure is often used to perform data analysis for studies with multiple comparisons.
A problem that arises often in the clinical trial literature is that many scientists conduct trials to test the same hypothesis. These trials are independent, in the sense that they have different subjects, different investigators, and occur in different geographical locations. Using the results of multiple independent trials to address a single scientific question is known as meta-analysis.

Consider three clinical trials comparing full field digital and screen-film mammography (Lewin et al., 2002; Skaane et al., 2003; Pisano et al., 2005). The purpose of each of these trials was to compare the diagnostic accuracy of digital and film mammography. The trials share a similar design. Each woman had mammographic examinations with both digital and film machines. Although the trials were not designed to measure the effect of conducting two mammograms on one woman, in each of the trials more cancers were detected than would have been detected by either mammographic modality alone. This suggests that full-field digital mammography and screen-film mammography could be used together to improve breast cancer detection.
In order to examine this question, Glueck et al. (2007) conducted a re-analysis of the trial performed by Lewin et al. (2002). The combined method was the test that declared a woman to have cancer if the cancer was found by either (or both) film and digital mammography. The combined method resulted in the detection of significantly more cancers than the film mammographic method alone (Glueck et al., 2007).
This is an interesting result, but it immediately raises the question as to whether the same result holds true for the other two trials with similar designs, namely those of Skaane et al. (2003) and Pisano et al. (2005). One way to decide whether the combined mammographic method is better than the film mammographic method alone is to test this hypothesis in all three studies, and to make a decision based on the result of all three tests.
Benjamini and Hochberg (1997) discuss two possible ways to draw inference from multiple hypothesis tests. One possibility is to use an intersection hypothesis test, in which one considers a grand null which is true only if the null hypothesis is true for every component test. Another option is to use the multiple hypothesis test, where the goal is to provide multiple inferences about the same hypothesis, while controlling the error rate.
Why is control of the false discovery rate important? With classical, uncorrected single hypothesis testing methods and three independent hypothesis tests, each with a Type I error rate of 0.05, the probability of making at least one Type I error when all three null hypotheses hold is 1 − 0.95³ ≈ 0.14. Thus, even for only three hypotheses, the failure to correct for multiple comparisons sharply raises the probability of making at least one Type I error. The probability of making at least one Type I error is the family-wise error rate, Pr(J > 0). While the false discovery rate is less than or equal to the family-wise error rate, controlling the false discovery rate does provide some error control.
For this proposed meta-analysis, we use the BH procedure to control the false discovery rate for the multiple hypothesis approach. Recall that the BH procedure guarantees that the false discovery rate, the expected value of the ratio of false rejections J to total rejections K (defined as zero when K = 0), is bounded, so that E(J/K) ≤ α*. We calculate the average power using the methods introduced in this paper.
To calculate the average power for the BH procedure using our method, we need to specify m, the number of hypotheses being tested, the null hypothesis for each test, the alternative hypothesis for each test, the distribution of the p-value under the null and the alternatives, and n, the number of hypotheses for which the null holds. In the paragraphs that follow, we define or derive each of the inputs for the power analysis.
In each study, cancer was diagnosed by the film mammographic method, or by the combined mammographic method. This gave a data table for each study, organized as in Table 3. By the definition of the combined method, the film method can never detect cancers that the combined method does not, so $n_{i,10} = 0$. The corresponding population probabilities are given in Table 4. In a typical power analysis, the sample size would be assumed to be known, but the data would not yet be observed.
Table 3.
Observed number of cancers detected by each mammographic technique for study i. Combined is the combined mammographic method, while film is the film mammographic method. Ni is the total number of cancers detected in study i. Here 0 indicates that the method missed the cancer, while 1 indicates that the method found the cancer. Thus ni,00 is the number of cancers missed by both methods. Also, ni,0. = ni,00 + ni,01, ni,1. = ni,10 + ni,11, ni,.0 = ni,00 + ni,10, and ni,.1 = ni,01 + ni,11
|
Table 4.
Population probabilities for study i. Combined is the combined mammographic method, while film is the film mammographic method. $\pi_{i,00}$ is the probability that both methods miss the cancer

| | Combined = 0 | Combined = 1 | Total |
|---|---|---|---|
| Film = 0 | $\pi_{i,00}$ | $\pi_{i,01}$ | $\pi_{i,0\cdot}$ |
| Film = 1 | $\pi_{i,10}$ | $\pi_{i,11}$ | $\pi_{i,1\cdot}$ |
| Total | $\pi_{i,\cdot 0}$ | $\pi_{i,\cdot 1}$ | 1 |
The null hypotheses, for i = 1, 2, 3 indexing the studies of Lewin et al. (2002), Skaane et al. (2003) and Pisano et al. (2005) respectively, are $H_{0i}: \pi_{i,10} - \pi_{i,01} = 0$. The alternatives are $H_{1i}: \pi_{i,10} - \pi_{i,01} = \delta \ne 0$. The value of δ is a parameter that will vary as part of the power analysis.
From previous studies that looked at combining information from two mammographic methods (see, e.g., Sickles et al., 1986; Anttinen et al., 1993; Thurfjell et al., 1994; Ciatto et al., 2003) it is reasonable to believe that the combined test is always better than the film mammographic method, so we set n = 0 for the power analysis: we expect that none of the null hypotheses will be true.
We use an unconditional McNemar's test (McNemar, 1947), a common test for paired binary outcome data. There has been considerable controversy in the statistical literature as to whether the conditional or the unconditional test should be used (Selicato and Muller, 1998). We use the unconditional test for this application because it is reasonable to believe that neither margin is fixed.
There are many equivalent transformations of the test statistic for the unconditional test. Depending on the form of the statistic chosen, the distribution of the test statistic under the null and the alternative differs. As suggested by Selicato and Muller (1998), we choose McNemar's test statistic computed as an F statistic. Letting $d_{i\theta}$ be the difference between the binary film and combined detection indicators for cancer θ in study i, the statistic is the squared paired t statistic
$$F_i = \frac{N_i\, \bar{d}_i^{\,2}}{s_{d,i}^{2}}, \qquad (25)$$
where
$$\bar{d}_i = \frac{n_{i,10} - n_{i,01}}{N_i} \qquad (26)$$
and
$$s_{d,i}^{2} = \frac{n_{i,10} + n_{i,01} - N_i\, \bar{d}_i^{\,2}}{N_i - 1}. \qquad (27)$$
When a single test is conducted and the null hypothesis is true, $F_i$ has approximately a central F distribution with degrees of freedom 1 and $N_i - 1$. The distribution is approximate because the tables, and hence the test statistic, are discrete. The accuracy of the approximation is adequate, as demonstrated by Selicato and Muller (1998). When the alternative hypothesis is true, $F_i$ is approximately distributed as a non-central F with the same degrees of freedom and noncentrality parameter $N_i\omega_i$, where
$$\omega_i = \frac{\left(\pi_{i,10} - \pi_{i,01}\right)^2}{\pi_{i,10} + \pi_{i,01} - \left(\pi_{i,10} - \pi_{i,01}\right)^2}. \qquad (28)$$
Thus, the p-value (a random variable) for study i is given by
$$P_i = 1 - F_F\left(F_i;\, 1, N_i - 1\right), \qquad (29)$$
where $F_F(x; 1, N_i - 1)$ is the cumulative distribution function of the central F distribution with degrees of freedom 1 and $N_i - 1$, evaluated at a point x. Writing $F_F(x; 1, N_i - 1, N_i\omega_i)$ for the corresponding noncentral cumulative distribution function, the cumulative distribution function of the p-value under the alternative for study i is approximated by

$$F_i(p) = 1 - F_F\left(F_F^{-1}\left(1 - p;\, 1, N_i - 1\right);\, 1, N_i - 1,\, N_i\omega_i\right),$$

which is again a special case of the result of Ruppert et al. (2007, Equation (5)). If the null hypothesis is true for study i, then $\omega_i = 0$, $F_i(p) = p$, and the p-value $P_i$ has approximately a uniform [0, 1] distribution.
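A sketch of this p-value distribution in Python (using SciPy's central `f` and noncentral `ncf` distributions; the function name is ours, and the effect size follows (28)):

```python
from scipy.stats import f, ncf

def p_value_cdf_mcnemar(p, N, pi10, pi01):
    """Approximate CDF of the unconditional McNemar p-value under the alternative."""
    delta = pi10 - pi01
    omega = delta**2 / (pi10 + pi01 - delta**2)     # effect size, as in (28)
    x = f.ppf(1.0 - p, 1, N - 1)                    # statistic value whose p-value is p
    return 1.0 - ncf.cdf(x, 1, N - 1, N * omega)    # noncentral F tail beyond x
```

As a check, when π10 = π01 the noncentrality is zero, the noncentral F reduces to the central F, and the function returns p, the uniform null case.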
We choose the bound on the false discovery rate α* = 0.05. The numbers of cancers observed in the Lewin et al. (2002), Skaane et al. (2003) and Pisano et al. (2005) trials were N1 = 49, N2 = 31 and N3 = 335, respectively. The cumulative distribution function of each p-value thus depends on $N_i$, $\pi_{i,10}$, and $\pi_{i,01}$, and we obtain the average power using (14). While (14) gives exact average power, the results for this power analysis are approximate, since the distribution of the test statistic given in Selicato and Muller (1998) is approximate. When δ = 0.2, the average power is approximately 0.9. When δ = 0.3, the average power is approximately 0.98. Average power was calculated for these values of δ because the observed difference in the probability of cancer detection between the combined and film mammographic methods demonstrated in Glueck et al. (2007) was close to 0.2.
The interpretation of average power for the BH procedure differs somewhat from the interpretation of power for a single hypothesis test. For a single hypothesis test, power is the probability of rejecting the null, for a given design, sample size, and alternative. For the BH procedure, average power is the expected fraction of the hypotheses with true alternatives that are rejected. When no nulls are true, as in this example, average power is simply the expected fraction of the hypotheses that should be rejected that are rejected. As with power analysis for single hypotheses, the choice of a goal average power for the study depends more on scientific considerations than on statistical ones. In this case, the average power seems adequate to detect a clinically important difference.
6 Conclusions
The method we have proposed in this paper produces exact calculations of average power. We have demonstrated its utility for an example with one sample z tests, and for a meta-analysis using McNemar's test. The method provides results regardless of the distribution of the test statistics, and even for studies which use different statistics to test each hypothesis. It should prove useful for calculations of power for study designs where accuracy is important, such as clinical trials and their meta-analyses.
One major drawback of our technique is its computational complexity, which grows rapidly with the number of hypotheses. Equation (5) involves a sum of order 2^(m−k) terms, and the evaluation of the joint probability in (12) involves a number of computations of order m!. Thus, our results are easily computable only for the small numbers of hypotheses likely to be seen in clinical trials. Genomic problems are beyond the reach of the exact methods described here.
We believe that our methods are important for many reasons. Our formula for average power is an exact result for any number of hypotheses, and is computable for situations with small numbers of hypotheses. Exact results provide a sound theoretical basis for deriving high accuracy, easily computable approximations. Not only do our results suggest approximations, but they will allow testing the approximations by comparison to exact results. In addition, understanding average power for the BH procedure immediately opens windows to understanding average power for other multiple comparisons procedures. Exact power calculations will allow the comparative study of the sensitivity and specificity of multiple comparison procedures.
A Proof of (5)
From (4), write

$$\Pr\left(\{K = k\} \cap \{J = j\}\right) = \Pr(A \cap B), \quad A = \left\{P_{(k)} \le b_k\right\} \cap G(k, j), \quad B = \bigcap_{i=k+1}^{m} \left\{P_{(i)} > b_i\right\}. \qquad (30)$$
We will change the direction of the inequalities in (4) for k + 1, …, m by passing to the set complement. Using standard set operations and the fact that the cumulative distribution functions are continuous, we have from (4)

$$\Pr(A \cap B) = \Pr(A) - \Pr\left(A \cap \bigcup_{i=k+1}^{m} \left\{P_{(i)} \le b_i\right\}\right). \qquad (31)$$
Expand the last term as

$$\Pr\left(A \cap \bigcup_{i=k+1}^{m} \left\{P_{(i)} \le b_i\right\}\right) = \sum_{e=2}^{m-k+1} (-1)^{e} \sum_{k < n_2 < \cdots < n_e \le m} \Pr\left(A \cap \left\{P_{(n_2)} \le b_{n_2}\right\} \cap \cdots \cap \left\{P_{(n_e)} \le b_{n_e}\right\}\right) \qquad (32)$$
by adding the probabilities of the sets in the union, subtracting probabilities of the intersections of two sets, which were counted twice, adding back the probabilities of the intersections of three sets, and so on. Substituting for B from (30) and using the definition (6) gives (5).
B Proof of (12)
Recall that $P_i$, 1 ≤ i ≤ m, are the unsorted p-values, and $P_{(i)}$ are the sorted p-values. Denote $A = F_{(n_1)\cdots(n_e)}\left[p_{n_1}, \ldots, p_{n_e};\, G(k, j)\right]$. Then we have

$$A = \sum_{\mathbf{i} \in \mathcal{I}} \Pr\left(B_{\mathbf{i}}\right), \qquad (33)$$
where $B_{\mathbf{i}}$ are the disjoint events

$$B_{\mathbf{i}} = \bigcap_{h=1}^{e} \left\{\#\left\{i : P_i \le p_{n_h}\right\} = i_h\right\} \cap G(k, j). \qquad (34)$$
Since the $F_i$ are continuous, $\Pr\left\{P_i \in \left(p_{n_h}, p_{n_{h+1}}\right]\right\} = F_i(p_{n_{h+1}}) - F_i(p_{n_h})$, the random variables $P_i$ take values in the disjoint union

$$(0, 1] = \bigcup_{h=0}^{e} \left(p_{n_h}, p_{n_{h+1}}\right], \qquad (35)$$
and the $P_i$ are independent, so (12) follows from elementary probability. Condition (8) means that $\mu_{ih}$ acts as an indicator for the presence of the term $F_i(p_{n_{h+1}}) - F_i(p_{n_h})$. Equation (9) describes the conditions on the $P_i \in (p_{n_0}, p_{n_1}]$, and (10) describes the conditions on the $P_i \in (p_{n_h}, p_{n_{h+1}}]$, h ≥ 1. Finally, (11) comes from the fact that the union (35) is disjoint.
Footnotes
Glueck was supported by NCI K07CA88811. Mandel was supported by NSF grants CNS-0325314, CNS-0719641 and DMS-0623983. Muller was supported by NCI P01 CA47 982-04, NCI R01 CA095749-01A1 and NIAID 9P30 AI 50410. Hunter was supported by NLM 5R01LM008111-03 and NCI 5 P30 CA46934-15.
References
- Anttinen I, Pamilo M, Soiva M. Double reading of mammography screening films: one radiologist or two? Clinical Radiology. 1993;48:414–421. doi: 10.1016/s0009-9260(05)81111-0.
- Bapat RB, Beg MI. Order statistics for non-identically distributed variables and permanents. Sankhyā Series A. 1989;51:79–93.
- Benjamini Y, Liu W. A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. Journal of Statistical Planning and Inference. 1999;82:163–170.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57(1):289–300.
- Benjamini Y, Hochberg Y. Multiple hypotheses testing with weights. Scandinavian Journal of Statistics. 1997;24(3):407–418.
- Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics. 2001;29(4):1165–1188.
- Ciatto S, Rosselli Del Turco M, Burke C, Visioli P, Paci E, Zappa M. Comparison of standard and double reading and computer-aided detection (CAD) of interval cancers at prior negative screening mammograms: blind review. British Journal of Cancer. 2003;89:1645–1649. doi: 10.1038/sj.bjc.6601356.
- Curran-Everett D. Multiple comparisons: philosophies and illustrations. American Journal of Physiology: Regulatory, Integrative and Comparative Physiology. 2000;279:R1–R8. doi: 10.1152/ajpregu.2000.279.1.R1.
- Efron B, Storey J, Tibshirani R. Microarrays, empirical Bayes methods, and false discovery rates. Journal of the American Statistical Association. 2001;96:1151–1160.
- Ferreira JA, Zwinderman A. Approximate sample size calculations with microarray data: an illustration. Statistical Applications in Genetics and Molecular Biology. 2006;5:Article 25. doi: 10.2202/1544-6115.1227.
- Finner H, Roters M. Multiple hypothesis testing and expected number of type I errors. The Annals of Statistics. 2002;30(1):220–238.
- Genovese CR, Wasserman L. Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society, Series B. 2002;64:499–517.
- Genovese CR, Wasserman L. A stochastic process approach to false discovery control. The Annals of Statistics. 2004;32(3):1035–1061.
- Glueck DH, Lamb MM, Lewin JM, Pisano ED. Two-modality mammography may confer an advantage over either full-field digital mammography or screen-film mammography. Academic Radiology. 2007;14(6):670–676. doi: 10.1016/j.acra.2007.02.011.
- Glueck DH, Karimpour-Fard A, Mandel J, Hunter L, Muller KE. Fast computation by block permanents of cumulative distribution functions of order statistics from several populations. Communications in Statistics: Theory and Methods. 2008a;37(18). doi: 10.1080/03610920802001896.
- Glueck DH, Muller KE, Karimpour-Fard A, Hunter L. Expected power for the false discovery rate with independence. Communications in Statistics: Theory and Methods. 2008b;37(12):1855–1866. doi: 10.1080/03610920801893731.
- Jung S-H. Sample size for FDR-control in microarray data analysis. Bioinformatics. 2005;21(14):3097–3104. doi: 10.1093/bioinformatics/bti456.
- Keselman HJ, Cribbie R, Holland B. Controlling the rate of Type I error over a large set of statistical tests. British Journal of Mathematical and Statistical Psychology. 2002;55:27–39. doi: 10.1348/000711002159680.
- Lee M, Whitmore G. Power and sample size for DNA microarray studies. Statistics in Medicine. 2002;21:3543–3570. doi: 10.1002/sim.1335.
- Lewin JM, D'Orsi CJ, Hendrick RE, Moss LJ, Isaacs PK, Karellas A, Cutter GR. Clinical comparison of full-field digital mammography and screen-film mammography for detection of breast cancer. American Journal of Roentgenology. 2002;179:671–677. doi: 10.2214/ajr.179.3.1790671.
- McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12:153–157. doi: 10.1007/BF02295996.
- Pisano ED, Gatsonis C, Hendrick E, Yaffe M, Baum JK, Acharyya S, Conant EF, Fajardo LL, Bassett L, D'Orsi C, Jong R, Rebner M. Diagnostic performance of digital versus film mammography for breast-cancer screening. New England Journal of Medicine. 2005;353:1773–1783. doi: 10.1056/NEJMoa052911.
- Rosner B. Fundamentals of Biostatistics. 6th edition. New York: Brooks-Cole; 2006.
- Ruppert D, Nettleton D, Hwang JTG. Exploring the information in p-values for the analysis and planning of multiple-test experiments. Biometrics. 2007;63:483–495. doi: 10.1111/j.1541-0420.2006.00704.x.
- Sarkar SK. Some results on false discovery rate in stepwise multiple testing procedures. The Annals of Statistics. 2002;30(1):239–257.
- Sarkar SK. FDR-controlling stepwise procedures and their false negatives rates. Journal of Statistical Planning and Inference. 2004;125:119–137.
- Sarkar SK. False discovery and false nondiscovery rates in single-step multiple testing procedures. The Annals of Statistics. 2006;34(1):394–415.
- Selicato GR, Muller KE. Approximating power of the unconditional test for correlated binary pairs. Communications in Statistics: Simulation and Computation. 1998;27:553–564. doi: 10.1080/03610919808813494.
- Sickles EA, Weber WN, Galvin HB, Ominsky SH, Sollitto RA. Baseline screening mammography: one vs. two views per breast. American Journal of Roentgenology. 1986;147:1149–1153. doi: 10.2214/ajr.147.6.1149.
- Skaane P, Young K, Skjennald A. Population-based mammography screening: comparison of screen-film and full-field digital mammography with soft-copy reading. The Oslo I study. Radiology. 2003;229:877–884. doi: 10.1148/radiol.2293021171.
- Storey JD. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B. 2002;64(3):479–498.
- Thurfjell EL, Lernevall KA, Taube AA. Benefit of independent double reading in a population-based mammography screening program. Radiology. 1994;191:241–244. doi: 10.1148/radiology.191.1.8134580.

