Abstract
Multiple hypothesis testing is commonly used in genome research such as genome-wide studies and gene expression data analysis (Lin, 2005). The widely used Bonferroni procedure controls the family-wise error rate (FWER) for multiple hypothesis testing, but has limited statistical power as the number of hypotheses tested increases. The power of multiple testing procedures can be increased by using weighted p-values (Genovese et al., 2006). The weights for the p-values can be estimated by using certain prior information. Wasserman and Roeder (2006) described a weighted Bonferroni procedure, which incorporates weighted p-values into the Bonferroni procedure, and Rubin et al. (2006) and Wasserman and Roeder (2006) estimated the optimal weights that maximize the power of the weighted Bonferroni procedure under the assumption that the means of the test statistics in the multiple testing are known (these weights are called optimal Bonferroni weights). This weighted Bonferroni procedure controls FWER and can have higher power than the Bonferroni procedure, especially when the optimal Bonferroni weights are used. To further improve the power of the weighted Bonferroni procedure, first we propose a weighted Šidák procedure that incorporates weighted p-values into the Šidák procedure, and then we estimate the optimal weights that maximize the average power of the weighted Šidák procedure under the assumption that the means of the test statistics in the multiple testing are known (these weights are called optimal Šidák weights). This weighted Šidák procedure can have higher power than the weighted Bonferroni procedure. Second, we develop a generalized sequential (GS) Šidák procedure that incorporates weighted p-values into the sequential Šidák procedure (Scherrer, 1984). This GS Šidák procedure is an extension of and has higher power than the GS Bonferroni procedure of Holm (1979). Finally, under the assumption that the means of the test statistics in the multiple testing are known, we incorporate the optimal Šidák weights and the optimal Bonferroni weights into the GS Šidák procedure and the GS Bonferroni procedure, respectively. Theoretical proof and/or simulation studies show that the GS Šidák procedure can have higher power than the GS Bonferroni procedure when their corresponding optimal weights are used, and that both of these GS procedures can have much higher power than the weighted Šidák and the weighted Bonferroni procedures. All proposed procedures control the FWER well and are useful when prior information is available to estimate the weights.
1. Introduction
Multiple hypothesis testing involves testing multiple hypotheses simultaneously; each hypothesis is associated with a test statistic (Rubin et al., 2006). Multiple hypothesis testing is a common problem in genome research, such as genome-wide studies and gene expression data analysis (Lin, 2005). For multiple hypothesis testing, a traditional criterion for error (type I) control is the family-wise error rate (FWER), which is the probability of rejecting one or more true null hypotheses (Hochberg and Tamhane, 1987; Lin, 2005).
The Bonferroni procedure (Bonferroni, 1937) and the Šidák procedure (Šidák, 1967) are two well-known methods for controlling FWER with computational simplicity and wide applicability (Olejnik et al., 1997). However, both of these methods have limited statistical power as the number of hypotheses tested (m) increases (Nakagawa, 2004). Holm (1979) proposed a (step-down) sequential Bonferroni procedure which has slightly higher power than the Bonferroni procedure but there is little difference between these two procedures when the number of tests (m) is large (Lin, 2005). As an extension of the (step-down) sequential Bonferroni procedure, Holm (1979) proposed a generalized sequential (GS) Bonferroni procedure by using different weights for hypotheses of different importance. Although Holm did not show how to estimate the weights, the method has the potential to improve the power of multiple hypothesis testing when prior information is available to estimate the weights.
Rubin et al. (2006) and Wasserman and Roeder (2006) proposed a weighted Bonferroni procedure that adjusts p-values by using optimal weights. These optimal weights were calculated by maximizing the average power of the weighted Bonferroni procedure under the assumption that the means of all test statistics are known, and these weights are called optimal Bonferroni weights. Under such assumption, the average power of the weighted Bonferroni procedure is much higher than that of the Bonferroni procedure (Rubin et al., 2006; Genovese et al., 2006; Wasserman and Roeder, 2006). In practice, the means of the test statistics are unknown. However, if some prior information is available to estimate the means, this weighted Bonferroni procedure can be more powerful than the Bonferroni procedure (Rubin et al., 2006; Wasserman and Roeder, 2006; Roeder et al., 2006; Roeder et al., 2007).
The purpose of this study is to develop more powerful weighted hypothesis testing procedures as extensions of the weighted Bonferroni procedure. First, we propose a weighted Šidák procedure, and then under the assumption that the means of all test statistics are known, we estimate the optimal weights maximizing the average power of the weighted Šidák procedure (these weights are called optimal Šidák weights). The weighted Šidák procedure has slightly higher power than the weighted Bonferroni procedure. Second, we develop a GS Šidák procedure as an extension of the GS Bonferroni procedure of Holm (1979) and the sequential Šidák procedure (Scherrer, 1984). Finally, assuming that the means of all test statistics are known, we incorporate the optimal Šidák weights and the optimal Bonferroni weights into the GS Šidák procedure and the GS Bonferroni procedure, respectively. Theoretical proof and/or simulation studies show that, using their corresponding optimal weights, the GS Šidák procedure has slightly higher power than the GS Bonferroni procedure, and that both GS Šidák procedure and GS Bonferroni procedure have much higher power than the weighted Šidák procedure and the weighted Bonferroni procedure. All the proposed procedures can control the FWER well.
2. Methods
2.1. Notations
Consider testing m (null) hypotheses H = H1, H2, ⋯ Hm with corresponding test statistics Z = (Z1, Z2, ⋯, Zm), where we assume that Zi follows normal distribution of N(μi,1), and all Zi’s are independent. For simplicity, in this paper, we only present the results for one-sided tests. Similar results for two-sided tests can easily be obtained. Thus, for the i-th test, the (null) hypothesis is Hi : μi = 0, and the corresponding alternative hypothesis is H̄i : μi > 0. Suppose that there are m1 true null hypotheses and m2 false null hypotheses among all hypotheses in H, where m2 = m - m1. Let H0 denote all the true null hypotheses in H. Let p = (p1, p2, ⋯, pm) denote the p-values associated with the hypotheses (H1, H2, ⋯, Hm). Let μ = (μ1, μ2, ⋯, μm) denote the means of the test statistics Z.
As described earlier, FWER is the probability of falsely rejecting at least one true null hypothesis (Hochberg and Tamhane, 1987), which can be written as
A multiple testing procedure is said to control the family-wise error rate at a significance level α if FWER ≤ α.
The power for a single test is called per-hypothesis power. For a single test with hypothesis Hi, the per-hypothesis power is the probability of rejecting Hi given that the alternative hypothesis H̄i is true, i.e., Pr(rejecting Hi|μi < 0). For multiple hypotheses testing, Roeder et al. (2007) defined the average power of a testing procedure as the average value of per-hypothesis powers of the m2 tests associated with the false null hypotheses: .
2.2. Weighted Bonferroni procedure and optimal Bonferroni weights
2.2.1. Weighted Bonferroni procedure
In the Bonferroni procedure, if , then reject the null hypothesis Hj ; otherwise, it is failed to reject Hj (j = 1, ⋯, m). The power of multiple testing procedures can be increased by using weighted p-values (Genovese et al., 2006). Holm appears to be the first one proposing the idea of the weighted Bonferroni procedure, which incorporates weighted p-values into the Bonferroni procedure (Holm, 1979). Wasserman and Roeder (2006) provided a clear description of the weighted Bonferroni procedure as follows. Given nonnegative weights (w1, w2, ⋯ wm) for the tests associated with the hypotheses, (H1, H2, ⋯ Hm), where
(1) |
For hypothesis Hj (1 ≤ j ≤ m), when wj > 0, reject Hj if , and fail to reject Hj when wj = 0.
This procedure controls FWER at level α. The weights (w1, w2, ⋯ wm) can be specified by using certain prior information available to the researcher. For example, in genome-wide association studies, the prior information can be linkage signals or results from gene expression analyses. Roeder et al. (2006) proposed a method to estimate weights by using linkage data to weight association p-values in association studies. However, how to estimate the optimal weights in multiple testing is still a topic to be further investigated (see also the section on discussion).
2.2.2. Optimal Bonferroni weights
Rubin et al. (2006) and Wasserman and Roeder (2006) independently proposed very similar approaches to estimate the optimal weights by maximizing the average power of the procedure, assuming that the e means μ = (μ1, μ2 ⋯, μm) are known. We call these optimal weights optimal Bonferroni weights and they are calculated (Wasserman and Roeder, 2006) by
(2) |
where Φ̄(x) is the upper tail probability of a standard normal cumulative distribution function (CDF) (i.e., Φ̄ (x) = 1- Φ(x) and Φ(x) denotes the CDF of the standard normal distribution) and Δ is the constant that satisfies equations (1) and (2) i.e.
(3) |
As an illustrative example, Figure 1 shows the optimal Bonferroni weights as a function of the means μj in a multiple testing with 100 tests. The means μ vary from 1 to 7 in increment of 6/99 = 0.0606. When the means μj are small, the optimal weights increase with the increase of μj but when μj are large enough, the optimal weights decrease with the increase of μj In other words, the weighted Bonferroni procedure offers large weights (often > 1) to the tests with midrange of means and offers small weights (often < 1) to tests with small or large means (Wasserman and Roeder, 2006). Dividing the p-value by a weight w > 1 increases the probability of rejecting the corresponding null hypothesis, and dividing the p-value by a weight 0 < w < 1 decreases the probability of rejecting the corresponding null hypothesis. However, in most situations, even though the tests with large means are assigned small weights (<1), the corresponding hypotheses can still be rejected because the related p-values are very small. The weighted Bonferroni procedure using these optimal weights can have much higher power than the Bonferroni procedure when the means (μ) of the test statistics are given or given prior information that can be used for estimating the means (Roeder et al., 2006, 2007; Rubin et al., 2006; Wasserman and Roeder, 2006).
Figure 1.
Distribution of optimal Bonferroni weights for a multiple testing procedure with m = 100 tests. μ are the means of the test statistics, and vary from 1 to 7.
2.3. Weighted Šidák procedure and optimal Šidák weights
Since the Šidák procedure has higher power than the Bonferroni procedure for independent tests (Simes, 1986), we propose a weighted Šidák procedure that incorporates weighted p-values into the Šidák procedure as an extension of the weighted Bonferroni procedure. We also describe how to calculate the optimal weights for the weighted Šidák procedure assuming means of the test statistics are known.
2.3.1. Weighted Šidák procedure
In the Šidák procedure (Šidák, 1967), for any null hypothesis Hj (1 ≤ j ≤ m), if , then reject Hj. The Šidák procedure controls the FWER at level α. Now we propose a weighted Šidák procedure by using weighted p-values as follows: given a set of nonnegative weights (w1, w2, ⋯, wm) specified for independent tests associated with the hypotheses (H1, H2, ⋯, Hm) such that equation (1) holds (i.e., m–1∑iwi = 1), for hypothesis Hj (1 ≤ j ≤ m), when wj > 0, if
(4) |
then reject the null hypothesis Hj; on the other hand, when wj = 0, do not reject Hj. In this article we denote the weighted p-value as Sj. Therefore, (4) can be written as .
Theorem 1. Suppose m tests are independent, then the weighted Šidák procedure controls the family-wise error rate at a significance level α.
Proof. P(failing to reject any true null hypotheses in H0)
where, pj follows standard uniform distribution when Hj ∈ H0. Since FWER =1 – P(failing to reject any true null hypotheses in H0), then Theorem 1 follows. ▪
From the Taylor series expansions, we obtain
Based on this inequality, when the same pre-determined weights (w1, w2, ⋯, wm) are used by the weighted Šidák procedure and the weighted Bonferroni procedure, if any hypothesis Hj is rejected by the weighted Bonferroni procedure (i.e., ), then it must be rejected by the weighted Šidák procedure (i.e., ). Thus, we have Theorem 2.
Theorem 2. For m independent tests, if the same pre-determined weights (w1, w2, ⋯, wm) are used in the weighted Šidák procedure and the weighted Bonferroni procedure, then the weighted Šidák procedure has higher average power than the weighted Bonferroni procedure.
Remark 1. If all weights wj = 1, then the weighted Šidák procedure becomes the Šidák procedure. The weighted Šidák procedure can have higher power than the Šidák procedure if the weights are selected appropriately.
2.3.2. Optimal Šidák weights
As stated earlier, how to estimate optimal weights by using the prior information still needs further investigation. Here, we derive the optimal weights that maximize the average power of the weighted Šidák procedure under the assumption tion that the means (μ1, μ2, ⋯, μm) are known. These optimal weights are called optimal Šidák weights, which is an extension of the optimal Bonferroni weights of Wasserman and Roeder (2006).
For any specified weights (w1, w2, ⋯, wm), the per-hypothesis power for the single test with hypothesis Hj in the weighted Šidák procedure is
The average power of the weighted Šidák procedure is
To find the optimal weights that maximize this average power subject to constraint of equation (1), Lagrange method was used to obtain conditional extremum of PWaverage.
Theorem 3. Given FWER being α and known means (μ1, μ2, ⋯, μm) of the m independent test statistics (Z1, Z2, ⋯, Zm), the optimal non-negative weights (w1, w2, ⋯, wm), that maximize the average power of the weighted Šidák procedure subject to constraint of equation (1) can be obtained by solving inequalities wi ≥ 0, equations (1) and
(5) |
where c is a constant (given in Appendix A).
The proof of this theorem is given in Appendix A. The inequalities and equations can be solved by using the “nlminb()” function in R.
In the following simulation studies we will show that the weighted Šidák procedure using the optimal Šidák weights can have higher power than the weighted Bonferroni procedure using the optimal Bonferroni weights and that the weighted Šidák procedure using the optimal Šidák weights can have much higher power than the Šidák procedure.
2.4. GS Bonferroni procedure and GS Šidák procedure
Holm (1979) introduced a GS Bonferroni procedure that is a step-down procedure using ordered weighted p-values. If the (unknown) weights used in the procedure are estimated appropriately by using prior information, the procedure can have higher power than the weighted Bonferroni procedure (also see below). In this section, we first review this GS Bonferroni procedure, and then we propose a GS Šidák procedure as an extension of the GS Bonferroni procedure.
When assuming that the means of the statistics are known, it is difficult to derive the optimal weights by maximizing the average power of these GS procedures as done before for the weighted Bonferroni and the weighted Šidák procedures. We incorporate the optimal Bonferroni (Šidák) weights described in Section 2.2 (2.3) into the GS Bonferroni (Šidák) procedure. We will show below that when these optimal weights are used, the GS Bonferroni (Šidák) procedure has higher power than the weighted Bonferroni (Šidák) procedure.
2.4.1. GS Bonferroni procedure
Given nonnegative weights (w1, w2, ⋯, wm) for the m tests associated with hypotheses (H1, H2, ⋯, Hm), (note that it is not necessary to satisfy the condition ), if any weight wi = 0, then do not reject the corresponding hypothesis Hi. For the remaining hypotheses with weights wi > 0, define (i = 1, 2, ⋯, m), which are called B-values (i.e., weighted p-values). Let B(1) ≤ B(2) ≤ ⋯ ≤ B(m) be the ordered B-values, H(1), H(2), ⋯, H(m) be the corresponding hypotheses and w(1), w(2), ⋯, w(m) be the corresponding weights. Then the GS Bonferroni procedure (Holm, 1979) can be described as follows:
-
Step 1. If , stop the procedure; otherwise, reject H(1) and go to the next step.
...
-
Step j. If , stop the procedure; otherwise, reject H(j) and go to the next step.
....
Continue these steps until the procedure is stopped or all B-values have been processed.
This procedure controls FWER at level α. If we set all weights wj equal to 1, the inequality in step j becomes and the GS Bonferroni procedure becomes the sequential Bonferroni procedure (Holm, 1979). This GS Bonferroni procedure can have higher power than the sequential Bonferroni procedure when the weights are chosen properly (Holm, 1979).
Now we compare the power of the GS Bonferroni procedure and the weighted Bonferroni procedure when the same pre-determined weights are used in these two procedures. For pre-specified weight (w(1), w(2), ⋯, w(m)) associated with hypotheses (H(1), H(2), ⋯, H(m)) such that (i.e., ), if any false hypothesis H(j) is rejected by the weighted Bonferroni procedure, that is, is true, then . Since , we have
Thus, H(j) will also be rejected by the GS Bonferroni procedure. Therefore, we have Theorem 4.
Theorem 4. Given weights (w1, w2, ⋯, wm) for m independent tests such that , the GS Bonferroni procedure has higher average power than the weighted Bonferroni procedure.
2.4.2. GS Bonferroni procedure using the optimal Bonferroni weights
As stated earlier, it is difficult to estimate the optimal weights that maximize the average power of the GS Bonferroni procedure under the assumption that the means of statistics are known. Here we propose to use the optimal Bonferroni weights described in Section 2.2. When these optimal Bonferroni weights are used, from Theorem 4, we know that the GS Bonferroni procedure has higher average power than the weighted Bonferroni procedure. Our simulation studies will confirm this.
2.4.3. GS Šidák procedure
The GS Bonferroni procedure is based on the Bonferroni procedure. As stated earlier, the Šidák procedure has higher power than the Bonferroni procedure. Therefore, we propose a GS Šidák procedure.
Given nonnegative weights (w1, w2, ⋯, wm) for the m tests associated with hypotheses (H1, H2, ⋯, Hm) (note that it is not necessary to satisfy the condition ), if any weight wi = 0, do not reject the corresponding null hypothesis Hi. For the remaining hypotheses with weights wi > 0, let which are called S-values (i.e., weighted p-values). Let S(1) ≥ S(2) ≥ ⋯ ≥ S(m) be the ordered S-values, (H1, H2, ⋯, Hm) be the corresponding hypotheses, and w(1), w(2), ⋯, w(m) be the corresponding weights. The GS Šidák procedure can be described as the following steps.
Step 1. If , then stop the procedure; otherwise reject H(1) and go to the next step.
....,
Step j. When H(1) ⋯, H(j–1) have been tested and rejected: if
(6) |
stop the procedure; otherwise reject the hypothesis H(j), and go to the next step.
...,
Continue these steps until the procedure is stopped or all S-values have been processed.
Theorem 5. Suppose m tests are independent, then the GS Šidák procedure controls family-wise error rate at a significant level α.
Proof. Let I0 be the set of index subscripts for the true null hypotheses, I0 = {t: Ht ∈H0}. Let denote the largest S-value among all St with t∈ I0. Among the ordered S-values, S(1) ≥ S(2) ≥ ⋯ ≥ S(m), suppose at integer k, , then S(k) is first ordered S-value (from large to small) which is corresponding to a true null hypothesis in H0 (i.e., all the previous k−1 ordered S-values S(1), ⋯, S(k−1) are corresponding to false null hypothesis), where, 1≤ k ≤ m - m1 + 1, and m1 is the number of true hypotheses. According to the GS Šidák procedure, the event of failing to reject any true null hypotheses in H0 is equal to event that equation (6) holds for some j < k. The family-wise error rate is FWER =1 – P(failing to reject any true null hypotheses in H0), and
P(failing to reject any true null hypotheses in H0)
where , and 1 - pt follows uniform distribution when t∈I0. ▪
Now we compare the power of the GS Šidák procedure to that of the weighted Šidák procedure when both procedures use the same weights wj that satisfy . If H(j) is rejected by the weighted Šidák procedure, that is, is true (see inequality (4)), then . Since , we have
Thus, H(j) will also be rejected by the GS Šidák procedure, and we have Theorem 6.
Theorem 6. Given weights (w1, w2, ⋯, wm) for m independent tests that satisfy , then the GS Šidák procedure has higher power than the weighted Šidák procedure.
Furthermore, we compare the power of the GS Šidák procedure to that of the GS Bonferroni procedure when the same pre-specified weights are used in these two procedures.
Theorem 7. For m independent tests, if the same weights (w1, w2, ⋯, wm) are used in the GS Šidák procedure and GS Bonferroni procedure, then the GS Šidák procedure has higher average power than the GS Bonferroni procedure.
Proof. From the definition of B-value (Bi = pi / wi ) and S-value Si = (1 – pi)1/wi), we know Bi (Si) is a monotonically deceasing (increasing) function of wi. Suppose that B(1) ≥ B(2) ≥ ⋯ ≥ B(m) and S(1) ≥ S(2) ≥ ⋯ ≥ S(m) are associated with the same hypotheses H(1), H(2), ⋯, H(m). Below we show that for any hypothesis H(j), if , then , from which we know that the GS Šidák procedure has higher power than the GS Bonferroni procedure.
If , from the Taylor series expansions, we have
Thus, .▪
Remark 2. If setting all the weights equal (to 1), then the GS Šidák procedure becomes the sequential Šidák procedure (Scherrer, 1984).
2.4.4. GS Šidák procedure using the optimal Šidák weights
In the GS Šidák procedure, a major issue is how to calculate the weights. As we stated before, it is difficult to derive optimal weights that maximize the average power of the GS Šidák procedure under the assumption that the means of the statistics are known. Here, under this assumption, we suggest using the optimal Šidák weights calculated by equation (5). From Theorem 7, the GS Šidák procedure has higher average power than the weighted Šidák procedure when the optimal Šidák weights are used by these two procedures.
We will show that the GS Šidák procedure using the optimal Šidák weights has higher power than the GS Bonferroni procedure using the optimal Bonferroni weights by simulation studies (see below). It appears to be difficult to prove this statement theoretically because the optimal Šidák weights are not the same as the Bonferroni weights.
3. Simulation studies and results
To further evaluate the performance of the proposed testing procedures, we compared by simulation studies the average power of six multiple testing procedures: the Šidák procedure, the Bonferroni procedure, the weighted Šidák (Bonferroni) procedure using the optimal Šidák (Bonferroni) weights, and the GS Šidák (Bonferroni) procedure using the optimal Šidák (Bonferroni) weights.
3.1. Assuming true means μ = (μ1, μ2, ⋯, μm known
When we assume that the means of statistics are known, for each true null hypothesis Hj : μj = 0, the weight is assigned to zero in each procedure using weights. Thus, all true null hypotheses will not be rejected in these procedures. In other words, the FWER is equal to zero in the simulation studies with known means of statistics. Thus, we only compare the average power among these six procedures.
We simulated datasets in a similar way to Rubin et al. (2006). Each simulated dataset X = (Xi,j)n×m consisted of n =100 i.i.d observations, where each observation corresponded to a subject and was a vector of measurements of m = 1,000 independent covariates, and Xi,j was a measurement of covariate j at the i-th observation. For each covariate j, we assumed that Xi,j ∼ N(γj, 1), i = 1,⋯, n, and we implemented a test with statistic to test the null hypothesis γj = 0 against the alternative γj > 0. Using the central limit theorem, Zj follows an asymptotic distribution of N(uj, 1), where (Rubin et al., 2006). Thus, testing hypothesis γj = 0 is equivalent to testing hypothesis Hj:uj = 0.
When generating each dataset that was associated with 1,000 covariates, we randomly chose 50 covariates and set the means γj > 0 for these 50 covariates (i.e., μj > 0 for the corresponding test statistic Zj), and set γj = 0 for the other 950 covariates. In our simulation studies, we considered two scenarios for the 50 nonzero γj. In Scenario 1, we set the 50 non-zero γj equal to a common value γ that varies as a simulation parameter. We considered γ between 0.1 and 0.5, in increments of 0.1 (correspondingly, are between 1 and 5, in increments of 1). We simulated 1,000 datasets corresponding to each of these γ values. In Scenario 2, for the 50 non-zero γj, we set the first ten γj = 0.1 (μj = 1), the second ten γj = 0.2 (μj = 2), ⋯, and the fifth ten γj = 0.5 (μj = 5). We simulated 1,000 datasets for Scenario 2.
Table 1 shows the results of the estimated average power of the six multiple testing procedures in Scenarios 1 and 2. From Table 1, we can see that the GS Šidák, weighted Šidák, and Šidák procedures have slightly higher estimated average power than the corresponding GS Bonferroni, weighted Bonferroni, and Bonferroni procedures, and that the GS Šidák procedure is most powerful among the six procedures. The GS Šidák procedure and the GS Bonferroni procedure can have much higher power than both the weighted Šidák procedure and the weighted Bonferroni procedure. For example, in Scenario 1, when μ = 3, the estimated average power of the GS Šidák procedure, GS Bonferroni procedure, the weighted Šidák procedure, and the weighted Bonferroni procedure is 0.5820, 0.5792, 0.4670 and 0.4639, respectively (see Table 1).
Table 1.
The estimated average power of the six multiple testing procedures over 1,000 replicated data sets when given the means of the test statistics. (α =0.05, m2 = 50, m =1000 and n =100).
Scenario | μ | Bonfa | Šidák | W-Bonfb | W-Šidákc | GS-Bonfd | GS-Šidáke |
---|---|---|---|---|---|---|---|
Scenario | 1 | 0.0019 | 0.0020 | 0.0183 | 0.0186 | 0.0453 | 0.0459 |
2 | 0.0293 | 0.0298 | 0.1378 | 0.1394 | 0.2253 | 0.2274 | |
3 | 0.1869 | 0.1886 | 0.4639 | 0.4670 | 0.5792 | 0.5820 | |
4 | 0.5436 | 0.5460 | 0.8185 | 0.8205 | 0.8781 | 0.8796 | |
5 | 0.8664 | 0.8677 | 0.9719 | 0.9724 | 0.9838 | 0.9841 | |
Scenario | 0.3256 | 0.3267 | 0.4988 | 0.5002 | 0.5419 | 0.5432 |
Bonf = Bonferroni;
W-Bonf = Weighted Bonferroni;
W-Šidák = Weighted Šidák;
GS-Bonf = Generalized sequential Bonferroni;
GS-Šidák = Generalized sequential Šidák
3.2. True means μ = (μ1, μ2, ⋯, μm) unknown
In previous sections, all weights are calculated under the assumption that the means μ = (μ1, μ2, ⋯, μm) of statistics are known. However, in real data analysis, the means μ are usually unknown. The means μ can be estimated by using certain prior information. How to effectively estimate μ is still a topic to be further investigated. Rubin et al. (2006) described a data-splitting method, which splits the full data X into two parts, X1 and X2, with proportion π and 1- π of the original data X, respectively. Data X1 is used to estimate the means of the standardized test statistics and data X2 is used to test the hypotheses. They showed that if some prior information, such as the order of means (μ1, μ2, ⋯, μm), is available, by using the data-splitting method the weighted Bonferroni procedure has higher power than the Bonferroni procedure. In genetic association studies, Roeder et al. (2007) described a two-step approach to estimate the means (μ1, μ2, ⋯, μm), by using prior information such as reported linkage peaks, results of previously genome wide association studies, or results of gene expression studies.
It is beyond of the scope of this study to determine how to effectively estimate the means (μ1, μ2, ⋯, μm) by using prior information. To show the performance of our proposed procedures when estimated means are used, as an example, we implemented our proposed procedures by incorporating the data-splitting method and applied these methods to the simulated Scenarios 1–2 data sets described in the previous section. The only exception is that we assume here that the first 950 covariates have means equal to zero and the last 50 covariates have the common mean value γ >0. This fellows the assumption of Rubin et al. (2006) that the order of means (μ1, μ2, ⋯, μm) is known.
For each simulated dataset, the Bonferroni procedure and the Šidák procedure were implemented on the entire dataset, while the other four procedures used the data-splitting method (under the assumption that the order of means (μ1, μ2, ⋯, μm) is known). Table 2 shows the estimated average power and family-wise error rates of the six procedures for Scenarios 1 and 2. We only show the results with the proportion π of the first part X1 equal to 0.1.
Table 2.
The estimated FWERs and average power of the six multiple procedures over 1,000 replicated data sets when the means of the test statistics are unknown (α = 0.05, m = 1000, m2 = 50, n=100, and π = 0.1 for data-splitting).
Scenario | μ | Bonfa | Šidák | W-Bonfb | W-Šidákc | GS-Bonfd | GS-Šidáke | |
---|---|---|---|---|---|---|---|---|
Scenario 1 | 1 | FWER | 0.0610 | 0.0650 | 0.0180 | 0.0180 | 0.0180 | 0.0180 |
Power | 0.0017 | 0.0017 | 0.0091 | 0.0093 | 0.0092 | 0.0095 | ||
2 | FWER | 0.0360 | 0.0370 | 0.0020 | 0.0020 | 0.0020 | 0.0020 | |
Power | 0.0278 | 0.0283 | 0.0996 | 0.1009 | 0.1071 | 0.1085 | ||
3 | FWER | 0.0410 | 0.0430 | 0.0010 | 0.0010 | 0.0030 | 0.0030 | |
Power | 0.1835 | 0.1854 | 0.3766 | 0.3795 | 0.4523 | 0.4564 | ||
4 | FWER | 0.0480 | 0.0500 | 0.0030 | 0.0030 | 0.0260 | 0.0270 | |
Power | 0.5422 | 0.5445 | 0.7427 | 0.7449 | 0.8707 | 0.8719 | ||
5 | FWER | 0.0490 | 0.0490 | 0.0110 | 0.0110 | 0.0530 | 0.0540 | |
Power | 0.8746 | 0.8715 | 0.9342 | 0.9350 | 0.9772 | 0.9780 | ||
Scenario 2 | FWER | 0.0610 | 0.0630 | 0.0010 | 0.0010 | 0.0010 | 0.0010 | |
Power | 0.3239 | 0.3251 | 0.4566 | 0.4581 | 0.5247 | 0.5262 |
Bonf = Bonferroni;
W-Bonf = Weighted Bonferroni;
W-Šidák = Weighted Šidák;
GS-Bonf = Generalized sequential Bonferroni;
GS-Šidák = Generalized sequential Šidák
From Table 2, we can find that the weighted Šidák and the GS Šidák procedures have slightly higher estimated average power than the weighted Bonferroni and the GS Bonferroni procedures, respectively, and that the GS Šidák procedure has the highest estimated average power among these six procedures. For example, when μ is equal to 4, the estimated average power of the GS Šidák procedure is 0.8719. It is nearly 13% more than that of the weighted Bonferroni procedure (0.7427). In addition, it is interesting that the estimated average power of the six procedures is smaller than their estimated FWERs when μ is equal to 1. This occurs because the average power is the average (not cumulative value) of per-hypothesis powers for the 50 false null hypotheses, and the FWER is a cumulative value (not average) of type I error rates for 950 tests.
From Table 2, we can also find that the six procedures can control FWERs quite well. Interestingly, the estimated FWERs are much lower in the four procedures using weights (i.e. the weighted Bonferroni, weighted Šidák, GS Bonferroni and GS Šidák) than in the two procedures without using weights (Bonferroni and Šidák). The reason is that the four weighted procedures used the prior information of the order of means of the test statistics.
4. Discussion
In this article, we propose a weighted Šidák procedure and a GS Šidák procedure for multiple hypotheses testing based on the weighted Bonferroni procedure. Under the assumption that the means of the test statistics are known, we further describe how to estimate the optimal Šidák weights which maximize the average power of the weighted Šidák procedure. We show that the weighted Šidák procedure using the optimal Šidák weights can have higher power that the weighted Bonferroni procedure using the optimal Bonferroni weights. Furthermore, we incorporate the optimal Šidák (Bonferroni) weights into the GS Šidák (Bonferroni) procedure. Using these optimal weights the GS Šidák (Bonferroni) procedures can have higher power than the corresponding weighted Šidák (Bonferroni) procedures, respectively, and the GS Šidák procedure often has the highest power among these procedures.
For the multiple procedures using weights described in this article, how to estimate ate the weights (w1, w2, ⋯, wm) by using prior information is still an open problem. Several investigations have been reported in the literature. Roeder et al. (2006) used linkage data to estimate weights and adjust p-values in genome-wide association studies. Ionita-Laza et al. (2007) used between-family information to estimate weights and weighted association p-values calculated by use of within-family information in family-based genome-wide association studies.
It appears that the optimal Šidák weights and optimal Bonferroni weights have better property than the weights described in the previous paragraph because these optimal weights are based on maximizing the average power of the procedures. However, the optimal Šidák weights and optimal Bonferroni weights are calculated assuming that the means of test statistics are known, and in practice, these means are unknown. The means of test statistics may be estimated by using prior information (Roeder et al., 2007). When certain prior information is available to estimate the means of statistics, the procedures proposed in this paper are useful and can have much higher power than the widely used Bonferroni procedure. However, how to use prior information to estimate the optimal Šidák weights and optimal Bonferroni weights is still a challenge. We will pursue studies on this topic in the future.
Most of the proposed methods focus on the normal distribution model and one-sided tests. It is trivial to modify the formulas to handle two-sided tests for normal l distribution and χ2 distribution. All the proposed methods assume independence among the multiple tests. This assumption is very conservative. In a real data analysis, multiple tests are often highly correlated. For example, in genome-wide association studies, the tests for different markers may be correlated due to linkage disequilibrium among the markers (Conneely and Boehnke, 2007; Nyholt, 2004). How to extend our proposed method to account for correlation among tests is another issue we will pursue in the future.
All the proposed methods focus on the control of the family-wise error rate for multiple testing. However, a similar idea can be applied to control false discovery rate by using weighed p-value (see also Genovese et al., 2006).
Acknowledgments
We thank the editor and two referees for their helpful comments and useful suggestions. This research was supported by grant GM073766, GM077490, and GM081488 from the National Institute of General Medical Sciences. Address for correspondence: Dr. Guimin Gao, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294. email: ggao@ms.soph.uab.edu. Phone: 205-975-9188.
Appendix A. Proof of Theorem 3
Proof. For the m independent test statistics (Z1, Z2, ⋯, Zm), we estimate the optimal weights wj that maximize the average power with the constraint . We set wj = 0 if μj = 0. For the remaining test statistics with μj > 0, the corresponding Lagrange function is
By setting the derivatives, with respect to wi, for i = 1, 2, ⋯, m, to zero,
that is
(A.1) |
where φ(x) is the probability density function of the standard normal distribution. From (A.1), we have
Taking logarithm on both sides, we obtain
(A.2) |
or
where, . Therefore, w satisfies the equations (5).
To make sure that equations (5) provide optimal values, we need to investigate the second derivatives of the Lagrange function for wi.
where, . Note that the off-diagonal elements of the Hessian matrix are all zeroes. We conclude that the Hessian matrix is negative definite. Consequently, the solutions of the weights are optimal.
References
- Bonferroni CE. “Volume in Onore di Ricarrdo dalla Volta,”. Universita di Firenza; 1937. Teoria statistica delle classi e calcolo delle probabilita; pp. 1–62. [Google Scholar]
- Conneely KN, Boehnke M. So many correlated tests, so little time! Rapid adjustment of p values for multiple correlated tests. Am J Hum Genet. 2007;81:1158–1168. doi: 10.1086/522036. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93:509–524. doi: 10.1093/biomet/93.3.509. [DOI] [Google Scholar]
- Hochberg Y, Tamhane AC. Multiple comparison procedures. New York: Wiley; 1987. [Google Scholar]
- Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:65–70. [Google Scholar]
- Ionita-Laza I, McQueen MB, Laird NM, Lange C. Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100K scan. Am J Hum Genet. 2007;81:607–614. doi: 10.1086/519748. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin DY. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics. 2005;21:781–787. doi: 10.1093/bioinformatics/bti053. [DOI] [PubMed] [Google Scholar]
- Nakagawa S. A farewell to Bonferroni: the problems of low statistical power and publication bias. Behavioral Ecology. 2004;14:1044–1045. doi: 10.1093/beheco/arh107. [DOI] [Google Scholar]
- Nyholt DR. A simple correction for multiple testing for single-nucleotide-polymorphisms in linkage disequilibrium with each other. Am J Hum Genet. 2004;74:765–769. doi: 10.1086/383251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Olejnik S, Li JM, Huberty CJ, Supattathum S. Multiple testing and statistical power with modified Bonferroni procedures. J Educat Behavioral Statist. 1997;22:389–406. [Google Scholar]
- Roeder K, Bacanu S, Wasserman L, Devlin B. Using linkage genome scans to improve power of association scans. Am J Hum Genet. 2006;78:243–252. doi: 10.1086/500026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roeder K, Devlin B, Wasserman L. Improving power in genome-wide association studies: weights tip the scale. Genet Epidemiol. 2007;31:741–747. doi: 10.1002/gepi.20237. [DOI] [PubMed] [Google Scholar]
- Rubin D, Dudoit S, van der Laan MJ.2006A method to increase the power of multiple testing procedures through sample splitting U.C. Statistical Applications in Genetics and Molecular Biology 5: article 19. 10.2202/1544-6115.1148 [DOI] [PubMed] [Google Scholar]
- Scherrer B. Biostatistique. G. Morin; Quebec: 1984. p. 850. [Google Scholar]
- Šidák Z. Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc. 1967;62:626–633. doi: 10.2307/2283989. [DOI] [Google Scholar]
- Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–754. doi: 10.1093/biomet/73.3.751. [DOI] [Google Scholar]
- Wasserman L, Roeder K.2006Weighted hypothesis testing. (http://arxiv.org/abs/math.ST/0604172) (accessed July 5, 2007) [DOI] [PMC free article] [PubMed]