Abstract
Two general classes of multiple decision functions are described: each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility of finding multiple decision functions that are optimal with respect to a pre-specified Type II error criterion, such as the missed discovery rate (MDR), while still controlling the FWER or FDR Type I error rates. The gain in MDR of the associated FDR-controlling procedure relative to the well-known Benjamini-Hochberg (BH) procedure is demonstrated via a modest simulation study with gamma-distributed component data. Such multiple decision functions have potential applications in multiple testing, particularly in the analysis of high-dimensional data sets.
Keywords: false discovery rate, family-wise error rate, missed discovery rate, multiple decision problem, multiple testing, strong control
1 Introduction and Motivation
Consider the problem of simultaneously testing M pairs of null and alternative hypotheses, (Hm0, Hm1), m = 1, 2, …, M, where M is a known positive integer. Such multiple testing problems arise in many scientific areas; see, for instance, [1,2] for concrete examples. A discovery is said to have been made in the mth testing problem if Hm0 is rejected. It is called a false discovery if a discovery is declared but in reality Hm0 is true. On the other hand, a nondiscovery is made at the mth testing problem if Hm0 is not rejected. It is called a missed discovery if a nondiscovery is declared when in reality Hm0 is false. For the simultaneous testing problem, two commonly-used global Type I error rates are the family-wise error rate (FWER), which is the probability of at least one false discovery, and the false discovery rate (FDR), which is the expectation of the ratio of the number of false discoveries over the number of discoveries.
A conventional testing paradigm employed in these situations is to decide on the collection of statistical tests for the M pairs of hypotheses, e.g., a t-test or a Mann-Whitney-Wilcoxon test for each pair, obtain the p-value for each test, and then use the resulting M p-values in the FWER-controlling sequential Šidák procedure, provided an independence condition is satisfied, or the FDR-controlling procedure in [3]. In this conventional approach, there appears to be no leeway in the choice of the multiple testing procedure the moment the individual test procedures have been chosen. Furthermore, since p-values are probabilities computed under distributions specified by the null hypotheses, it is not apparent whether these p-value-based procedures are actually taking into account the probabilities under the alternative hypotheses distributions. If they do not, then it goes against the Neyman-Pearson dictum that in the search for optimal test procedures, it is germane to consider probabilities under both the null and the alternative hypotheses.
This paper is primarily motivated by these issues. In particular, we pose the following question: If we are given the M test procedures for each of the M pairs of hypotheses, could we obtain classes of multiple testing procedures whose elements either control the FWER or the FDR? If this has an affirmative answer, then a multiple testing procedure within these classes which is optimal with respect to some Type II error rate criterion may be found. In turn, we may then be able to choose the starting collection of test functions that will provide the best multiple testing procedure. This is the spirit of this paper. We will in fact demonstrate that, under certain conditions, when given a collection of test functions for the M pairs of hypotheses, we can generate classes of multiple testing procedures controlling the FWER or the FDR. These results will have important implications in the search for optimal multiple testing procedures that control either of these Type I error rates. The main results in this paper were motivated by those in [4], which did not deal with classes of multiple testing procedures, but instead focused on developing improved FWER- and FDR-controlling procedures from the Neyman-Pearson most powerful tests for each of the M pairs of hypotheses.
We shall investigate these issues in a somewhat general abstract framework. We have opted for the additional abstraction in our mathematical framework in order to make the derivations concise and elegant, and also to cover more general data structures that arise or may arise in such multiple testing problems. Such data structures are typically high-dimensional and of more complicated forms (e.g., the data for the mth testing problem need not be a multi-sample data set but could be spatio-temporal, image, or shape data), such as those arising in neuroscience, genomics, proteomics, etc. However, in order to make the general results more accessible to the reader, we will demonstrate them in concrete and conventional settings. In line with this, we first discuss a concrete situation in Section 2 as a way of introducing issues of interest and to describe in this specific situation the major results of the paper.
There are certainly papers that have tried to improve on the p-value based approaches. One approach was to use weighted p-values, such as in [5], which provides an FWER-controlling procedure, and in [6], which gives a procedure that controls the FDR. A variety of p-value weighting schemes have been proposed, usually relying on a posited model for the p-values and/or on whether control of the FWER or FDR is desired; see, for instance, [7] for a review. Though our proposed testing procedures may be viewed as weighted p-value based, we mention at the outset that our approach is intrinsically different from the weighted p-value approach as expounded in [6] in the manner in which the p-value statistics are weighted. In [6], it is stated in their section 2 that:
Whatever information one uses to construct p-value weights, the weight assignment remains a guess. This guess is to be made a priori, that is before seeing the p-values. For purposes of analysis, we model the weights as random variables that are related to the underlying truth or falsehood of each null hypothesis.
In contrast, in our approach, the p-value weights arise from the alternative hypotheses of most interest. This is akin to determining an appropriate sample size to achieve a desired power for an alternative hypothesis of most interest. Our weights thus take into major consideration the powers of the individual tests at specified alternative hypotheses, hence are not based on a priori assessments of the truth or falsity of each of the null hypotheses.
There are other papers that provide general FWER- and FDR-controlling procedures and which aim for some optimality properties. The paper [8] provides two sufficient conditions for FDR control, while [9] concerns aspects of optimality. The papers [10,11] present sequential rejection procedures which control FWER. Recent papers dealing with multiple decision functions (MDFs) with certain optimality properties are [12,5,13,6,14–19]. On the other hand, papers proposing MDFs with a Bayes or empirical Bayes flavor, are [20,21,1, 22]; and more recently, [23,24].
2 Essence of General Results via Concrete Situation
Prior to embarking on our abstract development we first provide a concrete situation to illuminate the issues and to offer a glimpse of the major results in this paper. Consider the problem of simultaneously testing M pairs of hypotheses Hm0 : θm ≤ θm0 versus Hm1 : θm > θm0 (m ∈ ℳ ≡ {1, 2, …, M}), where the θm0’s are known positive constants. The tests are to be based on data T = (Tm, m ∈ ℳ), where Tm has a gamma distribution with shape parameter nm, a known positive integer, and scale parameter θm, so that the marginal density function of Tm is

fm(tm; nm, θm) = [Γ(nm) θm^nm]^(−1) tm^(nm−1) exp(−tm/θm) I{tm > 0},
where I{·} is the indicator function and Γ(·) is the gamma function. The Sufficiency Principle allows us to simply focus on the Tm’s since if Tm1, Tm2, …, Tmnm are the observable independent and identically distributed (IID) exponential random variables with scale parameter θm, then Tm = ∑^nm_j=1 Tmj is sufficient for θm and has a gamma distribution with parameter vector (nm, θm). Henceforth, we shall denote by 𝒢(·; n, θ) and 𝒢−1(·; n, θ) the distribution and quantile functions, respectively, of a gamma random variable with parameter vector (n, θ). We may also assume that (Tm, m ∈ ℳ0) is an independent collection of random variables, where ℳ0 = {m ∈ ℳ : Hm0 is true}. On the other hand, (Tm, m ∈ ℳ1), where ℳ1 = ℳ \ ℳ0, need not be an independent collection of random variables. It is well-known that, marginally, the αm-size uniformly most powerful (UMP) test for Hm0 : θm ≤ θm0 versus Hm1 : θm > θm0 based on Tm is given by

δm(tm; αm) = I{tm > 𝒢−1(1 − αm; nm, θm0)}.
Recall that a (non-randomized) test function δ is a {0, 1}-valued statistic such that δ = 0(1) means that Hm0 is not rejected (rejected). The collection δ = (δm, m ∈ ℳ), which is a mapping from the sample space of the (Tm, m ∈ ℳ) into {0, 1}M, is a specific case of a multiple decision or test function.
Since we are now performing M simultaneous tests, global measures of Type I error are required. The first such global measure of Type I error is the family-wise error rate (FWER). For the multiple testing function δ ≡ {δm(·; αm), m ∈ ℳ}, its FWER is
where P is the true probability measure governing the (Tm, m ∈ ℳ). Another global measure of Type I error in the simultaneous testing setting is the false discovery rate (FDR), which for the multiple testing function δ is defined via
We emphasize that the true probability measure P in the probabilistic evaluation in the FWER or the expectation evaluation in the FDR is not known. Furthermore, note that the set of indices ℳ0 also depends on this unknown P. Without knowing this P, if one is able to establish that either the FWER or the FDR is no more than a fixed specified level q ∈ [0, 1] by proper choice of the individual sizes αm’s, or perhaps by proper construction of a multiple testing procedure δ which may be of much more complicated form than the (δm (·; αm), m ∈ ℳ), e.g., each component δm may depend on all the Tm’s or it could have a sequential flavor, then strong control of either of these global Type I error measures is achieved.
There are existing multiple test functions that control either the FWER or the FDR which are anchored on the p-value statistics obtained from the individual test functions δm’s. For controlling the FWER we may, for instance, utilize the sequential Šidák procedure (see, also, [10]); while for FDR control a popular multiple test function is the Benjamini and Hochberg [3] (BH) procedure. In the context of the concrete situation described in this section, the main essence and contribution of this paper is the demonstration that, starting from the individual UMP tests δm’s, one could construct a class of multiple test functions whose members control the FWER and with this class including as a special case the Šidák procedure; as well as a class of multiple test functions whose members control the FDR and with the BH procedure being in fact a member of this class. We now describe specific subsets of these classes of multiple test functions. Let

Γ ≡ {γ = (γm, m ∈ ℳ) ∈ (0, 1]^M : ∑m∈ℳ γm ≤ 1}.
Let γ = (γm, m ∈ ℳ) ∈ Γ. For each m ∈ ℳ, define the function Am : [0, 1] → [0, 1] via
Am(α; γm) = 1 − (1 − α)^γm, α ∈ [0, 1]. (1)
Suppose that it is desired to control the FWER at level q ∈ (0, 1). For this purpose, define the statistic
where δm(tm; α−) ≡ limβ↑α δm(tm; β). Then, Theorem 1 on page 16 guarantees us that the multiple test function given by
(2)
in fact controls the FWER at q for any fixed γ ∈ Γ.
On the other hand, suppose that it is desired to control the FDR at level q. If we define the statistic
then Theorem 2 on page 16 assures us that the multiple test function
(3)
controls the FDR at level q for γ ∈ Γ satisfying additional conditions to be stated later. Observe that both δ† (T; q, γ) and δ*(T; q, γ) are so-called compound multiple test functions since their mth component functions utilize the whole data vector T, not just Tm.
If we take γm = 1/M for each m ∈ ℳ, then the multiple test function δ†(T; q, γ = 1/M) is the sequential Šidák procedure, while the multiple test function δ*(T; q, γ = 1/M) is the BH procedure. But since varying the vector γ = (γm, m ∈ ℳ) yields many FWER- or FDR-controlling multiple test functions, there is now the possibility that, by considering the different power characteristics of the individual test functions δm’s, possibly owing to differences in the values of the θm0’s, the nm’s, and/or the effect sizes we would like to detect, we may choose the γm’s so as to improve the global power of the resulting multiple test functions δ†(T; q, γ) or δ*(T; q, γ). FWER-control or FDR-control for such a choice of γm’s is then guaranteed by virtue of the results concerning the class of multiple test functions. In a nutshell, this concrete setting provides a glimpse of the essence of this paper. It is also used in the simulation study presented in subsection 7.2.
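To make the weighted construction concrete, the following minimal Python sketch works with size functions of the assumed form Am(α; γm) = 1 − (1 − α)^γm (the family discussed further in Section 3.3); the particular weights are hypothetical. It checks numerically that the per-test sizes combine multiplicatively to the global level when ∑ γm = 1, and that equal weights γm = 1/M recover the Šidák per-test size.

```python
import math

def A(alpha, gamma):
    # assumed size function A_m(alpha; gamma_m) = 1 - (1 - alpha)^gamma_m
    return 1.0 - (1.0 - alpha) ** gamma

M = 4
# hypothetical unequal weights summing to one, e.g. favoring tests
# whose anticipated effect sizes are larger
gammas = [0.4, 0.3, 0.2, 0.1]
alpha = 0.05
sizes = [A(alpha, g) for g in gammas]

# weak-FWER identity: prod_m (1 - A_m(alpha)) = 1 - alpha when sum(gammas) = 1
lhs = math.prod(1.0 - s for s in sizes)
assert abs(lhs - (1.0 - alpha)) < 1e-12

# equal weights gamma_m = 1/M recover the Sidak per-test size 1 - (1-alpha)^(1/M)
sidak = 1.0 - (1.0 - alpha) ** (1.0 / M)
assert abs(A(alpha, 1.0 / M) - sidak) < 1e-12
```

The design point of the family is visible here: unequal γm’s reallocate the fixed global size budget across the component tests without disturbing the product constraint that yields weak FWER control.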
Since the problem of multiple testing is truly a general problem with potentially complicated data structures, we have opted to treat this more abstractly. Undeniably, it makes the paper denser and harder to comprehend; however, through the abstraction, the results obtained and the proofs presented are more general in scope.
3 Mathematical Underpinnings
3.1 Decision-Theoretic Elements
Let (𝒳, ℱ, 𝒫) be a statistical model, so (𝒳, ℱ) is a measurable space and 𝒫 is a collection of probability measures on (𝒳, ℱ). Though not needed in the abstract development, we may adopt for concreteness the interpretation that 𝒳 is the range space of an observable random entity X arising from an experiment. We shall denote by P ∈ 𝒫 the true underlying, but unknown, probability measure on (𝒳, ℱ). All probability statements and expectations will therefore be with respect to P, while a statistical hypothesis will be a proposition regarding P. In the sequel, for a space 𝒮, σ(𝒮) will denote an appropriate sigma-field of subsets of 𝒮. In decision problems with action space 𝔄 = {0, 1}, such as in hypothesis testing, a nonrandomized decision function is a δ : (𝒳, ℱ) → (𝔄, σ(𝔄)). In hypothesis testing, given X = x ∈ 𝒳, a decision δ(x) = 0 corresponds to deciding in favor of the null hypothesis (H0), whereas δ(x) = 1, a so-called discovery, corresponds to deciding in favor of the alternative hypothesis (H1).
It suffices to restrict ourselves to nonrandomized decision functions since, through the use of an auxiliary randomizer, usually a standard uniform variable U independent of X, we may always convert a randomized decision function δ* : (𝒳, ℱ) → ([0, 1], σ[0, 1]) into a nonrandomized decision function δ : (𝒳 × [0, 1], ℱ ⊗ σ[0, 1]) → (𝔄, σ(𝔄)) via δ(x, u) = I {u ≤ δ*(x)}. Thus, in our general formulation, the sample space 𝒳 may actually represent a product space between a data space and [0, 1]. This framework is appropriate, for instance, when dealing with discrete data or when using nonparametric rank-based decision functions. For more discussions on this matter, see [4, 25].
Decision or test functions depend on a size parameter α ∈ [0, 1] as in the concrete situation in Section 2. To further demonstrate this idea, consider the problem of testing the null hypothesis H0 : μ = 0 versus the alternative hypothesis H1 : μ ≠ 0 based on a random observable X ~ N (μ, 1), where N (a, b2) is a normal distribution with mean a and variance b2. Note that it suffices to consider this one-dimensional X by virtue of a Sufficiency reduction. The size-α test δ : 𝒳 ≡ ℜ → {0, 1} has δ(x; α) = I{|x| > Φ−1(1−α/2)}, where Φ−1(·) is the quantile function of the standard normal distribution. Henceforth, to simplify our notation, we adopt a functional notation where δ(α) represents the statistic defined on 𝒳 according to x ↦ δ(x; α). When we view this as a stochastic process in α, we obtain the notion of a (nonrandomized) decision process as introduced in [4], which is a stochastic process Δ = {δ(α) : α ∈ [0, 1]} where, ∀α ∈ [0, 1], δ(α) is a decision function, and such that the following conditions are satisfied.
-
(D1)
δ(0) = 0 and δ(1) = 1 a.e.-P.
-
(D2)
The sample paths α ↦ δ(α) are, a.e.-P, {0, 1}-valued step-functions which are nondecreasing and right-continuous.
3.2 Multiple Decision Functions
Let ℳ be a known finite set with |ℳ| = M. An ℳ-indexed multiple decision problem is one whose action space is 𝔄M. In the context of multiple hypotheses testing, for each m ∈ ℳ, there is a pair of hypotheses Hm0 and Hm1. Of interest is to simultaneously decide between Hm0 and Hm1 for each m ∈ ℳ. A multiple decision function (MDF) is a δ = (δm : m ∈ ℳ), where δm is a decision function. Thus, δ : (𝒳, ℱ) → (𝔄M, σ(𝔄M)). A multiple decision process (MDP) is a Δ = (Δm : m ∈ ℳ), where Δm = {δm(α) : α ∈ [0, 1]} is a decision process.
Let ℳ0 ≡ ℳ0(P) and ℳ1 ≡ ℳ1(P) be subsets of ℳ such that

ℳ0(P) ∪ ℳ1(P) = ℳ and ℳ0(P) ∩ ℳ1(P) = ∅.
We shall assume that the following condition holds.
-
(D3)
{Δm : m ∈ ℳ0(P)} and {Δm : m ∈ ℳ1(P)} are independent of each other and the elements of {Δm : m ∈ ℳ0(P)} are independent.
In multiple hypotheses testing, Hm0 is true under P if and only if m ∈ ℳ0(P). Observe that the elements of {Δm : m ∈ ℳ1(P)} need not be independent. In many cases, such as in the illustration in Section 2, the Δm test process may just be a function of Xm which is specific for the mth decision problem. Condition (D3) will then be satisfied if {Xm, m ∈ ℳ0(P)} is an independent collection. However, there are cases where the M test processes may be using the same data X, and in such a case we require that the independence condition in (D3) should hold.
In addition, we shall also assume the condition that
-
(D4)
∀m ∈ ℳ0(P), ∀α ∈ [0, 1]: EP {δm(α)} = α.

The collection of all ℳ-indexed multiple decision processes satisfying conditions (D1)–(D4) will be denoted by 𝔇. We remark that the requirement of equality in (D4), given by EP {δm(α)} = α, will be fulfilled in many situations since an auxiliary randomizer is incorporated in our framework. However, there may still be situations, when dealing with non-regular families of distributions (e.g., the uniform family of distributions), where this condition is not satisfied. The latter manifests itself when the decision function achieves power one while its size is not yet equal to one.
3.3 Multiple Decision Size Functions
Let A = (Am : m ∈ ℳ) be an ℳ-indexed collection of measurable functions with Am : ([0, 1], σ[0, 1]) → ([0, 1], σ[0, 1]). We shall say that A is a multiple decision size function if it satisfies the following three conditions:
-
(A1)
∀m ∈ ℳ: Am(0) = 0 and Am(1) = 1.
-
(A2)
∀m ∈ ℳ: α ↦ Am(α) is strictly increasing and continuous.
-
(A3)
∀α ∈ [0, 1] : Πm∈ℳ[1 − Am(α)] ≥ 1 − α.
The idea behind the introduction of these size functions is that they serve as ‘size-pickers’ for each of the individual test functions when a global Type I error level α is specified. Thus, given an α, the multiple decision function that will be of interest is (δm[Am(α)], m ∈ ℳ). Conditions (A1) and (A2) are intuitive requirements for size-pickers. Condition (A3), on the other hand, guarantees that the multiple decision function (δm[Am(α)], m ∈ ℳ) achieves a weak FWER of no more than α. The weak FWER is the FWER under a P for which all the Hm0’s are true. We denote the collection of all ℳ-indexed multiple decision size functions by 𝔖.
When we deal with FDR-controlling multiple decision functions, we need an additional condition that controls the interplay among the multiple decision process Δ, the multiple size function A, and the underlying probability measure P. This condition is as follows:
-
(C) The multiple decision process Δ = {Δm : m ∈ ℳ}, multiple size function A = {Am : m ∈ ℳ}, and underlying probability measure P satisfy, whenever |ℳ0(P)| < M,
with
Note that α(1) is a random variable; it is the minimum of the so-called generalized p-value statistics which are formally defined in (12). This statistic α(1) could be interpreted as the smallest global error rate that leads to the rejection of at least one Hm0 when using the multiple decision process Δ and multiple size function A. Verifying the condition in (C) is not a trivial matter because of the randomness of α(1), whose distribution depends on the underlying unknown probability measure P. A stronger, but perhaps easier to verify, condition which implies the condition in (C) is that
(4)
As mentioned above, condition (C) depends on the true probability measure P. Since P is not known, we do not know ℳ0(P). However, we may have an idea, possibly justified by sparsity of signals considerations, that the ratio M0/M, where M0 = |ℳ0(P)|, is no more than some value B with 0 < B < 1. Condition (C) is then implied by
(5)
where Ā(α) = ∑m∈ℳ Am (α)/M. Visually, this condition implies that each of the size functions must be inside the upper envelope U(α) determined by
for α ∈ [α(1), 1]. In essence, none of the size functions should dominate the other size functions, which is akin to Noether’s condition when dealing with the asymptotics of rank-based statistics.
A particular element of 𝔖 is the Šidák multiple decision size function (cf., [26]) with
Am(α) = 1 − (1 − α)^(1/M), m ∈ ℳ. (6)
This Šidák size function will play a central role in the proof of Theorem 2 which deals with multiple decision functions controlling the false discovery rate. Note, however, that the Bonferroni size function with
Am(α) = α/M, m ∈ ℳ, (7)
does not belong to 𝔖 since Am(1) = 1/M < 1, hence condition (A1) is not satisfied. That the Bonferroni size function fails even condition (A1) attests to its conservativeness, to the extent that it cannot be considered among multiple decision size functions that could potentially lead to optimal multiple decision functions. Indeed, in the extreme case where we set the FWER level to α = 1, any multiple decision function whose components are all identically equal to one satisfies this FWER control, so we could set Am(1) = 1 for all m ∈ ℳ.
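The contrast between the two size functions can be verified numerically. This short sketch checks that the Šidák size function satisfies (A1) and attains equality in (A3), while the Bonferroni size function fails (A1) at α = 1; M = 10 is an arbitrary illustrative choice.

```python
M = 10

def sidak(a):
    # Sidak size function: A(alpha) = 1 - (1 - alpha)^(1/M)
    return 1.0 - (1.0 - a) ** (1.0 / M)

def bonf(a):
    # Bonferroni size function: A(alpha) = alpha / M
    return a / M

# (A1): both vanish at alpha = 0, but only Sidak reaches 1 at alpha = 1
assert sidak(0.0) == 0.0 and sidak(1.0) == 1.0
assert bonf(1.0) == 1.0 / M < 1.0          # Bonferroni violates (A1)

# (A3) holds with equality for Sidak: prod_m (1 - A(alpha)) = (1-A)^M = 1 - alpha
for a in (0.01, 0.05, 0.5, 0.99):
    assert abs((1.0 - sidak(a)) ** M - (1.0 - a)) < 1e-12
```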
The Šidák multiple decision size function does not depend on m. We provide examples of multiple size functions that depend on m. The first one is that considered in Section 2 where we have, for γ = (γ1, γ2, …, γM) ∈ Γ,
We remark that the γm’s may also depend on α, as will be the case in a later concrete example. The collection {Am(α; γm) : m ∈ ℳ} clearly satisfies conditions (A1) and (A2), and if ∑m∈ℳ γm ≤ 1, then condition (A3) is also satisfied. Equality in condition (A3) is actually achieved by having ∑m∈ℳ γm = 1. Condition (C) in this special case becomes:
Observe that if all Hm0’s are false, so that ℳ0 = ∅, then the condition above is automatically satisfied; while if all the Hm0’s are true, so that ℳ0 = ℳ, then we are forced to have γm = 1/M, m = 1, 2, …, M, which leads to the Šidák multiple size functions. In practice, we will have M0/M bounded above by some number B ∈ (0, 1), and in this case a sufficient condition for the above condition to hold is
If the γm’s do not depend on α, then for this class of multiple size functions the sufficient condition in (4), via L’Hôpital’s Rule and focusing on the limit as α ↓ 0, is implied by the condition that , where . To see this, define the function g on [0, 1] by
The assertion above follows if we could show that g(α) is a decreasing function of α. This is the same as showing that
is decreasing in α. But this is immediate if we could show that, for every m = 1, 2, …, M,

h(α) ≡ [1 − (1 − α)^γ(M)] / [1 − (1 − α)^γ(m)]
is a decreasing function of α ∈ [0, 1]. To show that this is decreasing in α, observe that for 0 ≤ γ(m) < γ(M) ≤ 1, as α decreases from 1, (1 − α)^γ(M) increases more slowly than (1 − α)^γ(m), both being zero at α = 1. Consequently, as α decreases from 1, 1 − (1 − α)^γ(M) decreases more slowly than 1 − (1 − α)^γ(m), and since both lie in [0, 1], it follows that h(α) increases as α decreases from 1, that is, h(α) is a decreasing function of α. A more formal way of showing that h(α) is decreasing in α is to show that its derivative is negative. We leave this alternative proof as a leisurely exercise for the curious reader.
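The monotonicity argument can be checked numerically. The sketch below assumes, as the surrounding argument suggests, that h(α) is the ratio [1 − (1 − α)^γ(M)] / [1 − (1 − α)^γ(m)] for a pair of ordered weights; the particular values γ(m) = 0.2 and γ(M) = 0.8 are illustrative.

```python
g_m, g_M = 0.2, 0.8   # illustrative ordered weights, gamma_(m) < gamma_(M)

def h(a):
    # assumed ratio h(alpha) = [1 - (1-alpha)^gamma_(M)] / [1 - (1-alpha)^gamma_(m)]
    return (1.0 - (1.0 - a) ** g_M) / (1.0 - (1.0 - a) ** g_m)

# evaluate on a fine grid of alpha in (0, 1) and confirm h is nonincreasing
xs = [i / 1000 for i in range(1, 1000)]
vals = [h(a) for a in xs]
assert all(v0 >= v1 - 1e-12 for v0, v1 in zip(vals, vals[1:]))
```

Consistent with the L'Hôpital limit mentioned above, h(α) starts near γ(M)/γ(m) as α ↓ 0 and decreases toward 1 as α ↑ 1.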
The next example of a multiple decision size function is derived through an optimization of a global power of the multiple decision function. Let Xm ~ N(μm, 1) and consider testing Hm0 : μm ≤ 0 versus Hm1 : μm > 0. The usual test function of size ηm for Hm0 versus Hm1 is
Let {γm, m ∈ ℳ} be a collection of known positive reals, which could be viewed as the effect sizes of interest. The power of δm at μm = γm is given by
The weak family wise error rate of the associated multiple decision function is
where P0 is the probability measure associated with μm = 0 for all m ∈ ℳ. Suppose we seek to control this weak FWER at an overall level α ∈ [0, 1], so that condition (A3) is necessarily satisfied, but at the same time maximize the overall power of the multiple decision at μm = γm, m = 1, 2, …, M, given by
Then, (cf., [4]) the multiple size function {Am(α), m ∈ ℳ} should satisfy the two conditions: (i) for some λ ∈ ℜ,
where ϕ(·) is the standard normal density function, and (ii) ∑m∈ℳ log[1 − Am (α)] = log(1 − α). If the effect sizes γm’s are not all identical, then these size functions are not identical, that is, they will depend on m. Note that some constraints will need to be imposed on the γm’s in order for the resulting size functions {Am(·) : m ∈ ℳ} to satisfy condition (C).
Let us actually implement the optimality prescription above for a somewhat more restrictive family of multiple decision size functions given by the form
Am(α) = 1 − (1 − α)^κm, m = 1, 2, …, M, (8)
with κm ∈ [0, 1]. Utilizing the Gaussian setting of the preceding example, we therefore seek the optimal κm’s in order to control the weak FWER at level α and at the same time maximize the overall power. We will immediately impose the condition that the Am(α)’s satisfy ∏m∈ℳ [1 − Am(α)] = 1 − α. Since 1 − Am(α) = (1 − α)^κm, m = 1, 2, …, M, it then follows that ∑m∈ℳ κm = 1. Within this family of multiple size functions, the condition for optimality is that, for some λ > 0, we must have, for every m = 1, 2, …, M, that
(9)
Letting bm ≡ bm(α) = bm(α; κm) = Φ−1((1 − α)^κm), m = 1, 2, …, M, the equation in (9) simplifies to
(10)
Observe that the condition ∑m∈ℳ κm = 1 is equivalent to ∏m∈ℳ Φ(bm) = 1 − α. Combining, we therefore obtain an expression for λ given by
(11)
Substituting into (10), we get the set of equations determining the optimal bm’s to be, for m = 1, 2, …, M,
We summarize these results into a proposition.
Proposition 1 For the family of multiple decision size functions of the form Am(α) = 1 − (1 − α)^κm, m = 1, 2, …, M, the optimal κm’s for weak FWER control at α for the Gaussian model are given by κ*m(α) = log Φ(b*m(α)) / log(1 − α), m = 1, 2, …, M,
where the b*m’s solve in b = (b1, b2, …, bM)t the set of equations given by
where Dg(a) represents the diagonal matrix with diagonal elements from vector a, 1 is a M × 1 vector of 1’s, γ = (γ1, γ2, …, γM)t, , and
Clearly, computing the optimal b*m’s, and hence the κ*m’s and the optimal size functions, requires numerical methods. Furthermore, we point out that the optimal κ*m’s depend on α. There thus arises a dynamic adjustment of the optimal sizes to use for each of the decision functions in accordance with the weak FWER level α being used. We implemented the computation of the above functions using a Newton-Raphson procedure. For demonstration we took M = 5 and specified the effect sizes γm, m = 1, 2, …, 5, by taking the absolute values of five generated random variates from a standard normal distribution. The resulting effect sizes were γ = (1.08, 1.43, 0.19, 1.73, 0.10). Figure 1 presents the functions bm(α), m = 1, 2, …, 5, in the first plot frame, the functions Am(α) in the second plot frame, and the functions κm(α) in the third plot frame, for α ranging from .001 to .999 in increments of .001. Observe that the size functions intersect, owing to the dependence of the optimal κm’s on the value of α, a manifestation of the dynamic adjustment alluded to above. We point out that the second plot frame actually provides the optimal size functions without restricting to the Šidák-type size functions. See Theorem 3 for the general set of equations determining the optimal size functions, of which the results above are a special case.
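The optimization can be illustrated with a simple stand-in for the Newton-Raphson scheme. The sketch below is hedged in several ways: it takes M = 2, uses hypothetical effect sizes (0.5, 2.0), measures overall power by the sum of the marginal powers at those effect sizes (one plausible criterion; the paper's exact criterion may differ), and finds the optimal κ1 by crude grid search under the constraint κ1 + κ2 = 1.

```python
from statistics import NormalDist

N = NormalDist()          # standard normal: N.cdf is Phi, N.inv_cdf is Phi^{-1}
alpha = 0.05
gammas = (0.5, 2.0)       # hypothetical effect sizes of interest

def total_power(k1):
    # sizes A_m(alpha) = 1 - (1 - alpha)^kappa_m with kappa = (k1, 1 - k1);
    # the size-eta one-sided normal test has power 1 - Phi(b - gamma) at
    # mu = gamma, where b = Phi^{-1}(1 - eta) = Phi^{-1}((1 - alpha)^kappa)
    total = 0.0
    for k, g in zip((k1, 1.0 - k1), gammas):
        b = N.inv_cdf((1.0 - alpha) ** k)
        total += 1.0 - N.cdf(b - g)
    return total

# crude grid search over kappa_1 in place of Newton-Raphson
k_star = max((i / 1000 for i in range(1, 1000)), key=total_power)

# under this sum-of-powers criterion the optimum funnels most of the size
# budget to the large-effect test, so kappa_1 (the small-effect share) < 1/2
assert 0.0 < k_star < 0.5
```

The intersecting size functions seen in Figure 1 reflect the same phenomenon: the optimal allocation of the size budget shifts with the global level α.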
Fig. 1.
Demonstration of the optimal size functions in the restricted family of size functions for effect sizes γ = (1.08, 1.43, 0.19, 1.73, 0.10) with M = 5. The first plot frame contains the plots of the α ↦ bm(α), functions, the second plot frame is for the α ↦ Am(α) = 1 − (1 − α)κm(α) functions, and the third plot frame is for the α ↦ κm(α) functions.
We also recall the notion of generalized p-value statistics; see [4]. Given a Δ ∈ 𝔇 and an A ∈ 𝔖, for m ∈ ℳ we define the random variable
αm ≡ αm(Δ, A) = inf{α ∈ [0, 1] : δm[Am(α)] = 1}. (12)
The collection (αm(Δ, A) : m ∈ ℳ) is called the vector of generalized p-value statistics associated with the pair (Δ, A). Observe that the usual p-value statistic associated with δm is Pm = Am(αm), hence the use of the adjective generalized for the αm’s. We shall assume that these generalized p-values are a.e. [P] distinct. In the multiple testing literature there is also the notion of adjusted p-value statistics; see, for instance, pages 32–34 of [27]. The adjusted p-value associated with the mth null hypothesis Hm0 is defined to be the smallest global error rate (e.g., FWER, FDR) at which Hm0 is still rejected. Examining the definition of the generalized p-value statistics, we note that αm(Δ, A) could also be viewed as the smallest global error rate α at which the mth test function δm[Am(α)] rejects Hm0. In some sense, therefore, the adjusted p-value statistics and the generalized p-value statistics are related in that both are global error rates leading to the rejection of null hypotheses. Our notion of a generalized p-value statistic is perhaps more general than that of an adjusted p-value statistic since we allow more general multiple size functions as pickers of the specific test functions to use at each component of the multiple decision problem.
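Since Pm = Am(αm), the generalized p-value can be obtained from the ordinary p-value by inverting the size function. The sketch below assumes size functions of the form Am(α; γm) = 1 − (1 − α)^γm, so that αm = 1 − (1 − pm)^(1/γm); the p-values used are hypothetical.

```python
# hypothetical ordinary p-values for five component tests
pvals = [0.001, 0.01, 0.04, 0.20, 0.60]
M = len(pvals)
gammas = [1.0 / M] * M     # equal weights recover the Sidak size function

def gen_pvalue(p, gamma):
    # invert P_m = A_m(alpha_m) with A_m(a) = 1 - (1 - a)^gamma:
    # alpha_m = 1 - (1 - p_m)^(1/gamma)
    return 1.0 - (1.0 - p) ** (1.0 / gamma)

alphas = [gen_pvalue(p, g) for p, g in zip(pvals, gammas)]

# the map is monotone, so generalized p-values preserve the p-value ordering
assert alphas == sorted(alphas)

# each alpha_m is the smallest global level at which delta_m[A_m(alpha)] rejects:
# applying A_m recovers the ordinary p-value
for a, p, g in zip(alphas, pvals, gammas):
    assert abs((1.0 - (1.0 - a) ** g) - p) < 1e-12
```

With unequal γm’s the transformation stretches or compresses each pm differently, which is precisely how the weighting enters the compound procedures.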
4 Main Theorems and Classes of MDFs
We shall present in this section the two main results that will enable the construction of the classes of multiple decision functions controlling FWER and FDR. Recall that P denotes the true, but unknown, underlying probability measure acting on the space (𝒳, ℱ).
Given a Δ = {Δm : m ∈ ℳ} ∈ 𝔇 and an A = {Am : m ∈ ℳ} ∈ 𝔖, define the stochastic processes S0 = {S0(α) : α ∈ [0, 1]}, S = {S(α) : α ∈ [0, 1]}, and F = {F(α) : α ∈ [0, 1]} via
S0(α) ≡ S0(α; Δ, A) = ∑m∈ℳ0(P) δm[Am(α)], (13)

S(α) ≡ S(α; Δ, A) = ∑m∈ℳ δm[Am(α)], (14)

F(α) ≡ F(α; Δ, A) = S0(α)/S(α), (15)
with the convention that 0/0 = 0. These quantities have the following interpretations. Given an α ∈ [0, 1], for each m ∈ ℳ, the decision function whose size is Am(α) is chosen from Δm, and the MDF δ(α) ≡ (δm[Am(α)] : m ∈ ℳ) is employed in the decision-making. For this MDF δ(α), S0(α) is the number of false discoveries, S(α) is the number of discoveries, and F(α) is the proportion of false discoveries among all discoveries. Observe, however, that since P is unknown, both S0 and F are unobservable, whereas S is observable.
For q ∈ [0, 1], let us also define the random variables
(16)
and
(17)
In essence, α†(q) is a first crossing-time random variable, whereas α*(q) is a last crossing-time random variable. The forms of these two random variables were motivated and justified in Sections 6 and 7 in [4] for a specific multiple decision size function, but the justifications in that paper carry over to the more general setting considered here.
The two main results of this paper are contained in Theorem 1 and Theorem 2. We present these theorems, but defer their proofs to Section 5 after some discussion about their implications and potential usefulness.
Theorem 1 Under conditions (D1)–(D4) for 𝔇 and (A1)–(A3) for 𝔖, for every q ∈ [0, 1], Δ ∈ 𝔇, and A ∈ 𝔖,

EP {I{S0(α†(q; Δ, A); Δ, A) ≥ 1}} ≤ q.
Observe that EP {I{S0(α†(q; Δ, A); Δ, A) ≥ 1}} is the FWER since it is the probability of committing at least one false discovery under P. Thus, Theorem 1 shows that for any q ∈ [0, 1], any multiple decision process Δ ∈ 𝔇, and any multiple decision size function A ∈ 𝔖, the MDF defined via
δ†(q) ≡ δ†(q; Δ, A) = (δm[Am(α†(q; Δ, A))] : m ∈ ℳ) (18)
strongly controls the FWER at q.
Theorem 2 Under conditions (D1)–(D4) for 𝔇 and (A1)–(A3) for 𝔖, for every Δ ∈ 𝔇 and every A ∈ 𝔖 satisfying condition (C), and for every q ∈ [0, 1] and P ∈ 𝒫,
EP {F(α*(q; Δ, A); Δ, A)} ≤ q.
Note that EP {F(α*(q; Δ, A); Δ, A)} is the false discovery rate (FDR) as defined in the seminal paper of [3]. The implication of Theorem 2 is that if, for each q ∈ [0, 1], and for any multiple decision process Δ ∈ 𝔇 and multiple decision size function A ∈ 𝔖 satisfying (C), we define the MDF
δ*(q) ≡ δ*(q; Δ, A) = (δm[Am(α*(q; Δ, A))] : m ∈ ℳ), (19)
then δ*(q) is an MDF that (strongly) controls the FDR at q.
The importance of the preceding results is that each multiple decision process Δ ∈ 𝔇 may have an associated multiple decision size process A ≡ A(Δ) ∈ 𝔖 such that the resulting multiple decision functions δ†(q) or δ*(q) possess some optimality property, for example, with respect to the missed discovery rate, a Type II error rate. To define this rate, let
M(α) ≡ M(α; Δ, A, P) = ∑m∈ℳ1(P) (1 − δm[Am(α)]) / |ℳ1(P)|. (20)
The quantity M(α) has the interpretation of being the proportion of missed discoveries relative to the number of correct alternative hypotheses. Then, the missed discovery rate (MDR) of the MDF in (19) is
MDR(δ*(q)) = EP {M(α*(q; Δ, A); Δ, A)}.
For the given Δ, with a proper choice of A, we may be able to find an MDF that strongly controls the FWER or the FDR and that also possesses an optimality property with respect to another criterion, such as having a small MDR. This idea was implemented in a more restricted setting in [4], where each pair of hypotheses consisted of a simple null and a simple alternative. We point out that previous works have usually focused on developing a particular MDF and then verifying that it controls the FWER or the FDR, as in, for example, [3] (more comprehensively, see [27]), with notable exceptions being the papers [6,8,7,28,9,18]. It is our hope that by providing a class of MDFs where each member strongly controls the FWER, given by
{δ†(q; Δ, A) : Δ ∈ 𝔇, A ∈ 𝔖}, (21)
or a class of MDFs where each member controls the FDR, given by
{δ*(q; Δ, A) : Δ ∈ 𝔇, A ∈ 𝔖 satisfying condition (C)}, (22)
we acquire the possibility of selecting from these classes MDFs that possess other desirable properties with respect to some suitable Type II error rate, such as the MDR. In Section 7 we provide further discussion of this optimality issue and present efficiency comparisons to demonstrate the viability of our idea.
5 Proofs of Theorems
The proofs of the two theorems are analogous to those of Theorem 6.1 and Theorem 7.1 in [4], which can be found in the supplemental article [29]. Note that those proofs were for special forms of the multiple decision process and multiple decision size function, whereas in the current paper we deal with an arbitrary element Δ ∈ 𝔇 and an arbitrary element A ∈ 𝔖. In the proofs below, we assume that Δ ∈ 𝔇 and A ∈ 𝔖 have been chosen and are fixed. Also, q ∈ [0, 1], and recall that P ∈ 𝒫 denotes the true but unknown underlying probability measure. The dependence of some of the relevant processes and quantities below on (Δ, A, P) will not be written explicitly, for brevity, unless needed for clarity.
5.1 Establishing Theorem 1
Proof We start by defining the stochastic process H1 = {H1(α) : α ∈ [0, 1]} via
H1(α) = ∏m∈ℳ [1 − Am(α)]^(1 − δm(Am(α)−)). (23)
The sample paths of this process are, a.e. [P], left-continuous with right-hand limits (caglad) and piecewise nonincreasing, with
1 − α ≤ H1(α) ≤ 1
for every α ∈ (0, 1), where the first inequality is due to property (A3). In fact, by virtue of property (A1) and property (D1), note that
Now, in terms of H1, we have that
α†(q) = inf{α ∈ [0, 1] : H1(α) < 1 − q}.
Since, as pointed out above, we have 1 − α ≤ H1(α), then by its definition, we must have α†(q) ≥ q. This implies that
| (24) |
For the quantity of main interest in the theorem, we have
The last probability cannot, however, be written as a product of probabilities since the δm(Am(α†(q))) for m ∈ ℳ0(P) need not be independent owing to the dependence on α†(q) which is determined by all the (Δm,m ∈ ℳ). On the other hand, we do have the set equality
{S0(α†(q)) ≥ 1} = {minm∈ℳ0(P) αm ≤ α†(q)}, (25)
where the αms are the generalized p-value statistics defined in (12).
Next, define the stochastic process H2 = {H2(α) : α ∈ [0, 1]} via
H2(α) = ∏m∈ℳ0(P) [1 − Am(α)] × ∏m∈ℳ1(P) [1 − Am(α)]^(1 − δm(Am(α)−)).
Analogously to the H1 process, this has caglad sample paths. Let us then define the quantity
α#(q) = inf{α ∈ [0, 1] : H2(α) < 1 − q}.
Note that this is not a statistic, that is, it is not observable, since it depends on the unknown probability measure P, in contrast to α†(q). Furthermore, also note that
∏m∈ℳ0(P) [1 − Am(α#(q))] ≥ 1 − q. (26)
From their definitions, H1(α) ≥ H2(α), so that H1(α) < 1 − q implies H2(α) < 1 − q. Consequently,
α#(q) ≤ α†(q). (27)
Now, the importance of the quantity α#(q) arises because of the crucial set equality
{minm∈ℳ0(P) αm ≤ α#(q)} = {minm∈ℳ0(P) αm ≤ α†(q)}, (28)
To see this equality, first observe that the inclusion ⊆ follows immediately from (27). To prove the reverse inclusion, note that {α#(q) < minm∈ℳ0(P) αm} implies that, for some α0 < minm∈ℳ0(P) αm, we have H2(α0) < 1 − q. But for such an α0, we have δm(Am(α0)−) = 0 for all m ∈ ℳ0(P), so that
H1(α0) = H2(α0) < 1 − q.
Consequently,
α†(q) ≤ α0 < minm∈ℳ0(P) αm.
The reverse inclusion ⊇ thus follows, completing the proof of (28).
By (25), (28), and the iterated expectation rule, it now follows that
Since α#(q) is measurable with respect to the sub-σ-field σ(δm : m ∈ ℳ1(P)), whereas minm∈ℳ0(P) αm is measurable with respect to the sub-σ-field σ(δm : m ∈ ℳ0(P)), then by condition (D3), α#(q) and minm∈ℳ0(P) αm are independent. Furthermore, by condition (D3), we obtain
with the last equality a consequence of condition (D4). Therefore,
with the last inequality following from (26). Thus, finally, we have
This completes the proof of Theorem 1.
We remark that condition (D4) can be weakened to just having
EP {δm[Am(α)]} ≤ Am(α) for all m ∈ ℳ0(P) and α ∈ [0, 1] (29)
to still get the desired strong FWER control. This is so since in the portion of the proof where we have
we simply replace the second = sign by ≥ and then the proof of the theorem goes through.
5.2 Establishing Theorem 2
Proof As we mentioned, the proof of Theorem 2 mimics that of Theorem 7.1 in [4] as presented in [29]. As an aside, we remark that the seed of the idea of providing a class of FDR-controlling multiple decision functions was planted upon our realizing that the proof of the aforementioned Theorem 7.1 is functionally independent of the choice of the multiple decision size function.
The case with q = 0 is trivial since then α*(0) = 0, so that F(α*(0)) = 0. Thus we restrict to q ∈ (0, 1]. We first consider the case where P is such that |ℳ0(P)| < M. By the defining property of α*(q) given in (17), we have that
A•(α*(q)) ≤ q S(α*(q); Δ, A), (30)
where A•(α) = ∑m∈ℳAm(α). Consequently, from (15),
F(α*(q)) ≤ q S0(α*(q)) / A•(α*(q)). (31)
For α ∈ [0, 1], define the sub-σ-field
ℱα = σ({δm[Am(β)] : β ∈ [α, 1], m ∈ ℳ}). (32)
Observe that 𝔉 = (ℱα : α ∈ [0, 1]) is a decreasing collection of sub-σ-fields of ℱ. By its definition α*(q) is an 𝔉-stopping time.
Let us define the process T0 = (T0(α) : α ∈ [0, 1]) according to
T0(α) = ∑m∈ℳ0(P) δm[Am(α)] / Am(α).
Fix 0 ≤ α ≤ β ≤ 1. Then, since δm ∈ {0, 1}, we have
with the last two equalities holding only up to P-equivalence. The second equality follows from (D3), whereas the second-to-last equality follows since
because of condition (A2) for the Am(·)s and conditions (D2) and (D4) for the δm(·)s. The above results show that, under P, {(T0(α),ℱα) : α ∈ [0, 1]} forms a reverse martingale process. Further, observe that T0(1) = |ℳ0(P)| a.e. [P] due to conditions (D1) and (A1). Thus, EP(T0(1)) = |ℳ0(P)|.
From the inequality in (31), and also noting that α*(q) ≥ α (1) to get the second inequality, we obtain
where the last inequality is obtained using condition (C), while the third-to-last equality obtains by invoking the Optional Sampling Theorem for (reverse) martingales (cf., [30]), and the second-to-last equality because of EP[T0(1)] = |ℳ0(P)|.
At this point, note that since the Šidák multiple decision size function AS always satisfies condition (C) for all P ∈ 𝒫, including those with ℳ0(P) = ℳ, then ∀Δ ∈ 𝔇, ∀P ∈ 𝒫, we have the property
| (33) |
We now consider the case when P is such that ℳ0(P) = ℳ. Let 𝒫0 = {P ∈ 𝒫 : ℳ0(P) = ℳ}, and consider an arbitrary A ∈ 𝔖 and P ∈ 𝒫0. We need to establish that EP {F(α*(q; Δ, A); Δ, A)} ≤ q. For such a P ∈ 𝒫0, we have F(α; Δ, A) = I{S(α; Δ, A) > 0}, so that
We have, for any Δ ∈ 𝔇 and any A ∈ 𝔖, that
| (34) |
In Lemma D.1 of [29] it was established, using an inequality of [31], that for Wm(ηm),m ∈ ℳ, independent Bernoulli(ηm) random variables with ηm ∈ [0, 1] and satisfying ∏m∈ℳ(1 − ηm) = 1 − α, for each t ≥ 1,
| (35) |
where η̃m = 1 − (1 − α)^(1/M), m ∈ ℳ.
Noting that, under P ∈ 𝒫0, the δm(Am(α))’s are independent Bernoulli(Am(α)) random variables, then by using the inequality in (35) and condition (A3), it follows that for q ∈ (0, 1],
| (36) |
where the Šidák sizes in (36) have components
ASm(α+) = 1 − (1 − α+)^(1/M), m ∈ ℳ,
with α+ satisfying ∏m∈ℳ[1 − Am(α)] = 1 − α+. Observe that by (A3), we necessarily have α+ ≤ α. Combining the results in (34) and (36), we obtain
But since we have already established that, for P ∈ 𝒫0, we have
then it follows that P{α*(q;Δ,A) > 0} ≤ q. This implies finally that
for any P ∈ 𝒫0. This completes the proof of Theorem 2.
Short and neat proofs of FDR control for multiple decision functions based on p-value statistics are also provided in Theorem 4.1 of [18] and in Proposition 2.7 of [8]. The two sufficient conditions in the latter paper do not cover our set-up since our size functions need not belong to their factorized threshold collection. In addition, their dependency control condition appears to be quite distinct from our condition (C). It is not clear to us whether similar short proofs could be employed in establishing our Theorem 2 using the representation of δ*(q) in terms of the generalized p-value statistics given in Section 6. Also, in contrast to Theorem 1, where we were able to use the weaker version of condition (D4) given in (29), we could not do this for Theorem 2. The reason is that we could not conclude under this weaker condition that the process {(T0(α), ℱα) : α ∈ [0, 1]} is a reverse supermartingale, which would have allowed us to get the desired result. It may be possible that under certain situations we do have this supermartingale property, but the weaker condition (29) appears not to be sufficient for this property to hold in general.
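The constant-mean property EP[T0(α)] = |ℳ0(P)| that underlies the reverse-martingale argument in the proof of Theorem 2 can be illustrated numerically in a toy special case: uniform null p-values Um with δm = I(Um ≤ α) and the illustrative choice Am(α) = α (an assumption made purely for this check, not the general construction):

```python
import numpy as np

rng = np.random.default_rng(0)
M0, reps = 50, 20000
U = rng.uniform(size=(reps, M0))   # null p-values: Uniform(0,1) under each H_m0

def T0_mean(alpha):
    # T0(alpha) = sum over true nulls of delta_m / A_m(alpha), with A_m(alpha) = alpha;
    # Monte Carlo average over the replications
    return ((U <= alpha).sum(axis=1) / alpha).mean()
```

For every α the Monte Carlo average of T0(α) hovers around |ℳ0(P)| = 50, consistent with the constant expectation of a (reverse) martingale.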
6 Representations of MDFs in Terms of the Generalized p-Values
This section expresses the MDFs δ†(q) in (18) and δ*(q) in (19) in terms of the generalized p-value statistics defined in (12). Without loss of generality, let ℳ = {1, 2, …, M}. Define the vector of anti-rank statistics via
(R1, R2, …, RM) ∈ 𝔐, (37)
where 𝔐 is the space of all possible permutations of ℳ, and such that
αR1 ≤ αR2 ≤ ⋯ ≤ αRM.
Let us first consider the random variable α†(q) in (16). We see from its definition and those of the generalized p-value statistics that, for some J ∈ ℳ̄ ≡ {0} ∪ ℳ, we have α†(q) ∈ [α(J), α(J+1)) if and only if
From the definition of the generalized p-value statistics we further have
Consequently, by defining the ℳ̄-valued random variable
| (38) |
we have the result that α†(q) ∈ [α(J†(q)), α(J†(q)+1)). As a consequence we obtain the representation of δ†(q) in (18) in terms of the αms given by
δ†(q) = (I{αm ≤ α(J†(q))} : m ∈ ℳ), (39)
where we used the fact that, for each m ∈ ℳ, δm is constant in each interval [α(j), α(j+1)), j ∈ ℳ̄.
Next let us consider the random variable α*(q) in (17). We may re-express its defining equation via
But, since ∑m∈ℳ δ(m) [A(m)(α(j))] = j, then α*(q) ∈ [α(J), α(J+1)) iff
Defining the ℳ̄-valued random variable
| (40) |
we then have that α*(q) ∈ [α(J*(q)), α(J*(q)+1)). As a consequence, an equivalent representation of the MDF δ*(q) in (19) in terms of the αms is provided by
δ*(q) = (I{αm ≤ α(J*(q))} : m ∈ ℳ). (41)
The representations in (39) for δ†(q) and (41) for δ*(q) provide alternative computational approaches since, instead of computing α†(q) and α*(q), we may simply compute the generalized p-values, then J†(q) and J*(q), and then finally the realizations of the decision functions.
For a simple application, let us see what becomes of the MDFs δ†(q) and δ*(q) if we use the Šidák multiple decision size function AS given in (6). We use the alternate representations just obtained above. By simple manipulations, we immediately obtain that
But, for these Šidák size functions, the (ordinary) p-value statistics are given by
pm = 1 − (1 − αm)^(1/M), m ∈ ℳ.
Re-expressing the J†(q) and J*(q) in terms of these p-values, we easily obtain by simple manipulations that
J†(q) = max{J ∈ ℳ̄ : p(j) ≤ 1 − (1 − q)^(1/(M−j+1)) for all j ≤ J}, (42)
J*(q) = max{j ∈ ℳ̄ : p(j) ≤ jq/M}. (43)
Observe that J†(q) in (42) leads to the step-down sequential Šidák FWER-controlling procedure, see [10,27]; whereas, J*(q) in (43) is the usual form of the step-up Benjamini-Hochberg FDR-controlling procedure in [3]. Thus, through the Šidák sizes, we are able to obtain from our formulation two popular MDFs for FWER and FDR control as special cases of the MDFs δ†(q) and δ*(q)!
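Both special cases can be coded directly from the ordered p-values; a short numpy sketch of the step-down sequential Šidák procedure and the step-up BH procedure (function names are ours):

```python
import numpy as np

def holm_sidak(pvals, q):
    """Step-down sequential Sidak (FWER <= q): reject the J smallest p-values,
    where J is the largest j with p_(i) <= 1-(1-q)^(1/(M-i+1)) for all i <= j."""
    M = len(pvals)
    order = np.argsort(pvals)
    thresh = 1.0 - (1.0 - q) ** (1.0 / (M - np.arange(M)))
    ok = pvals[order] <= thresh
    J = np.argmin(ok) if not ok.all() else M   # first failure stops rejections
    reject = np.zeros(M, dtype=bool)
    reject[order[:J]] = True
    return reject

def bh(pvals, q):
    """Step-up Benjamini-Hochberg (FDR <= q): J* = max{j : p_(j) <= j q / M}."""
    M = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= q * (np.arange(M) + 1) / M
    J = below.nonzero()[0][-1] + 1 if below.any() else 0
    reject = np.zeros(M, dtype=bool)
    reject[order[:J]] = True
    return reject
```

For example, at q = .05 with p-values (.001, .01, .03, .2), the step-down Šidák procedure rejects the two smallest while BH rejects the three smallest, illustrating the step-down versus step-up distinction.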
7 On Developing Optimal MDFs
7.1 Computing Optimal Size Functions
In this subsection, we demonstrate the potential utility of the classes of MDFs arising from Theorems 1 and 2 in the context of obtaining MDFs with some optimality properties, especially in non-exchangeable multiple hypotheses testing settings, which include those where the power characteristics of the M test functions are not identical.
Consider a fixed multiple decision process Δ ∈ 𝔇 and fix a probability measure P1 ∈ 𝒫 with ℳ1(P1) = ℳ. Recall that in the single-hypothesis testing situation, the power of a test procedure is usually assessed at a specific alternative determined by a specified effect size, ideally set at the design-of-experiment stage so as to determine an appropriate sample size. Fixing P1 is tantamount to specifying the alternative P that is of most interest.
Define the mappings πm : [0, 1] → [0, 1] for m ∈ ℳ according to
πm(α; P1) = EP1 {δm(α)}. (44)
When viewed as a function of P1, πm(α; ·) is the power function of δm when it is allocated a size of α. Of interest to us, though, is to view it as a function of α for the fixed P1. In this case, πm(·; P1) is the receiver operating characteristic (ROC) curve of the mth test or decision process. Assume that, for each m ∈ ℳ, the mapping α ↦ πm(α; P1) is strictly increasing and twice-differentiable, with πm(1; P1) = 1.
Suppose we desire to strongly control the overall FWER or FDR at some pre-specified level q ∈ [0, 1], but at the same time maximize the total (or average) power at P = P1. Our idea is to first obtain the optimal multiple decision size function for weak FWER control associated with Δ, denoted by A*. This is the multiple decision size function A satisfying the condition ∀α ∈ [0, 1] : ∏m∈ℳ[1 − Am(α)] = 1 − α, and such that the total power at P = P1, given by ∑m∈ℳ πm(Am(α); P1), is maximized. We formally present this optimal multiple decision size function in the following theorem.
Theorem 3 Assume that for the multiple decision process Δ = (Δm : m ∈ ℳ), each of the M ROC functions α ↦ πm(α; P1) is strictly increasing, concave, and twice-differentiable (in α), with first and second derivatives π′m(·; P1) and π″m(·; P1), respectively, so that π″m(·; P1) ≤ 0. Then the multiple decision size function A* = (A*m(α) : m ∈ ℳ) maximizing the global power of the multiple decision function δ(α) = (δm[Am(α)] : m ∈ ℳ) under weak FWER-control at level α satisfies the two sets of conditions:
- (i) For some λ ∈ ℜ+ and for each m ∈ ℳ, [1 − A*m(α)] π′m(A*m(α); P1) = λ.
- (ii) ∏m∈ℳ [1 − A*m(α)] = 1 − α.
Equivalently, for fixed α ∈ (0, 1), A*m(α) = zm, m ∈ ℳ, with z = (z1, …, zM)t satisfying the set of M equations
where l(z) = (log(1 − zm), m = 1, 2, …, M)t, I is the M × M identity matrix, J = 11t, and 1 (0) is the M × 1 vector of 1’s (0’s).
Proof The proof of this theorem is straightforward using Lagrangian optimization, which leads to the two conditions (i) and (ii). To obtain the equivalent set of equations, sum the M equations in item (i), use the constraint in item (ii), plug the resulting expression for log(λ) into the equations in item (i), and finally convert into vectors and matrices. The concavity of the ROC functions guarantees that a maximizer is obtained.
Conditions (i) and (ii) in Theorem 3 are analogous to the conditions in Theorem 4.3 in [4], which dealt with the situation where the individual test functions coincide with the Neyman-Pearson most powerful tests. The concavity of the ROC functions is critical and guarantees that these functions lie above the 45-degree line, which implies that each δm(α) is an unbiased test function, that is, it performs better than the test function that does not use the data xm but always rejects Hm0 with probability α. The alternative set of equations in Theorem 3 is conducive to a Newton-Raphson iteration for finding the (approximate) solution z. To perform this iterative approach, it is computationally more stable to perform the iterations on v = (υ1, υ2, …, υM)t with
υm = log[zm/(1 − zm)], m = 1, 2, …, M.
Thus, zm ≡ zm(υm) = exp(υm)/(1 + exp(υm)) for m = 1, 2, …, M. Define the mappings
where the matrices D1(v) and D2(v) are defined, respectively, via
In the above expressions, Dg(w1, w2, …, wM) means the diagonal matrix with elements w1, w2, …, wM. The Newton-Raphson iteration updating for v then proceeds via
Seed values for the υm’s are those associated with the Šidák sizes zm = 1 − (1 − α)^(1/M) for m = 1, 2, …, M. A possible computational bottleneck in this approach is the inversion of the potentially huge matrix H. If this becomes problematic, then one may revert to directly using conditions (i) and (ii) in Theorem 3, as was done in the computational portion of [4].
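As an alternative to the Newton-Raphson scheme, the optimal sizes can also be found by nested bisection; the sketch below assumes condition (i) takes the Lagrangian form [1 − zm]π′m(zm) = λ (our reading of the first-order conditions) and that each derivative π′m is available and decreasing, per the concavity assumption:

```python
import numpy as np

def optimal_sizes(dpi, alpha):
    """Solve (i): (1 - z_m) * dpi[m](z_m) = lambda for every m, together with
    (ii): prod(1 - z_m) = 1 - alpha, by bisecting on lambda.

    dpi: list of M derivative functions pi'_m (each decreasing, by concavity).
    """
    M = len(dpi)

    def z_of_lam(lam):
        # invert the decreasing map z -> (1 - z) * pi'_m(z), one m at a time
        z = np.empty(M)
        for m in range(M):
            lo, hi = 1e-12, 1.0 - 1e-12
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                if (1.0 - mid) * dpi[m](mid) > lam:
                    lo = mid          # map is decreasing: solution lies above mid
                else:
                    hi = mid
            z[m] = 0.5 * (lo + hi)
        return z

    # larger lambda -> smaller sizes z_m -> larger prod(1 - z_m)
    lam_lo, lam_hi = 1e-8, 1e3
    for _ in range(100):
        lam = np.sqrt(lam_lo * lam_hi)    # bisect on the log scale
        z = z_of_lam(lam)
        if np.prod(1.0 - z) < 1.0 - alpha:
            lam_lo = lam                  # sizes too large; increase lambda
        else:
            lam_hi = lam
    return z
```

A convenient sanity check: with identical ROCs the solution reduces to the common Šidák sizes 1 − (1 − α)^(1/M).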
Having determined the optimal multiple decision size function A* associated with Δ, which is at this point optimal only in the sense of weak FWER control, we can then apply Theorem 1 to obtain the MDF δ†(q; Δ, A*) which will strongly control the FWER at q; or apply Theorem 2 to obtain the MDF δ*(q; Δ, A*) which will (strongly) control the FDR at q. By virtue of the choice of the size process A*, which is tied-in to the multiple decision process Δ and the target probability measure P1, we expect that the MDFs δ†(q; Δ, A*) and δ*(q; Δ, A*) will perform better with respect to overall power at P1 relative to, for example, the sequential Šidák MDF or the BH MDF, which we saw from the preceding section are MDFs arising from the Šidák multiple decision size function, a size function that may not be optimal for the chosen multiple decision size process Δ.
7.2 Results of a Modest Simulation Study
To demonstrate the improvement in performance of the proposed procedures, we provide results of a simulation study showing the gain in global power, or the decrease in the missed discovery rate, of the proposed MDF. We focus only on the FDR-controlling procedure in this simulation and compare its performance with the BH procedure. Note that results of a simulation study were presented in [4] demonstrating the improvement over the BH procedure of the MDF δ* in a Gaussian setting. In the current simulation study, we consider a non-Gaussian model.
The model considered in the simulation study has Tm ~ Ga(nm, 1/θm) for m = 1, 2, …, M, so that E[Tm] = nmθm. We assumed that the Tm’s are independent random variables. The multiple decision problem is to decide, for each m, whether Hm0 : θm ≤ θm0 or Hm1 : θm > θm0. Denote by 𝒞(·; k), c(·; k), and 𝒞−1(·; k) the distribution, density, and quantile functions, respectively, of a chi-squared random variable with k degrees-of-freedom. The uniformly most powerful (UMP) decision process for testing Hm0 versus Hm1 using only Tm is given by Δm = (δm(·; α) : α ∈ [0, 1]) where
δm(tm; α) = I{2tm/θm0 ≥ 𝒞−1(1 − α; 2nm)}.
The associated P-value statistic for this test process is
pm(tm) = 1 − 𝒞(2tm/θm0; 2nm),
so the decision function δm(tm; α) could also be expressed via
δm(tm; α) = I{pm(tm) ≤ α}.
For a fixed θm1 exceeding θm0, the ROC function associated with the decision process Δm is easily seen to be
πm(α; P1) = 1 − 𝒞(ξm 𝒞−1(1 − α; 2nm); 2nm),
where ξm = θm0/θm1, which can be viewed as the effect size, though it might be more fitting to call it the reciprocal of the usual effect size. It is then straightforward to verify that
These are the functions needed to implement the Newton-Raphson iterative procedure described in the preceding subsection for finding the optimal size functions for weak FWER control, and consequently for finding α*(q). We implemented this algorithm in an R program using the pchisq, dchisq, and qchisq functions. A computational limitation is encountered when Am(α) becomes extremely small, causing qchisq to return Inf or NaN. When this occurred for a generated sample, the sample was discarded from the simulation. In the simulation runs, this limitation arose in Table 1 only for the case with (n, p) = (30, .50). For this case, 120 samples were discarded prior to obtaining 1000 successful replications.
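For readers preferring Python, the decision function, its p-value statistic, and the ROC function of this gamma example can be sketched with scipy (an illustrative translation of the R implementation, using the fact that 2Tm/θm ~ χ²(2nm)):

```python
from scipy.stats import chi2

def pvalue(t, n, theta0=1.0):
    """P-value of the UMP test of H_m0: theta <= theta0 based on
    T ~ Ga(n, scale=theta): under theta = theta0, 2T/theta0 ~ chi2(2n)."""
    return chi2.sf(2.0 * t / theta0, df=2 * n)

def roc(alpha, n, xi):
    """ROC pi_m(alpha): power of the size-alpha test at effect size
    xi = theta0/theta1 (< 1 under the alternative)."""
    return chi2.sf(xi * chi2.ppf(1.0 - alpha, df=2 * n), df=2 * n)
```

At ξm = 1 the ROC reduces to the 45-degree line π(α) = α, while smaller ξm (stronger alternatives) yield larger power at every α.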
Table 1.
Results of the simulation study for gamma-distributed component data. The effect sizes were generated according to ξm ~ U[.25, .75] for m = 1, 2, …, M. The FDR threshold was set to q = 20%. The theoretical proportion of correct Hm1’s is p. The number of simulation replications for each combination of simulation parameters was 1000. Note that the estimated FDRs and MDRs are in percentages.
| M | nm = n | p | BH | δ*(q) | ||
|---|---|---|---|---|---|---|
| FDR | MDR | FDR | MDR | |||
| 100 | 5 | .2 | 15.93574 | 55.56008 | 16.03789 | 55.24426 |
| 100 | 10 | .2 | 15.78390 | 38.48281 | 14.97451 | 38.12848 |
| 100 | 30 | .2 | 16.09659 | 10.88535 | 14.93114 | 8.026157 |
| 100 | 5 | .3 | 13.42253 | 46.06943 | 13.24371 | 45.90983 |
| 100 | 10 | .3 | 13.76047 | 31.97512 | 13.65171 | 30.47509 |
| 100 | 30 | .3 | 13.80077 | 5.639566 | 13.89979 | 2.626881 |
| 100 | 5 | .5 | 10.37083 | 41.65425 | 10.06751 | 41.40670 |
| 100 | 10 | .5 | 10.11820 | 26.34404 | 10.11692 | 23.57873 |
| 100 | 30 | .5 | 10.03628 | 7.615538 | 9.590189 | 4.151696 |
In the simulation study, we fixed M = 100 and nm = n ∈ {5, 10, 30} for m = 1, 2, …, M. We then specified p ∈ {.2, .3, .5}, the theoretical proportion of correct Hm1’s. The basic simulation experiment goes as follows. We generated B1, B2, …, BM to be IID Ber(p). We then generated ξ1, ξ2, …, ξM to be IID from a uniform distribution on [.25, .75]. [We note that this step is typically outside the basic simulation experiment, but we wanted to explore many effect sizes in the study.] With θm0 = 1, m = 1, 2, …, M, the true scale parameters were computed as θm = (1 − Bm)θm0 + Bmθm0/ξm. We then generated the data Tm ~ Ga(nm, 1/θm) for m = 1, 2, …, M. For the observed data t = (t1, t2, …, tM), and for an FDR threshold of q = .20, we applied the BH procedure and the FDR-controlling procedure δ*(q), and for each procedure we computed the false discovery proportion (FDP) and the missed discovery proportion (MDP). Recall that for an action vector a = (a1, a2, …, aM) ∈ {0, 1}M, the FDP and MDP are the quantities
FDP(a) = ∑m∈ℳ0(P) am / max{∑m∈ℳ am, 1} and MDP(a) = ∑m∈ℳ1(P) (1 − am) / max{|ℳ1(P)|, 1}.
This basic experiment was replicated 1000 times, and the estimated FDR and MDR (missed discovery rate) are the averages of the FDPs and MDPs over these 1000 replications. Table 1 presents the results of this simulation study. From this table we see that both the BH and δ*(q) procedures control their FDRs to be no more than the pre-specified q. We also observe that, for the simulation runs performed, there is a slight improvement in terms of the missed discovery rate of the procedure δ*(q) over the BH procedure, with the improvement increasing as n increases. At the same time, it should be noted that the BH procedure, even though it does not exploit the ROC differences of the individual UMP decision processes, still performs comparably well in these simulation runs, lending credence to its practical appeal and utility owing to its simplicity and ease of implementation! We mentioned above that for the case (n, p) = (30, .50), some samples were discarded due to a computational limitation. We cannot conclude whether this led to some bias in the results. However, even without this case, the improvement of the proposed procedure over the BH procedure is evident based on all the other cases in Table 1.
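One replicate of the basic experiment, with the BH step applied to the UMP test p-values, can be sketched as follows (seed and parameter values are illustrative; the δ*(q) arm requires the optimal-size machinery of Section 7.1 and is omitted here):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
M, n, p, q, theta0 = 100, 10, 0.2, 0.20, 1.0

B = rng.binomial(1, p, size=M)                  # B_m = 1 where H_m1 is true
xi = rng.uniform(0.25, 0.75, size=M)            # effect sizes
theta = np.where(B == 1, theta0 / xi, theta0)   # true scale parameters
T = rng.gamma(shape=n, scale=theta)             # component data
pvals = chi2.sf(2.0 * T / theta0, df=2 * n)     # UMP test p-values

# step-up BH at threshold q
order = np.argsort(pvals)
below = pvals[order] <= q * (np.arange(M) + 1) / M
J = below.nonzero()[0][-1] + 1 if below.any() else 0
a = np.zeros(M, dtype=int)                      # action vector
a[order[:J]] = 1

FDP = a[B == 0].sum() / max(a.sum(), 1)         # false discovery proportion
MDP = (1 - a[B == 1]).sum() / max(B.sum(), 1)   # missed discovery proportion
```

Averaging FDP and MDP over many such replications yields estimates of the FDR and MDR of the kind reported in the BH column of Table 1.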
Finally, we performed one more set of simulation runs with M = 100, n ∈ {5, 10, 30}, p = .2, but with the effect sizes taking only four different values. The effect size vector is ξ = (ξm, m = 1, 2, …, M) with ξm = .1, m = 1, 2, …, 25, ξm = .3, m = 26, 27, …, 50, ξm = .5, m = 51, 52, …, 75, and ξm = .7, m = 76, 77, …, 100. The number of replications is again 1000. The results of these simulation runs are presented in Table 2. From this table we observe a more pronounced advantage of δ*(q) over the BH procedure in terms of their MDRs, especially when n is equal to 30. Both procedures still satisfied the FDR-threshold of q = 20%.
Table 2.
Simulation results for the case where the effect sizes take only 4 distinct values, with M = 100, p = .2, and n ∈ {5, 10, 30}. The number of replications is 1000 and the FDR threshold is q = 20%.
| M | nm = n | p | BH | δ*(q) | ||
|---|---|---|---|---|---|---|
| FDR | MDR | FDR | MDR | |||
| 100 | 5 | .2 | 16.3581 | 39.05525 | 15.79368 | 36.95671 |
| 100 | 10 | .2 | 15.86433 | 26.73459 | 16.05328 | 22.97558 |
| 100 | 30 | .2 | 16.0207 | 9.896752 | 15.76027 | 5.316628 |
Finally, note from the results of these simulation studies that these procedures do not yet exhaust the pre-specified FDR threshold. This indicates that some improvement in global power could still be made, as has been done for the BH procedure by incorporating an adjustment on the threshold using an estimate of the proportion of correct alternative hypotheses. However, we do not explore similar adjustments to our proposed procedures in the current paper. We refer the reader to [18] for more discussion of this α-exhaustion notion.
8 Concluding Remarks
In this paper we provided classes of FWER-controlling and FDR-controlling procedures derived from individual decision processes. The innovative idea is the use of multiple decision size functions, which serve as size-pickers for each of the decision processes. Through these classes of MDFs, optimal size functions may be obtained with respect to a global Type II error rate at a specified alternative hypothesis probability measure P1, thereby potentially identifying MDFs with some optimality properties. Via a simulation study, the FDR-controlling procedure was demonstrated to perform better than the BH procedure for gamma-distributed data. Other issues that may be of interest in future work have not been addressed. For instance, it would be desirable to extend the results to settings where the components of {δm : m ∈ ℳ0(P)} are dependent, as in [32,33]; see also the review paper [34]. Another potential extension is to consider generalized FWER and FDR as in [35]. A possible criticism of the proposed approach is the need to know the ROC functions, which entails the specification of a fixed P1 or, equivalently, effect sizes for each of the M testing problems. However, this is similar to what is typically done even in single hypothesis testing problems, where we focus on a specific alternative hypothesis that is of most interest to detect, such as in the sample size determination problem. In future work we plan to explore the use of asymptotic ROC functions, which arise by letting the sample sizes (the nm’s in the concrete example) and M increase and then considering a contiguous set of alternatives, e.g., Pitman-type alternatives. This may potentially alleviate or lessen the criticism mentioned above. Certainly, another approach to eliminating the need to specify a fixed P1 to obtain the ROC functions is to use a Bayesian approach. This programme was partly carried out in [24], though more work is still needed to clarify and fully understand this approach.
Acknowledgments
We thank Professor Sanat Sarkar for helpful discussions and sincerely thank the reviewers of this work for their sharp and critical comments which were extremely helpful in improving the manuscript. We very much thank Metrika editors, Professor Norbert Henze and Professor Udo Kamps, for providing an outlet for this work.
The authors acknowledge support from National Science Foundation (NSF) Grants DMS 0805809 and DMS 1106435, National Institutes of Health (NIH) Grants P20RR17698, R01CA154731, and P30GM103336-01A1.
Footnotes
Henceforth, decreasing will mean non-increasing, while increasing will mean non-decreasing
Contributor Information
Edsel A. Peña, Department of Statistics, University of South Carolina, Columbia, SC 29208 USA, Tel.: 803-576-5813, Fax: 803-777-4048, pena@stat.sc.edu
Joshua D. Habiger, Department of Statistics, Oklahoma State University, Stillwater, OK 74078, USA, jhabige@okstate.edu
Wensong Wu, Department of Mathematics and Statistics, Florida International University, Miami, FL 33199, USA, wenswu@fiu.edu.
References
- 1.Efron B. Microarrays, empirical Bayes and the two-groups model. Statist. Sci. 2008;23(1):1–22.
- 2.Efron B. Simultaneous inference: When should hypothesis testing problems be combined? The Annals of Applied Statistics. 2008;1:197–223.
- 3.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B. 1995;57(1):289–300.
- 4.Peña EA, Habiger JD, Wu W. Power-enhanced multiple decision functions controlling family-wise error and false discovery rates. Ann. Statist. 2011;39(1):556–583. doi: 10.1214/10-aos844.
- 5.Westfall PH, Krishen A. Optimally weighted, fixed sequence and gatekeeper multiple testing procedures. J. Statist. Plann. Inference. 2001;99(1):25–40.
- 6.Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93(3):509–524.
- 7.Roeder K, Wasserman L. Genome-wide significance levels and weighted hypothesis testing. Statist. Sci. 2009;24(4):398–413. doi: 10.1214/09-STS289.
- 8.Blanchard G, Roquain E. Two simple sufficient conditions for FDR control. Electron. J. Stat. 2008;2:963–992.
- 9.Roquain E, van de Wiel MA. Optimal weighting for false discovery rate control. Electronic Journal of Statistics. 2009;3:678–711.
- 10.Holm S. A simple sequentially rejective multiple test procedure. Scand. J. Statist. 1979;6(2):65–70.
- 11.Goeman JJ, Solari A. The sequential rejection principle of familywise error control. Ann. Statist. 2010;38(6):3782–3810.
- 12.Westfall PH, Krishen A, Young SS. Using prior information to allocate significance levels for multiple endpoints. Statistics in Medicine. 1998;17:2107–2119. doi: 10.1002/(sici)1097-0258(19980930)17:18<2107::aid-sim910>3.0.co;2-w.
- 13.Genovese C, Wasserman L. Bayesian and frequentist multiple testing. In: Bayesian Statistics, 7 (Tenerife, 2002). New York: Oxford Univ. Press; 2003. pp. 145–161. With discussions by Merlise A. Clyde, Christian P. Robert, and Judith Rousseau, and a reply by the authors.
- 14.Storey J. The optimal discovery procedure: a new approach to simultaneous significance testing. Journal of the Royal Statistical Society, Series B. 2007;69:347–368.
- 15.Sun W, Cai T. Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association. 2007;102:901–912.
- 16.Sarkar SK, Zhou T, Ghosh D. A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statist. Sinica. 2008;18(3):925–945.
- 17.Kang G, Ye K, Liu N, Allison D, Gao G. Weighted multiple hypothesis testing procedures. Statistical Applications in Genetics and Molecular Biology. 2009;8:1–21. doi: 10.2202/1544-6115.1437.
- 18.Finner H, Dickhaus T, Roters M. On the false discovery rate and an asymptotically optimal rejection curve. Ann. Statist. 2009;37(2):596–618.
- 19.Habiger JD, Peña EA. Compound p-value statistics for multiple testing procedures. J. Multivariate Anal. 2014;126:153–166. doi: 10.1016/j.jmva.2014.01.007.
- 20.Müller P, Parmigiani G, Robert C, Rousseau J. Optimal sample size for multiple testing: the case of gene expression microarrays. J. Amer. Statist. Assoc. 2004;99(468):990–1001.
- 21.Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. Journal of Statistical Planning and Inference. 2006;136:2144–2162.
- 22.Guindani M, Muller P, Zhang S. A Bayesian discovery procedure. JRSS B. 2009;71:905–925. doi: 10.1111/j.1467-9868.2009.00714.x.
- 23.Bogdan M, Chakrabarti A, Frommlet F, Ghosh JK. Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. Ann. Statist. 2011;39(3):1551–1579.
- 24.Wu W, Peña EA. Bayes multiple decision functions. Electron. J. Stat. 2013;7:1272–1300. doi: 10.1214/13-EJS813.
- 25.Habiger JD, Peña EA. Randomized P-values and nonparametric procedures in multiple testing. J. Nonparametr. Stat. 2011;23(3):583–604. doi: 10.1080/10485252.2010.482154.
- 26.Šidák Z. Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Statist. Assoc. 1967;62:626–633.
- 27.Dudoit S, van der Laan MJ. Multiple Testing Procedures with Applications to Genomics. Springer Series in Statistics. New York: Springer; 2008.
- 28.Blanchard G, Roquain É. Adaptive false discovery rate control under independence and dependence. J. Mach. Learn. Res. 2009;10:2837–2871.
- 29.Peña EA, Habiger JD, Wu W. Supplement to “Power-enhanced multiple decision functions controlling family-wise error and false discovery rates.” 2011. doi: 10.1214/10-aos844.
- 30.Doob JL. Stochastic Processes. New York: John Wiley & Sons Inc.; 1953.
- 31.Hoeffding W. On the distribution of the number of successes in independent trials. Ann. Math. Statist. 1956;27:713–721.
- 32.Sarkar SK, Chang CK. The Simes method for multiple hypothesis testing with positively dependent test statistics. J. Amer. Statist. Assoc. 1997;92(440):1601–1608.
- 33.Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 2001;29(4):1165–1188.
- 34.Sarkar SK. On methods controlling the false discovery rate. Sankhyā. 2008;70(2, Ser. A):135–168.
- 35.Sarkar SK. Stepup procedures controlling generalized FWER and generalized FDR. Ann. Statist. 2007;35(6):2405–2420.

