Abstract
Two general classes of multiple decision functions are described: each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility of finding multiple decision functions that are optimal with respect to a pre-specified Type II error criterion, such as the missed discovery rate (MDR), while still controlling the FWER or FDR Type I error rates. The gain in MDR of the associated FDR-controlling procedure relative to the well-known Benjamini-Hochberg (BH) procedure is demonstrated via a modest simulation study with gamma-distributed component data. Such multiple decision functions have potential applications in multiple testing, particularly in the analysis of high-dimensional data sets.
Keywords: false discovery rate, family-wise error rate, missed discovery rate, multiple decision problem, multiple testing, strong control
1 Introduction and Motivation
Consider the problem of simultaneously testing M pairs of null and alternative hypotheses, (Hm0, Hm1), m = 1, 2, …, M, where M is a known positive integer. Such multiple testing problems arise in many scientific areas; see, for instance, [1,2] for concrete examples. A discovery is said to have been made in the mth testing problem if Hm0 is rejected. It is called a false discovery if a discovery is declared but in reality Hm0 is true. On the other hand, a nondiscovery is made at the mth testing problem if Hm0 is not rejected. It is called a missed discovery if a nondiscovery is declared when in reality Hm0 is false. For the simultaneous testing problem, two commonly-used global Type I error rates are the family-wise error rate (FWER), which is the probability of at least one false discovery, and the false discovery rate (FDR), which is the expectation of the ratio of the number of false discoveries over the number of discoveries.
A conventional testing paradigm employed in these situations is to decide on the collection of statistical tests for the M pairs of hypotheses, e.g., a t-test or a Mann-Whitney-Wilcoxon test for each pair, obtain the p-value for each test, and then use the resulting M p-values in the FWER-controlling sequential Šidák procedure, provided an independence condition is satisfied, or the FDR-controlling procedure in [3]. In this conventional approach, there appears to be no leeway in the choice of the multiple testing procedure the moment the individual test procedures have been chosen. Furthermore, since p-values are probabilities computed under distributions specified by the null hypotheses, it is not apparent whether these p-value-based procedures are actually taking into account the probabilities under the alternative hypotheses distributions. If they do not, then it goes against the Neyman-Pearson dictum that in the search for optimal test procedures, it is germane to consider probabilities under both the null and the alternative hypotheses.
This paper is primarily motivated by these issues. In particular, we pose the following question: If we are given the M test procedures for each of the M pairs of hypotheses, could we obtain classes of multiple testing procedures whose elements either control the FWER or the FDR? If this has an affirmative answer, then a multiple testing procedure within these classes which is optimal with respect to some Type II error rate criterion may be found. In turn, we may then be able to choose the starting collection of test functions that will provide the best multiple testing procedure. This is the spirit of this paper. We will in fact demonstrate that, under certain conditions, when given a collection of test functions for the M pairs of hypotheses, we can generate classes of multiple testing procedures controlling the FWER or the FDR. These results will have important implications in the search for optimal multiple testing procedures that control either of these Type I error rates. The main results in this paper were motivated by those in [4], which did not deal with classes of multiple testing procedures, but instead focused on developing improved FWER- and FDR-controlling procedures from the Neyman-Pearson most powerful tests for each of the M pairs of hypotheses.
We shall investigate these issues in a somewhat general abstract framework. We have opted for the additional abstraction in our mathematical framework in order to make the derivations concise and elegant, and also to cover more general data structures that arise or may arise in such multiple testing problems. Such data structures are typically high-dimensional and of more complicated forms (e.g., the data for the mth testing problem need not be a multi-sample data set but could be spatio-temporal, image, or shape data), such as those arising in neuroscience, genomics, proteomics, etc. However, in order to make the general results more accessible to the reader, we will demonstrate them in concrete and conventional settings. In line with this, we first discuss a concrete situation in Section 2 as a way of introducing issues of interest and to describe in this specific situation the major results of the paper.
There are certainly papers that have tried to improve on the p-value based approaches. One approach was to use weighted p-values, such as in [5], which provides an FWER-controlling procedure, and in [6], which gives a procedure that controls the FDR. A variety of p-value weighting schemes have been proposed, usually relying on a posited model for the p-values and/or on whether control of the FWER or FDR is desired; see, for instance, [7] for a review. Though our proposed testing procedures may be viewed as weighted p-value based, we mention at the outset that our approach is intrinsically different from the weighted p-value approach as expounded in [6] in the manner in which the p-value statistics are weighted. In [6], it is stated in their section 2 that:
Whatever information one uses to construct p-value weights, the weight assignment remains a guess. This guess is to be made a priori, that is before seeing the p-values. For purposes of analysis, we model the weights as random variables that are related to the underlying truth or falsehood of each null hypothesis.
In contrast, in our approach, the p-value weights arise from the alternative hypotheses of most interest. This is akin to determining an appropriate sample size to achieve a desired power for an alternative hypothesis of most interest. Our weights thus take into major consideration the powers of the individual tests at specified alternative hypotheses, hence are not based on a priori assessments of the truth or falsity of each of the null hypotheses.
There are other papers that provide general FWER- and FDR-controlling procedures and which aim for some optimality properties. The paper [8] provides two sufficient conditions for FDR control, while [9] concerns aspects of optimality. The papers [10,11] present sequential rejection procedures which control FWER. Recent papers dealing with multiple decision functions (MDFs) with certain optimality properties are [12,5,13,6,14–19]. On the other hand, papers proposing MDFs with a Bayes or empirical Bayes flavor, are [20,21,1, 22]; and more recently, [23,24].
2 Essence of General Results via Concrete Situation
Prior to embarking on our abstract development we first provide a concrete situation to illuminate the issues and to offer a glimpse of the major results in this paper. Consider the problem of simultaneously testing M pairs of hypotheses Hm0 : θm ≤ θm0 versus Hm1 : θm > θm0 (m ∈ ℳ ≡ {1, 2, …, M}), where the θm0’s are known positive constants. The tests are to be based on data T = (Tm, m ∈ ℳ), where Tm has a gamma distribution with shape parameter nm, a known positive integer, and scale parameter θm, so that the marginal density function of Tm is

fm(tm; nm, θm) = [Γ(nm) θm^nm]^(−1) tm^(nm−1) exp(−tm/θm) I{tm > 0},
where I{·} is the indicator function and Γ(·) is the gamma function. The Sufficiency Principle allows us to simply focus on the Tm’s since if Tm1, Tm2, …, Tmnm are the observable independent and identically distributed (IID) exponential random variables with scale parameter θm, then Tm = ∑^nm_j=1 Tmj is sufficient for θm and has a gamma distribution with parameter vector (nm, θm). Henceforth, we shall denote by 𝒢(·; n, θ) and 𝒢−1(·; n, θ) the distribution and quantile functions, respectively, of a gamma random variable with parameter vector (n, θ). We may also assume that (Tm, m ∈ ℳ0) is an independent collection of random variables, where ℳ0 = {m ∈ ℳ : Hm0 is true}. On the other hand, (Tm, m ∈ ℳ1), where ℳ1 = ℳ \ ℳ0, need not be an independent collection of random variables. It is well-known that, marginally, the αm-size uniformly most powerful (UMP) test for Hm0 : θm ≤ θm0 versus Hm1 : θm > θm0 based on Tm is given by

δm(tm; αm) = I{tm > 𝒢−1(1 − αm; nm, θm0)}.
Recall that a (non-randomized) test function δ is a {0, 1}-valued statistic such that δ = 0(1) means that Hm0 is not rejected (rejected). The collection δ = (δm, m ∈ ℳ), which is a mapping from the sample space of the (Tm, m ∈ ℳ) into {0, 1}M, is a specific case of a multiple decision or test function.
Since we are now performing M simultaneous tests, global measures of Type I error are required. The first such global measure of Type I error is the family-wise error rate (FWER). For the multiple testing function δ ≡ {δm(·; αm), m ∈ ℳ}, its FWER is
where P is the true probability measure governing the (Tm, m ∈ ℳ). Another global measure of Type I error in the simultaneous testing setting is the false discovery rate (FDR), which for the multiple testing function δ is defined via
We emphasize that the true probability measure P in the probabilistic evaluation in the FWER or the expectation evaluation in the FDR is not known. Furthermore, note that the set of indices ℳ0 also depends on this unknown P. Without knowing this P, if one is able to establish that either the FWER or the FDR is no more than a fixed specified level q ∈ [0, 1] by proper choice of the individual sizes αm’s, or perhaps by proper construction of a multiple testing procedure δ which may be of much more complicated form than the (δm (·; αm), m ∈ ℳ), e.g., each component δm may depend on all the Tm’s or it could have a sequential flavor, then strong control of either of these global Type I error measures is achieved.
There are existing multiple test functions that control either the FWER or the FDR which are anchored on the p-value statistics obtained from the individual test functions δm’s. For controlling the FWER we may, for instance, utilize the sequential Šidák procedure (see, also, [10]); while for FDR control a popular multiple test function is the Benjamini and Hochberg [3] (BH) procedure. In the context of the concrete situation described in this section, the main essence and contribution of this paper is the demonstration that, starting from the individual UMP tests δm’s, one could construct a class of multiple test functions whose members control the FWER and with this class including as a special case the Šidák procedure; as well as a class of multiple test functions whose members control the FDR and with the BH procedure being in fact a member of this class. We now describe specific subsets of these classes of multiple test functions. Let

Γ ≡ {γ = (γm, m ∈ ℳ) ∈ (0, 1]^M : ∑m∈ℳ γm ≤ 1}.
Let γ = (γm, m ∈ ℳ) ∈ Γ. For each m ∈ ℳ, define the function Am : [0, 1] → [0, 1] via
Am(α; γm) = 1 − (1 − α)^γm, α ∈ [0, 1]. (1)
Suppose that it is desired to control the FWER at level q ∈ (0, 1). For this purpose, define the statistic
where δm(tm; α−) ≡ limβ↑α δm(tm; β). Then, Theorem 1 on page 16 guarantees us that the multiple test function given by
(2)
in fact controls the FWER at q for any fixed γ ∈ Γ.
On the other hand, suppose that it is desired to control the FDR at level q. If we define the statistic
then Theorem 2 on page 16 assures us that the multiple test function
(3)
controls the FDR at level q for γ ∈ Γ satisfying additional conditions to be stated later. Observe that both δ† (T; q, γ) and δ*(T; q, γ) are so-called compound multiple test functions since their mth component functions utilize the whole data vector T, not just Tm.
If we take γm = 1/M for each m ∈ ℳ, then the multiple test function δ†(T; q, γ = 1/M) is the sequential Šidák procedure, while the multiple test function δ*(T; q, γ = 1/M) is the BH procedure. But since varying the vector γ = (γm, m ∈ ℳ) yields many FWER- or FDR-controlling multiple test functions, there is now the possibility that, by considering the different power characteristics of the individual test functions δm’s, possibly owing to differences in the values of the θm0’s, the nm’s, and/or the effect sizes we would like to detect, we may choose the γm’s so as to improve the global power of the resulting multiple test functions δ†(T; q, γ) or δ*(T; q, γ). FWER-control or FDR-control for such a choice of γm’s is then guaranteed by virtue of the results concerning the class of multiple test functions. In a nutshell, this concrete setting provides a glimpse of the essence of this paper. It is also used in the simulation study presented in subsection 7.2.
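To make the weighted construction concrete, the following minimal Python sketch works with size functions of the assumed form Am(α; γm) = 1 − (1 − α)^γm (the family discussed further in Section 3.3); the particular weights are hypothetical. It checks numerically that the per-test sizes combine multiplicatively to the global level when ∑ γm = 1, and that equal weights γm = 1/M recover the Šidák per-test size.

```python
import math

def A(alpha, gamma):
    # assumed size function A_m(alpha; gamma_m) = 1 - (1 - alpha)^gamma_m
    return 1.0 - (1.0 - alpha) ** gamma

M = 4
# hypothetical unequal weights summing to one, e.g. favoring tests
# whose anticipated effect sizes are larger
gammas = [0.4, 0.3, 0.2, 0.1]
alpha = 0.05
sizes = [A(alpha, g) for g in gammas]

# weak-FWER identity: prod_m (1 - A_m(alpha)) = 1 - alpha when sum(gammas) = 1
lhs = math.prod(1.0 - s for s in sizes)
assert abs(lhs - (1.0 - alpha)) < 1e-12

# equal weights gamma_m = 1/M recover the Sidak per-test size 1 - (1-alpha)^(1/M)
sidak = 1.0 - (1.0 - alpha) ** (1.0 / M)
assert abs(A(alpha, 1.0 / M) - sidak) < 1e-12
```

The design point of the family is visible here: unequal γm’s reallocate the fixed global size budget across the component tests without disturbing the product constraint that yields weak FWER control.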
Since the problem of multiple testing is truly a general problem with potentially complicated data structures, we have opted to treat this more abstractly. Undeniably, it makes the paper denser and harder to comprehend; however, through the abstraction, the results obtained and the proofs presented are more general in scope.
3 Mathematical Underpinnings
3.1 Decision-Theoretic Elements
Let (𝒳, ℱ, 𝒫) be a statistical model, so (𝒳, ℱ) is a measurable space and 𝒫 is a collection of probability measures on (𝒳, ℱ). Though not needed in the abstract development, we may adopt for concreteness the interpretation that 𝒳 is the range space of an observable random entity X arising from an experiment. We shall denote by P ∈ 𝒫 the true underlying, but unknown, probability measure on (𝒳, ℱ). All probability statements and expectations will therefore be with respect to P, while a statistical hypothesis will be a proposition regarding P. In the sequel, for a space 𝒮, σ(𝒮) will denote an appropriate sigma-field of subsets of 𝒮. In decision problems with action space 𝔄 = {0, 1}, such as in hypothesis testing, a nonrandomized decision function is a δ : (𝒳, ℱ) → (𝔄, σ(𝔄)). In hypothesis testing, given X = x ∈ 𝒳, a decision δ(x) = 0 corresponds to deciding in favor of the null hypothesis (H0), whereas δ(x) = 1, a so-called discovery, corresponds to deciding in favor of the alternative hypothesis (H1).
It suffices to restrict ourselves to nonrandomized decision functions since, through the use of an auxiliary randomizer, usually a standard uniform variable U independent of X, we may always convert a randomized decision function δ* : (𝒳, ℱ) → ([0, 1], σ[0, 1]) into a nonrandomized decision function δ : (𝒳 × [0, 1], ℱ ⊗ σ[0, 1]) → (𝔄, σ(𝔄)) via δ(x, u) = I {u ≤ δ*(x)}. Thus, in our general formulation, the sample space 𝒳 may actually represent a product space between a data space and [0, 1]. This framework is appropriate, for instance, when dealing with discrete data or when using nonparametric rank-based decision functions. For more discussions on this matter, see [4, 25].
Decision or test functions depend on a size parameter α ∈ [0, 1] as in the concrete situation in Section 2. To further demonstrate this idea, consider the problem of testing the null hypothesis H0 : μ = 0 versus the alternative hypothesis H1 : μ ≠ 0 based on a random observable X ~ N (μ, 1), where N (a, b2) is a normal distribution with mean a and variance b2. Note that it suffices to consider this one-dimensional X by virtue of a Sufficiency reduction. The size-α test δ : 𝒳 ≡ ℜ → {0, 1} has δ(x; α) = I{|x| > Φ−1(1−α/2)}, where Φ−1(·) is the quantile function of the standard normal distribution. Henceforth, to simplify our notation, we adopt a functional notation where δ(α) represents the statistic defined on 𝒳 according to x ↦ δ(x; α). When we view this as a stochastic process in α, we obtain the notion of a (nonrandomized) decision process as introduced in [4], which is a stochastic process Δ = {δ(α) : α ∈ [0, 1]} where, ∀α ∈ [0, 1], δ(α) is a decision function, and such that the following conditions are satisfied.
-
(D1)
δ(0) = 0 and δ(1) = 1 a.e.-P.
-
(D2)
The sample paths α ↦ δ(α) are, a.e.-P, {0, 1}-valued step-functions which are nondecreasing and right-continuous.
3.2 Multiple Decision Functions
Let ℳ be a known finite set with |ℳ| = M. An ℳ-indexed multiple decision problem is one whose action space is 𝔄M. In the context of multiple hypotheses testing, for each m ∈ ℳ, there is a pair of hypotheses Hm0 and Hm1. Of interest is to simultaneously decide between Hm0 and Hm1 for each m ∈ ℳ. A multiple decision function (MDF) is a δ = (δm : m ∈ ℳ), where δm is a decision function. Thus, δ : (𝒳, ℱ) → (𝔄M, σ(𝔄M)). A multiple decision process (MDP) is a Δ = (Δm : m ∈ ℳ), where Δm = {δm(α) : α ∈ [0, 1]} is a decision process.
Let ℳ0 ≡ ℳ0(P) and ℳ1 ≡ ℳ1(P) be subsets of ℳ such that

ℳ0(P) ∪ ℳ1(P) = ℳ and ℳ0(P) ∩ ℳ1(P) = ∅.
We shall assume that the following condition holds.
-
(D3)
{Δm : m ∈ ℳ0(P)} and {Δm : m ∈ ℳ1(P)} are independent of each other and the elements of {Δm : m ∈ ℳ0(P)} are independent.
In multiple hypotheses testing, Hm0 is true under P if and only if m ∈ ℳ0(P). Observe that the elements of {Δm : m ∈ ℳ1(P)} need not be independent. In many cases, such as in the illustration in Section 2, the Δm test process may just be a function of Xm which is specific for the mth decision problem. Condition (D3) will then be satisfied if {Xm, m ∈ ℳ0(P)} is an independent collection. However, there are cases where the M test processes may be using the same data X, and in such a case we require that the independence condition in (D3) should hold.
In addition, we shall also assume the condition that
-
(D4)
∀m ∈ ℳ0(P), ∀α ∈ [0, 1]: EP {δm(α)} = α.

The collection of all ℳ-indexed multiple decision processes satisfying conditions (D1)–(D4) will be denoted by 𝔇. We remark that the requirement of equality in (D4), given by EP {δm(α)} = α, will be fulfilled in many situations since an auxiliary randomizer is incorporated in our framework. However, there may still be situations, when dealing with non-regular families of distributions (e.g., the uniform family of distributions), where this condition is not satisfied. The latter manifests itself when the decision function achieves power one while its size is not yet equal to one.
3.3 Multiple Decision Size Functions
Let A = (Am : m ∈ ℳ) be an ℳ-indexed collection of measurable functions with Am : ([0, 1], σ[0, 1]) → ([0, 1], σ[0, 1]). We shall say that A is a multiple decision size function if it satisfies the following three conditions:
-
(A1)
∀m ∈ ℳ: Am(0) = 0 and Am(1) = 1.
-
(A2)
∀m ∈ ℳ: α ↦ Am(α) is strictly increasing and continuous.
-
(A3)
∀α ∈ [0, 1] : Πm∈ℳ[1 − Am(α)] ≥ 1 − α.
The idea behind the introduction of these size functions is that they serve as ‘size-pickers’ for each of the individual test functions when a global Type I error level α is specified. Thus, given an α, the multiple decision function that will be of interest is (δm[Am(α)], m ∈ ℳ). Conditions (A1) and (A2) are intuitive requirements for size-pickers. Condition (A3), on the other hand, guarantees that the multiple decision function (δm[Am(α)], m ∈ ℳ) achieves a weak FWER of no more than α. The weak FWER is the FWER under a P for which all the Hm0’s are true. We denote the collection of all ℳ-indexed multiple decision size functions by 𝔖.
When we deal with FDR-controlling multiple decision functions, we need an additional condition that controls the interplay among the multiple decision process Δ, the multiple size function A, and the underlying probability measure P. This condition is as follows:
-
(C) The multiple decision process Δ = {Δm : m ∈ ℳ}, multiple size function A = {Am : m ∈ ℳ}, and underlying probability measure P satisfy, whenever |ℳ0(P)| < M,
with
Note that α(1) is a random variable; it is the minimum of the so-called generalized p-value statistics which are formally defined in (12). This statistic α(1) could be interpreted as the smallest global error rate that leads to the rejection of at least one Hm0 when using the multiple decision process Δ and multiple size function A. Verifying the condition in (C) is not a trivial matter because of the randomness of α(1), whose distribution depends on the underlying unknown probability measure P. A stronger, but perhaps easier to verify, condition which implies the condition in (C) is that
(4)
As mentioned above, condition (C) depends on the true probability measure P. Since P is not known, we do not know ℳ0(P). However, we may have an idea, possibly justified by sparsity of signals considerations, that the ratio M0/M, where M0 = |ℳ0(P)|, is no more than some value B with 0 < B < 1. Condition (C) is then implied by
(5)
where Ā(α) = ∑m∈ℳ Am (α)/M. Visually, this condition implies that each of the size functions must be inside the upper envelope U(α) determined by
for α ∈ [α(1), 1]. In essence, none of the size functions should dominate the other size functions, which is akin to Noether’s condition when dealing with the asymptotics of rank-based statistics.
A particular element of 𝔖 is the Šidák multiple decision size function (cf., [26]) with
Am(α) = 1 − (1 − α)^(1/M), m ∈ ℳ. (6)
This Šidák size function will play a central role in the proof of Theorem 2 which deals with multiple decision functions controlling the false discovery rate. Note, however, that the Bonferroni size function with
Am(α) = α/M, m ∈ ℳ, (7)
does not belong to 𝔖 since Am(1) = 1/M < 1, hence condition (A1) is not satisfied. That the Bonferroni size function fails even condition (A1) attests to its conservativeness, to the extent that it cannot be considered among multiple decision size functions that could potentially lead to optimal multiple decision functions. Indeed, in the extreme case where we set the FWER level to α = 1, any multiple decision function whose components are all identically equal to one satisfies this FWER control, so we could set Am(1) = 1 for all m ∈ ℳ.
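The contrast between the two size functions can be verified numerically. This short sketch checks that the Šidák size function satisfies (A1) and attains equality in (A3), while the Bonferroni size function fails (A1) at α = 1; M = 10 is an arbitrary illustrative choice.

```python
M = 10

def sidak(a):
    # Sidak size function: A(alpha) = 1 - (1 - alpha)^(1/M)
    return 1.0 - (1.0 - a) ** (1.0 / M)

def bonf(a):
    # Bonferroni size function: A(alpha) = alpha / M
    return a / M

# (A1): both vanish at alpha = 0, but only Sidak reaches 1 at alpha = 1
assert sidak(0.0) == 0.0 and sidak(1.0) == 1.0
assert bonf(1.0) == 1.0 / M < 1.0          # Bonferroni violates (A1)

# (A3) holds with equality for Sidak: prod_m (1 - A(alpha)) = (1-A)^M = 1 - alpha
for a in (0.01, 0.05, 0.5, 0.99):
    assert abs((1.0 - sidak(a)) ** M - (1.0 - a)) < 1e-12
```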
The Šidák multiple decision size function does not depend on m. We provide examples of multiple size functions that depend on m. The first one is that considered in Section 2 where we have, for γ = (γ1, γ2, …, γM) ∈ Γ,
We remark that the γm’s may also depend on α, as will be the case in a later concrete example. The collection {Am(α; γm) : m ∈ ℳ} clearly satisfies conditions (A1) and (A2), and if ∑m∈ℳ γm ≤ 1, then condition (A3) is also satisfied. Equality in condition (A3) is actually achieved by having ∑m∈ℳ γm = 1. Condition (C) in this special case becomes:
Observe that if all Hm0’s are false, so that ℳ0 = ∅, then the condition above is automatically satisfied; while if all the Hm0’s are true, so that ℳ0 = ℳ, then we are forced to have γm = 1/M, m = 1, 2, …, M, which leads to the Šidák multiple size functions. In practice, we will have M0/M bounded above by some number B ∈ (0, 1), and in this case a sufficient condition for the above condition to hold is
If the γm’s do not depend on α, then for this class of multiple size functions the sufficient condition in (4), via L’Hôpital’s Rule and focusing on the limit as α ↓ 0, is implied by the condition that , where . To see this, define the function g on [0, 1] by
The assertion above follows if we could show that g(α) is a decreasing function of α. This is the same as showing that
is decreasing in α. But this is immediate if we could show that, for every m = 1, 2, …, M,

h(α) ≡ [1 − (1 − α)^γ(M)] / [1 − (1 − α)^γ(m)]
is a decreasing function of α ∈ [0, 1]. To show that this is decreasing in α, observe that for 0 ≤ γ(m) < γ(M) ≤ 1, as α decreases from 1, (1 − α)^γ(M) increases more slowly than (1 − α)^γ(m), both being zero at α = 1. Consequently, as α decreases from 1, 1 − (1 − α)^γ(M) decreases more slowly than 1 − (1 − α)^γ(m), and since both lie in [0, 1], it follows that h(α) increases as α decreases from 1, that is, h(α) is a decreasing function of α. A more formal way of showing that h(α) is decreasing in α is to show that its derivative is negative. We leave this alternative proof as a leisurely exercise for the curious reader.
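The monotonicity argument can be checked numerically. The sketch below assumes, as the surrounding argument suggests, that h(α) is the ratio [1 − (1 − α)^γ(M)] / [1 − (1 − α)^γ(m)] for a pair of ordered weights; the particular values γ(m) = 0.2 and γ(M) = 0.8 are illustrative.

```python
g_m, g_M = 0.2, 0.8   # illustrative ordered weights, gamma_(m) < gamma_(M)

def h(a):
    # assumed ratio h(alpha) = [1 - (1-alpha)^gamma_(M)] / [1 - (1-alpha)^gamma_(m)]
    return (1.0 - (1.0 - a) ** g_M) / (1.0 - (1.0 - a) ** g_m)

# evaluate on a fine grid of alpha in (0, 1) and confirm h is nonincreasing
xs = [i / 1000 for i in range(1, 1000)]
vals = [h(a) for a in xs]
assert all(v0 >= v1 - 1e-12 for v0, v1 in zip(vals, vals[1:]))
```

Consistent with the L'Hôpital limit mentioned above, h(α) starts near γ(M)/γ(m) as α ↓ 0 and decreases toward 1 as α ↑ 1.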
The next example of a multiple decision size function is derived through an optimization of a global power of the multiple decision function. Let Xm ~ N(μm, 1) and consider testing Hm0 : μm ≤ 0 versus Hm1 : μm > 0. The usual test function of size ηm for Hm0 versus Hm1 is
Let {γm, m ∈ ℳ} be a collection of known positive reals, which could be viewed as the effect sizes of interest. The power of δm at μm = γm is given by
The weak family wise error rate of the associated multiple decision function is
where P0 is the probability measure associated with μm = 0 for all m ∈ ℳ. Suppose we seek to control this weak FWER at an overall level α ∈ [0, 1], so that condition (A3) is necessarily satisfied, but at the same time maximize the overall power of the multiple decision at μm = γm, m = 1, 2, …, M, given by
Then, (cf., [4]) the multiple size function {Am(α), m ∈ ℳ} should satisfy the two conditions: (i) for some λ ∈ ℜ,
where ϕ(·) is the standard normal density function, and (ii) ∑m∈ℳ log[1 − Am (α)] = log(1 − α). If the effect sizes γm’s are not all identical, then these size functions are not identical, that is, they will depend on m. Note that some constraints will need to be imposed on the γm’s in order for the resulting size functions {Am(·) : m ∈ ℳ} to satisfy condition (C).
Let us actually implement the optimality prescription above for a somewhat more restrictive family of multiple decision size functions given by the form
Am(α) = 1 − (1 − α)^κm, m = 1, 2, …, M, (8)
with κm ∈ [0, 1]. Utilizing the Gaussian setting of the preceding example, we therefore seek the optimal κm’s in order to control the weak FWER at level α and at the same time maximize the overall power. We will immediately impose the condition that the Am(α)’s satisfy ∏m∈ℳ [1 − Am(α)] = 1 − α. Since 1 − Am(α) = (1 − α)^κm, m = 1, 2, …, M, it then follows that ∑m∈ℳ κm = 1. Within this family of multiple size functions, the condition for optimality is that, for some λ > 0, we must have, for every m = 1, 2, …, M, that
(9)
Letting bm ≡ bm(α) = bm(α; κm) = Φ−1((1 − α)^κm), m = 1, 2, …, M, the equation in (9) simplifies to
(10)
Observe that the condition ∑m∈ℳ κm = 1 is equivalent to ∏m∈ℳ Φ(bm) = 1 − α. Combining, we therefore obtain an expression for λ given by
(11)
Substituting into (10), we get the set of equations determining the optimal bm’s to be, for m = 1, 2, …, M,
We summarize these results into a proposition.
Proposition 1 For the family of multiple decision size functions of the form Am(α) = 1 − (1 − α)^κm, m = 1, 2, …, M, the optimal κm’s for weak FWER control at α for the Gaussian model are given by κ*m(α) = log Φ(b*m(α)) / log(1 − α), m = 1, 2, …, M,
where the b*m’s solve in b = (b1, b2, …, bM)t the set of equations given by
where Dg(a) represents the diagonal matrix with diagonal elements from vector a, 1 is a M × 1 vector of 1’s, γ = (γ1, γ2, …, γM)t, , and
Clearly, computing the optimal b*m’s, and hence the κ*m’s and the optimal size functions, requires numerical methods. Furthermore, we point out that the optimal κ*m’s depend on α. There thus arises a dynamic adjustment of the optimal sizes to use for each of the decision functions in accordance with the weak FWER level α being used. We implemented the computation of the above functions using a Newton-Raphson procedure. For demonstration we took M = 5 and specified the effect sizes γm, m = 1, 2, …, 5, by taking the absolute values of five generated random variates from a standard normal distribution. The resulting effect sizes were γ = (1.08, 1.43, 0.19, 1.73, 0.10). Figure 1 presents the functions bm(α), m = 1, 2, …, 5, in the first plot frame, the functions Am(α) in the second plot frame, and the functions κm(α) in the third plot frame, for α ranging from .001 to .999 in increments of .001. Observe that the size functions intersect, owing to the dependence of the optimal κm’s on the value of α, a manifestation of the dynamic adjustment alluded to above. We point out that the second plot frame actually provides the optimal size functions without restricting to the Šidák-type size functions. See Theorem 3 for the general set of equations determining the optimal size functions, of which the results above are a special case.
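The optimization can be illustrated with a simple stand-in for the Newton-Raphson scheme. The sketch below is hedged in several ways: it takes M = 2, uses hypothetical effect sizes (0.5, 2.0), measures overall power by the sum of the marginal powers at those effect sizes (one plausible criterion; the paper's exact criterion may differ), and finds the optimal κ1 by crude grid search under the constraint κ1 + κ2 = 1.

```python
from statistics import NormalDist

N = NormalDist()          # standard normal: N.cdf is Phi, N.inv_cdf is Phi^{-1}
alpha = 0.05
gammas = (0.5, 2.0)       # hypothetical effect sizes of interest

def total_power(k1):
    # sizes A_m(alpha) = 1 - (1 - alpha)^kappa_m with kappa = (k1, 1 - k1);
    # the size-eta one-sided normal test has power 1 - Phi(b - gamma) at
    # mu = gamma, where b = Phi^{-1}(1 - eta) = Phi^{-1}((1 - alpha)^kappa)
    total = 0.0
    for k, g in zip((k1, 1.0 - k1), gammas):
        b = N.inv_cdf((1.0 - alpha) ** k)
        total += 1.0 - N.cdf(b - g)
    return total

# crude grid search over kappa_1 in place of Newton-Raphson
k_star = max((i / 1000 for i in range(1, 1000)), key=total_power)

# under this sum-of-powers criterion the optimum funnels most of the size
# budget to the large-effect test, so kappa_1 (the small-effect share) < 1/2
assert 0.0 < k_star < 0.5
```

The intersecting size functions seen in Figure 1 reflect the same phenomenon: the optimal allocation of the size budget shifts with the global level α.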
Fig. 1.
Demonstration of the optimal size functions in the restricted family of size functions for effect sizes γ = (1.08, 1.43, 0.19, 1.73, 0.10) with M = 5. The first plot frame contains the plots of the α ↦ bm(α), functions, the second plot frame is for the α ↦ Am(α) = 1 − (1 − α)κm(α) functions, and the third plot frame is for the α ↦ κm(α) functions.
We also recall the notion of generalized p-value statistics; see [4]. Given a Δ ∈ 𝔇 and an A ∈ 𝔖, for m ∈ ℳ we define the random variable
αm ≡ αm(Δ, A) = inf{α ∈ [0, 1] : δm[Am(α)] = 1}. (12)
The collection (αm(Δ, A) : m ∈ ℳ) is called the vector of generalized p-value statistics associated with the pair (Δ, A). Observe that the usual p-value statistic associated with δm is Pm = Am(αm), hence the use of the adjective generalized for the αm’s. We shall assume that these generalized p-values are a.e. [P] distinct. In the multiple testing literature there is also the notion of adjusted p-value statistics; see, for instance, pages 32–34 of [27]. The adjusted p-value associated with the mth null hypothesis Hm0 is defined to be the smallest global error rate (e.g., FWER, FDR) at which Hm0 is still rejected. Examining the definition of the generalized p-value statistics, we note that αm(Δ, A) could also be viewed as the smallest global error rate α at which the mth test function δm[Am(α)] rejects Hm0. In some sense, therefore, the adjusted p-value statistics and the generalized p-value statistics are related in that both are global error rates leading to the rejection of null hypotheses. Our notion of a generalized p-value statistic is perhaps more general than that of an adjusted p-value statistic since we allow more general multiple size functions as pickers of the specific test functions to use at each component of the multiple decision problem.
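Since Pm = Am(αm), the generalized p-value can be obtained from the ordinary p-value by inverting the size function. The sketch below assumes size functions of the form Am(α; γm) = 1 − (1 − α)^γm, so that αm = 1 − (1 − pm)^(1/γm); the p-values used are hypothetical.

```python
# hypothetical ordinary p-values for five component tests
pvals = [0.001, 0.01, 0.04, 0.20, 0.60]
M = len(pvals)
gammas = [1.0 / M] * M     # equal weights recover the Sidak size function

def gen_pvalue(p, gamma):
    # invert P_m = A_m(alpha_m) with A_m(a) = 1 - (1 - a)^gamma:
    # alpha_m = 1 - (1 - p_m)^(1/gamma)
    return 1.0 - (1.0 - p) ** (1.0 / gamma)

alphas = [gen_pvalue(p, g) for p, g in zip(pvals, gammas)]

# the map is monotone, so generalized p-values preserve the p-value ordering
assert alphas == sorted(alphas)

# each alpha_m is the smallest global level at which delta_m[A_m(alpha)] rejects:
# applying A_m recovers the ordinary p-value
for a, p, g in zip(alphas, pvals, gammas):
    assert abs((1.0 - (1.0 - a) ** g) - p) < 1e-12
```

With unequal γm’s the transformation stretches or compresses each pm differently, which is precisely how the weighting enters the compound procedures.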
4 Main Theorems and Classes of MDFs
We shall present in this section the two main results that will enable the construction of the classes of multiple decision functions controlling FWER and FDR. Recall that P denotes the true, but unknown, underlying probability measure acting on the space (𝒳, ℱ).
Given a Δ = {Δm : m ∈ ℳ} ∈ 𝔇 and an A = {Am : m ∈ ℳ} ∈ 𝔖, define the stochastic processes S0 = {S0(α) : α ∈ [0, 1]}, S = {S(α) : α ∈ [0, 1]}, and F = {F(α) : α ∈ [0, 1]} via
S0(α) ≡ S0(α; Δ, A) = ∑m∈ℳ0(P) δm[Am(α)], (13)

S(α) ≡ S(α; Δ, A) = ∑m∈ℳ δm[Am(α)], (14)

F(α) ≡ F(α; Δ, A) = S0(α)/S(α), (15)
with the convention that 0/0 = 0. These quantities have the following interpretations. Given an α ∈ [0, 1], for each m ∈ ℳ, the decision function whose size is Am(α) is chosen from Δm, and the MDF δ(α) ≡ (δm[Am(α)] : m ∈ ℳ) is employed in the decision-making. For this MDF δ(α), S0(α) is the number of false discoveries, S(α) is the number of discoveries, and F(α) is the proportion of false discoveries among all discoveries. Observe, however, that since P is unknown, both S0 and F are unobservable, whereas S is observable.
For q ∈ [0, 1], let us also define the random variables
(16)
and
(17)
In essence, α†(q) is a first crossing-time random variable, whereas α*(q) is a last crossing-time random variable. The forms of these two random variables were motivated and justified in Sections 6 and 7 in [4] for a specific multiple decision size function, but the justifications in that paper carry over to the more general setting considered here.
The two main results of this paper are contained in Theorem 1 and Theorem 2. We present these theorems, but defer their proofs to Section 5 after some discussion about their implications and potential usefulness.
Theorem 1 Under conditions (D1)–(D4) for 𝔇 and (A1)–(A3) for 𝔖, for every q ∈ [0, 1], Δ ∈ 𝔇, and A ∈ 𝔖,

EP {I{S0(α†(q; Δ, A); Δ, A) ≥ 1}} ≤ q.
Observe that EP {I{S0(α†(q; Δ, A); Δ, A) ≥ 1}} is the FWER since it is the probability of committing at least one false discovery under P. Thus, Theorem 1 shows that for any q ∈ [0, 1], any multiple decision process Δ ∈ 𝔇, and any multiple decision size function A ∈ 𝔖, the MDF defined via
δ†(q) ≡ δ†(q; Δ, A) = (δm[Am(α†(q; Δ, A))] : m ∈ ℳ) (18)
strongly controls the FWER at q.
Theorem 2 Under conditions (D1)–(D4) for 𝔇 and (A1)–(A3) for 𝔖, for every Δ ∈ 𝔇 and every A ∈ 𝔖 satisfying condition (C), and for every q ∈ [0, 1] and P ∈ 𝒫,
EP {F(α*(q; Δ, A); Δ, A)} ≤ q.
Note that EP {F(α*(q; Δ, A); Δ, A)} is the false discovery rate (FDR) as defined in the seminal paper of [3]. The implication of Theorem 2 is that if, for each q ∈ [0, 1], and for any multiple decision process Δ ∈ 𝔇 and multiple decision size function A ∈ 𝔖 satisfying (C), we define the MDF
δ*(q) ≡ δ*(q; Δ, A) = (δm[Am(α*(q; Δ, A))] : m ∈ ℳ), (19)
then δ*(q) is an MDF that (strongly) controls the FDR at q.
The importance of the preceding results is that each multiple decision process Δ ∈ 𝔇 may have an associated multiple decision size process A ≡ A(Δ) ∈ 𝔖 such that the resulting multiple decision functions δ†(q) or δ*(q) possess some optimality property, for example, with respect to the missed discovery rate, a Type II error rate. To define this rate, let
M(α) ≡ M(α; Δ, A, P) = ∑m∈ℳ1(P) (1 − δm[Am(α)]) / |ℳ1(P)|. (20)
The quantity M(α) has the interpretation of being the proportion of missed discoveries relative to the number of correct alternative hypotheses. Then, the missed discovery rate (MDR) of the MDF in (19) is
MDR(δ*(q)) = EP {M(α*(q; Δ, A); Δ, A)}.
For the given Δ, with a proper choice of A, we may be able to find an MDF that strongly controls the FWER or the FDR and that also possesses an optimality property with respect to another criterion, such as having a small MDR. This idea was implemented in a more restricted setting in [4], where each pair of hypotheses consisted of a simple null and a simple alternative. We point out that previous works have usually focused on developing a particular MDF and then verifying that it controls the FWER or the FDR, as in, for example, [3] (more comprehensively, see [27]), with notable exceptions being the papers [6,8,7,28,9,18]. It is our hope that by providing a class of MDFs where each member strongly controls the FWER, given by
{δ†(q; Δ, A) : Δ ∈ 𝔇, A ∈ 𝔖}, (21)
or a class of MDFs where each member controls the FDR, given by
{δ*(q; Δ, A) : Δ ∈ 𝔇, A ∈ 𝔖 satisfying condition (C)}, (22)
we acquire the possibility of selecting from these classes MDFs that possess other desirable properties with respect to some suitable Type II error rate, such as the MDR. In Section 7 we provide further discussion of this optimality issue and present efficiency comparisons to demonstrate the viability of our idea.
5 Proofs of Theorems
The proofs of the two theorems are analogous to those of Theorem 6.1 and Theorem 7.1 in [4], which can be found in the supplemental article [29]. Note that those proofs were for special forms of the multiple decision process and multiple decision size function, whereas in the current paper we deal with an arbitrary element Δ ∈ 𝔇 and an arbitrary element A ∈ 𝔖. In the proofs below, we assume that Δ ∈ 𝔇 and A ∈ 𝔖 have been chosen and are fixed. Also, q ∈ [0, 1], and recall that P ∈ 𝒫 denotes the true but unknown underlying probability measure. The dependence of some of the relevant processes and quantities below on (Δ, A, P) will not be written explicitly, for brevity, unless needed for clarity.
5.1 Establishing Theorem 1
Proof We start by defining the stochastic process H1 = {H1(α) : α ∈ [0, 1]} via
H1(α) = ∏m∈ℳ [1 − Am(α)]^(1 − δm(Am(α)−)). (23)
The sample paths of this process are, a.e. [P], left-continuous with right-hand limits (caglad) and piecewise nonincreasing, with
1 − α ≤ H1(α) ≤ 1
for every α ∈ (0, 1), where the first inequality is due to property (A3). In fact, by virtue of property (A1) and property (D1), note that
Now, in terms of H1, we have that
α†(q) = inf{α ∈ [0, 1] : H1(α) < 1 − q}.
Since, as pointed out above, we have 1 − α ≤ H1(α), then by its definition, we must have α†(q) ≥ q. This implies that
| (24) |
For the quantity of main interest in the theorem, we have
The last probability cannot, however, be written as a product of probabilities since the δm(Am(α†(q))) for m ∈ ℳ0(P) need not be independent owing to the dependence on α†(q) which is determined by all the (Δm,m ∈ ℳ). On the other hand, we do have the set equality
{S0(α†(q)) ≥ 1} = {minm∈ℳ0(P) αm ≤ α†(q)}, (25)
where the αms are the generalized p-value statistics defined in (12).
Next, define the stochastic process H2 = {H2(α) : α ∈ [0, 1]} via
H2(α) = ∏m∈ℳ0(P) [1 − Am(α)] × ∏m∈ℳ1(P) [1 − Am(α)]^(1 − δm(Am(α)−)).
Analogously to the H1 process, this has caglad sample paths. Let us then define the quantity
α#(q) = inf{α ∈ [0, 1] : H2(α) < 1 − q}.
Note that this is not a statistic, that is, it is not observable, since it depends on the unknown probability measure P, in contrast to α†(q). Furthermore, also note that
∏m∈ℳ0(P) [1 − Am(α#(q))] ≥ 1 − q. (26)
From their definitions, H1(α) ≥ H2(α), so that H1(α) < 1 − q implies H2(α) < 1 − q. Consequently,
α#(q) ≤ α†(q). (27)
Now, the importance of the quantity α#(q) arises because of the crucial set equality
{minm∈ℳ0(P) αm ≤ α#(q)} = {minm∈ℳ0(P) αm ≤ α†(q)}, (28)
To see this equality, first observe that the inclusion ⊆ follows immediately from (27). To prove the reverse inclusion, note that {α#(q) < minm∈ℳ0(P) αm} implies that, for some α0 < minm∈ℳ0(P) αm, we have H2(α0) < 1 − q. But for such an α0, we have δm(Am(α0)−) = 0 for all m ∈ ℳ0(P), so that
H1(α0) = H2(α0) < 1 − q.
Consequently,
α†(q) ≤ α0 < minm∈ℳ0(P) αm.
The reverse inclusion ⊇ thus follows, completing the proof of (28).
By (25), (28), and the iterated expectation rule, it now follows that
Since α#(q) is measurable with respect to the sub-σ-field σ(δm : m ∈ ℳ1(P)), whereas minm∈ℳ0(P) αm is measurable with respect to the sub-σ-field σ(δm : m ∈ ℳ0(P)), then by condition (D3), α#(q) and minm∈ℳ0(P) αm are independent. Furthermore, by condition (D3), we obtain
with the last equality a consequence of condition (D4). Therefore,
with the last inequality following from (26). Thus, finally, we have
This completes the proof of Theorem 1.
We remark that condition (D4) can be weakened to just having
EP {δm[Am(α)]} ≤ Am(α) for all m ∈ ℳ0(P) and α ∈ [0, 1] (29)
to still get the desired strong FWER control. This is so since in the portion of the proof where we have
we simply replace the second = sign by ≥ and then the proof of the theorem goes through.
5.2 Establishing Theorem 2
Proof As we mentioned, the proof of Theorem 2 mimics that of Theorem 7.1 in [4] as presented in [29]. As an aside, we remark that the seed of the idea of providing a class of FDR-controlling multiple decision functions was planted upon our realizing that the proof of the aforementioned Theorem 7.1 is functionally independent of the choice of the multiple decision size function.
The case with q = 0 is trivial since then α*(0) = 0, so that F(α*(0)) = 0. Thus we restrict to q ∈ (0, 1]. We first consider the case where P is such that |ℳ0(P)| < M. By the defining property of α*(q) given in (17), we have that
A•(α*(q)) ≤ q S(α*(q); Δ, A), (30)
where A•(α) = ∑m∈ℳAm(α). Consequently, from (15),
F(α*(q)) ≤ q S0(α*(q)) / A•(α*(q)). (31)
For α ∈ [0, 1], define the sub-σ-field
ℱα = σ({δm[Am(β)] : β ∈ [α, 1], m ∈ ℳ}). (32)
Observe that 𝔉 = (ℱα : α ∈ [0, 1]) is a decreasing collection of sub-σ-fields of ℱ. By its definition α*(q) is an 𝔉-stopping time.
Let us define the process T0 = (T0(α) : α ∈ [0, 1]) according to
T0(α) = ∑m∈ℳ0(P) δm[Am(α)] / Am(α).
Fix 0 ≤ α ≤ β ≤ 1. Then, since δm ∈ {0, 1}, we have
with the last two equalities holding only up to P-equivalence. The second equality follows from (D3), whereas the second-to-last equality follows since
because of condition (A2) for the Am(·)s and conditions (D2) and (D4) for the δm(·)s. The above results show that, under P, {(T0(α),ℱα) : α ∈ [0, 1]} forms a reverse martingale process. Further, observe that T0(1) = |ℳ0(P)| a.e. [P] due to conditions (D1) and (A1). Thus, EP(T0(1)) = |ℳ0(P)|.
From the inequality in (31), and also noting that α*(q) ≥ α (1) to get the second inequality, we obtain
where the last inequality is obtained using condition (C), while the third-to-last equality obtains by invoking the Optional Sampling Theorem for (reverse) martingales (cf., [30]), and the second-to-last equality because of EP[T0(1)] = |ℳ0(P)|.
At this point, note that since the Šidák multiple decision size function AS always satisfies condition (C) for all P ∈ 𝒫, including those with ℳ0(P) = ℳ, then ∀Δ ∈ 𝔇, ∀P ∈ 𝒫, we have the property
| (33) |
We now consider the case when P is such that ℳ0(P) = ℳ. Let 𝒫0 = {P ∈ 𝒫 : ℳ0(P) = ℳ}, and consider an arbitrary A ∈ 𝔖 and P ∈ 𝒫0. We need to establish that EP {F(α*(q; Δ, A); Δ, A)} ≤ q. For such a P ∈ 𝒫0, we have F(α; Δ, A) = I{S(α; Δ, A) > 0}, so that
We have, for any Δ ∈ 𝔇 and any A ∈ 𝔖, that
| (34) |
In Lemma D.1 of [29] it was established, using an inequality of [31], that for Wm(ηm),m ∈ ℳ, independent Bernoulli(ηm) random variables with ηm ∈ [0, 1] and satisfying ∏m∈ℳ(1 − ηm) = 1 − α, for each t ≥ 1,
| (35) |
where η̃m = 1 − (1 − α)^(1/M), m ∈ ℳ.
Noting that, under P ∈ 𝒫0, the δm(Am(α))’s are independent Bernoulli(Am(α)) random variables, then by using the inequality in (35) and condition (A3), it follows that for q ∈ (0, 1],
| (36) |
where the Šidák sizes in (36) have components
ASm(α+) = 1 − (1 − α+)^(1/M), m ∈ ℳ,
with α+ satisfying ∏m∈ℳ[1 − Am(α)] = 1 − α+. Observe that by (A3), we necessarily have α+ ≤ α. Combining the results in (34) and (36), we obtain
But since we have already established that, for P ∈ 𝒫0, we have
then it follows that P{α*(q;Δ,A) > 0} ≤ q. This implies finally that
for any P ∈ 𝒫0. This completes the proof of Theorem 2.
Short and neat proofs of FDR control for multiple decision functions based on p-value statistics are also provided in Theorem 4.1 of [18] and in Proposition 2.7 of [8]. The two sufficient conditions in the latter paper do not cover our set-up since our size functions need not belong to their factorized threshold collection. In addition, their dependency control condition appears to be quite distinct from our condition (C). It is not clear to us whether similar short proofs could be employed in establishing our Theorem 2 using the representation of δ*(q) in terms of the generalized p-value statistics given in Section 6. Also, in contrast to Theorem 1, where we were able to use the weaker version of condition (D4) given in (29), we could not do this for Theorem 2. The reason is that we could not conclude under this weaker condition that the process {(T0(α), ℱα) : α ∈ [0, 1]} is a reverse supermartingale, which would have allowed us to get the desired result. It may be possible that under certain situations we do have this supermartingale property, but the weaker condition (29) appears not to be sufficient for this property to hold in general.
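The constant-mean property EP[T0(α)] = |ℳ0(P)| that underlies the reverse-martingale argument in the proof of Theorem 2 can be illustrated numerically in a toy special case: uniform null p-values Um with δm = I(Um ≤ α) and the illustrative choice Am(α) = α (an assumption made purely for this check, not the general construction):

```python
import numpy as np

rng = np.random.default_rng(0)
M0, reps = 50, 20000
U = rng.uniform(size=(reps, M0))   # null p-values: Uniform(0,1) under each H_m0

def T0_mean(alpha):
    # T0(alpha) = sum over true nulls of delta_m / A_m(alpha), with A_m(alpha) = alpha;
    # Monte Carlo average over the replications
    return ((U <= alpha).sum(axis=1) / alpha).mean()
```

For every α the Monte Carlo average of T0(α) hovers around |ℳ0(P)| = 50, consistent with the constant expectation of a (reverse) martingale.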
6 Representations of MDFs in Terms of the Generalized p-Values
This section expresses the MDFs δ†(q) in (18) and δ*(q) in (19) in terms of the generalized p-value statistics defined in (12). Without loss of generality, let ℳ = {1, 2, …, M}. Define the vector of anti-rank statistics via
(R1, R2, …, RM) ∈ 𝔐, (37)
where 𝔐 is the space of all possible permutations of ℳ, and such that
αR1 ≤ αR2 ≤ ⋯ ≤ αRM.
Let us first consider the random variable α†(q) in (16). We see from its definition and those of the generalized p-value statistics that, for some J ∈ ℳ̄ ≡ {0} ∪ ℳ, we have α†(q) ∈ [α(J), α(J+1)) if and only if
From the definition of the generalized p-value statistics we further have
Consequently, by defining the ℳ̄-valued random variable
| (38) |
we have the result that α†(q) ∈ [α(J†(q)), α(J†(q)+1)). As a consequence we obtain the representation of δ†(q) in (18) in terms of the αms given by
δ†(q) = (I{αm ≤ α(J†(q))} : m ∈ ℳ), (39)
where we used the fact that, for each m ∈ ℳ, δm is constant in each interval [α(j), α(j+1)), j ∈ ℳ̄.
Next let us consider the random variable α*(q) in (17). We may re-express its defining equation via
But, since ∑m∈ℳ δ(m) [A(m)(α(j))] = j, then α*(q) ∈ [α(J), α(J+1)) iff
Defining the ℳ̄-valued random variable
| (40) |
we then have that α*(q) ∈ [α(J*(q)), α(J*(q)+1)). As a consequence, an equivalent representation of the MDF δ*(q) in (19) in terms of the αms is provided by
δ*(q) = (I{αm ≤ α(J*(q))} : m ∈ ℳ). (41)
The representations in (39) for δ†(q) and (41) for δ*(q) provide alternative computational approaches since, instead of computing α†(q) and α*(q), we may simply compute the generalized p-values, then J†(q) and J*(q), and then finally the realizations of the decision functions.
For a simple application, let us see what becomes of the MDFs δ†(q) and δ*(q) if we use the Šidák multiple decision size function AS given in (6). We use the alternate representations just obtained above. By simple manipulations, we immediately obtain that
But, for these Šidák size functions, the (ordinary) p-value statistics are given by
pm = 1 − (1 − αm)^(1/M), m ∈ ℳ.
Re-expressing the J†(q) and J*(q) in terms of these p-values, we easily obtain by simple manipulations that
J†(q) = max{J ∈ ℳ̄ : p(j) ≤ 1 − (1 − q)^(1/(M−j+1)) for all j ≤ J}, (42)
J*(q) = max{j ∈ ℳ̄ : p(j) ≤ jq/M}. (43)
Observe that J†(q) in (42) leads to the step-down sequential Šidák FWER-controlling procedure, see [10,27]; whereas, J*(q) in (43) is the usual form of the step-up Benjamini-Hochberg FDR-controlling procedure in [3]. Thus, through the Šidák sizes, we are able to obtain from our formulation two popular MDFs for FWER and FDR control as special cases of the MDFs δ†(q) and δ*(q)!
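Both special cases can be coded directly from the ordered p-values; a short numpy sketch of the step-down sequential Šidák procedure and the step-up BH procedure (function names are ours):

```python
import numpy as np

def holm_sidak(pvals, q):
    """Step-down sequential Sidak (FWER <= q): reject the J smallest p-values,
    where J is the largest j with p_(i) <= 1-(1-q)^(1/(M-i+1)) for all i <= j."""
    M = len(pvals)
    order = np.argsort(pvals)
    thresh = 1.0 - (1.0 - q) ** (1.0 / (M - np.arange(M)))
    ok = pvals[order] <= thresh
    J = np.argmin(ok) if not ok.all() else M   # first failure stops rejections
    reject = np.zeros(M, dtype=bool)
    reject[order[:J]] = True
    return reject

def bh(pvals, q):
    """Step-up Benjamini-Hochberg (FDR <= q): J* = max{j : p_(j) <= j q / M}."""
    M = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= q * (np.arange(M) + 1) / M
    J = below.nonzero()[0][-1] + 1 if below.any() else 0
    reject = np.zeros(M, dtype=bool)
    reject[order[:J]] = True
    return reject
```

For example, at q = .05 with p-values (.001, .01, .03, .2), the step-down Šidák procedure rejects the two smallest while BH rejects the three smallest, illustrating the step-down versus step-up distinction.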
7 On Developing Optimal MDFs
7.1 Computing Optimal Size Functions
In this subsection, we demonstrate the potential utility of the classes of MDFs arising from Theorems 1 and 2 in the context of obtaining MDFs with some optimality properties, especially in non-exchangeable multiple hypotheses testing settings, which include those where the power characteristics of the M test functions are not identical.
Consider a fixed multiple decision process Δ ∈ 𝔇 and fix a probability measure P1 ∈ 𝒫 with ℳ1(P1) = ℳ. Recall that in the single-hypothesis testing situation, the power of a test procedure is usually assessed at a specific alternative determined by a specified effect size, ideally set at the design-of-experiment stage so as to determine an appropriate sample size. Fixing P1 is tantamount to specifying the alternative P that is of most interest.
Define the mappings πm : [0, 1] → [0, 1] for m ∈ ℳ according to
πm(α; P1) = EP1 {δm(α)}. (44)
When viewed as a function of P1, πm(α; ·) is the power function of δm when it is allocated a size of α. Of interest to us, though, is to view it as a function of α for the fixed P1. In this case, πm(·; P1) is the receiver operating characteristic (ROC) curve of the mth test or decision process. Assume that, for each m ∈ ℳ, the mapping α ↦ πm(α; P1) is strictly increasing and twice-differentiable, with πm(1; P1) = 1.
Suppose we desire to strongly control the overall FWER or FDR at some pre-specified level q ∈ [0, 1], but at the same time maximize the total (or average) power at P = P1. Our idea is to first obtain the optimal multiple decision size function for weak FWER control associated with Δ, denoted by A*. This is the multiple decision size function A satisfying the condition ∀α ∈ [0, 1] : ∏m∈ℳ[1 − Am(α)] = 1 − α, and such that the total power at P = P1, given by ∑m∈ℳ πm(Am(α); P1), is maximized. We formally present this optimal multiple decision size function in the following theorem.
Theorem 3 Assume that for the multiple decision process Δ = (Δm : m ∈ ℳ), each of the M ROC functions α ↦ πm(α; P1) is strictly increasing, concave, and twice-differentiable (in α), with first and second derivatives π′m(·; P1) and π″m(·; P1), respectively, so that π″m(·; P1) ≤ 0. Then the multiple decision size function A* = (A*m(α) : m ∈ ℳ) maximizing the global power of the multiple decision function δ(α) = (δm[Am(α)] : m ∈ ℳ) under weak FWER-control at level α satisfies the two sets of conditions:
- (i) For some λ ∈ ℜ+ and for each m ∈ ℳ, [1 − A*m(α)] π′m(A*m(α); P1) = λ.
- (ii) ∏m∈ℳ [1 − A*m(α)] = 1 − α.
Equivalently, for fixed α ∈ (0, 1), A*m(α) = zm, m ∈ ℳ, with z = (z1, …, zM)t satisfying the set of M equations
where l(z) = (log(1 − zm), m = 1, 2, …, M)t, I is the M × M identity matrix, J = 11t, and 1 (0) is the M × 1 vector of 1’s (0’s).
Proof The proof of this theorem is straightforward using Lagrangian optimization, which leads to the two conditions (i) and (ii). To obtain the equivalent set of equations, sum the M equations in item (i), use the constraint in item (ii), plug the resulting expression for log(λ) into the equations in item (i), and finally convert into vectors and matrices. The concavity of the ROC functions guarantees that a maximizer is obtained.
Conditions (i) and (ii) in Theorem 3 are analogous to the conditions in Theorem 4.3 in [4], which dealt with the situation where the individual test functions coincide with the Neyman-Pearson most powerful tests. The concavity of the ROC functions is critical and guarantees that these functions lie above the 45-degree line, which implies that each δm(α) is an unbiased test function, that is, it performs better than the test function that does not use the data xm but always rejects Hm0 with probability α. The alternative set of equations in Theorem 3 is conducive to a Newton-Raphson iteration for finding the (approximate) solution z. To perform this iterative approach, it is computationally more stable to perform the iterations on v = (υ1, υ2, …, υM)t with
υm = log[zm/(1 − zm)], m = 1, 2, …, M.
Thus, zm ≡ zm(υm) = exp(υm)/(1 + exp(υm)) for m = 1, 2, …, M. Define the mappings
where the matrices D1(v) and D2(v) are defined, respectively, via
In the above expressions, Dg(w1, w2, …, wM) means the diagonal matrix with elements w1, w2, …, wM. The Newton-Raphson iteration updating for v then proceeds via
Seed values for the υm’s are those associated with the Šidák sizes zm = 1 − (1 − α)^(1/M) for m = 1, 2, …, M. A possible computational bottleneck in this approach is the inversion of the potentially huge matrix H. If this becomes problematic, then one may revert to directly using conditions (i) and (ii) in Theorem 3, as was done in the computational portion of [4].
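As an alternative to the Newton-Raphson scheme, the optimal sizes can also be found by nested bisection; the sketch below assumes condition (i) takes the Lagrangian form [1 − zm]π′m(zm) = λ (our reading of the first-order conditions) and that each derivative π′m is available and decreasing, per the concavity assumption:

```python
import numpy as np

def optimal_sizes(dpi, alpha):
    """Solve (i): (1 - z_m) * dpi[m](z_m) = lambda for every m, together with
    (ii): prod(1 - z_m) = 1 - alpha, by bisecting on lambda.

    dpi: list of M derivative functions pi'_m (each decreasing, by concavity).
    """
    M = len(dpi)

    def z_of_lam(lam):
        # invert the decreasing map z -> (1 - z) * pi'_m(z), one m at a time
        z = np.empty(M)
        for m in range(M):
            lo, hi = 1e-12, 1.0 - 1e-12
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                if (1.0 - mid) * dpi[m](mid) > lam:
                    lo = mid          # map is decreasing: solution lies above mid
                else:
                    hi = mid
            z[m] = 0.5 * (lo + hi)
        return z

    # larger lambda -> smaller sizes z_m -> larger prod(1 - z_m)
    lam_lo, lam_hi = 1e-8, 1e3
    for _ in range(100):
        lam = np.sqrt(lam_lo * lam_hi)    # bisect on the log scale
        z = z_of_lam(lam)
        if np.prod(1.0 - z) < 1.0 - alpha:
            lam_lo = lam                  # sizes too large; increase lambda
        else:
            lam_hi = lam
    return z
```

A convenient sanity check: with identical ROCs the solution reduces to the common Šidák sizes 1 − (1 − α)^(1/M).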
Having determined the optimal multiple decision size function A* associated with Δ, which is at this point optimal only in the sense of weak FWER control, we can then apply Theorem 1 to obtain the MDF δ†(q; Δ, A*) which will strongly control the FWER at q; or apply Theorem 2 to obtain the MDF δ*(q; Δ, A*) which will (strongly) control the FDR at q. By virtue of the choice of the size process A*, which is tied-in to the multiple decision process Δ and the target probability measure P1, we expect that the MDFs δ†(q; Δ, A*) and δ*(q; Δ, A*) will perform better with respect to overall power at P1 relative to, for example, the sequential Šidák MDF or the BH MDF, which we saw from the preceding section are MDFs arising from the Šidák multiple decision size function, a size function that may not be optimal for the chosen multiple decision size process Δ.
7.2 Results of a Modest Simulation Study
To demonstrate the improvement in performance of the proposed procedures, we provide results of a simulation study showing the gain in global power, or the decrease in the missed discovery rate, of the proposed MDF. We focus only on the FDR-controlling procedure in this simulation and compare its performance with the BH procedure. Note that results of a simulation study were presented in [4] demonstrating the improvement over the BH procedure of the MDF δ* in a Gaussian setting. In the current simulation study, we consider a non-Gaussian model.
The model considered in the simulation study has Tm ~ Ga(nm, 1/θm) for m = 1, 2, …, M, so that E[Tm] = nmθm. We assumed that the Tm’s are independent random variables. The multiple decision problem is to decide, for each m, whether Hm0 : θm ≤ θm0 or Hm1 : θm > θm0. Denote by 𝒞(·; k), c(·; k), and 𝒞−1(·; k) the distribution, density, and quantile functions, respectively, of a chi-squared random variable with k degrees-of-freedom. The uniformly most powerful (UMP) decision process for testing Hm0 versus Hm1 using only Tm is given by Δm = (δm(·; α) : α ∈ [0, 1]) where
δm(tm; α) = I{2tm/θm0 ≥ 𝒞−1(1 − α; 2nm)}.
The associated P-value statistic for this test process is
pm(tm) = 1 − 𝒞(2tm/θm0; 2nm),
so the decision function δm(tm; α) could also be expressed via
δm(tm; α) = I{pm(tm) ≤ α}.
For a fixed θm1 exceeding θm0, the ROC function associated with the decision process Δm is easily seen to be
πm(α; P1) = 1 − 𝒞(ξm 𝒞−1(1 − α; 2nm); 2nm),
where ξm = θm0/θm1, which can be viewed as the effect size, though it might be more fitting to call it the reciprocal of the usual effect size. It is then straightforward to verify that
These are the functions needed to implement the Newton-Raphson iterative procedure described in the preceding subsection for finding the optimal size functions for weak FWER control, and consequently for finding α*(q). We implemented this algorithm in an R program using the pchisq, dchisq, and qchisq functions. A computational limitation is encountered when Am(α) becomes extremely small, causing qchisq to return Inf or NaN. When this occurred for a generated sample, the sample was discarded from the simulation. In the simulation runs, this limitation arose in Table 1 only for the case with (n, p) = (30, .50). For this case, 120 samples were discarded prior to obtaining 1000 successful replications.
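For readers preferring Python, the decision function, its p-value statistic, and the ROC function of this gamma example can be sketched with scipy (an illustrative translation of the R implementation, using the fact that 2Tm/θm ~ χ²(2nm)):

```python
from scipy.stats import chi2

def pvalue(t, n, theta0=1.0):
    """P-value of the UMP test of H_m0: theta <= theta0 based on
    T ~ Ga(n, scale=theta): under theta = theta0, 2T/theta0 ~ chi2(2n)."""
    return chi2.sf(2.0 * t / theta0, df=2 * n)

def roc(alpha, n, xi):
    """ROC pi_m(alpha): power of the size-alpha test at effect size
    xi = theta0/theta1 (< 1 under the alternative)."""
    return chi2.sf(xi * chi2.ppf(1.0 - alpha, df=2 * n), df=2 * n)
```

At ξm = 1 the ROC reduces to the 45-degree line π(α) = α, while smaller ξm (stronger alternatives) yield larger power at every α.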
Table 1.
Results of the simulation study for gamma-distributed component data. The effect sizes were generated according to ξm ~ U[.25, .75] for m = 1, 2, …, M. The FDR threshold was set to q = 20%. The theoretical proportion of correct Hm1’s is p. The number of simulation replications for each combination of simulation parameters was 1000. Note that the estimated FDRs and MDRs are in percentages.
| M | nm = n | p | BH | δ*(q) | ||
|---|---|---|---|---|---|---|
| FDR | MDR | FDR | MDR | |||
| 100 | 5 | .2 | 15.93574 | 55.56008 | 16.03789 | 55.24426 |
| 100 | 10 | .2 | 15.78390 | 38.48281 | 14.97451 | 38.12848 |
| 100 | 30 | .2 | 16.09659 | 10.88535 | 14.93114 | 8.026157 |
| 100 | 5 | .3 | 13.42253 | 46.06943 | 13.24371 | 45.90983 |
| 100 | 10 | .3 | 13.76047 | 31.97512 | 13.65171 | 30.47509 |
| 100 | 30 | .3 | 13.80077 | 5.639566 | 13.89979 | 2.626881 |
| 100 | 5 | .5 | 10.37083 | 41.65425 | 10.06751 | 41.40670 |
| 100 | 10 | .5 | 10.11820 | 26.34404 | 10.11692 | 23.57873 |
| 100 | 30 | .5 | 10.03628 | 7.615538 | 9.590189 | 4.151696 |
In the simulation study, we fixed M = 100 and nm = n ∈ {5, 10, 30} for m = 1, 2, …, M. We then specified p ∈ {.2, .3, .5}, the theoretical proportion of correct Hm1’s. The basic simulation experiment goes as follows. We generated B1, B2, …, BM to be IID Ber(p). We then generated ξ1, ξ2, …, ξM to be IID from a uniform distribution on [.25, .75]. [We note that this step is typically outside the basic simulation experiment, but we wanted to explore many effect sizes in the study.] With θm0 = 1, m = 1, 2, …, M, the true scale parameters were computed as θm = (1 − Bm)θm0 + Bmθm0/ξm. We then generated the data Tm ~ Ga(nm, 1/θm) for m = 1, 2, …, M. For the observed data t = (t1, t2, …, tM), and for an FDR threshold of q = .20, we applied the BH procedure and the FDR-controlling procedure δ*(q), and for each procedure we computed the false discovery proportion (FDP) and the missed discovery proportion (MDP). Recall that for an action vector a = (a1, a2, …, aM) ∈ {0, 1}M, the FDP and MDP are the quantities
FDP(a) = ∑m∈ℳ0(P) am / max{∑m∈ℳ am, 1} and MDP(a) = ∑m∈ℳ1(P) (1 − am) / max{|ℳ1(P)|, 1}.
This basic experiment was replicated 1000 times, and the estimated FDR and MDR (missed discovery rate) are the averages of the FDPs and MDPs over these 1000 replications. Table 1 presents the results of this simulation study. From this table we see that both the BH and δ*(q) procedures control their FDRs to be no more than the pre-specified q. We also observe that, for the simulation runs performed, there is a slight improvement in terms of the missed discovery rate of the procedure δ*(q) over the BH procedure, with the improvement increasing as n increases. At the same time, it should be noted that the BH procedure, even though it does not exploit the ROC differences of the individual UMP decision processes, still performs comparably well in these simulation runs, lending credence to its practical appeal and utility owing to its simplicity and ease of implementation! We mentioned above that for the case (n, p) = (30, .50), some samples were discarded due to a computational limitation. We cannot conclude whether this led to some bias in the results. However, even without this case, the improvement of the proposed procedure over the BH procedure is evident based on all the other cases in Table 1.
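One replicate of the basic experiment, with the BH step applied to the UMP test p-values, can be sketched as follows (seed and parameter values are illustrative; the δ*(q) arm requires the optimal-size machinery of Section 7.1 and is omitted here):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
M, n, p, q, theta0 = 100, 10, 0.2, 0.20, 1.0

B = rng.binomial(1, p, size=M)                  # B_m = 1 where H_m1 is true
xi = rng.uniform(0.25, 0.75, size=M)            # effect sizes
theta = np.where(B == 1, theta0 / xi, theta0)   # true scale parameters
T = rng.gamma(shape=n, scale=theta)             # component data
pvals = chi2.sf(2.0 * T / theta0, df=2 * n)     # UMP test p-values

# step-up BH at threshold q
order = np.argsort(pvals)
below = pvals[order] <= q * (np.arange(M) + 1) / M
J = below.nonzero()[0][-1] + 1 if below.any() else 0
a = np.zeros(M, dtype=int)                      # action vector
a[order[:J]] = 1

FDP = a[B == 0].sum() / max(a.sum(), 1)         # false discovery proportion
MDP = (1 - a[B == 1]).sum() / max(B.sum(), 1)   # missed discovery proportion
```

Averaging FDP and MDP over many such replications yields estimates of the FDR and MDR of the kind reported in the BH column of Table 1.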
Finally, we performed one more set of simulation runs with M = 100, n ∈ {5, 10, 30}, p = .2, but with the effect sizes taking only four different values. The effect size vector is ξ = (ξm, m = 1, 2, …, M) with ξm = .1, m = 1, 2, …, 25, ξm = .3, m = 26, 27, …, 50, ξm = .5, m = 51, 52, …, 75, and ξm = .7, m = 76, 77, …, 100. The number of replications is again 1000. The results of these simulation runs are presented in Table 2. From this table we observe a more pronounced advantage of δ*(q) over the BH procedure in terms of their MDRs, especially when n is equal to 30. Both procedures still satisfied the FDR-threshold of q = 20%.
Table 2.
Simulation results for the case where the effect sizes take only 4 distinct values, with M = 100, p = .2, and n ∈ {5, 10, 30}. The number of replications is 1000 and the FDR threshold is q = 20%.
| M | nm = n | p | BH | δ*(q) | ||
|---|---|---|---|---|---|---|
| FDR | MDR | FDR | MDR | |||
| 100 | 5 | .2 | 16.3581 | 39.05525 | 15.79368 | 36.95671 |
| 100 | 10 | .2 | 15.86433 | 26.73459 | 16.05328 | 22.97558 |
| 100 | 30 | .2 | 16.0207 | 9.896752 | 15.76027 | 5.316628 |
Finally, note from the results of these simulation studies that these procedures do not yet exhaust the pre-specified FDR threshold. This indicates that some improvement in global power could still be made, as has been done for the BH procedure by incorporating an adjustment on the threshold using an estimate of the proportion of correct alternative hypotheses. However, we do not explore similar adjustments to our proposed procedures in the current paper. We refer the reader to [18] for more discussion of this α-exhaustion notion.
8 Concluding Remarks
In this paper we provided classes of FWER-controlling and FDR-controlling procedures derived from individual decision processes. The innovative idea is the use of multiple decision size functions, which serve as size-pickers for each of the decision processes. Through these classes of MDFs, optimal size functions may be obtained with respect to a global Type II error rate at a specified alternative hypothesis probability measure P1, thereby potentially identifying MDFs with some optimality properties. Via a simulation study, the FDR-controlling procedure was demonstrated to perform better than the BH procedure for gamma-distributed data. Other issues that may be of interest in future work have not been addressed. For instance, it would be desirable to extend the results to settings where the components of {δm : m ∈ ℳ0(P)} are dependent, as in [32,33]; see also the review paper [34]. Another potential extension is to consider generalized FWER and FDR as in [35]. A possible criticism of the proposed approach is the need to know the ROC functions, which entails the specification of a fixed P1 or, equivalently, effect sizes for each of the M testing problems. However, this is similar to what is typically done even in single hypothesis testing problems, where we focus on a specific alternative hypothesis that is of most interest to detect, such as in the sample size determination problem. In future work we plan to explore the use of asymptotic ROC functions, which arise by letting the sample sizes (the nm’s in the concrete example) and M increase and then considering a contiguous set of alternatives, e.g., Pitman-type alternatives. This may potentially alleviate or lessen the criticism mentioned above. Certainly, another approach to eliminating the need to specify a fixed P1 to obtain the ROC functions is to use a Bayesian approach. This programme was partly carried out in [24], though more work is still needed to clarify and fully understand this approach.
Acknowledgments
We thank Professor Sanat Sarkar for helpful discussions and sincerely thank the reviewers of this work for their sharp and critical comments which were extremely helpful in improving the manuscript. We very much thank Metrika editors, Professor Norbert Henze and Professor Udo Kamps, for providing an outlet for this work.
The authors acknowledge support from National Science Foundation (NSF) Grants DMS 0805809 and DMS 1106435, National Institutes of Health (NIH) Grants P20RR17698, R01CA154731, and P30GM103336-01A1.
Footnotes
Henceforth, decreasing will mean non-increasing, while increasing will mean non-decreasing
Contributor Information
Edsel A. Peña, Department of Statistics, University of South Carolina, Columbia, SC 29208 USA, Tel.: 803-576-5813, Fax: 803-777-4048, pena@stat.sc.edu
Joshua D. Habiger, Department of Statistics, Oklahoma State University, Stillwater, OK 74078, USA, jhabige@okstate.edu
Wensong Wu, Department of Mathematics and Statistics, Florida International University, Miami, FL 33199, USA, wenswu@fiu.edu.
References
- 1.Efron B. Microarrays, empirical Bayes and the two-groups model. Statist. Sci. 2008;23(1):1–22.
- 2.Efron B. Simultaneous inference: When should hypothesis testing problems be combined? The Annals of Applied Statistics. 2008;1:197–223.
- 3.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B. 1995;57(1):289–300.
- 4.Peña EA, Habiger JD, Wu W. Power-enhanced multiple decision functions controlling family-wise error and false discovery rates. Ann. Statist. 2011;39(1):556–583. doi: 10.1214/10-aos844.
- 5.Westfall PH, Krishen A. Optimally weighted, fixed sequence and gatekeeper multiple testing procedures. J. Statist. Plann. Inference. 2001;99(1):25–40.
- 6.Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93(3):509–524.
- 7.Roeder K, Wasserman L. Genome-wide significance levels and weighted hypothesis testing. Statist. Sci. 2009;24(4):398–413. doi: 10.1214/09-STS289.
- 8.Blanchard G, Roquain E. Two simple sufficient conditions for FDR control. Electron. J. Stat. 2008;2:963–992.
- 9.Roquain E, van de Wiel MA. Optimal weighting for false discovery rate control. Electronic Journal of Statistics. 2009;3:678–711.
- 10.Holm S. A simple sequentially rejective multiple test procedure. Scand. J. Statist. 1979;6(2):65–70.
- 11.Goeman JJ, Solari A. The sequential rejection principle of familywise error control. Ann. Statist. 2010;38(6):3782–3810.
- 12.Westfall PH, Krishen A, Young SS. Using prior information to allocate significance levels for multiple endpoints. Statistics in Medicine. 1998;17:2107–2119. doi: 10.1002/(sici)1097-0258(19980930)17:18<2107::aid-sim910>3.0.co;2-w.
- 13.Genovese C, Wasserman L. Bayesian and frequentist multiple testing. In: Bayesian Statistics, 7 (Tenerife, 2002). New York: Oxford Univ. Press; 2003. pp. 145–161. With discussions by Merlise A. Clyde, Christian P. Robert, and Judith Rousseau, and a reply by the authors.
- 14.Storey J. The optimal discovery procedure: a new approach to simultaneous significance testing. Journal of the Royal Statistical Society, Series B. 2007;69:347–368.
- 15.Sun W, Cai T. Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association. 2007;102:901–912.
- 16.Sarkar SK, Zhou T, Ghosh D. A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statist. Sinica. 2008;18(3):925–945.
- 17.Kang G, Ye K, Liu N, Allison D, Gao G. Weighted multiple hypothesis testing procedures. Statistical Applications in Genetics and Molecular Biology. 2009;8:1–21. doi: 10.2202/1544-6115.1437.
- 18.Finner H, Dickhaus T, Roters M. On the false discovery rate and an asymptotically optimal rejection curve. Ann. Statist. 2009;37(2):596–618.
- 19.Habiger JD, Peña EA. Compound p-value statistics for multiple testing procedures. J. Multivariate Anal. 2014;126:153–166. doi: 10.1016/j.jmva.2014.01.007.
- 20.Müller P, Parmigiani G, Robert C, Rousseau J. Optimal sample size for multiple testing: the case of gene expression microarrays. J. Amer. Statist. Assoc. 2004;99(468):990–1001.
- 21.Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. Journal of Statistical Planning and Inference. 2006;136:2144–2162.
- 22.Guindani M, Muller P, Zhang S. A Bayesian discovery procedure. JRSS B. 2009;71:905–925. doi: 10.1111/j.1467-9868.2009.00714.x.
- 23.Bogdan M, Chakrabarti A, Frommlet F, Ghosh JK. Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. Ann. Statist. 2011;39(3):1551–1579.
- 24.Wu W, Peña EA. Bayes multiple decision functions. Electron. J. Stat. 2013;7:1272–1300. doi: 10.1214/13-EJS813.
- 25.Habiger JD, Peña EA. Randomized P-values and nonparametric procedures in multiple testing. J. Nonparametr. Stat. 2011;23(3):583–604. doi: 10.1080/10485252.2010.482154.
- 26.Šidák Z. Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Statist. Assoc. 1967;62:626–633.
- 27.Dudoit S, van der Laan MJ. Multiple Testing Procedures with Applications to Genomics. Springer Series in Statistics. New York: Springer; 2008.
- 28.Blanchard G, Roquain É. Adaptive false discovery rate control under independence and dependence. J. Mach. Learn. Res. 2009;10:2837–2871.
- 29.Peña EA, Habiger JD, Wu W. Supplement to “Power-enhanced multiple decision functions controlling family-wise error and false discovery rates.” 2011. doi: 10.1214/10-aos844.
- 30.Doob JL. Stochastic Processes. New York: John Wiley & Sons Inc.; 1953.
- 31.Hoeffding W. On the distribution of the number of successes in independent trials. Ann. Math. Statist. 1956;27:713–721.
- 32.Sarkar SK, Chang CK. The Simes method for multiple hypothesis testing with positively dependent test statistics. J. Amer. Statist. Assoc. 1997;92(440):1601–1608.
- 33.Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 2001;29(4):1165–1188.
- 34.Sarkar SK. On methods controlling the false discovery rate. Sankhyā. 2008;70(2, Ser. A):135–168.
- 35.Sarkar SK. Stepup procedures controlling generalized FWER and generalized FDR. Ann. Statist. 2007;35(6):2405–2420.

