POWER-ENHANCED MULTIPLE DECISION FUNCTIONS CONTROLLING FAMILY-WISE ERROR AND FALSE DISCOVERY RATES

Edsel A Peña; Joshua D Habiger; Wensong Wu

doi:10.1214/10-aos844

Author manuscript; available in PMC: 2014 Jul 10.

Published in final edited form as: Ann Stat. 2011 Feb;39(1):556–583. doi: 10.1214/10-aos844

PMCID: PMC4091923 NIHMSID: NIHMS585692 PMID: 25018568

Abstract

Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini–Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual p-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures could be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures whose theoretical validity is contingent on each of these p-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional “large M, small n” data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.

Key words and phrases: Benjamini–Hochberg procedure, Bonferroni procedure, decision process, false discovery rate (FDR), family wise error rate (FWER), Lagrangian optimization, Neyman–Pearson most powerful test, microarray analysis, reverse martingale, missed discovery rate (MDR), multiple decision function and process, multiple hypotheses testing, optional sampling theorem, power function, randomized p-values, generalized multiple decision p-values, ROC function, Šidák procedure

1. Introduction and motivation

The advent of modern technology, epitomized by the microarray, has led to the generation of very high-dimensional data pertaining to characteristics of a large number, M, of attributes, hereon called genes, associated with usually a small number, n, of units or subjects. Several such data sets are, for example, described in [10], and these are the inputs to so-called parallel inference problems. The most common form of inference is multiple hypotheses testing, wherein for the mth gene there are two competing hypotheses, a null hypothesis H_m₀ and an alternative hypothesis H_m₁, for which a decision is to be made based on the data. In such multiple decision-making, there is a need to be cognizant and cautious of the Hyde-ian nature of multiplicity, while also exploiting the Jekyll-ian potentials of multiplicity [39]. Furthermore, this entails a tenuous balance between two competing desires: controlling the rate of rejection of correct null hypotheses, while at the same time maintaining the rate of discovery of correct alternative hypotheses.

As in single-pair hypothesis testing, a type I error occurs when a correct null hypothesis is rejected, while a type II error occurs when a false null hypothesis is not rejected. Several type I error rates have been proposed in multiple testing; see [6] and [7]. Our focus is on the weak family-wise error rate (FWER), the probability of rejecting at least one null hypothesis when all the nulls are correct; the strong FWER, the probability of rejecting at least one correct null hypothesis; and the false discovery rate (FDR), the expected proportion of the number of false rejections of nulls relative to the number of rejections [1, 37]. Our type II error rate is the missed discovery rate (MDR), the expected number of false nonrejections of null hypotheses. Other type II error rates have been discussed in [5–7, 9, 41]. The usual framework in developing multiple decision functions is to bound the chosen type I error rate, and then minimize or make small the MDR. For example, a procedure controlling weak FWER, under an independence assumption, is that of Šidák [36]; while a conservative one not requiring independence is the Bonferroni procedure [3]. For FDR control, the most common procedure is the BH procedure [1]. Control of type I error measures related to the FDR has also been discussed in [8–10, 12, 15, 40, 41, 45], while [20, 23, 34] focused on estimation of the proportion of correct null hypotheses.

Procedures such as the Šidák, Bonferroni and BH procedures rely on the set of p-values of the individual tests. Their validity hinges on each p-value statistic being stochastically equal to or greater than a standard uniform variable under the null hypothesis. This fails, however, with noncontinuous variables or when rank-based nonparametric tests are used. Crucially, p-value based procedures also do not exploit the power characteristics of the individual tests, contrary to Neyman and Pearson’s [27] adage that such considerations are germane in constructing optimal tests. Such p-value based procedures are fine in exchangeable settings where the power characteristics of the individual tests are identical, but not in situations where genes or subclasses of genes have different structures; see [11, 13, 29].

Some papers dealing with procedures exploiting the power functions are [38, 49]. The use of weighted p-values to improve type II performance has also been explored in [16, 21, 29, 30, 46]. Other approaches for optimal procedures are those in [42, 43], which employ a Neyman–Pearson approach, and [45], where oracle and adaptive compound rules were obtained. Compound rules are characterized by information borrowing from each of the genes, so a decision function for a specific gene utilizes information from other genes. Decision-theoretic and Bayesian approaches were also implemented in [10, 17, 26, 33, 35]. More recently, [11] argues for separate subclass analysis, while [13] proposed use of external covariates, with the procedures having a Bayes and empirical Bayes flavor.

The main goal of this paper is to develop better multiple testing procedures controlling weak FWER, strong FWER and FDR by taking into account the individual powers of the tests. We focus on the most fundamental setting where the null and alternative hypotheses for each gene are both simple. This is also the setting in [29]. This admits, as a starting point, the Neyman–Pearson most powerful (MP) test for each pair of hypotheses. Each MP test will have a power, but we will see that it is beneficial to look at each of these powers as a function of its MP test’s size, the so-called receiver operating characteristic (ROC) function.

The paper proceeds as follows. Section 2 presents the decision-theoretic elements. Section 3 reviews and reexamines MP tests, p-value statistics and ROC functions. Section 4 develops the optimal weak FWER-controlling procedure, with existence and uniqueness established in Section 4.2. Section 4.3 analytically describes the procedure for differentiable ROC functions. Section 4.4 provides a concrete example using normal distributions, while Section 4.5 discusses a size-investing strategy for optimality. Section 5 discusses limitations, extensions and connections: Section 5.1 deals with the restriction to the class of simple procedures; Section 5.2 deals with extensions to the composite hypotheses setting in the presence of the monotone likelihood ratio (MLR) property; and Section 5.3 relates the optimal procedure to weighted p-value based procedures. Section 6 develops an improved procedure which strongly controls the FWER, whereas Section 7 develops an improved procedure which controls FDR. The development of these new procedures is anchored on the weak FWER-controlling optimal procedure. We establish that the sequential Šidák and BH procedures are special cases of these more general procedures. Section 8 provides a modest simulation study demonstrating that the new FDR-controlling procedure improves on the BH procedure. Section 9 contains a summary and some concluding remarks.

To manage the length of the paper and provide more focus on the main ideas and results, technical proofs of lemmas, propositions, theorems and corollaries are all gathered in the supplemental article [28].

2. Mathematical setting

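Let (Ω, $\mathcal{F}$, P) be a probability space and $\mathcal{M}$ = {1, 2, …, M} an index set with M a known positive integer. For each m ∈ $\mathcal{M}$, let X_m: (Ω, $\mathcal{F}$) → ($\mathcal{X}_m$, $\mathcal{B}_m$), with $\mathcal{X}_m$ some space with σ-field of subsets $\mathcal{B}_m$. Form the product space ($\mathcal{X}$, $\mathcal{B}$) with $\mathcal{X} = \times_{m \in \mathcal{M}} \mathcal{X}_m$ and $\mathcal{B} = σ(\times_{m \in \mathcal{M}} \mathcal{B}_m)$ so X = (X₁, X₂, …, X_M): (Ω, $\mathcal{F}$) → ($\mathcal{X}$, $\mathcal{B}$). The probability measure of X is Q = PX⁻¹, while the (marginal) probability measure of X_m is $Q_{m} = P X_{m}^{- 1}$. For each m ∈ $\mathcal{M}$, let Q_m₀ and Q_m₁ be two known probability measures on ($\mathcal{X}_m$, $\mathcal{B}_m$). We assume that Q ∈ $\mathcal{Q}$, a class of probability measures on ($\mathcal{X}$, $\mathcal{B}$) with marginal probability measure Q_m ∈ {Q_m₀, Q_m₁} for each m ∈ $\mathcal{M}$. Let θ = (θ₁, …, θ_M): $\mathcal{Q}$ → Θ ≡ {0, 1}^M with θ_m(Q) = I {Q_m = Q_m₁}, I {·} denoting the indicator function. Define, for each Q ∈ $\mathcal{Q}$, the subcollections $\mathcal{M}_0 \equiv \mathcal{M}_0(Q)$ = {m ∈ $\mathcal{M}$: θ_m(Q) = 0} and $\mathcal{M}_1 \equiv \mathcal{M}_1(Q)$ = {m ∈ $\mathcal{M}$: θ_m(Q) = 1}. In this paper, we shall impose an independence condition given by: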

Condition (I)

(X_m, m ∈ $\mathcal{M}_0(Q)$) is an independent collection of random entities, that is, ∀B_m ∈ $\mathcal{B}_m$, $Q(\bigcap_{m \in \mathcal{M}_0(Q)} \{x : x_m \in B_m\}) = \prod_{m \in \mathcal{M}_0(Q)} Q_m(B_m)$.

However, the collection (X_m, m ∈ $\mathcal{M}_1(Q)$) need not be an independent collection, but it is independent of (X_m, m ∈ $\mathcal{M}_0(Q)$). Two extreme subcollections of $\mathcal{Q}$ are $\mathcal{Q}_0$ = {Q ∈ $\mathcal{Q}$: θ_m(Q) = 0, ∀m ∈ $\mathcal{M}$} and $\mathcal{Q}_1$ = {Q ∈ $\mathcal{Q}$: θ_m(Q) = 1, ∀m ∈ $\mathcal{M}$}. By Condition (I), $\mathcal{Q}_0$ is a singleton set, and Q₀ will denote its element; $\mathcal{Q}_1$, in contrast, need not be a singleton set. The decision problem is to determine $\mathcal{M}_0(Q)$ and $\mathcal{M}_1(Q)$ based on X, which is equivalent to simultaneously testing the M pairs of hypotheses H_m₀: Q_m = Q_m₀ versus H_m₁: Q_m = Q_m₁ for m ∈ $\mathcal{M}$.

We adopt a decision-theoretic framework similar to [33]. The action space is $\mathcal{A}$ = {0, 1}^M with generic element a = (a₁, a₂, …, a_M)^t ∈ $\mathcal{A}$, with a_m = 0(1) meaning H_m₀ is accepted (rejected). The parameter space is $\mathcal{Q}$, though the effective parameter space is Θ = {0, 1}^M with generic element θ = (θ₁, θ₂, …, θ_M)^t. We introduce several loss functions, L: $\mathcal{A}$ × $\mathcal{Q}$ → ℜ₊, defined via

L_{0} (a, Q) = I {a^{t} (1 - θ (Q)) \geq 1};

(2.1)

L_{1} (a, Q) = [\frac{a^{t} (1 - θ (Q))}{a^{t} 1}] I {a^{t} 1 > 0};

(2.2)

L_{2} (a, Q) = {(1 - a)}^{t} θ (Q),

(2.3)

with the convention that 0/0 = 0 and 1 is an M × 1 vector of 1’s. The loss function L₀(a, Q) equals 1 if and only if at least one false discovery is committed. The loss L₁(a, Q) is the false discovery proportion, being the ratio between the number of false discoveries and the number of discoveries; whereas the loss L₂(a, Q) is the number of missed discoveries being the number of true alternative hypotheses that were not discovered. We focus on this missed discovery number since the relevant question is how many correct alternatives [θ(Q)^t 1] were missed by using the action a? See also [29] which essentially uses this loss function to induce their power metric. Other types of losses, such as the false negative proportion with (a, Q) ↦ [(1 − a)^t θ(Q)]/[(1 − a)^t 1]I {(1 − a)^t 1 > 0}, have also been considered; see [15, 33].
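As an illustration of (2.1)–(2.3), the three losses can be evaluated for a concrete action vector a and truth vector θ. The following is a minimal sketch; the function names and the example vectors are hypothetical and chosen only for illustration.

```python
def loss_fwer(a, theta):
    """L0 in (2.1): 1 if at least one true null (theta_m = 0) is rejected."""
    false_discoveries = sum(am * (1 - tm) for am, tm in zip(a, theta))
    return 1 if false_discoveries >= 1 else 0


def loss_fdp(a, theta):
    """L1 in (2.2): false discoveries / discoveries, with 0/0 = 0."""
    discoveries = sum(a)
    if discoveries == 0:
        return 0.0
    false_discoveries = sum(am * (1 - tm) for am, tm in zip(a, theta))
    return false_discoveries / discoveries


def loss_missed(a, theta):
    """L2 in (2.3): number of true alternatives (theta_m = 1) not rejected."""
    return sum((1 - am) * tm for am, tm in zip(a, theta))


# Hypothetical example: M = 5, genes 3-5 are true alternatives.
theta = [0, 0, 1, 1, 1]
a = [1, 0, 1, 1, 0]  # reject H_10, H_30, H_40

l0, l1, l2 = loss_fwer(a, theta), loss_fdp(a, theta), loss_missed(a, theta)
```

Here one of the three rejections is false and one true alternative is missed, so L₀ = 1, L₁ = 1/3 and L₂ = 1.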

A nonrandomized multiple decision function (MDF) is a δ: ($\mathcal{X}$, $\mathcal{B}$) → ($\mathcal{A}$, σ($\mathcal{A}$)), where σ($\mathcal{A}$) is the power set of $\mathcal{A}$. Such an MDF may be represented by δ(x) = (δ₁(x), δ₂(x), …, δ_M (x))^t, where δ_m(x) ∈ {0, 1}. In general, each δ_m could be made to depend on the full data x instead of just x_m. We denote by $\mathcal{D}$ the class of all nonrandomized MDFs. A randomized MDF may also be considered. Denote by $\mathcal{P}(\mathcal{A})$ the space of all probability measures over ($\mathcal{A}$, σ($\mathcal{A}$)). A randomized MDF is a δ^*: ($\mathcal{X}$, $\mathcal{B}$) → ($\mathcal{P}(\mathcal{A})$, σ($\mathcal{P}(\mathcal{A})$)). For a realization X = x, an action is chosen from $\mathcal{A}$ according to the probability measure δ^*(x). Denote by $\mathcal{D}^*$ the space of all randomized MDFs. Clearly, $\mathcal{D} \subset \mathcal{D}^*$. By augmenting the data X with a randomizer U ~ U (0, 1) which is independent of X, randomized MDFs could be made nonrandomized with respect to the augmented data (X, U). Henceforth, $\mathcal{D}$ represents all nonrandomized MDFs δ(X, U) based on (X, U).

For brevity of notation, P_Q{f (X, U) ∈ B} and E_Q{f (X, U)} represent probability and expectation with respect to (X, U) with X ~ Q, U ~ U (0, 1) and X and U independent. For δ ∈ $\mathcal{D}$ and the loss functions defined earlier, we have the risk functions

R_{0} (δ, Q) = E_{Q} {L_{0} (δ (X, U), Q)};

(2.4)

R_{1} (δ, Q) = E_{Q} {L_{1} (δ (X, U), Q)};

(2.5)

R_{2} (δ, Q) = E_{Q} {L_{2} (δ (X, U), Q)} .

(2.6)

Given a δ = (δ₁, δ₂, …, δ_M)^t, let π_δ (Q) = (π_δ₁ (Q), π_δ₂ (Q), …, π_{δ_M} (Q))^t with π_{δ_m} (Q) = E_Q{δ_m(X, U)} be its vector of power functions. Then (2.6) becomes R₂(δ, Q) = (1 − π_δ (Q))^t θ(Q). In terms of these risk functions, for δ ∈ $\mathcal{D}$, its weak FWER is FWER(δ) = R₀(δ, Q₀). If each δ_m depends only on X_m and U, by Condition (I),

FWER (δ) = 1 - E {\prod_{m \in \mathcal{M}} [1 - P_{Q_{m 0}} {δ_{m} (X_{m}, U) = 1 \mid U}]},

(2.7)

where the expectation is with respect to U. When Q = Q₀ and with the mth component $δ_{m}^{*}$ of the randomized MDF depending only on X_m, an alternative formulation is to have U = (U₁, U₂, …, U_M) a vector of i.i.d. U (0, 1) variables which is independent of the X_m’s. The mth component may then be redefined via $δ_{m} (X_{m}, U_{m}) = I \{U_{m} \leq δ_{m}^{*} (X_{m})\}$. Then (2.7) becomes FWER(δ) = 1 − $\prod_{m \in \mathcal{M}}$ [1 − P_{Q_m0} {δ_m(X_m, U_m) = 1}].
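The product form of the weak FWER can be checked numerically. The sketch below assumes independent component tests whose null rejection probabilities are given sizes η_m; the function names, the sizes and the simulation settings are illustrative assumptions, not part of the paper.

```python
import math
import random


def fwer_product(sizes):
    """Weak FWER of independent tests with null rejection probabilities `sizes`."""
    return 1.0 - math.prod(1.0 - s for s in sizes)


def fwer_simulated(sizes, n_rep=20000, seed=0):
    """Monte Carlo estimate under the global null Q0.

    Each component test is represented only through its rejection event,
    which under H_m0 has probability eta_m; independent U(0,1) draws stand
    in for the independent (X_m, U_m) pairs.
    """
    rng = random.Random(seed)
    at_least_one = 0
    for _ in range(n_rep):
        if any(rng.random() <= s for s in sizes):
            at_least_one += 1
    return at_least_one / n_rep


sizes = [0.01, 0.02, 0.005]  # hypothetical test sizes
exact = fwer_product(sizes)
estimate = fwer_simulated(sizes)
```

With these sizes the product formula gives 1 − 0.99 · 0.98 · 0.995 = 0.034651, and the simulated frequency should agree up to Monte Carlo error.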

The risk function R₁(δ, Q) is the false discovery rate (FDR) of δ at Q [1]; while the risk function R₂(δ, Q) will be called the missed discovery rate (MDR) of δ at Q. The adjective “rate” is somewhat misleading since R₂(δ, Q) takes values in [0, |$\mathcal{M}_1(Q)$|] instead of [0, 1]; however, this does not cause difficulty since, given the true underlying probability measure Q of X, |$\mathcal{M}_1(Q)$| is constant. This risk is related to the expected number of true positives (ETP), an error measure used in [38, 42], via ETP(δ, Q) = |$\mathcal{M}_1(Q)$| − R₂(δ, Q).

To find an optimal MDF weakly controlling FWER in a subclass $\mathcal{D}_0 \subseteq \mathcal{D}$, a threshold α ∈ (0, 1) is specified and then we seek a δ^* ∈ $\mathcal{D}_0$ with R₀(δ^*, Q₀) = FWER(δ^*) ≤ α, and such that for any δ ∈ $\mathcal{D}_0$ satisfying R₀(δ, Q₀) = FWER(δ) ≤ α, we have $\sup_{Q \in \mathcal{Q}} R_{2} (δ^{*}, Q) \leq \sup_{Q \in \mathcal{Q}} R_{2} (δ, Q)$. This criterion has a minimax flavor. One may require only that R₂(δ^*, Q^*) ≤ R₂(δ, Q^*) where Q^* is the true, but unknown, probability law of X; but this requirement may be so strong as to preclude a solution to the optimization problem. However, see [42] for a situation with a different type I error and where an optimal, albeit an oracle, solution for minimizing R₂(δ, Q^*) is possible. Observe that for δ ∈ $\mathcal{D}_0$, by using the representation of R₂(δ, Q) in terms of the powers, $\sup_{Q \in \mathcal{Q}} R_{2} (δ, Q) = \sup_{Q \in \mathcal{Q}_1} R_{2} (δ, Q) = M - \sum_{m \in \mathcal{M}} π_{δ_{m}} (Q_{m 1})$. The optimality condition on the MDR amounts therefore to maximizing $\sum_{m \in \mathcal{M}} π_{δ_{m}} (Q_{m 1})$. Interestingly, if we had standardized the loss function L₂(a, Q) to take values in [0, 1] via division by |$\mathcal{M}_1(Q)$| = θ(Q)^t 1, the minimax justification does not carry through!

For strong FWER control, we seek a compound MDF, δ^* ∈ $\mathcal{D}$, with R₀(δ^*, Q^*) ≤ α whatever the true, but unknown, probability law Q^* of X is, and with $\sum_{m \in \mathcal{M}} π_{δ_{m}^{*}} (Q_{m 1})$ large, possibly maximal, among all δ ∈ $\mathcal{D}$ satisfying R₀(δ, Q^*) ≤ α. For (strong) FDR control, a threshold q^* ∈ (0, 1) is specified and we seek a compound MDF, δ^* ∈ $\mathcal{D}$, such that, whatever Q^* is, R₁(δ^*, Q^*) ≤ q^*, and with $\sum_{m \in \mathcal{M}} π_{δ_{m}^{*}} (Q_{m 1})$ large, possibly maximal, among all δ ∈ $\mathcal{D}$ satisfying R₁(δ, Q^*) ≤ q^*. For discussion of weak and strong control, refer to [6, 7]. Discussion of optimality in multiple testing can be found in [25], where maximin optimality results are established for some step-down and step-up multiple testing procedures.

3. Revisiting MP tests and p-value statistics

An MDF δ ∈ $\mathcal{D}$ whose mth component δ_m depends only on (X_m, U_m) for every m ∈ $\mathcal{M}$ is called simple; otherwise, it is compound. The subclass of simple MDFs, denoted by $\mathcal{D}_0$, will be our initial focus in searching for an optimal weak FWER-controlling MDF. The resulting optimal MDF will then anchor our search for strong FWER- and FDR-controlling compound MDFs. Before implementing this program, we introduce the unifying concept of decision processes.

3.1. Decision processes, ROC functions, p-value statistics

First, a brief review. Let X: (Ω, $\mathcal{F}$) → ($\mathcal{X}$, $\mathcal{B}$) and Q = PX⁻¹. Based on X, consider testing the pair of hypotheses H₀: Q = Q₀ versus H₁: Q = Q₁, where Q₀ and Q₁ are two probability measures on ($\mathcal{X}$, $\mathcal{B}$). Let q₀ and q₁ be versions of the densities of Q₀ and Q₁ with respect to some fixed dominating measure ν, for example, ν = Q₀ + Q₁. Recall that a test or decision function is a δ: ($\mathcal{X}$, $\mathcal{B}$) → ([0, 1], σ [0, 1]), with σ [0, 1] the Borel sigma-field on [0, 1]. Given X = x, δ (x) is the probability of deciding in favor of H₁. Its size is $α_{δ} = E_{Q_{0}} δ(X)$; it is of level α ∈ [0, 1] if α_δ ≤ α. Its power is $π_{δ} = E_{Q_{1}} δ(X)$. δ^* is most powerful (MP) of level α if $α_{δ^{*}} \leq α$ and for all δ with α_δ ≤ α, we have $π_{δ^{*}} \geq π_{δ}$.

Definition 3.1

A collection Δ = {δ_η: η ∈ [0, 1]} of test functions such that, a.e. [Q], δ₀(x) = 0, δ₁(x) = 1 and η ↦ δ_η (x) is nondecreasing and right-continuous, is a decision process. Its size function is A_Δ: [0, 1] → [0, 1] and its power function is ρ_Δ: [0, 1] → [0, 1], where A_Δ (η) = α_{δ_η}= E_Q₀ δ_η(X) and ρ_Δ(η) = π_{δ_η} = E_Q₁ δ_η(X). Its receiver operating characteristic (ROC) curve is ROC(Δ) ≡ Graph{(A_Δ (η), ρ_Δ (η)): η ∈ [0, 1]}. If A_Δ (η) = η for all η ∈ [0, 1], η ↦ ρ_Δ (η) is the ROC function of Δ.

The use of the phrase power function in Definition 3.1 is atypical since we are not viewing this as a function of a parameter as is the usual meaning of this phrase. However, for lack of a better name, we shall adopt this terminology. In the sequel, δ_η and δ(η) will be used interchangeably to also represent δ(·; η).

Let L: ($\mathcal{X}$, $\mathcal{B}$) → (ℜ₊, σ (ℜ₊)) be a version of the likelihood ratio function: L(x) = q₁(x)/q₀(x) a.e. [ν]. Let G₀(·) and G₁(·) be the distribution functions of L(X) when $\mathcal{L}(X)$ = Q₀ and $\mathcal{L}(X)$ = Q₁, respectively, where $\mathcal{L}(X)$ denotes the probability measure of X. For a monotone nondecreasing right-continuous function M(·) from ℜ into ℜ, let M⁻¹(r) = inf{x ∈ ℜ: M(x) ≥ r} and ΔM(r) = M(r) − M(r−). By the Neyman–Pearson fundamental lemma [27], the MP test function of level η for testing H₀ versus H₁ is

δ^{*} (X; η) \equiv δ_{η}^{*} = I {L (X) > c (η)} + γ (η) I {L (X) = c (η)},

(3.1)

where $c (η) = G_{0}^{- 1} (1 - η)$ and γ (η) = (G₀(c(η)) − (1 − η))/ΔG₀(c(η)). Let U ~ U (0, 1) be independent of X. Redefine δ^* via $δ_{η}^{* *} \equiv δ^{* *} (X, U; η) = I {U \leq δ^{*} (X; η)}$ , which is nonrandomized w.r.t. (X, U). In essence, with the aid of an auxiliary randomizer U, the MP test could always be made nonrandomized. The decision process formed from these MP tests, given by

Δ^{*} = {δ_{η}^{* *} : η \in [0, 1]},

(3.2)

is called the most powerful (MP) decision process. The power (at Q = Q₁) of the MP test $δ_{η}^{*}$ or $δ_{η}^{* *}$ is

ρ_{Δ^{*}} (η) \equiv π_{δ_{η}^{*}} = π_{δ_{η}^{* *}} = 1 - G_{1} (c (η)) + γ (η) Δ G_{1} (c (η)) .

(3.3)

It is well known [24] that $π_{δ_{η}^{*}} < 1$ implies $α_{δ_{η}^{*}} = η$. We denote by $A_{Δ^{*}}$ and $ρ_{Δ^{*}}$ the size and power functions of Δ^*. If $π_{δ_{η}^{*}} < 1$ for all η < 1, then $η \mapsto ρ_{Δ^{*}} (η)$ is the ROC function of Δ^*. We present below some important properties of this function.
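To make (3.1) concrete for discrete data, the sketch below computes c(η) and γ(η) on a finite sample space and confirms that the randomized MP test attains size exactly η. The binomial example, the helper names and the use of exact rational arithmetic are assumptions made for illustration; q0[x] > 0 is assumed for every x.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """Exact Binomial(n, p) pmf as a dict {x: probability}."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


def mp_constants(q0, q1, eta):
    """Return (c, gamma) of (3.1): c = G0^{-1}(1 - eta), gamma = (G0(c) - (1 - eta)) / dG0(c)."""
    lr = {x: q1[x] / q0[x] for x in q0}  # likelihood ratio L(x)
    target = 1 - eta
    # Smallest likelihood-ratio value c with G0(c) >= 1 - eta.
    for c in sorted(set(lr.values())):
        g0 = sum(q0[x] for x in q0 if lr[x] <= c)
        if g0 >= target:
            break
    jump = sum(q0[x] for x in q0 if lr[x] == c)  # jump of G0 at c
    return c, (g0 - target) / jump


def size_and_power(q0, q1, eta):
    """Size and power of the randomized MP test of level eta."""
    c, gamma = mp_constants(q0, q1, eta)
    lr = {x: q1[x] / q0[x] for x in q0}

    def reject_prob(x):
        if lr[x] > c:
            return 1
        return gamma if lr[x] == c else 0

    size = sum(q0[x] * reject_prob(x) for x in q0)
    power = sum(q1[x] * reject_prob(x) for x in q0)
    return size, power


# Hypothetical example: X ~ Binomial(3, p), H0: p = 1/2 versus H1: p = 3/4.
q0 = binom_pmf(3, Fraction(1, 2))
q1 = binom_pmf(3, Fraction(3, 4))
size, power = size_and_power(q0, q1, Fraction(1, 20))
```

For η = 1/20 the test rejects only at x = 3, with probability γ = 2/5, which gives size 1/20 and power 27/160. Without the randomization the only attainable sizes would be 0, 1/8, 1/2, 7/8 and 1.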

Before stating the proposition, we reiterate that all formal proofs of propositions, theorems, lemmas and corollaries are in the supplemental article [28].

Proposition 3.1

The function $ρ_{Δ^{*}}$: [0, 1] → [0, 1] in (3.3) is concave, continuous and nondecreasing. Furthermore, $ρ_{Δ^{*}} (η) \geq η$ and it is strictly increasing on the set {η ∈ [0, 1]: $ρ_{Δ^{*}} (η) < 1$}.

Definition 3.2

Let Δ = {δ_η: η ∈ [0, 1]} be a decision process, where δ_η: ($\mathcal{X}$ × [0, 1], $\mathcal{B}$ ⊗ σ [0, 1]) → ({0, 1}, σ{0, 1}). Its (randomized) p-value statistic is S_Δ: ($\mathcal{X}$ × [0, 1], $\mathcal{B}$ ⊗ σ[0, 1]) → ([0, 1], σ [0, 1]) with S_Δ (x, u) = inf{η ∈ [0, 1]: δ_η(x, u) = 1}.

When ∀(η, x, u) : δ_η(x, u) = δ_η(x), then S_Δ(X, U) is the usual p-value statistic. See also [4] for a more specialized definition of a randomized p-value statistic. We refer the reader to [18] for properties of this p-value statistic and its use in existing FDR-controlling procedures.

Proposition 3.2

Let Δ = {δ_η : η ∈ [0, 1]} be a decision process with p-value statistic S_Δ. Then, for all s ∈ [0, 1], $H_{0} (s) \equiv P_{Q_{0}} (S_{Δ} \leq s) = A_{Δ} (s)$ and $H_{1} (s) \equiv P_{Q_{1}} (S_{Δ} \leq s) = π_{δ (s)} = ρ_{Δ} (s)$. Consequently, S_Δ ~ U[0, 1] under $\mathcal{L}(X)$ = Q₀ if and only if ∀η ∈ [0, 1] : A_Δ(η) = η.
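For the MP decision process on a finite sample space, applying Definition 3.2 to the randomized tests gives the familiar closed form S(x, u) = P_{Q₀}(L > L(x)) + u · P_{Q₀}(L = L(x)); this closed form is derived here for illustration and is not quoted from the paper. The sketch below uses it to check, in exact arithmetic, that P_{Q₀}(S ≤ s) = s, as Proposition 3.2 asserts when A_Δ(η) = η. The binomial setting and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """Exact Binomial(n, p) pmf as a dict {x: probability}."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


def tail_and_tie(x, q0, q1):
    """Return (P0(L > L(x)), P0(L = L(x))) for the likelihood ratio L = q1/q0."""
    lr = {k: q1[k] / q0[k] for k in q0}
    above = sum(q0[k] for k in q0 if lr[k] > lr[x])
    tie = sum(q0[k] for k in q0 if lr[k] == lr[x])
    return above, tie


def randomized_p_value(x, u, q0, q1):
    """Randomized p-value of the MP decision process at data x and randomizer u."""
    above, tie = tail_and_tie(x, q0, q1)
    return above + u * tie


def null_cdf_of_p_value(s, q0, q1):
    """Exact P_{Q0}(S <= s), integrating out U ~ U(0, 1) analytically."""
    total = Fraction(0)
    for x in q0:
        above, tie = tail_and_tie(x, q0, q1)
        # P(above + U * tie <= s) = (s - above) / tie, clipped to [0, 1].
        prob = min(max((s - above) / tie, Fraction(0)), Fraction(1))
        total += q0[x] * prob
    return total


q0 = binom_pmf(3, Fraction(1, 2))
q1 = binom_pmf(3, Fraction(3, 4))
p_val = randomized_p_value(3, Fraction(1, 2), q0, q1)
```

Observing x = 3 with u = 1/2 gives the p-value 1/16, and the null distribution function of S is the identity on [0, 1], even though X is discrete.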

4. Optimal weak FWER control

Return now to the multiple decision problem in Section 2. We extend the notion of decision processes to the multiple decision setting.

Definition 4.1

A collection Δ = (Δ_m : m ∈ $\mathcal{M}$), where Δ_m = (δ_m(η) : η ∈ [0, 1]) is a decision process on ($\mathcal{X}$ × [0, 1]^M, $\mathcal{B}$ ⊗ σ[0, 1]^M), is a multiple decision process (MDP). It is simple if each Δ_m is simple; otherwise, it is compound. When simple, its multiple decision size function is A_Δ = (A_{Δ_m} : m ∈ $\mathcal{M}$) and its multiple decision ROC function is ρ_Δ = (ρ_{Δ_m} : m ∈ $\mathcal{M}$), where A_{Δ_m} and ρ_{Δ_m} are the size and ROC functions of Δ_m.

4.1. Optimization problem

Let Δ be a simple MDP. Then, a multiple decision size vector η = (η_m : m ∈ $\mathcal{M}$) ∈ $\mathcal{N}$ ≡ [0, 1]^M determines from Δ an MDF δ_Δ(η) = (δ_m(η_m) : m ∈ $\mathcal{M}$) ∈ $\mathcal{D}_0$, the class of simple MDFs. For this MDF, FWER(δ_Δ(η)) = 1 − $\prod_{m \in \mathcal{M}}$ [1 − A_{Δ_m} (η_m)] and R₂(δ_Δ(η), Q₁) = M − $\sum_{m \in \mathcal{M}}$ ρ_{Δ_m} (η_m) for Q₁ ∈ $\mathcal{Q}_1$. Fix an FWER-threshold α ∈ (0, 1). Suppose there exists a multiple decision size vector $η_{Δ}^{*} (α) \in \mathcal{N}$ such that

η_{Δ}^{*} (α) = \underset{η \in \mathcal{N}}{arg max} {\sum_{m \in \mathcal{M}} ρ_{Δ_{m}} (η_{m}) : \prod_{m \in \mathcal{M}} [1 - A_{Δ_{m}} (η_{m})] \geq 1 - α} .

Then, $A_{Δ} (η_{Δ}^{*} (α)) = (A_{Δ_{m}} (η_{Δ, m}^{*} (α)) : m \in \mathcal{M})$ is the optimal multiple decision size vector for weak FWER control at α associated with the simple MDP Δ. The associated optimal simple MDF is $δ_{Δ} (A_{Δ} (η_{Δ}^{*} (α)))$.

But, since H_m₀ and H_m₁ are both simple, there exists a simple most powerful MDP, $Δ^{*} = (Δ_{m}^{*} : m \in \mathcal{M})$, where $Δ_{m}^{*} = (δ_{m}^{*} (η) : η \in [0, 1])$ with $δ_{m}^{*} (η)$ being the simple Neyman–Pearson MP test function of size η for H_m₀ versus H_m₁. Consider the simple MDF obtained from Δ^* given by $(δ_{m}^{*} (A_{Δ_{m}} (η_{Δ, m}^{*} (α))) : m \in \mathcal{M})$. This will satisfy the FWER constraint, and by virtue of the MP property of $δ_{m}^{*} (A_{Δ_{m}} (η_{Δ, m}^{*} (α)))$ for each m ∈ $\mathcal{M}$,

\sum_{m \in \mathcal{M}} ρ_{Δ_{m}^{*}} (A_{Δ_{m}} (η_{Δ, m}^{*} (α))) \geq \sum_{m \in \mathcal{M}} ρ_{Δ_{m}} (A_{Δ_{m}} (η_{Δ, m}^{*} (α))) .

Thus, in searching for the optimal weak FWER-controlling simple MDF, it suffices to restrict to the simple most powerful MDP Δ^*. Without loss of generality (wlog), we may assume $A_{Δ_{m}^{*}} (η) = η$ for m ∈ $\mathcal{M}$ and η ∈ [0, 1]. The optimization problem reduces to finding $η_{Δ^{*}}^{*} (α) \in \mathcal{N}$ satisfying

η_{Δ^{*}}^{*} (α) = \underset{η \in \mathcal{N}}{arg max} {\sum_{m \in \mathcal{M}} ρ_{Δ_{m}^{*}} (η_{m}) : \prod_{m \in \mathcal{M}} (1 - η_{m}) \geq 1 - α} .

(4.1)

The optimal weak FWER-controlling simple MDF is then

δ_{W}^{*} (α) \equiv (δ_{m}^{*} (η_{Δ^{*}, m}^{*} (α)) : m \in \mathcal{M}) .

(4.2)

Two well-known and conventional choices for the size vector η = (η_m : m ∈ $\mathcal{M}$) which satisfy the weak FWER constraint are the Šidák sizes η_m = η_m(α) = 1 − (1 − α)^{1/M} and the Bonferroni-adjusted sizes η_m = η_m(α) = α/M. The former requires the independence Condition (I) and is sharp; the latter is conservative but does not require Condition (I). Both ignore possible differences in power traits of the individual test functions.
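The sharpness of the Šidák sizes and the conservativeness of the Bonferroni sizes under independence can be seen by plugging both into the constraint in (4.1). This is a small sketch; the function names and the values α = 0.05, M = 2000 are chosen for illustration.

```python
import math


def sidak_sizes(alpha, M):
    """Common size 1 - (1 - alpha)^(1/M) for each of the M tests."""
    return [1.0 - (1.0 - alpha) ** (1.0 / M)] * M


def bonferroni_sizes(alpha, M):
    """Common size alpha / M for each of the M tests."""
    return [alpha / M] * M


def weak_fwer(sizes):
    """Weak FWER of independent tests with the given sizes: 1 - prod(1 - eta_m)."""
    return 1.0 - math.prod(1.0 - s for s in sizes)


alpha, M = 0.05, 2000
fwer_sidak = weak_fwer(sidak_sizes(alpha, M))
fwer_bonferroni = weak_fwer(bonferroni_sizes(alpha, M))
```

The Šidák sizes use up the whole budget (weak FWER equal to α up to rounding), while the Bonferroni sizes are slightly smaller and leave part of it unused. Both assign the same size to every test regardless of its ROC function.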

4.2. Existence and uniqueness of optimal size vector

We establish the existence of an optimal multiple decision size vector for weak FWER control within the class $\mathcal{D}_0$ of simple MDFs. As pointed out in Section 4.1, it suffices to look for the optimal weak FWER-controlling simple MDF by starting with the most powerful simple MDP $Δ^{*} = (Δ_{m}^{*} : m \in \mathcal{M})$. For brevity, $ρ_{m} \equiv ρ_{Δ_{m}^{*}}$ and $A_{m} (η) \equiv A_{Δ_{m}^{*}} (η) = η$. Recall that $\mathcal{N}$ = [0, 1]^M, the multiple decision size space. In a nutshell, the existence of an optimal multiple decision size vector for weak FWER control exploits convexity properties of relevant subsets of $\mathcal{N}$. This is formalized by establishing a sequence of propositions which are presented below. For α ∈ [0, 1], define the weak FWER constraint set

C_{α} = \begin{cases} \{η \in \mathcal{N} : \sum_{m \in \mathcal{M}} log (1 - η_{m}) \geq log (1 - α)\}, & if α < 1, \\ \mathcal{N}, & if α = 1. \end{cases}

(4.3)

Proposition 4.1

C_α satisfies (i) η = 0 ∈ C_α; (ii) (0, α_m) ∈ C_α for all m ∈ $\mathcal{M}$, where (0, α_m) is the zero-vector with the mth element replaced by α; and (iii) it is convex and closed.

Proposition 4.2

For η₀ ∈ $\mathcal{N}$ let U(η₀) = {η ∈ $\mathcal{N}$: $η_{m} \geq η_{0 m}$, ∀m ∈ $\mathcal{M}$}, the upper set of η₀, and let UB(C_α) = {η ∈ $\mathcal{N}$: C_α ∩ U(η) = {η}}, the upper boundary set of C_α. Then, for all α ∈ [0, 1), UB(C_α) = {η ∈ $\mathcal{N}$: $\sum_{m \in \mathcal{M}}$ log(1 − η_m) = log(1 − α)}.

Proposition 4.3

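Let $\mathcal{N}_b$ ≡ {η ∈ $\mathcal{N}$: $\sum_{m \in \mathcal{M}}$ ρ_m(η_m) ≥ Mb} for b ∈ [0, 1]. Then {$\mathcal{N}_b$: b ∈ [0, 1]} satisfies (i) η = 1 ∈ $\mathcal{N}_b$, (ii) each $\mathcal{N}_b$ is closed and convex, and (iii) $\mathcal{N} = \mathcal{N}_0 \supseteq \mathcal{N}_{b_1} \supseteq \mathcal{N}_{b_2}$ for 0 ≤ b₁ ≤ b₂ ≤ 1.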

Proposition 4.4

Let B_α = {b ∈ [0, 1] : $\mathcal{N}_b$ ∩ C_α ≠ Ø} for α ∈ [0, 1) and let $b_{α}^{*} = sup B_{α}$. Then $B_{α} = [0, b_{α}^{*}]$.

Building on these intermediate results, the existence of an optimal weak FWER-controlling multiple decision size vector is obtained.

Theorem 4.1 (Existence)

Let α ∈ [0, 1). Then $C_{α} \cap \mathcal{N}_{b_{α}^{*}} \neq \emptyset$. Furthermore, η ∈ $\mathcal{N}$ is a weak FWER-α optimal multiple decision size vector if and only if $η \in C_{α} \cap \mathcal{N}_{b_{α}^{*}}$.

Theorem 4.1 guarantees existence of an optimal weak FWER multiple decision size vector, but it does not address whether the solution is unique. We present a result on this issue in the following theorem.

Theorem 4.2 (Uniqueness)

Let α ∈ [0, 1) and define C_α(m) = {η_m ∈ [0, 1] : η ∈ C_α}, called the mth section of C_α. If, for all m ∈ $\mathcal{M}$, the mapping η_m ↦ ρ_m(η_m) is strictly increasing on C_α(m), then the optimal weak FWER-α multiple decision size vector is unique and it is the η^* satisfying $C_{α} \cap \mathcal{N}_{b_{α}^{*}} = \{η^{*}\}$.

It is easy to see that a sufficient condition for uniqueness of the optimal size vector is that, for all m ∈ $\mathcal{M}$, η_m ∈ [0, sup C_α(m)) ⇒ ρ_m(η_m) < 1. Nonuniqueness may occur with nonregular families of densities, for example, uniform or shifted exponential, where the power of the MP test may equal one even though its size is still less than one. It may also occur if the decision processes in the MDP do not satisfy the condition that ∀η ∈ [0, 1], ∀m ∈ $\mathcal{M}$, A_m(η) = η, which is the case with discrete data or when using nonparametric rank-based test functions with randomization not permitted.

4.3. Finding optimal size vector

Generally, without differentiability of the ROC functions as in the case with discrete distributions, linear or nonlinear programming methods are needed to obtain the optimal solution. In the case, however, where the ROC functions are twice-differentiable, the optimal size vector is in a more explicit form.

Theorem 4.3

Let $Δ^{*} = (Δ_{m}^{*}, m \in \mathcal{M})$ be the MP MDP, and assume that the ROC functions η_m ↦ ρ_m(η_m) are strictly increasing and twice-differentiable with first and second derivatives $ρ_{m}^{'}$ and $ρ_{m}^{″}$, respectively. Given α ∈ (0, 1), the optimal weak FWER-α multiple decision size vector $η^{*} \equiv η_{Δ^{*}}^{*} (α) = (η_{m}^{*} (α), m \in \mathcal{M})$ is the η ∈ $\mathcal{N}$ satisfying (i) for some λ ∈ ℜ₊, ∀m ∈ $\mathcal{M}$, $ρ_{m}^{'} (η_{m}) (1 - η_{m}) = λ$ and (ii) $\sum_{m \in \mathcal{M}}$ log(1 − η_m) = log(1 − α).
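Conditions (i) and (ii) are what a Lagrangian argument would produce for problem (4.1). The following is a heuristic sketch only, assuming an interior solution with the FWER constraint active; it is not the formal proof, which is in the supplemental article [28], and the symbol $\Lambda$ for the Lagrangian is introduced here for illustration.

```latex
\begin{align*}
\Lambda(\eta, \lambda)
  &= \sum_{m \in \mathcal{M}} \rho_m(\eta_m)
   + \lambda \Bigl[ \sum_{m \in \mathcal{M}} \log(1 - \eta_m) - \log(1 - \alpha) \Bigr], \\
\frac{\partial \Lambda}{\partial \eta_m}
  &= \rho_m'(\eta_m) - \frac{\lambda}{1 - \eta_m} = 0
  \quad\Longrightarrow\quad
  \rho_m'(\eta_m)\,(1 - \eta_m) = \lambda
  \quad \text{for every } m \in \mathcal{M}, \\
\frac{\partial \Lambda}{\partial \lambda}
  &= \sum_{m \in \mathcal{M}} \log(1 - \eta_m) - \log(1 - \alpha) = 0 .
\end{align*}
```

Since each ρ_m is concave (Proposition 3.1) and η ↦ log(1 − η) is concave, the objective is concave and the feasible set is convex, so a point satisfying these stationarity conditions is a maximizer.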

A question arises as to whether the optimal sizes are monotonic in α. Such a property is desirable since it will imply that if at FWER size α₁ we have δ_m(η_m(α₁)) = 1, then at an FWER size α₂ with α₂ > α₁, we will also have δ_m(η_m(α₂)) = 1. This property will also be critical in proving a martingale property needed for the development of the FDR-controlling procedure. This issue is the content of the following proposition.

Proposition 4.5

Assume the conditions of Theorem 4.3. Then, for each m ∈ $\mathcal{M}$, the mapping $α \mapsto η_{m}^{*} (α)$ is nondecreasing and continuous.

4.4. Gaussian example for weak FWER control

For m ∈ $\mathcal{M}$, let $X_{m} \sim N (μ_{m}, σ_{m 0}^{2})$, where the μ_m’s are unknown and the $σ_{m 0}^{2}$’s are known. Consider the multiple hypotheses testing problem H_m₀ : μ_m = μ_m₀ and H_m₁ : μ_m = μ_m₁ with μ_m₀ < μ_m₁ for m ∈ $\mathcal{M}$. The MP test of size η_m for H_m₀ versus H_m₁ is $δ_{m}^{*} (X_{m}; η_{m}) \equiv δ_{m}^{*} (η_{m}) = I \{X_{m} \geq μ_{m 0} + σ_{m 0} Φ^{- 1} (1 - η_{m})\}$, where Φ(·) and Φ⁻¹(·) are the cumulative distribution and quantile functions, respectively, of a standard normal variable. The mth effect size is γ_m = (μ_m₁ − μ_m₀)/σ_m₀, and the ROC function of the decision process $Δ_{m}^{*} = (δ_{m}^{*} (η_{m}) : η_{m} \in [0, 1])$ is ρ_m(η_m) ≡ ρ_m(η_m; γ_m) = Φ(γ_m − Φ⁻¹(1 − η_m)), clearly twice-differentiable with respect to η_m. With ϕ(·) the standard normal density function,

{(ρ_{m})}^{'} (η_{m}) = \frac{ϕ (γ_{m} - Φ^{- 1} (1 - η_{m}))}{ϕ (Φ^{- 1} (1 - η_{m}))} .
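The Gaussian ROC function and its derivative can be evaluated directly and cross-checked by a finite difference. This is a minimal sketch using Python's standard library; the function names and the values η = 0.05, γ = 2 are illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def roc(eta, gamma):
    """rho(eta; gamma) = Phi(gamma - Phi^{-1}(1 - eta))."""
    return STD_NORMAL.cdf(gamma - STD_NORMAL.inv_cdf(1.0 - eta))


def roc_derivative(eta, gamma):
    """rho'(eta; gamma) = phi(gamma - z) / phi(z), with z = Phi^{-1}(1 - eta)."""
    z = STD_NORMAL.inv_cdf(1.0 - eta)
    return STD_NORMAL.pdf(gamma - z) / STD_NORMAL.pdf(z)


eta, gamma, h = 0.05, 2.0, 1e-5
analytic = roc_derivative(eta, gamma)
# Central finite difference as an independent check of the derivative formula.
numeric = (roc(eta + h, gamma) - roc(eta - h, gamma)) / (2.0 * h)
```

With γ = 0 the ROC function reduces to ρ(η) = η, the diagonal, and for γ > 0 it lies above the diagonal, consistent with Proposition 3.1.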

For fixed α ∈ (0, 1) and γ_m’s, consider the mappings d ↦ η_m(d), m ∈ $\mathcal{M}$, defined implicitly by the equation

\frac{ϕ (γ_{m} - Φ^{- 1} (1 - η_{m}))}{ϕ (Φ^{- 1} (1 - η_{m}))} (1 - η_{m}) - d = 0.

(4.4)

The optimal value of d, denoted by d^*, solves the equation

\sum_{m \in \mathcal{M}} log (1 - η_{m} (d)) - log (1 - α) = 0.

(4.5)

The optimal sizes of the M MP tests are then η_m(d^*), m ∈ $\mathcal{M}$. An R [19] implementation of this numerical problem first defines v_m = Φ⁻¹(1 − η_m), so that 1 − η_m = Φ(v_m) and ϕ(γ_m − v_m)/ϕ(v_m) = exp(γ_m v_m − γ_m²/2). Taking logarithms, condition (4.4) amounts to solving for v_m = v_m(d) the equation

log Φ (v_{m}) + γ_{m} v_{m} - log (d) - γ_{m}^{2} / 2 = 0.

(4.6)

We utilized a Newton–Raphson iteration in solving for the v_m’s in (4.6) and the uniroot routine in R to solve for d in (4.5). Upon obtaining the v_m(d)’s, the η_m(d)’s are computed via η_m(d) = 1 − Φ(v_m(d)).
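The two-level scheme can be sketched in Python as follows. This is not the authors' R code: Newton–Raphson is used for (4.6) as described, but plain bisection on log d replaces uniroot for (4.5), and the starting value v = −30, the tolerances, the bracket expansion and all function names are assumptions made for this sketch. It is intended for moderate effect sizes such as those in the example.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)


def Phi(v):
    """Standard normal cdf via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-v / SQRT2)


def phi(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def solve_v(gamma, log_d, v0=-30.0, tol=1e-12, max_iter=200):
    """Newton-Raphson for (4.6): log Phi(v) + gamma*v - log d - gamma^2/2 = 0.

    The left side is increasing and concave in v, so starting to the left
    of the root (v0 = -30, an assumed safe choice) the iterates increase to it.
    """
    v = v0
    for _ in range(max_iter):
        h = math.log(Phi(v)) + gamma * v - log_d - 0.5 * gamma * gamma
        step = h / (phi(v) / Phi(v) + gamma)
        v -= step
        if abs(step) < tol:
            break
    return v


def optimal_sizes(gammas, alpha, tol=1e-12):
    """Solve (4.5) for log d by bisection and return the sizes eta_m(d*)."""
    target = math.log(1.0 - alpha)

    def excess(log_d):
        # sum_m log(1 - eta_m(d)) - log(1 - alpha), using 1 - eta_m = Phi(v_m).
        return sum(math.log(Phi(solve_v(g, log_d))) for g in gammas) - target

    lo, hi = -1.0, 1.0
    while excess(lo) > 0.0:  # expand until the root is bracketed
        lo *= 2.0
    while excess(hi) < 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    log_d = 0.5 * (lo + hi)
    return [Phi(-solve_v(g, log_d)) for g in gammas]  # eta = 1 - Phi(v)


def average_power(gammas, sizes):
    """Average of rho_m(eta_m) = Phi(gamma_m - Phi^{-1}(1 - eta_m))."""
    nd = NormalDist()
    return sum(nd.cdf(g - nd.inv_cdf(1.0 - e)) for g, e in zip(gammas, sizes)) / len(gammas)


gammas = [0.5, 1.0, 2.0, 3.0, 5.0]  # hypothetical effect sizes
alpha = 0.05
eta_opt = optimal_sizes(gammas, alpha)
eta_sidak = [1.0 - (1.0 - alpha) ** (1.0 / len(gammas))] * len(gammas)
```

For these five hypothetical effect sizes the returned sizes satisfy the constraint Π(1 − η_m) = 1 − α, the largest size goes to a test with a moderate effect size, and the average power is at least that of the equal Šidák sizes.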

Figure 1 shows the optimal sizes when M = 2,000 and the effect sizes are uniformly distributed. Observe from the second panel that when the effect size is small, which translates to low power, the optimal size for the test is small; but when the effect size is large, which translates to high power, the optimal test size is also small. For the tests with moderate effect sizes or powers, the optimal sizes are higher. This behavior can also be seen in the third panel of the figure, which shows the achieved powers of the tests at the optimal sizes.

Fig. 1 — *Optimal test sizes and powers for* 2,000 *MP tests of hypotheses under normality when the effect sizes were generated from a* uniform[0.1, 10] *distribution. Panel four shows the powers for both the optimal* [*solid black*] *and the Šidák* [*dashed red*] *tests with respect to effect sizes*.

The efficiency of the optimal procedure relative to the Šidák procedure was measured via the ratio (multiplied by 100) of the average power over the M tests, defined by Σ_{m ∈ ℳ} ρ_m(η_m)/M, of the optimal procedure to the average power of the Šidák procedure. The fourth panel in Figure 1 depicts the powers of the resulting tests versus the effect size for both procedures (solid blue = optimal; dashed red = Šidák). For these uniformly-generated effect sizes, the efficiency of the optimal procedure over the Šidák is 103.5%. This efficiency is affected by the vector of effect sizes. For instance, when we change the effect sizes in Figure 1 to be generated from a uniform over [0.1, 2], then the efficiency jumps to 181.7%, though it should also be pointed out that since the effect sizes are small, the overall powers of both procedures are also small.
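The efficiency measure can be computed directly from the Gaussian ROC function ρ_m(η; γ_m) = Φ(γ_m − Φ⁻¹(1 − η)). The following is a minimal sketch; the function names are illustrative, and the size vector passed to `efficiency` is assumed to have been obtained elsewhere (for instance by solving (4.4) and (4.5)).

```python
from statistics import NormalDist

STD = NormalDist()

def roc(eta, gamma):
    """Gaussian ROC function: rho(eta; gamma) = Phi(gamma - Phi^{-1}(1 - eta))."""
    return STD.cdf(gamma - STD.inv_cdf(1.0 - eta))

def average_power(sizes, gammas):
    """Average of rho_m(eta_m) over the M tests."""
    return sum(roc(e, g) for e, g in zip(sizes, gammas)) / len(gammas)

def sidak_sizes(M, alpha):
    """Common Sidak size 1 - (1 - alpha)^(1/M), repeated M times."""
    return [1.0 - (1.0 - alpha) ** (1.0 / M)] * M

def efficiency(sizes, gammas, alpha):
    """100 * (average power at `sizes`) / (average power at the Sidak sizes)."""
    baseline = average_power(sidak_sizes(len(gammas), alpha), gammas)
    return 100.0 * average_power(sizes, gammas) / baseline
```

By construction the Šidák size vector has efficiency 100 relative to itself, so values above 100 indicate a gain in average power from the alternative size allocation.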

4.5. A size-investing strategy

In the preceding Gaussian example, as well as in other situations we examined, for example, with exponential and Bernoulli distributions, we observed the phenomenon where, among the M tests, those with low powers (small effect sizes) and those with high powers (large effect sizes) are allocated relatively small sizes in the weak FWER-controlling optimal procedure. The tests with larger sizes are those with moderate powers or effect sizes. This is a size-investing strategy in the multiple hypotheses testing problem, and it has intuitive content. With the overall goal of making more real discoveries while controlling the proportion of false discoveries for a pre-specified, usually small, overall size α, the optimal procedure dictates that not much size should be accorded those tests with either very low or very high powers. The former case will not lead to any discoveries anyway if the size that could be allocated is small, while the latter case will lead to discoveries even if the test sizes are made small. Thus, there is more to be gained by investing larger sizes on those tests that are of moderate powers, and an appropriate tweaking of their test sizes according to condition (i) in Theorem 4.3 improves the ability to achieve more real discoveries. However, this phenomenon is dependent on the magnitude of the overall size. If this overall size is made larger, more leeway ensues to the extent that it may then be more beneficial to allocate more size to those with low powers since those tests with moderate powers, when they had small sizes, may now have larger powers because of the consequent increase in their sizes. The precise and crucial determinants of where the differential sizes should be allocated are the rates of change of the ROC functions, with some size-attenuation. 
Interesting discussions of size and weight allocation strategies can also be found in [49], where the size allocation was related to the “α-spending” function of [22], in [14] which deals with α-investing in sequential procedures that control expected false discoveries, and in [16, 29] which discuss optimal weights for the p-values.

A tangential real-life manifestation of this strategy occurred during the 2008 American presidential election, with the total resources (financial, manpower, etc.) available to the candidates analogous to the overall size in the multiple testing problem. In the waning days of the campaign, the major candidates, then-Senator Barack Obama of the Democratic Party and Senator John McCain of the Republican Party, focused their campaign efforts, in terms of allocating their financial and manpower resources, in the “battleground states” of North Carolina, Virginia and Pennsylvania, while basically ignoring the “in-the-bag states” of South Carolina, then expected to vote for McCain, and California, then expected to vote for Obama. Also, by virtue of the deep resources of the Obama campaign, it was able to allocate more resources even in states that traditionally voted Republican, whereas the McCain campaign, with a relatively smaller war chest, had to “drop” some states (e.g., Michigan) in their campaign. The behaviors of the two camps somehow mirror the size-investing strategy with proper accounting of each campaign’s overall resources.

5. Restrictions, extensions and connections

5.1. On the restriction to 𝒟₀

The optimization problem for weak FWER control could be construed as limited since we restricted to the subclass 𝒟₀, thus leading to an optimal weak FWER-controlling procedure that is still simple. In [42, 45], it was demonstrated that performance is enhanced via compound MDFs.

Examples of compound MDFs are the estimated optimal discovery procedure (ODP) in [42, 43], the FDR-controlling procedure in [1], and the oracle-based adaptive MDFs in [45].

Could we immediately start from compound MDFs in the search for an optimal weak FWER-controlling compound MDF? Let us suppose that δ = (δ_m : m ∈ ℳ) is a compound MDF, so δ_m depends on (X, U) and not only on (X_m, U_m). For such an MDF, we have

R_{0} (δ, Q) = P_{Q} {\underset{m \in M_{0} (Q)}{\cup} [δ_{m} (X, U) = 1]} .

(5.1)

Now, even if the independence Condition (I) holds, (δ_m(X, U) : m ∈ ℳ₀(Q)) need not be an independent collection. As such, no closed-form exact expression for R₀(δ, Q) need exist. The right-hand side in (5.1) could be Bonferroni-bounded by

EFP (δ, Q) \equiv \sum_{m \in M_{0} (Q)} α_{δ_{m}} (Q),

(5.2)

called the expected number of false positives in [42]. Alternatively, if a generalized positive quadrant dependence (PQD) condition holds, with

P_{Q} {\underset{m \in M_{0} (Q)}{\cap} [δ_{m} (X, U) = 0]} \geq \prod_{m \in M_{0} (Q)} P_{Q} {δ_{m} (X, U) = 0},

then the right-hand side in (5.1) could be upper-bounded by

PQD (δ, Q) \equiv 1 - \prod_{m \in M_{0} (Q)} [1 - α_{δ_{m}} (Q)],

(5.3)

where α_{δ_m}(Q) = E_Q δ_m(X, U) is the size of δ_m when m ∈ ℳ₀(Q). For this compound MDF, its MDR is R₂(δ, Q) = Σ_{m ∈ ℳ₁(Q)} [1 − π_{δ_m}(Q)], where π_{δ_m}(Q) = E_Q δ_m(X, U) is the power of δ_m when m ∈ ℳ₁(Q).
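The two upper bounds (5.2) and (5.3) are easy to compare numerically. The sketch below uses illustrative function names and takes as input the sizes α_{δ_m}(Q) of the components with true null hypotheses; for sizes in [0, 1] the product bound never exceeds the Bonferroni-type sum.

```python
import math

def efp_bound(null_sizes):
    """Bonferroni-type bound (5.2): sum of the sizes over the true nulls."""
    return sum(null_sizes)

def pqd_bound(null_sizes):
    """Product bound (5.3): 1 - prod_m (1 - alpha_m) over the true nulls."""
    return 1.0 - math.prod(1.0 - a for a in null_sizes)
```

With five true nulls each tested at size 0.01, the sum bound is 0.05 while the product bound is 1 − 0.99⁵, which is slightly smaller; the gap widens as the individual sizes grow.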

An optimization approach could proceed by putting an upper threshold α ∈ (0, 1) on either (5.2) or (5.3), and then finding the δ that minimizes R₂(δ, Q), or equivalently, maximizes ETP(δ, Q) ≡ Σ_{m ∈ ℳ₁(Q)} π_{δ_m}(Q), the latter quantity referred to as the expected number of true positives in [42]. The MDFs in [38] and [42] were both obtained through this program. The MDF in [38] is

δ_{SPJ} (α) = \underset{δ \in D_{0}}{arg max} {ETP (δ, Q_{1}) : EFP (δ, Q_{0}) \leq α},

(5.4)

where Q₀ ∈ 𝒬₀ and Q₁ ∈ 𝒬₁; whereas the optimal discovery procedure (ODP) in [42] is

δ_{STO} (α; Q) = \underset{δ \in D}{arg max} {ETP (δ, Q) : EFP (δ, Q) \leq α},

(5.5)

where Q is the true probability measure of X. The use of EFP as type I error measure in [42] enabled a calculus of variations optimization to obtain the ODP. This has a particularly interesting structure when we utilize as its input the vector of p-value statistics ($S_{m}^{*} (x_{m}, u_{m}) : m \in M$) from the MP MDP $Δ^{*} = (Δ_{m}^{*} : m \in M)$ with multiple decision size function $A_{Δ^{*}}^{*} = \{(A_{m}^{*} (η) : η \in [0, 1]) : m \in M\}$ and multiple decision ROC function $ρ_{Δ^{*}}^{*} = \{(ρ_{m}^{*} (η) : η \in [0, 1]) : m \in M\}$ and with $A_{m}^{*} (\cdot)$ and $ρ_{m}^{*} (\cdot)$ both differentiable with derivatives ${(A_{m}^{*})}^{'} (\cdot)$ and ${(ρ_{m}^{*})}^{'} (\cdot)$. The significance thresholding function 𝒮 : ([0, 1], σ [0, 1]) → (ℜ, σ (ℜ)) utilized in the ODP becomes

S (s; Q) = \frac{\sum_{m \in M_{1} (Q)} {(ρ_{m}^{*})}^{'} (s)}{\sum_{m \in M_{0} (Q)} {(A_{m}^{*})}^{'} (s)},

(5.6)

a consequence of Lemma 2 in [42] and Proposition 3.2. The ODP δ_STO = (δ_{m,STO} : m ∈ ℳ) has a single-thresholding structure with components

δ_{m, STO} (S_{m}^{*} (x_{m}, u_{m}); Q) = I {S (S_{m}^{*} (x_{m}, u_{m}); Q) \geq λ}, m \in M,

where λ ∈ [0, ∞) is chosen so the size constraint on EFP(δ_STO(α; Q), Q) is approximately satisfied. Observe that each of these components is still of simple-type, unless λ is determined in a data-dependent manner using the full data (x, u). Note also that δ_STO was derived under complete knowledge of the unknown Q, or more specifically, the sets ℳ₀(Q) and ℳ₁(Q), as can be seen in (5.6), hence is referred to as an oracle MDF. For the simple null versus simple alternative hypotheses case, the size functions $A_{m}^{*} (\cdot)$’s and the ROC functions $ρ_{m}^{*} (\cdot)$’s will be known, but with composite hypotheses they will be unknown. To implement δ_STO, it was proposed in [42, 43] that these unknown quantities, sets, functions, or significance thresholding function, be estimated using the data (x, u). This will make the estimated ODP of compound type. But note that through this plug-in approach the exact optimality property of the ODP need no longer hold for the estimated version; see also [13, 45]. In contrast, δ_SPJ is determined only by the two classes of extreme probability measures, 𝒬₀ and 𝒬₁, so the marginal probability measures, Q_m’s, are completely known, and not by the unknown true probability measure Q governing X. This fact was criticized in [42] as a “potentially problematic optimality” criterion. More importantly, it should be recognized that both δ_SPJ and δ_STO need not be the optimal weak or strong FWER- or FDR-controlling MDFs since the Bonferroni upper bound for R₀(δ, Q) utilized in their derivations is hardly a sharp upper bound.

The criticism leveled against δ_SPJ could also be invoked against our optimal weak FWER-controlling procedure since we also relied on a criterion determined only by the extreme classes 𝒬₀ and 𝒬₁. However, note that each component of the optimal weak FWER-controlling multiple decision size vector, and consequently each component of $δ_{W}^{*} (α)$, uses all of the Q_m₀’s and Q_m₁’s, analogously to the ODP, though the MDF $δ_{W}^{*} (α)$ is still neither adaptive nor compound. Our development of this simple MDF, which is optimal in the class 𝒟₀, is a prelude to our development of adaptive and compound MDFs strongly-controlling FWER and FDR. The MDF $δ_{W}^{*} (α)$ will be the anchor for these FWER and FDR strongly-controlling compound MDFs. These new MDFs are discussed in Section 6 for strong FWER-control and in Section 7 for FDR control. Our approach to obtaining these strongly-controlling MDFs is indirect, whereas that in [42] is direct. There is also an intrinsic difference in the problems considered since our focus is on the type I error risk functions R₀ and R₁, whereas in [38, 42] the simpler type I error metric of EFP was utilized. Looking forward, though our starting point is the optimal weak FWER-controlling simple MDF $δ_{W}^{*} (α)$, there is confidence in the viability of our indirect approach to generate good MDFs since we will establish later that both the sequential Šidák procedure and the BH procedure are special cases of our new MDFs under exchangeability.

5.2. Families with MLR property

The initial simplification to the simple null versus simple alternative hypotheses for each m ∈ ℳ could be perceived as a limitation because of the need to know the Q_m₁’s to determine the ROC functions. However, this approach, which was also implemented in [29, 38, 42], is natural and historically-justified by the Neyman–Pearson framework. We surmise that in this multiple decision problem, the solution to the simple null versus simple alternative hypotheses setting will play a prominent role in solving the composite hypotheses setting, since it appears that for an MDF to possess optimality, it will require knowledge, either in exact, approximate, or estimated forms, of the alternative hypotheses distributions. We touch on this aspect in the presence of the monotone likelihood ratio (MLR) property; see [24].

Suppose that for each m ∈ ℳ, the density function q_m belongs to a one-dimensional parametric family {q_m(·; ξ_m) : ξ_m ∈ Γ_m ⊂ ℜ} which possesses the MLR property. A typical pair of hypotheses to be tested would be $H_{m 0}^{*} : ξ_{m} \leq ξ_{m 0}$ versus $H_{m 1}^{*} : ξ_{m} > ξ_{m 0}$, where ξ_m₀ is known. With the MLR property, a uniformly most powerful (UMP) test function δ_m(X_m, U_m; η_m) of size η_m exists, with this UMP test identical to the MP test of size η_m for the simple null hypothesis H_m₀ : ξ_m = ξ_m₀ versus the simple alternative hypothesis H_m₁ : ξ_m = ξ_m₁, with ξ_m₁ > ξ_m₀. When dealing with the single-pair hypothesis testing problem, recall that exact knowledge of the value of ξ₁ is not necessary since the critical constants of the size-η MP test for H₀ : ξ = ξ₀ versus H₁ : ξ = ξ₁ can be made independent of ξ₁. In contrast, for the multiple decision problem, to determine the optimal size allocations for each of the M MP tests, the powers of the tests at the ξ_m₁’s are required, hence the need to know the values of the ξ_m₁’s. When M is large, such information may not be so forthcoming. The default procedure is the simplistic approach of assuming that (Q_m₀, Q_m₁) is invariant in m, which is the exchangeable setting. However, this exchangeable assumption is most likely wrong as a consequence of varied effect sizes or different test functions utilized. See, for instance, [11] for real situations where exchangeability does not hold. We propose two possible solutions to this dilemma.

The first approach is to solicit from the scientific investigator the values of the ξ_m₁’s for which the powers are of most interest. Such values may coincide with those that are scientifically different from the ξ_m₀’s. Such elicitation, which may not be very feasible in practice if M is large, but which may be made possible by forming subclasses or clusters of the M genes as in [11], amounts to specifying effect sizes. Formation of such clusters must be made in close consultation with the investigator, or perhaps guided by the result of a preliminary cluster analysis using data independent of that used in the decision functions. For the specified ξ_m₁’s, the ROC functions in the determination of the optimal weak FWER-controlling multiple size vector become $ρ_{m} (η) = π_{δ_{m}^{*} (η)} (ξ_{m 1})$ for m ∈ ℳ, where $δ_{m}^{*} (η)$ is the simple MP test of size η for testing H_m₀ : ξ_m = ξ_m₀ versus H_m₁ : ξ_m = ξ_m₁, and $π_{δ_{m}^{*} (η)} (ξ_{m 1})$ is the power of $δ_{m}^{*} (η)$ (at ξ_m = ξ_m₁). In the clustered situation with $M = ⊎_{k = 1}^{K} M_{k}$, we may denote by ρ̄_k(η) and ζ_k, respectively, the common ROC function and size for the decision functions in cluster ℳ_k. Under second-order differentiability of the ρ̄_k(η)’s, by Theorem 4.3, the optimal weak FWER-α controlling multiple size vector ζ(α) = (ζ₁(α), ζ₂(α), …, ζ_K(α)) is the ζ = (ζ₁, ζ₂, …, ζ_K) that solves the set of equations $\forall k = 1, 2, \dots, K : {\bar{ρ}}_{k}^{'} (ζ_{k}) (1 - ζ_{k}) = λ$ for some λ ∈ ℜ₊ with $\sum_{k = 1}^{K} ∣ M_{k} ∣ log (1 - ζ_{k}) = log (1 - α)$.

The second approach, analogous to those in [21, 30, 42, 43, 45, 49], is to estimate or approximate the underlying values of the ξ_m’s either using the observed data x, possibly via shrinkage-type estimators, or through the use of prior information which could be informed by external covariates as in [13]. Addressing this same restriction of requiring knowledge of the simple null and simple alternative hypotheses and advocating this second approach, [29], page 679, stated: “although leading to oracle procedures, it can be used in practice as soon as the null and alternative distributions are estimated or guessed reasonably accurately from independent data.” By “independent data,” [29] means data different from that used in performing the actual tests. However, such external data need not always be used for estimating or imputing the unknown parameters. For example, suppose that for each m ∈ ℳ, data x_m could be partitioned into (v_m, w_m). We may then use ξ̃_m(v_m) = max{ξ_m₀, ξ̂_m(v_m)}, where ξ̂_m(v_m) is the maximum likelihood estimate of ξ_m based on v_m, and proceed as in the preceding paragraph with ξ_m₁ set to ξ̃_m(v_m) for each m ∈ ℳ, and with the component data w_m used in the test functions. The resulting MDF will be of an adaptive type, possibly also compound as in [45] if shrinkage estimators are used for estimating the ξ_m’s using the v_m components. Observe that if for some m₀ ∈ ℳ, ξ̃_{m₀}(v_{m₀}) and ξ_{m₀ 0} are very close or identical, then a relatively small size will be allocated to the MP test for component m₀. This amounts to downgrading the testing problem for this component, a fact of importance since a criticism of multiple hypotheses testing, especially when using FDR, is that an unscrupulous investigator may keep adding irrelevant genes. 
When using the adaptive MDF arising from the optimal multiple decision size vector, this investigator’s strategy will backfire since the adaptive MDF will automatically downgrade the irrelevant genes. This second approach still requires deeper study. For instance, there is the issue of how to partition each x_m into the v_m and w_m components. Furthermore, the impact of a misspecified ξ_m₁, possibly arising from the estimation procedure, needs to be ascertained.

5.3. Connections to p-value statistics

Proposition 3.2 indicates that the ROC function η ↦ ρ_m(η) is differentiable if and only if the distribution function of the p-value statistic S_m(X_m, U_m) under H_m₁ : Q_m = Q_m₁ is differentiable. In this case, $ρ_{m}^{'} (\cdot)$ coincides with h_m(·), the density function of S_m(X_m, U_m) under H_m₁ : Q_m = Q_m₁. Condition (i) in Theorem 4.3 is equivalent to the constancy in m of h_m(η_m)(1− η_m). This is surprising since it indicates that it is not enough to simply find the sizes that maximize these h_m(·)’s, as dictated by the Neyman–Pearson lemma when dealing with a single pair of null and alternative hypotheses. Rather, in the multiple hypotheses testing scenario, there is attenuation in that larger sizes incur penalties. Condition (i) in Theorem 4.3 governs the interactions among the M tests regarding their size allocations to achieve the best overall result, in terms of overall type II error, among themselves.

The optimal weak FWER-controlling MDF can be converted to a procedure based on the p-value statistics. If $η^{*} (α) = (η_{m}^{*} (α), m \in M)$ is the optimal weak FWER-α multiple decision size vector and (S_m(x_m, u_m), m ∈ ℳ) is the vector of computed p-value statistics, the decision based on data (x, u) = ((x_m, u_m), m ∈ ℳ) is $δ^{*} (x, u) = (I \{S_{m} (x_{m}, u_{m}) \leq η_{m}^{*} (α)\}, m \in M)$, an MDF based on weighted p-values. This is related to the approach in several papers using weighted p-values such as [16, 21, 29, 30, 46]. In our case, the weights are tied to the optimal sizes.

6. Strong FWER control

Let $Δ^{*} = (Δ_{m}^{*}, m \in M)$ be the MP MDP with $Δ_{m}^{*} = (δ_{m}^{*} (η) : η \in [0, 1])$ the MP decision process for H_m₀ : Q_m = Q_m₀ versus H_m₁ : Q_m = Q_m₁ based on (X_m, U_m). Wlog, assume that the size function A_m(·) of $Δ_{m}^{*}$ satisfies A_m(η) = η. Define η : [0, 1] → [0, 1]^M such that η(α) = (η_m(α), m ∈ ℳ) is the optimal weak FWER-controlling multiple decision size vector at level α. Assume that each component of this mapping is nondecreasing and continuous, which is the case when the ROC functions of Δ^* are twice-differentiable as established in Proposition 4.5.

For a weak FWER threshold of α ∈ [0, 1], the optimal MDF in 𝒟₀ is $δ_{W}^{*} (α) = (δ_{m}^{*} (η_{m} (α)), m \in M)$, as given in (4.2). Associated with this MDF is the generalized multiple decision p-value statistic W = (W_m, m ∈ ℳ), where

W_{m} \equiv W_{m} (X_{m}, U_{m}) = inf {α \in [0, 1] : δ_{m}^{*} (η_{m} (α)) = 1} .

(6.1)

The w_m = W_m(x_m, u_m) is the smallest weak FWER size leading to rejection of H_m₀ when using $δ_{W}^{*} (α)$ given data (x, u) = ((x_m, u_m), m ∈ ℳ). The usual p-value statistic S_m [see (3.2)] for $δ_{m}^{*}$ is related to W_m via

\forall m \in M : S_{m} (X_{m}, U_{m}) = η_{m} (W_{m} (X_{m}, U_{m})) .

(6.2)
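In the exchangeable case the size map is the Šidák map η_m(α) = 1 − (1 − α)^{1/M}, and (6.1) can then be inverted in closed form: S_m ≤ η_m(α) holds exactly when α ≥ 1 − (1 − S_m)^M, so W_m = 1 − (1 − S_m)^M. The sketch below, with illustrative function names, checks relation (6.2) in this special case only; for non-identical ROC functions η_m(·) has no such closed form and W_m would have to be found numerically.

```python
def sidak_size(alpha, M):
    """Exchangeable size map: eta(alpha) = 1 - (1 - alpha)^(1/M)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / M)

def generalized_p_value(s, M):
    """W = inf{alpha : s <= eta(alpha)} = 1 - (1 - s)^M in the exchangeable case."""
    return 1.0 - (1.0 - s) ** M
```

Applying `sidak_size` to `generalized_p_value(s, M)` recovers s up to rounding, which is relation (6.2) for this size map.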

Now, à la [42, 45], suppose an Oracle knows Q, the true underlying probability measure of X. For the MDF $δ_{W}^{*} (α)$, its FWER is

R_{0} (δ_{W}^{*} (α), Q) = 1 - \prod_{m \in M} {[1 - η_{m} (α)]}^{1 - θ_{m} (Q)} .

This is nondecreasing and continuous in α since the mappings α ↦ η_m(α) for each m ∈ ℳ are nondecreasing and continuous. If the Oracle desires to control this type I error rate at a value q^* ∈ [0, 1] and also minimize the MDR given by $R_{2} (δ_{W}^{*} (α), Q) = ∣ M_{1} (Q) ∣ - \sum_{m \in M_{1} (Q)} ρ_{m} (η_{m} (α))$, where ρ_m(η_m(α)) is the power of $δ_{m}^{*} (η_{m} (α))$, then she should choose the largest α ∈ [0, 1] such that $R_{0} (δ_{W}^{*} (α), Q) \leq q^{*}$. Owing to the continuity and nondecreasing properties of $R_{0} (δ_{W}^{*} (α), Q)$ in α, the Oracle’s optimal α could also be expressed via

α^{†} (q^{*}; Q) = inf {α \in [0, 1] : \prod_{m \in M} {[1 - η_{m} (α)]}^{1 - θ_{m} (Q)} < 1 - q^{*}} .

However, there is no Oracle and Q is not known, else there is no multiple decision problem. Thus, α^†(q^*; Q) is not observable. A natural idea is to estimate the unknown θ_m(Q), the state of the mth pair of hypotheses. An intuitive and simple estimator of θ_m(Q) for a fixed value of α is

{\hat{θ}}_{m} (Q) = δ_{m}^{*} (η_{m} (α) -) \equiv δ_{m}^{*} (X_{m}, U_{m}; η_{m} (α) -) .

(6.3)

In turn, we obtain a step-down estimator α^†(q^*) ≡ α^†(X, U ; q^*) of the Oracle-based α^†(q^*; Q) given by

α^{†} (q^{*}) = inf {α \in [0, 1] : \prod_{m \in M} {[1 - η_{m} (α)]}^{1 - δ_{m}^{*} (η_{m} (α) -)} < 1 - q^{*}} .

(6.4)

This determines a compound MDF $δ_{S}^{*} (q^{*}) \equiv δ_{S}^{*} (X, U; q^{*}) \in D$ , where

δ_{S}^{*} (q^{*}) = (δ_{m}^{*} (η_{m} (α^{†} (q^{*}))), m \in M) .

(6.5)

By virtue of the optimal choice of the η_m(α)’s and the use of the MP tests, we expect $δ_{S}^{*} (q^{*})$ to possess excellent, if not optimal, MDR-properties. By taking the infimum over the weak FWER-size α coupled with the estimation of θ_m(Q) by $δ_{m}^{*} (η_{m} (α) -)$ in (6.4), there occurs an adaptive downweighting of components whose H_m₀’s are most likely correct as dictated by the data (x, u). Theorem 6.1 below establishes that $δ_{S}^{*} (q^{*})$ in (6.5) does strongly control the FWER.

Theorem 6.1

Let q^* ∈ [0, 1]. Then, ∀Q ∈ 𝒬, $R_{0} (δ_{S}^{*} (q^{*}), Q) \leq q^{*}$.

Next, we reexpress $δ_{S}^{*} (q^{*})$ in terms of the generalized p-value statistic W. This is achieved by defining the random variable

J^{†} (q^{*}) = max {j \in M : \prod_{m = i}^{M} [1 - η_{(m)} (W_{(i)})] \geq 1 - q^{*}, i = 1, 2, \dots, j} .

Since α^†(q^*) ∈ [W_{(J^†(q^*))}, W_{(J^†(q^*)+1)}), then

δ_{S}^{*} (q^{*}) = (δ_{m}^{*} (η_{m} (W_{(J^{†} (q^{*}))})), m \in M) .

The next result shows that the sequential step-down Šidák MDF, which strongly controls FWER, is a special case of $δ_{S}^{*} (q^{*})$ under exchangeability.

Proposition 6.1

If the M ROC functions are identical, then $δ_{S}^{*} (q^{*})$ coincides with the sequential Šidák step-down FWER-controlling MDF.
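To make the exchangeable special case concrete: with the Šidák sizes, relation (6.2) gives η_{(m)}(W_{(i)}) = S_{(i)} for every m, so the product condition defining J^†(q^*) reduces to (1 − S_{(i)})^{M−i+1} ≥ 1 − q^*, that is, S_{(i)} ≤ 1 − (1 − q^*)^{1/(M−i+1)} for i = 1, …, j. A minimal sketch of the resulting sequential Šidák step-down rule follows; the function name is illustrative and the input is assumed to be a list of ordinary p-values.

```python
def sidak_step_down(p_values, q):
    """Sequential Sidak step-down: returns a list of booleans (True = reject)."""
    M = len(p_values)
    order = sorted(range(M), key=lambda m: p_values[m])   # indices by increasing p-value
    reject = [False] * M
    for i, m in enumerate(order):                         # i = 0 is the smallest p-value
        threshold = 1.0 - (1.0 - q) ** (1.0 / (M - i))    # M - i hypotheses remain
        if p_values[m] > threshold:
            break                                         # stop at the first failure
        reject[m] = True
    return reject
```

For p-values (0.001, 0.2, 0.012, 0.03) and q^* = 0.05, the rule rejects the hypotheses with p-values 0.001 and 0.012 and stops at 0.03, whose threshold is 1 − 0.95^{1/2}.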

7. Strong FDR control

Assume the same framework as in Section 6. Our idea in obtaining an FDR-controlling MDF builds on the development of the BH MDF, specifically the rationale of Theorem 2 in [1]. Let q^* ∈ [0, 1] be the desired FDR threshold and Q be the underlying probability measure of X. We introduce two stochastic processes: T₀ = {T₀(α; Q) : α ∈ [0, 1]} and T = {T(α): α ∈ [0, 1]}, where

T_{0} (α; Q) = \sum_{m \in M_{0} (Q)} δ_{m}^{*} (η_{m} (α)) and T (α) = \sum_{m \in M} δ_{m}^{*} (η_{m} (α)) .

For the MDF $δ_{W}^{*} (α)$ , its FDR is

R_{1} (δ_{W}^{*} (α), Q) = E_{Q} {\frac{T_{0} (α; Q)}{T (α)} I {T (α) > 0}} .

By the definition of the generalized p-value statistics W_m’s in (6.1), we have for α ∈ [W_{(m)}, W_{(m+1)}) that T(α) = m, whereas

E_{Q} {T_{0} (α; Q)} = \sum_{m \in M} (1 - θ_{m} (Q)) η_{m} (α) \leq \sum_{m \in M} η_{m} (α) .

(7.1)

Focus now on an α ∈ [W_{(m)}, W_{(m+1)}). If Σ_{j ∈ ℳ} η_j(W_{(m)}) ≤ mq^*, then the best α in this interval will be the largest value satisfying Σ_{j ∈ ℳ} η_j(α) ≤ mq^*, since by increasing α, the MDR decreases as argued in the development of $δ_{S}^{*} (q^{*})$ in Section 6. This motivates our definition of α^*(q^*) = α^*(X, U; q^*) as the step-up estimator

α^{*} (q^{*}) = sup {α \in [0, 1] : \sum_{m \in M} η_{m} (α) \leq q^{*} \sum_{m \in M} δ_{m}^{*} (η_{m} (α))} .

(7.2)

This induces a compound MDF $δ_{F}^{*} (q^{*}) \equiv δ_{F}^{*} (X, U; q^{*}) \in D$ given by

δ_{F}^{*} (q^{*}) = (δ_{m}^{*} (η_{m} (α^{*} (q^{*}))), m \in M) .

(7.3)

Theorem 7.1 establishes that $δ_{F}^{*} (q^{*})$ does control the FDR at q^*. Interestingly, the proof of this theorem, which can be found in [28], employs a reverse martingale argument.

Theorem 7.1

Let q^* ∈ [0, 1]. If, ∀Q ∈ 𝒬 \ {Q₀} and ∀α ∈ (0, 1), |ℳ₀(Q)| max_{m ∈ ℳ₀(Q)} η_m(α) ≤ Σ_{m ∈ ℳ} η_m(α), then $R_{1} (δ_{F}^{*} (q^{*}), Q) \leq q^{*}$ for all Q ∈ 𝒬.

Some remarks are in order regarding the condition in Theorem 7.1. Clearly, the Šidák multiple decision size vector, which is the optimal multiple decision size vector when the ROC functions are identical, always satisfies this condition. When not in this exchangeable setting, this condition induces some control on the differences of the ROC functions. The next proposition establishes that the BH procedure is a special case of $δ_{F}^{*} (q^{*})$ under exchangeability.

Proposition 7.1

If the ROC functions are identical, then $δ_{F}^{*} (q^{*})$ is the FDR-q^* controlling MDF in [1].

Examination of the proof of Proposition 7.1 as presented in [28] shows that the BH MDF δ^BH(q^*) coincides with the Šidák-size based MDF δ^S(q^*). The martingale proof for Theorem 7.1 thus carries over to establishing FDR control by δ^BH(q^*). We mention that a martingale-based proof of FDR control by δ^BH(q^*) has also been presented in [44].

We also provide an alternative form of $δ_{F}^{*} (q^{*})$ in terms of the generalized p-value statistics W_m’s, a form analogous to the conventional formulation of the BH procedure. Define

J^{*} (q^{*}) \equiv J^{*} (X, U; q^{*}) = max {m \in M : \sum_{j \in M} η_{j} (W_{(m)}) \leq q^{*} m} .

(7.4)

Then, it is easy to see that $δ_{F}^{*} (q^{*})$ rejects H_{(m)0} for m ∈ {1, 2, …, J^*(q^*)} and accepts H_{(m)0} for m ∈ {J^*(q^*) + 1, J^*(q^*) + 2, …, M}.
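As an illustration of (7.4) in the exchangeable case: with the Šidák sizes, relation (6.2) gives η_j(W_{(m)}) = S_{(m)} for every j, so the defining inequality becomes M S_{(m)} ≤ q^* m, which is the familiar BH step-up condition. The sketch below implements only this special case; the function name is illustrative and the inputs are assumed to be ordinary p-values.

```python
def bh_step_up(p_values, q):
    """BH step-up rule: J = max{m : M * S_(m) <= q * m}; reject the J smallest."""
    M = len(p_values)
    order = sorted(range(M), key=lambda m: p_values[m])   # indices by increasing p-value
    J = 0
    for rank, m in enumerate(order, start=1):
        if M * p_values[m] <= q * rank:
            J = rank                                      # keep the largest qualifying rank
    reject = [False] * M
    for m in order[:J]:
        reject[m] = True
    return reject
```

Unlike a step-down rule, the loop does not stop at the first failure: a later order statistic that satisfies the inequality carries all smaller p-values with it.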

Finally, let us examine further the generalized p-value statistics W_m’s. Focusing on W₍₁₎, under Q₀, we have that, for a ∈ (0, 1),

P_{Q_{0}} (W_{(1)} > a) = P_{Q_{0}} {\underset{m \in M}{\cap} [δ_{m}^{*} (η_{m} (a)) = 0]} = \prod_{m \in M} [1 - η_{m} (a)] = 1 - a,

the second equality obtained by using the independence of the $δ_{m}^{*}$ ’s under Q₀. Thus, W₍₁₎ is standard uniform when all null hypotheses are correct. Using this uniformity result and Lemma D.2 presented in [28] dealing with lower and upper bounds of η_• for η ∈ U B(C_α), we obtain in Proposition 7.2 presented below a lower bound for $R_{1} (δ_{F}^{*} (q^{*}), Q_{0})$ , the FDR when all the null hypotheses are correct.

Proposition 7.2

∀q^* ∈ [0, 1], $1 - {(1 - q^{*} / M)}^{M} \leq R_{1} (δ_{F}^{*} (q^{*}), Q_{0}) \leq q^{*}$ .
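The band in Proposition 7.2 is narrow for the small thresholds used in practice. The lower bound 1 − (1 − q^*/M)^M decreases in M toward 1 − exp(−q^*), so under Q₀ the FDR of $δ_{F}^{*} (q^{*})$ always lies between 1 − exp(−q^*) and q^*. A short numerical sketch, with an illustrative function name, follows.

```python
import math

def fdr_lower_bound(q, M):
    """Lower bound of Proposition 7.2: 1 - (1 - q/M)^M."""
    return 1.0 - (1.0 - q / M) ** M

def limiting_lower_bound(q):
    """Limit of the lower bound as M grows: 1 - exp(-q)."""
    return 1.0 - math.exp(-q)

# Bounds for the values of M used in Section 8, at q* = 0.10
bounds = {M: fdr_lower_bound(0.10, M) for M in (20, 50, 100)}
```

At q^* = 0.10 each of these lower bounds sits between 1 − exp(−0.10) and 0.10, so under Q₀ the FDR is pinned to within about half a percentage point of the nominal threshold.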

8. A modest simulation

We compared through computer simulations the performances of $δ_{F}^{*}$ and δ^BH in terms of FDR and MDR. The simulation model utilized is similar to the Gaussian example illustrating the optimal weak FWER-controlling procedure in Section 4.4. In this model, the observables are X_m ~ N(μ_m, 1) for each m ∈ ℳ, which are independent of each other. The mth pair of hypotheses is H_m₀ : μ_m ≤ 0 versus H_m₁ : μ_m > 0. The UMP size-η_m test is $δ_{m}^{*} (X_{m}; η_{m}) = I \{X_{m} > Φ^{- 1} (1 - η_{m})\}$. The true values of the means μ_m’s are μ_m = ξ_mθ_m, m ∈ ℳ, with θ_m ~ Ber(p) and effect sizes ξ_m ~ |N(ν, 1)|, again independently generated from each other. The parameter combinations were induced by taking M ∈ {20, 50, 100}, p ∈ {0.1, 0.2, 0.4} and ν ∈ {1, 2, 4}. The FDR thresholds utilized were q^* ∈ {0.05, 0.10}. Since the computational implementation of $δ_{F}^{*}$ takes time, for each combination of (q^*, M, ν, p), we limited our simulations to 1,000 replications. The simulated FDR and MDR^* were the averages of the false discovery proportions, L₁(a, Q)’s, and the standardized missed discovery proportions, L₂(a, Q)/|ℳ₁(Q)|, over the 1,000 replications. We used this standardized MDR since, for each replicate, a Q is generated, hence |ℳ₁(Q)| differs over the replications. In essence, we are comparing the averages of $R_{2} (δ_{F}^{*}, Q) / ∣ M_{1} (Q) ∣$ and R₂(δ^BH, Q)/|ℳ₁(Q)|, where the averaging is with respect to the mechanism generating the Q’s over the simulation replications.
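The data-generating mechanism and the two summary measures can be sketched as follows for the BH procedure alone. This is a minimal illustration, not the simulation code behind Table 1: the function names and the seed are hypothetical, $δ_{F}^{*}$ is not implemented here, and setting the missed discovery proportion to zero when no alternative is true is a convention of this sketch.

```python
import random
from statistics import NormalDist

STD = NormalDist()

def simulate_replicate(M, p, nu, rng):
    """Draw the states theta_m and one-sided p-values for one replicate."""
    theta = [1 if rng.random() < p else 0 for _ in range(M)]         # theta_m ~ Ber(p)
    xi = [abs(rng.gauss(nu, 1.0)) for _ in range(M)]                 # xi_m ~ |N(nu, 1)|
    x = [rng.gauss(xi_m * th, 1.0) for xi_m, th in zip(xi, theta)]   # X_m ~ N(mu_m, 1)
    p_values = [1.0 - STD.cdf(x_m) for x_m in x]                     # S_m = 1 - Phi(X_m)
    return theta, p_values

def bh_reject(p_values, q):
    """BH step-up rule; returns a list of booleans (True = reject)."""
    M = len(p_values)
    order = sorted(range(M), key=lambda m: p_values[m])
    J = 0
    for rank, m in enumerate(order, start=1):
        if M * p_values[m] <= q * rank:
            J = rank
    reject = [False] * M
    for m in order[:J]:
        reject[m] = True
    return reject

def proportions(theta, reject):
    """False discovery proportion and standardized missed discovery proportion."""
    n_reject = sum(reject)
    n_alt = sum(theta)
    false_disc = sum(1 for th, r in zip(theta, reject) if r and th == 0)
    missed = sum(1 for th, r in zip(theta, reject) if th == 1 and not r)
    fdp = false_disc / n_reject if n_reject > 0 else 0.0
    mdp = missed / n_alt if n_alt > 0 else 0.0    # sketch convention when no alternative is true
    return fdp, mdp

def simulate_bh(M, p, nu, q, replications=1000, seed=0):
    """Average FDP and standardized MDP (in percent) over the replications."""
    rng = random.Random(seed)
    fdr = mdr = 0.0
    for _ in range(replications):
        theta, p_values = simulate_replicate(M, p, nu, rng)
        fdp, mdp = proportions(theta, bh_reject(p_values, q))
        fdr += fdp
        mdr += mdp
    return 100.0 * fdr / replications, 100.0 * mdr / replications
```

A call such as `simulate_bh(20, 0.1, 1.0, 0.10)` returns a pair (FDR, MDR^*) in percent for one parameter combination; the values will vary with the seed and need not reproduce the entries of Table 1.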

We only report results for q^* = 0.10 in Table 1 since results for q^* = 0.05 lead to similar conclusions. From this table, we observe that both $δ_{F}^{*}$ and δ^BH fulfill the FDR-constraint, and in a conservative manner, which is expected from theory. More importantly, the MDR-performance of $δ_{F}^{*}$ is better compared to that of δ^BH, with this dominance holding for all twenty-seven parameter combinations. Observe that as M is increased with (ν, p) remaining the same, there is an increase in their MDR^*’s; whereas, when ν is increased, which increases the effect sizes, their MDR^*’s decrease. Interestingly, the impact of a change of value in p, the proportion of true alternative hypotheses, did not necessarily translate into a monotone change in their MDR^*’s, especially when M = 20, though for the larger M-values, the change in MDR^* appears monotonically decreasing.

Table 1.

Comparison of the false discovery rate (FDR) and standardized missed discovery rate (MDR^*) performance of MDFs $δ_{F}^{*}$ and δ^BH under a variety of simulation parameters. This table is for q^* = 0.10. The FDR and MDR^* are in percentages. The number of replications is 1,000

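| q^* | M | ν | p | $δ_{F}^{*}$-FDR | $δ_{F}^{*}$-MDR^* | δ^BH-FDR | δ^BH-MDR^* |
|---|---|---|---|---|---|---|---|
| 0.1 | 20 | 1 | 0.1 | 8.03 | 70.80 | 8.43 | 72.64 |
| 0.1 | 20 | 1 | 0.2 | 7.55 | 79.64 | 8.77 | 81.99 |
| 0.1 | 20 | 1 | 0.4 | 6.05 | 77.47 | 6.65 | 80.30 |
| 0.1 | 20 | 2 | 0.1 | 7.70 | 54.42 | 8.43 | 55.80 |
| 0.1 | 20 | 2 | 0.2 | 7.39 | 56.32 | 7.59 | 57.31 |
| 0.1 | 20 | 2 | 0.4 | 6.47 | 47.82 | 6.21 | 49.38 |
| 0.1 | 20 | 4 | 0.1 | 9.14 | 8.62 | 9.48 | 10.30 |
| 0.1 | 20 | 4 | 0.2 | 7.80 | 7.34 | 6.97 | 9.20 |
| 0.1 | 20 | 4 | 0.4 | 6.15 | 3.58 | 5.65 | 5.53 |
| 0.1 | 50 | 1 | 0.1 | 8.83 | 84.87 | 9.26 | 87.05 |
| 0.1 | 50 | 1 | 0.2 | 7.11 | 83.49 | 7.14 | 86.65 |
| 0.1 | 50 | 1 | 0.4 | 6.45 | 78.91 | 6.42 | 82.30 |
| 0.1 | 50 | 2 | 0.1 | 8.36 | 63.36 | 8.99 | 65.04 |
| 0.1 | 50 | 2 | 0.2 | 8.74 | 57.30 | 8.73 | 58.93 |
| 0.1 | 50 | 2 | 0.4 | 5.80 | 48.71 | 5.93 | 50.21 |
| 0.1 | 50 | 4 | 0.1 | 8.84 | 10.28 | 8.93 | 12.09 |
| 0.1 | 50 | 4 | 0.2 | 7.93 | 6.91 | 7.81 | 8.79 |
| 0.1 | 50 | 4 | 0.4 | 6.34 | 3.40 | 6.07 | 5.68 |
| 0.1 | 100 | 1 | 0.1 | 9.14 | 87.10 | 9.02 | 90.02 |
| 0.1 | 100 | 1 | 0.2 | 8.21 | 84.05 | 8.78 | 87.38 |
| 0.1 | 100 | 1 | 0.4 | 5.92 | 80.12 | 5.88 | 83.73 |
| 0.1 | 100 | 2 | 0.1 | 9.79 | 66.10 | 9.24 | 67.93 |
| 0.1 | 100 | 2 | 0.2 | 7.68 | 58.25 | 7.94 | 59.93 |
| 0.1 | 100 | 2 | 0.4 | 5.74 | 49.29 | 6.10 | 50.90 |
| 0.1 | 100 | 4 | 0.1 | 8.37 | 10.44 | 8.62 | 12.36 |
| 0.1 | 100 | 4 | 0.2 | 7.72 | 5.93 | 7.81 | 8.22 |
| 0.1 | 100 | 4 | 0.4 | 5.69 | 3.80 | 6.14 | 5.72 |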


It may appear from this simulation study that the standardized improvement of $δ_{F}^{*}$ over δ^BH is minuscule. However, note that when translated to the overall number of discoveries, when M is large, $δ_{F}^{*}$ will lead to many more discoveries than δ^BH while still maintaining the desired FDR control. Such an increase in the number of discoveries may have important practical implications, such as enlarging the number of genes to be explored in subsequent studies. This may translate to enhanced chances of discovering crucial and important genes without sacrificing the type I error rate.

9. Summary and concluding remarks

This paper provides some resolution on the role of the individual powers of test or decision functions, more appropriately their ROC functions, in multiple hypotheses testing problems. The importance and relevance of these problems have arisen because of the proliferation of high-dimensional “large M, small n” data sets in the natural, medical, physical, economic and social sciences. Such data sets are being created or generated due to advances in high-throughput technology, the latter fueled by speedy developments in computer technology and miniaturization.

Almost a century ago, Neyman and Pearson demonstrated the need to take into account the power function and the alternative hypothesis configuration when seeking an optimal test procedure in single-pair hypothesis testing. Their work led to a divorce from the then-existing significance or p-value approach. Currently, many multiple hypotheses testing procedures, epitomized by the Šidák procedures for weak and strong FWER control and by the Benjamini–Hochberg (BH) procedure for FDR control, are based on the p-values of the individual tests and do not consider differences in the power traits of the individual tests. They are appropriate in so-called exchangeable settings wherein power characteristics of the individual tests are identical. Such settings, however, are more the exception than the rule, since nonidentical power characteristics easily arise due to differences in the effect sizes, the dispersion parameters, or the test functions that are employed.
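To make the p-value-based baselines concrete, the following minimal Python sketch computes the Šidák per-test size and applies the BH step-up rule. The function names and the example p-values are illustrative assumptions and are not taken from the paper. Note that neither function uses any information about the powers of the individual tests, which is the limitation discussed above.

```python
def sidak_level(alpha, m):
    """Per-test size eta such that 1 - (1 - eta)**m == alpha.

    With m independent tests whose nulls are all true, testing each
    at this size gives a family-wise error rate of exactly alpha.
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def benjamini_hochberg(pvalues, q):
    """Indices (in original order) rejected by the BH step-up rule at level q."""
    m = len(pvalues)
    # Sort hypothesis indices by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    # Step-up: find the LARGEST rank whose p-value is below rank * q / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values, used only for illustration.
example_pvalues = [0.01, 0.04, 0.03, 0.5]
rejected = benjamini_hochberg(example_pvalues, q=0.1)
per_test_size = sidak_level(alpha=0.05, m=10)
```

Both rules treat the p-values symmetrically: permuting the hypotheses permutes the decisions, so two tests with equal p-values but very different powers are always handled identically.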

This paper examined whether differences in power characteristics of the individual tests could be exploited to improve on existing procedures for FWER and FDR control. Procedures were developed under the historically most fundamental scenario where the null and the alternative hypotheses are simple. First, an optimal MDF within the class of simple MDFs was shown to exist for weak FWER control. This MDF is better than the Šidák weak FWER-controlling MDF, though the latter is a special case of the optimal MDF under exchangeability. Optimality also informs us of an optimal size-investing strategy. Second, by using this optimal, though still restricted, MDF as an anchor, a compound MDF strongly controlling FWER was obtained. The sequential Šidák MDF is a special case of this MDF under exchangeability. Third, we developed a compound MDF that controls FDR. The BH procedure obtains from this MDF under exchangeability. By construction, these new MDFs have smaller MDRs relative to those that did not exploit power differences. The improvement was demonstrated through a modest simulation study by comparing the new FDR-controlling MDF and the BH MDF.
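As a hedged illustration of how differences in power can be exploited under weak FWER control, consider two independent one-sided Gaussian tests of N(0,1) versus N(μ_m,1). A test of size η then has power Φ(Φ⁻¹(η) + μ_m), and when both nulls are true the family-wise error rate is 1 − (1 − η₁)(1 − η₂). The sketch below searches a grid of size pairs satisfying (1 − η₁)(1 − η₂) = 1 − α and compares the best pair with the equal-size Šidák allocation. This brute-force search is only a stand-in for the optimal size vector derived in the paper; the effect sizes, the grid resolution, the function names, and the use of average power as the objective are all assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power(size, effect):
    """Power of the one-sided size-`size` z-test of N(0,1) versus N(effect,1)."""
    if size <= 0.0:
        return 0.0
    if size >= 1.0:
        return 1.0
    # 1 - Phi(z_{1-size} - effect) rewritten as Phi(Phi^{-1}(size) + effect).
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(size) + effect)


def best_two_test_sizes(alpha, effects, grid=2000):
    """Grid search over (eta1, eta2) with (1 - eta1) * (1 - eta2) = 1 - alpha.

    Returns ((eta1, eta2, average power), average power of the Sidak pair).
    """
    sidak = 1.0 - (1.0 - alpha) ** 0.5
    # Include the Sidak size itself so the search can never do worse than it.
    candidates = [sidak] + [alpha * i / grid for i in range(grid + 1)]
    best = None
    for eta1 in candidates:
        # Solve the weak FWER constraint for the second size.
        eta2 = 1.0 - (1.0 - alpha) / (1.0 - eta1)
        avg = 0.5 * (power(eta1, effects[0]) + power(eta2, effects[1]))
        if best is None or avg > best[2]:
            best = (eta1, eta2, avg)
    sidak_avg = 0.5 * (power(sidak, effects[0]) + power(sidak, effects[1]))
    return best, sidak_avg


# Hypothetical effect sizes: one weak signal and one strong signal.
best, sidak_avg = best_two_test_sizes(alpha=0.05, effects=(0.5, 3.0))
```

Because the Šidák allocation is one of the candidates, the selected pair has average power at least as large as the Šidák pair while satisfying the same weak FWER constraint. How much is gained depends on how different the two effect sizes are.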

Though the proposed MDFs do improve on existing ones, we cannot claim that they are optimal among all compound MDFs for strong FWER or FDR control. This question of global optimality is a difficult and elusive one. So far none of the existing compound MDFs, such as the estimated ODP in [42], can claim global optimality. In our case, the possible drawback is that in constructing the new MDFs, we started with the class of simple MDFs. The resulting MDFs are indeed compound, but establishing global optimality is not transparent. A question even arises as to whether there truly exists an optimal MDF among all compound MDFs that, say, control FDR. One thing certain about our MDFs is that they do control FWER or FDR. This is in contrast to some MDFs that are obtained from oracle MDFs by plugging in estimates for unknown quantities. Even though the oracle MDFs, which are unimplementable, satisfy the type I error rate control, the plug-in step will usually invalidate such control. See [45], where optimality was in an asymptotic sense and with the type I error rate being the mFDR, as well as [13, 29] for more discussion of these issues.

A natural layer to add in the decision-theoretic formulation of the problem is a Bayesian layer where a prior measure is specified on the unknown probability measure Q or, alternatively, on θ(Q). There is a possibility that through this Bayesian approach, one may be able to obtain a characterization of the class of optimal MDFs controlling type I error rates, or when the two types of error rates are combined, for example, via a weighted linear combination. The papers [10, 11, 26, 33] which employ Bayes or empirical Bayes approaches are highly relevant on this front.

Finally, we mention that there are still other aspects of the multiple decision problem not dealt with in this paper. First is the extension to situations with composite null and alternative hypotheses. We indicated some ideas in Section 5.2 for distributional models possessing the MLR property, but further and more extensive studies are needed. Second are possible dependencies among the components in (X_m, m ∈ Inline graphic (Q)). We have assumed that this is an independent collection, but it is certainly of theoretical and applied relevance to examine dependent settings. Potential results in such scenarios will extend those in [2, 31, 32]. In these composite hypotheses and dependent data settings, we expect that resampling-based ideas and approaches, such as those in [47, 48], will be central.

Supplementary Material

Online Supplement

NIHMS585692-supplement-Online_Supplement.pdf (160.6 KB, PDF)

Acknowledgments

The first author is grateful to Dr. James Berger for facilitating his sabbatical leave visit at the Statistical and Applied Mathematical Sciences Institute (SAMSI) during Fall 2008 as this afforded him quality time for generating ideas relevant to this project. As such this work was partially supported by the National Science Foundation (NSF) under Grant DMS-0635449 to SAMSI. However, any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation. He is also grateful to Prof. Odd Aalen and Prof. Bo Lindqvist for facilitating his visits to the University of Oslo and the Norwegian University of Science and Technology (NTNU) which led to critical ideas for this project. The authors are highly grateful to the two reviewers, the Associate Editor and the Editors for their comments, suggestions and criticisms. We extend special thanks to Prof. Sanat Sarkar and Prof. Lan Wang for a careful reading of an earlier version of the manuscript, and we thank the following for comments or for pointing out references: Prof. J. Lynch, Dr. A. McLain, Prof. G. Rempala, Prof. J. Sethuraman, Prof. G. Taraldsen, Prof. A. Vidyashankar, Prof. L. Wasserman and Prof. P. Westfall. We also thank Dr. M. Peña for discussions about microarrays.

Footnotes

SUPPLEMENTARY MATERIAL

Supplement to “Power-Enhanced Multiple Decision Functions Controlling Family-Wise Error and False Discovery Rates” (DOI: 10.1214/10-AOS844SUPP; .pdf). The proofs of lemmas, propositions, theorems and corollaries are provided in this supplemental article [28].

References

1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Statist Soc Ser B. 1995;57:289–300.
2. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Statist. 2001;29:1165–1188.
3. Bonferroni C. Teoria statistica delle classi e calcolo delle probabilita. Publ R Instit Super Sci Econ Commere Firenze. 1936;8:1–62.
4. Cox DR, Hinkley DV. Theoretical Statistics. Chapman and Hall; London: 1974.
5. Dudoit S, Gilbert HN, van der Laan M. Technical report. Univ. California; Berkeley: 2007. Resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability and expected value error rates: Focus on the false discovery rate and simulation study.
6. Dudoit S, Shaffer JP, Boldrick JC. Multiple hypothesis testing in microarray experiments. Statist Sci. 2003;18:71–103.
7. Dudoit S, van der Laan MJ. Multiple Testing Procedures With Applications to Genomics. Springer; New York: 2008.
8. Efron B. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J Amer Statist Assoc. 2004;99:96–104.
9. Efron B. Size, power and false discovery rates. Ann Statist. 2007;35:1351–1377.
10. Efron B. Microarrays, empirical Bayes and the two-groups model. Statist Sci. 2008;23:1–22.
11. Efron B. Simultaneous inference: When should hypothesis testing problems be combined? Ann Appl Statist. 2008;2:197–223.
12. Efron B, Tibshirani R, Storey JD, Tusher V. Empirical Bayes analysis of a microarray experiment. J Amer Statist Assoc. 2001;96:1151–1160.
13. Ferkingstad E, Frigessi A, Rue H, Thorleifsson G, Kong A. Unsupervised empirical Bayesian multiple testing with external covariates. Ann Appl Statist. 2008;2:714–735.
14. Foster DP, Stine RA. α-investing: A procedure for sequential control of expected false discoveries. J R Stat Soc Ser B Stat Methodol. 2008;70:429–444.
15. Genovese C, Wasserman L. Operating characteristic and extensions of the false discovery rate procedure. J R Stat Soc Ser B Stat Methodol. 2002;64:499–517.
16. Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93:509–524.
17. Guindani M, Müller P, Zhang S. A Bayesian discovery procedure. J Roy Statist Soc Ser B. 2009;71:905–925. doi: 10.1111/j.1467-9868.2009.00714.x.
18. Habiger J, Peña EA. Randomized P-values and nonparametric procedures in multiple testing. J Nonparametr Stat. 2010:1–22. doi: 10.1080/10485252.2010.482154.
19. Ihaka R, Gentleman R. R: A language for data analysis and graphics. J Comput Graph Statist. 1996;5:299–314.
20. Jin J, Cai TT. Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J Amer Statist Assoc. 2007;102:495–506.
21. Kang G, Ye K, Liu N, Allison D, Gao G. Weighted multiple hypothesis testing procedures. Stat Appl Genet Mol Biol. 2009;8:1–21. doi: 10.2202/1544-6115.1437.
22. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663.
23. Langaas M, Lindqvist BH, Ferkingstad E. Estimating the proportion of true null hypotheses, with application to DNA microarray data. J R Stat Soc Ser B Stat Methodol. 2005;67:555–572.
24. Lehmann EL. Testing Statistical Hypotheses. 2nd ed. Springer; New York: 1997.
25. Lehmann EL, Romano JP, Shaffer JP. On optimality of stepdown and stepup multiple test procedures. Ann Statist. 2005;33:1084–1108.
26. Müller P, Parmigiani G, Robert C, Rousseau J. Optimal sample size for multiple testing: The case of gene expression microarrays. J Amer Statist Assoc. 2004;99:990–1001.
27. Neyman J, Pearson E. On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Ser A. 1933;231:289–337.
28. Peña E, Habiger J, Wu W. Supplement to “Power-enhanced multiple decision functions controlling family-wise error and false discovery rates”. 2010.
29. Roquain E, van de Wiel MA. Optimal weighting for false discovery rate control. Electron J Stat. 2009;3:678–711.
30. Rubin D, Dudoit S, van der Laan M. A method to increase the power of multiple testing procedures through sample splitting. Stat Appl Genet Mol Biol. 2006;5:Art 19, 20. doi: 10.2202/1544-6115.1148. (electronic)
31. Sarkar SK. Some probability inequalities for ordered MTP2 random variables: A proof of the Simes conjecture. Ann Statist. 1998;26:494–504.
32. Sarkar SK. Generalizing Simes’ test and Hochberg’s stepup procedure. Ann Statist. 2008;36:337–363.
33. Sarkar SK, Zhou T, Ghosh D. A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statist Sinica. 2008;18:925–945.
34. Schweder T, Spjøtvoll E. Plots of P-values to evaluate many tests simultaneously. Biometrika. 1982;69:493–502.
35. Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. J Statist Plann Inference. 2006;136:2144–2162.
36. Šidák Z. Rectangular confidence regions for the means of multivariate normal distributions. J Amer Statist Assoc. 1967;62:626–633.
37. Sorić B. Statistical “discoveries” and effect-size estimation. J Amer Statist Assoc. 1989;84:608–610.
38. Spjøtvoll E. On the optimality of some multiple comparison procedures. Ann Math Statist. 1972;43:398–411.
39. Stevenson RL. The Strange Case of Dr Jekyll and Mr Hyde. 1st ed. Longmans, Green and Co; London: 1886.
40. Storey J. A direct approach to false discovery rates. J R Stat Soc Ser B Stat Methodol. 2002;64:479–498.
41. Storey J. The positive false discovery rate: A Bayesian interpretation and the q-value. Ann Statist. 2003;31:2012–2035.
42. Storey J. The optimal discovery procedure: A new approach to simultaneous significance testing. J R Stat Soc Ser B Stat Methodol. 2007;69:347–368.
43. Storey J, Dai J, Leek J. The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics. 2007;8:414–432. doi: 10.1093/biostatistics/kxl019.
44. Storey JD, Taylor JE, Siegmund D. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J R Stat Soc Ser B Stat Methodol. 2004;66:187–205.
45. Sun W, Cai T. Oracle and adaptive compound decision rules for false discovery rate control. J Amer Statist Assoc. 2007;102:901–912.
46. Wasserman L, Roeder K. Technical report. Carnegie-Mellon Univ; 2006. Weighted hypothesis testing. Available at http://arxiv.org/abs/math.ST/0604172.
47. Westfall P, Troendle J. Multiple testing with minimal assumptions. Biom J. 2008;50:1–11. doi: 10.1002/bimj.200710456.
48. Westfall P, Young S. Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Wiley; New York: 1993.
49. Westfall PH, Krishen A, Young SS. Using prior information to allocate significance levels for multiple endpoints. Stat Med. 1998;17:2107–2119. doi: 10.1002/(sici)1097-0258(19980930)17:18<2107::aid-sim910>3.0.co;2-w.
