Author manuscript; available in PMC: 2014 Jul 10.
Published in final edited form as: Ann Stat. 2011 Feb;39(1):556–583. doi: 10.1214/10-aos844

POWER-ENHANCED MULTIPLE DECISION FUNCTIONS CONTROLLING FAMILY-WISE ERROR AND FALSE DISCOVERY RATES

Edsel A. Peña, Joshua D. Habiger, Wensong Wu
PMCID: PMC4091923  NIHMSID: NIHMS585692  PMID: 25018568

Abstract

Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini–Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual p-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures could be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures whose theoretical validity is contingent on each of these p-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional “large M, small n” data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.

Key words and phrases: Benjamini–Hochberg procedure, Bonferroni procedure, decision process, false discovery rate (FDR), family wise error rate (FWER), Lagrangian optimization, Neyman–Pearson most powerful test, microarray analysis, reverse martingale, missed discovery rate (MDR), multiple decision function and process, multiple hypotheses testing, optional sampling theorem, power function, randomized p-values, generalized multiple decision p-values, ROC function, Šidák procedure

1. Introduction and motivation

The advent of modern technology, epitomized by the microarray, has led to the generation of very high-dimensional data pertaining to characteristics of a large number, M, of attributes, hereon called genes, associated with usually a small number, n, of units or subjects. Several such data sets are, for example, described in [10], and these are the inputs to so-called parallel inference problems. The most common form of inference is multiple hypotheses testing, wherein for the mth gene there are two competing hypotheses, a null hypothesis Hm0 and an alternative hypothesis Hm1, for which a decision is to be made based on the data. In such multiple decision-making, there is a need to be cognizant and cautious of the Hyde-ian nature of multiplicity, while also exploiting the Jekyll-ian potentials of multiplicity [39]. Furthermore, this entails a tenuous balance between two competing desires: controlling the rate of rejection of correct null hypotheses, while at the same time maintaining the rate of discovery of correct alternative hypotheses.

As in single-pair hypothesis testing, a type I error occurs when a correct null hypothesis is rejected, while a type II error occurs when a false null hypothesis is not rejected. Several type I error rates have been proposed in multiple testing; see [6] and [7]. Our focus is on the weak family wise error rate (FWER), the probability of rejecting at least one null hypothesis when all the nulls are correct; strong FWER, the probability of rejecting at least one correct null hypothesis; and false discovery rate (FDR), the expected proportion of the number of false rejections of nulls relative to the number of rejections [1, 37]. Our type II error rate is the missed discovery rate (MDR), the expected number of false nonrejections of null hypotheses. Other type II error rates have been discussed in [5–7, 9, 41]. The usual framework in developing multiple decision functions is to bound the chosen type I error rate, and then minimize or make small the MDR. For example, a procedure controlling weak FWER, under an independence assumption, is that of Šidák [36]; while a conservative one not requiring independence is the Bonferroni procedure [3]. For FDR control, the most common procedure is the BH procedure [1]. Control of type I error measures related to the FDR has also been discussed in [8–10, 12, 15, 40, 41, 45], while [20, 23, 34] focused on estimation of the proportion of correct null hypotheses.

Procedures like the Šidák, Bonferroni and BH, rely on the set of p-values of individual tests. Their validity hinges on each p-value statistic being stochastically equal to or greater than a standard uniform variable under the null hypothesis. This fails, however, with noncontinuous variables or when rank-based nonparametric tests are used. Crucially, p-value based procedures also do not exploit the power characteristics of the individual tests, contrary to Neyman and Pearson’s [27] adage that such considerations are germane in constructing optimal tests. Such p-value based procedures are fine in exchangeable settings where power characteristics of the individual tests are identical, but not in situations where genes or subclasses of genes have different structures; see [11, 13, 29].

Some papers dealing with procedures exploiting the power functions are [38, 49]. The use of weighted p-values to improve type II performance has also been explored in [16, 21, 29, 30, 46]. Other approaches for optimal procedures are those in [42, 43] which employ a Neyman–Pearson approach and [45] where oracle and adaptive compound rules were obtained. Compound rules are characterized by information borrowing from each of the genes, so a decision function for a specific gene utilizes information from other genes. Decision-theoretic and Bayesian approaches were also implemented in [10, 17, 26, 33, 35]. More recently, [11] argues for separate subclass analysis, while [13] proposed use of external covariates, with the procedures having a Bayes and empirical Bayes flavor.

The main goal of this paper is to develop better multiple testing procedures controlling weak FWER, strong FWER and FDR by taking into account the individual powers of the tests. We focus on the most fundamental setting where the null and alternative hypotheses for each gene are both simple. This is also the setting in [29]. This admits, as a starting point, the Neyman–Pearson most powerful (MP) test for each pair of hypotheses. Each MP test will have a power, but we will see that it is beneficial to look at each of these powers as a function of the MP test's size, the so-called receiver operating characteristic (ROC) function.

The paper proceeds as follows. Section 2 presents the decision-theoretic elements. Section 3 reviews and reexamines MP tests, p-value statistics and ROC functions. Section 4 develops the optimal weak FWER-controlling procedure, with existence and uniqueness established in Section 4.2. Section 4.3 analytically describes the procedure for differentiable ROC functions. Section 4.4 provides a concrete example using normal distributions, while Section 4.5 discusses a size-investing strategy for optimality. Section 5 discusses limitations, extensions and connections: Section 5.1 deals with the restriction to the class of simple procedures; Section 5.2 deals with extensions to the composite hypotheses setting in the presence of the monotone likelihood ratio (MLR) property; and Section 5.3 relates the optimal procedure to weighted p-value based procedures. Section 6 develops an improved procedure which strongly controls the FWER, whereas Section 7 develops an improved procedure which controls FDR. The development of these new procedures is anchored on the weak FWER-controlling optimal procedure. We establish that the sequential Šidák and BH procedures are special cases of these more general procedures. Section 8 provides a modest simulation study demonstrating that the new FDR-controlling procedure improves on the BH procedure. Section 9 contains a summary and some concluding remarks.

To manage the length of the paper and provide more focus on the main ideas and results, technical proofs of lemmas, propositions, theorems and corollaries are all gathered in the supplemental article [28].

2. Mathematical setting

Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability space and $\mathcal{M} = \{1, 2, \ldots, M\}$ an index set with $M$ a known positive integer. For each $m \in \mathcal{M}$, let $X_m: (\Omega, \mathcal{F}) \to (\mathcal{X}_m, \mathcal{B}_m)$, with $\mathcal{X}_m$ some space with σ-field of subsets $\mathcal{B}_m$. Form the product space $(\mathcal{X}, \mathcal{B})$ with $\mathcal{X} = \times_{m \in \mathcal{M}} \mathcal{X}_m$ and $\mathcal{B} = \sigma(\times_{m \in \mathcal{M}} \mathcal{B}_m)$, so $X = (X_1, X_2, \ldots, X_M): (\Omega, \mathcal{F}) \to (\mathcal{X}, \mathcal{B})$. The probability measure of $X$ is $Q = \mathbf{P}X^{-1}$, while the (marginal) probability measure of $X_m$ is $Q_m = \mathbf{P}X_m^{-1}$. For each $m \in \mathcal{M}$, let $Q_{m0}$ and $Q_{m1}$ be two known probability measures on $(\mathcal{X}_m, \mathcal{B}_m)$. We assume that $Q \in \mathcal{Q}$, a class of probability measures on $(\mathcal{X}, \mathcal{B})$ with marginal probability measure $Q_m \in \{Q_{m0}, Q_{m1}\}$ for each $m \in \mathcal{M}$. Let $\theta = (\theta_1, \ldots, \theta_M): \mathcal{Q} \to \Theta \equiv \{0, 1\}^M$ with $\theta_m(Q) = I\{Q_m = Q_{m1}\}$, $I\{\cdot\}$ denoting the indicator function. Define, for each $Q \in \mathcal{Q}$, the subcollections $\mathcal{M}_0 \equiv \mathcal{M}_0(Q) = \{m \in \mathcal{M}: \theta_m(Q) = 0\}$ and $\mathcal{M}_1 \equiv \mathcal{M}_1(Q) = \{m \in \mathcal{M}: \theta_m(Q) = 1\}$. In this paper, we shall impose an independence condition given by:

Condition (I)

$(X_m, m \in \mathcal{M}_0(Q))$ is an independent collection of random entities, that is, $\forall B_m \in \mathcal{B}_m$, $Q\bigl(\times_{m \in \mathcal{M}_0(Q)} B_m\bigr) = \prod_{m \in \mathcal{M}_0(Q)} Q_m(B_m)$.

However, the collection $(X_m, m \in \mathcal{M}_1(Q))$ need not be an independent collection, but it is independent of $(X_m, m \in \mathcal{M}_0(Q))$. Two extreme subcollections of $\mathcal{Q}$ are $\mathcal{Q}_0 = \{Q \in \mathcal{Q}: \theta_m(Q) = 0, \forall m \in \mathcal{M}\}$ and $\mathcal{Q}_1 = \{Q \in \mathcal{Q}: \theta_m(Q) = 1, \forall m \in \mathcal{M}\}$. By Condition (I), $\mathcal{Q}_0$ is a singleton set, and $Q_0$ will denote its element, while $\mathcal{Q}_1$ need not be a singleton set. The decision problem is to determine $\mathcal{M}_0(Q)$ and $\mathcal{M}_1(Q)$ based on $X$, which is equivalent to simultaneously testing the $M$ pairs of hypotheses $H_{m0}: Q_m = Q_{m0}$ versus $H_{m1}: Q_m = Q_{m1}$ for $m \in \mathcal{M}$.

We adopt a decision-theoretic framework similar to [33]. The action space is $\mathcal{A} = \{0, 1\}^M$ with generic element $a = (a_1, a_2, \ldots, a_M)^t \in \mathcal{A}$, with $a_m = 0\,(1)$ meaning $H_{m0}$ is accepted (rejected). The parameter space is $\mathcal{Q}$, though the effective parameter space is $\Theta = \{0, 1\}^M$ with generic element $\theta = (\theta_1, \theta_2, \ldots, \theta_M)^t$. We introduce several loss functions, $L: \mathcal{A} \times \mathcal{Q} \to \Re_+$, defined via

$$L_0(a, Q) = I\{a^t(\mathbf{1} - \theta(Q)) \ge 1\}; \qquad (2.1)$$
$$L_1(a, Q) = \left[\frac{a^t(\mathbf{1} - \theta(Q))}{a^t\mathbf{1}}\right] I\{a^t\mathbf{1} > 0\}; \qquad (2.2)$$
$$L_2(a, Q) = (\mathbf{1} - a)^t\theta(Q), \qquad (2.3)$$

with the convention that 0/0 = 0 and $\mathbf{1}$ is an $M \times 1$ vector of 1's. The loss function $L_0(a, Q)$ equals 1 if and only if at least one false discovery is committed. The loss $L_1(a, Q)$ is the false discovery proportion, being the ratio between the number of false discoveries and the number of discoveries; whereas the loss $L_2(a, Q)$ is the number of missed discoveries, being the number of true alternative hypotheses that were not discovered. We focus on this missed discovery number since the relevant question is: how many of the correct alternatives, of which there are $\theta(Q)^t\mathbf{1}$, were missed by using the action $a$? See also [29] which essentially uses this loss function to induce their power metric. Other types of losses, such as the false negative proportion with $(a, Q) \mapsto \{[(\mathbf{1} - a)^t\theta(Q)]/[(\mathbf{1} - a)^t\mathbf{1}]\}\, I\{(\mathbf{1} - a)^t\mathbf{1} > 0\}$, have also been considered; see [15, 33].
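The three losses in (2.1)–(2.3) can be evaluated directly for a given action and truth vector. The following is a minimal sketch, not part of the original development; the function name `losses` and the example vectors are hypothetical, and both inputs are assumed to be 0/1 lists of equal length.

```python
def losses(a, theta):
    """Return (L0, L1, L2) of (2.1)-(2.3) for 0/1 vectors a (actions) and theta (truth)."""
    false_discoveries = sum(ai * (1 - ti) for ai, ti in zip(a, theta))  # a^t (1 - theta)
    discoveries = sum(a)                                                # a^t 1
    missed = sum((1 - ai) * ti for ai, ti in zip(a, theta))             # (1 - a)^t theta
    l0 = 1 if false_discoveries >= 1 else 0
    # Convention 0/0 = 0: no discoveries means a false discovery proportion of zero.
    l1 = false_discoveries / discoveries if discoveries > 0 else 0.0
    l2 = missed
    return l0, l1, l2


# Hypothetical example: five genes, three rejections, one of them a false discovery.
a = [1, 1, 0, 0, 1]
theta = [1, 0, 0, 1, 1]
print(losses(a, theta))
```

In this example one of the three rejections is false and one true alternative is not rejected, so the false discovery proportion is one third and the number of missed discoveries is one.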

A nonrandomized multiple decision function (MDF) is a $\delta: (\mathcal{X}, \mathcal{B}) \to (\mathcal{A}, \sigma(\mathcal{A}))$, where $\sigma(\mathcal{A})$ is the power set of $\mathcal{A}$. Such an MDF may be represented by $\delta(x) = (\delta_1(x), \delta_2(x), \ldots, \delta_M(x))^t$, where $\delta_m(x) \in \{0, 1\}$. In general, each $\delta_m$ could be made to depend on the full data $x$ instead of just $x_m$. We denote by $\mathcal{D}$ the class of all nonrandomized MDFs. A randomized MDF may also be considered. Denote by $\mathcal{P}(\mathcal{A})$ the space of all probability measures over $(\mathcal{A}, \sigma(\mathcal{A}))$. A randomized MDF is a $\delta^*: (\mathcal{X}, \mathcal{B}) \to (\mathcal{P}(\mathcal{A}), \sigma(\mathcal{P}(\mathcal{A})))$. For a realization $X = x$, an action is chosen from $\mathcal{A}$ according to the probability measure $\delta^*(x)$. Denote by $\mathcal{D}^*$ the space of all randomized MDFs. Clearly, $\mathcal{D} \subset \mathcal{D}^*$. By augmenting the data $X$ with a randomizer $U \sim U(0, 1)$ which is independent of $X$, randomized MDFs could be made nonrandomized with respect to the augmented data $(X, U)$. Henceforth, $\mathcal{D}$ represents all nonrandomized MDFs $\delta(X, U)$ based on $(X, U)$.

For brevity of notation, $P_Q\{f(X, U) \in B\}$ and $E_Q\{f(X, U)\}$ represent probability and expectation with respect to $(X, U)$ with $X \sim Q$, $U \sim U(0, 1)$ and $X$ and $U$ independent. For $\delta \in \mathcal{D}$ and the loss functions defined earlier, we have the risk functions

$$R_0(\delta, Q) = E_Q\{L_0(\delta(X, U), Q)\}; \qquad (2.4)$$
$$R_1(\delta, Q) = E_Q\{L_1(\delta(X, U), Q)\}; \qquad (2.5)$$
$$R_2(\delta, Q) = E_Q\{L_2(\delta(X, U), Q)\}. \qquad (2.6)$$

Given a $\delta = (\delta_1, \delta_2, \ldots, \delta_M)^t$, let $\pi_\delta(Q) = (\pi_{\delta_1}(Q), \pi_{\delta_2}(Q), \ldots, \pi_{\delta_M}(Q))^t$ with $\pi_{\delta_m}(Q) = E_Q\{\delta_m(X, U)\}$ be its vector of power functions. Then (2.6) becomes $R_2(\delta, Q) = (\mathbf{1} - \pi_\delta(Q))^t\theta(Q)$. In terms of these risk functions, for $\delta \in \mathcal{D}$, its weak FWER is $\mathrm{FWER}(\delta) = R_0(\delta, Q_0)$. If each $\delta_m$ depends only on $X_m$ and $U$, by Condition (I),

$$\mathrm{FWER}(\delta) = 1 - E\Bigl\{\prod_{m \in \mathcal{M}} \bigl[1 - P_{Q_{m0}}\{\delta_m(X_m, U) = 1 \mid U\}\bigr]\Bigr\}, \qquad (2.7)$$

where the expectation is with respect to $U$. When $Q = Q_0$ and with the $m$th component $\delta^*_m$ of the randomized MDF depending only on $X_m$, an alternative formulation is to have $U = (U_1, U_2, \ldots, U_M)$ a vector of i.i.d. $U(0, 1)$ variables which is independent of the $X_m$'s. The $m$th component may then be re-defined via $\delta_m(X_m, U_m) = I\{U_m \le \delta^*_m(X_m)\}$. Then (2.7) becomes $\mathrm{FWER}(\delta) = 1 - \prod_{m \in \mathcal{M}} [1 - P_{Q_{m0}}\{\delta_m(X_m, U_m) = 1\}]$.

The risk function $R_1(\delta, Q)$ is the false discovery rate (FDR) of $\delta$ at $Q$ [1]; while the risk function $R_2(\delta, Q)$ will be called the missed discovery rate (MDR) of $\delta$ at $Q$. The adjective “rate” is somewhat misleading since $R_2(\delta, Q)$ takes values in $[0, |\mathcal{M}_1(Q)|]$ instead of $[0, 1]$; however, this does not cause difficulty since, given the true underlying probability measure $Q$ of $X$, $|\mathcal{M}_1(Q)|$ is constant. This risk is related to the expected number of true positives (ETP), an error measure used in [38, 42], via $\mathrm{ETP}(\delta, Q) = |\mathcal{M}_1(Q)| - R_2(\delta, Q)$.

To find an optimal MDF weakly controlling FWER in a subclass $\mathcal{D}_0 \subseteq \mathcal{D}$, a threshold $\alpha \in (0, 1)$ is specified and then we seek a $\delta^* \in \mathcal{D}_0$ with $R_0(\delta^*, Q_0) = \mathrm{FWER}(\delta^*) \le \alpha$, and such that for any $\delta \in \mathcal{D}_0$ satisfying $R_0(\delta, Q_0) = \mathrm{FWER}(\delta) \le \alpha$, we have $\sup_{Q \in \mathcal{Q}} R_2(\delta^*, Q) \le \sup_{Q \in \mathcal{Q}} R_2(\delta, Q)$. This criterion has a minimax flavor. One may require only that $R_2(\delta^*, Q^*) \le R_2(\delta, Q^*)$ where $Q^*$ is the true, but unknown, probability law of $X$; but this requirement may be so strong as to preclude a solution to the optimization problem. However, see [42] for a situation with a different type I error and where an optimal, albeit an oracle, solution for minimizing $R_2(\delta, Q^*)$ is possible. Observe that for $\delta \in \mathcal{D}_0$, by using the representation of $R_2(\delta, Q)$ in terms of the powers, $\sup_{Q \in \mathcal{Q}} R_2(\delta, Q) = \sup_{Q \in \mathcal{Q}_1} R_2(\delta, Q) = M - \inf_{Q \in \mathcal{Q}_1} \sum_{m \in \mathcal{M}} \pi_{\delta_m}(Q)$. The optimality condition on the MDR amounts therefore to maximizing $\sum_{m \in \mathcal{M}} \pi_{\delta_m}(Q_{m1})$. Interestingly, if we had standardized the loss function $L_2(a, Q)$ to take values in $[0, 1]$ via division by $|\mathcal{M}_1(Q)| = \theta(Q)^t\mathbf{1}$, the minimax justification would not carry through!

For strong FWER control, we seek a compound MDF, $\delta^* \in \mathcal{D}$, with $R_0(\delta^*, Q^*) \le \alpha$ whatever the true, but unknown, probability law $Q^*$ of $X$ is, and with $\sum_{m \in \mathcal{M}} \pi_{\delta^*_m}(Q_{m1})$ large, possibly maximal, among all $\delta \in \mathcal{D}$ satisfying $R_0(\delta, Q^*) \le \alpha$. For (strong) FDR-control, a threshold $q^* \in (0, 1)$ is specified and we seek a compound MDF, $\delta^* \in \mathcal{D}$, such that, whatever $Q^*$ is, $R_1(\delta^*, Q^*) \le q^*$, and with $\sum_{m \in \mathcal{M}} \pi_{\delta^*_m}(Q_{m1})$ large, possibly maximal, among all $\delta \in \mathcal{D}$ satisfying $R_1(\delta, Q^*) \le q^*$. For discussion of weak and strong control, refer to [6, 7]. Discussion of optimality in multiple testing can be found in [25] where maximin optimality results are established for some step-down and step-up MTPs.

3. Revisiting MP tests and p-value statistics

An MDF $\delta \in \mathcal{D}$ whose $m$th component $\delta_m$ depends only on $(X_m, U_m)$ for every $m \in \mathcal{M}$ is called simple; otherwise, it is compound. The subclass of simple MDFs, denoted by $\mathcal{D}_0$, will be our initial focus in searching for an optimal weak FWER-controlling MDF. The resulting optimal MDF will then anchor our search for strong FWER- and FDR-controlling compound MDFs. Before implementing this program, we introduce the unifying concept of decision processes.

3.1. Decision processes, ROC functions, p-value statistics

First, a brief review. Let $X: (\Omega, \mathcal{F}) \to (\mathcal{X}, \mathcal{B})$ and $Q = \mathbf{P}X^{-1}$. Based on $X$, consider testing the pair of hypotheses $H_0: Q = Q_0$ versus $H_1: Q = Q_1$, where $Q_0$ and $Q_1$ are two probability measures on $(\mathcal{X}, \mathcal{B})$. Let $q_0$ and $q_1$ be versions of the densities of $Q_0$ and $Q_1$ with respect to some fixed dominating measure $\nu$, for example, $\nu = Q_0 + Q_1$. Recall that a test or decision function is a $\delta: (\mathcal{X}, \mathcal{B}) \to ([0, 1], \sigma[0, 1])$, with $\sigma[0, 1]$ the Borel sigma-field on $[0, 1]$. Given $X = x$, $\delta(x)$ is the probability of deciding in favor of $H_1$. Its size is $\alpha_\delta = E_{Q_0}\delta(X)$; it is of level $\alpha \in [0, 1]$ if $\alpha_\delta \le \alpha$. Its power is $\pi_\delta = E_{Q_1}\delta(X)$. A test $\delta^*$ is most powerful (MP) of level $\alpha$ if $\alpha_{\delta^*} \le \alpha$ and for all $\delta$ with $\alpha_\delta \le \alpha$, we have $\pi_{\delta^*} \ge \pi_\delta$.

Definition 3.1

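A collection $\Delta = \{\delta_\eta: \eta \in [0, 1]\}$ of test functions such that, a.e. [Q], $\delta_0(x) = 0$, $\delta_1(x) = 1$ and $\eta \mapsto \delta_\eta(x)$ is nondecreasing and right-continuous, is a decision process. Its size function is $A_\Delta: [0, 1] \to [0, 1]$ and its power function is $\rho_\Delta: [0, 1] \to [0, 1]$, where $A_\Delta(\eta) = \alpha_{\delta_\eta} = E_{Q_0}\delta_\eta(X)$ and $\rho_\Delta(\eta) = \pi_{\delta_\eta} = E_{Q_1}\delta_\eta(X)$. Its receiver operating characteristic (ROC) curve is $\mathrm{ROC}(\Delta) \equiv \mathrm{Graph}\{(A_\Delta(\eta), \rho_\Delta(\eta)): \eta \in [0, 1]\}$. If $A_\Delta(\eta) = \eta$ for all $\eta \in [0, 1]$, then $\eta \mapsto \rho_\Delta(\eta)$ is the ROC function of $\Delta$.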

The use of the phrase power function in Definition 3.1 is atypical since we are not viewing this as a function of a parameter as is the usual meaning of this phrase. However, for lack of a better name, we shall adopt this terminology. In the sequel, $\delta_\eta$ and $\delta(\eta)$ will be used interchangeably to also represent $\delta(\cdot\,; \eta)$.

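Let $L: (\mathcal{X}, \mathcal{B}) \to (\Re_+, \sigma(\Re_+))$ be a version of the likelihood ratio function: $L(x) = q_1(x)/q_0(x)$ a.e. [ν]. Let $G_0(\cdot)$ and $G_1(\cdot)$ be the distribution functions of $L(X)$ when $\mathcal{L}(X) = Q_0$ and $\mathcal{L}(X) = Q_1$, where $\mathcal{L}(X)$ is the probability measure of $X$. For a monotone nondecreasing right-continuous function $M(\cdot)$ from $\Re$ into $\Re$, let $M^{-1}(r) = \inf\{x \in \Re: M(x) \ge r\}$ and $\Delta M(r) = M(r) - M(r-)$. By the Neyman–Pearson fundamental lemma [27], the MP test function of level $\eta$ for testing $H_0$ versus $H_1$ is

$$\delta^*(X; \eta) \equiv \delta^*_\eta = I\{L(X) > c(\eta)\} + \gamma(\eta) I\{L(X) = c(\eta)\}, \qquad (3.1)$$

where $c(\eta) = G_0^{-1}(1 - \eta)$ and $\gamma(\eta) = (G_0(c(\eta)) - (1 - \eta))/\Delta G_0(c(\eta))$. Let $U \sim U(0, 1)$ be independent of $X$. Redefine $\delta^*$ via $\delta^*_\eta \equiv \delta^*(X, U; \eta) = I\{U \le \delta^*(X; \eta)\}$, which is nonrandomized w.r.t. $(X, U)$. In essence, with the aid of an auxiliary randomizer $U$, the MP test could always be made nonrandomized. The decision process formed from these MP tests, given by

$$\Delta^* = \{\delta^*(X; \eta): \eta \in [0, 1]\} = \{\delta^*(X, U; \eta): \eta \in [0, 1]\}, \qquad (3.2)$$

is called the most powerful (MP) decision process. The power (at $Q = Q_1$) of the MP test, in its randomized form $\delta^*(X; \eta)$ or its nonrandomized form $\delta^*(X, U; \eta)$, is

$$\rho_{\Delta^*}(\eta) \equiv \pi_{\delta^*(X; \eta)} = \pi_{\delta^*(X, U; \eta)} = 1 - G_1(c(\eta)) + \gamma(\eta)\Delta G_1(c(\eta)). \qquad (3.3)$$

It is well known [24] that $\pi_{\delta^*_\eta} < 1$ implies $\alpha_{\delta^*_\eta} = \eta$. We denote by $A_{\Delta^*}$ and $\rho_{\Delta^*}$ the size and power functions of $\Delta^*$. If $\pi_{\delta^*_\eta} < 1$ for all $\eta < 1$, then $\eta \mapsto \rho_{\Delta^*}(\eta)$ is the ROC function of $\Delta^*$. We present below some important properties of this function.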

Before stating the proposition, we reiterate that all formal proofs of propositions, theorems, lemmas and corollaries are in the supplemental article [28].

Proposition 3.1

The function $\rho_{\Delta^*}: [0, 1] \to [0, 1]$ in (3.3) is concave, continuous and nondecreasing. Furthermore, $\rho_{\Delta^*}(\eta) \ge \eta$ and it is strictly increasing on the set $\{\eta \in [0, 1]: \rho_{\Delta^*}(\eta) < 1\}$.

Definition 3.2

Let $\Delta = \{\delta_\eta: \eta \in [0, 1]\}$ be a decision process, where $\delta_\eta: (\mathcal{X} \times [0, 1], \mathcal{B} \otimes \sigma[0, 1]) \to (\{0, 1\}, \sigma\{0, 1\})$. Its (randomized) p-value statistic is $S_\Delta: (\mathcal{X} \times [0, 1], \mathcal{B} \otimes \sigma[0, 1]) \to ([0, 1], \sigma[0, 1])$ with $S_\Delta(x, u) = \inf\{\eta \in [0, 1]: \delta_\eta(x, u) = 1\}$.

When $\delta_\eta(x, u) = \delta_\eta(x)$ for all $(\eta, x, u)$, then $S_\Delta(X, U)$ is the usual p-value statistic. See also [4] for a more specialized definition of a randomized p-value statistic. We refer the reader to [18] for properties of this p-value statistic and its use in existing FDR-controlling procedures.

Proposition 3.2

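Let $\Delta = \{\delta_\eta: \eta \in [0, 1]\}$ be a decision process with p-value statistic $S_\Delta$. Then, for all $s \in [0, 1]$, $H_0(s) \equiv P_{Q_0}(S_\Delta \le s) = A_\Delta(s)$ and $H_1(s) \equiv P_{Q_1}(S_\Delta \le s) = \pi_{\delta(s)} = \rho_\Delta(s)$. Consequently, $S_\Delta \sim U[0, 1]$ under $\mathcal{L}(X) = Q_0$ if and only if $A_\Delta(\eta) = \eta$ for all $\eta \in [0, 1]$.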
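To make Definition 3.2 and Proposition 3.2 concrete for discrete data, the sketch below works out the randomized p-value of the MP decision process in a hypothetical binomial setting, which is an assumption introduced here for illustration and not an example from the paper: $X \sim \mathrm{Bin}(n, p)$ with $H_0: p = p_0$ versus $H_1: p = p_1$ and $p_1 > p_0$, so that the likelihood ratio is increasing in $x$. Under that assumption, (3.1) together with the randomizer gives $S_{\Delta^*}(x, u) = P_0(X > x) + u\,P_0(X = x)$. The function names are hypothetical.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability mass P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def randomized_p_value(x, u, n, p0):
    """S(x, u) = P0(X > x) + u * P0(X = x): smallest size at which the test rejects."""
    upper_tail = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    return upper_tail + u * binom_pmf(x, n, p0)


def null_cdf_of_p_value(s, n, p0):
    """Exact P0(S <= s), obtained by integrating out U ~ U(0, 1) for each x."""
    total = 0.0
    for x in range(n + 1):
        mass = binom_pmf(x, n, p0)
        upper_tail = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
        # P(U <= (s - upper_tail) / mass), clipped to [0, 1].
        fraction = min(1.0, max(0.0, (s - upper_tail) / mass))
        total += mass * fraction
    return total


# Hypothetical check of uniformity under the null with n = 5 and p0 = 0.3.
for s in (0.01, 0.05, 0.5, 0.9):
    print(s, null_cdf_of_p_value(s, 5, 0.3))
```

Setting $u = 1$ recovers the usual nonrandomized p-value $P_0(X \ge x)$, which is stochastically larger than uniform under the null; the randomized version has exactly $P_0(S \le s) = s$, in line with Proposition 3.2 when $A_\Delta(\eta) = \eta$.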

4. Optimal weak FWER control

Return now to the multiple decision problem in Section 2. We extend the notion of decision processes to the multiple decision setting.

Definition 4.1

A collection $\Delta = (\Delta_m: m \in \mathcal{M})$, where $\Delta_m = (\delta_m(\eta): \eta \in [0, 1])$ is a decision process on $(\mathcal{X} \times [0, 1]^M, \mathcal{B} \otimes \sigma[0, 1]^M)$, is a multiple decision process (MDP). It is simple if each $\Delta_m$ is simple; otherwise, it is compound. When simple, its multiple decision size function is $A_\Delta = (A_{\Delta_m}: m \in \mathcal{M})$ and its multiple decision ROC function is $\rho_\Delta = (\rho_{\Delta_m}: m \in \mathcal{M})$, where $A_{\Delta_m}$ and $\rho_{\Delta_m}$ are the size and ROC functions of $\Delta_m$.

4.1. Optimization problem

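Let $\Delta$ be a simple MDP. Then, a multiple decision size vector $\eta = (\eta_m: m \in \mathcal{M}) \in \mathcal{N} \equiv [0, 1]^M$ determines from $\Delta$ an MDF $\delta_\Delta(\eta) = (\delta_m(\eta_m): m \in \mathcal{M}) \in \mathcal{D}_0$. For this MDF, $\mathrm{FWER}(\delta_\Delta(\eta)) = 1 - \prod_{m \in \mathcal{M}} [1 - A_{\Delta_m}(\eta_m)]$ and $R_2(\delta_\Delta(\eta), Q_1) = M - \sum_{m \in \mathcal{M}} \rho_{\Delta_m}(\eta_m)$ for $Q_1 \in \mathcal{Q}_1$. Fix an FWER-threshold $\alpha \in (0, 1)$. Suppose there exists a multiple decision size vector $\eta^*_\Delta(\alpha) \in \mathcal{N}$ such that

$$\eta^*_\Delta(\alpha) = \arg\max_{\eta \in \mathcal{N}} \Bigl\{\sum_{m \in \mathcal{M}} \rho_{\Delta_m}(\eta_m): \prod_{m \in \mathcal{M}} [1 - A_{\Delta_m}(\eta_m)] \ge 1 - \alpha\Bigr\}.$$

Then, $A_\Delta(\eta^*_\Delta(\alpha)) = (A_{\Delta_m}(\eta^*_{\Delta,m}(\alpha)): m \in \mathcal{M})$ is the optimal multiple decision size vector for weak FWER control at $\alpha$ associated with the simple MDP $\Delta$. The associated optimal simple MDF is $\delta_\Delta(A_\Delta(\eta^*_\Delta(\alpha)))$.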

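But, since $H_{m0}$ and $H_{m1}$ are both simple, there exists a simple most powerful MDP, $\Delta^* = (\Delta^*_m: m \in \mathcal{M})$, where $\Delta^*_m = (\delta^*_m(\eta): \eta \in [0, 1])$ with $\delta^*_m(\eta)$ being the simple Neyman–Pearson MP test function of size $\eta$ for $H_{m0}$ versus $H_{m1}$. Consider the simple MDF obtained from $\Delta^*$ given by $(\delta^*_m(A_{\Delta_m}(\eta^*_{\Delta,m}(\alpha))): m \in \mathcal{M})$. This will satisfy the FWER constraint, and by virtue of the MP property of each $\delta^*_m(A_{\Delta_m}(\eta^*_{\Delta,m}(\alpha)))$ for each $m \in \mathcal{M}$,

$$\sum_{m \in \mathcal{M}} \rho_{\Delta^*_m}(A_{\Delta_m}(\eta^*_{\Delta,m}(\alpha))) \ge \sum_{m \in \mathcal{M}} \rho_{\Delta_m}(A_{\Delta_m}(\eta^*_{\Delta,m}(\alpha))).$$

Thus, in searching for the optimal weak FWER-controlling simple MDF, it suffices to restrict to the simple most powerful MDP $\Delta^*$. Without loss of generality (wlog), we may assume $A_{\Delta^*_m}(\eta) = \eta$ for $m \in \mathcal{M}$ and $\eta \in [0, 1]$. The optimization problem reduces to finding $\eta^*_{\Delta^*}(\alpha) \in \mathcal{N}$ satisfying

$$\eta^*_{\Delta^*}(\alpha) = \arg\max_{\eta \in \mathcal{N}} \Bigl\{\sum_{m \in \mathcal{M}} \rho_{\Delta^*_m}(\eta_m): \prod_{m \in \mathcal{M}} (1 - \eta_m) \ge 1 - \alpha\Bigr\}. \qquad (4.1)$$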

The optimal weak FWER-controlling simple MDF is then

$$\delta^*_W(\alpha) \equiv (\delta^*_m(\eta^*_{\Delta^*,m}(\alpha)): m \in \mathcal{M}). \qquad (4.2)$$

Two well-known and conventional choices for the size vector $\eta = (\eta_m: m \in \mathcal{M})$ which satisfy the weak FWER constraint are the Šidák sizes $\eta_m = \eta_m(\alpha) = 1 - (1 - \alpha)^{1/M}$ and the Bonferroni-adjusted sizes $\eta_m = \eta_m(\alpha) = \alpha/M$. The former requires the independence Condition (I) and is sharp; the latter is conservative but does not require Condition (I). Both ignore possible differences in power traits of the individual test functions.
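The contrast between the two conventional size vectors can be checked numerically against the constraint in (4.1). The sketch below is illustrative only; the function names are hypothetical, and it assumes independence as in Condition (I) and tests whose sizes equal their nominal values, so that the weak FWER is $1 - \prod_m (1 - \eta_m)$.

```python
import math


def sidak_sizes(alpha, M):
    """Common Sidak size 1 - (1 - alpha)^(1/M) for each of the M tests."""
    return [1.0 - (1.0 - alpha) ** (1.0 / M)] * M


def bonferroni_sizes(alpha, M):
    """Common Bonferroni size alpha / M for each of the M tests."""
    return [alpha / M] * M


def weak_fwer(sizes):
    """Weak FWER 1 - prod(1 - eta_m) of independent tests with exact sizes eta_m."""
    return 1.0 - math.prod(1.0 - eta for eta in sizes)


alpha, M = 0.05, 2000  # illustrative values
print(weak_fwer(sidak_sizes(alpha, M)))       # attains alpha up to rounding error
print(weak_fwer(bonferroni_sizes(alpha, M)))  # strictly below alpha
```

Both vectors assign the same size to every test, which is the feature the optimization in (4.1) relaxes.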

4.2. Existence and uniqueness of optimal size vector

We establish the existence of an optimal multiple decision size vector for weak FWER control within the class $\mathcal{D}_0$. As pointed out in Section 4.1, it suffices to look for the optimal weak FWER-controlling simple MDF by starting with the most powerful simple MDP $\Delta^* = (\Delta^*_m: m \in \mathcal{M})$. For brevity, $\rho_m \equiv \rho_{\Delta^*_m}$ and $A_m(\eta) \equiv A_{\Delta^*_m}(\eta) = \eta$. Recall that $\mathcal{N} = [0, 1]^M$, the multiple decision size space. In a nutshell, the existence of an optimal multiple decision size vector for weak FWER control exploits convexity properties of relevant subsets of $\mathcal{N}$. This is formalized by establishing a sequence of propositions which are presented below. For $\alpha \in [0, 1]$, define the weak FWER constraint set

$$C_\alpha = \begin{cases} \bigl\{\eta \in \mathcal{N}: \sum_{m \in \mathcal{M}} \log(1 - \eta_m) \ge \log(1 - \alpha)\bigr\}, & \text{if } \alpha < 1, \\ \mathcal{N}, & \text{if } \alpha = 1. \end{cases} \qquad (4.3)$$

Proposition 4.1

$C_\alpha$ satisfies (i) $\eta = \mathbf{0} \in C_\alpha$; (ii) $(\mathbf{0}, \alpha_m) \in C_\alpha$ for all $m \in \mathcal{M}$, where $(\mathbf{0}, \alpha_m)$ is the zero-vector with the $m$th element replaced by $\alpha$; and (iii) it is convex and closed.
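The convexity claim in part (iii) can be seen from the logarithmic form of the constraint in (4.3). The following is only a short sketch of that one step for $\alpha < 1$, with $f$ a symbol introduced here for convenience; the formal proof is in the supplemental article [28].

```latex
% f is the constraint function in (4.3); each summand is concave on [0,1).
f(\eta) = \sum_{m \in \mathcal{M}} \log(1 - \eta_m),
\qquad
\frac{\partial^2 f}{\partial \eta_m^2} = -\frac{1}{(1 - \eta_m)^2} < 0 .
% Hence, for \eta, \eta' \in C_\alpha and t \in [0,1]:
f\bigl(t\eta + (1 - t)\eta'\bigr)
  \;\ge\; t f(\eta) + (1 - t) f(\eta')
  \;\ge\; \log(1 - \alpha),
% so t\eta + (1 - t)\eta' \in C_\alpha.
```

In words, $C_\alpha$ is a superlevel set of a concave function, and such sets are convex.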

Proposition 4.2

For $\eta_0 \in \mathcal{N}$ let $U(\eta_0) = \{\eta \in \mathcal{N}: \eta_m \ge \eta_{0m}, \forall m \in \mathcal{M}\}$, the upper set of $\eta_0$, and let $UB(C_\alpha) = \{\eta \in \mathcal{N}: C_\alpha \cap U(\eta) = \{\eta\}\}$, the upper boundary set of $C_\alpha$. Then, for all $\alpha \in [0, 1)$, $UB(C_\alpha) = \{\eta \in \mathcal{N}: \sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha)\}$.

Proposition 4.3

Let $\mathcal{N}_b \equiv \{\eta \in \mathcal{N}: \sum_{m \in \mathcal{M}} \rho_m(\eta_m) \ge Mb\}$ for $b \in [0, 1]$. Then $\{\mathcal{N}_b: b \in [0, 1]\}$ satisfies (i) $\eta = \mathbf{1} \in \mathcal{N}_b$, (ii) it is closed and convex, and (iii) $\mathcal{N} = \mathcal{N}_0 \supseteq \mathcal{N}_{b_1} \supseteq \mathcal{N}_{b_2}$ for $0 \le b_1 \le b_2 \le 1$.

Proposition 4.4

Let $B_\alpha = \{b \in [0, 1]: \mathcal{N}_b \cap C_\alpha \ne \emptyset\}$ for $\alpha \in [0, 1)$ and let $b^*_\alpha = \sup B_\alpha$. Then $B_\alpha = [0, b^*_\alpha]$.

Building on these intermediate results, the existence of an optimal weak FWER-controlling multiple decision size vector is obtained.

Theorem 4.1 (Existence)

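Let $\alpha \in [0, 1)$. Then $C_\alpha \cap \mathcal{N}_{b^*_\alpha} \ne \emptyset$. Furthermore, $\eta \in \mathcal{N}$ is a weak FWER-$\alpha$ optimal multiple decision size vector if and only if $\eta \in C_\alpha \cap \mathcal{N}_{b^*_\alpha}$.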

Theorem 4.1 guarantees existence of an optimal weak FWER multiple decision size vector, but it does not address whether the solution is unique. We present a result on this issue in the following theorem.

Theorem 4.2 (Uniqueness)

Let $\alpha \in [0, 1)$ and define $C_\alpha(m) = \{\eta_m \in [0, 1]: \eta \in C_\alpha\}$, called the $m$th section of $C_\alpha$. If, for all $m \in \mathcal{M}$, the mapping $\eta_m \mapsto \rho_m(\eta_m)$ is strictly increasing on $C_\alpha(m)$, then the optimal weak FWER-$\alpha$ multiple decision size vector is unique and it is the $\eta^*$ satisfying $C_\alpha \cap \mathcal{N}_{b^*_\alpha} = \{\eta^*\}$.

It is easy to see that a sufficient condition for uniqueness of the optimal size vector is that, for all $m \in \mathcal{M}$, $\eta_m \in [0, \sup C_\alpha(m)) \Rightarrow \rho_m(\eta_m) < 1$. Nonuniqueness may occur with nonregular families of densities, for example, uniform or shifted exponential, where the power of the MP test may equal one even though its size is still less than one. It may also occur if the decision processes in the MDP do not satisfy the condition that $A_m(\eta) = \eta$ for all $\eta \in [0, 1]$ and all $m \in \mathcal{M}$, which is the case with discrete data or when using nonparametric rank-based test functions with randomization not permitted.

4.3. Finding optimal size vector

Generally, without differentiability of the ROC functions as in the case with discrete distributions, linear or nonlinear programming methods are needed to obtain the optimal solution. In the case, however, where the ROC functions are twice-differentiable, the optimal size vector is in a more explicit form.

Theorem 4.3

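Let $\Delta^* = (\Delta^*_m, m \in \mathcal{M})$ be the MP MDP, and assume that the ROC functions $\eta_m \mapsto \rho_m(\eta_m)$ are strictly increasing and twice-differentiable with first and second derivatives $\rho'_m$ and $\rho''_m$, respectively. Given $\alpha \in (0, 1)$, the optimal weak FWER-$\alpha$ multiple decision size vector $\eta^* \equiv \eta^*_{\Delta^*}(\alpha) = (\eta^*_m(\alpha), m \in \mathcal{M})$ is the $\eta \in \mathcal{N}$ satisfying (i) for some $\lambda \in \Re_+$, $\forall m \in \mathcal{M}$, $\rho'_m(\eta_m)(1 - \eta_m) = \lambda$ and (ii) $\sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha)$.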
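Conditions (i) and (ii) are what a Lagrangian argument for problem (4.1) produces. The lines below are a heuristic sketch of that calculation, assuming an interior solution and using the logarithmic form of the constraint from (4.3); the symbol $\Lambda$ is introduced here only for illustration, and the rigorous argument is in the supplemental article [28].

```latex
% Lagrangian for maximizing the total power subject to the binding constraint:
\Lambda(\eta, \lambda)
  = \sum_{m \in \mathcal{M}} \rho_m(\eta_m)
  + \lambda \Bigl[ \sum_{m \in \mathcal{M}} \log(1 - \eta_m) - \log(1 - \alpha) \Bigr].
% Stationarity in each coordinate:
\frac{\partial \Lambda}{\partial \eta_m}
  = \rho'_m(\eta_m) - \frac{\lambda}{1 - \eta_m} = 0
\quad\Longrightarrow\quad
\rho'_m(\eta_m)\,(1 - \eta_m) = \lambda, \qquad m \in \mathcal{M}.
% Stationarity in \lambda returns condition (ii):
\sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha).
```

Because each $\rho_m$ is concave (Proposition 3.1) and $C_\alpha$ is convex (Proposition 4.1), a stationary point of this kind is a maximizer; the constraint binds since each $\rho_m$ is strictly increasing.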

A question arises as to whether the optimal sizes are monotonic in $\alpha$. Such a property is desirable since it will imply that if at FWER size $\alpha_1$ we have $\delta^*_m(\eta^*_m(\alpha_1)) = 1$, then at an FWER size $\alpha_2$ with $\alpha_2 > \alpha_1$, we will also have $\delta^*_m(\eta^*_m(\alpha_2)) = 1$. This property will also be critical in proving a martingale property needed for the development of the FDR-controlling procedure. This issue is the content of the following proposition.

Proposition 4.5

Assume the conditions of Theorem 4.3. Then, for each $m \in \mathcal{M}$, the mapping $\alpha \mapsto \eta^*_m(\alpha)$ is nondecreasing and continuous.

4.4. Gaussian example for weak FWER control

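For $m \in \mathcal{M}$, let $X_m \sim N(\mu_m, \sigma_{m0}^2)$, where the $\mu_m$'s are unknown and the $\sigma_{m0}^2$'s are known. Consider the multiple hypotheses testing problem $H_{m0}: \mu_m = \mu_{m0}$ versus $H_{m1}: \mu_m = \mu_{m1}$ with $\mu_{m0} < \mu_{m1}$ for $m \in \mathcal{M}$. The MP test of size $\eta_m$ for $H_{m0}$ versus $H_{m1}$ is $\delta^*_m(X_m; \eta_m) \equiv \delta^*_m(\eta_m) = I\{X_m \ge \mu_{m0} + \sigma_{m0}\Phi^{-1}(1 - \eta_m)\}$, where $\Phi(\cdot)$ and $\Phi^{-1}(\cdot)$ are the cumulative distribution and quantile functions, respectively, of a standard normal variable. The $m$th effect size is $\gamma_m = (\mu_{m1} - \mu_{m0})/\sigma_{m0}$, and the ROC function of the decision process $\Delta^*_m = (\delta^*_m(\eta_m): \eta_m \in [0, 1])$ is $\rho_m(\eta_m) \equiv \rho_m(\eta_m; \gamma_m) = \Phi(\gamma_m - \Phi^{-1}(1 - \eta_m))$, clearly twice-differentiable with respect to $\eta_m$. With $\phi(\cdot)$ the standard normal density function,

$$\rho'_m(\eta_m) = \frac{\phi(\gamma_m - \Phi^{-1}(1 - \eta_m))}{\phi(\Phi^{-1}(1 - \eta_m))}.$$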
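As a sanity check on the Gaussian ROC function and its derivative, the sketch below evaluates both and compares the analytic derivative with a central finite difference. It relies only on Python's standard-library `statistics.NormalDist`; the function names, the step size and the values of $\eta$ and $\gamma$ are hypothetical choices for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def roc(eta, gamma):
    """Gaussian ROC function rho(eta; gamma) = Phi(gamma - Phi^{-1}(1 - eta))."""
    return STD_NORMAL.cdf(gamma - STD_NORMAL.inv_cdf(1.0 - eta))


def roc_derivative(eta, gamma):
    """Analytic derivative phi(gamma - v) / phi(v), with v = Phi^{-1}(1 - eta)."""
    v = STD_NORMAL.inv_cdf(1.0 - eta)
    return STD_NORMAL.pdf(gamma - v) / STD_NORMAL.pdf(v)


def roc_derivative_numeric(eta, gamma, step=1e-6):
    """Central finite-difference approximation to the derivative of the ROC function."""
    return (roc(eta + step, gamma) - roc(eta - step, gamma)) / (2.0 * step)


eta, gamma = 0.05, 1.5  # illustrative size and effect size
print(roc(eta, gamma), roc_derivative(eta, gamma), roc_derivative_numeric(eta, gamma))
```

With $\gamma = 0$ the ROC function reduces to the diagonal $\rho(\eta) = \eta$, consistent with the bound $\rho_{\Delta^*}(\eta) \ge \eta$ in Proposition 3.1.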

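For fixed $\alpha \in (0, 1)$ and $\gamma_m$'s, consider the mappings $d \mapsto \eta_m(d)$, $m \in \mathcal{M}$, defined implicitly by the equation

$$\frac{\phi(\gamma_m - \Phi^{-1}(1 - \eta_m))}{\phi(\Phi^{-1}(1 - \eta_m))}(1 - \eta_m) - d = 0. \qquad (4.4)$$

The optimal value of $d$, denoted by $d^*$, solves the equation

$$\sum_{m \in \mathcal{M}} \log(1 - \eta_m(d)) - \log(1 - \alpha) = 0. \qquad (4.5)$$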

The optimal sizes of the $M$ MP tests are then $\eta_m(d^*)$, $m \in \mathcal{M}$. An R [19] implementation of this numerical problem first defines $v_m = \Phi^{-1}(1 - \eta_m)$, so that $1 - \eta_m = \Phi(v_m)$ and $\phi(\gamma_m - v_m)/\phi(v_m) = \exp(\gamma_m v_m - \gamma_m^2/2)$. Taking logarithms in condition (4.4) then amounts to solving for $v_m = v_m(d)$ the equation

$$\log\Phi(v_m) + \gamma_m v_m - \log(d) - \gamma_m^2/2 = 0. \qquad (4.6)$$

We utilized a Newton–Raphson iteration in solving for the $v_m$'s in (4.6) and the uniroot routine in R to solve for $d$ in (4.5). Upon obtaining the $v_m(d)$'s, the $\eta_m(d)$'s are computed via $\eta_m(d) = 1 - \Phi(v_m(d))$.
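The two-level root-finding problem in (4.5) and (4.6) can be sketched in a few lines. The code below is not the authors' R implementation: to stay dependency-free it swaps in plain bisection for both the Newton–Raphson step and uniroot, which is valid because the left-hand side of (4.6) is increasing in $v_m$ and the left-hand side of (4.5) is increasing in $d$. All function names, the search brackets and the example effect sizes are assumptions made for illustration, and the effect sizes are assumed strictly positive.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)


def Phi(v):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-v / SQRT2)


def solve_v(gamma, log_d):
    """Bisection root of log Phi(v) + gamma*v - log(d) - gamma^2/2 = 0, cf. (4.6)."""
    def h(v):
        return math.log(Phi(v)) + gamma * v - log_d - 0.5 * gamma**2

    lo = -30.0                                               # hypothetical lower bracket
    hi = max(40.0, (log_d + 0.5 * gamma**2) / gamma + 1.0)   # h(hi) > 0 here
    if h(lo) >= 0.0:
        return lo
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_sizes(gammas, alpha):
    """Return (sizes, d) solving (4.4)-(4.5) for Gaussian effect sizes gammas > 0."""
    target = math.log(1.0 - alpha)

    def g(log_d):  # left-hand side of (4.5), using 1 - eta_m = Phi(v_m)
        return sum(math.log(Phi(solve_v(gm, log_d))) for gm in gammas) - target

    lo, hi = 0.0, 0.0
    while g(lo) >= 0.0:   # expand the bracket downwards until the constraint is violated
        lo -= 1.0
    while g(hi) <= 0.0:   # expand upwards until the constraint is slack
        hi += 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    log_d = 0.5 * (lo + hi)
    # eta_m = 1 - Phi(v_m), computed as an upper-tail probability.
    etas = [0.5 * math.erfc(solve_v(gm, log_d) / SQRT2) for gm in gammas]
    return etas, math.exp(log_d)


def average_power(gammas, etas):
    """Average of rho_m(eta_m) = Phi(gamma_m + Phi^{-1}(eta_m)) over the tests."""
    q = NormalDist().inv_cdf
    return sum(Phi(gm + q(e)) for gm, e in zip(gammas, etas)) / len(gammas)


gammas = [0.5, 1.0, 2.0, 3.0, 4.0]  # hypothetical effect sizes
alpha = 0.05
etas, d_star = optimal_sizes(gammas, alpha)
sidak = [1.0 - (1.0 - alpha) ** (1.0 / len(gammas))] * len(gammas)
print(etas)
print(average_power(gammas, etas), average_power(gammas, sidak))
```

Bisection is slower than Newton–Raphson but needs no derivative and no starting value, which keeps the sketch short; for $M$ in the thousands a vectorized Newton step, as described above, would be the more practical choice.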

Figure 1 demonstrates the optimal sizes when M = 2,000 and for uniformly distributed effect sizes. Observe from the second panel that when the effect size is small, which translates to low power, the optimal size for the test is also small; but note that when the effect size is large, which translates to high power, the optimal test size is small as well. For the tests with moderate effect sizes or power, the optimal sizes are higher. This behavior can also be seen in the third panel of the figure, which shows the achieved power of the tests at the optimal sizes.

Fig. 1. Optimal test sizes and powers for 2,000 MP tests of hypotheses under normality when the effect sizes were generated from a uniform[0.1, 10] distribution. Panel four shows the powers for both the optimal [solid black] and the Šidák [dashed red] tests with respect to effect sizes.

The efficiency of the optimal procedure relative to the Šidák procedure was measured via the ratio (multiplied by 100) of the average power over the $M$ tests, defined by $\sum_{m \in \mathcal{M}} \rho_m(\eta_m)/M$, of the optimal procedure and the average power of the Šidák procedure. The fourth panel in Figure 1 depicts the powers of the resulting tests versus the effect size for both procedures (solid blue = optimal; dashed red = Šidák). For these uniformly-generated effect sizes, the efficiency of the optimal procedure over the Šidák is 103.5%. This efficiency is affected by the vector of effect sizes. For instance, when we change the effect sizes in Figure 1 to be generated from a uniform over [0.1, 2], then the efficiency jumps to 181.7%, though it should also be pointed out that since the effect sizes are small, the overall powers of both procedures are also small.

4.5. A size-investing strategy

In the preceding Gaussian example, as well as in other situations we examined, for example, with exponential and Bernoulli distributions, we observed the phenomenon where, among the M tests, those with low powers (small effect sizes) and those with high powers (large effect sizes) are allocated relatively small sizes in the weak FWER-controlling optimal procedure. The tests with larger sizes are those with moderate powers or effect sizes. This is a size-investing strategy in the multiple hypotheses testing problem, and it has intuitive content. With the overall goal of making more real discoveries while controlling the proportion of false discoveries for a pre-specified, usually small, overall size α, the optimal procedure dictates that not much size should be accorded those tests with either very low or very high powers. The former case will not lead to any discoveries anyway if the size that could be allocated is small, while the latter case will lead to discoveries even if the test sizes are made small. Thus, there is more to be gained by investing larger sizes on those tests that are of moderate powers, and an appropriate tweaking of their test sizes according to condition (i) in Theorem 4.3 improves the ability to achieve more real discoveries. However, this phenomenon is dependent on the magnitude of the overall size. If this overall size is made larger, more leeway ensues to the extent that it may then be more beneficial to allocate more size to those with low powers since those tests with moderate powers, when they had small sizes, may now have larger powers because of the consequent increase in their sizes. The precise and crucial determinant of where the differential sizes should be allocated are the rates of change of the ROC functions, with some size-attenuation. 
Interesting discussions of size and weight allocation strategies can also be found in [49], where the size allocation was related to the “α-spending” function of [22], in [14] which deals with α-investing in sequential procedures that control expected false discoveries, and in [16, 29] which discuss optimal weights for the p-values.

A tangential real-life manifestation of this strategy occurred during the 2008 American presidential election, with the total resources (financial, manpower, etc.) available to the candidates analogous to the overall size in the multiple testing problem. In the waning days of the campaign, the major candidates, then-Senator Barack Obama of the Democratic Party and Senator John McCain of the Republican Party, focused their campaign efforts, in terms of allocating their financial and manpower resources, in the “battleground states” of North Carolina, Virginia and Pennsylvania, while basically ignoring the “in-the-bag states” of South Carolina, then expected to vote for McCain, and California, then expected to vote for Obama. Also, by virtue of the deep resources of the Obama campaign, it was able to allocate more resources even in states that traditionally voted Republican, whereas the McCain campaign, with a relatively smaller war chest, had to “drop” some states (e.g., Michigan) in their campaign. The behaviors of the two camps somehow mirror the size-investing strategy with proper accounting of each campaign’s overall resources.

5. Restrictions, extensions and connections

5.1. On the restriction to $\mathcal{D}_0$

The optimization problem for weak FWER control could be construed as limited since we restricted to the subclass $\mathcal{D}_0$ of simple MDFs, thus leading to an optimal weak FWER-controlling procedure that is still simple. In [42, 45], it was demonstrated that performance is enhanced via compound MDFs.

Examples of compound MDFs are the estimated optimal discovery procedure (ODP) in [42, 43], the FDR-controlling procedure in [1], and the oracle-based adaptive MDFs in [45].

Could we immediately start from compound MDFs in the search for an optimal weak FWER-controlling compound MDF? Let us suppose that δ = (δm : $m \in \mathcal{M}$) is a compound MDF, so δm depends on (X, U) and not only on (Xm, Um). For such an MDF, we have

$$R_0(\delta, Q) = P_Q\left\{ \bigcup_{m \in \mathcal{M}_0(Q)} [\delta_m(X, U) = 1] \right\}. \tag{5.1}$$

Now, even if the independence Condition (I) holds, (δm(X, U): $m \in \mathcal{M}_0(Q)$) need not be an independent collection. As such no closed-form exact expression for R0(δ, Q) need exist. The right-hand side in (5.1) could be Bonferroni-bounded by

$$\mathrm{EFP}(\delta, Q) \equiv \sum_{m \in \mathcal{M}_0(Q)} \alpha_{\delta_m}(Q), \tag{5.2}$$

called the expected number of false positives in [42]. Alternatively, if a generalized positive quadrant dependence (PQD) condition holds, with

$$P_Q\left\{ \bigcap_{m \in \mathcal{M}_0(Q)} [\delta_m(X, U) = 0] \right\} \ge \prod_{m \in \mathcal{M}_0(Q)} P_Q\{\delta_m(X, U) = 0\},$$

then the right-hand side in (5.1) could be upper-bounded by

$$\mathrm{PQD}(\delta, Q) \equiv 1 - \prod_{m \in \mathcal{M}_0(Q)} [1 - \alpha_{\delta_m}(Q)], \tag{5.3}$$

where αδm(Q) = EQδm(X, U), the size of δm when $m \in \mathcal{M}_0(Q)$. For this compound MDF, its MDR is $R_2(\delta, Q) = \sum_{m \in \mathcal{M}_1(Q)} [1 - \pi_{\delta_m}(Q)]$, where πδm(Q) = EQδm(X, U) is the power of δm when $m \in \mathcal{M}_1(Q)$.

An optimization approach could proceed by putting an upper threshold α ∈ (0, 1) on either (5.2) or (5.3), and then finding the δ that minimizes R2(δ, Q), or equivalently, maximizes $\mathrm{ETP}(\delta, Q) \equiv \sum_{m \in \mathcal{M}_1(Q)} \pi_{\delta_m}(Q)$, the latter quantity referred to as the expected number of true positives in [42]. The MDFs in [38] and [42] were both obtained through this program. The MDF in [38] is

$$\delta_{\mathrm{SPJ}}(\alpha) = \arg\max_{\delta \in \mathcal{D}_0} \{\mathrm{ETP}(\delta, Q_1) : \mathrm{EFP}(\delta, Q_0) \le \alpha\}, \tag{5.4}$$

where $Q_0 \in \mathcal{Q}_0$ and $Q_1 \in \mathcal{Q}_1$; whereas the optimal discovery procedure (ODP) in [42] is

$$\delta_{\mathrm{STO}}(\alpha; Q) = \arg\max_{\delta \in \mathcal{D}} \{\mathrm{ETP}(\delta, Q) : \mathrm{EFP}(\delta, Q) \le \alpha\}, \tag{5.5}$$

where Q is the true probability measure of X. The use of EFP as type I error measure in [42] enabled a calculus of variations optimization to obtain the ODP. This has a particularly interesting structure when we utilize as its input the vector of p-value statistics $(S_m(x_m, u_m) : m \in \mathcal{M})$ from the MP MDP $\Delta = (\Delta_m : m \in \mathcal{M})$ with multiple decision size function $A_\Delta = \{(A_m(\eta) : \eta \in [0, 1]) : m \in \mathcal{M}\}$ and multiple decision ROC function $\rho_\Delta = \{(\rho_m(\eta) : \eta \in [0, 1]) : m \in \mathcal{M}\}$, and with Am(·) and ρm(·) both differentiable with derivatives $A_m'(\cdot)$ and $\rho_m'(\cdot)$. The significance thresholding function $\mathcal{S}$: ([0, 1], σ[0, 1]) → (ℜ, σ(ℜ)) utilized in the ODP becomes

$$\mathcal{S}(s; Q) = \frac{\sum_{m \in \mathcal{M}_1(Q)} \rho_m'(s)}{\sum_{m \in \mathcal{M}_0(Q)} A_m'(s)}, \tag{5.6}$$

a consequence of Lemma 2 in [42] and Proposition 3.2. The ODP δSTO = (δm,STO: $m \in \mathcal{M}$) has a single-thresholding structure with components

$$\delta_{m,\mathrm{STO}}(S_m(x_m, u_m); Q) = I\{\mathcal{S}(S_m(x_m, u_m); Q) \ge \lambda\}, \quad m \in \mathcal{M},$$

where λ ∈ [0, ∞) is chosen so the size constraint on EFP(δSTO(α; Q), Q) is approximately satisfied. Observe that each of these components is still of simple-type, unless λ is determined in a data-dependent manner using the full data (x, u). Note also that δSTO was derived under complete knowledge of the unknown Q, or more specifically, the sets $\mathcal{M}_0(Q)$ and $\mathcal{M}_1(Q)$, as can be seen in (5.6), hence is referred to as an oracle MDF. For the simple null versus simple alternative hypotheses case, the size functions Am(·)’s and the ROC functions ρm(·)’s will be known, but with composite hypotheses they will be unknown. To implement δSTO, it was proposed in [42, 43] that these unknown quantities, sets, functions, or significance thresholding function, be estimated using the data (x, u). This will make the estimated ODP of compound type. But note that through this plug-in approach the exact optimality property of the ODP need no longer hold for the estimated version; see also [13, 45]. In contrast, δSPJ is determined only by the two classes of extreme probability measures, $\mathcal{Q}_0$ and $\mathcal{Q}_1$, so the marginal probability measures, Qm’s, are completely known, and not by the unknown true probability measure Q governing X. This fact was criticized in [42] as a “potentially problematic optimality” criterion. More importantly, it should be recognized that both δSPJ and δSTO need not be the optimal weak or strong FWER- or FDR-controlling MDFs since the Bonferroni upper bound for R0(δ, Q) utilized in their derivations is hardly a sharp upper bound.

The criticism leveled against δSPJ could also be invoked against our optimal weak FWER-controlling procedure since we also relied on a criterion determined only by the extreme classes $\mathcal{Q}_0$ and $\mathcal{Q}_1$. However, note that each component of the optimal weak FWER-controlling multiple decision size vector, and consequently each component of δW(α), uses all of the Qm0’s and Qm1’s, analogously to the ODP, though the MDF δW(α) is still neither adaptive nor compound. Our development of this simple MDF, which is optimal in the class $\mathcal{D}_0$, is a prelude to our development of adaptive and compound MDFs strongly-controlling FWER and FDR. The MDF δW(α) will be the anchor for these FWER and FDR strongly-controlling compound MDFs. These new MDFs are discussed in Section 6 for strong FWER-control and in Section 7 for FDR control. Our approach to obtaining these strongly-controlling MDFs is indirect, whereas that in [42] is direct. There is also an intrinsic difference in the problems considered since our focus is on the type I error risk functions R0 and R1, whereas in [38, 42] the simpler type I error metric of EFP was utilized. Looking forward, though our starting point is the optimal weak FWER-controlling simple MDF δW(α), there is confidence in the viability of our indirect approach to generate good MDFs since we will establish later that both the sequential Šidák procedure and the BH procedure are special cases of our new MDFs under exchangeability.

5.2. Families with MLR property

The initial simplification to the simple null versus simple alternative hypotheses for each $m \in \mathcal{M}$ could be perceived as a limitation because of the need to know the Qm1’s to determine the ROC functions. However, this approach, which was also implemented in [29, 38, 42], is natural and historically-justified by the Neyman–Pearson framework. We surmise that in this multiple decision problem, the solution to the simple null versus simple alternative hypotheses setting will play a prominent role in solving the composite hypotheses setting, since it appears that for an MDF to possess optimality, it will require knowledge, either in exact, approximate, or estimated forms, of the alternative hypotheses distributions. We touch on this aspect in the presence of the monotone likelihood ratio (MLR) property; see [24].

Suppose that for each $m \in \mathcal{M}$, the density function qm belongs to a one-dimensional parametric family {qm(·; ξm) : ξm ∈ Γm ⊂ ℜ} which possesses the MLR property. A typical pair of hypotheses to be tested would be $H_{m0}: \xi_m \le \xi_{m0}$ versus $H_{m1}: \xi_m > \xi_{m0}$, where ξm0 is known. With the MLR property, a uniformly most powerful (UMP) test function δm(Xm, Um; ηm) of size ηm exists, with this UMP test identical to the MP test of size ηm for the simple null hypothesis Hm0 : ξm = ξm0 versus the simple alternative hypothesis Hm1 : ξm = ξm1, with ξm1 > ξm0. When dealing with the single-pair hypothesis testing problem, recall that exact knowledge of the value of ξ1 is not necessary since the critical constants of the size-η MP test for H0 : ξ = ξ0 versus H1 : ξ = ξ1 can be made independent of ξ1. In contrast, for the multiple decision problem, to determine the optimal size allocations for each of the M MP tests, the powers of the tests at the ξm1’s are required, hence the need to know the values of the ξm1’s. When M is large, such information may not be so forthcoming. The default procedure is the simplistic approach of assuming that (Qm0, Qm1) is invariant in m, which is the exchangeable setting. However, this exchangeability assumption is most likely wrong as a consequence of varied effect sizes or different test functions utilized. See, for instance, [11] for real situations where exchangeability does not hold. We propose two possible solutions to this dilemma.

The first approach is to solicit from the scientific investigator the values of the ξm1’s for which the powers are of most interest. Such values may coincide with those that are scientifically different from the ξm0’s. Such elicitation, which may not be very feasible in practice if M is large, but which may be made possible by forming subclasses or clusters of the M genes as in [11], amounts to specifying effect sizes. Formation of such clusters must be made in close consultation with the investigator, or perhaps guided by the result of a preliminary cluster analysis using data independent of that used in the decision functions. For the specified ξm1’s, the ROC functions in the determination of the optimal weak FWER-controlling multiple size vector become $\rho_m(\eta) = \pi_{\delta_m(\eta)}(\xi_{m1})$ for $m \in \mathcal{M}$, where δm(η) is the simple MP test of size η for testing Hm0 : ξm = ξm0 versus Hm1 : ξm = ξm1, and $\pi_{\delta_m(\eta)}(\xi_{m1})$ is the power of δm(η) (at ξm = ξm1). In the clustered situation with $M = \sum_{k=1}^{K} M_k$, we may denote by ρ̄k(η) and ζk, respectively, the common ROC function and size for the decision functions in the kth cluster. Under second-order differentiability of the ρ̄k(η)’s, by Theorem 4.3, the optimal weak FWER-α controlling multiple size vector ζ(α) = (ζ1(α), ζ2(α), …, ζK(α)) is the ζ = (ζ1, ζ2, …, ζK) that solves the set of equations $\bar\rho_k'(\zeta_k)(1 - \zeta_k) = \lambda$, k = 1, 2, …, K, for some λ ∈ ℜ+ with $\sum_{k=1}^{K} M_k \log(1 - \zeta_k) = \log(1 - \alpha)$.

The second approach, analogous to those in [21, 30, 42, 43, 45, 49], is to estimate or approximate the underlying values of the ξm’s either using the observed data x, possibly via shrinkage-type estimators, or through the use of prior information which could be informed by external covariates as in [13]. Addressing this same restriction of requiring knowledge of the simple null and simple alternative hypotheses and advocating this second approach, [29], page 679, stated: “although leading to oracle procedures, it can be used in practice as soon as the null and alternative distributions are estimated or guessed reasonably accurately from independent data.” By “independent data,” [29] means data different from that used in performing the actual tests. However, such external data need not always be used for estimating or imputing the unknown parameters. For example, suppose that for each $m \in \mathcal{M}$, data xm could be partitioned into (vm, wm). We may then use ξ̃m(vm) = max{ξm0, ξ̂m(vm)}, where ξ̂m(vm) is the maximum likelihood estimate of ξm based on vm, and proceed as in the preceding paragraph with ξm1 set to ξ̃m(vm) for each $m \in \mathcal{M}$, and with the component data wm used in the test functions. The resulting MDF will be of an adaptive type, possibly also compound as in [45] if shrinkage estimators are used for estimating the ξm’s using the vm components. Observe that if for some $m_0 \in \mathcal{M}$, $\tilde\xi_{m_0}(v_{m_0})$ and $\xi_{m_0 0}$ are very close or identical, then a relatively small size will be allocated to the MP test for component m0. This amounts to downgrading the testing problem for this component, a fact of importance since a criticism of multiple hypotheses testing, especially when using FDR, is that an unscrupulous investigator may keep adding irrelevant genes. 
When using the adaptive MDF arising from the optimal multiple decision size vector, this investigator’s strategy will backfire since the adaptive MDF will automatically downgrade the irrelevant genes. This second approach still requires deeper study. For instance, there is the issue of how to partition each xm into the vm and wm components. Furthermore, the impact of a misspecified ξm1, possibly arising from the estimation procedure, needs to be ascertained.
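The data-splitting idea in this second approach can be sketched as follows. This is a minimal illustration under assumptions not made in the text: each xm is taken to be a row of n observations from a unit-variance normal model (so the maximum likelihood estimate is the sample mean), and the split simply takes the first `n_estimate` columns as vm, which is only one of many possible partitions. The function name `split_sample_alternatives` is hypothetical.

```python
import numpy as np


def split_sample_alternatives(x, xi0, n_estimate):
    """Split each row of x into (v_m, w_m) and return (xi_tilde, w).

    xi_tilde[m] = max(xi0, MLE of xi_m from v_m); w holds the held-out
    observations that would be used in the test functions.
    """
    x = np.asarray(x, dtype=float)
    v, w = x[:, :n_estimate], x[:, n_estimate:]
    xi_hat = v.mean(axis=1)              # MLE of a normal mean (assumed model)
    xi_tilde = np.maximum(xi0, xi_hat)   # truncate at the null boundary
    return xi_tilde, w


x = np.array([[1.0, 3.0, 5.0, 7.0],
              [-2.0, -4.0, 0.0, 1.0]])
xi_tilde, w = split_sample_alternatives(x, xi0=0.0, n_estimate=2)
# The second component has an estimate below xi0, so xi_tilde equals xi0 there;
# such a component would receive a small size in the allocation step.
```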

5.3. Connections to p-value statistics

Proposition 3.2 indicates that the ROC function η ↦ ρm(η) is differentiable if and only if the distribution function of the p-value statistic Sm(Xm, Um) under Hm1 : Qm = Qm1 is differentiable. In this case, the derivative ρ′m(·) coincides with hm(·), the density function of Sm(Xm, Um) under Hm1 : Qm = Qm1. Condition (i) in Theorem 4.3 is equivalent to the constancy in m of hm(ηm)(1 − ηm). This is surprising since it indicates that it is not enough to simply find the sizes that maximize these hm(·)’s, as dictated by the Neyman–Pearson lemma when dealing with a single pair of null and alternative hypotheses. Rather, in the multiple hypotheses testing scenario, there is attenuation in that larger sizes incur penalties. Condition (i) in Theorem 4.3 governs the interactions among the M tests regarding their size allocations to achieve the best overall result, in terms of overall type II error, among themselves.
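As a worked illustration, consider one-sided Gaussian MP tests with unit variance, writing γm for the effect size of the mth test, φ for the standard normal density, and v = Φ⁻¹(1 − η); this parametrization is an assumption made here for concreteness. The constancy condition then reduces as follows.

```latex
\begin{align*}
\rho_m(\eta) &= 1 - \Phi\bigl(\Phi^{-1}(1-\eta) - \gamma_m\bigr), \\
h_m(\eta) = \rho_m'(\eta) &= \frac{\phi(v - \gamma_m)}{\phi(v)}
  = \exp\bigl(\gamma_m v - \gamma_m^2/2\bigr), \qquad v = \Phi^{-1}(1-\eta), \\
h_m(\eta_m)(1-\eta_m) = d
  &\;\Longleftrightarrow\;
  \log\Phi(v_m) + \gamma_m v_m - \log(d) - \gamma_m^2/2 = 0 .
\end{align*}
```

The last display has the same form as (4.6); the factor 1 − ηm = Φ(vm) is the attenuation term that penalizes larger sizes.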

The optimal weak FWER-controlling MDF can be converted to a procedure based on the p-value statistics. If $\eta(\alpha) = (\eta_m(\alpha), m \in \mathcal{M})$ is the optimal weak FWER-α multiple decision size vector and (Sm(xm, um), $m \in \mathcal{M}$) is the vector of computed p-value statistics, the decision based on data (x, u) = ((xm, um), $m \in \mathcal{M}$) is $\delta(x, u) = (I\{S_m(x_m, u_m) \le \eta_m(\alpha)\}, m \in \mathcal{M})$, an MDF based on weighted p-values. This is related to the approach in several papers using weighted p-values such as [16, 21, 29, 30, 46]. In our case, the weights are tied to the optimal sizes.
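The p-value form of the decision rule can be sketched in a few lines. The function names below are hypothetical, the size vector is taken as given (for example, from a solver for the optimal sizes), and rejection is assumed to occur when the p-value is at most the allocated size.

```python
import numpy as np


def weighted_pvalue_decisions(p_values, sizes):
    """Return 1 where H_m0 is rejected, i.e. where S_m <= eta_m, else 0."""
    p = np.asarray(p_values, dtype=float)
    eta = np.asarray(sizes, dtype=float)
    if p.shape != eta.shape:
        raise ValueError("p_values and sizes must have the same shape")
    return (p <= eta).astype(int)


def weak_fwer(sizes):
    """FWER when all nulls are true and the tests are independent: 1 - prod(1 - eta_m)."""
    return 1.0 - np.prod(1.0 - np.asarray(sizes, dtype=float))


decisions = weighted_pvalue_decisions([0.001, 0.02, 0.04], [0.01, 0.03, 0.01])
# decisions is array([1, 1, 0]): the third p-value exceeds its allocated size.
```

With equal sizes 1 − (1 − α)^(1/M), `weak_fwer` returns α, which is the Šidák allocation.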

6. Strong FWER control

Let $\Delta = (\Delta_m, m \in \mathcal{M})$ be the MP MDP with $\Delta_m = (\delta_m(\eta) : \eta \in [0, 1])$ the MP decision process for Hm0 : Qm = Qm0 versus Hm1 : Qm = Qm1 based on (Xm, Um). Wlog, assume that the size function Am(·) of Δm satisfies Am(η) = η. Define η : [0, 1] → [0, 1]^M such that η(α) = (ηm(α), $m \in \mathcal{M}$) is the optimal weak FWER-controlling multiple decision size vector at level α. Assume that each component of this mapping is nondecreasing and continuous, which is the case when the ROC functions of Δ are twice-differentiable as established in Proposition 4.5.

For a weak FWER threshold of α ∈ [0, 1], the optimal MDF in $\mathcal{D}_0$ is $\delta_W(\alpha) = (\delta_m(\eta_m(\alpha)), m \in \mathcal{M})$, as given in (4.2). Associated with this MDF is the generalized multiple decision p-value statistic W = (Wm, $m \in \mathcal{M}$), where

$$W_m \equiv W_m(X_m, U_m) = \inf\{\alpha \in [0, 1] : \delta_m(\eta_m(\alpha)) = 1\}. \tag{6.1}$$

The wm = Wm(xm, um) is the smallest weak FWER size leading to rejection of Hm0 when using δW(α) given data (x, u) = ((xm, um), $m \in \mathcal{M}$). The usual p-value statistic Sm [see (3.2)] for δm is related to Wm via

$$\forall m \in \mathcal{M}: \quad S_m(X_m, U_m) = \eta_m(W_m(X_m, U_m)). \tag{6.2}$$

Now, à la [42, 45], suppose an Oracle knows Q, the true underlying probability measure of X. For the MDF δW(α), its FWER is

$$R_0(\delta_W(\alpha), Q) = 1 - \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)]^{1 - \theta_m(Q)}.$$

This is nondecreasing and continuous in α since the mappings α ↦ ηm(α) for each $m \in \mathcal{M}$ are nondecreasing and continuous. If the Oracle desires to control this type I error rate at a value q* ∈ [0, 1] and also minimize the MDR given by $R_2(\delta_W(\alpha), Q) = |\mathcal{M}_1(Q)| - \sum_{m \in \mathcal{M}_1(Q)} \rho_m(\eta_m(\alpha))$, where ρm(ηm(α)) is the power of δm(ηm(α)), then she should choose the largest α ∈ [0, 1] such that $R_0(\delta_W(\alpha), Q) = q^*$. Owing to the continuity and nondecreasing properties of $R_0(\delta_W(\alpha), Q)$ in α, the Oracle’s optimal α could also be expressed via

$$\alpha(q^*; Q) = \inf\left\{ \alpha \in [0, 1] : \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)]^{1 - \theta_m(Q)} < 1 - q^* \right\}.$$

However, there is no Oracle and Q is not known, else there is no multiple decision problem. Thus, α(q*; Q) is not observable. A natural idea is to estimate the unknown θm(Q), the state of the mth pair of hypotheses. An intuitive and simple estimator of θm(Q) for a fixed value of α is

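$$\hat\theta_m(Q) = \delta_m(\eta_m(\alpha)-) \equiv \delta_m(X_m, U_m; \eta_m(\alpha)-). \tag{6.3}$$

In turn, we obtain a step-down estimator α(q*) ≡ α(X, U ; q*) of the Oracle-based α(q*; Q) given by

$$\alpha(q^*) = \inf\left\{ \alpha \in [0, 1] : \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)]^{1 - \delta_m(\eta_m(\alpha)-)} < 1 - q^* \right\}. \tag{6.4}$$

This determines a compound MDF $\delta_S(q^*) \equiv \delta_S(X, U; q^*) \in \mathcal{D}$, where

$$\delta_S(q^*) = (\delta_m(\eta_m(\alpha(q^*))), m \in \mathcal{M}). \tag{6.5}$$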

By virtue of the optimal choice of the ηm(α)’s and the use of the MP tests, we expect δS(q*) to possess excellent, if not optimal, MDR-properties. By taking the infimum over the weak FWER-size α coupled with the estimation of θm(Q) by δm(ηm(α)−) in (6.4), there occurs an adaptive downweighting of components whose Hm0’s are most likely correct as dictated by the data (x, u). Theorem 6.1 below establishes that δS(q*) in (6.5) does strongly control the FWER.

Theorem 6.1

Let q* ∈ [0, 1]. Then, $\forall Q \in \mathcal{Q}$, $R_0(\delta_S(q^*), Q) \le q^*$.

Next, we reexpress δS(q*) in terms of the generalized p-value statistic W. This is achieved by defining the random variable

$$J(q^*) = \max\left\{ j \in \mathcal{M} : \prod_{m=i}^{M} [1 - \eta_{(m)}(W_{(i)})] \ge 1 - q^*, \; i = 1, 2, \ldots, j \right\}.$$

Since $\alpha(q^*) \in [W_{(J(q^*))}, W_{(J(q^*)+1)})$, then

$$\delta_S(q^*) = (\delta_m(\eta_m(W_{(J(q^*))})), m \in \mathcal{M}).$$

The next result shows that the sequential step-down Šidák MDF, which strongly controls FWER, is a special case of δS(q*) under exchangeability.

Proposition 6.1

If the M ROC functions are identical, then δS(q*) coincides with the sequential Šidák step-down FWER-controlling MDF.
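For the exchangeable special case named in Proposition 6.1, the sequential Šidák step-down rule can be sketched directly on the ordinary p-values. The code below implements only this standard special case, not the general δS(q*) with unequal sizes; the function name `sidak_step_down` is hypothetical.

```python
import numpy as np


def sidak_step_down(p_values, q):
    """Sequential Sidak step-down: reject H_(i) while p_(i) <= 1 - (1 - q)^(1/(M - i + 1))."""
    p = np.asarray(p_values, dtype=float)
    M = p.size
    order = np.argsort(p)
    reject = np.zeros(M, dtype=bool)
    for i, idx in enumerate(order):          # i = 0 corresponds to the smallest p-value
        threshold = 1.0 - (1.0 - q) ** (1.0 / (M - i))
        if p[idx] <= threshold:
            reject[idx] = True
        else:
            break                            # stop at the first non-rejection
    return reject


reject = sidak_step_down([0.001, 0.015, 0.02, 0.5], q=0.05)
# The three smallest p-values pass their successive thresholds; the fourth does not.
```

The thresholds loosen as hypotheses are rejected, which mirrors the adaptive downweighting of components already declared non-null in (6.4).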

7. Strong FDR control

Assume the same framework as in Section 6. Our idea in obtaining an FDR-controlling MDF builds on the development of the BH MDF, specifically the rationale of Theorem 2 in [1]. Let q* ∈ [0, 1] be the desired FDR threshold and Q be the underlying probability measure of X. We introduce two stochastic processes: T0 = {T0(α; Q) : α ∈ [0, 1]} and T = {T(α): α ∈ [0, 1]}, where

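$$T_0(\alpha; Q) = \sum_{m \in \mathcal{M}_0(Q)} \delta_m(\eta_m(\alpha)) \quad \text{and} \quad T(\alpha) = \sum_{m \in \mathcal{M}} \delta_m(\eta_m(\alpha)).$$

For the MDF δW(α), its FDR is

$$R_1(\delta_W(\alpha), Q) = E_Q\left\{ \frac{T_0(\alpha; Q)}{T(\alpha)} I\{T(\alpha) > 0\} \right\}.$$

By the definition of the generalized p-value statistics Wm’s in (6.1), we have for $\alpha \in [W_{(m)}, W_{(m+1)})$ that T(α) = m, whereas

$$E_Q\{T_0(\alpha; Q)\} = \sum_{m \in \mathcal{M}} (1 - \theta_m(Q)) \eta_m(\alpha) \le \sum_{m \in \mathcal{M}} \eta_m(\alpha). \tag{7.1}$$

Focus now on an $\alpha \in [W_{(m)}, W_{(m+1)})$. If $\sum_{j \in \mathcal{M}} \eta_j(W_{(m)}) \le m q^*$, then the best α in this interval will be the largest value satisfying $\sum_{j \in \mathcal{M}} \eta_j(\alpha) \le m q^*$, since by increasing α, the MDR decreases as argued in the development of δS(q*) in Section 6. This motivates our definition of α*(q*) = α*(X, U ; q*) as the step-up estimator

$$\alpha^*(q^*) = \sup\left\{ \alpha \in [0, 1] : \sum_{m \in \mathcal{M}} \eta_m(\alpha) \le q^* \sum_{m \in \mathcal{M}} \delta_m(\eta_m(\alpha)) \right\}. \tag{7.2}$$

This induces a compound MDF $\delta_F(q^*) \equiv \delta_F(X, U; q^*) \in \mathcal{D}$ given by

$$\delta_F(q^*) = (\delta_m(\eta_m(\alpha^*(q^*))), m \in \mathcal{M}). \tag{7.3}$$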

Theorem 7.1 establishes that δF(q*) does control the FDR at q*. Interestingly, the proof of this theorem, which can be found in [28], employs a reverse martingale argument.

Theorem 7.1

Let q* ∈ [0, 1]. If, $\forall Q \in \mathcal{Q} \setminus \{Q_0\}$ and ∀α ∈ (0, 1), | Inline graphic(Q)| Inline graphic ηm(α) ≤ Inline graphic ηm(α), then $R_1(\delta_F(q^*), Q) \le q^*$ for all $Q \in \mathcal{Q}$.

Some remarks are in order regarding the condition in Theorem 7.1. Clearly, the Šidák multiple decision size vector, which is the optimal multiple decision size vector when the ROC functions are identical, always satisfies this condition. When not in this exchangeable setting, this condition induces some control on the differences of the ROC functions. The next proposition establishes that the BH procedure is a special case of δF(q*) under exchangeability.

Proposition 7.1

If the ROC functions are identical, then δF(q*) is the FDR-q* controlling MDF in [1].

Examination of the proof of Proposition 7.1 as presented in [28] shows that the BH MDF δBH(q*) coincides with the Šidák-size based MDF δS(q*). The martingale proof for Theorem 7.1 thus carries over to establishing FDR control by δBH(q*). We mention that a martingale-based proof of FDR control by δBH(q*) has also been presented in [44].

We also provide an alternative form of δF(q*) in terms of the generalized p-value statistics Wm’s, a form analogous to the conventional formulation of the BH procedure. Define

$$J^*(q^*) \equiv J^*(X, U; q^*) = \max\left\{ m \in \mathcal{M} : \sum_{j \in \mathcal{M}} \eta_j(W_{(m)}) \le q^* m \right\}. \tag{7.4}$$

Then, it is easy to see that δF(q*) rejects $H_{(m)0}$ for m ∈ {1, 2, …, J*(q*)} and accepts $H_{(m)0}$ for m ∈ {J*(q*) + 1, J*(q*) + 2, …, M}.
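The step-up form (7.4) can be checked numerically in the exchangeable case, where every test receives the Šidák size ηj(w) = 1 − (1 − w)^(1/M). Under that assumption the generalized p-values are Wm = 1 − (1 − Sm)^M, the sum in (7.4) equals M·S(m), and the rule reduces to the BH comparison S(m) ≤ q*m/M. The sketch below covers only this special case; both function names are hypothetical, and p-values are assumed to lie strictly between 0 and 1.

```python
import numpy as np


def step_up_exchangeable(p_values, q):
    """Evaluate (7.4) with equal Sidak sizes eta_j(w) = 1 - (1 - w)^(1/M)."""
    p = np.asarray(p_values, dtype=float)
    M = p.size
    order = np.argsort(p)
    w_sorted = 1.0 - (1.0 - p[order]) ** M                   # W_(m)
    total_size = M * (1.0 - (1.0 - w_sorted) ** (1.0 / M))   # sum_j eta_j(W_(m))
    passing = np.nonzero(total_size <= q * np.arange(1, M + 1))[0]
    reject = np.zeros(M, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True              # reject H_(1), ..., H_(J*)
    return reject


def benjamini_hochberg(p_values, q):
    """Conventional BH step-up rule, for comparison."""
    p = np.asarray(p_values, dtype=float)
    M = p.size
    order = np.argsort(p)
    passing = np.nonzero(p[order] <= q * np.arange(1, M + 1) / M)[0]
    reject = np.zeros(M, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject
```

For p-values such as `[0.01, 0.04, 0.039, 0.035]` with q* = 0.05, both functions reject all four hypotheses even though the second and third ordered p-values miss their own thresholds, which is the step-up behavior encoded by the maximum in (7.4).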

Finally, let us examine further the generalized p-value statistics Wm’s. Focusing on $W_{(1)}$, under Q0, we have that, for a ∈ (0, 1),

$$P_{Q_0}(W_{(1)} > a) = P_{Q_0}\left\{ \bigcap_{m \in \mathcal{M}} [\delta_m(\eta_m(a)) = 0] \right\} = \prod_{m \in \mathcal{M}} [1 - \eta_m(a)] = 1 - a,$$

the second equality obtained by using the independence of the δm’s under Q0. Thus, $W_{(1)}$ is standard uniform when all null hypotheses are correct. Using this uniformity result and Lemma D.2 presented in [28] dealing with lower and upper bounds of η for ηU B(Cα), we obtain in Proposition 7.2 presented below a lower bound for $R_1(\delta_F(q^*), Q_0)$, the FDR when all the null hypotheses are correct.

Proposition 7.2

For all q* ∈ [0, 1], $1 - (1 - q^*/M)^M \le R_1(\delta_F(q^*), Q_0) \le q^*$.

8. A modest simulation

We compared through computer simulations the performances of δF and δBH in terms of FDR and MDR. The simulation model utilized is similar to the Gaussian example illustrating the optimal weak FWER-controlling procedure in Section 4.4. In this model, the observables are Xm ~ N(μm, 1) for each $m \in \mathcal{M}$, which are independent of each other. The mth pair of hypotheses is Hm0 : μm ≤ 0 versus Hm1 : μm > 0. The UMP size-ηm test is $\delta_m(X_m; \eta_m) = I\{X_m > \Phi^{-1}(1 - \eta_m)\}$. The true values of the means μm’s are μm = ξmθm, $m \in \mathcal{M}$, with θm ~ Ber(p) and effect sizes ξm ~ |N(ν, 1)|, again independently generated from each other. The parameter combinations were induced by taking M ∈ {20, 50, 100}, p ∈ {0.1, 0.2, 0.4} and ν ∈ {1, 2, 4}. The FDR thresholds utilized were q* ∈ {0.05, 0.10}. Since the computational implementation of δF takes time, for each combination of (q*, M, ν, p), we limited our simulations to 1,000 replications. The simulated FDR and MDR* were the averages of the false discovery proportions, L1(a, Q)’s, and the standardized missed discovery proportions, $L_2(a, Q)/|\mathcal{M}_1(Q)|$, over the 1,000 replications. We used this standardized MDR since, for each replicate, a Q is generated, hence $|\mathcal{M}_1(Q)|$ differs over the replications. In essence, we are comparing the averages of $R_2(\delta_F, Q)/|\mathcal{M}_1(Q)|$ and $R_2(\delta_{BH}, Q)/|\mathcal{M}_1(Q)|$, where the averaging is with respect to the mechanism generating the Q’s over the simulation replications.
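The data-generating mechanism of this simulation can be sketched for the BH procedure alone; δF is omitted because it additionally requires the optimal size vector for every replicate. This is an illustrative sketch rather than the authors' code: the function names, the seed, and the choice to skip replicates with no true alternatives when averaging the standardized missed discovery proportion are all assumptions.

```python
import numpy as np
from scipy.stats import norm


def bh_reject(p_values, q):
    """Conventional BH step-up rule on one-sided p-values."""
    p = np.asarray(p_values, dtype=float)
    M = p.size
    order = np.argsort(p)
    passing = np.nonzero(p[order] <= q * np.arange(1, M + 1) / M)[0]
    reject = np.zeros(M, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject


def simulate_bh(M=20, nu=1.0, p_alt=0.1, q=0.10, n_rep=1000, seed=0):
    """Return (simulated FDR, simulated MDR*) for BH under the Section 8 model."""
    rng = np.random.default_rng(seed)
    fdp, mdp = [], []
    for _ in range(n_rep):
        theta = rng.random(M) < p_alt                 # theta_m ~ Ber(p)
        xi = np.abs(rng.normal(nu, 1.0, size=M))      # xi_m ~ |N(nu, 1)|
        x = rng.normal(xi * theta, 1.0)               # X_m ~ N(xi_m * theta_m, 1)
        reject = bh_reject(norm.sf(x), q)             # p-value = 1 - Phi(X_m)
        fdp.append((reject & ~theta).sum() / max(reject.sum(), 1))
        if theta.any():                               # assumption: skip replicates with no alternatives
            mdp.append((~reject & theta).sum() / theta.sum())
    return float(np.mean(fdp)), float(np.mean(mdp))
```

A call such as `simulate_bh(M=20, nu=1.0, p_alt=0.1, q=0.10)` corresponds to the first parameter combination of the study; the exact figures depend on the random seed and on the handling of replicates without alternatives.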

We only report results for q* = 0.10 in Table 1 since results for q* = 0.05 lead to similar conclusions. From this table, we observe that both δF and δBH fulfill the FDR-constraint, and in a conservative manner, which is expected from theory. More importantly, the MDR-performance of δF is better compared to that of δBH, with this dominance holding for all twenty-seven parameter combinations. Observe that as M is increased with (ν, p) remaining the same, there is an increase in their MDR*’s; whereas, when ν is increased, which increases the effect sizes, their MDR*’s decrease. Interestingly, the impact of a change of value in p, the proportion of true alternative hypotheses, did not necessarily translate into a monotone change in their MDR*’s, especially when M = 20, though for the larger M-values, the change in MDR* appears monotonically decreasing.

Table 1.

Comparison of the false discovery rate (FDR) and standardized missed discovery rate (MDR*) performance of MDFs δF and δBH under a variety of simulation parameters. This table is for q* = 0.10. The FDR and MDR* are in percentages. The number of replications is 1,000

No. q* M ν p δF-FDR δF-MDR* δBH-FDR δBH-MDR*
1 0.1 20 1 0.1 8.03 70.80 8.43 72.64
2 0.1 20 1 0.2 7.55 79.64 8.77 81.99
3 0.1 20 1 0.4 6.05 77.47 6.65 80.30
4 0.1 20 2 0.1 7.70 54.42 8.43 55.80
5 0.1 20 2 0.2 7.39 56.32 7.59 57.31
6 0.1 20 2 0.4 6.47 47.82 6.21 49.38
7 0.1 20 4 0.1 9.14 8.62 9.48 10.30
8 0.1 20 4 0.2 7.80 7.34 6.97 9.20
9 0.1 20 4 0.4 6.15 3.58 5.65 5.53
10 0.1 50 1 0.1 8.83 84.87 9.26 87.05
11 0.1 50 1 0.2 7.11 83.49 7.14 86.65
12 0.1 50 1 0.4 6.45 78.91 6.42 82.30
13 0.1 50 2 0.1 8.36 63.36 8.99 65.04
14 0.1 50 2 0.2 8.74 57.30 8.73 58.93
15 0.1 50 2 0.4 5.80 48.71 5.93 50.21
16 0.1 50 4 0.1 8.84 10.28 8.93 12.09
17 0.1 50 4 0.2 7.93 6.91 7.81 8.79
18 0.1 50 4 0.4 6.34 3.40 6.07 5.68
19 0.1 100 1 0.1 9.14 87.10 9.02 90.02
20 0.1 100 1 0.2 8.21 84.05 8.78 87.38
21 0.1 100 1 0.4 5.92 80.12 5.88 83.73
22 0.1 100 2 0.1 9.79 66.10 9.24 67.93
23 0.1 100 2 0.2 7.68 58.25 7.94 59.93
24 0.1 100 2 0.4 5.74 49.29 6.10 50.90
25 0.1 100 4 0.1 8.37 10.44 8.62 12.36
26 0.1 100 4 0.2 7.72 5.93 7.81 8.22
27 0.1 100 4 0.4 5.69 3.80 6.14 5.72

It may appear from this simulation study that the standardized improvement of δF over δBH is minuscule. However, note that when translated to overall number of discoveries, when M is large, δF will lead to many more discoveries than δBH while still maintaining desired FDR control. Such an increase in the number of discoveries may have important practical implications, such as enlarging the number of genes to be explored in consequent studies. This may translate to enhanced chances of discovering crucial and important genes without sacrificing the type I error rate.

9. Summary and concluding remarks

This paper provides some resolution on the role of the individual powers of test or decision functions, more appropriately their ROC functions, in multiple hypotheses testing problems. The importance and relevance of these problems have arisen because of the proliferation of high-dimensional “large M, small n” data sets in the natural, medical, physical, economic and social sciences. Such data sets are being created or generated due to advances in high-throughput technology, the latter fueled by speedy developments in computer technology and miniaturization.

Almost a century ago, Neyman and Pearson demonstrated the need to take into account the power function and the alternative hypothesis configuration when seeking an optimal test procedure in single-pair hypothesis testing. Their work led to a divorce from the then-existing significance or p-value approach. Currently, many multiple hypotheses testing procedures, epitomized by the Šidák procedures for weak and strong FWER control and by the Benjamini–Hochberg (BH) procedure for FDR control, are based on the p-values of the individual tests and do not consider differences in the power traits of the individual tests. They are appropriate in so-called exchangeable settings wherein power characteristics of the individual tests are identical. Such settings, however, are more the exception than the rule, since nonidentical power characteristics easily arise due to differences in the effect sizes, the dispersion parameters, or the test functions that are employed.

This paper examined whether differences in power characteristics of the individual tests could be exploited to improve on existing procedures for FWER and FDR control. Procedures were developed under the historically most fundamental scenario where the null and the alternative hypotheses are simple. First, an optimal MDF within the class of simple MDFs was shown to exist for weak FWER control. This MDF is better than the Šidák weak FWER-controlling MDF, though the latter is a special case of the optimal MDF under exchangeability. Optimality also informs us of an optimal size-investing strategy. Second, by using this optimal, though still restricted, MDF as an anchor, a compound MDF strongly controlling FWER was obtained. The sequential Šidák MDF is a special case of this MDF under exchangeability. Third, we developed a compound MDF that controls FDR. The BH procedure obtains from this MDF under exchangeability. By construction, these new MDFs have smaller MDRs relative to those that did not exploit power differences. The improvement was demonstrated through a modest simulation study by comparing the new FDR-controlling MDF and the BH MDF.

Though the proposed MDFs do improve on existing ones, we could not claim that they are optimal among all compound MDFs for strong FWER or FDR control. This question of global optimality is a difficult and elusive one. So far none of the existing compound MDFs, such as the estimated ODP in [42], could claim global optimality. In our case, the possible drawback is that in constructing the new MDFs, we started with the class of simple MDFs. The resulting MDFs are indeed compound, but establishing global optimality is not transparent. A question even arises as to whether there truly exists an optimal MDF among all compound MDFs that, say, control FDR. One thing certain about our MDFs is that they do control FWER or FDR. This is in contrast to some MDFs that are obtained from oracle MDFs via plugging-in of estimates for unknown quantities. Even though the oracle MDFs, which are unimplementable, satisfy the type I error rate control, the plug-in step will usually invalidate such control. See [45] where optimality was in an asymptotic sense and with the type I error rate being the mFDR, as well as [13, 29] for more discussions on these issues.

A natural layer to add in the decision-theoretic formulation of the problem is a Bayesian layer where a prior measure is specified on the unknown probability measure Q or, alternatively, on θ(Q). There is a possibility that through this Bayesian approach, one may be able to obtain a characterization of the class of optimal MDFs controlling type I error rates, or when the two types of error rates are combined, for example, via a weighted linear combination. The papers [10, 11, 26, 33] which employ Bayes or empirical Bayes approaches are highly relevant on this front.

Finally, we mention that there are still other aspects of the multiple decision problem not dealt with in this paper. First is the extension to situations with composite null and alternative hypotheses. We indicated some ideas in Section 5.2 for distributional models possessing the MLR property, but further and more extensive studies are needed. Second are possible dependencies among the components in (Xm, mInline graphic(Q)). We have assumed that this is an independent collection, but it is certainly of theoretical and applied relevance to examine dependent settings. Potential results in such scenarios will extend those in [2, 31, 32]. In these composite hypotheses and dependent data settings, we expect that resampling-based ideas and approaches, such as those in [47, 48], will be central.

Supplementary Material

Online Supplement

Acknowledgments

The first author is grateful to Dr. James Berger for facilitating his sabbatical leave visit at the Statistical and Applied Mathematical Sciences Institute (SAMSI) during Fall 2008, as this afforded him quality time for generating ideas relevant to this project. As such, this work was partially supported by the National Science Foundation (NSF) under Grant DMS-0635449 to SAMSI. However, any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation. He is also grateful to Prof. Odd Aalen and Prof. Bo Lindqvist for facilitating his visits to the University of Oslo and the Norwegian University of Science and Technology (NTNU), which led to critical ideas for this project. The authors are highly grateful to the two reviewers, the Associate Editor and the Editors for their comments, suggestions and criticisms. Special thanks go to Prof. Sanat Sarkar and Prof. Lan Wang for a careful reading of an earlier version of the manuscript. We also thank the following for comments or for pointing out references: Prof. J. Lynch, Dr. A. McLain, Prof. G. Rempala, Prof. J. Sethuraman, Prof. G. Taraldsen, Prof. A. Vidyashankar, Prof. L. Wasserman and Prof. P. Westfall. We also thank Dr. M. Peña for discussions about microarrays.

Footnotes

SUPPLEMENTARY MATERIAL

Supplement to “Power-Enhanced Multiple Decision Functions Controlling Family-Wise Error and False Discovery Rates” (DOI: 10.1214/10-AOS844SUPP; .pdf). The proofs of lemmas, propositions, theorems and corollaries are provided in this supplemental article [28].

References

  • 1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Statist Soc Ser B. 1995;57:289–300.
  • 2. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Statist. 2001;29:1165–1188.
  • 3. Bonferroni C. Teoria statistica delle classi e calcolo delle probabilità. Publ R Instit Super Sci Econ Commerc Firenze. 1936;8:1–62.
  • 4. Cox DR, Hinkley DV. Theoretical Statistics. Chapman and Hall; London: 1974.
  • 5. Dudoit S, Gilbert HN, van der Laan M. Technical report. Univ. California; Berkeley: 2007. Resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability and expected value error rates: Focus on the false discovery rate and simulation study.
  • 6. Dudoit S, Shaffer JP, Boldrick JC. Multiple hypothesis testing in microarray experiments. Statist Sci. 2003;18:71–103.
  • 7. Dudoit S, van der Laan MJ. Multiple Testing Procedures With Applications to Genomics. Springer; New York: 2008.
  • 8. Efron B. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J Amer Statist Assoc. 2004;99:96–104.
  • 9. Efron B. Size, power and false discovery rates. Ann Statist. 2007;35:1351–1377.
  • 10. Efron B. Microarrays, empirical Bayes and the two-groups model. Statist Sci. 2008;23:1–22.
  • 11. Efron B. Simultaneous inference: When should hypothesis testing problems be combined? Ann Appl Statist. 2008;2:197–223.
  • 12. Efron B, Tibshirani R, Storey JD, Tusher V. Empirical Bayes analysis of a microarray experiment. J Amer Statist Assoc. 2001;96:1151–1160.
  • 13. Ferkingstad E, Frigessi A, Rue H, Thorleifsson G, Kong A. Unsupervised empirical Bayesian multiple testing with external covariates. Ann Appl Statist. 2008;2:714–735.
  • 14. Foster DP, Stine RA. α-investing: A procedure for sequential control of expected false discoveries. J R Stat Soc Ser B Stat Methodol. 2008;70:429–444.
  • 15. Genovese C, Wasserman L. Operating characteristic and extensions of the false discovery rate procedure. J R Stat Soc Ser B Stat Methodol. 2002;64:499–517.
  • 16. Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93:509–524.
  • 17. Guindani M, Muller P, Zhang S. A Bayesian discovery procedure. J Roy Statist Soc Ser B. 2009;71:905–925. doi: 10.1111/j.1467-9868.2009.00714.x.
  • 18. Habiger J, Peña EA. Randomized P-values and nonparametric procedures in multiple testing. J Nonparametr Stat. 2010:1–22. doi: 10.1080/10485252.2010.482154.
  • 19. Ihaka R, Gentleman R. R: A language for data analysis and graphics. J Comput Graph Statist. 1996;5:299–314.
  • 20. Jin J, Cai TT. Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J Amer Statist Assoc. 2007;102:495–506.
  • 21. Kang G, Ye K, Liu N, Allison D, Gao G. Weighted multiple hypothesis testing procedures. Stat Appl Genet Mol Biol. 2009;8:1–21. doi: 10.2202/1544-6115.1437.
  • 22. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663.
  • 23. Langaas M, Lindqvist BH, Ferkingstad E. Estimating the proportion of true null hypotheses, with application to DNA microarray data. J R Stat Soc Ser B Stat Methodol. 2005;67:555–572.
  • 24. Lehmann EL. Testing Statistical Hypotheses. 2nd ed. Springer; New York: 1997.
  • 25. Lehmann EL, Romano JP, Shaffer JP. On optimality of stepdown and stepup multiple test procedures. Ann Statist. 2005;33:1084–1108.
  • 26. Müller P, Parmigiani G, Robert C, Rousseau J. Optimal sample size for multiple testing: The case of gene expression microarrays. J Amer Statist Assoc. 2004;99:990–1001.
  • 27. Neyman J, Pearson E. On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Ser A. 1933;231:289–337.
  • 28. Peña E, Habiger J, Wu W. Supplement to “Power-enhanced multiple decision functions controlling family-wise error and false discovery rates”. 2010.
  • 29. Roquain E, van de Wiel MA. Optimal weighting for false discovery rate control. Electron J Stat. 2009;3:678–711.
  • 30. Rubin D, Dudoit S, van der Laan M. A method to increase the power of multiple testing procedures through sample splitting. Stat Appl Genet Mol Biol. 2006;5:Art 19, 20. doi: 10.2202/1544-6115.1148. (electronic)
  • 31. Sarkar SK. Some probability inequalities for ordered MTP2 random variables: A proof of the Simes conjecture. Ann Statist. 1998;26:494–504.
  • 32. Sarkar SK. Generalizing Simes’ test and Hochberg’s stepup procedure. Ann Statist. 2008;36:337–363.
  • 33. Sarkar SK, Zhou T, Ghosh D. A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statist Sinica. 2008;18:925–945.
  • 34. Schweder T, Spjøtvoll E. Plots of P-values to evaluate many tests simultaneously. Biometrika. 1982;69:493–502.
  • 35. Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. J Statist Plann Inference. 2006;136:2144–2162.
  • 36. Šidák Z. Rectangular confidence regions for the means of multivariate normal distributions. J Amer Statist Assoc. 1967;62:626–633.
  • 37. Sorić B. Statistical “discoveries” and effect-size estimation. J Amer Statist Assoc. 1989;84:608–610.
  • 38. Spjøtvoll E. On the optimality of some multiple comparison procedures. Ann Math Statist. 1972;43:398–411.
  • 39. Stevenson RL. The Strange Case of Dr Jekyll and Mr Hyde. 1st ed. Longmans, Green and Co; London: 1886.
  • 40. Storey J. A direct approach to false discovery rates. J R Stat Soc Ser B Stat Methodol. 2002;64:479–498.
  • 41. Storey J. The positive false discovery rate: A Bayesian interpretation and the q-value. Ann Statist. 2003;31:2012–2035.
  • 42. Storey J. The optimal discovery procedure: A new approach to simultaneous significance testing. J R Stat Soc Ser B Stat Methodol. 2007;69:347–368.
  • 43. Storey J, Dai J, Leek J. The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics. 2007;8:414–432. doi: 10.1093/biostatistics/kxl019.
  • 44. Storey JD, Taylor JE, Siegmund D. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J R Stat Soc Ser B Stat Methodol. 2004;66:187–205.
  • 45. Sun W, Cai T. Oracle and adaptive compound decision rules for false discovery rate control. J Amer Statist Assoc. 2007;102:901–912.
  • 46. Wasserman L, Roeder K. Technical report. Carnegie-Mellon Univ; 2006. Weighted hypothesis testing. Available at http://arxiv.org/abs/math.ST/0604172.
  • 47. Westfall P, Troendle J. Multiple testing with minimal assumptions. Biom J. 2008;50:1–11. doi: 10.1002/bimj.200710456.
  • 48. Westfall P, Young S. Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Wiley; New York: 1993.
  • 49. Westfall PH, Krishen A, Young SS. Using prior information to allocate significance levels for multiple endpoints. Stat Med. 1998;17:2107–2119. doi: 10.1002/(sici)1097-0258(19980930)17:18<2107::aid-sim910>3.0.co;2-w.
