Summary
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures.
Keywords: Bootstrap, Early stopping, False discovery rate control, Multiple hypothesis testing, Resampling
1. Introduction
When testing hypotheses on complex datasets, applied statisticians typically use resampling schemes to estimate significance levels. Tests applied to massive data often require either a nontrivial time for each resampling step, computation of hundreds or thousands of individual significance tests, or both. When the time required to estimate resampled p-values is on the order of days, this computation can prevent efficient data analysis. For examples of prohibitive computation time in genetics, see Siegmund et al. (2011), McLean et al. (2010) and Lin & Tang (2011). This raises the question: can the computation time for testing hypotheses be reduced without compromising the power and level of a test of significance? Under natural assumptions, we answer this question affirmatively by developing an adaptive approach for early termination of p-value estimation simulations. Application of the rule to a large dataset substantially reduced the time required to compute p-values.
There is a rich literature on methods that improve the precision or reduce the bias of bootstrap quantile estimators, such as balanced or importance resampling (Efron & Tibshirani, 1993; Davison & Hinkley, 1997, Ch. 9), and work that addresses replication size or the power of such estimators (Bickel & Freedman, 1981; Hall, 1986; Efron, 1987; Davidson & MacKinnon, 2000; Davison & Hinkley, 1997, Ch. 4). Lin & Tang (2011) use a stagewise procedure in which all hypotheses are first tested with small numbers of replications, after which small p-values are refined by further replications. Guo & Peddada (2008) use prespecified replication sizes to define candidate early stopping times; confidence intervals for bootstrap quantiles then determine the stopping times, and the false discovery rate is computed using the procedure in Benjamini & Hochberg (1995).
In this paper, we introduce an adaptive stopping rule to reduce the computation time in resampling-based approaches, such as the bootstrap, when either false discovery rate control or precise estimates for small p-values are sought. Central to our procedure is a mapping of the p-value estimator to a stochastic process, which we use to provide theoretically tractable bounds on testing errors and computational savings. Compared with previous methods, our procedure does not require prespecified stagewise replication sizes, offers greater flexibility, and rests on a more complete theoretical development. We also demonstrate greater computational savings of our approach in examples.
2. Early stopping procedure
For clarity, we will explain our methodology for the case of simple bootstrap p-value computations, although the results apply to other resampling methods; see § 5. Following the notation of Davidson & MacKinnon (2000), consider the case of a parameter of interest τ for which we have a point estimate τ̂. If the distribution of τ̂ under the null hypothesis H0 is unknown, the significance level for a test based on τ̂ can be estimated using simulation, for example with the bootstrap. For j = 1, . . . , n, let τ̂*j denote the jth bootstrapped value of τ̂ under H0. The estimated bootstrap p-value is

p̂*(τ̂) = (1/n) Σ_{j=1}^n I(τ̂*j > τ̂),

where I(·) is the indicator function. As n → ∞, p̂*(τ̂), under regularity conditions, will tend to the ideal bootstrap p-value p*(τ̂), which is unknown and conditioned on the data (Davidson & MacKinnon, 2000). For simplicity, we will use p̂* and p* to denote the estimated and ideal bootstrap p-values throughout this paper. The use of a point estimate τ̂ here is for ease of exposition, but use of an adjusted or pivotal estimator would generally be preferable in practice.
Modern datasets often require m hypotheses to be tested, so m bootstrap p-values are to be estimated, one for each null hypothesis H0,i (i = 1, . . . , m). In many genomic applications, m is in the thousands or tens of thousands, as each gene requires the testing of an individual hypothesis. When m is large, even moderate values of n can make computation of bootstrap p-values prohibitive. In the context of multiple testing, the early stopping procedure makes the following assumptions.
Assumption 1. For each null hypothesis H0,i, τ̂i is a point estimate of τi, the parameter of interest. To estimate the p-value based on τ̂i, the distribution for τ̂i under H0,i is generated by a bootstrap simulation. The simulation produces n independent bootstrapped values τ̂*i,1, . . . , τ̂*i,n.
Assumption 2. For each H0,i and for each k = 1, . . . , n, define p̂*i,k = (1/k) Σ_{j=1}^k I(τ̂*i,j > τ̂i). The standard estimated bootstrap p-value of the ith hypothesis is hence p̂*i = p̂*i,n.
An example where Assumption 1 holds is when τ̂*i,j (j = 1, . . . , n) are simulated values of τ̂i under H0,i obtained by random sampling with replacement from the original dataset. Another example is parametric bootstrapping. The assumption of independence can be relaxed, but doing so is tangential to our main results.
Under the assumptions above, is it possible to bound the deviation of p̂*i,k from its expectation as a function of k? Since the answer is yes, stopping the simulation before n iterations can save computation time, and the error made in estimating p-values can be quantified. We will show that since p̂*i,k can be mapped to a stochastic process, the early stopping procedure below can be used to accurately estimate bootstrap p-values when they are smaller than a prespecified threshold p0, an upper bound for the prespecified significance level of the statistical test performed, and to stop if there is evidence from early simulations that p̂*i,n will exceed p0. The procedure for early stopping is outlined below:
Early stopping procedure. For a fixed hypothesis i, let p0 be a p-value threshold of interest. Let a and c be constants independent of i satisfying a > 0 and c > p0/(1 − p0). For k = 1, . . . , n, stop the simulation at the kth iteration if p̂*i,k > (a/k + c)/(1 + c), in which case denote p̂*i,k by p̃*i. Otherwise, continue until k = n and let p̃*i = p̂*i,n.
Remark 1. If the procedure stops before n iterations, p̃*i > (a/k + c)/(1 + c) ⩾ c/(1 + c) > p0.
Remark 2. The constants a, c, p0 and n are assumed to be the same for all i.
Proposition 1. The procedure can be mapped to a stochastic process. For k = 1, . . . , n, let ξk = (1 + c)I(τ̂*i,k > τ̂i) − c and Xk = Σ_{j=1}^k ξj. As ξk and Xk will depend on i, there are m such independent random processes. The stopping criterion for the procedure, p̂*i,k > (a/k + c)/(1 + c), reduces to Xk > a. For the rest of the paper, we let Ta ∧ n denote the time at which the simulation stops. That is, Ta ∧ n = min(Ta, n), where Ta = min{k : Xk > a}.
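To make the rule concrete, the following minimal Python sketch estimates a single bootstrap p-value with early stopping. It is an illustration under the assumptions above, not the authors' implementation; draw_indicator is a hypothetical callable that performs one resampling step and returns I(τ̂*i,k > τ̂i).

```python
import numpy as np

def early_stopping_pvalue(draw_indicator, n, p0, a, delta, rng):
    """Estimate one bootstrap p-value with the early stopping rule.

    draw_indicator(rng) performs one resampling step and returns 1 if the
    resampled statistic exceeds the observed one, i.e. I(tau*_{i,k} > tau_i).
    Returns (p-value estimate, iterations used).
    """
    c = (1 + delta) * p0 / (1 - p0)  # Remark 3: c = (1 + delta) p0 / (1 - p0)
    exceed = 0                        # running count of indicators
    x = 0.0                           # X_k = sum of xi_j, with xi_j = 1 or -c
    for k in range(1, n + 1):
        ind = draw_indicator(rng)
        exceed += ind
        x += 1.0 if ind else -c
        if x > a:                     # equivalent to p_hat_{i,k} > (a/k + c)/(1 + c)
            return exceed / k, k      # stopped early: estimate exceeds p0
    return exceed / n, n              # ran to completion: standard estimate

# Example with a cheap stand-in indicator that is Bernoulli with the ideal p-value
rng = np.random.default_rng(1)
p_hat, iters = early_stopping_pvalue(lambda g: int(g.random() < 0.3),
                                     n=1000, p0=0.05, a=10, delta=1.0, rng=rng)
```

For an ideal p-value well above c/(1 + c), the increments ξk have positive drift, so Xk crosses a quickly; this is the source of the computational savings.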
The early stopping procedure operates independently on each individual hypothesis. We state the procedure in the multiple hypothesis framework because it is in this scenario that the procedure’s statistical properties and computational savings are of greatest interest. Future work will focus on extensions of this procedure to settings of dependence.
3. Statistical properties of early stopping
3.1. Statistical properties for hypothesis testing
Proposition 2. For constants p0 and c defined in the early stopping procedure, choose θ > 0 such that p0 exp(θ) + (1 − p0) exp(−cθ) = 1. For a fixed hypothesis i, suppose the ideal bootstrap p-value p*i ⩽ p0. Then the probability that the early stopping procedure will terminate before n iterations is at most exp(−aθ).
Remark 3. In this paper, we specify c implicitly by specifying δ > 0 satisfying c = (1 + δ) p0/(1 − p0). Guidelines for choosing δ so that a unique θ in Proposition 2 exists are given in § 3.3.
The connection between a bootstrap p-value and the stochastic process shown in Proposition 1 is especially useful for investigating the computational savings in the context of multiple hypothesis testing, where bootstrapping is applied to jointly test a family of multiple hypotheses of similar purpose, and some control over false rejections is desired. In this paper, we consider two widely used multiple testing procedures for the control of false rejections: the control of the false discovery rate fdr = E{I (S + V > 0)V/(S + V)} and the control of the familywise error rate fwer = pr(V ⩾ 1), where V and S are the number of true null and true alternative hypotheses rejected. The false discovery rate controls the expected proportion of false rejections (Benjamini & Hochberg, 1995), and the familywise error rate controls the probability of making one or more false rejections; see Simes (1986), Rom (1990), Hommel (1988) and Hochberg (1988).
Proposition 3. For i = 1, . . . , m, let p̂*i denote the p-value that would be computed if the simulation were run to completion. Without loss of generality, assume that p̂*1, . . . , p̂*m are in increasing order. For hypothesis j, if p̂*j ⩽ p0, then p̃*(j) ⩾ p̂*j, where p̃*(j) denotes the jth smallest of the early stopping estimates p̃*1, . . . , p̃*m.
Remark 4. Proposition 3 implies that when rejected hypotheses have estimated p-values less than p0, the familywise error rate is no larger when applied to estimated p-values from early stopping compared to those from running the simulation to completion.
Proposition 4. Suppose that a multiple hypothesis testing procedure applied to p̂*1, . . . , p̂*m rejects the ith hypothesis if p̂*i < α and that this procedure controls the false discovery rate at level q. If the early stopping rule is employed with p0 ⩾ α and the ith hypothesis is rejected if p̃*i < α, then the false discovery rate is controlled at level q + m exp(−aθ).
Straightforward adaptations of the argument used in Proposition 4 can be made to prove an analogous proposition for a large class of step-up procedures in Benjamini & Hochberg (1995) and Benjamini & Yekutieli (2001), or adaptations of the Benjamini–Hochberg procedures such as two-stage step-up procedures (Storey, 2002).
Proposition 5. Suppose that m hypotheses are tested and the significance levels corresponding to rejected hypotheses are such that αi ⩽ p0 and result in control of the false discovery rate at level q. With the early stopping rule, the false discovery rate is controlled at level q + m exp(−aθ).
3.2. Computational savings
Proposition 6. For a fixed hypothesis i, suppose that the ideal bootstrap p-value p*i is a random variable uniformly distributed on the interval [0, 1]. Then

E(Ta ∧ n) ⩽ np′ + {(a + 1)(1 − p0)/(1 + p0δ)} log(n/a),    (1)

where Ta ∧ n denotes the stopping time defined in Proposition 1, p′ = {a(1 − p0)/n + p0(1 + δ)}/(1 + p0δ), δ is defined in Remark 3, and n, the maximum number of simulations, is fixed. The outer expectation is taken with respect to p*i.
Taking n = 1000, δ = 1, a = 10, the right-hand side of (1) is roughly 150 when p0 = 0.05 and roughly 79 when p0 = 0.01, reducing the computation by a factor of 7 or 13, respectively. Further, (1) shows that the asymptotic factor for computation reduction using the stopping rule is (1 + p0δ)/{ p0(1 + δ)}.
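The arithmetic behind these figures is easy to check; here is a minimal script evaluating the right-hand side of (1) as stated above:

```python
import numpy as np

def bound_expected_iterations(n, p0, a, delta):
    """Right-hand side of (1): bound on E(T_a ^ n) when p* is uniform on [0, 1]."""
    p_prime = (a * (1 - p0) / n + p0 * (1 + delta)) / (1 + p0 * delta)
    return n * p_prime + (a + 1) * (1 - p0) / (1 + p0 * delta) * np.log(n / a)

print(round(bound_expected_iterations(1000, 0.05, 10, 1)))  # about 150
print(round(bound_expected_iterations(1000, 0.01, 10, 1)))  # about 79
```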
3.3. Choosing parameters
This section addresses how to choose the parameters that govern decision rules for the early stopping p-values. There is a trade-off between computational savings and false rejections. For given p0 and δ, to compute θ, we need to solve the equation p0 exp(θ) + (1 − p0) exp{−p0(1 + δ)θ/(1 − p0)} = 1. While a closed-form solution would involve solving a high-degree polynomial equation, numerical solution is straightforward, and the existence, uniqueness and bounds for θ can be easily established.
Proposition 7. Fix 0 < p0 < 1/3 and 0 < δ < 1. If

1 + δ < {(1 − p0)/p0} log2{(1 − p0)/(1 − 2p0)},

where log2 denotes the base-2 logarithm, then there exists a unique solution θ > 0 satisfying the equation p0 exp(θ) + (1 − p0) exp{−p0(1 + δ)θ/(1 − p0)} = 1, with δ/2 ⩽ θ ⩽ 2δ.
For p0 = 0.05, the above condition requires δ < 0.48, or δ < 0.45 for p0 = 0.01. The precise value of θ can be computed numerically. Figure 1 in the Supplementary Material shows the values of θ, together with the lower and upper bounds provided by Proposition 7, under different combinations of p0 and δ.
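Because θ solves a transcendental equation, in practice it is computed numerically. A minimal bisection sketch over (0, log 2), where Proposition 7 guarantees a unique root when its condition holds:

```python
import numpy as np

def solve_theta(p0, delta, tol=1e-12):
    """Bisection for theta > 0 solving p0*exp(theta) + (1 - p0)*exp(-c*theta) = 1,
    with c = (1 + delta)*p0/(1 - p0) as in Remark 3."""
    c = (1 + delta) * p0 / (1 - p0)
    f = lambda t: p0 * np.exp(t) + (1 - p0) * np.exp(-c * t) - 1
    lo, hi = tol, np.log(2)          # Proposition 7: unique root in (0, log 2)
    if f(hi) <= 0:
        raise ValueError("condition of Proposition 7 appears to fail")
    while hi - lo > tol:             # f(lo) < 0 since f'(0) = -p0*delta < 0
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

theta = solve_theta(0.05, 0.4)       # roughly 0.60, within [delta/2, 2*delta]
```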
4. Simulation and application
4.1. Simulation
The early stopping procedure in this paper is, to the best of our knowledge, the first early stopping procedure with theoretical finite-sample guarantees on the errors of single-hypothesis testing and on false discovery rate control for multiple testing, so its theoretical error rates cannot be compared with those of other methods in the literature. Here, we use simulations to compare the early stopping procedure with that of Guo & Peddada (2008), in which prespecified replication sizes B0 ⩽ ⋯ ⩽ BK for a bootstrap are used as candidate early stopping times. For each k = 0, . . . , K, Bk bootstrap replications are simulated, Clopper–Pearson confidence intervals for the estimated bootstrap p-values are computed, and the procedure in Benjamini & Hochberg (1995) is applied to the upper and lower bounds of the confidence intervals, respectively. Hypotheses that are accepted or rejected under both applications of the Benjamini–Hochberg procedure are excluded from further bootstrapping and their current bootstrap p-value estimates are reported. All other hypotheses are included in the next round of Bk+1 bootstrap replications. No theoretical properties of the computational savings due to the Guo–Peddada procedure are available.
To compare the two procedures, we simulate the testing of m = 1000 hypotheses. Among the hypotheses i = 1, . . . , m, 95% are assumed to be true nulls with ideal bootstrap p-values simulated from Un(0, 1). The remaining 5% are assumed to be true alternatives with ideal bootstrap p-values simulated from Un(0, 0.0001). We do not simulate any specific bootstrap procedure in which data are resampled and test statistics are calculated, because both the Guo–Peddada and the early stopping procedures are independent of the specific implementation. For the ith hypothesis and the kth simulated bootstrap iteration, we draw a Bernoulli random variable with success probability p*i, representing the event τ̂*i,k > τ̂i. To evaluate the errors made by the early stopping procedure in the same manner as those made by the Guo–Peddada procedure, the Benjamini–Hochberg procedure is run on the p-values estimated by the early stopping procedure, testing at false discovery rate levels α = 0.01 and 0.05. These and all other parameters are set as suggested in the simulations in Guo & Peddada (2008). The confidence level for the Clopper–Pearson exact confidence intervals used in the Guo–Peddada procedure is set to 0.99. The prespecified replication sizes used in the Guo–Peddada procedure are B0 ⩽ ⋯ ⩽ BK with B0 = 125, 250 or 500, BK = 2000 and Bk+1 = 2Bk (k = 0, . . . , K − 1). In the early stopping procedure, we set n = 2000, p0 = α, a = 5 or 10 and δ = 0.4 or 0.7; a sketch of this design is given below.
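The following self-contained Python sketch mirrors this simulation design for the early stopping arm; it is an illustration, not the authors' code. The helper bh_reject implements the standard Benjamini & Hochberg (1995) step-up procedure, and the Bernoulli draw stands in for one bootstrap iteration as described above.

```python
import numpy as np

def bh_reject(pvals, alpha):
    """Benjamini & Hochberg (1995) step-up procedure: boolean rejection mask."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

def simulate_once(m=1000, n=2000, alpha=0.05, a=5, delta=0.4, seed=0):
    rng = np.random.default_rng(seed)
    m_null = int(0.95 * m)
    p_star = np.concatenate([rng.random(m_null),              # true nulls: Un(0, 1)
                             rng.random(m - m_null) * 1e-4])  # alternatives: Un(0, 0.0001)
    p0 = alpha
    c = (1 + delta) * p0 / (1 - p0)
    p_hat, iters = np.empty(m), np.empty(m)
    for i in range(m):
        x, exceed = 0.0, 0
        for k in range(1, n + 1):
            ind = rng.random() < p_star[i]  # Bernoulli(p*_i): one simulated bootstrap draw
            exceed += ind
            x += 1.0 if ind else -c
            if x > a:                       # early stopping criterion X_k > a
                break
        p_hat[i], iters[i] = exceed / k, k
    return bh_reject(p_hat, alpha), iters.mean()  # rejections and average replications

rejected, ave_b = simulate_once()
```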
The simulation is repeated 100 times, the average performance of the two approaches is compared using the three criteria defined in Guo & Peddada (2008), and the results are reported in Table 1. The Guo–Peddada and the early stopping procedures agree closely. For the conditions in the simulations, the Guo–Peddada procedure gives a 3–6-fold reduction in computational requirements and the early stopping procedure gives a 7–12-fold reduction.
Table 1.
Comparison between the Guo–Peddada procedure and the early stopping procedure over 100 replications
| | | Guo–Peddada | | | Early stopping | | |
|---|---|---|---|---|---|---|---|
| | | B0 = 125 | B0 = 250 | B0 = 500 | δ = 0.4, a = 5 | δ = 0.4, a = 10 | δ = 0.7, a = 5 |
| α = 0.01 | AveB | 330 | 442 | 661 | 159 | 188 | 166 |
| | Fold red. | 6.1 | 4.5 | 3.0 | 12.6 | 10.6 | 12.0 |
| | Degree con. | 98% | 98% | 98% | 97% | 97% | 98% |
| α = 0.05 | AveB | 355 | 461 | 679 | 250 | 282 | 277 |
| | Fold red. | 5.6 | 4.3 | 2.9 | 8.0 | 7.1 | 7.2 |
| | Degree con. | 98% | 98% | 98% | 98% | 98% | 98% |
AveB, the average number of bootstrap replications per test using the Guo–Peddada or the early stopping procedure; Fold red., the ratio between the replication size 2000 of a full bootstrap procedure and AveB; Degree con., 1 − |R1 − R2|/R2, where R1 is the number of hypotheses rejected using the Guo–Peddada or the early stopping procedure and R2 is the number rejected by the Benjamini–Hochberg procedure applied to the ideal bootstrap p-values (Guo & Peddada, 2008).
4.2. Genomic application
The following application demonstrates the effect of the proposed early stopping rule in reducing computation on a genomic dataset generated from a high-throughput sequencing experiment. The method developed in Salzman et al. (2011) was used to estimate isoform-specific gene expression in two conditions. To detect differential isoform usage of a gene, let β̂1 and β̂2 be the vectors of point estimates of isoform-specific gene expression in the two conditions. The test statistic for differential isoform usage is

T = ‖ β̂1/‖β̂1‖1 − β̂2/‖β̂2‖1 ‖1,

where ‖ · ‖1 denotes the vector L1-norm. The estimates β̂1 and β̂2 are computed by maximizing a Poisson likelihood, and the Poisson parameters are a function of a matrix of constants, called the sampling rate matrix. Details can be found in Salzman et al. (2011).
Proper detection of differential isoform usage depends on accurate estimation of the null distribution of T on a per gene basis, as this distribution varies with the gene being tested. To estimate the null distribution of T accurately, resampling of read counts must be performed on a per gene basis, and for each bootstrap iteration a nontrivial computation is needed to estimate the bootstrapped versions of β̂1 and β̂2 by maximizing the likelihood function using convex optimization. The computation for the null distribution of T is intensive, since accurate estimation of a p-value as small as 10−3 necessarily requires thousands of bootstrap simulations. The scheme for resampling is given in the Supplementary Material.
In our experiments, using the conventional approach without early stopping, it takes five hours to estimate the p-values for 258 genes, each with 5000 simulations. The set of 258 genes was preselected because the computation would be prohibitive if run on the full set of over 15 000 expressed genes. Using our early stopping approach with parameters p0 = 0.05, a = 10 and δ = 1, the computing time is reduced to 25 minutes, a 12-fold reduction. The computation is performed on a computer with a 2.66 GHz processor and 32 GB of memory. Figure 2 in the Supplementary Material compares the estimated p-values with and without early stopping. At significance level p0 = 0.05, the same 19 genes are called significant in both cases.
5. Discussion
The bounds in this paper are not tight because our purpose is to show how a simple rule and basic statistical theory can be applied to reduce computation time substantially with provable statistical properties. Other applications of our methodology, for example reducing computation times for small p-values, are also possible.
Our results extend to other resampling schemes where the sequentially generated p-value estimates from a simulation can be formulated as a random process, provided the early stopping theory we use here can be applied. This includes p-values from permutation tests, assuming that appropriate regularity conditions are met. Further efficiencies may be achieved by modifying parameter choices in our procedure, for example, by letting n, c, a, p0 depend on i. Such modifications could have applied utility, and studying their applied and theoretical properties will be of future interest.
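As a concrete illustration of the permutation-test extension, here is a minimal sketch applying the stopping rule to a two-sample permutation test of a mean difference. The statistic, parameter values and helper structure are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def perm_pvalue_early_stop(x, y, n=5000, p0=0.05, a=10, delta=1.0, seed=0):
    """Two-sample permutation test of a mean difference with early stopping.

    Each permutation draw plays the role of one resampling iteration.
    Returns (p-value estimate, iterations used).
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    t_obs = abs(np.mean(x) - np.mean(y))
    c = (1 + delta) * p0 / (1 - p0)
    exceed, xk = 0, 0.0
    for k in range(1, n + 1):
        perm = rng.permutation(pooled)
        t_perm = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        exceed += t_perm > t_obs
        xk += 1.0 if t_perm > t_obs else -c
        if xk > a:            # early evidence that the p-value exceeds p0
            break
    return exceed / k, k
```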
Supplementary material
Supplementary material available at Biometrika online includes figures and the resampling scheme used in the genomic example in § 4.2.
Acknowledgments
We thank Jamie Bates and Patrick O. Brown for providing the sequencing data, and Wing Hung Wong, Xiaoquan Wen and Raymond Chen for critical reading of the manuscript. This research was partially supported by grants from the National Institutes of Health and National Science Foundation, U.S.A.
Appendix
Proofs
Lemma A1. Under the conditions in Proposition 2, Mk (θ) = exp(θXk) is a supermartingale.
Proof. Since Mk(θ) = Π_{j=1}^k exp(θξj) and the ξj are independent, it suffices to show that for any j, E{exp(θξj)} ⩽ 1. By the definition of θ, and since p*i ⩽ p0 and exp(θ) > exp(−cθ) for θ > 0, 1 = p0 exp(θ) + (1 − p0) exp(−cθ) ⩾ p*i exp(θ) + (1 − p*i) exp(−cθ) = E{exp(θξj)}.
Proof of Proposition 2. Let Ta ∧ n be the stopping time defined in Proposition 1. It suffices to show pr(XTa∧n > a) ⩽ exp(−aθ). Since θ > 0 and Ta ∧ n is bounded, Markov’s inequality yields pr(XTa∧n > a) = pr{exp(θXTa∧n) > exp(aθ)} ⩽ E{exp(θXTa∧n)} exp(−aθ) ⩽ exp(−aθ), where the second inequality is due to Lemma A1 and the optional stopping theorem.
Proof of Proposition 3. Recall that τ̂*i,1, . . . , τ̂*i,n are independently and identically distributed bootstrapped values of τ̂i under the null hypothesis H0,i in the ith hypothesis test. The simulations producing p̂*i and p̃*i can be coupled so that they are defined on the same probability space: for each hypothesis test i where the sampling stops after ni iterations,

p̃*i = (1/ni) Σ_{j=1}^{ni} I(τ̂*i,j > τ̂i),    p̂*i = (1/n) Σ_{j=1}^{n} I(τ̂*i,j > τ̂i).

Without loss of generality, assume that p̂*1, . . . , p̂*m are in increasing order. Let l = max{i : p̂*i ⩽ p0}. As stated in Remark 1, if the simulation for hypothesis i stops early, then p̃*i > p0; otherwise p̃*i = p̂*i. This establishes that p̃*i ⩾ p̂*i for all i = 1, . . . , l, and that p̃*i > p0 for all i > l. Let p̃*(j) be the jth smallest p-value using the early stopping procedure. Fix j ⩽ l. Any hypothesis i with p̃*i < p̂*j ⩽ p0 cannot have stopped early, so p̃*i = p̂*i < p̂*j, and since p̂*1, . . . , p̂*m are in increasing order there are at most j − 1 such hypotheses. Hence, p̃*(j) ⩾ p̂*j.
Proof of Proposition 4. Recall that the false discovery rate is defined as fdr = E{I(S + V > 0)V/(S + V)}, where V and S are the numbers of true null and true alternative hypotheses rejected using bootstrap p-values estimated from all n iterations. Let V′ and S′ be the numbers of true null and true alternative hypotheses rejected using bootstrap p-values estimated from the early stopping rule. Suppose l1 of the true alternative and l2 of the true null hypotheses, rejected using bootstrap p-values estimated from all n iterations, are no longer rejected using bootstrap p-values estimated from the early stopping rule. By Proposition 3, no new hypotheses will be rejected using the early stopping rule. Then

E{I(S′ + V′ > 0)V′/(S′ + V′)} ⩽ E{I(l1 = 0)I(S + V > 0)V/(S + V)} + E{I(l1 > 0)} ⩽ q + m exp(−aθ),

where E{I(l1 > 0)} ⩽ m exp(−aθ) by Proposition 2 since the p-value of the rejected alternative hypothesis is assumed to be at most p0.
Proof of Proposition 5. The proof uses the same argument as in the proof of Proposition 4 to bound E{I(S′ + V′ > 0)V′/(S′ + V′)}. It is assumed that the step-up procedure with significance levels α1 ⩽ ⋯ ⩽ αm has the property that if the ith hypothesis is rejected, p̂*i ⩽ αi ⩽ p0. On the event that no simulation for a p-value with p̂*i ⩽ p0 stops early, p̃*i = p̂*i for all such i, and the set of rejected hypotheses is unchanged. On this event, l1 = 0. By the same argument as in Proposition 4, pr(l1 > 0) = E{I(l1 > 0)} ⩽ m exp(−aθ).
Proof of Proposition 6. Let p′ satisfy the condition in the statement of the proposition. The expectation on the left-hand side of (1), as an integral with respect to p, can be broken into the integrals over [0, p′] and (p′, 1]. The integral over [0, p′] is bounded by np′ since Ta ∧ n is bounded by n. To bound the integral over (p′, 1], consider the martingale Xk − k(p + pc − c), where p = p*i = pr(τ̂*i,k > τ̂i) = pr(ξk = 1). Since p′ > p0(1 + δ)/(1 + p0δ) = c/(1 + c), for p ∈ (p′, 1], E(ξk) = p + pc − c > 0. Since Ta ∧ n is a bounded stopping time and XTa∧n ⩽ a + 1, the optional stopping theorem implies that 0 = E{XTa∧n − (Ta ∧ n)(p + pc − c)} ⩽ (a + 1) − E(Ta ∧ n)(p + pc − c), thus E(Ta ∧ n) ⩽ (a + 1)/(p + pc − c). Integrating this bound with respect to p over the interval (p′, 1] and adding the bound np′ yields the result.
Proof of Proposition 7. Let f(θ) = p0 exp(θ) + (1 − p0) exp{−p0(1 + δ)θ/(1 − p0)} − 1. We have f(0) = 0 and f′(0) = p0 − p0(1 + δ) = −p0δ < 0, so f(θ) < 0 for all sufficiently small θ > 0.
The conditions of the proposition imply that f(log 2) > 0, establishing the existence of a solution θ ∈ (0, log 2) by the intermediate value theorem; the solution is unique since f is convex. Further, the conditions 0 < p0 < 1/3 and 0 < δ < 1 imply that δ ⩽ (1 − 2p0)/p0, that is, p0(1 + δ)/(1 − p0) ⩽ 1. Therefore, both θ and p0(1 + δ)θ/(1 − p0) are within (0, log 2). Taylor’s theorem implies that there exists ξ ∈ (0, x) so that exp(x) = 1 + x + (x²/2) exp(ξ). Thus, if 0 ⩽ x ⩽ log 2, then 1 + x + x²/2 ⩽ exp(x) ⩽ 1 + x + x². Similarly, if 0 ⩽ x ⩽ log 2, then 1 − x + x²/4 ⩽ exp(−x) ⩽ 1 − x + x²/2. Using these inequalities to bound the terms exp(θ) and exp{−p0(1 + δ)θ/(1 − p0)} in f(θ) yields
−p0δθ + [p0/2 + p0²(1 + δ)²/{4(1 − p0)}]θ² ⩽ f(θ) ⩽ −p0δθ + [p0 + p0²(1 + δ)²/{2(1 − p0)}]θ².    (A1)
Combining (A1), evaluated at the root f(θ) = 0, with the inequality

p0(1 + δ)²/(1 − p0) ⩽ 2,

which follows from 0 < p0 < 1/3 and 0 < δ < 1, implies that δ/2 ⩽ θ ⩽ 2δ.
References
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B. 1995;57:289–300.
- Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Statist. 2001;29:1165–88.
- Bickel PJ, Freedman DA. Some asymptotic theory for the bootstrap. Ann Statist. 1981;9:1196–217.
- Davidson R, MacKinnon JG. Bootstrap tests: how many bootstraps? Economet Rev. 2000;19:55–68.
- Davison AC, Hinkley DV. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press; 1997.
- Efron B. Better bootstrap confidence intervals. J Am Statist Assoc. 1987;82:171–200.
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Boca Raton: Chapman and Hall/CRC; 1993.
- Guo W, Peddada S. Adaptive choice of the number of bootstrap samples in large scale multiple testing. Statist Appl Genet Molec Biol. 2008;7:Art. 13. doi: 10.2202/1544-6115.1360.
- Hall P. On the number of bootstrap simulations required to construct a confidence interval. Ann Statist. 1986;14:1453–62.
- Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800–2.
- Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75:383–6.
- Lin D-Y, Tang Z-Z. A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet. 2011;89:354–67. doi: 10.1016/j.ajhg.2011.07.015.
- McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28:495–501. doi: 10.1038/nbt.1630.
- Rom DM. A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika. 1990;77:663–5.
- Salzman J, Jiang H, Wong WH. Statistical modeling of RNA-Seq data. Statist Sci. 2011;26:62–83. doi: 10.1214/10-STS343.
- Siegmund DO, Yakir B, Zhang NR. Detecting simultaneous variant intervals in aligned sequences. Ann Appl Statist. 2011;5:645–68.
- Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–4.
- Storey JD. A direct approach to false discovery rates. J R Statist Soc B. 2002;64:479–98.