Author manuscript; available in PMC: 2021 Mar 1.
Published in final edited form as: Genet Epidemiol. 2019 Nov 11;44(2):139–147. doi: 10.1002/gepi.22268

A flexible and nearly optimal sequential testing approach to randomized testing: QUICK-STOP

Julian Hecker 1,3,*, Ingo Ruczinski 2, Michael Cho 3, Edwin Silverman 3, Brent Coull 1, Christoph Lange 1,3
PMCID: PMC7028451  NIHMSID: NIHMS1063602  PMID: 31713269

Abstract

In the analysis of current life science datasets, we often encounter scenarios in which the application of asymptotic theory to hypothesis testing can be problematic. Besides improved asymptotic results, permutation/simulation-based tests are a general approach to address this issue. However, these randomized tests can impose a massive computational burden, for example, when large numbers of statistical tests are computed and the specified significance level is very small. Stopping rules aim to assess significance with the smallest possible number of draws while controlling the probabilities of errors due to statistical uncertainty. In this communication, we derive a general stopping rule, QUICK-STOP, based on sequential testing theory, that is easy to implement, controls the error probabilities rigorously, and is nearly optimal in terms of expected draws. In a simulation study, we show that our approach outperforms current stopping approaches for general randomized tests by a factor of 10 and does not impose an additional computational burden. We illustrate our approach by applying our stopping rule to a single-variant analysis of a whole-genome sequencing study for lung function.

Keywords: association p-value, next-generation sequencing, permutation, randomized test, sequential testing

Introduction

The analysis of genetic studies often involves association testing of single variants, genetic regions, gene–gene interactions, or gene expression data. Due to decreasing sequencing costs, study sample sizes and the number of available variants have increased by several orders of magnitude over the last few years. For many studies, this has created a large computational burden and a severe multiple-testing problem. Given the number of tests computed, very small alpha levels must be reached for test results to be declared significant.

Furthermore, the evaluation of p-values itself can be problematic for many study designs and test statistics. Rare genetic variants, imbalanced case-control ratios, phenotypic outliers (Stranger et al., 2005), or non-normally distributed phenotypes can lead to scenarios in which standard asymptotic theory does not provide reliable results. For some test statistics of interest, for example in gene-based/region-based analysis (Chen, Hsu, Gamazon, Cox, & Nicolae, 2012; Lee, Abecasis, Boehnke, & Lin, 2014; Liu et al., 2010; Mishra & Macgregor, 2015), the asymptotic distribution of the test statistic is not even tractable, and permutation/simulation procedures have to be applied, which can be computationally challenging (Malik et al., 2018; Sugasawa, Noma, Otani, Nishino, & Matsui, 2017).

Researchers have proposed various approaches and strategies to overcome the described issues regarding the assessment of statistical significance. These approaches include the transformation of the data to normality so that standard methodology can be applied, more specialized asymptotic results/approximations for particular test statistics, the application of permutation/simulation-based tests, approximations of the permutation-based p-values, and efficient resampling algorithms. In the setting where non-normally distributed traits are tested for association, for example, lung volume and blood markers, or in the presence of phenotypic outliers, a proposed strategy is to apply inverse normal transformations so that the phenotype follows an approximately normal distribution (Beasley, Erickson, & Allison, 2009). Then, the standard methodology can be applied to test the phenotype data for association. This procedure was applied and investigated in several recent publications (Boueiz et al., 2016; Vavoulis, Taylor, Schuh, & Bar-Joseph, 2017; Zhang, Xie, Liang, & Xiong, 2016). Although these transformations can solve the problem of unreliable asymptotic p-values in some scenarios, they can also lead to a loss of power (Beasley et al., 2009). In Supplementary Material A, we included simulations that indicate a loss of power in regression analysis when rank transformations are applied.

While the corresponding theory for approaches that approximate the permutation/resampling-based p-values or derive more specific asymptotic results is much more involved than the straightforward permutation/simulation-based procedure, it is also restricted to specific scenarios/test statistics. The approaches are not universally applicable, and many epidemiological studies and association test statistics nevertheless require randomized testing approaches based on permutations or simulations (e.g., Chen et al., 2012; Malik et al., 2018; Zhu, Zhang, & Sha, 2018).

As explained above, if randomized testing is applied in a large-scale setting, the computational burden is enormous. To reduce this computational burden, some simple stopping rules were proposed (Chang et al., 2015; Che, Jack, Motsinger-Reif, & Brown, 2014; Hasegawa et al., 2016). Instead of estimating the empirical p-value based on a pre-specified number of permutations/simulations per test, these approaches aim to stop randomization testing early if the corresponding p-value is clearly nonsignificant and therefore reduce the number of randomized draws and the computational time.

Closely related is the methodology of the so-called sequential Monte Carlo testing (Besag & Clifford, 1991; Gandy, 2009). The approach by Gandy (2009), called SIMCTEST, describes a stopping rule that guarantees to control the probability of a wrong decision, but the implementation requires recursions and additional tuning parameters.

In this communication, we develop a general stopping rule for randomized tests, QUICK-STOP (QS), that is based on sequential testing theory and decides as quickly as possible whether the unknown p-value is below a specific significance level of interest. In contrast to the sequential Monte Carlo testing literature (Gandy, 2009), we introduce an arbitrarily small indifference region between the two hypotheses in which both decisions are acceptable. This leads to much simpler derivations and an intuitive stopping rule. For applications, the indifference region can be chosen so small that it has no practical relevance while still guaranteeing a finite runtime. The approach reduces the computational burden substantially compared to the confidence interval-based (CI) stopping rule in the popular genetic analysis tool PLINK1.9 (Chang et al., 2015).

Furthermore, we can utilize existing theoretical sequential testing results to show that our procedure is nearly optimal in terms of the expected number of replicates, minimizing the computational burden. It is important to note that our methodology can be applied to any randomized test, is not restricted to a specific scenario, and does not impose a significant computational burden. Our approach combines computational efficiency and generality with an intuitive formulation.

Methods

We consider a genetic epidemiological study in which many statistical tests are performed, and significance of findings is declared based on an appropriate significance level p_significance that is corrected for multiple testing, for example, p_significance = 5 × 10⁻⁸ in a GWAS. We assume that each statistical test is based on a suitable association test statistic T that can be computed from the observed data, but whose significance cannot be evaluated by asymptotic theory. We assume that one can draw random permutations (e.g., in a regression model) or simulate from the corresponding null distribution (e.g., a suitable normal distribution, as in gene-based tests (Liu et al., 2010; Mishra & Macgregor, 2015)), such that the comparison between the randomized test statistic and the observed statistic can be interpreted as a sequence x_i ∈ {0, 1} of independent Bernoulli draws with parameter p that corresponds to the true, unknown p-value. After n permutations/simulations, the empirical p-value is usually estimated by

$$\hat{p}_{MLE} = \frac{1}{n}\sum_{r=1}^{n} x_r.$$

Given unrestricted computational resources and time, all p-values could be estimated with high accuracy based on a very large number of permutations/simulations, and significant results identified. To avoid this infeasible computational burden, a general stopping rule aims to decide between the hypotheses

$$H_1: p \le p_{\text{significance}} \quad \text{and} \quad H_2: p > p_{\text{significance}}$$

based on as few permutations/simulations as possible. Since p is estimated based on permutations/simulations, a rigorous stopping rule needs to control the probability of a wrong decision due to statistical uncertainty in the estimation. In the following, we will refer to a type I decision error if hypothesis H2 is true and the stopping rule chooses H1. A type II decision error occurs if hypothesis H1 is true and the stopping rule chooses H2.

An important consequence is that, if the probabilities of type I and type II decision errors of the stopping rule are negligibly small, randomized testing based on the statistic T in combination with the stopping rule essentially maintains the type I error and power of T while reducing the computational burden.
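The reduction above, in which each permutation/simulation contributes one Bernoulli indicator x_i = 1{randomized statistic ≥ observed statistic}, can be sketched in a few lines of Python. This is an illustrative toy example, not code from the paper; the statistic `mean_diff`, the data, and the seed are our own:

```python
import random

def perm_draws(y, g, t_obs, stat, seed=0):
    """Yield x_i = 1 if the statistic of a permuted dataset is at least as
    extreme as the observed one; the draws are i.i.d. Bernoulli(p), where p
    is the unknown permutation p-value."""
    rng = random.Random(seed)
    while True:
        y_perm = rng.sample(y, len(y))   # random permutation of the phenotype
        yield 1 if stat(y_perm, g) >= t_obs else 0

# Toy statistic: absolute mean phenotype difference between genotype groups
def mean_diff(y, g):
    a = [yi for yi, gi in zip(y, g) if gi == 1]
    b = [yi for yi, gi in zip(y, g) if gi == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))

y = [0.1, 1.2, 0.4, 2.3, 0.9, 1.8, 0.2, 1.1]   # toy phenotype
g = [0, 1, 0, 1, 0, 1, 0, 1]                   # toy genotype indicator
draws = perm_draws(y, g, mean_diff(y, g), mean_diff)
xs = [next(draws) for _ in range(1000)]
p_hat = sum(xs) / len(xs)   # empirical p-value estimate p_hat_MLE
```

A stopping rule then consumes exactly such a stream of indicators one at a time, instead of a fixed, pre-specified number of draws.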

QUICK-STOP

In this section, we will introduce the setting and components of our stopping rule QUICK-STOP and describe its theoretical properties. QUICK-STOP is based on the so-called Adaptive Sequential Likelihood Ratio Test (ASLRT) derived by Pavlov (1991) and Tartakovsky (2014).

The two hypotheses that we want to distinguish between are:

$$H_1: p \le p_1 \quad \text{and} \quad H_2: p \ge p_2 = p_1 + d = p_{\text{significance}},$$

where the indifference parameter d > 0 is a technical parameter that is chosen very small and separates the two hypotheses. Within the indifference region (p1, p2), we assume that both decisions are acceptable. For example, d = 10⁻⁸ would correspond to the resolution level of 10⁸ permutations/simulations, but d can be chosen arbitrarily small. We will elaborate on the theoretical aspects of the indifference region/parameter d in more detail below.

Besides the parameters p1 and d, our stopping rule requires the specification of the parameters α1 and α2. As we will show, these two parameters control the type I and type II decision error probabilities introduced above. Since we usually want to avoid such wrong decisions in practice, we suggest very small values, for example, α1 = α2 = 10⁻¹⁰. We introduce the stopping times

$$\tau_1(\alpha_1) := \min\left\{ n : \frac{\pi_n}{\sup_{p \in D_1} b_n(p, x^n)} \ge \alpha_1^{-1} \right\}$$

and

$$\tau_2(\alpha_2) := \min\left\{ n : \frac{\pi_n}{\sup_{p \in D_2} b_n(p, x^n)} \ge \alpha_2^{-1} \right\},$$

where x^n = (x_1, …, x_n), π_n := ∏_{r=1}^{n} b(p̂_{r−1}, x_r), b_n(p, x^n) = ∏_{r=1}^{n} b(p, x_r), and D1 = [0, p1], D2 = [p2, 1]. Here, b(p, x) denotes the probability mass function of a Bernoulli random variable with success probability p, evaluated at x ∈ {0, 1}. The estimate p̂_{r−1} is defined by p̂_{r−1} = (∑_{k=1}^{r−1} x_k + 1/2)/r, a slightly modified version of the maximum likelihood estimator for the success probability p that is still asymptotically consistent. Our stopping rule is denoted by (N, δ), where δ = 1 or δ = 2 indicates the selected hypothesis, and N is the number of permutations/simulations computed until this decision is made. If τ1(α1) ≤ τ2(α2), we set δ = 2 and N = τ1(α1). If τ2(α2) < τ1(α1), we set δ = 1 and N = τ2(α2).
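The rule can be implemented in log space to avoid numerical underflow over the very long sequences that occur for small p. The following is a minimal sketch under our own reading of the definitions, not the authors' implementation; `draw()` stands for one permutation/simulation comparison:

```python
import math

def log_lik(p, s, f):
    # log of p^s * (1 - p)^f, with the convention 0 * log(0) = 0
    out = 0.0
    if s:
        out += s * math.log(p)
    if f:
        out += f * math.log(1.0 - p)
    return out

def quick_stop(draw, p1, p2, alpha1=1e-10, alpha2=1e-10, max_n=10**8):
    """QUICK-STOP sketch. draw() returns one Bernoulli indicator x_n.
    Returns (delta, N): delta = 1 selects H1 (p <= p1),
    delta = 2 selects H2 (p >= p2)."""
    thr1, thr2 = -math.log(alpha1), -math.log(alpha2)  # log(1/alpha_i)
    log_pi = 0.0   # log pi_n = sum_r log b(p_hat_{r-1}, x_r)
    s = 0          # number of successes so far
    for n in range(1, max_n + 1):
        p_hat = (s + 0.5) / n          # one-step delayed estimator p_hat_{n-1}
        x = draw()
        log_pi += math.log(p_hat if x else 1.0 - p_hat)
        s += x
        f = n - s
        # p^s (1-p)^f peaks at s/n, so its sup over D1 = [0, p1] and
        # D2 = [p2, 1] is attained at the interval point closest to s/n
        if log_pi - log_lik(min(p1, s / n), s, f) >= thr1:
            return 2, n                # tau_1 reached first: choose H2
        if log_pi - log_lik(max(p2, s / n), s, f) >= thr2:
            return 1, n                # tau_2 reached first: choose H1
    return None, max_n                 # no decision within max_n draws
```

For example, with p1/p2 = 4.9 × 10⁻³ / 5 × 10⁻³ and a stream of all-zero draws, this sketch stops after roughly 5,600 draws with δ = 1, consistent with the small-p entries of Table 1.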

An illustrative example

The intuition behind our stopping rule QUICK-STOP is that the ratios π_n / sup_{p∈D1} b_n(p, x^n) and π_n / sup_{p∈D2} b_n(p, x^n) behave like likelihood ratios between the sequentially estimated p-value and the best-explaining parameter in the respective hypothesis region.

To illustrate our stopping rule and the corresponding quantities, we walk through an application with the parameters α1 = α2 = 10⁻¹⁰, p1 = 4.99 × 10⁻⁸, and p2 = 5 × 10⁻⁸, implying d = 10⁻¹⁰. First, we consider the case p = 0.1, a clearly non-significant p-value. As described above, the sequence x1, x2, … can be thought of as generated by independent Bernoulli(p) draws. A simulated example gives the observed sequence x1 = 0, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0, x7 = 1, …. According to the definitions above, for n = 6 this yields π6 = (1 − 1/2)·(1 − 1/4)·(1/6)·(1 − 3/8)·(1 − 3/10)·(1 − 3/12) ≈ 0.02051. At this point, we obtain π6 / sup_{p∈D1} b6(p, x⁶) ≈ 410,978.3 and π6 / sup_{p∈D2} b6(p, x⁶) ≈ 0.30618; therefore, neither quantity has reached its corresponding threshold. After n = 7 draws, we have π7 = (1 − 1/2)·(1 − 1/4)·(1/6)·(1 − 3/8)·(1 − 3/10)·(1 − 3/12)·(3/14) ≈ 0.004394531, and we get π7 / sup_{p∈D1} b7(p, x⁷) ≈ 1.76 × 10¹² and π7 / sup_{p∈D2} b7(p, x⁷) ≈ 0.2895, showing that the first quantity exceeds the threshold 1/α1 = 10¹⁰, indicating that the observed sequence is extremely unlikely under hypothesis H1. Our stopping rule chooses δ = 2 and sets N = 7. Therefore, QUICK-STOP used 7 permutations/simulations to decide that the unknown p-value is non-significant and chose hypothesis H2.
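The numbers in this example can be recomputed step by step; the following sketch (ours, written directly from the definitions above) traces π_n and both ratios for the simulated sequence:

```python
x = [0, 0, 1, 0, 0, 0, 1]       # the simulated Bernoulli(0.1) sequence above
p1, p2 = 4.99e-8, 5e-8          # hypothesis boundaries, d = 1e-10

def sup_lik(lo, hi, s, f):
    # sup of p^s (1-p)^f over [lo, hi]; the likelihood peaks at s/(s+f)
    p = min(max(s / (s + f), lo), hi)
    return p ** s * (1 - p) ** f

trace = []
pi, s = 1.0, 0
for n, xn in enumerate(x, start=1):
    p_hat = (s + 0.5) / n       # one-step delayed estimator p_hat_{n-1}
    pi *= p_hat if xn else 1 - p_hat
    s += xn
    f = n - s
    trace.append((n, pi,
                  pi / sup_lik(0.0, p1, s, f),    # evidence against H1
                  pi / sup_lik(p2, 1.0, s, f)))   # evidence against H2
# n = 6: pi ~ 0.02051, ratios ~ 410,978.3 and ~ 0.30618 (no threshold reached)
# n = 7: the H1-ratio ~ 1.76e12 exceeds 1/alpha1 = 1e10, so delta = 2, N = 7
```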

For a second example, we consider p = 10⁻⁹. Here, we expect to draw many permutations/simulations without observing a more extreme test statistic. A simulated example resulted in x_i = 0 for all i ≤ 763,193,033 with i ≠ 17,652,977, and x_{17,652,977} = 1. A straightforward computation for n = 763,193,033 gives π_n / sup_{p∈D1} b_n(p, x^n) ≈ 2.78 × 10⁻⁵ and π_n / sup_{p∈D2} b_n(p, x^n) ≥ 10¹⁰ = 1/α2, which is the point at which QUICK-STOP stops and selects hypothesis δ = 1 with N = 763,193,033.

Theorem 1 summarizes two important properties of our stopping rule.

Theorem 1.

  • (i) P_p[δ = 2] ≤ α1 for p ∈ D1, and P_p[δ = 1] ≤ α2 for p ∈ D2.

  • (ii) Let K(α, t1, t2) be the class of all stopping rules (N′, δ′) such that property (i) is fulfilled for α1 := α t1 and α2 := α t2. Then,

$$\frac{E_p[N]}{\inf_{(N',\delta') \in K(\alpha, t_1, t_2)} E_p[N']} = 1 + o(1) \quad \text{as } \alpha \to 0, \text{ for all } p \in [0, 1].$$

Theorem 1 states that for fixed parameters p1 and p2 (and hence d), our stopping rule QUICK-STOP is guaranteed to control the error probabilities at the arbitrary, pre-specified rates α1 and α2. The key to the proof of this result is the use of the one-step delayed estimator p̂_{r−1}, which allows utilizing Doob's martingale inequality (Pavlov, 1991; Tartakovsky, 2014). In addition, for the small rates α1 and α2 that are of interest in practice, the expected number of permutations/simulations N of QUICK-STOP approaches the theoretical lower bound among all comparable stopping rules that control the error probabilities at the same levels. The theoretical lower bound depends on p, p1, p2 (and therefore d), α1, and α2. Therefore, our procedure is nearly optimal regarding the expected number of draws. The explicit terms for the lower bound are described in Theorem 1 in Supplementary Material B. The indifference parameter d determines a tradeoff between the worst-case expected runtime and the interpretability of the results due to the indifference region. In contrast to the scenario without separated hypotheses (Gandy, 2009), the introduction of the indifference region implies that the procedure stops eventually, and the expected number of permutations/simulations is finite. It is important to note that our result includes the deterministic cases p ∈ {0, 1} (Supplementary Material B). After the decision, the current empirical p-value provides an accurate p-value estimate for tests with p-values close to the thresholds p1 and p2. If one is also interested in an accurate estimation of p-values that are clearly not significant, one can combine the stopping rule with a minimum number of permutations/simulations.

Application and numerical study

In this section, we describe the results of a simulation study and the application of our approach to a whole-genome sequencing study for lung function.

Simulation study

In this simulation study, we compare QUICK-STOP with a simpler stopping rule based on the adaptive permutation procedure in the popular genetic analysis tool PLINK1.9 (Chang et al., 2015). Both stopping rules can be applied to any randomized test. We directly apply both stopping rules to a sequence x1, x2, … of independent Bernoulli variables with unknown parameter p, mimicking a general scenario of randomized testing. We assessed how quickly both rules could decide which hypothesis is true and how the rates of wrong decisions behave empirically. The PLINK1.9-related stopping rule is based on an asymptotic confidence interval approximation. It aims to decide between the two hypotheses

$$H_1: p \le p_{\text{significance}} \quad \text{and} \quad H_2: p > p_{\text{significance}}.$$

After n permutations/simulations, the p-value is estimated by the proportion of randomized test statistics that were more extreme than the observed statistic, that is, p̂_MLE = (1/n) ∑_{r=1}^{n} x_r. Based on the corresponding estimated standard error √(p̂_MLE(1 − p̂_MLE)/n), a 1 − α confidence interval is constructed using the asymptotic normal distribution. If this confidence interval does not contain the significance level p_significance, the stopping rule makes the decision based on whether the estimated p-value lies below or above the significance level. We will refer to this approach as CI. We note that the estimation of the standard error requires an estimated p-value that is neither 0 nor 1.
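This CI rule can be sketched compactly. The following is our own illustrative version, not PLINK's implementation; the normal quantile is obtained from `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def ci_stop(draw, p_sig, alpha=1e-10, max_n=10**6):
    """Confidence-interval stopping rule sketch: stop once the asymptotic
    (1 - alpha) CI for p excludes p_sig. Returns (decision, n) with
    decision = 1 (significant), 2 (non-significant), or None (no stop)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal quantile
    s = 0
    for n in range(1, max_n + 1):
        s += draw()
        p_hat = s / n
        if 0.0 < p_hat < 1.0:                     # SE is undefined at 0 or 1
            se = math.sqrt(p_hat * (1.0 - p_hat) / n)
            if abs(p_hat - p_sig) > z * se:       # CI excludes p_sig
                return (1 if p_hat < p_sig else 2), n
    return None, max_n
```

Note how a stream of all-zero draws never triggers a stop, which is exactly the weakness for very small p discussed below.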

For QUICK-STOP, we chose α1 = α2 = 10⁻¹⁰, and for CI we considered a (1 − 10⁻¹⁰) confidence interval (based on the asymptotic normal distribution assumption).

In Tables 1–4, we report the average number of draws required by both approaches for multiple combinations of p, p_significance, and d. For QUICK-STOP, we considered two different indifference parameters d: a scenario with a very small value and a scenario in which the indifference region might be of practical impact (see the difference between p2 and p1 in Tables 1–4).

Table 1.

Averaged number of draws until the decision for CI and QUICK-STOP (QS). The significance level is chosen as p_significance = 5 × 10⁻³. Results are based on 10,000 replicates.

| Method | p = 0.5 | 0.1 | 0.05 | 10⁻² | 10⁻³ | 10⁻⁴ | 10⁻⁵ | 10⁻⁶ | 10⁻⁷ | 10⁻⁸ |
|---|---|---|---|---|---|---|---|---|---|---|
| CI, p_significance = 5 × 10⁻³ | 22 | 379 | 937 | 16,453 | 2,666 | 10,012 | 100,480 | 976,758 | 9.98 × 10⁶ | 9.81 × 10⁷ |
| QS, p1/p2 = 4.999 × 10⁻³ / 5 × 10⁻³ | 13 | 124 | 372 | 14,311 | 11,374 | 6,051 | 5,617 | 5,574 | 5,569 | 5,569 |
| QS, p1/p2 = 4.9 × 10⁻³ / 5 × 10⁻³ | 13 | 123 | 367 | 13,584 | 11,374 | 6,051 | 5,617 | 5,574 | 5,569 | 5,569 |

Table 4.

Averaged number of draws until the decision for CI and QUICK-STOP (QS). The significance level is chosen as p_significance = 10⁻⁹. Results for p ≥ 10⁻⁷ are based on 10,000 replicates; results for p = 10⁻⁸ are based on 1,000 replicates for computational reasons.

| Method | p = 0.5 | 0.1 | 0.05 | 10⁻² | 10⁻³ | 10⁻⁴ | 10⁻⁵ | 10⁻⁶ | 10⁻⁷ | 10⁻⁸ |
|---|---|---|---|---|---|---|---|---|---|---|
| CI, p_significance = 1 × 10⁻⁹ | 22 | 343 | 767 | 4,155 | 41,975 | 420,480 | 4.20 × 10⁶ | 4.20 × 10⁷ | 4.30 × 10⁸ | 5.06 × 10⁹ |
| QS, p1/p2 = 0.999 × 10⁻⁹ / 1 × 10⁻⁹ | 4 | 20 | 39 | 200 | 2,690 | 29,952 | 400,473 | 5.62 × 10⁶ | 9.35 × 10⁷ | 2.35 × 10⁹ |
| QS, p1/p2 = 0.9 × 10⁻⁹ / 1 × 10⁻⁹ | 4 | 20 | 39 | 200 | 2,658 | 29,860 | 398,335 | 5.53 × 10⁶ | 9.09 × 10⁷ | 2.21 × 10⁹ |

Compared to CI, QUICK-STOP reduces the number of draws dramatically in most scenarios. In Tables 1 and 2, we see a huge reduction for very small p, since the CI approach cannot estimate the confidence interval before the estimated p-value is non-zero. The parameter d does not have a strong impact on the QUICK-STOP results, indicating that this parameter should be chosen very small in practice, reducing the indifference region. However, if p is of the same magnitude as p_significance, there are scenarios in which the CI approach requires fewer draws than QUICK-STOP. The reason is that the probability of a wrong decision by the CI approach is much larger than the (1 − 10⁻¹⁰) confidence level suggests.

Table 2.

Averaged number of draws until the decision for CI and QUICK-STOP (QS). The significance level is chosen as p_significance = 5 × 10⁻⁴. Results are based on 10,000 replicates.

| Method | p = 0.5 | 0.1 | 0.05 | 10⁻² | 10⁻³ | 10⁻⁴ | 10⁻⁵ | 10⁻⁶ | 10⁻⁷ | 10⁻⁸ |
|---|---|---|---|---|---|---|---|---|---|---|
| CI, p_significance = 5 × 10⁻⁴ | 22 | 345 | 784 | 4,594 | 167,198 | 26,682 | 101,565 | 976,847 | 9.98 × 10⁶ | 9.81 × 10⁷ |
| QS, p1/p2 = 4.999 × 10⁻⁴ / 5 × 10⁻⁴ | 9 | 61 | 148 | 1,344 | 149,838 | 119,452 | 63,133 | 58,647 | 58,191 | 58,158 |
| QS, p1/p2 = 4.9 × 10⁻⁴ / 5 × 10⁻⁴ | 9 | 61 | 147 | 1,330 | 142,424 | 119,452 | 63,133 | 58,647 | 58,191 | 58,158 |

We analyzed the empirical wrong-decision rates for p_significance = 5 × 10⁻³ and p_significance = 5 × 10⁻⁴. We considered values for p close to these significance levels; the results are reported in Table 5. As we can see, the empirical rates of wrong decisions are elevated for the CI approach. Since the CI approach does not consider the 'trajectory' along the draws and relies on an asymptotic distribution, the confidence interval does not correspond to an error rate of 10⁻¹⁰. In comparison, we did not observe a single wrong decision by QUICK-STOP. This is expected, since we chose α1 = α2 = 10⁻¹⁰ and, according to Theorem 1, the error rates are controlled at these levels.

Table 5.

Empirical analysis of the probability of a wrong decision for multiple combinations of p, p_significance, and d. Results are based on 10⁷ replications (* 10⁷ replications, ** 10⁴ replications, *** 10³ replications, due to computational restrictions).

| Method | p = 10⁻² | 5.1 × 10⁻³ | 10⁻³ | 5.1 × 10⁻⁴ | 10⁻⁴ | 10⁻⁵ |
|---|---|---|---|---|---|---|
| CI, p_significance = 5 × 10⁻³ | 4.2 × 10⁻⁶ | 0.004** | 1.0092 × 10⁻³ | 5.003 × 10⁻⁴ | 1.029 × 10⁻⁴ | 9.6 × 10⁻⁶ |
| QS, p1/p2 = 4.999 × 10⁻³ / 5 × 10⁻³ | 0 | 0** | 0 | 0 | 0 | 0 |
| QS, p1/p2 = 4.9 × 10⁻³ / 5 × 10⁻³ | 0 | 0** | 0 | 0 | 0 | 0 |
| CI, p_significance = 5 × 10⁻⁴ | 0 | 0 | 3 × 10⁻⁶* | 0.002*** | 1.021 × 10⁻⁴ | 9.6 × 10⁻⁶ |
| QS, p1/p2 = 4.999 × 10⁻⁴ / 5 × 10⁻⁴ | 0 | 0 | 0* | 0*** | 0 | 0 |
| QS, p1/p2 = 4.9 × 10⁻⁴ / 5 × 10⁻⁴ | 0 | 0 | 0* | 0*** | 0 | 0 |

The increased error rate of CI can also be computed directly. For example, for p_significance = 5 × 10⁻⁸, CI stops if, after 149,400,000 simulations, only one success has been observed; the corresponding estimated p-value is 6.69 × 10⁻⁹ < p_significance. If we assume that the true p-value is p = 5.01 × 10⁻⁸ > p_significance, the probability of observing exactly one success after 149,400,000 simulations is 0.0042. The strength of QUICK-STOP is that it bounds the probability of such an error even if the true p-value is very small and close to the significance level.
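This calculation can be reproduced directly from the binomial probability mass function (the numbers are taken from the text):

```python
n = 149_400_000
p_sig = 5e-8
p_hat = 1 / n              # estimated p-value after one success: ~6.69e-9
p_true = 5.01e-8           # assumed true p-value, just above p_significance

# probability of exactly one success in n Bernoulli(p_true) draws
prob_one = n * p_true * (1.0 - p_true) ** (n - 1)   # ~ 0.0042
```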

Application to COPD whole-genome sequencing study

In the simulation study above, we analyzed the performance of QUICK-STOP and the CI-based approach for fixed p. To illustrate the application and advantages of our approach in a real data example where the true, unknown p's are drawn from a realistic distribution, we considered a genome-wide single variant association analysis for a whole-genome sequencing dataset from the COPDGene study (Regan et al., 2010).

The COPDGene study consists of >10,000 current or former smokers with and without chronic obstructive pulmonary disease (COPD). Subjects were of non-Hispanic white or African-American ancestry and aged between 45 and 80 years. In addition, a minimum of 10 pack-years of smoking and no lung disease other than COPD or asthma were ascertainment criteria. The Boston Early-Onset COPD study (BEOCOPD; Silverman et al., 1998) is an extended pedigree study with probands aged below 53 years with severe COPD (defined as forced expiratory volume in one second (FEV1) < 40% predicted). As part of the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) project, 2,000 severely affected cases from BEOCOPD and COPDGene, and controls with normal spirometry from COPDGene, were selected for whole-genome sequencing (Prokopenko et al., 2018). Our analysis dataset consisted of 51,715,479 genetic variants and 1,794 samples, after selecting variants with complete genotype data (no lower MAF cutoff) and applying the quality control described in Prokopenko et al. (2018). Using covariates for pack-years and 10 eigenvectors from a Jaccard population stratification analysis (Prokopenko et al., 2016), we computed the covariate-adjusted quantitative lung function trait (FEV1 percent predicted) via linear regression. To test for association between this adjusted phenotype and genotype, we also consider a linear regression model. However, the corresponding asymptotic p-values are not reliable, since the phenotype distribution in the sample is highly skewed and most variants in a WGS study are rare. Therefore, we evaluated significance by permutation of the phenotype information.

We compare our approach QUICK-STOP with the adaptive, confidence-interval-based permutation stopping rule implemented in the popular genetic analysis tool PLINK1.9 (Chang et al., 2015). This approach corresponds to the stopping rule CI in the previous simulation study. For QUICK-STOP, we chose α1 = α2 = 10⁻¹⁰ and ran the analysis with p1 = 5 × 10⁻⁸, p2 = 6 × 10⁻⁸ as the first scenario and with p1 = 10⁻⁹, p2 = 2 × 10⁻⁹ as the second scenario.

For the analysis with PLINK1.9, we applied the aperm option, which requires the specification of six parameters (Chang et al., 2015). We specified the minimum and maximum numbers of permutations as 2 and 10⁹ (the maximum possible value), respectively, and allowed variants to be pruned out after every permutation. We set the significance level to 5 × 10⁻⁸ in the first scenario and 10⁻⁹ in the second. In addition, we chose β = 0.005, such that PLINK1.9 computes a (1 − 0.005/51,715,479) ≈ (1 − 10⁻¹⁰) confidence interval (based on the normal distribution approximation) and stops if this confidence interval does not contain the significance level.

In Table 6, we report the results of this analysis. For each lower p-value cutoff, we extracted all variants for which the final p-value estimate provided by PLINK1.9 was above this threshold. We report the corresponding number of variants and the overall number of computed permutations across these single nucleotide polymorphisms (SNPs) for both methods. We observe that our approach reduces the number of required permutations by a factor of around 10 for most SNP sets. In addition, we observe that most of the permutations are performed on a small set of 'interesting/promising' SNPs. For example, in the first scenario, our approach spends more than 50% of the overall number of permutations on 278 variants; the corresponding fraction for the confidence-interval-based approach is only approximately 23%.

Table 6:

Analysis of the COPD dataset with our sequential testing approach QUICK-STOP and PLINK1.9.

| Scenario | Lower p-value cutoff | Number of variants | QUICK-STOP | PLINK1.9 | Ratio PLINK1.9/QUICK-STOP |
|---|---|---|---|---|---|
| 1 | 0.05 | 49,226,335 | 3.96 × 10⁸ | 3.99 × 10⁹ | 10.08 |
| 1 | 10⁻² | 51,244,126 | 7.70 × 10⁸ | 7.43 × 10⁹ | 9.65 |
| 1 | 10⁻³ | 51,676,557 | 1.17 × 10⁹ | 1.27 × 10¹⁰ | 10.85 |
| 1 | 10⁻⁴ | 51,712,482 | 1.58 × 10⁹ | 1.68 × 10¹⁰ | 10.63 |
| 1 | 10⁻⁵ | 51,715,193 | 1.98 × 10⁹ | 1.98 × 10¹⁰ | 10.00 |
| 1 | 10⁻⁶ | 51,715,445 | 2.60 × 10⁹ | 2.28 × 10¹⁰ | 8.77 |
| 1 | – | 51,715,471 | 4.37 × 10⁹ | 2.57 × 10¹⁰ | 5.88 |
| 2 | 0.05 | 49,226,335 | 3.81 × 10⁸ | 3.99 × 10⁹ | 10.47 |
| 2 | 10⁻² | 51,244,126 | 6.22 × 10⁸ | 7.43 × 10⁹ | 11.95 |
| 2 | 10⁻³ | 51,676,557 | 9.16 × 10⁸ | 1.27 × 10¹⁰ | 13.87 |
| 2 | 10⁻⁴ | 51,712,482 | 1.24 × 10⁹ | 1.68 × 10¹⁰ | 13.55 |
| 2 | 10⁻⁵ | 51,715,193 | 1.50 × 10⁹ | 1.98 × 10¹⁰ | 13.2 |
| 2 | 10⁻⁶ | 51,715,446 | 1.79 × 10⁹ | 2.27 × 10¹⁰ | 12.68 |
| 2 | – | 51,715,479 | 3.06 × 10⁹ | 2.87 × 10¹⁰ | 9.38 |

For each lower p-value cutoff, we considered all variants for which the final p-value estimate of PLINK1.9 was above this cutoff. We excluded genetic variants for which PLINK1.9 reached the maximum of 10⁹ permutations.

We excluded variants from this comparison for which PLINK1.9 reached the maximum number of permutations (10⁹), since this number is truncated by the software implementation. This only affected the first scenario, in which we excluded eight genetic variants. One of these genetic variants was reported to be significant by our approach, with an estimated p-value of p̂ ≈ 5.24 × 10⁻⁸. In the second scenario, neither PLINK1.9 nor our approach required 10⁹ permutations for any variant, and no genetic variant was reported to be genome-wide significant with respect to this significance level.

Overall, this real data example demonstrates the practical advantages of our approach; the results are in line with our simulation study above.

Discussion

The analysis of current datasets in the life sciences often encounters scenarios in which asymptotic theory cannot be applied, and the general approach to significance testing in these scenarios is permutation or simulation.

To avoid this huge computational effort, researchers have proposed several methods that approximate permutation-based p-values for particular test statistics or designed efficient resampling procedures that exploit special properties of common application scenarios.

On the other hand, a general approach that aims to reduce the computational burden of permutation/simulation-based testing without losing flexibility and robustness is the use of stopping rules. The goal of such stopping rules is to decide as quickly as possible whether the unknown p-value is below a pre-specified level of significance.

Here, we proposed such a general and intuitive sequential-testing-based stopping rule that is easy to implement, requires almost no additional computational effort, and controls the error probabilities rigorously. Based on the theory of sequential testing, it can be shown that our stopping rule is nearly optimal in terms of the expected number of permutations/simulations. In a simulation study, we investigated the performance of our approach and showed that our stopping rule reduces the number of permutations/simulations substantially compared to the current methodology. In an application to a whole-genome sequencing study for lung function, we demonstrated the implementation and the practical value of our stopping rule. An implementation scheme and a simulation tool for QUICK-STOP can be found at https://github.com/julianhecker/QUICK-STOP.

Supplementary Material

Supp infoA
Supp infoB
supp_inf_3

Table 3.

Averaged number of draws until the decision for CI and QUICK-STOP (QS). The significance level is chosen as p_significance = 5 × 10⁻⁸. Results for p ≥ 10⁻⁷ are based on 10,000 replicates; results for p = 10⁻⁸ are based on 1,000 replicates for computational reasons.

| Method | p = 0.5 | 0.1 | 0.05 | 10⁻² | 10⁻³ | 10⁻⁴ | 10⁻⁵ | 10⁻⁶ | 10⁻⁷ | 10⁻⁸ |
|---|---|---|---|---|---|---|---|---|---|---|
| CI, p_significance = 5 × 10⁻⁸ | 22 | 343 | 765 | 4,176 | 41,955 | 420,576 | 4.29 × 10⁶ | 4.68 × 10⁷ | 1.67 × 10⁹ | 2.72 × 10⁸ |
| QS, p1/p2 = 4.999 × 10⁻⁸ / 5 × 10⁻⁸ | 4 | 20 | 46 | 294 | 3,547 | 47,337 | 737,914 | 1.57 × 10⁷ | 1.73 × 10⁹ | 1.41 × 10⁹ |
| QS, p1/p2 = 4.9 × 10⁻⁸ / 5 × 10⁻⁸ | 4 | 20 | 45 | 294 | 3,535 | 47,246 | 735,007 | 1.56 × 10⁷ | 1.64 × 10⁹ | 1.41 × 10⁹ |

Acknowledgments

This work was supported by Cure Alzheimer’s Fund; the National Human Genome Research Institute [R01HG008976]; and the National Heart, Lung, and Blood Institute [U01HL089856, U01HL089897, P01HL120839, P01HL132825].


Whole genome sequencing (WGS) for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). WGS for “NHLBI TOPMed: Genetic Epidemiology of COPD” (phs000951) was performed at the Broad Institute of MIT and Harvard (HHSN268201500014C), and at the University of Washington Northwest Genomics Center (3R01HL089856-08S1). Centralized read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Phenotype harmonization, data management, sample-identity QC, and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed.

The COPDGene project described was supported by Award Number U01 HL089897 and Award Number U01 HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health. The COPDGene project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens, and Sunovion. A full listing of COPDGene investigators can be found in Supplementary Information C.

The TOPMed Banner Authorship List is provided in Supplementary Information C.

Footnotes

Data availability

The data that support the findings of this study are/will be available in dbGaP at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000179.v6.p2, reference number phs000179 (Genetic Epidemiology of COPD (COPDGene)), and the software example is available at https://github.com/julianhecker/QUICK-STOP.

Declaration of Interests

The authors declare no competing interests.

Supplementary Material

Supplementary Materials A and B contain a simulation study of rank-based transformations and a detailed version of Theorem 1, respectively. The Supplementary Material is available online.

References

  1. Beasley TM, Erickson S, & Allison DB (2009). Rank-Based Inverse Normal Transformations are Increasingly Used, But are They Merited? Behavior Genetics, 39(5), 580–595. https://doi.org/10.1007/s10519-009-9281-0
  2. Besag J, & Clifford P. (1991). Sequential Monte Carlo p-Values. Biometrika, 78(2), 301–304. https://doi.org/10.2307/2337256
  3. Boueiz A, Lutz SM, Cho MH, Hersh CP, Bowler RP, Washko GR, … DeMeo DL (2016). Genome-Wide Association Study of the Genetic Determinants of Emphysema Distribution. American Journal of Respiratory and Critical Care Medicine, 195(6), 757–771. https://doi.org/10.1164/rccm.201605-0997OC
  4. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, & Lee JJ (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4, 7. https://doi.org/10.1186/s13742-015-0047-8
  5. Che R, Jack JR, Motsinger-Reif AA, & Brown CC (2014). An adaptive permutation approach for genome-wide association study: evaluation and recommendations for use. BioData Mining, 7, 9. https://doi.org/10.1186/1756-0381-7-9
  6. Chen LS, Hsu L, Gamazon ER, Cox NJ, & Nicolae DL (2012). An exponential combination procedure for set-based association tests in sequencing studies. American Journal of Human Genetics, 91(6), 977–986. https://doi.org/10.1016/j.ajhg.2012.09.017
  7. Gandy A. (2009). Sequential Implementation of Monte Carlo Tests With Uniformly Bounded Resampling Risk. Journal of the American Statistical Association, 104(488), 1504–1511. https://doi.org/10.1198/jasa.2009.tm08368
  8. Hasegawa T, Kojima K, Kawai Y, Misawa K, Mimori T, & Nagasaki M. (2016). AP-SKAT: highly-efficient genome-wide rare variant association test. BMC Genomics, 17, 745. https://doi.org/10.1186/s12864-016-3094-3
  9. Lee S, Abecasis GR, Boehnke M, & Lin X. (2014). Rare-Variant Association Analysis: Study Designs and Statistical Tests. The American Journal of Human Genetics, 95(1), 5–23. https://doi.org/10.1016/j.ajhg.2014.06.009
  10. Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, … Macgregor S. (2010). A versatile gene-based test for genome-wide association studies. American Journal of Human Genetics, 87(1), 139–145. https://doi.org/10.1016/j.ajhg.2010.06.009
  11. Malik R, Chauhan G, Traylor M, Sargurupremraj M, Okada Y, Mishra A, … Dichgans M. (2018). Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nature Genetics, 50(4), 524–537. https://doi.org/10.1038/s41588-018-0058-3
  12. Mishra A, & Macgregor S. (2015). VEGAS2: Software for More Flexible Gene-Based Testing. Twin Research and Human Genetics: The Official Journal of the International Society for Twin Studies, 18(1), 86–91. https://doi.org/10.1017/thg.2014.79
  13. Genetic Epidemiology of COPD (COPDGene). Retrieved from https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000179.v6.p2
  14. Pavlov I. (1991). Sequential Procedure of Testing Composite Hypotheses with Applications to the Kiefer–Weiss Problem. Theory of Probability & Its Applications, 35(2), 280–292. https://doi.org/10.1137/1135036
  15. Prokopenko D, Hecker J, Silverman EK, Pagano M, Nöthen MM, Dina C, … Fier HL (2016). Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project. Bioinformatics (Oxford, England), 32(9), 1366–1372. https://doi.org/10.1093/bioinformatics/btv752
  16. Prokopenko D, Sakornsakolpat P, Loehlein Fier H, Qiao D, Parker MM, McDonald M-LN, … COPDGene Investigators, NHLBI TOPMed Investigators. (2018). Whole Genome Sequencing in Severe Chronic Obstructive Pulmonary Disease. American Journal of Respiratory Cell and Molecular Biology. https://doi.org/10.1165/rcmb.2018-0088OC
  17. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, … Crapo JD (2010). Genetic epidemiology of COPD (COPDGene) study design. COPD, 7(1), 32–43. https://doi.org/10.3109/15412550903499522
  18. Silverman EK, Chapman HA, Drazen JM, Weiss ST, Rosner B, Campbell EJ, … Speizer FE (1998). Genetic epidemiology of severe, early-onset chronic obstructive pulmonary disease. Risk to relatives for airflow obstruction and chronic bronchitis. American Journal of Respiratory and Critical Care Medicine, 157(6 Pt 1), 1770–1778. https://doi.org/10.1164/ajrccm.157.6.9706014
  19. Stranger BE, Forrest MS, Clark AG, Minichiello MJ, Deutsch S, Lyle R, … Dermitzakis ET (2005). Genome-Wide Associations of Gene Expression Variation in Humans. PLOS Genetics, 1(6), e78. https://doi.org/10.1371/journal.pgen.0010078
  20. Sugasawa S, Noma H, Otani T, Nishino J, & Matsui S. (2017). An efficient and flexible test for rare variant effects. European Journal of Human Genetics, 25(6), 752–757. https://doi.org/10.1038/ejhg.2017.43
  21. Tartakovsky AG (2014). Nearly optimal sequential tests of composite hypotheses revisited. Proceedings of the Steklov Institute of Mathematics, 287(1), 268–288. https://doi.org/10.1134/S0081543814080161
  22. Vavoulis DV, Taylor JC, Schuh A, & Bar-Joseph Z. (2017). Hierarchical probabilistic models for multiple gene/variant associations based on next-generation sequencing data. Bioinformatics, 33(19), 3058–3064. https://doi.org/10.1093/bioinformatics/btx355
  23. Zhang F, Xie D, Liang M, & Xiong M. (2016). Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits. PLOS Genetics, 12(4), e1005965. https://doi.org/10.1371/journal.pgen.1005965
  24. Zhu H, Zhang S, & Sha Q. (2018). A novel method to test associations between a weighted combination of phenotypes and genetic variants. PLOS ONE, 13(1), e0190788. https://doi.org/10.1371/journal.pone.0190788
