Biostatistics (Oxford, England). 2015 Sep 11;17(1):1–15. doi: 10.1093/biostatistics/kxv033

An efficient resampling method for calibrating single and gene-based rare variant association analysis in case–control studies

Seunggeun Lee 1,*, Christian Fuchsberger 1, Sehee Kim 2, Laura Scott 3
PMCID: PMC4692986  PMID: 26363037

Abstract

For aggregation tests of genes or regions, the set of included variants often has small total minor allele counts (MACs), particularly when only the most deleterious variants are considered. When MAC is low, commonly used asymptotic tests are not well calibrated for binary phenotypes. They can give conservative or anti-conservative results and may lose power. Empirical p-values obtained via resampling methods are computationally costly for highly significant p-values, and the results can be conservative due to the discrete nature of resampling tests. Only individuals carrying minor alleles contribute to the score statistics. Based on this observation, we develop an efficient resampling method for single and multiple variant score-based tests that can adjust for covariates. Our method can improve computational efficiency Inline graphic1000-fold over conventional resampling for low MAC variant sets. We reduce the conservativeness of results through the use of mid-p-values. Using the estimated minimum achievable p-value for each test, we calibrate QQ plots and provide an effective number of tests. In an analysis of a case–control study with deep exome sequence data, we demonstrate two things: our methods are well calibrated, and they reduce computation time substantially compared with resampling methods.

Keywords: Rare variants, Next generation sequencing, Resampling methods

1. Introduction

Recent advances in sequencing technologies have made it possible to investigate the role of rare variants in complex diseases, and numerous statistical methods have been developed to identify rare variant associations. Many of the currently popular gene- or region-based multiple-variant tests are based on individual variant score statistics. These statistics allow rapid computation and natural adjustment for covariates (Lee and others, 2014). For example, variance component tests use a weighted sum of squared individual variant score statistics, as in C-alpha (Neale and others, 2011), SSU (Pan, 2009), and SKAT (Wu and others, 2011). Many versions of burden tests (Li and Leal, 2008; Lin and Tang, 2011; Madsen and Browning, 2009) are essentially equivalent to collapsing the individual variant score statistics. Other examples include SKAT-O (Lee, Emond, and others, 2012; Lee, Wu, and others, 2012) and the Fisher method (Derkach and others, 2012; Sun and others, 2013).

For a given gene or region, the number of variants tested together and the total of their minor allele counts (MACs) can vary due to the sequence or genotyping coverage of the gene, the class of variants tested, and the sample size. In the context of gene-based tests, we use MAC to refer to the total MAC of all variants in a tested set (i.e. the sum of the MAC of the rare, low, and common frequency variants in the set) and in the context of single variant tests, MAC refers to the single variant MAC.

In exome-sequencing studies, one approach (among many) is to test disruptive or predicted damaging variants (Zuk and others, 2014). These variants tend to be very rare, and tests based on them often involve variant sets with very small total MACs Inline graphic. However, asymptotic score tests for a single variant with small MAC are miscalibrated in two directions (Ma and others, 2013). They yield conservative results under a balanced case–control design and anti-conservative results under an unbalanced design. This miscalibration can carry over to gene- or region-based asymptotic score tests. A moment-based adjustment (MA) was developed to improve Type I error control when testing variant sets with low MAC. However, this approach also relies on the asymptotic properties of the tests (Lee, Emond, and others, 2012; Lee, Wu, and others, 2012) and may be less well calibrated for very low MAC variant sets. An alternative is experiment-wise permutation, which controls the family-wise error rate by obtaining the empirical distribution of asymptotic p-values across variant sets (Kiezun and others, 2012). However, the degree of miscalibration of asymptotic p-values can vary by MAC. As a result, this approach may have reduced power to detect specific classes of causal multiple variant sets.

Resampling methods, such as permutation tests, do not rely on the asymptotic properties of the test (Efron and Tibshirani, 1994). Permutation tests for genetic data often permute case–control status without regard to differences in individuals' odds of being a case given their covariates. This approach can inflate Type I error rates in the presence of confounding covariates such as population stratification (Epstein and others, 2012). A more nuanced approach performs permutations within strata of one or more covariates, such as geographical region. The resulting null distribution then better matches the observed test statistic (Purcell and others, 2007). Continuous covariates, such as principal components used to adjust for population stratification, can be handled by permutation based on Fisher's noncentral hypergeometric distribution. This allows individuals to have different odds of being selected as a case (Efron and Tibshirani, 1994; Fog, 2008). The major limitation of the permutation approach is that disease status is permuted across all study participants. This incurs a substantial computational cost that grows with sample size. Adaptive permutation procedures can reduce computation time for estimating large or moderate p-values (Efron and Tibshirani, 1994), but estimating highly significant p-values still requires substantial time. In addition, permutation p-values tend to be conservative for binary traits with small MAC because the test statistics are discrete (Lancaster, 1961).

In this paper, we develop an efficient resampling (ER) method for score statistic-based single and multiple variant tests that improves computational efficiency. Our method is based on the insight that only individuals with minor alleles (assuming the minor allele is coded as one) contribute to the score test. Instead of permuting case–control status across all individuals, we resample only the case–control status of individuals carrying minor alleles. For a single variant test, these are the carriers of a minor allele at that variant. For a multiple variant test, they are the carriers of a minor allele at any included variant. Within the group of individuals with minor alleles, we allow for covariate adjustment through Fisher's noncentral hypergeometric distribution (Epstein and others, 2012; Fog, 2008). The computation time of the ER method increases with MAC, so we also developed a method for variant sets with moderate to high MAC Inline graphic. In this method, quantiles of the test statistics are estimated through ER (based on a more limited number of permutations). These quantiles are then used to better calibrate our moment-matching approximation, giving the quantile adjustment (QA).

Furthermore, we develop statistical approaches to account for the discrete nature of the test statistics. Using the ER method, we obtain mid-p-values (Lancaster, 1961). From the exact resampling distribution, we estimate the lower limit of p-values for each variant set, the minimum achievable p-value (MAP) (Kiezun and others, 2012). We use the MAP to estimate the effective number of tests and to calibrate quantile–quantile (QQ) plots. We evaluate these methods through simulation studies and analysis of deep exome sequencing data. The ER-based methods and calibration approaches are computationally efficient, control the false-positive rate (FPR), and can improve power.

2. Methods

2.1. Statistical model and rare variant tests

To describe currently used rare variant tests, suppose that $n$ subjects are sequenced, $n_1$ of whom are diseased. The region being tested has $p$ variant loci. For the $i$th subject, let $y_i$ denote a binary phenotype, $\mathbf{G}_i = (g_{i1}, \ldots, g_{ip})'$ the numbers of copies of the minor allele ($g_{ij} \in \{0, 1, 2\}$), and $\mathbf{X}_i$ the covariates. MAC is defined as the sum of all genotype values, $\sum_{i=1}^{n}\sum_{j=1}^{p} g_{ij}$. To relate genotypes to binary phenotypes, we posit the logistic regression model $\mathrm{logit}(\pi_i) = \beta_0 + \mathbf{X}_i'\boldsymbol{\alpha} + \mathbf{G}_i'\boldsymbol{\beta}$. Here $\pi_i$ is the disease probability, $\beta_0$ is the intercept, and $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are the regression coefficients of the covariates and genetic variants, respectively. A score statistic from a marginal model of variant $j$ is

$$S_j = \sum_{i=1}^{n} g_{ij}\,(y_i - \hat{\pi}_i), \qquad (2.1)$$

where $\hat{\pi}_i$ is an estimate of $\pi_i$ under the null model $\boldsymbol{\beta} = \mathbf{0}$. For single variant tests, $S_j^2$ is the score test statistic of variant $j$. It asymptotically follows a (scaled) $\chi^2$ distribution with 1 degree of freedom. Many popular gene- or region-based tests are also based on $S_j$. For example, the Burden and SKAT test statistics are, respectively, the square of a weighted linear sum and a weighted quadratic sum of the $S_j$:

$$T_{\mathrm{Burden}} = \Big(\sum_{j=1}^{p} w_j S_j\Big)^2, \qquad T_{\mathrm{SKAT}} = \sum_{j=1}^{p} w_j^2 S_j^2,$$

where $w_j$ is a weight for variant $j$. SKAT-O combines the Burden test and SKAT as $T_\rho = (1 - \rho)\,T_{\mathrm{SKAT}} + \rho\,T_{\mathrm{Burden}}$, with $0 \le \rho \le 1$. Since the optimal $\rho$ is not known a priori, SKAT-O uses as its test statistic the minimum p-value over a grid of $\rho$ values.
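The following Python/NumPy sketch makes these definitions concrete. It computes the score statistics $S_j = \sum_i g_{ij}(y_i - \hat\pi_i)$ and then the Burden, SKAT and SKAT-O combination statistics $T_\rho$ from them. It assumes the null-model fitted probabilities $\hat\pi_i$ are supplied; the toy example uses an intercept-only fit, i.e. the case proportion. The function names and the weights are illustrative. The sketch does not compute SKAT-O p-values, which require the null distribution of each $T_\rho$.

```python
import numpy as np


def score_stats(G, y, pi_hat):
    """S_j = sum_i g_ij * (y_i - pi_hat_i) for every variant j."""
    G = np.asarray(G, dtype=float)                 # n x p genotypes (0/1/2)
    resid = np.asarray(y, dtype=float) - np.asarray(pi_hat, dtype=float)
    return G.T @ resid                             # length-p score vector


def burden_stat(S, w):
    """Burden statistic: square of the weighted linear sum of scores."""
    return float(np.dot(w, S) ** 2)


def skat_stat(S, w):
    """SKAT statistic: weighted quadratic sum of scores."""
    return float(np.sum(np.square(w) * np.square(S)))


def skat_o_family(S, w, rhos):
    """T_rho = (1 - rho) * T_SKAT + rho * T_Burden over a grid of rho."""
    t_skat, t_burden = skat_stat(S, w), burden_stat(S, w)
    return {rho: (1 - rho) * t_skat + rho * t_burden for rho in rhos}


# Toy example: 4 individuals, 2 variants, no covariates.
G = [[1, 0], [0, 1], [0, 0], [1, 1]]
y = np.array([1, 1, 0, 1])
pi_hat = np.full(4, y.mean())                      # intercept-only null model
w = np.array([1.0, 2.0])                           # hypothetical weights

S = score_stats(G, y, pi_hat)
T = skat_o_family(S, w, rhos=[0.0, 0.5, 1.0])
print(S, burden_stat(S, w), skat_stat(S, w), T)
```

At $\rho = 0$ the family reduces to SKAT and at $\rho = 1$ to the Burden statistic, which is why SKAT-O can search over the grid.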

2.2. ER method

In this section, we present the ER method for rare variant score tests with binary traits. We describe how $B$ resamples are generated to estimate four probabilities for a gene- or region-based association test statistic $T$. The statistic $T$ is a function of $S_j$ ($j = 1, \ldots, p$), and the probabilities are conditional on the genotypes ($\mathbf{G}$), phenotypes ($\mathbf{y}$) and covariates ($\mathbf{X}$):

  1. ER p-value: $P(T \ge t_{\mathrm{obs}})$

  2. ER mid-p-value: $P(T > t_{\mathrm{obs}}) + \tfrac{1}{2}\,P(T = t_{\mathrm{obs}})$

  3. ER minimum achievable p-value: $P(T \ge t_{\max}) = P(T = t_{\max})$

  4. ER minimum achievable mid-p-value: $\tfrac{1}{2}\,P(T = t_{\max})$

where $t_{\mathrm{obs}}$ is the test statistic computed from the original phenotypes and $t_{\max}$ is the maximum of all possible permutation test statistics. Let $m$ ($m \le n$) be the number of individuals with minor alleles in the gene or region, $m = \sum_{i=1}^{n} I\big(\sum_{j=1}^{p} g_{ij} > 0\big)$, where $I(\cdot)$ is an indicator function. Clearly, $m$ is smaller than or equal to MAC. By Equation (2.1), only individuals with minor alleles contribute to $S_j$, since the remaining individuals have zero genotype values at all loci. This observation allows us to reduce computation time by resampling the case–control status of only those $m$ individuals rather than all $n$ individuals. To estimate ER p-values, we use a two-step approach based on the factorization

$$P(T \ge t_{\mathrm{obs}}) = \sum_{k} P(T \ge t_{\mathrm{obs}} \mid m_1 = k)\,P(m_1 = k),$$

where $m_1$ is the number of cases among the $m$ individuals carrying a minor allele in the tested region.
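The claim that non-carriers drop out of the score statistics can be checked numerically. The sketch below runs under assumed settings: a simulated rare-variant genotype matrix with a hypothetical allele frequency and an intercept-only null model. It computes $m$ and MAC, then confirms that the scores from the $m$ carriers alone match those from all $n$ individuals.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 5
G = rng.binomial(2, 0.002, size=(n, p))   # hypothetical rare variants
y = rng.binomial(1, 0.5, size=n).astype(float)
pi_hat = np.full(n, y.mean())             # intercept-only null model

# Carriers: individuals with at least one minor allele in the tested set.
carrier = G.sum(axis=1) > 0
m = int(carrier.sum())
mac = int(G.sum())

S_all = G.T @ (y - pi_hat)                              # all n individuals
S_carriers = G[carrier].T @ (y[carrier] - pi_hat[carrier])  # carriers only

print(f"n = {n}, m = {m}, MAC = {mac}")
print("scores identical:", np.allclose(S_all, S_carriers))
```

Because $m$ is typically tiny relative to $n$ for rare variant sets, any resampling restricted to the carriers costs time proportional to $m$ rather than $n$.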

Step 1 is to estimate $P(m_1 = k)$, the distribution of the number of cases among the $m$ carriers. If there are no covariates to adjust for, $m_1$ follows the central hypergeometric distribution. When there are covariates, we use Fisher's noncentral hypergeometric distribution, which allows each individual to have different odds of being a case (Fog, 2008). Estimating $P(m_1 = k)$ while allowing every individual to have different odds is computationally challenging. We therefore stratify the $m$ individuals into groups based on $\hat{\pi}_i$ and assume a common average odds for all individuals within the same stratum. The $n - m$ individuals without variants are treated as a single group (Supplementary Appendix A). This stratification is used only to estimate $P(m_1 = k)$ for the $m$ individuals in Step 1. We used 10 strata for the $m$ individuals, for a total of 11 strata.
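Step 1 can be sketched with generating polynomials. Assume a multivariate Fisher noncentral hypergeometric model with a common odds $\omega_s$ within each stratum $s$ of size $N_s$. The carrier strata are combined into a generating polynomial $A(z) = \prod_s \sum_x \binom{N_s}{x}\omega_s^x z^x$. Then $P(m_1 = k)$ is proportional to the coefficient of $z^k$ in $A(z)$, multiplied by the non-carrier term $\binom{n-m}{n_1-k}\omega_0^{\,n_1-k}$. The Python sketch below works in log space so that a large non-carrier stratum does not overflow. Stratum sizes and odds are inputs. How the paper averages odds within a stratum is described in Supplementary Appendix A and is not reproduced here, so the function names and example odds are illustrative assumptions.

```python
import math


def log_comb(n, k):
    """log C(n, k) via log-gamma (safe for large n)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_add(a, b):
    """log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def stratum_log_poly(size, odds):
    """log of C(size, x) * odds**x for x = 0..size (odds must be > 0)."""
    return [log_comb(size, x) + x * math.log(odds) for x in range(size + 1)]


def log_poly_mul(a, b):
    """Multiply two generating polynomials stored as log coefficients."""
    out = [-math.inf] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = log_add(out[i + j], ai + bj)
    return out


def carrier_case_distribution(carrier_strata, noncarrier, n_cases):
    """P(m1 = k) with a common odds per stratum.

    carrier_strata: list of (size, odds) for the carrier strata.
    noncarrier: (size, odds) for the single non-carrier group.
    n_cases: total number of cases n1.
    """
    log_a = [0.0]
    for size, odds in carrier_strata:
        log_a = log_poly_mul(log_a, stratum_log_poly(size, odds))
    n0, odds0 = noncarrier
    log_w = {}
    for k, la in enumerate(log_a):
        j = n_cases - k                    # cases left for non-carriers
        if 0 <= j <= n0:
            log_w[k] = la + log_comb(n0, j) + j * math.log(odds0)
    log_total = -math.inf
    for v in log_w.values():
        log_total = log_add(log_total, v)
    return {k: math.exp(v - log_total) for k, v in log_w.items()}


# Equal odds everywhere: must reduce to the central hypergeometric.
dist = carrier_case_distribution([(3, 1.0), (2, 1.0)], (15, 1.0), n_cases=8)
m, n, n1 = 5, 20, 8
central = {k: math.comb(m, k) * math.comb(n - m, n1 - k) / math.comb(n, n1)
           for k in range(0, min(m, n1) + 1)}

# Unequal (hypothetical) odds with a realistically large non-carrier group.
noncentral = carrier_case_distribution([(3, 1.5), (2, 0.8)], (1995, 1.0),
                                       n_cases=1000)
print(dist, noncentral)
```

The equal-odds check is a useful guard: with every $\omega_s = 1$, $A(z) = (1+z)^m$ and the result is exactly $\binom{m}{k}\binom{n-m}{n_1-k}/\binom{n}{n_1}$.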

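In Step 2, we estimate $P(T \ge t_{\mathrm{obs}} \mid m_1 = k)$ by resampling the case–control status of the $m$ individuals with minor alleles. Suppose $\mathbf{y}^{*}_{b}$ is the $b$th resample of the carriers' phenotypes given $m_1 = k$, and $T^{*}_{b}$ is the resulting test statistic. Examples of $T^{*}_{b}$ include the resampled Burden and SKAT test statistics, $(\sum_j w_j S^{*}_{jb})^2$ and $\sum_j w_j^2 S^{*2}_{jb}$, where $S^{*}_{jb}$ is the score of variant $j$ computed from the $b$th resample. The probability of the $b$th resample given $m_1 = k$, say $q_b$, is also calculated using Fisher's noncentral hypergeometric distribution. Unlike Step 1, this calculation is done at the level of each of the $m$ individuals rather than at the level of strata. Then the estimator of $P(T \ge t_{\mathrm{obs}} \mid m_1 = k)$, denoted $\hat{P}(T \ge t_{\mathrm{obs}} \mid m_1 = k)$, is Inline graphic, and the ER p-value is

$$\hat{p}_{\mathrm{ER}} = \sum_{k} \hat{P}(T \ge t_{\mathrm{obs}} \mid m_1 = k)\,\hat{P}(m_1 = k).$$

The estimator of the ER mid-p-value is

$$\hat{p}_{\mathrm{mid}} = \sum_{k} \Big\{ \hat{P}(T > t_{\mathrm{obs}} \mid m_1 = k) + \tfrac{1}{2}\,\hat{P}(T = t_{\mathrm{obs}} \mid m_1 = k) \Big\}\,\hat{P}(m_1 = k),$$

where the second term involves an estimator of the tie probability. Suppose $\hat{t}_{\max}$ is the maximum of $T^{*}_{b}$ over all resamples $b$ and all values of $k$. Then the estimators of the MAP and of the minimum achievable mid-p-value are

$$\widehat{\mathrm{MAP}} = \sum_{k} \hat{P}(T = \hat{t}_{\max} \mid m_1 = k)\,\hat{P}(m_1 = k), \qquad \widehat{\mathrm{MAP}}_{\mathrm{mid}} = \tfrac{1}{2}\,\widehat{\mathrm{MAP}}.$$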
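For small $m$, all $2^m$ carrier configurations can be enumerated, so the four quantities can be computed exactly. The Python sketch below does this for SKAT without covariates. In that setting, each configuration with $k$ carrier cases is equally likely (every $q_b$ equals $1/\binom{m}{k}$), and $P(m_1 = k)$ is central hypergeometric. The sketch then checks the factorized ER p-value and mid-p-value against brute-force whole-sample permutation on a toy data set. It does not reproduce the covariate-adjusted weighting of Step 2, the sampling scheme used when $2^m$ is large, or the paper's tie handling. The tolerance-based tie rule and all function names are assumptions.

```python
import itertools
from math import comb

import numpy as np


def skat(G, y, pi0, w):
    """SKAT statistic sum_j w_j^2 S_j^2 with S_j = sum_i g_ij (y_i - pi0)."""
    S = np.asarray(G, float).T @ (np.asarray(y, float) - pi0)
    return float(np.sum(np.square(w) * np.square(S)))


def exact_er(G_c, y_c, n, n1, w, tol=1e-9):
    """Exact ER p, mid-p, MAP and minimum mid-p (no covariates).

    G_c, y_c: genotypes and observed phenotypes of the m carriers only.
    """
    m, pi0 = len(y_c), n1 / n
    t_obs = skat(G_c, y_c, pi0, w)
    by_k = {}
    for cases in itertools.product([0, 1], repeat=m):   # all 2**m configs
        k = sum(cases)
        if k <= n1 and n1 - k <= n - m:                  # feasible given n, n1
            by_k.setdefault(k, []).append(skat(G_c, cases, pi0, w))
    t_max = max(max(ts) for ts in by_k.values())
    p = mid = map_ = 0.0
    for k, ts in by_k.items():
        pk = comb(m, k) * comb(n - m, n1 - k) / comb(n, n1)   # P(m1 = k)
        ts = np.array(ts)
        gt = np.mean(ts > t_obs + tol)                        # P(T > t_obs | k)
        eq = np.mean(np.abs(ts - t_obs) <= tol)               # P(T = t_obs | k)
        p += pk * (gt + eq)
        mid += pk * (gt + 0.5 * eq)
        map_ += pk * np.mean(np.abs(ts - t_max) <= tol)       # P(T = t_max | k)
    return p, mid, map_, 0.5 * map_


def full_permutation_p(G, y, w, tol=1e-9):
    """Whole-sample permutation over all C(n, n1) case assignments."""
    n, n1 = len(y), int(sum(y))
    pi0 = n1 / n
    t_obs = skat(G, y, pi0, w)
    gt = eq = 0
    for idx in itertools.combinations(range(n), n1):
        y_perm = np.zeros(n)
        y_perm[list(idx)] = 1.0
        t = skat(G, y_perm, pi0, w)
        gt += t > t_obs + tol
        eq += abs(t - t_obs) <= tol
    total = comb(n, n1)
    return (gt + eq) / total, (gt + 0.5 * eq) / total


# Toy data: 12 individuals; only the first 4 carry minor alleles.
G = np.zeros((12, 2))
G[:4] = [[1, 0], [0, 1], [1, 1], [2, 0]]
y = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], float)
w = np.array([1.0, 1.0])

p_er, mid_er, map_er, min_mid_er = exact_er(G[:4], y[:4], n=12, n1=5, w=w)
p_full, mid_full = full_permutation_p(G, y, w)
print(p_er, p_full, mid_er, mid_full, map_er, min_mid_er)
```

The two routes agree because a fixed carrier configuration with $k$ cases arises in $\binom{n-m}{n_1-k}$ of the $\binom{n}{n_1}$ whole-sample assignments. This equals $P(m_1 = k)/\binom{m}{k}$, so the factorization is exact here, while the ER route only touches $2^m$ configurations.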

The detailed derivations of Steps 1 and 2 are given in Supplementary Appendix A.

The computational complexity of the proposed method is O(Bmp) for SKAT and SKAT-O, and O(Bm) for single variant and Burden tests. The computation can be reduced further if the total number of case–control configurations ($2^m$) is small. For example, there are only 1024 configurations when $m = 10$, so evaluating those 1024 configurations gives the exact resampling distribution. We estimate MAPs only when the exact resampling distribution is obtained (i.e. $2^m \le B$); otherwise, the MAP estimates are not accurate. Because the computational cost of ER increases with $m$, ER may not be practical for variant sets with moderate or large MAC. For these variant sets, we developed ER-based QA moment matching (Supplementary Appendix B). QA produces more accurate p-values than the moment matching adjustment while keeping computation fast for moderate or large MAC variant sets.
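Deciding whether the exact resampling distribution (and hence a reliable MAP) is attainable amounts to counting configurations. The Python sketch below assumes that the resampling budget $B$ is compared against the number of carrier configurations compatible with $n_1$ cases among $n$ individuals. That count equals $2^m$ whenever $n_1 \ge m$ and $n - n_1 \ge m$. The budget of $10^6$ in the example is purely illustrative, and the function names are hypothetical.

```python
from math import comb


def n_configurations(m, n, n1):
    """Carrier case-control configurations compatible with n1 cases among n."""
    k_min, k_max = max(0, n1 - (n - m)), min(m, n1)
    return sum(comb(m, k) for k in range(k_min, k_max + 1))


def can_enumerate(m, n, n1, B):
    """True when at most B evaluations give the exact resampling distribution."""
    return n_configurations(m, n, n1) <= B


print(n_configurations(10, 2000, 1000))          # 1024, i.e. 2**10
print(can_enumerate(10, 2000, 1000, B=10**6))    # True
print(can_enumerate(40, 2000, 1000, B=10**6))    # False: 2**40 configurations
```

Counting the feasible range of $k$ explicitly, rather than always using $2^m$, matters only when the carriers outnumber the cases or the controls.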

Bonferroni correction and QQ plots assume that p-values are uniformly distributed. They therefore cannot account for the fact that resampling p-values have lower limits, i.e. the MAPs. Kiezun and others (2012) proposed a heuristic approach that first identifies variant sets with Inline graphic and then counts only these variant sets toward the effective number of tests. We developed an alternative statistical approach that uses MAPs to estimate the effective number of tests and to calibrate QQ plots (Supplementary Appendix C).
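The sketch below is a rough illustration of why MAPs matter for multiple-testing correction. It applies a single-pass filter that counts only the variant sets whose MAP can reach the Bonferroni threshold $\alpha/J$, where $J$ is the number of sets tested. This is an assumed, simplified reading of a heuristic of the kind described above. It does not reproduce the exact criterion of Kiezun and others (2012), nor the mixture-model approach of Supplementary Appendix C. The MAP values and the function name are hypothetical.

```python
def sets_able_to_reach_threshold(maps, alpha=0.05):
    """Count variant sets whose minimum achievable p-value is <= alpha / J,
    where J is the total number of variant sets tested (single pass)."""
    threshold = alpha / len(maps)
    return sum(1 for mp in maps if mp <= threshold)


# Hypothetical MAPs: the last two sets can never reach alpha / J = 0.01.
maps = [1e-7, 1e-6, 1e-3, 0.02, 0.2]
j_eff = sets_able_to_reach_threshold(maps)
print(j_eff, 0.05 / j_eff)     # fewer effective tests, so a looser threshold
```

Sets that cannot possibly reach significance inflate $J$ without adding any chance of a false positive. Dropping them is what makes the corrected threshold less stringent.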

2.3. Numerical simulations

We generated 10 000 sequence haplotypes for a region of Inline graphic250 kb using the coalescent simulator FTEC (Reppell and others, 2012) with a faster-than-exponential growth model. To obtain variant sets with a wide range of MACs, we randomly selected regions ranging from 125 to 12 500 bp. We then generated genotypes for these variant sets from the simulated haplotypes. Three case–control ratios were considered (1000:1000, 500:1500, and 200:1800). The binary phenotypes were generated from the logistic regression model

$$\mathrm{logit}(\pi_i) = \beta_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \mathbf{G}_{c,i}'\boldsymbol{\beta}_c, \qquad (2.2)$$

where $\mathbf{G}_{c,i}$ is a genotype vector containing the causal variants and $\boldsymbol{\beta}_c$ is a vector of genetic effect coefficients. $X_{1i}$ is a binary covariate drawn from Bernoulli(0.5), and $X_{2i}$ is a continuous covariate drawn from Inline graphic. The intercept $\beta_0$ was chosen to give a disease prevalence of 0.05. The non-genetic covariate coefficients $\alpha_1$ and $\alpha_2$ were 0 in scenarios without covariates and 0.5 in scenarios with covariates.
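The phenotype model can be sketched in Python as follows. The intercept is found by bisection so that the population prevalence is 0.05, and cases and controls are then drawn for one of the stated designs (500:1500). The prevalence, the Bernoulli(0.5) covariate and the coefficient value 0.5 come from the text. Several parts are assumptions: the N(0,1) distribution of the continuous covariate (not recoverable from the source), the causal allele frequencies and odds ratios, the population size, and sampling the case–control data from a simulated source population.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def solve_intercept(eta, prevalence, lo=-20.0, hi=20.0, iters=100):
    """Bisection for beta_0 so that mean(expit(beta_0 + eta)) == prevalence."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expit(mid + eta).mean() < prevalence:
            lo = mid              # prevalence too low: raise the intercept
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(2015)
N = 200_000                                   # hypothetical population size
x1 = rng.binomial(1, 0.5, N)                  # binary covariate, Bernoulli(0.5)
x2 = rng.normal(0.0, 1.0, N)                  # continuous covariate (assumed N(0,1))
G_c = rng.binomial(2, 0.005, size=(N, 3))     # hypothetical causal variants
beta_c = np.full(3, np.log(2.0))              # hypothetical odds ratio of 2 each
eta = 0.5 * x1 + 0.5 * x2 + G_c @ beta_c      # alpha_1 = alpha_2 = 0.5

beta0 = solve_intercept(eta, prevalence=0.05)
y = rng.binomial(1, expit(beta0 + eta))

# Case-control sampling for the 500:1500 design (assumed sampling scheme).
cases = rng.choice(np.flatnonzero(y == 1), size=500, replace=False)
controls = rng.choice(np.flatnonzero(y == 0), size=1500, replace=False)
print(round(beta0, 3), y.mean(), len(cases), len(controls))
```

Bisection works here because the mean fitted probability is strictly increasing in the intercept. For the scenarios without covariates, set both covariate coefficients to 0.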

We applied five methods to compute p-values for each of the Burden, SKAT and SKAT-O tests: (i) ER with the p-value (ER); (ii) ER with the mid-p-value (ER-mid); (iii) QA moment matching (QA); (iv) moment matching adjustment (MA); and (v) unadjusted (UA) asymptotic tests. To verify that ER and whole-sample permutation produce essentially identical p-values, we generated 20 000 variant sets. For each set, we compared the p-values from ER and the permutation methods, with and without covariates, using 10Inline graphic resamples (Supplementary Appendix E). We also compared the computation times of SKAT-ER and whole-sample permutation for $m = 40$ and total sample sizes ranging from 100 to 50 000 (Supplementary Appendix E).

To compare FPRs across ranges of total MAC, we considered six total MAC bins: Inline graphic; Inline graphic; Inline graphic; Inline graphic; Inline graphic; and Inline graphic. For each bin, the number of variant sets ranged from 5 to 20 000, corresponding to settings from candidate gene studies to genome-wide studies. In addition to the FPR simulations, we carried out simulations to evaluate the power of ER and the other tests. Details of the FPR and power simulations can be found in Supplementary Appendix E.

3. Results

3.1. Numerical simulations

We examine FPR control, power, and computation time for five approaches. Two are existing methods, the MA and UA p-values. Three are newly developed ER-based methods: ER with the p-value (ER), ER with the mid-p-value (ER-mid), and the ER-based quantile adjustment (QA). We evaluate them for single variant and multiple variant tests across a range of MACs and degrees of case–control imbalance. For the simulated data, we generated sequence haplotypes with a European demographic model that mimics the MAF spectrum and linkage disequilibrium (LD) structure of the current European population (Reppell and others, 2012). The MAF spectrum of the simulated haplotypes was similar to that observed in the GoT2D exome sequencing data (Supplementary Figure S1).

3.1.1. Comparison of p-values obtained using ER or whole-sample permutations

We compared SKAT p-values obtained using the ER method for 20 000 variant sets with total Inline graphic against those from whole-sample permutation. Permutation was done either without covariates (permutation of case–control status) or with covariates (using Fisher's noncentral hypergeometric distribution). The −log10 p-values were very highly correlated (Inline graphic) for tests with and without covariates, indicating that the ER-based results mirror those from whole-sample permutation (Figure 1). We observed equally concordant p-values for the Burden and SKAT-O tests (data not shown).

Fig. 1.


Comparison of SKAT p-values obtained using ER or whole-sample permutations. In the absence of covariates, SKAT p-values were obtained through ER or whole-sample permutation (Perm) of disease status (top panel). In the presence of covariates, SKAT p-values were obtained through ER or through whole-sample permutation based on Fisher's noncentral hypergeometric distribution (FNHPerm), implemented in the BiasedUrn R package (bottom panel). From left to right, the plots consider case:control = 1000:1000, 500:1500, and 200:1800, respectively. The x-axis represents −log10 SKAT-ER p-values and the y-axis represents −log10 SKAT-Perm or SKAT-FNHPerm p-values. Variant sets were randomly simulated, 20 000 sets with Inline graphic were selected, and 10Inline graphic resamples were generated to compute p-values for each method.

3.1.2. Comparison of computational times for the estimation of a significant gene-based p-value

To compare computation times for a significant gene-based p-value (0.05/20 000 genes), we generated 10Inline graphic resamples with each method for a single variant set. This allows a p-value of $0.05/20\,000 = 2.5 \times 10^{-6}$ to be estimated with a standard error of Inline graphic0.2 times its value. We first considered 40 individuals with minor alleles (MAC equal to or slightly higher than 40). SKAT-ER with no covariates ran in Inline graphic10 s, and its computation time was invariant to sample size (100–50 000 samples). In contrast, the computation time for SKAT whole-sample permutation (SKAT-Perm) increased linearly with total sample size, from 0.35 h for 2000 samples to 10 h for 50 000 samples (Figure 2(a)). With covariates, SKAT-ER also ran in Inline graphic10 s and was invariant to sample size. In contrast, SKAT whole-sample permutation based on Fisher's noncentral hypergeometric distribution (SKAT-FNHPerm), using the BiasedUrn R package, took Inline graphic10 h for 2000 samples (Figure 2(b)). The running times for SKAT-ER-mid were nearly identical to those for SKAT-ER (data not shown). In existing programs, 10Inline graphic resamples of 2000 (50 000) samples with no covariates took 6 min (3.6 h) for C-alpha in PLINK/SEQ, and substantially longer for SKAT. With covariates, the same resampling took 6.4 h (Inline graphic240 h) in SCORE-Seq using its set of five gene-based tests (Supplementary Table S1).
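The precision quoted above follows from binomial arithmetic. For a plain Monte Carlo estimate from $B$ independent resamples, the relative standard error of $\hat p$ is $\sqrt{(1-p)/(Bp)}$. The exponent of the resample count is not recoverable from this copy of the text. The Python sketch below simply evaluates the formula for a few values of $B$ at $p = 2.5 \times 10^{-6}$. Under this formula, $B = 10^7$ is the value that gives a relative standard error of about 0.2. That is an inference, and it ignores the probability weighting that ER uses.

```python
from math import sqrt


def relative_se(p, B):
    """sd(p_hat) / p for a Monte Carlo p-value from B independent resamples."""
    return sqrt(p * (1.0 - p) / B) / p


p = 0.05 / 20_000                  # gene-based threshold, 2.5e-6
for B in (10**6, 10**7, 10**8):
    print(f"B = {B:.0e}: relative SE = {relative_se(p, B):.3f}")
```

Running it gives relative standard errors of roughly 0.63, 0.20 and 0.063. Each tenfold increase in $B$ improves precision only by a factor of about 3.2, which is why whole-sample resampling becomes expensive at genome-wide significance levels.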

Fig. 2.


Comparison of computation times for the estimation of a significant gene-based p-value using ER and existing methods. Panels (a) and (b) show the estimated computation time for 10Inline graphic resamples of a single variant set with 40 individuals carrying minor alleles ($m = 40$) and varying total sample sizes (balanced case:control). Panel (a) compares SKAT-ER with SKAT-Perm in the absence of covariates. Panel (b) compares SKAT-ER with SKAT-FNHPerm in the presence of covariates; the BiasedUrn R package was used for SKAT-FNHPerm. Panels (c) and (d) show the estimated computation time for 10Inline graphic resamples of a single variant set for 2000 samples (balanced case:control) in the presence of covariates, for SKAT-O, SKAT, or the Burden test. Panel (c) uses ER for Inline graphic individuals with minor alleles, and panel (d) uses ER and QA for Inline graphic individuals with minor alleles. Each point represents the median of 10 experiments. When Inline graphic, the number of all possible configurations of the case–control status of individuals with minor alleles was smaller than 10Inline graphic, so ER obtained the exact resampling p-values. The number of variant loci was 30 when Inline graphic; otherwise, it was the same as Inline graphic.

Although ER was invariant to sample size, its computation time increased with the number of individuals with minor alleles. For a single test with covariates, SKAT-ER took 0.01, 10, 58, and 310 s when the number of individuals with minor alleles was Inline graphic, 40, 100, and 500, respectively. The Burden test was faster and SKAT-O slower (Figures 2(c) and (d); Supplementary Table S2). When Inline graphic, computation took substantially less time because the total number of configurations of cases and controls among those $m$ individuals was only Inline graphic. Because computation time grows with $m$, we developed a substantially faster (Inline graphic6- to 18-fold) QA asymptotic method based on ER (QA) (Figure 2(d) and Supplementary Table S2). QA was essentially linear in $m$ and invariant to sample size (data not shown). For comparison, consider covariate-adjusted tests with Inline graphic and a sample size of 2000. The existing MA method for Burden, SKAT and SKAT-O took Inline graphic0.2 s, and UA for the same tests took Inline graphic0.02 s; both were invariant to $m$ (data not shown).

3.1.3. FPRs for existing and ER-based methods

We compared empirical FPRs for variant sets for these five methods. We define the best-calibrated test as the one that had the FPR closest to but, at most, slightly exceeding the expected FPR at the Bonferroni corrected level Inline graphic. Figure 3 shows the FPRs for SKAT in the presence of covariates using Bonferroni corrected Inline graphic for 5–20 000 sets of variants and Inline graphic. Over the MAC and case–control imbalance scenarios, ER-mid had the best-calibrated FPRs, though it was conservative when Inline graphic for balanced case–control studies. ER was slightly more conservative than ER-mid when Inline graphic, but otherwise behaved similarly. QA was designed to speed the computation for moderate or large MAC. For MAC between 10 and 40 QA was conservative for balanced studies, and slightly anti-conservative for imbalanced studies. MA had conservative or anti-conservative FPRs depending on the scenario, and UA was both the most conservative for balanced studies at Inline graphic, and the most anticonservative for imbalanced studies. We observed similar trends for the Burden test (Supplementary Figure S2) and SKAT-O (Supplementary Figure S3).

Fig. 3.


False positive rates (FPRs) for SKAT using ER-based and existing methods to compute p-values for variant sets with Inline graphic. From top to bottom, the plots show variant sets with Inline graphic; Inline graphic and Inline graphic. From left to right, the plots consider case:control = 1000:1000, 500:1500, and 200:1800. In each plot, the x-axis is the number of variant sets (Inline graphic) and the corresponding Bonferroni-corrected level Inline graphic. The y-axis is the empirical FPR divided by the expected FPR. A well-calibrated test should have empirical/expected = 1 (gray dashed line).

ER-mid-based p-values are conservative for variant sets with Inline graphic because many of these variant sets cannot reach Bonferroni-corrected thresholds. To improve the calibration of ER-mid, we used a mixture model (Supplementary Appendix C) to estimate the effective number of tests (Inline graphic) (Figure 4). The effective number of tests is defined as the number of independent tests that yields the expected Bonferroni-corrected FPR. For SKAT-ER-mid, when Inline graphic, Inline graphic was substantially smaller than the number of variant sets, especially for balanced studies. The Inline graphic-based Bonferroni correction had a slightly anti-conservative FPR for balanced case–control samples but well-calibrated FPRs for imbalanced case–control samples. Fitting the mixture model requires little additional computation. The computation time for the Inline graphic-based multiple testing adjustment is therefore essentially the sum of the computation times for testing each variant set. We observed similar patterns of results for the Burden test (Supplementary Figure S4) and SKAT-O (Supplementary Figure S5).

Fig. 4.


Estimated effective number of tests (Inline graphic) and FPRs for SKAT-ER-mid for variant sets with Inline graphic. Variant sets with Inline graphic (top row) and Inline graphic (bottom row) are shown. From left to right, the plots consider case:control = 1000:1000, 500:1500, and 200:1800. In each plot, the top panel is a bar plot of the estimated effective number of tests (Inline graphic) divided by the number of variant sets (Inline graphic). The bottom panel shows the empirical false positive rate (FPR) divided by the expected FPR of SKAT-ER-mid based on Inline graphic (square) or Inline graphic (circle). A well-calibrated test should have empirical/expected = 1 (black dashed line). The x-axis shows the number of variant sets (Inline graphic).

Next, we examined the FPRs for sets of variants with Inline graphic in the presence of covariates. SKAT-ER-mid was generally well calibrated, although it was slightly conservative or anti-conservative at Inline graphic (Supplementary Figure S6). SKAT-QA was slightly conservative for balanced studies and slightly anti-conservative for studies with case–control imbalance. SKAT-MA was well calibrated or slightly anti-conservative for balanced studies, and was anti-conservative for imbalanced studies. SKAT-UA was not well calibrated in any of these scenarios. For Burden tests, all methods had close to the expected FPRs for balanced studies and Burden-QA was best calibrated for unbalanced studies (Supplementary Figure S7). We observed similar patterns of results for SKAT-O (Supplementary Figure S8).

Overall, the results were quantitatively the same in the absence of covariates (data not shown). They were also the same when we tested single variants instead of a set of variants (data not shown); a single variant test is very similar to a Burden test with equal weights for all variants. To test the robustness of our methods to population stratification, we simulated African American and European ancestry samples with differential disease risk and adjusted for stratification in the analysis. The Type I error rates (Supplementary Appendix F and Supplementary Figures S9–S11) were quantitatively similar to those for European ancestry samples alone (Figure 3 and Supplementary Figures S2 and S3).

Over a range of MAC and case–control ratios, no approach yielded an optimal mix of control of FPR and efficient computation. Based on our findings, we propose an ER-based hybrid approach (ER-mid when variant set Inline graphic; MA when variant set Inline graphic and balanced case–control; and QA when variant set Inline graphic and imbalanced case–control) to provide a balance of well-calibrated FPRs and computation time.
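The hybrid rule above can be written as a small dispatcher. This Python sketch takes the MAC cut-off as a parameter because its value is not recoverable from this copy of the text. The cut-off of 40 in the usage lines is an illustrative placeholder only. How "balanced" is decided is left to the caller, and whether a set exactly at the cut-off goes to ER-mid is an assumption. The function name is hypothetical.

```python
def hybrid_method(mac, balanced, mac_cutoff):
    """Pick the p-value method for one variant set under the hybrid rule.

    mac_cutoff is a placeholder for the MAC threshold separating low-MAC sets;
    treating mac == mac_cutoff as low-MAC is an assumption.
    """
    if mac <= mac_cutoff:
        return "ER-mid"                 # low MAC: efficient resampling, mid-p
    return "MA" if balanced else "QA"   # higher MAC: asymptotic adjustments


MAC_CUTOFF = 40   # hypothetical value for illustration only
for mac, balanced in [(8, True), (150, True), (150, False)]:
    print(mac, balanced, hybrid_method(mac, balanced, MAC_CUTOFF))
```

Routing each set separately means the expensive ER computation is spent only where asymptotic approximations are least trustworthy.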

3.1.4. Comparison of power to identify associations between low MAC variant sets and binary phenotypes

We next compared the power of the ER-based hybrid approach with that of the MA and UA tests. The hybrid approach used either experiment-wide permutation of the total sample or Bonferroni correction based on the effective number of tests (Inline graphic). The MA and UA tests used experiment-wide permutation. We estimated the power to detect one causal variant set Inline graphic among a background of 19 999 non-causal variant sets. These sets followed the MAC distribution of Inline graphic damaging variants observed in the NHLBI ESP data (Supplementary Appendix D and Table 1). In the causal variant set, 50% of the variants were causal. Either all causal variants increased risk, or half increased and half decreased risk. Across the gene-based tests and case–control ratios, the ER-based hybrid approach had similar power with experiment-wide permutation or Inline graphic-based Bonferroni correction (Supplementary Figure S12). For SKAT and SKAT-O, the ER-based hybrid approach had higher power than MA or UA. For the Burden test, MA and UA had similar or slightly higher power than the ER-based hybrid approach, but neither had consistently higher power. We observed similar trends for causal Inline graphic (Supplementary Figure S13).

Table 1.

Number of genes by MAC of selected variants in NHLBI-ESP whole-exome data and in chromosome 2 GoT2D-exome data

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Total
NHLBI ESP
Disruptive 7261 (62%) 1425 (12%) 1313 (11%) 1306 (11%) 485 (4%) 11 790
Inline graphic damaging 4250 (25%) 2636 (15%) 3135 (18%) 4034 (23%) 3185 (18%) 17 240
All nonsynonymous 1699 (9%) 1579 (9%) 2568 (14%) 4791 (27%) 7371 (41%) 18 008
GoT2D Chr2
Disruptive 312 (92%) 17 (5%) 5 (1%) 6 (2%) 0 (0%) 340
Inline graphic damaging 481 (46%) 174 (17%) 186 (18%) 161 (15%) 37 (4%) 1039
All nonsynonymous 284 (26%) 165 (15%) 208 (19%) 330 (30%) 123 (11%) 1110

Each cell gives the number (percentage) of genes in each MAC bin among genes with ≥1 variant of the given class. “Total” is the total number of genes with ≥1 variant of that class. Nonsense, splicing, and frame-shift variants are classified as “disruptive” variants. Variants predicted to be possibly or probably damaging by PolyPhen-2, together with disruptive variants, are classified as “Inline graphic damaging” variants.

3.2. GoT2D data analysis

We performed single and multiple variant tests using GoT2D chromosome 2 deep exome sequence data (1326 cases and 1331 controls; Supplementary Appendix G). Of the 42 045 chromosome 2 variants, 35 576 (84.6%) had Inline graphic (corresponding Inline graphic). For single variant tests of Inline graphic variants, the estimated effective number of tests was 2762, giving a significance threshold about an order of magnitude less stringent than a standard Bonferroni correction at a family-wise error rate of 0.05. No variant was significant at the Bonferroni-corrected level based on the effective number of tests (0.05/2762 ≈ 1.8 × 10^-5). The unadjusted QQ plot for single variant results showed substantial p-value deflation relative to the expected p-values (Figure 5(a)), although the deflation was less pronounced when testing was restricted to variants with Inline graphic (Figure 5(b)). In contrast, QQ plots based on a mixture model of the minimum achievable p-values (MAPs) showed no p-value deflation (Figures 5(a) and (b)).

Fig. 5.

MAP-adjusted and unadjusted QQ plots of single variant and SKAT-ER-hybrid p-values from the analysis of GoT2D chromosome 2 exome data. QQ plots of single variant tests with all rare variants Inline graphic (a) and with rare variants with Inline graphic (b). QQ plots of ER-hybrid SKAT p-values with disruptive variants (c) and Inline graphic damaging variants (d). In each plot, the x-axis is the MAP-adjusted or unadjusted expected quantile of −log10 p-values, and the y-axis is the observed quantile of −log10 p-values. Observed p-values are plotted against the MAP-adjusted expected quantiles (black dots) and the unadjusted expected quantiles (gray dots). The dashed lines represent a 95% confidence band based on 500 random draws from the MAP-based mixture distribution.

In the chromosome 2 GoT2D data, 334 of 340 (98%) genes with at least one disruptive variant had Inline graphic, and 841 of 1039 (81%) genes with at least one Inline graphic damaging variant had Inline graphic (Table 1). Even in the whole-exome data from the larger NHLBI-ESP sample, 85% and 58% of genes with at least one disruptive or Inline graphic damaging variant, respectively, had Inline graphic (Supplementary Appendix D and Table 1). We used SKAT-ER-hybrid to perform gene-based tests for disruptive and Inline graphic damaging variants (Inline graphic and 540, respectively) in the chromosome 2 GoT2D exome data. No gene was significant at the Bonferroni-corrected level based on the effective number of tests (Inline graphic). In unadjusted QQ plots, the gene-based p-values were deflated, whereas in MAP-adjusted QQ plots they were not, and the results for disruptive variants lay near the upper 95% confidence bound (Figures 5(c) and (d)). We observed similar results for ER-hybrid Burden and SKAT-O tests (Supplementary Figures S14 and S15).
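
The percentages quoted above can be checked against Table 1. In each case, the quoted count equals the sum of the three lowest MAC bins. For example, 312 + 17 + 5 = 334 of the 340 GoT2D genes with a disruptive variant. Below is a small Python check with hypothetical names. The label "damaging class" stands for the broader class defined in the Table 1 footnote.

```python
# MAC-bin counts (five bins, left to right) copied from Table 1; the bin
# limits themselves are not reproduced here.
TABLE1 = {
    ("NHLBI ESP", "disruptive"): [7261, 1425, 1313, 1306, 485],
    ("NHLBI ESP", "damaging class"): [4250, 2636, 3135, 4034, 3185],
    ("GoT2D Chr2", "disruptive"): [312, 17, 5, 6, 0],
    ("GoT2D Chr2", "damaging class"): [481, 174, 186, 161, 37],
}


def share_in_lowest_bins(counts: list[int], n_bins: int = 3) -> tuple[int, int, float]:
    """Genes in the lowest n_bins MAC bins, total genes, and their fraction."""
    low = sum(counts[:n_bins])
    total = sum(counts)
    return low, total, low / total


for key, counts in TABLE1.items():
    low, total, frac = share_in_lowest_bins(counts)
    print(f"{key[0]:>10} {key[1]:<15} {low:>6} / {total:<6} = {100 * frac:.0f}%")
```

The printed shares reproduce the 85% and 58% (NHLBI ESP) and 98% and 81% (GoT2D) quoted above.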

Among the Inline graphic damaging variant tests, Sps1/Ste20-related kinase homolog (YSK4) was the most significant gene for the Burden-ER-mid test (Inline graphic, Inline graphic) and the second most significant gene for SKAT-O-ER-mid (Inline graphic). A recent large-scale meta-analysis showed that a common variant in YSK4 is associated with fasting insulin (Scott and others, 2012).

To assess the ER method with dosage data, we compared the results of ER and whole-sample permutation for variant set-based tests using dosage data from non-exonic GoT2D low-pass sequencing, and found very similar p-values (Supplementary Appendix H and Supplementary Figure S16).

4. Discussion

In this paper, we develop an ER method for binary traits for score statistic-based tests of variant sets with low MAC that allows inclusion of covariates in the analysis. ER methods are needed because the existing asymptotic (UA) and asymptotic-based adjustment (MA) methods have poorly calibrated FPRs at low MAC and with imbalanced case–control ratios. As with whole-sample permutation, the ER method preserves the correlation structure (LD) among variants in the tested set. Across almost all tested MAC bins and case–control ratios, one or more of the ER-based methods were well calibrated. Based on these observations and on computation time, we recommend a hybrid approach: ER-mid for variant sets with small MAC (Inline graphic); MA for variant sets with moderate or large MAC and balanced case–control ratios; and QA for variant sets with moderate or large MAC and unbalanced case–control ratios. A MAC threshold of Inline graphic is a practical compromise between computation time and Type I error rate. A slightly lower threshold would reduce computation time, but at the risk of a slightly higher Type I error rate, particularly for SKAT and SKAT-O. If a permutation approach is desired, ER-mid is substantially faster than whole-sample permutation even for large MAC.
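
ER-mid differs from ER only in how ties between the resampled and observed statistics are counted. The Python sketch below computes a Lancaster-style mid-p (Lancaster, 1961) from a set of resampled statistics. It assumes that larger statistics are more extreme. The function name is hypothetical, and this is not the SKAT package code.

```python
import math
from typing import Sequence, Tuple


def resampling_pvalues(observed: float, resampled: Sequence[float]) -> Tuple[float, float]:
    """Return (standard, mid) resampling p-values; larger statistics are more extreme.

    standard = P*(T >= t_obs); mid = P*(T > t_obs) + 0.5 * P*(T = t_obs),
    both estimated from the resampled statistics.
    """
    ties = sum(1 for s in resampled if math.isclose(s, observed))
    greater = sum(1 for s in resampled if s > observed and not math.isclose(s, observed))
    b = len(resampled)
    return (greater + ties) / b, (greater + 0.5 * ties) / b
```

The mid-p value is smaller than the standard value by half the tie frequency. With low MAC, only a few distinct configurations exist, so ties are common and this difference is what reduces the conservativeness of the resampling test.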

Estimating the effective number of tests from the MAPs is a simple and fast alternative to experiment-wide permutation of the total sample for controlling the family-wise error rate. One limitation of the MAP approach is that it does not account for correlation among tests, so it may yield conservative FPRs when variants in different genes are strongly correlated. However, we expect gene-based tests to be less correlated than single variant tests, because each gene-based test aggregates multiple variants and genes are typically located farther apart than individual variants.
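
One way to turn MAPs into an effective number of tests is Tarone's modified Bonferroni procedure for discrete tests. Tests whose MAP exceeds the candidate threshold can never produce a false positive, so they need not be counted. The Python sketch below implements that rule with a hypothetical function name. We assume, but do not claim, that it matches the definition in the Methods. Like any Bonferroni-type rule, it ignores correlation among tests, which is the limitation noted above.

```python
import bisect
from typing import Sequence, Tuple


def effective_number_of_tests(maps: Sequence[float], alpha: float = 0.05) -> Tuple[int, float]:
    """Tarone-style effective number of tests from minimum achievable p-values.

    Returns (k, alpha / k) for the smallest k such that at most k tests have
    MAP <= alpha / k. Tests with MAP above alpha / k can never reject, so the
    union bound gives FWER <= (number of testable tests) * alpha / k <= alpha.
    """
    sorted_maps = sorted(maps)
    for k in range(1, len(sorted_maps) + 1):
        threshold = alpha / k
        # Count tests whose MAP is small enough to possibly reach alpha / k.
        n_testable = bisect.bisect_right(sorted_maps, threshold)
        if n_testable <= k:
            return k, threshold
    return 0, alpha  # only reached when maps is empty
```

For example, with 3 tests whose MAP is 1e-8 and 1000 tests whose MAP is 0.5, the function returns k = 3. Only 3 tests could ever reach significance, so only they are counted.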

When MAC is extremely small, the MAP is often larger than the genome-wide significance threshold, so the test cannot reach significance whatever the data. One way to increase power is to construct larger variant sets by combining adjacent regions or by including more classes of potentially functional variants.
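
A worked illustration of why very small MAC limits power uses a deliberately simple setting: a single variant with one minor allele per carrier, no covariates, and a one-sided test. Here the most extreme permutation outcome is that all m carriers are cases, with probability C(n_case, m)/C(n_case + n_control, m). The Python sketch below uses hypothetical function names and the GoT2D sample sizes from Section 3.2. It uses an illustrative threshold of 0.05/20 000 = 2.5 × 10^-6, matching the 20 000 variant sets of the power simulation.

```python
import math
from typing import Optional


def one_sided_map(n_case: int, n_control: int, m: int) -> float:
    """Probability that all m carriers are cases under random relabelling.

    With one minor allele per carrier and no covariates, this is the most
    extreme outcome, so it is the smallest achievable one-sided p-value.
    """
    if not 0 < m <= n_case:
        raise ValueError("need 0 < m <= n_case")
    return math.comb(n_case, m) / math.comb(n_case + n_control, m)


def min_carriers_needed(n_case: int, n_control: int, alpha: float) -> Optional[int]:
    """Smallest number of carriers whose MAP can fall at or below alpha."""
    for m in range(1, n_case + 1):
        if one_sided_map(n_case, n_control, m) <= alpha:
            return m
    return None


# GoT2D chromosome 2 sample sizes; alpha = 0.05 / 20 000 variant sets.
print(min_carriers_needed(1326, 1331, 0.05 / 20_000))
```

Any variant set with fewer carriers than the printed number cannot reach this threshold under these assumptions, which is why pooling variants into larger sets can help.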

The ER method can be used with imputed dosages as well as with genotype data; permutations are performed among the individuals with non-zero genotype or dosage values. If many individuals have very small dosage values (e.g. <0.1), the number of individuals with minor alleles can be larger than MAC (i.e. Inline graphic). Thus, for the same MAC, computation time can be higher with dosage data than with genotype data. However, the ER method still takes substantially less time than whole-sample permutation.
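
Individuals whose genotypes (or dosages) are zero for every variant in the set contribute nothing to the score statistics. As a result, permuting phenotypes over the whole sample and keeping only the carriers is equivalent to drawing the carriers' labels without replacement from all phenotypes. The Python sketch below uses this fact for a SKAT-type statistic with a linear weighted kernel. It assumes no covariates, so the null fitted value is the sample mean, and it uses plain Monte Carlo resampling. It is not the covariate-adjusted ER algorithm of the paper, and all names are hypothetical.

```python
import math
import random
from typing import Sequence


def skat_q(genotypes: Sequence[Sequence[float]], residuals: Sequence[float],
           weights: Sequence[float]) -> float:
    """Linear weighted-kernel statistic Q = sum_j (w_j * sum_i G_ij * r_i)^2."""
    q = 0.0
    for j, w in enumerate(weights):
        score_j = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += (w * score_j) ** 2
    return q


def carrier_permutation_pvalue(G, y, weights, n_resamples, seed=0):
    """Permutation p-value computed only over individuals with non-zero values.

    Returns (p_value, n_carriers). Assumes no covariates, so the null fitted
    value is mean(y) and is unchanged by permuting y.
    """
    rng = random.Random(seed)
    y = list(y)
    y_bar = sum(y) / len(y)
    carriers = [i for i, row in enumerate(G) if any(g != 0 for g in row)]
    G_c = [G[i] for i in carriers]
    q_obs = skat_q(G_c, [y[i] - y_bar for i in carriers], weights)
    hits = 0
    for _ in range(n_resamples):
        # Under a full permutation of y, the labels that land on the carriers
        # are a uniform sample without replacement of size len(carriers).
        y_perm = rng.sample(y, len(carriers))
        q = skat_q(G_c, [v - y_bar for v in y_perm], weights)
        if q > q_obs or math.isclose(q, q_obs, rel_tol=1e-9, abs_tol=1e-12):
            hits += 1
    return hits / n_resamples, len(carriers)
```

With dosage data, every non-zero value defines a carrier. For example, three individuals with dosage 0.05 and one with dosage 1.0 give four carriers for a total dosage of 1.15, so the number of carriers exceeds MAC, as described above.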

QQ plots comparing observed and expected p-value distributions are used in genetic association studies to assess both the presence of confounding (or a misimplemented or misspecified test) and the presence of significant association signals. However, when MAC is small, the null p-value distribution of a resampling-based test is not Uniform(0, 1). The unadjusted QQ plot therefore cannot accurately show whether the observed p-value distribution agrees with or departs from the expected distribution. In the spirit of experiment-wide permutation (Kiezun and others, 2012), we use the MAP-adjusted p-value distribution to model the expected distribution of ER-hybrid p-values. In the MAP-adjusted QQ plot, the GoT2D gene-based p-value distribution for disruptive variants lies near the top of the 95% confidence band. This view allows better assessment of potentially interesting results than the unadjusted QQ plot, in which the p-value distribution is deflated.
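
The MAP-adjusted expected distribution can be simulated. The Python sketch below assumes a specific two-part mixture for each test: with probability equal to its MAP the p-value equals the MAP, and otherwise it is uniform between the MAP and 1. This mixture is calibrated at every attainable level, but it is our assumption and not necessarily the mixture used in the paper. As in the Figure 5 caption, the sketch uses 500 draws and takes pointwise 2.5% and 97.5% quantiles for the band. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def draw_null_pvalue(map_i: float, rng: random.Random) -> float:
    """One null p-value from the assumed two-part mixture for a test with MAP map_i.

    With probability map_i the p-value equals map_i; otherwise it is
    Uniform(map_i, 1). Then P(p <= t) = t for every t >= map_i and 0 below map_i.
    """
    if map_i >= 1.0:
        return 1.0
    if rng.random() < map_i:
        return map_i
    return rng.uniform(map_i, 1.0)


def map_adjusted_quantiles(maps: Sequence[float], n_draws: int = 500, seed: int = 0
                           ) -> Tuple[List[float], List[float], List[float]]:
    """Median expected p-value per rank, with a pointwise 95% band."""
    rng = random.Random(seed)
    # Each simulated experiment gives one sorted vector of null p-values.
    draws = [sorted(draw_null_pvalue(m, rng) for m in maps) for _ in range(n_draws)]
    expected, lower, upper = [], [], []
    for rank in range(len(maps)):
        column = sorted(d[rank] for d in draws)
        expected.append(column[int(0.5 * (n_draws - 1))])
        lower.append(column[int(0.025 * (n_draws - 1))])
        upper.append(column[int(0.975 * (n_draws - 1))])
    return expected, lower, upper
```

To draw the plot, sort the observed p-values in increasing order. Then plot −log10(observed) against −log10(expected), using −log10(lower) and −log10(upper) as the band.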

Most variant sets in whole-exome or whole-genome data will not require 10Inline graphic resamples, since their p-values will be substantially higher than exome-wide (or genome-wide) significance levels. Hence, an adaptive resampling procedure, which reduces the number of resamples when a test has a moderate or large p-value, can substantially reduce computation time; we have implemented such a procedure for the ER method. However, adaptive resampling precludes calculating the effective number of tests and using MAP-adjusted QQ plots. We therefore recommend it only when the number of possible case–control configurations among individuals with minor alleles is substantially larger than the number of resamples performed (for example, Inline graphic for 10Inline graphic resamples).
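
Two pieces of this strategy can be sketched in Python. The first counts the distinct case–control labelings that the carriers admit, which is the quantity the paragraph above compares with the number of resamples. The second is a sequential stopping rule in the style of Besag and Clifford's sequential Monte Carlo p-values. Both are illustrative assumptions, not a description of the adaptive procedure implemented for ER, and all names are hypothetical.

```python
import math
from typing import Callable, Tuple


def n_case_control_configurations(m: int, n_case: int, n_control: int) -> int:
    """Distinct case/control labelings of m carriers consistent with the sample totals."""
    return sum(math.comb(m, k) for k in range(m + 1)
               if k <= n_case and m - k <= n_control)


def adaptive_pvalue(observed: float, draw_null_stat: Callable[[], float],
                    max_resamples: int, min_exceed: int = 10) -> Tuple[float, int]:
    """Sequential (Besag-Clifford-style) resampling p-value.

    Stops as soon as min_exceed null statistics reach the observed value, so
    clearly non-significant tests use few resamples. Returns (p, resamples used).
    """
    exceed = 0
    for b in range(1, max_resamples + 1):
        if draw_null_stat() >= observed:
            exceed += 1
            if exceed == min_exceed:
                return exceed / b, b
    # Never stopped early: use the standard (count + 1) / (draws + 1) estimate.
    return (exceed + 1) / (max_resamples + 1), max_resamples
```

A test with a large p-value stops after only a few dozen draws, while a test with a small p-value runs to `max_resamples`. This is where the savings described above come from.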

Our work has focused on providing well-calibrated gene-based tests for single studies across a range of MAC and case–control imbalance. Meta-analysis of gene-based tests can increase the power to detect genes of interest, but meta-analysis is sensitive to the calibration of the underlying tests (Ma and others, 2013), and may be particularly sensitive to the inclusion of studies with highly imbalanced case–control ratios. Further work will be needed to determine how best to combine results or data from across studies with a variety of case–control ratios.

5. Software

ER-mid, ER, QA, and MA methods are implemented in the SKAT R-package.

Supplementary material

Supplementary Material is available at http://biostatistics.oxfordjournals.org.

Funding

This work was supported by grants R00 HL113264 (S.L.), the Austrian Science Fund (F.W.F.) grant J-3401 (C.F.), R01 HG000376 and RC2 DK088389 (L.S.).

Acknowledgments

We thank the investigators of the GoT2D project for access to the chromosome 2 exome sequence data. We also thank M. Boehnke for discussion and insightful comments, and Phoenix Kwan for her initial insights into the behavior of gene-based tests in the GoT2D data. Conflict of Interest: None declared.

References

1. Derkach A., Lawless J. F., Sun L. (2012). Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genetic Epidemiology 37, 110–121.
2. Efron B., Tibshirani R. J. (1994). An Introduction to the Bootstrap. CRC Press.
3. Epstein M. P., Duncan R., Jiang Y., Conneely K. N., Allen A. S., Satten G. A. (2012). A permutation procedure to correct for confounders in case–control studies, including tests of rare variation. American Journal of Human Genetics 91, 215–223.
4. Fog A. (2008). Calculation methods for Wallenius' noncentral hypergeometric distribution. Communications in Statistics—Simulation and Computation 37, 258–273.
5. Kiezun A., Garimella K., Do R., Stitziel N. O., Neale B. M., McLaren P. J., Gupta N., Sklar P., Sullivan P. F., Moran J. L. (2012). Exome sequencing and the genetic basis of complex traits. Nature Genetics 44, 623–630.
6. Lancaster H. (1961). Significance tests in discrete distributions. Journal of the American Statistical Association 56, 223–234.
7. Lee S., Abecasis G. R., Boehnke M., Lin X. (2014). Rare-variant association analysis: study designs and statistical tests. American Journal of Human Genetics 95, 5–23.
8. Lee S., Emond M. J., Bamshad M. J., Barnes K. C., Rieder M. J., Nickerson D. A., Christiani D. C., Wurfel M. M., Lin X. (2012). Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. American Journal of Human Genetics 91, 224–237.
9. Lee S., Wu M. C., Lin X. (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13, 762–775.
10. Li B., Leal S. M. (2008). Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. American Journal of Human Genetics 83, 311–321.
11. Lin D. Y., Tang Z. Z. (2011). A general framework for detecting disease associations with rare variants in sequencing studies. American Journal of Human Genetics 89, 354–367.
12. Ma C., Blackwell T., Boehnke M., Scott L. J. (2013). Recommended joint and meta-analysis strategies for case–control association testing of single low-count variants. Genetic Epidemiology 37, 539–550.
13. Madsen B. E., Browning S. R. (2009). A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genetics 5, e1000384.
14. Neale B. M., Rivas M. A., Voight B. F., Altshuler D., Devlin B., Orho-Melander M., Kathiresan S., Purcell S. M., Roeder K., Daly M. J. (2011). Testing for an unusual distribution of rare variants. PLoS Genetics 7, e1001322.
15. Pan W. (2009). Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic Epidemiology 33, 497–507.
16. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., Bender D., Maller J., Sklar P., De Bakker P. I., Daly M. J. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81, 559–575.
17. Reppell M., Boehnke M., Zöllner S. (2012). FTEC: a coalescent simulator for modeling faster than exponential growth. Bioinformatics 28, 1282–1283.
18. Scott R. A., Lagou V., Welch R. P., Wheeler E., Montasser M. E., Luan J. A., Gustafsson S. (2012). Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nature Genetics 44, 991–1005.
19. Sun J., Zheng Y., Hsu L. (2013). A unified mixed-effects model for rare-variant association in sequencing studies. Genetic Epidemiology 37, 334–344.
20. Wu M. C., Lee S., Cai T., Li Y., Boehnke M. C., Lin X. (2011). Rare variant association testing for sequencing data with the sequence kernel association test (SKAT). American Journal of Human Genetics 89, 82–93.
21. Zuk O., Schaffner S. F., Samocha K., Do R., Hechter E., Kathiresan S., Daly M. J., Neale B. M., Sunyaev S. R., Lander E. S. (2014). Searching for missing heritability: designing rare variant association studies. Proceedings of the National Academy of Sciences 111, E455–E464.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
