Summary
Gene-based burden tests are a popular and powerful approach for analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, a range of burden scores are calculated and tested across a range of annotation classes and frequency bins. Correlation between these tests can complicate the multiple testing correction and hamper interpretation of the results. We introduce a method called the sparse burden association test (SBAT) that tests the joint set of burden scores under the assumption that causal burden scores act in the same effect direction. The method simultaneously assesses the significance of the model fit and selects the set of burden scores that best explain the association. Using simulated data, we show that the method is well calibrated and highlight scenarios where the test outperforms existing gene-based tests. We apply the method to 73 quantitative traits from the UK Biobank, showing that SBAT is a valuable additional gene-based test when combined with other existing approaches. This test is implemented in the REGENIE software.
Keywords: gene-based tests, association studies, exome sequencing
Gene-based burden tests are commonly used in exome-wide association studies. We introduce SBAT (sparse burden association test), which jointly models a set of burden scores under the assumption that the causal burden scores act in the same effect direction. We apply SBAT to 73 quantitative traits in the UK Biobank.
Introduction
Large-scale exome-sequencing studies are being conducted to elucidate the genetic basis of diseases and traits and discover novel drug targets.1,2,3 These studies enable association testing of rare coding variation not easily accessible via genome-wide association studies (GWASs) using genotype microarrays followed by imputation from publicly available reference panels. Typically, these studies will carry out statistical tests of the combined effect across many single variants in each gene with a trait of interest.
Possibly the simplest approach involves collapsing a subset of single variants in a gene into a single marker of gene perturbation or burden.4 For example, the set of predicted loss-of-function (pLoF) variants below 0.1% alternate allele frequency (AAF) could be combined by scoring an individual as 1 if they have at least one minor allele across the set, and 0 otherwise. This burden score can then be tested for association in the same way as a single-nucleotide polymorphism (SNP). Alternatives include a weighted sum of variants, with weights dependent upon the AAF of variants,5 or a set of burden scores across a range of frequency thresholds, with significance assigned using permutation.6
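The collapsing rule described above can be illustrated with a minimal sketch. The function name `binary_burden`, the toy genotypes, and the variant annotations are hypothetical and serve only to show the scoring rule (1 if an individual carries at least one alternate allele at any qualifying variant, 0 otherwise); this is not the REGENIE implementation.

```python
from typing import List


def binary_burden(genotypes: List[List[int]], aaf: List[float],
                  is_plof: List[bool], max_aaf: float = 0.001) -> List[int]:
    """Collapse qualifying variants into a 0/1 burden score per individual.

    genotypes[i][j] is the alternate-allele count (0/1/2) of individual i at variant j.
    A variant qualifies if it is annotated pLoF and its AAF is below max_aaf.
    """
    keep = [j for j in range(len(aaf)) if is_plof[j] and aaf[j] < max_aaf]
    return [int(any(row[j] > 0 for j in keep)) for row in genotypes]


# Toy data (hypothetical): 4 individuals x 3 variants.
genotypes = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 2, 0],
    [0, 0, 0],
]
aaf = [0.0005, 0.02, 0.0002]    # only variants 0 and 2 are below 0.1%
is_plof = [True, True, False]   # only variants 0 and 1 are pLoF

burden = binary_burden(genotypes, aaf, is_plof)
print(burden)  # [0, 1, 0, 0]: only variant 0 qualifies, carried by individual 1
```

Only the second individual carries an allele at the single qualifying variant, so only that individual receives a burden score of 1.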
Burden tests tend to have most power when all collapsed variants are causal and when causal variants alter gene function in the same effect direction. When these conditions are not met, variance components tests—such as the SKAT test7—that allow variants to have different effect directions can have more power. Alternatively, combining single variant tests, via the Cauchy combination test8 (called ACAT-V), into a single p value can be particularly powerful when there are only a small number of causal variants.9 Tests that combine across burden, variance component, and single-variant tests have also been proposed and exhibit good power.9,10,11 Many of these methods have been implemented in the REGENIE software, which provides a flexible approach to defining and adjusting how variants are annotated according to function and allele frequency.12
As the true set of causal variants is unknown, it is common to calculate many burden scores across a range of AAF thresholds and different variant annotation classes.2,13 For example, Backman et al.2 considered two annotation classes and five AAF thresholds: a strict burden of pLoFs and a more permissive burden of pLoFs with predicted deleterious missense variants were assigned into overlapping groups with AAF ≤ 1%, AAF ≤ 0.1%, AAF ≤ 0.01%, AAF ≤ 0.001%, and singletons only. This approach tends to produce highly structured and often highly correlated sets of tests, which can make the interpretation and accurate multiple testing correction difficult.
In this paper we propose an approach that circumvents these difficulties whereby a set of burden scores are jointly tested for association. We focus on quantitative traits and leave the extension to binary traits for future work. As the set of nested burden scores tends to be positively correlated and to exhibit the same direction of association with the trait of interest, this suggests a prior distribution on the effect direction of the burden scores, which we enforce through fitting the joint model of burden scores using a non-negative least squares (NNLS) approach. We propose a quadratic form test statistic based on the NNLS model fit and show that it has a null distribution that is a mixture of chi-square distributions (not to be confused with a weighted sum of chi-squares) with mixture weights that depend on the covariance between the burden scores. We also develop a computationally efficient method for calculating p values from this null distribution. NNLS is known to induce sparsity in its solution and has the added benefit of providing a form of model selection across the set of burden scores, which can aid interpretation of which frequency bins and annotation classes harbor the causal variants. In this sense the method is unusual in simultaneously achieving both model inference, via well controlled type I error for the test, and model selection, with many of the burden scores excluded from the final model fit. We call this test the sparse burden association test (SBAT).
Using simulation studies, we show that the SBAT has well calibrated type I error and highlight some scenarios where the SBAT has improved power over existing tests. We further show the performance of the test when applied to 73 quantitative traits from the UK Biobank and illustrate how the sparse nature of the parameter estimation in the test can aid interpretation of the causal signal at a gene.
Methods
Sparse burden association test
In a sample of N individuals, let Y denote the N-length vector of a quantitative trait. Let G denote the N × L matrix, where each column contains the genotypes of one of the L SNPs in a gene of interest. Further define X to be an N × P matrix of burden scores, where each column contains one of P distinct burden scores. Each burden score is derived as a function of the SNP matrix G, by collapsing the subset of variants that fall within an AAF bin and an annotation class. Xij is the indicator variable that the ith individual carries at least one rare variant at any of the SNPs included in the jth subset. The set of AAF bins and annotation classes are user defined. We assume that covariates (including the intercept) are projected out from both Y and X. Covariates may include terms such as REGENIE step 1 predictors12 that are estimated in advance and account for polygenicity, population structure, and relatedness.
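We consider estimating the joint effect of all P burden scores in a linear model with positivity constraints on the P-vector of regression parameters $\beta$ as follows:
$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_N), \quad \beta \ge 0 \qquad \text{(Equation 1)}$$
We assume that $\sigma^2$ is known or is estimated with enough precision so that this can be ignored, such as the case of large-scale exome-sequencing datasets. This is an NNLS problem and we use the active set method14 to fit the model. We use $\hat\beta^{+}$ to denote the NNLS estimate and $\hat\beta$ to denote the ordinary least squares (OLS) estimate of $\beta$ without the non-negativity constraint.
Writing $Y - X\beta = (Y - X\hat\beta) + X(\hat\beta - \beta)$, the least squares objective function for Equation 1 can be re-expressed as
$$\|Y - X\beta\|^2 = \|Y - X\hat\beta\|^2 + \sigma^2 (\hat\beta - \beta)^\top V^{-1} (\hat\beta - \beta)$$
where $V = \sigma^2 (X^\top X)^{-1}$ and cross product terms vanish because of the OLS normal equation $X^\top (Y - X\hat\beta) = 0$. Hence Equation 1 is equivalent to
$$\hat\beta \sim N(\beta, V), \quad \beta \ge 0 \qquad \text{(Equation 2)}$$
Treatment of this model is explicitly addressed by literature on inference of linear models with constraints,15,16,17 and the model can be thought of as a multivariate analog of the one-sided test.15 To test the null hypothesis $H_0: \beta = 0$ versus the alternative $H_1: \beta \ge 0$ (with at least one strict inequality), we use the quadratic form test statistic
$$T = (\hat\beta^{+})^\top V^{-1} \hat\beta^{+}$$
which has a null distribution that is a mixture of chi-squared distributions15,16,17 (not to be confused with a weighted sum of chi-squares) as follows:
$$\Pr(T \ge t) = \sum_{i=0}^{P} w_i \Pr(\chi^2_i \ge t)$$
where $\chi^2_i$ is a chi-squared distribution with i degrees of freedom, and $w_i$ are weights that sum to 1 and depend upon V as follows
$$w_i = \sum_{\pi_i} \Pr\left(V_{\pi_i^c}^{-1}\right) \Pr\left(V_{\pi_i \mid \pi_i^c}\right) \qquad \text{(Equation 3)}$$
over all subsets of $\{1, \ldots, P\}$ of size i denoted by $\pi_i$, and $\pi_i^c$ is the complement set. $V_{\pi_i^c}$ is the covariance matrix of $\hat\beta_j$, $j \in \pi_i^c$, and $V_{\pi_i \mid \pi_i^c}$ is the covariance matrix of $\hat\beta_j$, $j \in \pi_i$, conditional upon $\hat\beta_j = 0$, $j \in \pi_i^c$, and $\Pr(A)$ is the probability that all the variables of a multivariate normal distribution with zero mean and covariance $A$ are positive. In the case where V is diagonal, the weights simplify to
$$w_i = \binom{P}{i} 2^{-P}$$
In the general case the sum has $2^P$ terms to evaluate, which can be a potential computational bottleneck for P > 10. For example, P = 25 implies evaluating $2^{25} \approx 3.4 \times 10^{7}$ terms in the sum from Equation 3. Thus, we explored a simple approximation in which a random sample of K terms in each sum, denoted $A_i$, were used to estimate the full sum
$$w_i \approx \frac{1}{K} \binom{P}{i} \sum_{\pi_i \in A_i} \Pr\left(V_{\pi_i^c}^{-1}\right) \Pr\left(V_{\pi_i \mid \pi_i^c}\right) \qquad \text{(Equation 4)}$$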
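To make the chi-square mixture concrete, the following sketch evaluates the mixture p value for the smallest non-trivial case of two burden scores (P = 2), where the mixture weights have a closed form through the bivariate normal orthant probability 1/4 + arcsin(ρ)/(2π). Here ρ is assumed to be the correlation between the two OLS estimates implied by V. The function names are hypothetical, and this illustration is not the REGENIE implementation, which handles general P.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(t: float, df: int) -> float:
    """Upper tail Pr(chi2_df >= t) for integer df >= 0 (df = 0 is a point mass at zero)."""
    if t <= 0:
        return 1.0
    if df == 0:
        return 0.0
    y = t / 2.0
    # Start from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y), then apply
    # the recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def weights_two_scores(rho: float) -> Tuple[float, float, float]:
    """Mixture weights (w0, w1, w2) for P = 2 given the correlation rho of the OLS estimates."""
    w2 = 0.25 + math.asin(rho) / (2.0 * math.pi)  # orthant probability under V
    w0 = 0.25 - math.asin(rho) / (2.0 * math.pi)  # orthant probability under V^{-1}
    return (w0, 0.5, w2)


def mixture_pvalue(t: float, weights: Sequence[float]) -> float:
    """Pr(T >= t) under the null as a weighted mixture of chi-square tails."""
    return sum(w * chi2_sf(t, i) for i, w in enumerate(weights))


for rho in (0.0, -0.8):
    w = weights_two_scores(rho)
    print(rho, w, mixture_pvalue(10.0, w))
```

With ρ = 0 the weights reduce to (1/4, 1/2, 1/4), which matches the binomial form of the diagonal case. A negative ρ, as expected when the burden scores themselves are positively correlated, shifts weight from two degrees of freedom towards zero degrees of freedom.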
As we do not know in advance whether the burden scores will be positively or negatively associated with the quantitative trait under study, we apply the SBAT twice, once to Y and once to −Y. We then combine the two p values using the Cauchy combination method.8 To avoid numerical instability, we apply the QR decomposition to the matrix of burden scores and retain only linearly independent columns for model fitting and inference.
This approach, together with a general suite of burden, variance component, and ACAT (aggregated Cauchy association test) gene-based tests, has been implemented in the REGENIE v.3.0 software (https://github.com/rgcgithub/regenie), and this was used for all simulation and real data analysis results included in this paper.
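The step of combining the two one-sided p values can be sketched as follows, using the standard form of the Cauchy combination statistic (a weighted sum of tan((0.5 − p)π) terms compared against a standard Cauchy distribution). The function name and the example p values are hypothetical, and this plain formula loses precision for extremely small p values, which a production implementation would need to handle separately.

```python
import math
from typing import Optional, Sequence


def cauchy_combine(pvalues: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Combine p values with the Cauchy combination method (uniform weights by default)."""
    n = len(pvalues)
    if weights is None:
        w = [1.0 / n] * n
    else:
        total = sum(weights)
        w = [x / total for x in weights]  # normalize weights to sum to 1
    stat = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, pvalues))
    # Upper tail of the standard Cauchy distribution at the combined statistic.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical p values from applying the one-sided test to Y and to -Y.
p_positive, p_negative = 1e-6, 0.9
p_combined = cauchy_combine([p_positive, p_negative])
print(p_combined)
```

When one direction carries a strong signal and the other does not, the combined p value is close to twice the smaller one, so little is lost by not knowing the effect direction in advance.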
Annotations and AAF cutoffs for burden scores
In both simulations and analysis of the UK Biobank data, exome variants were grouped based on four annotation classes: (1) pLoF only (labeled M1), (2) pLoF or predicted deleterious missense based on 5/5 in silico algorithms (labeled M3), (3) pLoF or predicted deleterious missense based on 1 or more out of 5 in silico algorithms (labeled M4), and (4) pLoF or missense (labeled M2). We also considered several AAF cutoffs when aggregating variants: AAF ≤ 1%, AAF ≤ 0.1%, AAF ≤ 0.01%, AAF ≤ 0.001% (only for UK Biobank application), and singletons. Exome variants were annotated using SnpEff and assigned to genes based on Ensembl v.85 (most deleterious consequence across any transcript) as previously described.2
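The grid of burden scores implied by these annotation classes and AAF cutoffs can be enumerated with a short sketch. The mask naming scheme (`M1.0.01` and so on) is a hypothetical labeling chosen for illustration and is not claimed to match the labels used by REGENIE.

```python
from itertools import product

ANNOTATION_CLASSES = ["M1", "M2", "M3", "M4"]
# AAF cutoffs as fractions; "singleton" stands for the singletons-only mask.
AAF_CUTOFFS_UKB = ["0.01", "0.001", "0.0001", "0.00001", "singleton"]
# The simulations omit the AAF <= 0.001% cutoff.
AAF_CUTOFFS_SIM = [c for c in AAF_CUTOFFS_UKB if c != "0.00001"]


def mask_names(annotations, cutoffs):
    """One burden score per (annotation class, AAF cutoff) pair."""
    return [f"{a}.{c}" for a, c in product(annotations, cutoffs)]


ukb_masks = mask_names(ANNOTATION_CLASSES, AAF_CUTOFFS_UKB)
sim_masks = mask_names(ANNOTATION_CLASSES, AAF_CUTOFFS_SIM)
print(len(ukb_masks), len(sim_masks))  # 20 16
```

Because the annotation classes and the AAF cutoffs are both nested, many of these burden scores overlap heavily, which is the source of the correlation that the joint model is designed to handle.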
Combining different gene-based tests into a single p value per gene
We considered three gene-based tests—SBAT, SKAT-O10,18 (with the collapsing strategy for ultra-rare variants with minor allele count below 10), and ACAT-V9—and another test, BURDEN-ACAT, that aggregates evidence of association from multiple burden scores using the Cauchy combination method with uniform weights.8,9 For the SBAT and BURDEN-ACAT tests, input was a set of burden scores built by grouping exome variants using four annotation classes and five AAF cutoffs as described above. For the SKAT-O and ACAT-V tests, we applied a single AAF cutoff of 1% along with AAF-dependent variant weights and aggregated the resulting p values (across the four annotation classes) by the Cauchy combination method. That produced a single per-gene p value for each of the four gene-based tests. Finally, we applied the Cauchy combination method on the four p values to derive a single p value per gene, referred to as GENE_P (Figure S3).
Simulated data
To assess the type I error calibration of the SBAT, we simulated phenotypes based on real genetic data from the UK Biobank array19 and exome data2 as follows. We randomly selected 100,000 samples from the set of participants who self-identified as white British participants to include population structure in our simulations (about 30% of samples are related 3rd degree or closer). For the ith individual, quantitative phenotypes were simulated as
$$Y_i = \sum_{j=1}^{M} G_{ij}\beta_j + \epsilon_i$$
where we selected M = 25,000 SNPs on odd chromosomes from the UK Biobank genotyping array to be causal, excluding variants with minor allele count below 100 or involved in inter-chromosomal LD (linkage disequilibrium), and $G_{ij}$ is the standardized genotype value at the jth marker for the ith individual. The effects $\beta_j$ for the causal SNPs were sampled independently from a normal distribution with mean zero and the variance was chosen so that they explained 20% of the total phenotypic variance. The environmental effect $\epsilon_i$ was independently sampled from a normal distribution with mean zero and variance set to result in a phenotypic variance of one. We simulated 100 phenotypic replicates, obtained REGENIE step 1 predictors based on 472,435 array SNPs, and tested the phenotypes for association with 1,000 randomly selected genes on even chromosomes from whole-exome sequencing data to evaluate type I error.
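The generative model for the null phenotypes can be sketched on toy data as follows. The sample size, number of SNPs, allele frequencies, and function names are all hypothetical stand-ins (the study used 100,000 individuals and 25,000 causal array SNPs); the sketch only shows how the effect and noise variances are chosen so that the polygenic component explains 20% of a unit phenotypic variance in expectation.

```python
import math
import random
from typing import List


def standardize_columns(G: List[List[float]]) -> List[List[float]]:
    """Center and scale each SNP (column) to mean 0 and variance 1."""
    n = len(G)
    cols = []
    for col in zip(*G):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*cols)]


def simulate_phenotype(G_std: List[List[float]], h2: float = 0.2,
                       seed: int = 0) -> List[float]:
    """Y_i = sum_j G_ij * beta_j + eps_i with Var(beta_j) = h2 / M and Var(eps_i) = 1 - h2."""
    rng = random.Random(seed)
    m = len(G_std[0])
    beta = [rng.gauss(0.0, math.sqrt(h2 / m)) for _ in range(m)]
    return [sum(g * b for g, b in zip(row, beta)) + rng.gauss(0.0, math.sqrt(1.0 - h2))
            for row in G_std]


# Toy genotypes (hypothetical): 300 individuals x 40 SNPs with allele counts 0/1/2.
rng = random.Random(42)
n, m = 300, 40
freqs = [rng.uniform(0.1, 0.5) for _ in range(m)]
G = [[(rng.random() < f) + (rng.random() < f) for f in freqs] for _ in range(n)]

y = simulate_phenotype(standardize_columns(G))
print(len(y))
```

Because no exome variant contributes to these phenotypes, any gene-based association detected with them is a false positive, which is what makes the design suitable for assessing type I error.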
We also used UK Biobank array and exome data as the basis for a set of simulations to assess the power of the SBAT compared to SKAT-O10 and ACAT-V9 tests. For the ith individual, quantitative phenotypes were simulated as
$$Y_i = \sum_{j=1}^{M} G_{ij}\beta_j + \sum_{k=1}^{10} \sum_{l=1}^{L_k} G^{\text{ex}}_{ikl}\gamma_{kl} + \epsilon_i$$
where we followed the same scheme as in the type I error simulations to simulate additive polygenic effects from array SNPs on odd chromosomes, but also added effects from 10 causal genes (all selected on even chromosomes), where $G^{\text{ex}}_{ikl}$ is the genotype value (non-standardized) for the lth of the $L_k$ exome variants in the kth causal gene for the ith individual, and $\gamma_{kl}$ the corresponding fixed effect on the phenotype. For each causal gene, we considered only variants that were annotated either pLoF or missense. The absolute effect sizes were |0.1log10(AAF)| for pLoF variants, and |0.01log10(AAF)| for missense variants. This reflected the assumption that functional variants which are more deleterious are likely to have larger effect sizes on the phenotype and be rarer. For singleton variants, we multiplied the AAF-based effect size by $R_{\text{singleton}}$, where $R_{\text{singleton}} \ge 1$ is a positive constant which enabled singletons to have a more severe impact on the phenotype than that solely based on the AAF. We varied the proportion of causal variants among non-singleton variants between 10% and 100% and for singleton variants between 30% and 100%. For the direction of effects, we considered three settings where (1) all variants have positive effects, (2) 80% of variants have positive effects (and the remaining 20% have effects in the opposite direction), and (3) 50% of variants have positive effects. In all scenarios, the polygenic effect from array SNPs was set to explain 20% of the phenotypic variance when there are no effects from the causal genes (i.e., all $\gamma_{kl} = 0$) with the remaining 80% of the variance explained by the environmental effect. We simulated 100 phenotypic replicates which we tested for association with each of the 10 causal genes, resulting in 1,000 p values, and evaluated power in each scenario. Similarly to the type I error simulations, we also obtained REGENIE step 1 predictors based on 472,435 array SNPs before scanning for associations.
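The AAF-dependent effect sizes used in the power simulations can be written as a small helper. The function name and arguments are hypothetical, and treating the singleton constant as a multiplier of the AAF-based effect size is an assumption based on the description of Rsingleton as a constant factor relative to non-singletons; the sign of each effect is assigned separately according to the direction-of-effect setting.

```python
import math

# Scale of the absolute effect size per annotation, as given in the text.
EFFECT_SCALE = {"pLoF": 0.1, "missense": 0.01}


def absolute_effect(aaf: float, annotation: str,
                    is_singleton: bool = False, r_singleton: float = 1.0) -> float:
    """Absolute effect |scale * log10(AAF)|, multiplied by r_singleton for singletons."""
    size = abs(EFFECT_SCALE[annotation] * math.log10(aaf))
    return size * r_singleton if is_singleton else size


print(absolute_effect(1e-3, "pLoF"))      # about 0.3
print(absolute_effect(1e-3, "missense"))  # about 0.03
print(absolute_effect(1e-5, "pLoF", is_singleton=True, r_singleton=2.0))  # about 1.0
```

Rarer variants receive larger effects because |log10(AAF)| grows as AAF shrinks, and pLoF variants are ten times more impactful than missense variants at the same frequency.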
We performed an additional set of simulation experiments to evaluate the performance of SBAT for variable selection. Using the same trait generative model as for the power simulations, we adjusted the effect sizes of the causal variants so that each contributed equally to the phenotypic variance, with the combined effect accounting for 1% of the trait variance. Only pLoF variants were set to be causal and we considered three scenarios: (1) only singleton variants are causal, (2) singletons and variants with AAF below 0.01% are causal, (3) singletons and variants with AAF between 0.1% and 1% are causal. For the gene-based association testing with SBAT, we grouped variants into 16 burden scores using the same four annotation classes as described above and the AAF cutoffs of 1%, 0.1%, 0.01%, and singletons.
Analysis of UK Biobank data
We applied SBAT to real data analysis of 73 quantitative phenotypes in the UK Biobank with sample sizes ranging from 89,734 to 430,074 European participants and up to 18,184 genes on autosomal chromosomes; these included biomarkers as well as anthropometry outcomes (Table S1). The European subset of participants was identified using methods described in Backman et al.2 These phenotypes were derived from the phenotypes available through the UK Biobank Data Showcase on April 1, 2020. UKB individual-level genotypic and phenotypic data are available to approved investigators via the UKB study (https://www.ukbiobank.ac.uk/), accessed pursuant to approved application number 26041. Additional information about registration for access to the data is available at https://www.ukbiobank.ac.uk/register-apply/. UK Biobank participants provided informed consent. For traits measured across multiple visits, we computed the mean value across all visits for each participant and analyzed the resulting variable after applying a rank-based inverse-normal transformation. For each trait, we first performed a GWAS with imputed variants to identify common variants independently associated with the phenotype as described in Backman et al.2 Briefly, we tested common (AAF ≥ 1%) single variants imputed from TOPMed (Trans Omics for Precision Medicine) for association with the trait using REGENIE. We then ran the GCTA-COJO20 joint model to identify independent signals (p ≤ 10−7) using 10,000 randomly selected individuals from the UK Biobank TOPMed dataset to estimate linkage disequilibrium. These independently associated variants were included as covariates in the set-based test analyses we performed on exome rare variants. For the gene-based association tests, we grouped variants into 20 burden scores using the same four annotation classes as described above and the AAF cutoffs of 1%, 0.1%, 0.01%, 0.001%, and singletons. For SKAT-O and ACAT-V association tests, we considered only a single AAF cutoff of 1%. Covariates included age, age-squared, sex, age-x-sex, the first ten principal components based on array data, the first twenty principal components derived from exome variants with an AAF between 2.6 × 10−5 and 1%, and six exome sequence batch indicator variables.
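The rank-based inverse-normal transformation applied to each trait can be sketched as below. The text does not state which rank offset was used, so the Blom offset c = 3/8 and the averaging of tied ranks are assumptions made for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def rank_inverse_normal(values: Sequence[float], c: float = 3.0 / 8.0) -> List[float]:
    """Map values to normal quantiles of their ranks: Phi^{-1}((rank - c) / (n - 2c + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


trait = [3.2, 1.1, 2.0]  # toy measurements (hypothetical)
z = rank_inverse_normal(trait)
print(z)
```

The transformed values depend only on the ordering of the measurements, so extreme outliers in a skewed biomarker distribution no longer dominate the regression.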
Results
Approximate SBAT p Values
To mitigate the computational burden of weight calculation for SBAT (Equation 3), we propose an approximation with a single parameter K that controls the size of the random sample of terms in the weight calculation (Equation 4). We evaluate the performance of our approximation using three values of K (1, 10, 100) (Figure 1). The accuracy of the weight approximation naturally improves with K increasing from 1 to 100, as highlighted for 12 burden scores of APOB in the UK Biobank (Figure 1A). We next expand the comparison to 1,965 genes on chromosome 1 in the UK Biobank tested for association with low-density lipoprotein (LDL) phenotype (Figures 1B and 1C). The exact and approximate p values are almost indistinguishable when K is large enough (R2 > 0.9999 for K = 100), but differences between p values are noticeable at very small values of K (R2 = 0.991 for K = 1). We recommend the value of K = 10 (default in REGENIE), a reasonable compromise between the accuracy of approximation and compute time. To further reduce the computational burden of SBAT, we develop an adaptive strategy, named SBAT-ADAPT, which initially uses K = 2 weights to estimate the p value, and if it is below a significance level of 0.001, we re-compute the p value using K = 10 weights. This strategy allows us to quickly evaluate p values for weaker signals and increase accuracy for stronger signals, which we are more interested in. Figure S2 shows the high concordance in p values between this adaptive strategy and using K = 10 weights to calculate all the p values (R2 = 0.991). Furthermore, it led to a 40% reduction in computational time when performing a whole-exome association scan on a single phenotype compared to using K = 10 weights (Table S3). As REGENIE can analyze multiple phenotypes within the same analysis run,12 we propose another strategy, named SBAT-MTW, which is better suited for this context. More precisely, instead of evaluating K weights separately for each trait, we compute the weights for the first phenotype and re-use these across all phenotypes. Assuming that the missingness patterns across traits are similar (which is also assumed in the REGENIE multi-trait setting), the LD matrices for the burden scores obtained given each phenotype’s missingness patterns should be highly correlated and thus lead to similar weights. We evaluate this strategy in real-data applications where SBAT-MTW is applied to 50 phenotypes and gives p values highly concordant with those from SBAT (R2 = 0.997). Table S3 shows the reduction in compute time of SBAT-MTW relative to SBAT when applied to multiple phenotypes; the reduction grew as more phenotypes were analyzed in the same REGENIE run (CPU time reduction of 65% for the multi-trait run with 50 phenotypes).
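The control flow of the adaptive strategy can be sketched as follows. The function `adaptive_pvalue` and the stand-in `toy_pvalue` are hypothetical: the latter merely imitates a p value routine whose accuracy depends on K, and the sketch shows only the two-stage logic, not the REGENIE code.

```python
from typing import Callable, List


def adaptive_pvalue(pvalue_with_k: Callable[[int], float],
                    k_fast: int = 2, k_accurate: int = 10,
                    threshold: float = 1e-3) -> float:
    """Compute a cheap p value first and refine it only if it looks promising."""
    p = pvalue_with_k(k_fast)
    if p < threshold:
        # Stronger signals get the more accurate (and more expensive) weights.
        p = pvalue_with_k(k_accurate)
    return p


calls: List[int] = []


def toy_pvalue(k: int) -> float:
    """Stand-in for a p value computed with K sampled terms per weight (hypothetical)."""
    calls.append(k)
    return 2e-6 if k >= 10 else 5e-6


p = adaptive_pvalue(toy_pvalue)
print(p, calls)  # 2e-06 [2, 10]
```

Most genes in an exome-wide scan have p values well above 0.001, so the expensive second pass is triggered for only a small fraction of tests.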
Figure 1.
Validation of approximate SBAT p values with different values of parameter K = 1, 10, and 100
(A) Exact (red) and approximate (gray) weights for 12 burden scores of APOB in the UK Biobank are compared with 5 repetitions (gray curves).
(B and C) Scatterplots of exact and approximate SBAT p values at raw and negative log10 scale for 1,965 genes on chromosome 1 and LDL phenotype in the UK Biobank. The top associated genes on chromosome 1 are PCSK9 and ANGPTL3 with SBAT p values 1.5 × 10−262 and 2.3 × 10−42, respectively. The R2 values are reported for both raw and log10-transformed p values.
Type I error simulation for SBAT
The QQ-plot on the log scale comparing the SBAT p values (using K = 10) to the expected values under the null hypothesis of no association based on 100,000 null simulations is shown in Figure 2. The empirical type I error rates for SBAT at several significance levels, including 5 × 10−4, are presented in Table S2. These results confirmed that the SBAT has correct type I error control for quantitative traits.
Figure 2.
Quantile-quantile plot of association test p values for SBAT in simulation studies under the null hypothesis
The p values were obtained from testing 100 simulated phenotypes in 1,000 genes where variants were grouped by four functional annotations and four AAF cutoffs resulting in 16 burden scores being combined in SBAT (using K = 10).
Power simulation
We compared the power performance of SBAT with SKAT-O and ACAT-V under multiple simulation settings where we varied the proportion as well as the direction of causal signals based on their functional annotation. Figure 3 shows the empirical power performance of the tests across a range of 6 simulation configurations. The SBAT test had the highest power relative to SKAT-O and ACAT-V when the burden assumption of the same direction of effects for all the variants in the gene held and singleton variants had a substantially higher impact on the phenotype than non-singleton variants when adjusting for AAF. However, the power performance of SBAT was diminished when only a fraction of the singleton variants was causal or when the variant effect sizes were solely dependent on the variant AAF given the functional annotation. We also found that when the increased effect of singleton variants was present, SBAT remained as powerful as or more powerful than SKAT-O and ACAT-V even when variant effects were not all in the same direction (80/20 +/−). SBAT benefited from aggregating individual burden masks generated across different AAF thresholds in a joint model that incorporates correlations across masks (Figures S1 and S7). We found SBAT had the lowest power performance in the settings that were the farthest from the burden assumption (50/50 +/−). In summary, SBAT performed better than SKAT-O and ACAT-V when the same effect direction assumption was met and the variant weighting scheme deviated from that used by default in SKAT-O, which solely depends on AAF.
Figure 3.
Power performance of SBAT in simulation studies under various causal scenarios
Average quantile for the set-based tests as a function of the proportion of causal variants among singletons, pLoF, and missense variants. Each row corresponds to different assumptions for the direction of effects whether they are all positive (100/0 +/−), 80% positive and 20% negative (80/20 +/−), or half positive and the remaining negative (50/50 +/−). Each column corresponds to different assumptions for the effect of singleton variants, which was multiplied by a constant factor Rsingleton relative to non-singletons. The error bars represent 95% confidence intervals for the average quantile.
Variable selection
We evaluated how often SBAT assigned non-zero effect sizes in its joint model over burden masks through three different simulation settings where we selected causal variants across different allele frequency bins. Across all our simulation settings (Figure S8), SBAT most often assigned non-zero effect sizes to masks that contained the highest proportion of causal variants. In addition, masks that contained many null variants had on average effect size estimates close to zero in the joint model fit from SBAT.
Application to UK Biobank
We analyzed whole-exome sequencing data on up to 18,184 genes for 73 quantitative phenotypes in up to 430,074 European participants from the UK Biobank using SBAT, BURDEN-ACAT, SKAT-O, and ACAT-V association tests and conditioning on common variant signals identified through GCTA-COJO. We adjusted for age, sex, and exome sequence batch variables as well as principal components derived from array and rare exome variants. We used a conservative genome-wide significance threshold of 9.4 × 10−9, accounting for the 73 quantitative traits, 18,184 genes, and 4 association tests applied. The median runtime per trait was 279 CPU hours to analyze all 18,184 genes with SKAT-O, ACAT-V, BURDEN-ACAT, and SBAT on 20 burden scores conditioning on an average of 644 common variants (the highest compute timing was for sitting height at 538 CPU hours with 1,722 conditional variants).
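The significance threshold follows from a Bonferroni correction over traits, genes, and tests, and the arithmetic can be checked directly. The family-wise error rate of 0.05 is an assumption; it is the value that reproduces the threshold of 9.4 × 10−9 reported for about 5 million tests.

```python
n_traits = 73
n_genes = 18_184
n_tests = 4      # SBAT, BURDEN-ACAT, SKAT-O, ACAT-V
alpha = 0.05     # assumed family-wise error rate

n_hypotheses = n_traits * n_genes * n_tests
threshold = alpha / n_hypotheses

print(n_hypotheses)          # 5309728
print(f"{threshold:.1e}")    # 9.4e-09
```

The correction is conservative because not every gene is tested for every trait ("up to" 18,184 genes) and because the four tests are correlated.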
Across the 73 traits, SBAT identified 1,339 genome-wide significant associations, compared to 1,274 for BURDEN-ACAT, 1,462 for SKAT-O, and 1,175 for ACAT-V. SBAT uniquely identified 115 signals, which was greater than the signals uniquely identified by BURDEN-ACAT and SKAT-O, 16 and 58, respectively, and lower than ACAT-V, which had 158 unique signals (Figure 4). Among the 115 signals, the top association was between STRN and mean platelet (thrombocyte) volume (SBAT p value = 5.4 × 10−13); previous studies have reported variant associations at this gene with the phenotype.21,22 When counting the number of times each test gives the lowest p value at 1,848 genome-wide significant associations, we found that SBAT performs best (38.9% of signals) followed by SKAT-O (28.4%), ACAT-V (21.6%), and BURDEN-ACAT (11.1%) (Figure S5).
Figure 4.
Gene-based analysis of 73 quantitative phenotypes with 18,184 genes using the UK Biobank dataset
(A) Scatterplot of p values comparing SBAT with BURDEN-ACAT, SKAT-O, and ACAT-V. Each dot on the plot represents a gene result for a given trait where the y axis denotes the p value (on −log10 scale) for SBAT and the x axis denotes the p value (on −log10 scale) for the comparison test. For BURDEN-ACAT, SKAT-O, and ACAT-V, we applied ACAT to the p values across the 4 annotation classes (and 5 AAF cutoffs for BURDEN-ACAT) to obtain a single p value per gene. The dashed lines represent a significance threshold of 9.4 × 10−9 corresponding to a Bonferroni correction for about 5 million tests based on 73 traits analyzed, 18,184 genes, and 4 methods applied. Green points refer to signals found only by SBAT, yellow points refer to signals detected by both tests, and purple points represent signals missed by SBAT but detected by the other test. p values were capped at 2.2 × 10−307.
(B) Upset plot of 1,848 genome-wide significant signals discovered across SBAT, BURDEN-ACAT, SKAT-O, and ACAT-V association tests using a significance threshold of 9.4 × 10−9.
We highlight two examples where jointly combining burden signals with SBAT was the only approach that led to a detectable signal; BURDEN-ACAT, SKAT-O, ACAT-V, and the marginal burden tests using various AAF cutoffs and functional annotations did not. In Figure 5, the association between standing height23,24 and PITX1, which encodes a protein critical in the development of the lower limbs,25,26 was discovered only by SBAT (p value = 2.3 × 10−11). The strongest signal among the individual burden scores, which came from considering a burden of singleton pLoF variants, did not reach the genome-wide significance threshold (p value = 3.8 × 10−8). Furthermore, both SKAT-O and ACAT-V gave p values above that of the strongest burden signal (smallest p values = 3.2 × 10−7 and 1.5 × 10−7, respectively). The joint model fitted with SBAT indicated that the gene signal was driven primarily by singleton pLoF variants as well as the combination of pLoF and predicted deleterious missense variants with an AAF below 0.1%, both associated with lower height. The second example was between platelet count and ETV6, which encodes a transcription factor27 (Figure 6). The association signal did not reach the genome-wide significance threshold for any of the burden scores, where the smallest p value was from a burden of pLoF and missense variants with an AAF cutoff of 1% (p value = 2.6 × 10−8). Only by combining the burden score signals through SBAT did the association reach the significance threshold (p value = 1.3 × 10−12). Indeed, combining the burden score p values using the Cauchy combination method ACAT (BURDEN-ACAT) was not sufficient to reach the significance threshold (p value = 2.7 × 10−7). Effect size estimates from the SBAT joint model indicated the signal was driven by singleton variants as well as more common pLoF and missense variants predicted to be deleterious.
Figure 5.
Deep dive into the association between PITX1 and standing height
(A) Summary of all gene-based tests performed for the gene grouping variants based on AAF as well as annotation class (M1, pLoF only; M2, pLoF or missense; M3, pLoF or predicted deleterious missense based on 5/5 in silico algorithms; M4, pLoF or predicted deleterious missense based on 1 or more out of 5 in silico algorithms) using SKAT-O, ACAT-V, Burden test from REGENIE, BURDEN-ACAT, and SBAT on the burden scores. The y axis denotes the p value (on −log10 scale) for each statistical test and grouping of variants.
(B) Effect size estimates for the burden scores grouped by AAF and annotation class based on a marginal model with each mask (MARGINAL), a joint model on all scores using ordinary least squares (OLS), or the SBAT joint model (SBAT). Only burden scores which were linearly independent are displayed.
Figure 6.
Deep dive into the association between ETV6 and platelet count
(A) Summary of all gene-based tests performed for the gene grouping variants based on AAF as well as annotation class (M1, pLoF only; M2, pLoF or missense; M3, pLoF or predicted deleterious missense based on 5/5 in silico algorithms; M4, pLoF or predicted deleterious missense based on 1 or more out of 5 in silico algorithms) using SKAT-O, ACAT-V, Burden test from REGENIE, BURDEN-ACAT, and SBAT on the burden scores. The y axis denotes the p value (on −log10 scale) for each statistical test and grouping of variants.
(B) Effect size estimates for the burden scores grouped by AAF and annotation class based on a marginal model with each mask (MARGINAL), a joint model on all scores using ordinary least squares (OLS), or the SBAT joint model (SBAT). Only burden scores which were linearly independent are displayed.
We also evaluated the performance of the unified test strategy, GENE_P, which combines results across SBAT, SKAT-O, ACAT-V, and BURDEN-ACAT using the Cauchy combination method. GENE_P finds the most signals (1,688) compared to the other methods (1,312 for SBAT, 1,447 for SKAT-O, 1,154 for ACAT-V, and 1,278 for BURDEN-ACAT) using a genome-wide significance threshold of 7.5 × 10−9 (Figure S4). While there are 138 gene/trait associations that are missed by GENE_P, they all correspond to signals near the significance threshold (less than one order of magnitude away). In addition, we carried out a leave-one-test-out evaluation to assess the added benefit of each test to all the other tests. Leaving out ACAT-V, SBAT, and SKAT-O led to a reduction in the number of associations of 102, 73, and 23, respectively. Leaving out BURDEN-ACAT led to an increase of 5 associations (Figure S6). This highlights the importance of including SBAT in the GENE_P approach, and how the simpler BURDEN-ACAT approach offers little advantage in terms of power.
Discussion
In this work, we introduced a gene-based association test that pools information across multiple burden scores using constrained linear regression. Our approach is well suited for gene-based testing: modeling burden scores jointly accounts for burden score correlations, while non-negative constraints induce sparsity in effect estimates. In simulations, we showed that our SBAT method has well-controlled type I error rates and outperforms SKAT-O and ACAT-V in scenarios where most causal variants have the same effect direction and rarer variants contribute substantially to the gene signal. We further confirmed our simulation results in the analysis of 73 quantitative phenotypes in the UK Biobank whole-exome sequencing data and showed that the four examined tests (SBAT, SKAT-O, ACAT-V, and BURDEN-ACAT) share most of the gene signals while complementing each other with test-specific signals. Among the three tests targeting burden signals, SBAT revealed the highest number of unique associations (115), compared with SKAT-O (58) and BURDEN-ACAT (16). Following a common practice in designing gene-burden tests,9,10,11 we proposed a strategy to provide a single p value per gene by combining results from the four tests available at different variant annotation classes and allele frequency bins.
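The effect of the non-negativity constraint can be illustrated with a small simulation. The sketch below is an assumption-laden toy example rather than the SBAT implementation: the carrier frequencies, effect size, and burden score definitions are hypothetical, and it uses the `nnls` solver from SciPy instead of the fast non-negative least squares algorithm cited in the paper. It fits only one effect direction and does not compute a p value.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n = 2000

# Hypothetical carrier indicators for three disjoint variant groups.
g1 = rng.binomial(1, 0.01, n)  # e.g. most deleterious variants
g2 = rng.binomial(1, 0.02, n)
g3 = rng.binomial(1, 0.05, n)

# Nested burden scores: each score adds a variant group to the previous one,
# so the columns are strongly correlated.
X = np.column_stack([g1, g1 + g2, g1 + g2 + g3]).astype(float)

# Simulated trait: only the first variant group has an effect (assumed value).
y = 1.0 * g1 + rng.normal(size=n)

# Center the scores and the trait so that no intercept is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Unconstrained joint fit (OLS) versus non-negative least squares (NNLS).
beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
beta_nn, resid_norm_nn = nnls(Xc, yc)

print("OLS  estimates:", np.round(beta_ols, 3))
print("NNLS estimates:", np.round(beta_nn, 3))
print("scores set exactly to zero by NNLS:", int(np.sum(beta_nn == 0.0)))
```

Whenever the unconstrained fit assigns a negative coefficient to a correlated score, the constrained fit moves that coefficient to the boundary at zero, which is the source of the sparsity described above.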
We highlighted two gene-phenotype SBAT associations in the UK Biobank analysis that were missed by the alternative tests (SKAT-O, ACAT-V, and BURDEN-ACAT). The two associations, PITX1-standing height and ETV6-platelet count, are well established in the literature.23,24,27 Our results stress the role of pLoF singletons for both gene-phenotype pairs, where the burden scores with these variants show the lowest p values (3.8 × 10−8 and 2.6 × 10−8, respectively) but do not pass the significance threshold of our study (9.4 × 10−9). The SBAT joint model aggregates the evidence of association from multiple burden scores, amplifying the signal (SBAT p values 2.3 × 10−11 and 1.3 × 10−12, respectively). These two SBAT findings agree with our simulation results, where SBAT outperforms other tests when the signal is concentrated at rarer variants with the same direction of effects. We also observe that SBAT gives more informative effect estimates than the marginal and joint OLS estimates (Figures 5B and 6B), selecting the top burden scores of pLoF singleton variants and favoring missense burden scores with more deleterious annotations (M3 and M4 are preferred to M2).
Our study also has limitations. We used a particular set of nested annotations for burden scores from our previous work2,3,13 and considered only two main methods for comparison, SKAT-O and ACAT-V. While the list of selected annotations and methods is not exhaustive, we believe our simulation results served well to contrast the SBAT features against SKAT-O and ACAT-V. As data generation models can favor one method over another in simulations, we empirically showed the competitive performance of SBAT in the exome-wide association study of 73 agnostically selected traits from 430,074 participants in the UK Biobank. The current implementation of SBAT is developed for quantitative traits, as the previous work on constrained linear regression was built for a continuous response variable.15,16,17 Future work will focus on extending SBAT to binary traits by following the score test statistic approach from SKAT12 and applying the saddle-point approximation (SPA) correction to handle tests of rare variants with unbalanced binary traits.
We routinely use nested burden scores as this allows researchers to evaluate the effect of gradually increasing the size of the set of variants, i.e., starting with the most deleterious pLoF variants, then adding in missense variants with decreasing deleteriousness scores. Another possibility would be to use non-overlapping sets of variants, but users would not then be able to see the effect of gradually increasing the variant set and could miss signals found only by combining sets together.
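The difference between nested and non-overlapping variant sets can be sketched as follows. The genotype matrix, the three severity tiers, and the use of summed allele counts as the burden score are all hypothetical assumptions of this example and may differ from how burden scores are built in REGENIE.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_variants = 1000, 12

# Hypothetical rare-variant genotypes coded as 0/1/2 allele counts.
G = rng.binomial(2, 0.005, size=(n_samples, n_variants))

# Hypothetical severity tier per variant: 0 is the most deleterious class,
# and higher tiers are progressively less deleterious.
tier = np.array([0] * 3 + [1] * 4 + [2] * 5)

# Nested sets: score k aggregates every variant with tier <= k.
nested = np.column_stack([G[:, tier <= k].sum(axis=1) for k in range(3)])

# Non-overlapping sets: score k aggregates only the variants in tier k.
disjoint = np.column_stack([G[:, tier == k].sum(axis=1) for k in range(3)])

# Each nested score is the running total of the non-overlapping scores.
print(np.array_equal(nested, np.cumsum(disjoint, axis=1)))
```

Under this construction the nested scores carry the same information as the non-overlapping ones, but each nested score directly represents a cumulative variant set, which is what lets a user read off the effect of enlarging the set.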
We envision that the constrained linear regression approach can also be applied to other association tests of multiple variables, such as joint testing of multiple traits28 and/or multiple SNPs.29 To enable these future extensions, we made an important practical contribution: an approximate calculation of the weights in the mixture of chi-squared distributions for constrained linear regression, a problem not addressed in the previous theoretical work.15,16,17 The constrained test can also operate on association summary statistics and the correlation matrix of the tested variables rather than on individual-level data. We derived this summary-statistics option (Equation 2) by reformulating the constrained regression problem (Equation 1) in terms of the joint effect sizes and the residual variance. The (unconstrained) joint effect estimates can be inferred from the marginal effect estimates using existing methods.20
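The relation between marginal and joint effect estimates that underlies a summary-statistics formulation can be checked numerically. The sketch below is not Equation 2 of the paper; it assumes standardized variables and simulated data with hypothetical effect sizes, and it shows only the unconstrained step: joint estimates and the residual variance recovered from marginal estimates and the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5000, 4

# Hypothetical correlated predictors standing in for burden scores.
X = np.cumsum(rng.normal(size=(n, m)), axis=1)
y = X @ np.array([0.3, 0.0, 0.1, 0.0]) + rng.normal(size=n)

# Standardize predictors and trait (mean 0, variance 1).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

# Summary statistics: marginal slopes and the predictor correlation matrix.
beta_marginal = Xs.T @ ys / n
R = Xs.T @ Xs / n

# Joint effects from summary statistics alone: solve R * beta_joint = beta_marginal.
beta_joint_sumstat = np.linalg.solve(R, beta_marginal)
# Residual variance of the standardized trait under the joint model.
resid_var_sumstat = 1.0 - beta_marginal @ beta_joint_sumstat

# Reference values from individual-level data.
beta_joint_ols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
resid_var_ols = np.mean((ys - Xs @ beta_joint_ols) ** 2)

print(np.allclose(beta_joint_sumstat, beta_joint_ols))
print(np.isclose(resid_var_sumstat, resid_var_ols))
```

In practice the correlation matrix may come from a reference sample rather than the study sample, in which case the recovered joint estimates are approximate rather than exact.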
In summary, we developed a scalable multivariate version of the one-sided test with application to testing multiple burden scores. The SBAT method is available in the REGENIE software and together with other gene-based tests will empower further discovery from sequencing association studies.
Data and code availability
All of the gene-based tests used in this paper are implemented in the open-source REGENIE software https://rgcgithub.github.io/regenie/.
Acknowledgments
This research was conducted with the UKB resource (project no. 26041). We are grateful to two reviewers who made constructive and insightful comments on the paper.
Declaration of interests
All authors are current employees and/or stockholders of Regeneron Pharmaceuticals.
Published: October 3, 2024
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2024.08.021.
Web resources
REGENIE, https://github.com/rgcgithub/regenie
References
- 1. Dewey F.E., Murray M.F., Overton J.D., Habegger L., Leader J.B., Fetterolf S.N., O’Dushlaine C., Van Hout C.V., Staples J., Gonzaga-Jauregui C., et al. Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study. Science. 2016;354. doi: 10.1126/science.aaf6814.
- 2. Backman J.D., Li A.H., Marcketta A., Sun D., Mbatchou J., Kessler M.D., Benner C., Liu D., Locke A.E., Balasubramanian S., et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599:628–634. doi: 10.1038/s41586-021-04103-z.
- 3. Akbari P., Gilani A., Sosina O., Kosmicki J.A., Khrimian L., Fang Y.Y., Persaud T., Garcia V., Sun D., Li A., et al. Sequencing of 640,000 exomes identifies GPR75 variants associated with protection from obesity. Science. 2021;373. doi: 10.1126/science.abf8683.
- 4. Li B., Leal S.M. Methods for Detecting Associations with Rare Variants for Common Diseases: Application to Analysis of Sequence Data. Am. J. Hum. Genet. 2008;83:311–321. doi: 10.1016/j.ajhg.2008.06.024.
- 5. Madsen B.E., Browning S.R. A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic. PLoS Genet. 2009;5. doi: 10.1371/journal.pgen.1000384.
- 6. Price A.L., Kryukov G.V., De Bakker P.I.W., Purcell S.M., Staples J., Wei L.-J., Sunyaev S.R. Pooled Association Tests for Rare Variants in Exon-Resequencing Studies. Am. J. Hum. Genet. 2010;86:832–838. doi: 10.1016/j.ajhg.2010.04.005.
- 7. Wu M.C., Lee S., Cai T., Li Y., Boehnke M., Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 2011;89:82–93. doi: 10.1016/j.ajhg.2011.05.029.
- 8. Liu Y., Xie J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J. Am. Stat. Assoc. 2020;115:393–402. doi: 10.1080/01621459.2018.1554485.
- 9. Liu Y., Chen S., Li Z., Morrison A.C., Boerwinkle E., Lin X. ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies. Am. J. Hum. Genet. 2019;104:410–421. doi: 10.1016/j.ajhg.2019.01.002.
- 10. Lee S., Wu M.C., Lin X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics. 2012;13:762–775. doi: 10.1093/biostatistics/kxs014.
- 11. Li X., Li Z., Zhou H., Gaynor S.M., Liu Y., Chen H., Sun R., Dey R., Arnett D.K., Aslibekyan S., et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet. 2020;52:969–983. doi: 10.1038/s41588-020-0676-4.
- 12. Mbatchou J., Barnard L., Backman J., Marcketta A., Kosmicki J.A., Ziyatdinov A., Benner C., O'Dushlaine C., Barber M., Boutkov B., et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 2021;53:1097–1103. doi: 10.1038/s41588-021-00870-7.
- 13. Van Hout C.V., Tachmazidou I., Backman J.D., Hoffman J.D., Liu D., Pandey A.K., Gonzaga-Jauregui C., Khalid S., Ye B., Banerjee N., et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature. 2020;586:749–756. doi: 10.1038/s41586-020-2853-0.
- 14. Bro R., De Jong S. A fast non-negativity-constrained least squares algorithm. J. Chemom. 1997;11:393–401.
- 15. Kudo A. A Multivariate Analogue of the One-Sided Test. Biometrika. 1963;50:403–418.
- 16. Shapiro A. Towards a Unified Theory of Inequality Constrained Testing in Multivariate Analysis. Int. Stat. Rev./Rev. Int. Stat. 1988;56:49–62.
- 17. Gouriéroux C., Holly A., Monfort A. Likelihood Ratio Test, Wald Test, and Kuhn-Tucker Test in Linear Models with Inequality Constraints on the Regression Parameters. Econometrica. 1982;50:63–80.
- 18. Zhou W., Bi W., Zhao Z., Dey K.K., Jagadeesh K.A., Karczewski K.J., Daly M.J., Neale B.M., Lee S. SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests. Nat. Genet. 2022;54:1466–1469. doi: 10.1038/s41588-022-01178-w.
- 19. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
- 20. Yang J., Ferreira T., Morris A.P., Medland S.E., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Madden P.A.F., Heath A.C., Martin N.G., Montgomery G.W., et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 2012;44:369–375, S1–S3. doi: 10.1038/ng.2213.
- 21. Vuckovic D., Bao E.L., Akbari P., Lareau C.A., Mousas A., Jiang T., Chen M.-H., Raffield L.M., Tardaguila M., Huffman J.E., et al. The Polygenic and Monogenic Basis of Blood Traits and Diseases. Cell. 2020;182:1214–1231.e11. doi: 10.1016/j.cell.2020.08.008.
- 22. Astle W.J., Elding H., Jiang T., Allen D., Ruklisa D., Mann A.L., Mead D., Bouman H., Riveros-Mckay F., Kostadima M.A., et al. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell. 2016;167:1415–1429.e19. doi: 10.1016/j.cell.2016.10.042.
- 23. Akiyama M., Ishigaki K., Sakaue S., Momozawa Y., Horikoshi M., Hirata M., Matsuda K., Ikegawa S., Takahashi A., Kanai M., et al. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat. Commun. 2019;10. doi: 10.1038/s41467-019-12276-5.
- 24. Berndt S.I., Gustafsson S., Mägi R., Ganna A., Wheeler E., Feitosa M.F., Justice A.E., Monda K.L., Croteau-Chonka D.C., Day F.R., et al. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat. Genet. 2013;45:501–512. doi: 10.1038/ng.2606.
- 25. Gurnett C.A., Alaee F., Kruse L.M., Desruisseau D.M., Hecht J.T., Wise C.A., Bowcock A.M., Dobbs M.B. Asymmetric Lower-Limb Malformations in Individuals with Homeobox PITX1 Gene Mutation. Am. J. Hum. Genet. 2008;83:616–622. doi: 10.1016/j.ajhg.2008.10.004.
- 26. Szeto D.P., Rodriguez-Esteban C., Ryan A.K., O'Connell S.M., Liu F., Kioussi C., Gleiberman A.S., Izpisúa-Belmonte J.C., Rosenfeld M.G. Role of the Bicoid-related homeodomain factor Pitx1 in specifying hindlimb morphogenesis and pituitary development. Genes Dev. 1999;13:484–494. doi: 10.1101/gad.13.4.484.
- 27. Bohlander S.K. ETV6: A versatile player in leukemogenesis. Semin. Cancer Biol. 2005;15:162–174. doi: 10.1016/j.semcancer.2005.01.008.
- 28. Julienne H., Laville V., Mccaw Z.R., He Z., Guillemot V., Lasry C., Ziyatdinov A., Nerin C., Vaysse A., Lechat P., et al. Multitrait GWAS to connect disease variants and biological mechanisms. PLoS Genet. 2021;17. doi: 10.1371/journal.pgen.1009713.
- 29. De Leeuw C.A., Mooij J.M., Heskes T., Posthuma D. MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLoS Comput. Biol. 2015;11. doi: 10.1371/journal.pcbi.1004219.