Author manuscript; available in PMC: 2021 Sep 1.
Published in final edited form as: Genet Epidemiol. 2020 Jun 8;44(6):579–588. doi: 10.1002/gepi.22325

Power loss due to testing association between covariate adjusted traits and genetic variants

Pranav Yajnik 1, Michael Boehnke 1
PMCID: PMC7610149  NIHMSID: NIHMS1614707  PMID: 32511788

Abstract

Multiple linear regression is commonly used to test for association between genetic variants and continuous traits and estimate genetic effect sizes. Confounding variables are controlled for by including them as additional covariates. An alternative technique that is increasingly used is to regress out covariates from the raw trait and then perform regression analysis with only the genetic variants included as predictors. In the case of single-variant analysis, this adjusted trait regression (ATR) technique is known to be less powerful than the traditional technique when the genetic variant is correlated with the covariates. We extend previous results for single-variant tests by deriving exact relationships between the single-variant score, Wald, likelihood-ratio, and F-test statistics and their ATR analogs. We also derive the asymptotic power of ATR analogs of the multiple-variant score and burden tests. We show that the maximum power loss of the ATR analog of the multiple-variant score test is completely characterized by the canonical correlations between the set of genetic variants and the set of covariates. Further, we show that for both single- and multiple-variant tests, the power loss for ATR analogs increases with increasing stringency of Type 1 error control ($\alpha$) and increasing correlation (or canonical correlations) between the genetic variant (or multiple variants) and covariates. We recommend using ATR only when the maximum canonical correlation between variants and covariates is low, as is typically true.

Keywords: adjusted outcome, power loss, covariates, linear regression, genome-wide association study

INTRODUCTION

Multiple linear regression and the associated ordinary least-squares and F-test methodologies are effective and widely used approaches to test for association between genetic variants and quantitative traits and to estimate genetic effect sizes while controlling for the effects of other variables (covariates). Covariates may be included to account for confounding (e.g. due to population structure or assay batch effects), to reduce trait variability and consequently increase power, or to exclude associations that are driven primarily through the action of the variants on an intermediate trait.

Current genome-wide association studies (GWAS) typically assay hundreds of thousands to millions of genetic variants. Single-variant association tests are performed separately on each variant to test whether the variant is associated with the trait. Multi-variant, gene-, or region-based tests are performed to address the omnibus hypothesis that one or more in a set of variants are associated with the trait. Since the dependent variable and covariates are typically the same across all tests, some analysts use a two-stage approach for quantitative trait GWAS (Randall et al., 2013; UK10K Consortium, 2015; Tachmazidou et al., 2017; Kanai et al., 2018; Styrkarsdottir et al., 2019; Niarchou et al., 2020 are some examples of studies employing this methodology). In the first stage, an ‘adjusted’ trait is obtained as the residuals from the regression of the trait on covariates. In the second stage, association analyses are performed to test for association between the adjusted trait and each variant (or set of variants) without inclusion of other covariates. We term this strategy “adjusted-trait regression (without covariates)” (ATR).

Although ATR can be conceptualized as a two-stage method, we note that it bears no relation to the “two-stage least-squares” method used in structural equations modeling and estimation of causal effects using instrumental variables. We assume that the target of inference is the conditional association between the unadjusted trait and variants given the covariates rather than the association between the adjusted trait and variants unconditional on the covariates. Thus, we view ATR as a numerical technique to conveniently approximate the results that would have been obtained from analysis of the unadjusted trait (with covariates included). The strategy of analyzing a covariate-adjusted trait may be used for any statistical method that deals with linear models, including gene/region based tests like burden or SKAT (Lee et al., 2014) or methods for linear mixed-models.

We have not found any methods papers that recommend the use of ATR. Indeed, the research articles cited above make use of ATR without comment or justification. ATR results are not identical to results obtained from modeling the unadjusted trait along with covariates. Previous investigations of single-variant models showed that the ordinary least-squares ATR estimator of genetic effect is biased towards zero by a factor of $1 - R^2$ (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012), where $R^2$ is the sample coefficient of determination obtained by regressing the genetic variant onto the covariates. These investigations used approximations and simulations to assess power and Type 1 error of the ATR-based tests assuming a Type 1 error rate of $\alpha = 0.05$ and showed that ATR is typically less powerful than multiple linear regression when the sample correlation between a genetic variant and covariates is non-zero. More recently, Sofer et al. (2019) showed that the ATR-based single-variant score and multi-variant SKAT test statistics are numerically (deterministically) dominated by the corresponding test statistics obtained from analyzing the unadjusted trait with covariates, leading to deflated test statistics (that is, larger p-values) and loss of power.

We extend these previous results by deriving the exact relationship between ATR and multiple linear regression score, likelihood ratio, Wald, and F test-statistics for single-variant analysis. We use these relationships to derive (1) the exact finite sample distributions of the ATR test-statistics (hence, exact power and Type 1 error) under the assumption of independent and identically normally distributed errors and (2) the asymptotic relationship between the test-statistics for situations where the assumption is suspect. In addition, we derive the asymptotic distributions of ATR based analogs of two gene/region-based tests: the burden test and the (omnibus) score test, and show that these tests applied in the ATR framework may also suffer from loss of power compared to their multiple linear regression analogs. In particular, we show that the maximum possible power loss for gene-based ATR score tests depends on the maximum canonical correlation between the set of variants and the set of covariates, so that we expect power loss to be modest in typical GWAS with low to moderate population structure.

METHODS AND RESULTS

Definition of the ATR approach

We assume a model of the form:

$$Y_i = \alpha + \sum_{j=1}^{m} \beta_j g_{ij} + \sum_{l=1}^{k} \gamma_l c_{il} + \epsilon_i \qquad \text{(M1)}$$

Here $Y_i$, $1 \le i \le n$, is the trait value for the $i$th study participant, $g_{ij}$ the genotype (or genotype-imputation-based dosage) for the $j$th variant for this study participant, $\beta_j$ the effect of the variant on the trait (conditional on the other $m - 1$ variants and covariates), $c_{il}$ the value of the $l$th covariate, $\gamma_l$ the (conditional) effect of the covariate, and $\epsilon_i$ a random error. We assume the errors are independent and identically distributed across observations with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$. For single-variant models, $m = 1$ and $\beta$ is the conditional effect of the variant on the trait given the covariates, but unconditional on any other variant.

The above model can be represented as $Y = G\beta + C\gamma + \epsilon$ where $Y$ and $\epsilon$ are $n \times 1$ vectors, $G$ is an $n \times m$ matrix, $\beta$ is an $m \times 1$ vector, $C$ is an $n \times (k+1)$ matrix (including a column of ones for the intercept), and $\gamma = (\alpha, \gamma_1, \ldots, \gamma_k)'$ is a $(k+1) \times 1$ vector. We have $\mathrm{Var}(Y \mid G, C) = \mathrm{Var}(\epsilon) = \sigma^2 I_n$ where $I_n$ is the $n$-dimensional identity matrix. We wish to test $H_0: \beta = 0$. Further, we assume that the test statistic $T$ has the form $T = f(Y, G, C)$. We note that the distribution of $T$ under the null may depend on $G$ and $C$ and on parameters that need to be estimated from the data. We assume that the (possibly estimated) parameter value $\hat\theta$ required to define the distribution of $T$ under the null (for example, degrees of freedom for the F-statistic) also has the form $\hat\theta = g(Y, G, C)$.

Let $H_C = C(C'C)^{-1}C'$. Then $Y_r = Y - C\hat\gamma = (I_n - H_C)Y$ is the vector of residuals obtained by regressing $Y$ onto $C$ using ordinary least squares (with $\hat\gamma = (C'C)^{-1}C'Y$). We define the ATR analog of $T$ to be $T_{ATR} = f(Y_r, G, J_n)$ where $J_n = (1, \ldots, 1)'$ is the $n \times 1$ vector of ones denoting the intercept. Further, we assume that the parameter $\theta$ for ATR is calculated as $\hat\theta_{ATR} = g(Y_r, G, J_n)$. This definition of the ATR analog implies that inference based on $T_{ATR}$ can be performed by using existing software designed for inference with $T$ simply by replacing $Y$ and $C$ with $Y_r$ and $J_n$. We note that if the parameter of the null distribution for a method depends on $Y$ and/or $C$, we may have $\hat\theta \ne \hat\theta_{ATR}$, and the ATR analog may reference a null distribution that differs from the one used by the unadjusted method to calculate p-values.

Ordinary least-squares estimation with ATR

The ordinary least-squares estimator of $\beta$ is given by $\hat\beta = (G_r'G_r)^{-1}G_r'Y_r$ where $G_r = (I_n - H_C)G$ is the matrix of residuals of variants regressed onto $C$. This result is often referred to as the Frisch-Waugh-Lovell theorem (Frisch & Waugh, 1933; Lovell, 2008). In the appendix, we show that

$$\hat\beta_{ATR} = (I_m - R^2_{GC})\hat\beta$$

where $R^2_{GC} = [G'(I_n - H_1)G]^{-1} G'(H_C - H_1)G$ and $H_1 = J_n J_n'/n$. Note that the eigenvalues of $R^2_{GC}$ are the squared sample canonical correlations between the set of genetic variants and the set of covariates. In particular, $R^2_{GC} = 0$ (the zero matrix) if and only if every genetic variant is uncorrelated with all covariates. Further, we have $E(\hat\beta_{ATR}) = (I_m - R^2_{GC})\beta$ and, consequently, $E(\hat\beta_{ATR}) = 0$ if and only if none of the genetic variants are associated with the trait (conditional on covariates). Thus, any test that is valid for testing the omnibus hypothesis $H_0: E(\hat\beta_{ATR}) = 0$ is also valid for testing $H_0: \beta = 0$.

In the case of single-variant analysis ($m = 1$), the above relationship simplifies to $\hat\beta_{ATR} = (1 - R^2)\hat\beta$ and we recover the result obtained previously (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012; Sofer et al., 2019). Thus, for single-variant analysis, the ATR ordinary least-squares estimator can only be biased towards the null. This is not true for individual elements of $\hat\beta_{ATR}$ when $m > 1$. Indeed, $E(\hat\beta_{ATR,j})$ is a linear combination of all the elements of the vector $\beta$. In particular, $\beta_j = 0$ does not necessarily imply that $E(\hat\beta_{ATR,j}) = 0$. Thus, a test that is valid for $H_0: E(\hat\beta_{ATR,j}) = 0$ is not necessarily valid for $H_0: \beta_j = 0$ (unless all remaining elements of $\beta$ are also 0).
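The single-variant shrinkage factor can be checked numerically. The following is a minimal sketch, assuming one variant and one covariate so that closed-form sums of squares suffice; the simulated data, the effect sizes, and the helper names (`centered`, `dot`, `atr_vs_ols`) are illustrative assumptions and are not taken from the paper.

```python
import random

def centered(v):
    """Return v minus its sample mean."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def atr_vs_ols(y, g, c):
    """One variant g, one covariate c, intercept in both models.

    Returns (beta_ols, beta_atr, r2), where r2 is the R^2 obtained by
    regressing g on c.
    """
    yc, gc, cc = centered(y), centered(g), centered(c)
    s_gg, s_cc, s_gc = dot(gc, gc), dot(cc, cc), dot(gc, cc)
    s_gy, s_cy = dot(gc, yc), dot(cc, yc)

    # Full model y ~ 1 + g + c: closed-form two-predictor OLS slope for g.
    beta_ols = (s_cc * s_gy - s_gc * s_cy) / (s_gg * s_cc - s_gc ** 2)

    # ATR stage 1: residuals of y regressed on (1, c).
    gamma = s_cy / s_cc
    y_r = [a - gamma * b for a, b in zip(yc, cc)]
    # ATR stage 2: y_r ~ 1 + g, with the covariate omitted.
    beta_atr = dot(gc, y_r) / s_gg

    r2 = s_gc ** 2 / (s_gg * s_cc)
    return beta_ols, beta_atr, r2

random.seed(1)
n = 500
c = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical dosage correlated with the covariate, clipped to [0, 2].
g = [min(2.0, max(0.0, 1.0 + 0.4 * ci + random.gauss(0, 0.6))) for ci in c]
y = [0.3 * gi + 0.5 * ci + random.gauss(0, 1) for gi, ci in zip(g, c)]

beta_ols, beta_atr, r2 = atr_vs_ols(y, g, c)
print(beta_ols, beta_atr, (1 - r2) * beta_ols)
```

The last two printed values should agree up to floating-point error, since the identity is algebraic rather than asymptotic.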

Single-variant association testing with ATR

Xing et al. (2011) showed that $W_{ATR} \le W$ where $W$ is the Wald test statistic. Che et al. (2012) refined an approximation proposed by Demissie and Cupples (2011) for the F test statistic to $F_{ATR} = \frac{n-2}{n-k-2} \cdot \frac{1-R^2}{1 + R^2 r^2(Y_r, G_r) - r^2(Y_r, G_r)} F$ where $r^2(Y_r, G_r)$ is the sample squared correlation between $Y_r$ and $G_r$ and $F$ is the F statistic. Xing et al. (2011) and Che et al. (2012) used simulations to estimate power and Type 1 error rate for $\alpha = 0.05$.

We show that $S_{ATR} = (1 - R^2)S$, where $S$ is the score test statistic for the above linear model when $m = 1$. For linear models, the test statistics for the score, Wald, likelihood ratio, and F tests bear simple, deterministic relationships to each other (Vandaele 1981). Combining $S_{ATR} = (1 - R^2)S$ with these known relationships yields the following set of equalities:

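$$F_{ATR} = \frac{n-2}{n-k-2} \times \frac{(1-R^2)F}{1 + R^2 F/(n-k-2)}$$
$$W_{ATR} = \frac{(1-R^2)W}{1 + R^2 W/n}$$
$$LR_{ATR} = LR - n\log\left(1 + R^2\left[e^{LR/n} - 1\right]\right)$$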

where $LR$ denotes the likelihood ratio test statistic. We see that $S$, $W$, and $LR$ are always strictly greater than their ATR analogs if $R^2 > 0$ and equal to them if $R^2 = 0$. P-values for the score, Wald, and likelihood ratio tests are standardly computed assuming the test statistics follow a chi-square distribution with $\theta = m = 1$ degree of freedom ($\chi^2_1$ distribution). The ATR analogs of these methods also assume this same distribution and are less powerful than their counterparts if $R^2 > 0$.
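These equalities can be sanity-checked against a direct computation. The sketch below assumes the standard linear-model expressions of the four statistics in terms of the squared partial correlation $r^2$ between trait and variant ($S = nr^2$, $W = nr^2/(1-r^2)$, $LR = -n\log(1-r^2)$, $F = (n-k-2)r^2/(1-r^2)$), and uses the fact that the squared correlation between $Y_r$ and the unadjusted variant is $r^2(1-R^2)$. The function names and the numerical inputs are illustrative assumptions.

```python
import math

def stats_from_r2(r2, n, k):
    """Score, Wald, LR and F statistics for one variant with k covariates
    plus an intercept, written via the squared (partial) correlation r2."""
    s = n * r2
    w = n * r2 / (1 - r2)
    lr = -n * math.log(1 - r2)
    f = (n - k - 2) * r2 / (1 - r2)
    return s, w, lr, f

def atr_from_full(s, w, lr, f, R2, n, k):
    """Map full-model statistics to their ATR analogs via the stated equalities."""
    s_atr = (1 - R2) * s
    w_atr = (1 - R2) * w / (1 + R2 * w / n)
    lr_atr = lr - n * math.log(1 + R2 * (math.exp(lr / n) - 1))
    f_atr = (n - 2) / (n - k - 2) * (1 - R2) * f / (1 + R2 * f / (n - k - 2))
    return s_atr, w_atr, lr_atr, f_atr

# Hypothetical inputs: sample size, covariate count, variant-covariate R^2,
# and squared partial correlation between trait and variant.
n, k, R2, r2_partial = 10_000, 5, 0.05, 0.004

full = stats_from_r2(r2_partial, n, k)
atr = atr_from_full(*full, R2, n, k)

# Direct route: the ATR model has no covariates (k = 0) and sees the
# attenuated squared correlation r2_partial * (1 - R2).
direct = stats_from_r2(r2_partial * (1 - R2), n, 0)

for name, a, d in zip(("S", "W", "LR", "F"), atr, direct):
    print(name, a, d)
```

Each pair of printed values should match, which is a useful regression test if the formulas are re-implemented elsewhere.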

In contrast, $F_{ATR} > F$ if $F < k/R^2 - (n-2)$ and the ATR analog of the F-test uses the F-distribution with 1 and $n-2$ degrees of freedom while the F-test assumes a distribution with 1 and $n-k-2$ degrees of freedom; in this case, $\hat\theta_{ATR} \ne \hat\theta$ since the denominator degrees of freedom depends on the number of covariates. Thus, the ATR analog of the F-test may be slightly anti-conservative if $R^2 \approx 0$ and/or the number of covariates is large relative to the sample size. This is quite unlikely given the large sample sizes of current GWAS, the large values of the test statistic required to reject the null, and the fact that the expected value of the sample coefficient of determination increases with increasing number of predictors, even when the variant is independent of the predictors at the population level, in which case $E(R^2) \approx k/n$ for large samples.
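The crossover condition for the F statistic follows from setting $F_{ATR} = F$ in the exact relationship. A minimal sketch, with hypothetical values of $n$, $k$ and $R^2$ chosen only so that the crossover falls in a visible range:

```python
def f_atr(f, R2, n, k):
    """ATR analog of the F statistic as a function of the full-model F."""
    return (n - 2) / (n - k - 2) * (1 - R2) * f / (1 + R2 * f / (n - k - 2))

def crossover(R2, n, k):
    """Value of F below which F_ATR exceeds F: k / R2 - (n - 2)."""
    return k / R2 - (n - 2)

# Hypothetical setting with many covariates relative to the sample size.
n, k, R2 = 1_000, 20, 0.019
f_star = crossover(R2, n, k)

below = f_atr(30.0, R2, n, k)   # F below the crossover
above = f_atr(80.0, R2, n, k)   # F above the crossover
print(f_star, below, above)
```

When $R^2 \ge k/(n-2)$ the crossover is non-positive, so $F_{ATR} \le F$ for every attainable value of $F$.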

For a fixed number of covariates, the score, Wald, likelihood ratio, and F test statistics asymptotically converge to the same random variable $T$ (almost surely) under the null and local alternatives ($\beta = O(n^{-1/2})$, i.e. when the effect size tends to zero asymptotically). Similarly, their ATR analogs each converge to $(1 - R^2)T$. Asymptotically, each of the ATR test statistics follows a scaled $\chi^2_1$ distribution whose scaling factor is less than or equal to one and is, thus, conservative when $R^2 > 0$. The exact finite sample distribution of the F statistic is known in the case where errors are normally distributed; the exact distributions of all the other test statistics can be derived easily given the above relationships.

For simplicity, we illustrate the conservative nature of ATR for single-variant tests under asymptotic conditions. Here, for any threshold $c$, we have $P(T_{ATR} < c) = P(T < c/(1 - R^2))$. The relationship between the p-values generated by the score test and its ATR analog is non-linear; the ATR test becomes more conservative as the p-value threshold for declaring significance ($\alpha$) becomes more stringent. Figure 1 shows power of the ATR test with $R^2 = 0.05$ for $\alpha$ values ranging from $10^{-1}$ to $10^{-10}$ where the effect size for each $\alpha$ value is chosen to yield 80% power for the score test. At the usual GWAS threshold of $\alpha = 5 \times 10^{-8}$, the power of the ATR test is about 76%. Figure 2 shows how, for fixed $\alpha = 5 \times 10^{-8}$, the ATR test becomes less powerful as $R^2$ increases (again, with effect size chosen to yield 80% power for the score test).

Figure 1:

Power of ATR analog of single-variant score test when $R^2 = 0.05$ with varying stringency of statistical significance $\alpha$ displayed in the negative log ten scale. Effect sizes vary as a function of $\alpha$ to yield 80% power for the score test.

Figure 2:

Power of ATR analog of single-variant score test with increasing $R^2$ for $\alpha = 5 \times 10^{-8}$. The effect size was chosen to yield 80% power for the score test.
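The asymptotic single-variant power values behind Figures 1 and 2 can be approximated with the standard library alone. This is a sketch under the assumption that the test statistic is a non-central $\chi^2_1$ variable, written as $(Z + \delta)^2$ with $Z$ standard normal, and that the non-centrality giving 80% power can be obtained by ignoring the negligible opposite tail; the function names are illustrative.

```python
from statistics import NormalDist

_norm = NormalDist()

def power_chi2_1df(delta, crit):
    """P(chi2_1(delta^2) > crit), using the representation (Z + delta)^2."""
    root = crit ** 0.5
    return _norm.cdf(delta - root) + _norm.cdf(-delta - root)

def atr_power(alpha, R2, score_power=0.80):
    """Asymptotic power of the ATR score test when the ordinary score test
    has power `score_power` at significance level `alpha`."""
    z_crit = -_norm.inv_cdf(alpha / 2)
    crit = z_crit ** 2
    # Non-centrality for the target power (opposite tail ignored).
    delta = z_crit + _norm.inv_cdf(score_power)
    # T_ATR = (1 - R2) * T, so T_ATR > crit  <=>  T > crit / (1 - R2).
    return power_chi2_1df(delta, crit / (1 - R2))

p05 = atr_power(5e-8, 0.05)
p10 = atr_power(5e-8, 0.10)
print(round(p05, 3), round(p10, 3))

# Power at fixed R2 = 0.05 across significance levels 1e-1 ... 1e-10.
for exponent in range(1, 11):
    print(exponent, round(atr_power(10.0 ** -exponent, 0.05), 3))
```

The two values printed first should land close to the roughly 76% and 71% quoted in the text for $R^2 = 0.05$ and $R^2 = 0.1$.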

Burden tests with ATR

The relationships derived for the single-variant tests are directly applicable to burden tests. Burden tests typically assume the same multiple linear regression model presented in the previous section with $G$ replaced by $B = \sum_{i=1}^{m} w_i G_i = GW$ where $G_1, \ldots, G_m$ are $m$ genetic variants (columns of $G$), $w_i$ are weights (and $W = (w_1, \ldots, w_m)'$), and $B$ is the (weighted) burden of alternate alleles (or genotype imputation-based dosages) from the $m$ variants. For burden tests, $R^2$ is the sample coefficient of determination obtained by regressing $B$ onto $C$. Given $G$ and $C$, the maximum possible value for $R^2$ is obtained when the weight vector $W$ is a scalar multiple of the eigenvector of $R^2_{GC}$ corresponding to the maximum eigenvalue and the maximum $R^2$ is equal to the maximum eigenvalue.
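The claim about the worst-case weight vector can be illustrated in the smallest non-trivial setting. The sketch below assumes two variants and a single covariate, in which case $R^2_{GC}$ has rank one and its non-zero eigenvalue and eigenvector have closed forms; the simulated data and helper names are illustrative assumptions.

```python
import random

def centered(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def burden_r2(g1, g2, c, w):
    """R^2 from regressing the burden w[0]*g1 + w[1]*g2 on one covariate c."""
    b = centered([w[0] * a + w[1] * d for a, d in zip(g1, g2)])
    cc = centered(c)
    return dot(b, cc) ** 2 / (dot(b, b) * dot(cc, cc))

def worst_case_weights(g1, g2, c):
    """Weights maximizing the burden R^2, and the maximum itself.

    With one covariate, R2_GC = S_GG^{-1} s_Gc s_Gc' / S_cc has rank one:
    its eigenvector is S_GG^{-1} s_Gc and its non-zero eigenvalue is
    s_Gc' S_GG^{-1} s_Gc / S_cc.
    """
    a, b, cc = centered(g1), centered(g2), centered(c)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1c, s2c, scc = dot(a, cc), dot(b, cc), dot(cc, cc)
    det = s11 * s22 - s12 ** 2
    # w = S_GG^{-1} s_Gc using the explicit 2x2 inverse.
    w = ((s22 * s1c - s12 * s2c) / det, (s11 * s2c - s12 * s1c) / det)
    max_eig = (w[0] * s1c + w[1] * s2c) / scc
    return w, max_eig

random.seed(2)
n = 300
c = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical continuous dosages, each correlated with the covariate.
g1 = [0.5 * ci + random.gauss(0, 1) for ci in c]
g2 = [-0.3 * ci + random.gauss(0, 1) for ci in c]

w_star, max_eig = worst_case_weights(g1, g2, c)
print(max_eig, burden_r2(g1, g2, c, w_star), burden_r2(g1, g2, c, (1, 1)))
```

Equal weights, as in a simple allele count, generally give a smaller $R^2$ than the eigenvector weights, so the maximum eigenvalue serves as an upper bound that does not depend on the weighting scheme.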

Classical omnibus tests with ATR

The omnibus null hypothesis that none of the $m$ variants are associated with the trait (conditional on covariates) can be tested with the omnibus/multivariate score, Wald, likelihood ratio, and F tests. As before, these tests are asymptotically equivalent and we consider the score test as an exemplar. Unlike the single-variant case, no deterministic relationship exists between $S_{ATR}$ and $S$ when $m > 1$ (that is, $S_{ATR}$ can take multiple values for any given value of $S$). However, we show that

$$(1 - R^2_{max})S \le S_{ATR} \le (1 - R^2_{min})S$$

where $R^2_{max}$ and $R^2_{min}$ are the maximum and minimum squared canonical correlations between the variants and covariates. Recall that $S$ asymptotically follows a $\chi^2_m(\delta^2)$ distribution with non-centrality parameter $\delta^2 = \frac{1}{\sigma^2}\beta'G'(I_n - H_C)G\beta$. Under the null, the distribution of $S$ depends only on the parameter $\hat\theta = m$. Asymptotically, $S_{ATR}$ follows the same distribution as the random variable $\sum_{i=1}^{p}(1 - R_i^2)Z_i$ where $R_1^2, \ldots, R_p^2$ are the distinct eigenvalues of $R^2_{GC}$ (in decreasing order so that $R_1^2 = R^2_{max}$ and with $p$ possibly smaller than $m$) and the random variables $Z_i$ are mutually independent with $Z_i \sim \chi^2_{\nu_i}(\lambda_i^2)$ and $\sum_{i=1}^{p}\nu_i = m$ (see Appendix). Since $\hat\theta$ is independent of $C$, we have $\hat\theta_{ATR} = \hat\theta$ and p-values for $S_{ATR}$ are calculated assuming a central $\chi^2_m$ distribution.

Note that the score test yields the same power for all effect size vectors $\beta$ such that $\beta'G'(I_n - H_C)G\beta = c$ where $c \ge 0$ is a constant. Although the actual difference in power between $S$ and $S_{ATR}$ depends on the true value of $\beta$, we show that, amongst all $\beta$ that yield the same power for the score test, the ATR analog achieves minimum power when $\beta$ is a scalar multiple of the eigenvector of $R^2_{GC}$ corresponding to the maximum eigenvalue (see Appendix). Here, $\lambda_1^2 = \delta^2$ and $\lambda_i^2 = 0$ for $i = 2, \ldots, p$. Thus, the maximum possible power loss of the ATR analog of the score test (relative to the score test) is completely characterized by the set of canonical correlations between the variants and covariates.

Figure 3 shows, for fixed $\alpha = 5 \times 10^{-8}$ and $m = 10$ variants, the power of the ATR analog across a range of $R^2_{max}$ with effect size chosen to yield 80% power for the omnibus score test. We calculated tail probabilities for the distribution of $S_{ATR}$ using Davies’ method as implemented in the R package CompQuadForm (de Micheaux, P. L., & de Micheaux, M. P. L., 2017). We consider two situations. First, if the remaining canonical correlations are zero, the maximum possible power loss is slightly larger than that for the single-variant case for $m = 10$ and power loss increases as $m$ increases ($m = 100$ shown in Figure 3). Second, if all canonical correlations are equal to $R^2_{max}$, $S_{ATR}$ follows the scaled chi-squared distribution $(1 - R^2_{max})\chi^2_m(\delta^2)$, and the maximum possible power loss is equal to the minimum possible power loss; thus, for a given value of $R^2_{max}$, this constitutes the worst-case scenario for ATR (Figure 3). Note that the maximum number of non-zero canonical correlations cannot exceed $\min(m, k)$. Thus, the second scenario is unlikely to occur in practice.
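The dependence of ATR power on the direction of $\beta$ can be illustrated with a toy Monte Carlo experiment. This is not the calculation behind Figure 3, which uses Davies’ method at $\alpha = 5 \times 10^{-8}$: the sketch below assumes $m = 2$ variants, $\alpha = 0.05$ (so that the exact $\chi^2_2$ critical value $-2\log\alpha$ is available without extra libraries), two distinct eigenvalues of $R^2_{GC}$, and a hypothetical non-centrality $\delta = 3.1$ that gives the score test roughly 80% power. It simulates the asymptotic representation $S_{ATR} \sim (1 - R_1^2)Z_1 + (1 - R_2^2)Z_2$ directly.

```python
import math
import random

def simulate_power(R2_eigs, delta, on_max_eig, alpha=0.05, n_sim=20_000, seed=7):
    """Monte Carlo power of S and S_ATR for m = 2 variants.

    R2_eigs = (R2_max, R2_min) are the two eigenvalues of R2_GC. All of the
    non-centrality delta^2 is placed on the component paired with R2_max
    (on_max_eig=True) or with R2_min (on_max_eig=False).
    Returns (power of S, power of S_ATR).
    """
    rng = random.Random(seed)
    crit = -2.0 * math.log(alpha)      # exact chi-square(2) critical value
    r2_max, r2_min = R2_eigs
    hits_s = hits_atr = 0
    for _ in range(n_sim):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = (e1 + delta) ** 2 if on_max_eig else e1 ** 2
        z2 = e2 ** 2 if on_max_eig else (e2 + delta) ** 2
        hits_s += (z1 + z2) > crit
        hits_atr += ((1 - r2_max) * z1 + (1 - r2_min) * z2) > crit
    return hits_s / n_sim, hits_atr / n_sim

eigs = (0.3, 0.0)    # hypothetical squared canonical correlations
score_worst, atr_worst = simulate_power(eigs, 3.1, on_max_eig=True)
score_best, atr_best = simulate_power(eigs, 3.1, on_max_eig=False)
print(score_worst, atr_worst, score_best, atr_best)
```

The score test has the same power in both configurations up to simulation noise, whereas the ATR analog loses noticeably more power when the effect lies along the eigenvector with the largest eigenvalue.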

Figure 3:

Power of ATR analog of the multi-variant (omnibus) score test (Y-axis) with $m = 10$ (black) and $m = 100$ (red) variants. X-axis shows the maximum canonical correlation between the variants and covariates. Solid line: power when the other canonical correlations are 0. Dashed line: power when other canonical correlations are equal to the maximum correlation. Effect sizes for the set of variants are chosen to yield 80% power for the omnibus score test and minimum power for the ATR analog (see text) with $\alpha = 5 \times 10^{-8}$.

DISCUSSION

The ATR approach is often used in genetic association studies (Randall et al., 2013; UK10K Consortium, 2015; Tachmazidou et al., 2017; Kanai et al., 2018; Styrkarsdottir et al., 2019; Niarchou et al., 2020), and several papers have used simulation to assess its properties at modest significance thresholds (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012). However, to our knowledge no papers have presented analytic evaluations of ATR or considered significance thresholds appropriate for GWAS. The Frisch-Waugh-Lovell theorem (Frisch & Waugh, 1933; Lovell, 2008) demonstrates that when the target of inference is confined to a subset of predictors in the multiple linear regression model (e.g. genetic variants), OLS analysis can be achieved as a two-stage method by regressing the covariate adjusted trait onto the covariate adjusted variants. Thus, the ATR strategy of adjusting the trait but not the variants is formally justified in the context of multiple linear regression only when variants and covariates are uncorrelated.

It may seem that score-tests like those presented above or SKAT employ the same strategy as ATR. Indeed, for single-variant analyses the score-statistic for linear models ($G'Y_r$) is based on the adjusted trait and unadjusted variant. However, the score test-statistic (calculated by squaring the score-statistic and dividing by its estimated variance) does depend on the adjusted variants. Indeed, it can be shown that ATR over-estimates the variance of the score-statistic by a factor of $(1 - R^2)^{-1}$ due to using unadjusted variants in the variance calculation. Our derivations also show that single-variant OLS based inference can be fully recovered from the ATR based inference given the summary statistic $R^2$ for each variant. For multi-variant analyses, the entire $R^2_{GC}$ matrix is required.
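Recovering the full-model single-variant results from ATR output amounts to inverting the relationships given in the Methods. A minimal sketch, assuming the per-variant $R^2$ and the sample size $n$ are available alongside the ATR summary statistics; the function name and the numerical inputs are hypothetical.

```python
def recover_full_from_atr(beta_atr, s_atr, w_atr, R2, n):
    """Invert the single-variant ATR relationships given R2 and n."""
    beta = beta_atr / (1 - R2)
    s = s_atr / (1 - R2)
    # Solve W_ATR = (1 - R2) * W / (1 + R2 * W / n) for W.
    w = w_atr / ((1 - R2) - R2 * w_atr / n)
    return beta, s, w

# Round trip on hypothetical full-model values.
n, R2 = 5_000, 0.08
beta_full, s_full = 0.12, 35.0
w_full = n * s_full / (n - s_full)          # Wald from score in linear models

beta_atr = (1 - R2) * beta_full
s_atr = (1 - R2) * s_full
w_atr = (1 - R2) * w_full / (1 + R2 * w_full / n)

recovered = recover_full_from_atr(beta_atr, s_atr, w_atr, R2, n)
print(recovered)
```

The denominator in the Wald inversion is always positive for $R^2 < 1$, so the mapping is well defined for any attainable ATR statistic.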

For single-variant association tests, previous papers show by computer simulation that ATR is less powerful than the (theoretically justified) two-sided t and Wald tests when the variant is correlated with the covariates (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012; Sofer et al., 2019). We extend previous results by deriving the exact distribution of the ATR analogs for single-variant Wald, likelihood ratio, score, and F tests, and the asymptotic distributions for gene-based burden and score tests, and assessing size and power at significance levels appropriate for GWAS.

For single-variant tests, we show that the loss of power of the ATR method is completely characterized by the coefficient of determination ($R^2$) obtained by regressing the variant onto the covariates, with the power loss increasing with increasing $R^2$. Further, we show that loss of power increases as the p-value cutoff used to declare significance becomes more stringent. Characterizing power loss for the ATR analogs of gene-based tests is more complex. For gene-based score tests, the power loss depends on both the (true) strength of association between each variant and the outcome, and the correlation between each variant and the covariates. Power loss is greater when the subset of variants driving the association is also the subset that is driving the canonical correlation between variants and covariates. For the ATR analogs of the multiple linear regression omnibus test of association, we show that the maximum possible power loss is completely characterized by the canonical correlations between the variants and covariates with maximum power loss increasing with increasing values of any of the canonical correlations. When there is only a single non-zero canonical correlation, the maximum power loss is similar to the single-variant case.

At the significance threshold of $\alpha = 5 \times 10^{-8}$ typically used in GWAS, an $R^2$ of 0.1 results in power decreasing from 80% (for the two-sided t test) to about 71% for the single-variant ATR test. Thus, we recommend that ATR based methods only be used when the $R^2$ for the majority of variants is expected to be substantially less than 0.1. We re-iterate that sets of covariates not associated with the variant do not result in loss of power due to using ATR; in fact, they increase power if they explain some of the trait variance (Robinson & Jewell, 1991). Covariates that are associated with the trait but not genetic variants in a population based sample may be associated with genetic variants in studies that sample participants non-randomly (Munafo et al., 2018; Greenland et al., 1999); for example, two variables that both cause a disease but are independent in a population will be associated in a case-control sample (Monsees et al., 2009).

In GWAS, the most commonly included covariates that are likely to be correlated with a large number of variants are indicators of genetic ancestry (e.g. principal components). The distribution of correlation depends on the degree of population structure in the sample and the mean $R^2$ across variants is (approximately) the sample $F_{ST}$. For intra-continental samples, typically $F_{ST} < 0.05$ but for inter-continental samples it can be $> 0.1$ (The 1000 Genomes Project Consortium, 2015). As a further example, we calculated $R^2$ between ~750,000 genotyped variants and the first 2, 5, and 10 genetic principal components for ~409,000 participants with white-British ancestry in the UK Biobank (details of SNP QC and PCA generation in Bycroft et al., 2018) and found all $R^2$ values were $< 0.05$. In the analysis including the remaining 78,000 non-white participants (total sample size ~487,000), 6% of variants showed $R^2 > 0.05$ and 2.5% showed $R^2 > 0.10$ (the results were similar with 2, 5, and 10 PCs).

Other commonly included covariates that may be correlated with variants are intermediate traits lying in between the gene and primary trait in the causal pathway, and indicators of sample processing or batch effects. For intermediate traits that are genetically complex, values of $R^2$ will typically be much smaller than 0.1. The situation with batch effects is less clear, especially for sequencing data which are sensitive to both sample processing and genotype calling methods. Finally, variants which are known to be associated with the trait may also be included as covariates, especially in fine mapping analyses or while searching for multiple independent signals within the same locus. Here, we recommend against using ATR based methods since there is potentially a large power loss for variants in even moderate linkage disequilibrium with the associated variant.

In multiple-variant tests such as burden and omnibus tests (like the F-test or SKAT), we note that the least-squares effect size estimator for any particular variant may be biased either towards or away from the null for ATR. Thus, although ATR based tests are valid for the omnibus hypothesis that none of the variants are associated, an ATR based test for the conditional effect of a variant given the remaining variants may not be valid. This is of particular importance for post-hoc testing when the omnibus test is rejected and the analyst wishes to identify the subset of variants driving the association. We recommend against using ATR for such purposes.

When the distribution of the trait differs substantially from the normal distribution, ATR based methods are commonly used in conjunction with applying the inverse normal transform to the adjusted trait. Sofer et al. (2019) show that testing for association between the transformed adjusted trait and unadjusted variants may lead to increased Type 1 error and instead recommend using adjusted variants. McCaw et al. (2019) implement an omnibus test with this strategy.

Finally, we have assumed throughout that the multiple linear model (M1) is appropriate to answer the research question at hand and that $\beta$ truly measures the effect of interest. This necessitates including certain covariates (e.g. confounders), excluding others (e.g. colliders; see Greenland et al., 1999) and accounting for sample-selection effects (Munafo et al., 2018). For example, Aschard et al. (2015) show that simply adjusting for heritable covariates may lead to biased estimates of the direct (unmediated) effect of the variant on the trait and may lead to increased Type 1 error. We note that when OLS analysis of the full regression model results in increased Type 1 error, ATR will also be unable to fully control Type 1 error (although the magnitude of the Type 1 error will be lower with increasing $R^2$). Thus, ATR is invalid whenever OLS analysis of the full regression model is invalid.

In summary, we derive distributions of the ATR analogs of commonly used association test statistics. We show that ATR based methods are conservative when variants are correlated with covariates. We quantify the power loss and recommend that ATR based methods be used only when the squared correlation between variants and covariates can be confidently bounded to be substantially smaller than 0.1. We note that for commonly included covariates like age, gender and known or inferred ancestry, this is typically true and ATR based methods will likely result in negligible power loss. However, we reiterate that ATR is an ad-hoc methodology. Thus, we recommend that analysts carefully choose covariates based on a plausible causal model (accounting for sample-selection effects) and employ estimation/hypothesis-testing methods that are theoretically justified for those models.

Acknowledgments

Grant Number: NIH NHGRI HG009976

Appendix

All notation in the Appendix is as defined in the main text.

ATR estimator for β

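The OLS estimator for $\beta$ is given by $\hat\beta = [G'(I_n - H_C)G]^{-1}G'Y_r$ where $Y_r = (I_n - H_C)Y$ is the residual vector obtained from regressing $Y$ onto $C$, and $H_C = C(C'C)^{-1}C'$. Note that $\mathrm{Var}(\hat\beta) = \sigma^2[G'(I_n - H_C)G]^{-1}$. Since ATR simply replaces $Y$ and $C$ with $Y_r$ and $J_n$, we have

$$
\begin{aligned}
\hat\beta_{ATR} &= [G'(I_n - H_1)G]^{-1}G'(I_n - H_1)Y_r \\
&= [G'(I_n - H_1)G]^{-1}G'Y_r \\
&= [G'(I_n - H_1)G]^{-1}[G'(I_n - H_C)G]\hat\beta \\
&= \left(I_m - [G'(I_n - H_1)G]^{-1}[G'(H_C - H_1)G]\right)\hat\beta \\
&\overset{\mathrm{def}}{=} (I_m - R^2_{GC})\hat\beta
\end{aligned}
$$

The second equality holds because $(I_n - H_1)Y_r = Y_r - \bar{Y}_r J_n$ (where $\bar{Y}_r$ is the sample mean of $Y_r$) and $\bar{Y}_r = 0$. The third equality holds because $G'Y_r = [G'(I_n - H_C)G]\hat\beta$ which follows from the expression for $\hat\beta$. The fourth equality follows with straightforward algebra. Note that the eigenvalues of $R^2_{GC} \overset{\mathrm{def}}{=} [G'(I_n - H_1)G]^{-1}G'(H_C - H_1)G$ are the squared canonical correlations between $G$ and $C$. Thus, when each variant is uncorrelated with all the covariates, all the eigenvalues of $R^2_{GC}$ are 0 and $\hat\beta_{ATR} = \hat\beta$.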
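For completeness, one way to write out the algebra behind the fourth equality is to split $I_n - H_C$ into $(I_n - H_1) - (H_C - H_1)$; this is a sketch of the step using only the definitions already given:

```latex
\begin{align*}
G'(I_n - H_C)G &= G'(I_n - H_1)G - G'(H_C - H_1)G, \\
[G'(I_n - H_1)G]^{-1}\,[G'(I_n - H_C)G]
  &= I_m - [G'(I_n - H_1)G]^{-1}\,[G'(H_C - H_1)G]
   = I_m - R^2_{GC}.
\end{align*}
```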

When the model contains only one variant ($m = 1$), we have $[G'(I_n - H_1)G]^{-1}G'(I_n - H_C)G = 1 - R^2$ where $R^2$ is the coefficient of determination obtained by regressing the variant onto the covariates.

Relationship between the score test statistic and its ATR analog

The score test-statistic for testing H0:β=0 is given by

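$$S = \frac{1}{\tilde\sigma^2}\hat\beta'G'(I_n - H_C)G\hat\beta$$

where $\tilde\sigma^2 = \frac{1}{n}Y'(I_n - H_C)Y = \frac{1}{n}Y_r'Y_r$ is the maximum likelihood estimator (MLE) for $\sigma^2$ under the null (Vandaele 1981).

Note that $\tilde\sigma^2_{ATR} = \frac{1}{n}Y_r'(I_n - H_1)Y_r = \frac{1}{n}Y_r'Y_r = \tilde\sigma^2$ since $(I_n - H_1)Y_r = Y_r - \bar{Y}_r J_n = Y_r$. Thus, we have

$$
\begin{aligned}
S_{ATR} &= \frac{1}{\tilde\sigma^2}\hat\beta_{ATR}'G'(I_n - H_1)G\hat\beta_{ATR} \\
&= \frac{1}{\tilde\sigma^2}\hat\beta'[G'(I_n - H_C)G][G'(I_n - H_1)G]^{-1}[G'(I_n - H_C)G]\hat\beta \\
&= \frac{1}{\tilde\sigma^2}\hat\beta'[G'(I_n - H_C)G][I_m - R^2_{GC}]\hat\beta \\
&= S - \frac{1}{\tilde\sigma^2}\hat\beta'[G'(I_n - H_C)G]R^2_{GC}\hat\beta
\end{aligned}
$$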

Equivalently, we have

$$\frac{S_{ATR}}{S} = \frac{\hat\beta'G'(I_n - H_C)G[I_m - R^2_{GC}]\hat\beta}{\hat\beta'G'(I_n - H_C)G\hat\beta}$$

Recall that, for all vectors $x$ such that $x'Bx = c$ (for any constant $c > 0$) the generalized Rayleigh quotient $Q = \frac{x'Ax}{x'Bx}$ is bounded below and above by the minimum and maximum eigenvalues of $B^{-1}A$. Thus, we have

$$(1 - R^2_{max})S \le S_{ATR} \le (1 - R^2_{min})S$$

where $R^2_{min}$ and $R^2_{max}$ are the smallest and largest eigenvalues of $R^2_{GC}$. The lower (upper) bound is attained when $\hat\beta$ is parallel to the eigenvector corresponding to the maximum (minimum) eigenvalue of $R^2_{GC}$. When each variant is orthogonal to each of the covariates we have $R^2_{min} = R^2_{max} = 0$ and $S_{ATR} = S$.

When the model contains only one variant, the above relationship simplifies to the deterministic relationship $S_{ATR} = (1 - R^2)S$ (with $R^2$ as defined previously). For $m > 1$, the relationship is not deterministic (that is, $S_{ATR}$ can take multiple values for any given value of $S$) unless all the variants are collinear. We can use the relationships between the score, Wald, likelihood-ratio, and F test statistics (Vandaele 1981) to derive exact expressions for the relationships between each of these tests and their ATR analogs for single variant models. We state these relationships in the main text (but omit the straightforward algebra).
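As an illustration of the omitted algebra, the Wald relationship can be obtained as follows. The sketch assumes the standard single-constraint linear-model identities $S = nW/(n+W)$ and $W = nS/(n-S)$, applied once to the full model and once to the ATR model:

```latex
\begin{align*}
W_{ATR} &= \frac{n\,S_{ATR}}{n - S_{ATR}}
         = \frac{n(1-R^2)S}{n - (1-R^2)S}, \qquad S = \frac{nW}{n+W}, \\
W_{ATR} &= \frac{(1-R^2)\,nW}{(n+W) - (1-R^2)W}
         = \frac{(1-R^2)\,nW}{n + R^2 W}
         = \frac{(1-R^2)W}{1 + R^2 W/n}.
\end{align*}
```

The likelihood ratio and F relationships follow in the same way from the corresponding identities linking $LR$ and $F$ to $S$.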

Asymptotic distribution of SATR

Asymptotically, $S_{ATR}$ converges in distribution to the distribution of the quadratic form $\hat\beta'A\hat\beta$ with $A = \sigma^{-2}[G'(I_n - H_C)G][G'(I_n - H_1)G]^{-1}[G'(I_n - H_C)G]$. With suitable regularity conditions, asymptotically $\hat\beta \sim N(\mu, V)$ with $\mu = \beta$ and $V = \sigma^2[G'(I_n - H_C)G]^{-1}$. Baldessari (1967) derived the distribution of quadratic forms in multivariate normal variables. Since $A$ is symmetric and $V$ positive definite, there exists an invertible matrix $M$ such that $M'V^{-1}M = I_m$ and $M'AM = \Lambda$ with $\Lambda$ an $m \times m$ diagonal matrix. Thus, we have that $I_m - R^2_{GC} = VA = M\Lambda M^{-1}$; that is, the columns of $M$ are the eigenvectors of $I_m - R^2_{GC}$ (and $R^2_{GC}$) and the $i$th element of the diagonal of $\Lambda$ is $1 - l_i^2$ with $l_i^2$ the eigenvalue of $R^2_{GC}$ corresponding to the $i$th column of $M$. Let $R_j^2$, $1 \le j \le p$, denote the $p \le m$ distinct eigenvalues of $R^2_{GC}$ with $R_1^2 > \cdots > R_p^2$. Let $B_j$ be the $m \times m$ diagonal matrix which has elements 1 where $\Lambda$ has elements $1 - R_j^2$ and 0 otherwise. Then, from Baldessari (1967, Theorem 1) and some trivial algebra, $S_{ATR}$ follows the same distribution as $\sum_{j=1}^{p}(1 - R_j^2)Z_j$, where $Z_j \sim \chi^2_{\nu_j}(\lambda_j^2)$ (that is, a non-central chi-squared distribution with $\nu_j$ degrees of freedom and non-centrality parameter $\lambda_j^2$), $\lambda_j^2 = (M^{-1}\beta)'B_j(M^{-1}\beta)$ and $\nu_j$ is the geometric multiplicity of $R_j^2$.

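Recall that, asymptotically, $S \sim \chi^2_m(\delta^2)$ with $\delta^2 = \beta^{\top}V^{-1}\beta = (M^{-1}\beta)^{\top}(M^{-1}\beta)$. Thus, we have $\sum_{j=1}^{p}\lambda_j^2 = \delta^2$. When $\beta$ lies in the space spanned by the eigenvector(s) of $R^2_{GC}$ corresponding to the (distinct) eigenvalue $R_k^2$, $1 \le k \le p$, we have $\lambda_k^2 = \delta^2$ and $\lambda_i^2 = 0$, $i \ne k$. Consider the set $\Delta$ of vectors $\beta$ that yield the same power for the score test (that is, all vectors $\beta$ for which $\beta^{\top}V^{-1}\beta = \delta^2$ for a given $\delta^2$). Unlike the power of $S$, the power of $S_{\mathrm{ATR}}$ may differ when $\beta$ takes different values in this set. We use a result derived by Mathew and Nordström (1997) to find values in $\Delta$ that lead to minimum power for $S_{\mathrm{ATR}}$: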

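Theorem 3 (Mathew and Nordström, 1997). Let $X_i$ and $Y_i$ be distributed, respectively, as $\chi^2_{\nu_i}(\delta_i^2)$ and $\chi^2_{\nu_i}(\mu_i^2)$, $i = 1, \ldots, n$, with $X_1, \ldots, X_n$ independent and $Y_1, \ldots, Y_n$ independent. Then

$$\sum_{i=1}^{n} \lambda_i X_i \le_D \sum_{i=1}^{n} \lambda_i Y_i$$

holds for all nonnegative $\lambda_i$s satisfying $\lambda_1 \ge \cdots \ge \lambda_n$ if and only if

$$\sum_{i=1}^{k} \delta_i^2 \le \sum_{i=1}^{k} \mu_i^2 \quad \text{for all } k = 1, \ldots, n.$$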

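In the above theorem, $X \le_D Y$ denotes that the random variable $Y$ stochastically dominates $X$. To apply the theorem, order the weights $1 - R_j^2$ from largest to smallest; the weight $1 - R_1^2$ attached to the maximum eigenvalue is then last. When $\beta$ lies in the space spanned by the eigenvectors of $R^2_{GC}$ corresponding to the maximum eigenvalue $R_1^2 = R^2_{\max}$, all of the non-centrality $\delta^2$ falls on this last term, so every partial sum of non-centrality parameters is no larger than the corresponding partial sum for any other $\beta$ in $\Delta$. It follows that the distribution of $S_{\mathrm{ATR}}$ for such $\beta$ is dominated by the distribution of $S_{\mathrm{ATR}}$ when $\beta$ takes any other value in $\Delta$, and hence that this choice of $\beta$ yields the minimum power of the ATR test over $\Delta$.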
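The dominance result can be illustrated by Monte Carlo. The following sketch is a simplified stand-in rather than the authors' computation: it assumes $m = 2$ with distinct eigenvalues (so each $Z_j$ has one degree of freedom), assumes the ATR statistic is compared with the usual $\chi^2_m$ critical value, and uses hypothetical values for $R_1^2$, $R_2^2$, $\delta^2$, and $\alpha$. The function name `atr_power` is ours.

```python
import numpy as np

def atr_power(weights, noncentrality, alpha, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(sum_j w_j * Z_j > critical value), with
    Z_j noncentral chi-squared on 1 df, simulated as (e_j + sqrt(lambda_j^2))^2."""
    rng = np.random.default_rng(seed)    # same seed each call: common random numbers
    weights = np.asarray(weights, dtype=float)
    shift = np.sqrt(np.asarray(noncentrality, dtype=float))
    e = rng.normal(size=(n_draws, weights.size))
    stat = ((e + shift) ** 2) @ weights
    crit = -2.0 * np.log(alpha)          # exact upper quantile of chi-squared(2); m = 2 only
    return np.mean(stat > crit)

R2 = np.array([0.5, 0.1])                # hypothetical R_1^2 (max) and R_2^2 (min)
weights = 1 - R2
delta2, alpha = 20.0, 1e-3               # hypothetical noncentrality and Type 1 error rate

power_beta_on_Rmax = atr_power(weights, [delta2, 0.0], alpha)       # beta along max-eigenvalue direction
power_mixed = atr_power(weights, [delta2 / 2, delta2 / 2], alpha)   # beta split across both directions
power_beta_on_Rmin = atr_power(weights, [0.0, delta2], alpha)       # beta along min-eigenvalue direction
power_full = atr_power([1.0, 1.0], [0.0, delta2], alpha)            # standard score test, S ~ chi2_2(delta2)

print(power_beta_on_Rmax, power_mixed, power_beta_on_Rmin, power_full)
```

Under these assumptions the estimated powers should increase from the first to the last printed value, with the smallest power obtained when $\beta$ is aligned with the maximum eigenvalue.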

Footnotes

DATA AVAILABILITY STATEMENT

Data sharing not applicable – no new data generated.

CONFLICT OF INTEREST STATEMENT

The authors have no conflict of interest to declare.

REFERENCES

  1. 1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature, 526(7571), 68–74.
  2. Aschard H, Vilhjálmsson BJ, Joshi AD, Price AL, & Kraft P (2015). Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. The American Journal of Human Genetics, 96(2), 329–339.
  3. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, ... & Cortes A (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature, 562(7726), 203–209.
  4. Che R, Motsinger-Reif AA, & Brown CC (2012). Loss of power in two-stage residual-outcome regression analysis in genetic association studies. Genetic epidemiology, 36(8), 890–894.
  5. Demissie S, & Cupples LA (2011). Bias due to two-stage residual-outcome regression analysis in genetic association studies. Genetic epidemiology, 35(7), 592–596.
  6. Frisch R, & Waugh FV (1933). Partial time regressions as compared with individual trends. Econometrica: Journal of the Econometric Society, 387–401.
  7. Greenland S, Pearl J, & Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology, 37–48.
  8. Kanai M, Akiyama M, Takahashi A, Matoba N, Momozawa Y, Ikeda M, ... & Kubo M (2018). Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nature genetics, 50(3), 390–400.
  9. Lee S, Abecasis GR, Boehnke M, & Lin X (2014). Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics, 95(1), 5–23.
  10. Lovell MC (2008). A simple proof of the FWL theorem. The Journal of Economic Education, 39(1), 88–91.
  11. Mathew T, & Nordström K (1997). Inequalities for the probability content of a rotated ellipse and related stochastic domination results. The Annals of Applied Probability, 7(4), 1106–1117.
  12. McCaw ZR, Lane JM, Saxena R, Redline S, & Lin X (2019). Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics.
  13. de Micheaux PL, & de Micheaux MPL (2017). Package ‘CompQuadForm’. CRAN Repository.
  14. Monsees GM, Tamimi RM, & Kraft P (2009). Genome-wide association scans for secondary traits using case-control samples. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society, 33(8), 717–728.
  15. Munafò MR, Tilling K, Taylor AE, Evans DM, & Davey Smith G (2018). Collider scope: when selection bias can substantially influence observed associations. International journal of epidemiology, 47(1), 226–235.
  16. Niarchou M, Byrne EM, Trzaskowski M, Sidorenko J, Kemper KE, McGrath JJ, ... & Wray NR (2020). Genome-wide association study of dietary intake in the UK biobank study and its associations with schizophrenia and other traits. Translational Psychiatry, 10(1), 1–11.
  17. Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, Monda KL, ... & Workalemahu T (2013). Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet, 9(6), e1003500.
  18. Robinson LD, & Jewell NP (1991). Some surprising results about covariate adjustment in logistic regression models. International Statistical Review/Revue Internationale de Statistique, 227–240.
  19. Sofer T, Zheng X, Gogarten SM, Laurie CA, Grinde K, Shaffer JR, ... & Lange L (2019). A fully adjusted two-stage procedure for rank-normalization in genetic association studies. Genetic epidemiology, 43(3), 263–275.
  20. Styrkarsdottir U, Stefansson OA, Gunnarsdottir K, Thorleifsson G, Lund SH, Stefansdottir L, ... & Ivarsdottir EV (2019). GWAS of bone size yields twelve loci that also affect height, BMD, osteoarthritis or fractures. Nature communications, 10(1), 1–13.
  21. Tachmazidou I, Süveges D, Min JL, Ritchie GR, Steinberg J, Walter K, ... & McCarthy S (2017). Whole-genome sequencing coupled to imputation discovers genetic signals for anthropometric traits. The American Journal of Human Genetics, 100(6), 865–884.
  22. UK10K consortium. (2015). The UK10K project identifies rare variants in health and disease. Nature, 526(7571), 82–90.
  23. Vandaele W (1981). Wald, likelihood ratio, and Lagrange multiplier tests as an F test. Economics Letters, 8(4), 361–365.
  24. Xing G, Lin CY, & Xing C (2011). A comparison of approaches to control for confounding factors by regression models. Human heredity, 72(3), 194–205.
