Abstract
Multiple linear regression is commonly used to test for association between genetic variants and continuous traits and to estimate genetic effect sizes. Confounding variables are controlled for by including them as additional covariates. An alternative technique that is increasingly used is to regress out covariates from the raw trait and then perform regression analysis with only the genetic variants included as predictors. In the case of single-variant analysis, this adjusted trait regression (ATR) technique is known to be less powerful than the traditional technique when the genetic variant is correlated with the covariates. We extend previous results for single-variant tests by deriving exact relationships between the single-variant score, Wald, likelihood-ratio, and F-test statistics and their ATR analogs. We also derive the asymptotic power of ATR analogs of the multiple-variant score and burden tests. We show that the maximum power loss of the ATR analog of the multiple-variant score test is completely characterized by the canonical correlations between the set of genetic variants and the set of covariates. Further, we show that for both single- and multiple-variant tests, the power loss for ATR analogs increases with increasing stringency of Type 1 error control (smaller $\alpha$) and increasing correlation (or canonical correlations) between the genetic variant (or multiple variants) and covariates. We recommend using ATR only when the maximum canonical correlation between variants and covariates is low, as is typically true.
Keywords: adjusted outcome, power loss, covariates, linear regression, genome-wide association study
INTRODUCTION
Multiple linear regression and the associated ordinary least-squares and F-test methodologies are effective and widely used approaches to test for association between genetic variants and quantitative traits and to estimate genetic effect sizes while controlling for the effects of other variables (covariates). Covariates may be included to account for confounding (e.g. due to population structure or assay batch effects), to reduce trait variability and consequently increase power, or to exclude associations that are driven primarily through the action of the variants on an intermediate trait.
Current genome-wide association studies (GWAS) typically assay hundreds of thousands to millions of genetic variants. Single-variant association tests are performed separately on each variant to test whether the variant is associated with the trait. Multi-variant, gene-, or region-based tests are performed to address the omnibus hypothesis that one or more in a set of variants are associated with the trait. Since the dependent variable and covariates are typically the same across all tests, some analysts use a two-stage approach for quantitative trait GWAS (Randall et al., 2013; UK10K Consortium, 2015; Tachmazidou et al., 2017; Kanai et al., 2018; Styrkarsdottir et al., 2019; Niarchou et al., 2020 are some examples of studies employing this methodology). In the first stage, an ‘adjusted’ trait is obtained as the residuals from the regression of the trait on covariates. In the second stage, association analyses are performed to test for association between the adjusted trait and each variant (or set of variants) without inclusion of other covariates. We term this strategy “adjusted-trait regression (without covariates)” (ATR).
Although ATR can be conceptualized as a two-stage method, we note that it bears no relation to the “two-stage least-squares” method used in structural equations modeling and estimation of causal effects using instrumental variables. We assume that the target of inference is the conditional association between the unadjusted trait and variants given the covariates rather than the association between the adjusted trait and variants unconditional on the covariates. Thus, we view ATR as a numerical technique to conveniently approximate the results that would have been obtained from analysis of the unadjusted trait (with covariates included). The strategy of analyzing a covariate-adjusted trait may be used for any statistical method that deals with linear models, including gene/region based tests like burden or SKAT (Lee et al., 2014) or methods for linear mixed-models.
We have not found any methods papers that recommend the use of ATR. Indeed, the research articles cited above make use of ATR without comment or justification. ATR results are not identical to results obtained from modeling the unadjusted trait along with covariates. Previous investigations of single-variant models showed that the ordinary least-squares ATR estimator of genetic effect is biased towards zero by a factor of $1 - R^2$ (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012), where $R^2$ is the sample coefficient of determination obtained by regressing the genetic variant onto the covariates. These investigations used approximations and simulations to assess power and Type 1 error of the ATR-based tests at modest significance thresholds and showed that ATR is typically less powerful than multiple linear regression when the sample correlation between a genetic variant and covariates is non-zero. More recently, Sofer et al. (2019) showed that the ATR-based single-variant score and multi-variant SKAT test statistics are numerically (deterministically) dominated by the corresponding test statistics obtained from analyzing the unadjusted trait with covariates, leading to less significant p-values and loss of power.
We extend these previous results by deriving the exact relationship between ATR and multiple linear regression score, likelihood ratio, Wald, and F test-statistics for single-variant analysis. We use these relationships to derive (1) the exact finite sample distributions of the ATR test-statistics (hence, exact power and Type 1 error) under the assumption of independent and identically normally distributed errors and (2) the asymptotic relationship between the test-statistics for situations where the assumption is suspect. In addition, we derive the asymptotic distributions of ATR based analogs of two gene/region-based tests: the burden test and the (omnibus) score test, and show that these tests applied in the ATR framework may also suffer from loss of power compared to their multiple linear regression analogs. In particular, we show that the maximum possible power loss for gene-based ATR score tests depends on the maximum canonical correlation between the set of variants and the set of covariates, so that we expect power loss to be modest in typical GWAS with low to moderate population structure.
METHODS AND RESULTS
Definition of the ATR approach
We assume a model of the form:
$$y_i = \sum_{j=1}^{m} g_{ij}\beta_j + \sum_{k=1}^{p} x_{ik}\gamma_k + \varepsilon_i \qquad \text{(M1)}$$
Here $y_i$ is the trait value for the $i$th study participant, $g_{ij}$ the genotype (or genotype-imputation-based dosage) for the $j$th variant for this study participant, $\beta_j$ the effect of the $j$th variant on the trait (conditional on the other $m - 1$ variants and covariates), $x_{ik}$ the value of the $k$th covariate, $\gamma_k$ the (conditional) effect of the $k$th covariate, and $\varepsilon_i$ a random error. We assume the errors are independent and identically distributed across observations with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$. For single-variant models, $m = 1$ and $\beta_1$ is the conditional effect of the variant on the trait given the covariates, but unconditional on any other variant.
The above model can be represented as $y = G\beta + X\gamma + \varepsilon$ where $y$ and $\varepsilon$ are $n \times 1$ vectors, $G$ is an $n \times m$ matrix, $\beta$ is an $m \times 1$ vector, $X$ is an $n \times p$ matrix (including a column of ones for the intercept), and $\gamma$ is a $p \times 1$ vector. We have $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$ where $I_n$ is the n-dimensional identity matrix. We wish to test $H_0: \beta = 0$. Further, we assume that the test statistic has the form $T = T(y, G, X)$. We note that the distribution of $T$ under the null may depend on $G$ and $X$ and on parameters that need to be estimated from the data. We assume that the (possibly estimated) parameter value required to define the distribution of $T$ under the null (for example, degrees of freedom for the F-statistic) also has the form $\theta = \theta(y, G, X)$.
Let . Then is the vector of residuals obtained by regressing onto using ordinary least squares (with ). We define the ATR analog of to be where is the vector of ones denoting the intercept. Further, we assume that the parameter for ATR is calculated as . This definition of the ATR analog implies that inference based on can be performed by using existing software designed for inference with simply by replacing and with and . We note that if the parameter of the null distribution for a method depends on and/or , we may have , and the ATR analog may reference a null distribution that differs from the one used by the unadjusted method to calculate p-values.
Ordinary least-squares estimation with ATR
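The ordinary least-squares estimator of $\beta$ is given by $\hat{\beta} = (\tilde{G}^\top \tilde{G})^{-1} \tilde{G}^\top \tilde{y}$, where $\tilde{y}$ is the adjusted trait and $\tilde{G}$ is the matrix of residuals of variants regressed onto the covariate matrix $X$. This result is often referred to as the Frisch-Waugh-Lovell theorem (Frisch & Waugh, 1933; Lovell, 2008). In the appendix, we show that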
where and . Note that the eigenvalues of are the sample canonical correlations between the set of genetic variants and the set of covariates. In particular, (the zero matrix) if and only if every genetic variant is uncorrelated with all covariates. Further, we have and, consequently, if and only if none of the genetic variants are associated with the trait (conditional on covariates). Thus, any test that is valid for testing the omnibus hypothesis is also valid for testing .
In the case of single-variant analysis , the above relationship simplifies to and we recover the result obtained previously (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012; Sofer et al., 2019). Thus, for single-variant analysis, the ATR ordinary least-squares estimator can only be biased towards the null. This is not true for individual elements of when . Indeed, is a linear combination of all the elements of the vector . In particular, does not necessarily imply that . Thus, a test that is valid for is not necessarily valid for (unless all remaining elements of are also ).
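The attenuation relationship described in this section can be checked numerically. The following is a minimal sketch on simulated data, not the authors' code. The names `G`, `X`, `P`, and `R`, the simulated effect sizes, and the construction of `R` as $(G_c^\top G_c)^{-1} G_c^\top P_X G_c$ (with $G_c$ the column-centered variants and $P_X$ the projection onto the covariate space) are assumptions made for illustration. Under this assumed construction, the eigenvalues of `R` are the squared sample canonical correlations between variants and covariates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 500, 3, 4  # participants, variants, non-intercept covariates (illustrative)

# Covariate matrix with an intercept column; variants are made correlated with covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
G = 0.5 * X[:, 1:m + 1] + rng.normal(size=(n, m))
y = G @ np.array([0.3, -0.2, 0.1]) + X @ rng.normal(size=p + 1) + rng.normal(size=n)

def ols(design, response):
    """Least-squares coefficients of response on design."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

# Full model: trait regressed on variants and covariates together.
beta_full = ols(np.column_stack([G, X]), y)[:m]

# ATR: residualize the trait on covariates, then regress on intercept + variants only.
P = X @ np.linalg.solve(X.T @ X, X.T)  # projection onto the covariate space
y_adj = y - P @ y
beta_atr = ols(np.column_stack([np.ones(n), G]), y_adj)[1:]

# Assumed form of the attenuation matrix R.
Gc = G - G.mean(axis=0)
R = np.linalg.solve(Gc.T @ Gc, Gc.T @ P @ Gc)
predicted = (np.eye(m) - R) @ beta_full

print(np.allclose(beta_atr, predicted))
print(np.sort(np.linalg.eigvals(R).real))
```

With a single variant (`m = 1`), `R` reduces to the scalar $R^2$ from regressing the variant on the covariates, which gives the familiar attenuation factor $1 - R^2$.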
Single-variant association testing with ATR
Xing et al. (2011) showed that where is the Wald test statistic. Che et al. (2012) refined an approximation proposed by Demissie and Cupples (2011) for the F test statistic () to where is the sample squared correlation between and and is the F statistic. Xing et al. (2011) and Che et al. (2012) used simulations to estimate power and Type 1 error rate for .
We show that , where is the score test statistic for the above linear model when . For linear models, the test statistics for the score, Wald, likelihood ratio, and F tests bear simple, deterministic relationships to each other (Vandaele 1981). Combining with these known relationships yields the following set of equalities:
where $T_{LR}$ denotes the likelihood ratio test statistic. We see that the score, Wald, and likelihood ratio test statistics are always strictly greater than their ATR analogs if $R^2 > 0$ and equal to them if $R^2 = 0$. P-values for the score, Wald, and likelihood ratio tests are standardly computed assuming the test statistics follow a chi-square distribution with 1 degree of freedom ($\chi^2_1$ distribution). The ATR analogs of these methods also assume this same distribution and are less powerful than their counterparts if $R^2 > 0$.
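The scaling of the single-variant score statistic by $1 - R^2$ can be illustrated on simulated data. This is a sketch under stated assumptions rather than the authors' implementation: the score statistic is taken in its usual Lagrange-multiplier form $n(\mathrm{RSS}_0 - \mathrm{RSS}_1)/\mathrm{RSS}_0$, and the helper names `rss` and `score_stat` as well as the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
g = 0.6 * X[:, 1] + rng.normal(size=n)  # variant correlated with one covariate
y = 0.2 * g + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

def rss(design, response):
    """Residual sum of squares from least-squares regression."""
    resid = response - design @ np.linalg.lstsq(design, response, rcond=None)[0]
    return float(resid @ resid)

def score_stat(null_design, variant, response):
    """Score statistic n * (RSS0 - RSS1) / RSS0 for adding `variant` to the null model."""
    rss0 = rss(null_design, response)
    rss1 = rss(np.column_stack([null_design, variant]), response)
    return len(response) * (rss0 - rss1) / rss0

ones = np.ones((n, 1))
y_adj = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # covariate-adjusted trait

t_full = score_stat(X, g, y)         # unadjusted trait, covariates in the model
t_atr = score_stat(ones, g, y_adj)   # adjusted trait, intercept only
r2 = 1.0 - rss(X, g) / rss(ones, g)  # R^2 of the variant regressed on covariates

print(t_full, t_atr, (1.0 - r2) * t_full)
```

The last two printed values agree up to floating-point error, which is the deterministic relationship between the score statistic and its ATR analog in the single-variant case.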
In contrast, if and the ATR analog of the F-test uses the F-distribution with and degrees of freedom while the F-test assumes a distribution with and degrees of freedom; in this case, since the denominator degrees of freedom depends on the number of covariates. Thus, the ATR analog of the F-test may be slightly anti-conservative if and/or the number of covariates is large relative to the sample size. This is quite unlikely given the large sample sizes of current GWAS, the large values of the test statistic required to reject the null, and the fact that the expected value of the sample coefficient of determination increases with increasing number of predictors, even when the variant is independent of the predictors at the population level, in which case for large samples.
For a fixed number of covariates, the score, Wald, likelihood ratio, and F test statistics asymptotically converge to the same random variable (almost surely) under the null and local alternatives ( i.e. when the effect size tends to zero asymptotically). Similarly, their ATR analogs each converge to . Asymptotically, each of the ATR test statistics follows a scaled distribution whose scaling factor is less than or equal to one and are, thus, conservative when . The exact finite sample distribution of the F statistic is known in the case where errors are normally distributed; the exact distributions of all the other test statistics can be derived easily given the above relationships.
For simplicity, we illustrate the conservative nature of ATR for single-variant tests under asymptotic conditions. Here, we have $T_{\mathrm{ATR}} = (1 - R^2)\,T$, where $T$ is the score test statistic and $T_{\mathrm{ATR}}$ its ATR analog. The relationship between the p-values generated by the score test and its ATR analog is non-linear; the ATR test becomes more conservative as the p-value threshold for declaring significance ($\alpha$) becomes more stringent. Figure 1 shows power of the ATR test with $R^2 = 0.05$ across a range of $\alpha$ values, where the effect size for each $\alpha$ value is chosen to yield 80% power for the score test. At the usual GWAS threshold of $\alpha = 5 \times 10^{-8}$, the power of the ATR test is about 76%. Figure 2 shows how, for fixed $\alpha$, the ATR test becomes less powerful as $R^2$ increases (again, with effect size chosen to yield 80% power for the score test).
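The asymptotic power calculation described above can be sketched with the Python standard library alone. The sketch assumes that the score statistic follows a non-central chi-square distribution with 1 degree of freedom (equivalently, the square of a unit-variance normal variable with mean `delta`) and that the ATR statistic equals the score statistic multiplied by $1 - R^2$. The function names and the $R^2$ values passed in are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sided(delta, z_crit):
    """P(|Z| > z_crit) for Z ~ N(delta, 1): power of a 1-df chi-square test."""
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)

def delta_for_power(target, z_crit, lo=0.0, hi=50.0):
    """Bisection for the mean shift (Z scale) that gives the target power."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if power_two_sided(mid, z_crit) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def atr_power(r2, alpha=5e-8, target=0.80):
    """Asymptotic power of the ATR score test when the score test has `target` power."""
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2.0)
    delta = delta_for_power(target, z_crit)
    # ATR rejects when (1 - r2) * T > z_crit**2, i.e. |Z| > z_crit / sqrt(1 - r2).
    return power_two_sided(delta, z_crit / (1.0 - r2) ** 0.5)

for r2 in (0.0, 0.05, 0.10):
    print(r2, round(atr_power(r2), 3))
```

Passing a less stringent `alpha` (for example 0.05) to `atr_power` shows the smaller power loss at modest significance thresholds.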
Figure 1:
Power of ATR analog of single-variant score test when $R^2 = 0.05$ with varying stringency of statistical significance ($\alpha$) displayed on the negative log ten scale. Effect sizes vary as a function of $\alpha$ to yield 80% power for the score test.
Figure 2:
Power of ATR analog of single-variant score test with increasing $R^2$ for fixed $\alpha$. The effect size was chosen to yield 80% power for the score test.
Burden tests with ATR
The relationships derived for the single-variant tests are directly applicable to burden tests. Burden tests typically assume the same multiple linear regression model presented in the previous section with replaced by where are genetic variants (columns of are weights (and ), and is the (weighted) burden of alternate alleles (or genotype imputation-based dosages) from the variants. For burden tests, is the sample coefficient of determination obtained by regressing onto . Given and , the maximum possible value for is obtained when the weight vector is a scalar multiple of the eigenvector of corresponding to the maximum eigenvalue and the maximum is equal to the maximum eigenvalue.
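The claim that the burden-test $R^2$ is maximized by the leading eigenvector can be checked with a short simulation. This is a sketch under assumptions: the matrix whose eigenvalues are used is taken to be $(G_c^\top G_c)^{-1} G_c^\top P_X G_c$ (centered variants $G_c$, covariate projection $P_X$), the data are simulated, and any sign or normalization constraint on the weights is ignored.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 600, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
G = 0.3 * X[:, 1:] @ rng.normal(size=(3, m)) + rng.normal(size=(n, m))

P = X @ np.linalg.solve(X.T @ X, X.T)  # projection onto the covariate space
Gc = G - G.mean(axis=0)
A = Gc.T @ P @ Gc  # cross-products explained by the covariates
B = Gc.T @ Gc      # total centered cross-products

def burden_r2(w):
    """R^2 from regressing the burden G @ w onto the covariates."""
    return float(w @ A @ w) / float(w @ B @ w)

# Symmetric form of the generalized eigenproblem A w = lambda B w.
L_inv = np.linalg.inv(np.linalg.cholesky(B))
eigvals, eigvecs = np.linalg.eigh(L_inv @ A @ L_inv.T)  # ascending order
w_star = L_inv.T @ eigvecs[:, -1]                       # weights attaining the maximum

random_r2 = [burden_r2(rng.normal(size=m)) for _ in range(1000)]
print(burden_r2(w_star), eigvals[-1], max(random_r2))
```

The first two printed values coincide, and no randomly drawn weight vector exceeds them, in line with the Rayleigh-quotient argument.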
Classical omnibus tests with ATR
The omnibus null hypothesis that none of the variants are associated with the trait (conditional on covariates) can be tested with the omnibus/multivariate score, Wald, likelihood ratio, and F tests. As before, these tests are asymptotically equivalent and we consider the score test as an exemplar. Unlike the single-variant case, no deterministic relationship exists between the score test statistic $T$ and its ATR analog $T_{\mathrm{ATR}}$ when $m > 1$ (that is, $T_{\mathrm{ATR}}$ can take multiple values for any given value of $T$). However, we show that
where and are the maximum and minimum canonical correlations between the variants and covariates. Recall that asymptotically follows a distribution with non-centrality parameter . Under the null, the distribution of depends only on the parameter . Asymptotically, follows the same distribution as the random variable where are the distinct eigenvalues of (in decreasing order so that and with possibly smaller than ) and the random variables are mutually independent with (see Appendix). Since is independent of , we have and p-values for are calculated assuming a central distribution.
Note that the score test yields the same power for all effect size vectors such that where is a constant. Although the actual difference in power between and depends on the true value of , we show that, amongst all that yield the same power for the score test, the ATR analog achieves minimum power when is a scalar multiple of the eigenvector of corresponding to the maximum eigenvalue (see Appendix). Here, and for . Thus, the maximum possible power loss of the ATR analog of the score test (relative to the score test) is completely characterized by the set of canonical correlations between the variants and covariates.
Figure 3 shows, for fixed and variants, the power of ATR analog across a range of with effect size chosen to yield power for the omnibus score test. We calculated tail probabilities for the distribution of using Davies’ method as implemented in the R package CompQuadForm (de Micheaux, P. L., & de Micheaux, M. P. L., 2017). We consider two situations. First, if the remaining canonical correlations are zero, the maximum possible power loss is slightly larger than that for the single-variant case for and power loss increases as increases ( shown in Figure 3). Second, if all canonical correlations are equal to follows the scaled chi-squared distribution , and the maximum possible power loss is equal to the minimum possible power loss; thus, for a given value of , this constitutes the worst-case scenario for ATR (Figure 3). Note that the maximum number of non-zero canonical correlations cannot exceed . Thus, the second scenario is unlikely to occur in practice.
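The second scenario above (all canonical correlations equal) admits a closed-form power calculation, because the ATR statistic is then a scaled non-central chi-square variable and no mixture of chi-squares is needed. The sketch below uses SciPy rather than Davies' method. The function name, the number of variants `m`, the significance level, and the squared canonical correlation `rho2` are illustrative assumptions, not values taken from Figure 3.

```python
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def omnibus_atr_power(rho2, m, alpha=5e-8, target=0.80):
    """Asymptotic ATR omnibus power when every squared canonical correlation is rho2.

    The non-centrality parameter is chosen so that the unadjusted omnibus
    score test (m degrees of freedom) has power `target` at level `alpha`.
    """
    crit = chi2.isf(alpha, m)  # critical value of the central chi-square
    ncp = brentq(lambda d: ncx2.sf(crit, m, d) - target, 1e-8, 500.0)
    # T_ATR is distributed as (1 - rho2) * chi2_m(ncp); reject when T_ATR > crit.
    return ncx2.sf(crit / (1.0 - rho2), m, ncp)

for rho2 in (0.0, 0.05, 0.10):
    print(rho2, round(float(omnibus_atr_power(rho2, m=5)), 3))
```

Setting `m=1` recovers the single-variant calculation. The first scenario, with one non-zero canonical correlation, requires the distribution of a weighted sum of chi-square variables and is not covered by this sketch.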
Figure 3:
Power of ATR analog of the multi-variant (omnibus) score test (Y-axis) with (black) and (red) variants. X-axis shows the maximum canonical correlation between the variants and covariates. Solid line: power when the other canonical correlations are 0. Dashed line: power when other canonical correlations are equal to the maximum correlation. Effect sizes for the set of variants are chosen to yield 80% power for the omnibus score test and minimum power for the ATR analog (see text) with .
DISCUSSION
The ATR approach is often used in genetic association studies (Randall et al., 2013; UK10K Consortium, 2015; Tachmazidou et al., 2017; Kanai et al., 2018; Styrkarsdottir et al., 2019; Niarchou et al., 2020), and several papers have used simulation to assess its properties at modest significance thresholds (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012). However, to our knowledge no papers have presented analytic evaluations of ATR or considered significance thresholds appropriate for GWAS. The Frisch-Waugh-Lovell theorem (Frisch & Waugh, 1933; Lovell, 2008) demonstrates that when the target of inference is confined to a subset of predictors in the multiple linear regression model (e.g. genetic variants), OLS analysis can be achieved as a two-stage method by regressing the covariate adjusted trait onto the covariate adjusted variants. Thus, the ATR strategy of adjusting the trait but not the variants is formally justified in the context of multiple linear regression only when variants and covariates are uncorrelated.
It may seem that score-tests like those presented above or SKAT employ the same strategy as ATR. Indeed, for single-variant analyses the score-statistic for linear models () is based on the adjusted trait and unadjusted variant. However, the score test-statistic (calculated by squaring the score-statistic and dividing by its estimated variance) does depend on the adjusted variants. Indeed, it can be shown that ATR over-estimates the variance of the score-statistic by a factor of due to using unadjusted variants in the variance calculation. Our derivations also show that single-variant OLS based inference can be fully recovered from the ATR based inference given the summary statistic for each variant. For multi-variant analyses, the entire matrix is required.
For single-variant association tests, previous papers show by computer simulation that ATR is less powerful than the (theoretically justified) two-sided t and Wald tests when the variant is correlated with the covariates (Demissie & Cupples, 2011; Xing et al., 2011; Che et al., 2012; Sofer et al., 2019). We extend previous results by deriving the exact distribution of the ATR analogs for single-variant Wald, likelihood ratio, score, and F tests, and the asymptotic distributions for gene-based burden and score tests, and assessing size and power at significance levels appropriate for GWAS.
For single-variant tests, we show that the loss of power of the ATR method is completely characterized by the coefficient of determination ($R^2$) obtained by regressing the variant onto the covariates, with the power loss increasing with increasing $R^2$. Further, we show that loss of power increases as the p-value cutoff used to declare significance becomes more stringent. Characterizing power loss for the ATR analogs of gene-based tests is more complex. For gene-based score tests, the power loss depends on both the (true) strength of association between each variant and the outcome, and the correlation between each variant and the covariates. Power loss is greater when the subset of variants driving the association is also the subset that is driving the canonical correlation between variants and covariates. For the ATR analogs of the multiple linear regression omnibus test of association, we show that the maximum possible power loss is completely characterized by the canonical correlations between the variants and covariates with maximum power loss increasing with increasing values of any of the canonical correlations. When there is only a single non-zero canonical correlation, the maximum power loss is similar to the single-variant case.
At the significance threshold of $\alpha = 5 \times 10^{-8}$ typically used in GWAS, an $R^2$ of 0.1 results in power decreasing from 80% (for the two-sided t test) to about 71% for the single-variant ATR test. Thus, we recommend that ATR based methods only be used when the $R^2$ for the majority of variants is expected to be substantially less than 0.1. We reiterate that sets of covariates not associated with the variant do not result in loss of power due to using ATR; in fact, they increase power if they explain some of the trait variance (Robinson & Jewell, 1991). Covariates that are associated with the trait but not genetic variants in a population based sample may be associated with genetic variants in studies that sample participants non-randomly (Munafo et al., 2018; Greenland et al., 1999); for example, two variables that both cause a disease but are independent in a population will be associated in a case-control sample (Monsees et al., 2009).
In GWAS, the most commonly included covariates that are likely to be correlated with a large number of variants are indicators of genetic ancestry (e.g. principal components). The distribution of correlation depends on the degree of population structure in the sample and the mean across variants is (approximately) the sample . For intra-continental samples, typically but for inter-continental samples it can be [The 1000 Genomes Project Consortium, 2015]. As a further example, we calculated between ~750,000 genotyped variants and the first 2, 5, and 10 genetic principal components for ~409,000 participants with white-British ancestry in the UK Biobank (details of SNP QC and PCA generation in Bycroft et al., 2018) and found all values were < 0.05. In the analysis including the remaining 78,000 non-white participants (total sample size ~487,000), 6% of variants showed and 2.5% showed (the results were approximately similar with 2, 5, and 10 PCs).
Other commonly included covariates that may be correlated with variants are intermediate traits lying in between the gene and primary trait in the causal pathway, and indicators of sample processing or batch effects. For intermediate traits that are genetically complex, values of $R^2$ will typically be much smaller than 0.1. The situation with batch effects is less clear, especially for sequencing data which are sensitive to both sample processing and genotype calling methods. Finally, variants which are known to be associated with the trait may also be included as covariates, especially in fine mapping analyses or while searching for multiple independent signals within the same locus. Here, we recommend against using ATR based methods since there is potentially a large power loss for variants in even moderate linkage disequilibrium with the associated variant.
In multiple-variant tests such as burden and omnibus tests (like the F-test or SKAT), we note that least-squares effect size estimator for any particular variant may be biased either towards or away from the null for ATR. Thus, although ATR based tests are valid for the omnibus hypothesis that none of the variants are associated, an ATR based test for the conditional effect of a variant given the remaining variants may not be valid. This is of particular importance for post-hoc testing when the omnibus test is rejected and the analyst wishes to identify the subset of variants driving the association. We recommend against using ATR for such purposes.
When the distribution of the trait differs substantially from the normal distribution, ATR based methods are commonly used in conjunction with applying the inverse normal transform to the adjusted trait. Sofer et al. (2019) show that testing for association between the transformed adjusted trait and unadjusted variants may lead to increased Type 1 error and instead recommend using adjusted variants. McCaw et al. (2019) implement an omnibus test with this strategy.
Finally, we have assumed throughout that the multiple linear model (M1) is appropriate to answer the research question at hand and that $\beta$ truly measures the effect of interest. This necessitates including certain covariates (e.g. confounders), excluding others (e.g. colliders; see Greenland et al., 1999) and accounting for sample-selection effects (Munafo et al., 2018). For example, Aschard et al. (2015) show that simply adjusting for heritable covariates may lead to biased estimates of the direct (unmediated) effect of the variant on the trait and may lead to increased Type 1 error. We note that when OLS analysis of the full regression model results in increased Type 1 error, ATR will also be unable to fully control Type 1 error (although the magnitude of Type 1 error will be lower with increasing $R^2$). Thus, ATR is invalid whenever OLS analysis of the full regression model is invalid.
In summary, we derive distributions of the ATR analogs of commonly used association test statistics. We show that ATR based methods are conservative when variants are correlated with covariates. We quantify the power loss and recommend that ATR based methods be used only when the squared correlation between variants and covariates can be confidently bounded to be substantially smaller than 0.1. We note that for commonly included covariates like age, gender and known or inferred ancestry, this is typically true and ATR based methods will likely result in negligible power loss. However, we reiterate that ATR is an ad-hoc methodology. Thus, we recommend that analysts carefully choose covariates based on a plausible causal model (accounting for sample-selection effects) and employ estimation/hypothesis-testing methods that are theoretically justified for those models.
Acknowledgments
Grant Number: NIH NHGRI HG009976
Appendix
All notation in the Appendix is as defined in the main text.
ATR estimator for
The OLS estimator for is given by where is the residual vector obtained from regressing onto , and . Note that . Since ATR simply replaces with , we have
The second equality holds because (where is the sample mean of ) and . The third equality holds because which follows from the expression for . The fourth equality follows with straightforward algebra. Note that the eigenvalues of are the canonical correlations between and . Thus, when each variant is uncorrelated with all the covariates, all the eigenvalues of are and .
When the model contains only one variant ($m = 1$), we have $\hat{\beta}_{\mathrm{ATR}} = (1 - R^2)\,\hat{\beta}$ where $R^2$ is the coefficient of determination obtained by regressing the variant onto the covariates.
Relationship between the score test statistic and its ATR analog
The score test-statistic for testing is given by
where is the maximum likelihood estimator (MLE) for under the null (Vandaele 1981).
Note that since . Thus, we have
Equivalently, we have
Recall that, for all vectors such that (for any constant ) the generalized Rayleigh quotient is bounded below and above by the minimum and maximum eigenvalues of . Thus, we have
where and are the smallest and largest eigenvalues of . The lower (upper) bound is attained when is parallel to the eigenvector corresponding to maximum (minimum) eigenvalue of . When each variant is orthogonal to each of the covariates we have and .
When the model contains only one variant, the above relationship simplifies to the deterministic relationship (with as defined previously). For , the relationship is not deterministic (that is, can take multiple values for any given value of ) unless all the variants are collinear. We can use the relationships between the score, Wald, likelihood-ratio, and F test statistics (Vandaele 1981) to derive exact expressions for the relationships between each of these tests and their ATR analogs for single variant models. We state these relationships in the main text (but omit the straightforward algebra).
Asymptotic distribution of
Asymptotically, converges in distribution to the distribution of the quadratic form with . With suitable regularity conditions, asymptotically with . Baldessari (1967) derived the distribution of quadratic forms in multivariate normal variables. Since is symmetric and positive definite, there exists an invertible matrix such that and with an diagonal matrix. Thus, we have that ; that is, the columns of are the eigenvectors of (and ) and the element of the diagonal of is with the eigenvalue of corresponding to the column of . Let denote the distinct eigenvalues of with . Let be the diagonal matrix which has elements where has elements and otherwise. Then, from Baldessari (1967, Theorem 1) and some trivial algebra, follows the same distribution as , where (that is, a non-central chi-squared distribution with degrees of freedom and non-centrality parameter and is the geometric multiplicity of .
Recall that, asymptotically, with . Thus, we have . When lies in the space spanned by the eigenvector(s) of corresponding to the (distinct) eigenvalue , we have and . Consider the set of vectors that yield the same power for the score test (that is, all vectors for which for a given ). Unlike , the power of may differ when takes different values in this set. We use a result derived by Mathew and Nordström (1997) to find values in that lead to minimum power for :
Theorem 3 (Mathew and Nordström, 1997). Let and be distributed, respectively, as and , , with independent and independent. Then
holds for all nonnegative ’s satisfying if and only if
In the above theorem, denotes that the random variable stochastically dominates . From the above theorem and preceding details of the distribution of , it follows that distribution followed by when lies in the space spanned by the eigenvectors of corresponding to the maximum eigenvalue is dominated by the distribution followed by when takes any other value in .
Footnotes
DATA AVAILABILITY STATEMENT
Data sharing not applicable – no new data generated.
CONFLICT OF INTEREST STATEMENT
The authors have no conflict of interest to declare.
REFERENCES
- 1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature, 526(7571), 68–74.
- Aschard H, Vilhjálmsson BJ, Joshi AD, Price AL, & Kraft P (2015). Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. The American Journal of Human Genetics, 96(2), 329–339.
- Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, ... & Cortes A (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature, 562(7726), 203–209.
- Che R, Motsinger-Reif AA, & Brown CC (2012). Loss of power in two-stage residual-outcome regression analysis in genetic association studies. Genetic epidemiology, 36(8), 890–894.
- Demissie S, & Cupples LA (2011). Bias due to two-stage residual-outcome regression analysis in genetic association studies. Genetic epidemiology, 35(7), 592–596.
- Frisch R, & Waugh FV (1933). Partial time regressions as compared with individual trends. Econometrica: Journal of the Econometric Society, 387–401.
- Greenland S, Pearl J, & Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology, 37–48.
- Kanai M, Akiyama M, Takahashi A, Matoba N, Momozawa Y, Ikeda M, ... & Kubo M (2018). Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nature genetics, 50(3), 390–400.
- Lee S, Abecasis GR, Boehnke M, & Lin X (2014). Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics, 95(1), 5–23.
- Lovell MC (2008). A simple proof of the FWL theorem. The Journal of Economic Education, 39(1), 88–91.
- Mathew T, & Nordström K (1997). Inequalities for the probability content of a rotated ellipse and related stochastic domination results. The Annals of Applied Probability, 7(4), 1106–1117.
- McCaw ZR, Lane JM, Saxena R, Redline S, & Lin X (2019). Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics.
- de Micheaux PL, & de Micheaux MPL (2017). Package ‘CompQuadForm’. CRAN Repository.
- Monsees GM, Tamimi RM, & Kraft P (2009). Genome-wide association scans for secondary traits using case-control samples. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society, 33(8), 717–728.
- Munafò MR, Tilling K, Taylor AE, Evans DM, & Davey Smith G (2018). Collider scope: when selection bias can substantially influence observed associations. International journal of epidemiology, 47(1), 226–235.
- Niarchou M, Byrne EM, Trzaskowski M, Sidorenko J, Kemper KE, McGrath JJ, ... & Wray NR (2020). Genome-wide association study of dietary intake in the UK biobank study and its associations with schizophrenia and other traits. Translational Psychiatry, 10(1), 1–11.
- Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, Monda KL, ... & Workalemahu T (2013). Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet, 9(6), e1003500.
- Robinson LD, & Jewell NP (1991). Some surprising results about covariate adjustment in logistic regression models. International Statistical Review/Revue Internationale de Statistique, 227–240.
- Sofer T, Zheng X, Gogarten SM, Laurie CA, Grinde K, Shaffer JR, ... & Lange L (2019). A fully adjusted two-stage procedure for rank-normalization in genetic association studies. Genetic epidemiology, 43(3), 263–275.
- Styrkarsdottir U, Stefansson OA, Gunnarsdottir K, Thorleifsson G, Lund SH, Stefansdottir L, ... & Ivarsdottir EV (2019). GWAS of bone size yields twelve loci that also affect height, BMD, osteoarthritis or fractures. Nature communications, 10(1), 1–13.
- Tachmazidou I, Süveges D, Min JL, Ritchie GR, Steinberg J, Walter K, ... & McCarthy S (2017). Whole-genome sequencing coupled to imputation discovers genetic signals for anthropometric traits. The American Journal of Human Genetics, 100(6), 865–884.
- UK10K consortium. (2015). The UK10K project identifies rare variants in health and disease. Nature, 526(7571), 82–90.
- Vandaele W (1981). Wald, likelihood ratio, and Lagrange multiplier tests as an F test. Economics Letters, 8(4), 361–365.
- Xing G, Lin CY, & Xing C (2011). A comparison of approaches to control for confounding factors by regression models. Human heredity, 72(3), 194–205.



