To the Editor: Day et al. present a striking example of how adjusting for heritable covariates that are correlated with the outcome in a genetic association study can bias genetic effect estimates, possibly creating a strong association where genotype has no causal effect on the outcome. This bias was the focus of our recent report.¹ As Day et al. point out, this phenomenon is a special case of the broader concept of collider-stratification bias.²,³ They chose an outcome that is not associated with autosomal variation (gender) but is strongly associated with the covariate (height). With it, Day et al. demonstrate that collider-stratification bias can lead to many very strong false-positive signals. Their results also nicely confirm the direct relationship between gene-covariate effect estimates and gene-outcome effect estimates after adjustment for the covariate. The most significant p value after adjustment, for rs724016 ($P_{\text{adj}} = 7 \times 10^{-90}$), is impressive. However, it could have been anticipated without running the analysis. The most recent genome-wide association study (GWAS) meta-analysis of height reported a Z score of −26.9 ($p = 3.2 \times 10^{-158}$) for the association between rs724016 and height, in a sample of 252,972 individuals.⁴ We assume that the effect of the SNP on height is similar in UK Biobank and GIANT (Genetic Investigation of ANthropometric Traits) consortium participants. We also assume a correlation of 0.71 between height and gender, as derived from the 2013–2014 NHANES (National Health and Nutrition Examination Survey) data for non-Hispanic white individuals older than 20 years.⁵ Under these assumptions, the expected chi-square in a height-adjusted analysis under the null hypothesis is approximately $\chi^2_{\text{adj}} \approx Z_{GC}^2 \, \frac{N_{\text{UKB}}}{N_{\text{GIANT}}} \, \frac{\rho^2}{1-\rho^2}$. Here $Z_{GC} = -26.9$ is the SNP-height Z score, $\rho = 0.71$ is the height-gender correlation, $N_{\text{GIANT}} = 252{,}972$, and $N_{\text{UKB}}$ is the sample size of the Day et al. analysis. As in the Day et al. experiment, this corresponds to a highly significant p value of $3 \times 10^{-92}$.
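The arithmetic above can be checked with a short Python sketch. It assumes a specific approximation for a SNP with no effect on the outcome and a small effect on height. Under that approximation, the adjusted chi-square equals the squared SNP-height Z score, rescaled by the ratio of sample sizes, times ρ²/(1−ρ²). The function names and the `n_ratio` parameter are illustrative, not part of any published tool. The UK Biobank sample size of the Day et al. analysis is not given in this letter, so the example uses equal sample sizes (`n_ratio = 1`). That gives a much stronger expected signal than $3 \times 10^{-92}$, and the smaller UK Biobank sample presumably accounts for the difference.

```python
import math


def expected_adjusted_chi2(z_gc: float, rho: float, n_ratio: float = 1.0) -> float:
    """Approximate expected 1-df chi-square for a SNP in the model outcome ~ SNP + covariate.

    Assumptions (illustrative approximation): the SNP has no effect on the outcome,
    a small effect on the covariate (z_gc is its SNP-covariate Z score), and the
    outcome and covariate have correlation rho. n_ratio rescales from the size of
    the SNP-covariate study to the size of the adjusted analysis.
    """
    return z_gc ** 2 * n_ratio * rho ** 2 / (1.0 - rho ** 2)


def chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail p value of a 1-df chi-square statistic, P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Values quoted in the letter: GIANT Z score for rs724016 and the NHANES height-gender correlation.
z_height = -26.9
rho_height_sex = 0.71

# Equal sample sizes (n_ratio = 1); the UK Biobank-to-GIANT ratio is not stated here.
chi2_same_n = expected_adjusted_chi2(z_height, rho_height_sex)
print(f"expected chi2 (same N): {chi2_same_n:.1f}")
print(f"p value: {chi2_1df_pvalue(chi2_same_n):.1e}")
```

Passing a sample-size ratio below 1 scales the expected chi-square down linearly. The same two functions therefore apply to any adjusted study once its size relative to GIANT is known.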
Interestingly, the height-sex correlation is almost exactly the value at which two significances match: that of the signal in the adjusted analysis and that of the SNP-covariate association test, assuming the same sample size in both analyses. With a stronger correlation, the adjusted analysis would, on average, yield a more significant association than the SNP-covariate test itself. In that case, the SNP-covariate association p value would no longer be a good indicator of the potential bias, which is a worrisome situation.
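The threshold follows from a short derivation. It assumes that, for a SNP with no effect on the outcome and a small effect on the covariate, the adjusted chi-square is approximately the SNP-covariate chi-square multiplied by ρ²/(1−ρ²), with equal sample sizes in both analyses:

$$
\frac{\chi^2_{\text{adj}}}{\chi^2_{GC}} \approx \frac{\rho^2}{1-\rho^2} = 1
\quad\Longleftrightarrow\quad \rho^2 = \tfrac{1}{2}
\quad\Longleftrightarrow\quad |\rho| = \tfrac{1}{\sqrt{2}} \approx 0.707
$$

Because ρ²/(1−ρ²) increases with |ρ|, any correlation stronger than about 0.707 makes the expected adjusted signal exceed the SNP-covariate signal. That is the regime described above. The observed height-sex correlation of 0.71 sits just above this threshold.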
Although the example of Day et al. confirms the potential for collider bias, it relies on an underlying causal model that is understood well enough for the genetic effects of the variants to be safely interpreted. In practice, however, the underlying mechanism is generally unknown, or at least subject to debate. For some investigators, the question of whether to adjust for a covariate therefore remains open. Taking height as an example, it is intuitive to adjust for height when studying phenotypes that are proportional to height. These include pulmonary function, bone mineral density, intracranial volume, and body measurements in general. The usual argument for adjustment is that it focuses the analysis on factors associated with the outcome independently of height, in other words, factors that alter this proportionality. Adjustment partly achieves this goal in two ways:

(1) it reduces the signal at SNPs with positive pleiotropy (i.e., SNPs positively associated with both height and the outcome);
(2) it enhances the signal for variants associated with the primary outcome only, and for those with negative pleiotropic effects.

However, because height is highly polygenic, adjustment also (3) induces a false signal at all genetic variants associated with height only, as observed in the experiment by Day et al. Estimating the genetic correlation between the outcome and the covariate, e.g., as done by Bulik-Sullivan et al.,⁶ can give a first approximation of the proportion of SNPs falling into scenarios (1), (2), and (3). That estimate can then inform the decision of whether or not to adjust.
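The three scenarios above can be illustrated with a hedged Python sketch. It uses standardized variables and population correlations, and computes the expected SNP Z score in the unadjusted model Y ~ G and in the adjusted model Y ~ G + C (ordinary least squares). The function names, the correlation values, and N = 100,000 are hypothetical choices for illustration, not data from Day et al. or our report. Scenario (3) is modeled as in the Day et al. example, where the SNP is marginally uncorrelated with the outcome (r_gy = 0).

```python
import math


def marginal_z(r_gy: float, n: int) -> float:
    """Expected Z for the SNP in the unadjusted model Y ~ G (standardized variables)."""
    return r_gy * math.sqrt(n) / math.sqrt(1.0 - r_gy ** 2)


def adjusted_z(r_gy: float, r_gc: float, rho: float, n: int) -> float:
    """Expected Z for the SNP in the covariate-adjusted model Y ~ G + C.

    r_gy: marginal correlation between SNP G and outcome Y
    r_gc: correlation between SNP G and covariate C (e.g., height)
    rho:  correlation between outcome Y and covariate C
    """
    denom = 1.0 - r_gc ** 2
    beta_g = (r_gy - rho * r_gc) / denom                            # partial slope of Y on G given C
    r2 = (r_gy ** 2 + rho ** 2 - 2.0 * r_gy * rho * r_gc) / denom   # R^2 of Y ~ G + C
    se = math.sqrt((1.0 - r2) / (n * denom))                        # OLS standard error of beta_g
    return beta_g / se


# Hypothetical settings: outcome-covariate correlation 0.5, sample size 100,000.
RHO, N = 0.5, 100_000
scenarios = {
    "(1) positive pleiotropy": (0.02, 0.02),
    "(2) outcome only": (0.02, 0.0),
    "(2) negative pleiotropy": (0.02, -0.02),
    "(3) covariate only": (0.0, 0.02),
}
for label, (r_gy, r_gc) in scenarios.items():
    print(f"{label:26s} unadjusted Z = {marginal_z(r_gy, N):6.2f}  "
          f"adjusted Z = {adjusted_z(r_gy, r_gc, RHO, N):6.2f}")
```

With these illustrative values, adjustment roughly halves the Z score in scenario (1). It increases the Z score for both scenario (2) variants. In scenario (3), it turns a zero marginal signal into an adjusted Z of about −3.65. If instead the covariate causally drives the outcome, a covariate-only SNP would have r_gy = ρ·r_gc, and the adjusted Z would be zero.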
References
1. Aschard H., Vilhjálmsson B.J., Joshi A.D., Price A.L., Kraft P. Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. Am. J. Hum. Genet. 2015;96:329–339. doi: 10.1016/j.ajhg.2014.12.021.
2. Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology. 2003;14:300–306.
3. Greenland S., Pearl J., Robins J.M. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
4. Wood A.R., Esko T., Yang J., Vedantam S., Pers T.H., Gustafsson S., Chu A.Y., Estrada K., Luan J., Kutalik Z., Electronic Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE Consortium, LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 2014;46:1173–1186. doi: 10.1038/ng.3097.
5. Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data, 2013–2014. http://www.cdc.gov/nchs/nhanes/nhanes_questionnaires.htm.
6. Bulik-Sullivan B., Finucane H.K., Anttila V., Gusev A., Day F.R., Loh P.R., Duncan L., Perry J.R., Patterson N., Robinson E.B., ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.