Mendelian randomization (MR) can be variously dated as 67,1 33,2 28,3 or 16,4 (amongst others) years old. It is clear that in recent times there has been an exponential increase in publications on MR, both theoretical and applied.5,6 In addition to papers focused on MR, MR analyses are increasingly seen in the set of follow-up analyses for a genome-wide association study (GWAS), together with the obligatory bioinformatic and functional evidence. The International Journal of Epidemiology compiled a special issue on MR to accompany the second international MR conference in 2015.7 The present issue repeats this for the fourth international conference. The speed of progress in the field is reflected both in the content of the issues and in the fact that the volume of material now exceeds the capacity of a single issue. Further compilations of MR papers will appear in subsequent issues.
MR Made (Too?) Easy
The initial extended exposition of MR in 2003 included examples of two-sample MR,4 but the rise in popularity of the two-sample approach came with the development of methods for MR using publicly available summary data.8–10 It is now increasingly easy to perform an MR analysis without personally collecting any data. This is not necessarily a negative point, as such analyses have several advantages: they are able to use large data resources for many disease outcomes, and they can be replicated by anyone having access to the same data. However, it is also increasingly easy to perform an MR analysis without any critical thought. The formula (conduct a genome-wide association study, take all genome-wide significant variants, perform a two-sample MR analysis) can certainly increase one’s publication count. But can such an analysis be considered a contribution to the scientific literature, when it could have been (and probably has been11) performed already by a machine in a large automated pipeline for large numbers of risk factors and outcomes?
In this editorial, we discuss how subject-specific knowledge and methodological advances can elevate MR analyses from being simply the result of a recipe, and prevent papers being published that are essentially a compendium of numbers with little insight into causal mechanisms.
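To make concrete how little is needed to follow the two-sample recipe described above, here is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimate computed from summary statistics alone. The function name `ivw_estimate` and all numbers are illustrative assumptions, not taken from any study; the sketch assumes uncorrelated variants and ignores uncertainty in the variant-exposure associations.

```python
import math

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate from summarized data (hypothetical helper).

    beta_x: variant-exposure associations
    beta_y: variant-outcome associations
    se_y:   standard errors of the variant-outcome associations
    Assumes uncorrelated variants and treats beta_x as known without error.
    """
    # Weighted regression of beta_y on beta_x through the origin,
    # with weights 1 / se_y^2.
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)

# Made-up summary statistics for three variants (illustration only).
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.05, 0.10, 0.075]
se_y = [0.01, 0.02, 0.015]

estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate: {estimate:.3f} (SE {std_error:.3f})")
```

That the whole calculation fits in a dozen lines is the point: the arithmetic is trivial, so the value of an analysis has to come from the choice of variants and the interpretation.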
Hundreds of Variants
MR investigations often use all available genetic variants that are associated with a given trait.12 While this is understandable from the perspective of power, as datasets increase in size and ever more associated variants are discovered, power is becoming less of a crucial factor. The law of diminishing returns means that additional variants will explain less and less variance in the risk factor of interest, and so the additional power from including these variants in the analysis is often not justified, particularly given what is becoming known about pleiotropy (discussed below).
A potential advantage of using ever more variants is the scope to apply more elaborate robust methods. However, whether it is better to use more variants and more complex methods, or a more curated set of variants and simpler methods, is still up for debate. For example, the recently proposed latent causal variable method presented the novel finding that body mass index (BMI) was not a clear cause of type 2 diabetes.13 Although a positive result was observed on exclusion of outliers from the analysis, this finding indicates that caution is needed in interpreting the results from such methods, and highlights the importance of data visualization.
Hundreds of Risk Factors
One approach that can potentially help is to use data on additional risk factors. Such data can be incorporated in many ways: to identify pleiotropic variants and exclude them from the analysis,14 to adjust for the potential effects of competing risk factors (such as in multivariable MR15,16), to identify genetic variants having a specific mechanism of association with the risk factor,17 or to find commonalities between clusters of variants having similar associations with the outcome.18 Such approaches are particularly important when there is heterogeneity between causal estimates based on different variants, and can help clarify why the heterogeneity occurs.19 Few MR analyses have a clean and unambiguous interpretation. Heterogeneity is not a complete barrier to causal inference, and provides an opportunity to understand why certain variants are associated (or more strongly associated) with the outcome than others.
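As a rough illustration of how heterogeneity between variant-specific estimates can be quantified, the sketch below computes Cochran's Q statistic on the per-variant ratio estimates, together with each variant's contribution to it. The function name `cochran_q` and the data are hypothetical; first-order weights are assumed, and under homogeneity Q is approximately chi-squared with (number of variants minus 1) degrees of freedom.

```python
def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q for variant-specific ratio estimates (hypothetical helper).

    Uses first-order weights (beta_x / se_y)^2, i.e. ignores uncertainty
    in the variant-exposure associations.
    Returns (Q, per-variant contributions, pooled IVW estimate).
    """
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [(bx / s) ** 2 for bx, s in zip(beta_x, se_y)]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    contributions = [w * (r - pooled) ** 2 for w, r in zip(weights, ratios)]
    return sum(contributions), contributions, pooled

# Made-up data: three concordant variants and one discordant variant.
beta_x = [0.1, 0.1, 0.1, 0.1]
beta_y = [0.05, 0.05, 0.05, 0.15]
se_y = [0.01, 0.01, 0.01, 0.01]

q_stat, contribs, pooled = cochran_q(beta_x, beta_y, se_y)
largest = max(range(len(contribs)), key=lambda j: contribs[j])
print(f"Q = {q_stat:.2f}; largest contribution from variant index {largest}")
```

A large individual contribution flags a variant for closer inspection; it is a prompt to ask why that variant differs, not by itself a licence to discard it.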
A related statistical approach to MR is colocalization20 (and the related HEIDI test21). This assesses whether the same variants are driving the genetic associations with the risk factor and with the outcome (suggesting a causal effect of the risk factor on the outcome), or whether there are different variants driving the two associations (suggesting that the association is spurious or pleiotropic). Colocalization is particularly important to test when the MR analysis is based on a single gene region, such as for a protein risk factor using cis-acting variants.22
Universal Pleiotropy
Pleiotropy was recognized as a potential obstacle to MR from the early days,4 and was invoked particularly in response to MR investigations that downplayed the causal relevance of a favoured putative causal biomarker.23 Later the key distinction between vertical and horizontal pleiotropy was drawn,24 and it was recognized that much of the extensive pleiotropy observed across the human genome represented the former category, which is the essence of MR (i.e. genetic associations represent the downstream influence of a biological pathway) rather than being a problem. Ever larger studies detected variants with smaller and smaller effects on multiple phenotypes, and Sewall Wright’s notion of universal pleiotropy25 was reinvented as the omnigenic model,26 in which peripheral genes exert their effect on downstream traits through regulating core genes. When formalized,27 such a model can be operationalized in a multivariable MR framework, offering a way to test its veracity.
In some MR settings, there is an unresolved tension between using an (often single) cis-variant as an instrument for a proximal phenotype (transcription, methylation, or protein, for example), which has face validity but does not allow for sensitivity analyses dependent on multiple instruments, and using trans- as well as cis-variants and performing such sensitivity analyses. In some settings it is clear that in the latter case the variants are influencing the exposure of interest through another phenotype. For example, a GWAS of C-reactive protein (CRP) identified several variants that primarily influence BMI as genome-wide significant hits for CRP,28 because BMI influences CRP.29 These would clearly be invalid instruments for an MR study of the causal effects of CRP, and would also violate the assumptions of many of the sensitivity analyses (e.g. the InSIDE assumption of MR-Egger). If the phenotypes are recognized, multivariable MR could resolve this situation, but there are likely to be unrecognized phenotypes in many such situations.
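The CRP and BMI example can be sketched numerically. Below is a minimal two-exposure multivariable IVW calculation: a weighted regression of variant-outcome associations on the associations with both exposures, with no intercept. The function name `mvmr_ivw`, the variant effects and the assumed true effects (BMI 0.4, CRP 0) are all made up for illustration, and the variants are assumed to be uncorrelated.

```python
def mvmr_ivw(beta_x1, beta_x2, beta_y, se_y):
    """Two-exposure multivariable IVW estimate (hypothetical helper).

    Solves the 2x2 weighted least-squares normal equations for a
    regression of beta_y on beta_x1 and beta_x2 through the origin,
    with weights 1 / se_y^2. Returns (theta1, theta2).
    """
    w = [1.0 / s ** 2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, beta_x1))
    s22 = sum(wi * b * b for wi, b in zip(w, beta_x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, beta_x1, beta_x2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, beta_x1, beta_y))
    s2y = sum(wi * b * y for wi, b, y in zip(w, beta_x2, beta_y))
    det = s11 * s22 - s12 ** 2
    theta1 = (s22 * s1y - s12 * s2y) / det
    theta2 = (s11 * s2y - s12 * s1y) / det
    return theta1, theta2

# Made-up data: variants 0-1 affect CRP directly; variants 2-3 affect BMI
# and hence CRP. The outcome is generated with a BMI effect of 0.4 and
# no CRP effect.
beta_crp = [0.20, 0.10, 0.05, 0.15]
beta_bmi = [0.00, 0.00, 0.10, 0.30]
beta_out = [0.4 * b for b in beta_bmi]
se_out = [0.01, 0.01, 0.01, 0.01]

# Univariable IVW for CRP using all four variants is misleadingly non-zero.
naive_crp = (sum(a * y for a, y in zip(beta_crp, beta_out))
             / sum(a * a for a in beta_crp))
theta_crp, theta_bmi = mvmr_ivw(beta_crp, beta_bmi, beta_out, se_out)
print(f"Univariable CRP estimate: {naive_crp:.3f}")
print(f"Multivariable estimates: CRP {theta_crp:.3f}, BMI {theta_bmi:.3f}")
```

In this toy setting the adjustment works only because BMI has been measured and included; an unrecognized phenotype would leave the bias in place.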
Biobanks and the Race to the Bottom
In addition to summary statistics, another fertile source of data for MR is biobanks, such as UK Biobank, a longitudinal population-based cohort study.30 While UK Biobank is rightly praised for its open data policies, this can also lead to investigators racing each other to be the first to publish an investigation using this powerful dataset. Unfortunately, hastily performed MR investigations are not likely to be the most reliable. Journals should be aware that just because a previous investigation has been performed does not mean that the last word has been spoken on that particular epidemiological question. Even if future investigators have access to exactly the same data, a superior analysis may be possible by being more careful about how the analysis is conducted.
Epidemiological Methodology
Traditionally, causal inference has relied on epidemiological design rather than sophisticated statistical methodology. In contrast, the trend in MR has been to rely on statistical methodology to provide robust causal inferences. Increasingly the combination of MR with epidemiologically powerful and sometimes novel designs is being seen. Cross-generational studies in genetics are an example of this. Liu et al.31 used parental disease outcomes to proxy for disease risk in their offspring (genome-wide analysis by proxy; GWAX). This design was originally conceived to maximize case numbers for diseases of old age but has the additional advantage of avoiding selection bias,32 as it is unlikely that a parent’s cause of death would influence whether offspring data are available for analysis. MR studies are now extending this design to estimate causal effects of phenotypes on offspring outcomes, for example demonstrating that maternal smoking does lower offspring birthweight, but has little influence on other offspring health outcomes.33 In this issue of The International Journal of Epidemiology, Evans et al.34 develop a novel approach to inferring the effects of parental exposures on offspring outcomes within a structural equation modelling (SEM) framework. A potentially powerful combination of the strengths of twin and MR study designs, also through SEM, has recently been proposed.35
Other proposals to incorporate study design in MR include subset analyses,36 use of statistical interactions37 and sibling-based investigations.38 More convincing MR investigations can be performed when the structure of the data facilitates the analysis strategy.
Conclusions
In short, bigger and faster and even more extensive is not necessarily better. Automated procedures for performing analyses have a place for MR, whether performed in a high-throughput manner, or by a well-meaning human researcher trying to follow best practice (Figure 1). However, every epidemiological question is different and requires thought as to how to curate the data and the analysis plan to produce the most reliable inference. Maybe one day machine learning will have solved how to triangulate evidence from different sources, but for now MR is still a field where humans can have an edge over machines.
Figure 1. Humans and automation: how to find a harmonious balance? Image credit: Modern Times © Roy Export S.A.S.
Footnotes
Stephen Burgess: 0000-0001-5365-8760
George Davey Smith: 0000-0002-1407-8314
References
- 1. Fisher R. Statistical methods in genetics. Heredity. 1952;6:1–12.
- 2. Katan M. Apolipoprotein E isoforms, serum cholesterol and cancer. Lancet. 1986;1:507–08. doi: 10.1016/s0140-6736(86)92972-7.
- 3. Gray R, Wheatley K. How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transpl. 1991;7(Suppl 3):9–12.
- 4. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070.
- 5. Hartwig FP, Davies NM, Hemani G, Davey Smith G. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. Int J Epidemiol. 2016;45:1717–726. doi: 10.1093/ije/dyx028.
- 6. Koellinger PD, de Vlaming R. Mendelian randomization: The challenge of unobserved environmental confounds. Int J Epidemiol. 2019. doi: 10.1093/ije/dyz138.
- 7. Burgess S, Timpson NJ, Ebrahim S, Davey Smith G. Mendelian randomization: where are we now and where are we going? Int J Epidemiol. 2015;44:379–88. doi: 10.1093/ije/dyv108.
- 8. The International Consortium for Blood Pressure Genome-Wide Association Studies. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011;478:103–09. doi: 10.1038/nature10405.
- 9. Burgess S, Butterworth AS, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37:658–65. doi: 10.1002/gepi.21758.
- 10. Bowden J, Del Greco F, Minelli C, et al. Improving the accuracy of two-sample summary data Mendelian randomization: moving beyond the NOME assumption. Int J Epidemiol. 2019. doi: 10.1093/ije/dyy258.
- 11. Hemani G, Bowden J, Haycock PC, et al. Automating Mendelian randomization through machine learning to construct a putative causal map of the human phenome. bioRxiv. 2017. doi: 10.1101/173682.
- 12. Thrift AP, Gong J, Peters U, et al. Mendelian randomization study of height and risk of colorectal cancer. Int J Epidemiol. 2015;44:662–72. doi: 10.1093/ije/dyv082.
- 13. O’Connor L, Price A. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nat Genet. 2018;50:1728–734. doi: 10.1038/s41588-018-0255-0.
- 14. Corbin LJ, Richmond RC, Wade KH, et al. Body mass index as a modifiable risk factor for type 2 diabetes: Refining and understanding causal estimates using Mendelian randomisation. Diabetes. 2016;65:3002–007. doi: 10.2337/db16-0418.
- 15. Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol. 2015;181:251–60. doi: 10.1093/aje/kwu283.
- 16. Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single sample and two-sample summary data settings. Int J Epidemiol. 2019. doi: 10.1093/ije/dyy262.
- 17. Walter S, Kubzansky LD, Koenen KC, et al. Revisiting Mendelian randomization studies of the effect of body mass index on depression. Am J Med Genet. 2015;168:108–15. doi: 10.1002/ajmg.b.32286.
- 18. Burgess S, Foley CN, Allara E, Staley JR, Howson JM. A robust and efficient method for Mendelian randomization with hundreds of genetic variants: unravelling mechanisms linking HDL-cholesterol and coronary heart disease. bioRxiv. 2019. doi: 10.1101/566851.
- 19. Bowden J, Hemani G, Davey Smith G. Detecting individual and global horizontal pleiotropy in Mendelian randomization: a job for the humble heterogeneity statistic? Am J Epidemiol. 2018;187:2681–685. doi: 10.1093/aje/kwy185.
- 20. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14:483–95. doi: 10.1038/nrg3461.
- 21. Zhu Z, Zhang F, Hu H, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48:481–87. doi: 10.1038/ng.3538.
- 22. Zheng J, Haberland V, Baird D, et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. bioRxiv. 2019. doi: 10.1101/627398.
- 23. Ridker P, Paynter N, Danik J, Glynn R. Interpretation of Mendelian randomization studies and the search for causal pathways in atherothrombosis: the need for caution. Metab Syndr Relat Disord. 2010;8:465–69. doi: 10.1089/met.2010.0071.
- 24. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23:R89–98. doi: 10.1093/hmg/ddu328.
- 25. Wright S. Evolution and the Genetics of Populations. Vol 1: Genetics and Biometric Foundations. Chicago IL: University of Chicago Press; 1968.
- 26. Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169:1177–186. doi: 10.1016/j.cell.2017.05.038.
- 27. Liu X, Li YI, Pritchard JK. Trans effects on gene expression can drive omnigenic inheritance. Cell. 2019;177:1022–034. doi: 10.1016/j.cell.2019.04.014.
- 28. Ligthart S, Vaez A, Võsa U, et al. Genome analyses of >200,000 individuals identify 58 loci for chronic inflammation and highlight pathways that link inflammation and complex disorders. Am J Hum Genet. 2018;103:691–706. doi: 10.1016/j.ajhg.2018.09.009.
- 29. Timpson N, Nordestgaard B, Harbord R, et al. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. Int J Obes. 2011;35:300–08. doi: 10.1038/ijo.2010.137.
- 30. Sudlow C, Gallacher J, Allen N, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779.
- 31. Liu JZ, Erlich Y, Pickrell JK. Case-control association mapping by proxy using family history of disease. Nat Genet. 2017;49:325–31. doi: 10.1038/ng.3766.
- 32. Gkatzionis A, Burgess S. Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? Int J Epidemiol. 2019. doi: 10.1093/ije/dyy202.
- 33. Yang Q, Millard LAC, Davey Smith G. Proxy gene-by-environment Mendelian randomization study confirms a causal effect of maternal smoking on offspring birthweight, but little evidence of long-term influences on offspring health. bioRxiv. 2019. doi: 10.1101/601443.
- 34. Evans DM, Moen GH, Hwang LD, Lawlor DA, Warrington NM. Elucidating the role of maternal environmental exposures on offspring health and disease using two-sample Mendelian randomization. Int J Epidemiol. 2019. doi: 10.1093/ije/dyz019.
- 35. Minică CC, Dolan CV, Boomsma DI, de Geus E, Neale MC. Extending causality tests with genetic instruments: An integration of Mendelian randomization with the classical twin design. Behav Genet. 2018;48:337–49. doi: 10.1007/s10519-018-9904-4.
- 36. van Kippersluis H, Rietveld CA. Pleiotropy-robust Mendelian randomization. Int J Epidemiol. 2018;47:1279–88. doi: 10.1093/ije/dyx002.
- 37. Spiller W, Slichter D, Bowden J, Davey Smith G. Detecting and correcting for bias in Mendelian randomization analyses using gene-by-environment interactions. Int J Epidemiol. 2019. doi: 10.1093/ije/dyy202.
- 38. Brumpton B, Sanderson E, Hartwig FP, et al. Within-family studies for Mendelian randomization: avoiding dynastic, assortative mating, and population stratification biases. bioRxiv. 2019. doi: 10.1101/602516.

