Abstract
Mendelian randomization (MR) makes causal claims by treating genetic variation in an analogous way to randomization in a clinical trial. MR investigations can be viewed as analogous to a randomized encouragement design, in that genetic variants do not determine the precise level of an exposure, but increase liability to it. As such, an MR estimate typically does not represent an achievable or well-defined causal effect in terms of the exposure, as it reflects the impact of a life-long shift in the trajectory of the exposure, which likely differs between individuals. We advocate for MR investigations to be performed to assess evidence for a causal hypothesis, rather than to estimate a well-defined causal quantity. MR estimates are useful to combine evidence across genetic variants, to assess the validity of variants as instruments, to provide confidence intervals, and to compare estimates across outcomes. However, numerical estimates from MR should not be over-interpreted. The value of an MR investigation is not to quantify the magnitude of effect for a well-defined intervention in the exposure. Instead, it provides a distinct source of evidence to increase or decrease confidence in a causal hypothesis, which can be triangulated with evidence from other sources.
Background
Modern approaches to causal inference in epidemiology tend to emphasize the use of formal frameworks to express and estimate well-defined causal effects (1,2). A causal effect is well-defined when it is a clear and unambiguous comparison between different counterfactual states (3). Causal inference for effects which are not well-defined can be hampered by ambiguities about the nature of an exposure or intervention, its timing, and the population under investigation. If an investigator wants to estimate a precise causal quantity (the precise target of estimation is known as an “estimand”), expressing the quantity in terms of a target trial helps clarify unspoken assumptions in the analysis of data that can lead to bias if ignored (1). However, this is not the only valid paradigm for causal inference. Historical approaches to causal inference were less reliant on formalism, and tended to emphasize study design [e.g. Bradford Hill 1948 (4)] or the qualitative synthesis of diverse types of evidence [e.g. Bradford Hill 1965 (5)] to justify causal claims.
Mendelian randomization (MR) was introduced as a way of exploiting the random allocation of genetic variants to infer the causal effect of an exposure on a given health outcome using observational data (6–8). Its initial exposition effectively argued that a valid MR investigation requires each selected genetic variant to satisfy the three core instrumental variable (IV) assumptions (6), but did not use the language of instrumental variables. Later researchers formalized these assumptions using the terminology of instrumental variables (9,10): that the variant is associated with the exposure of interest, that there are no confounding pathways between the variant and outcome, and that the variant can only affect the outcome via the exposure. An association between a genetic variant which is a valid instrumental variable for the exposure of interest and the outcome implies that the exposure has a causal effect on the outcome (11–13).
Methods for the estimation of causal effects using instrumental variables, developed in the field of econometrics over the past century, have been adopted by the epidemiological community (14). Indeed, MR has been defined by some as simply “instrumental variable analysis with genetic instruments” (11). In most cases, MR investigations have provided estimates from formal instrumental variable methods (15). In occasional cases, MR investigations have simply reported on the presence and direction of association of genetic variants with the outcome, rather than providing an estimate (16,17). For example, genetic variants that modify the metabolism of carcinogens present in tobacco smoke inhalation have been used to assess the effects of smoking on bladder cancer risk (18). In this case, it would not be possible to provide a numerical estimate as the variant does not associate with any smoking behaviour, and measures of metabolism rate for these carcinogens were not available. Indeed, the trend towards estimation is symptomatic of a wider shift in MR from biologically-justified analyses to statistically-driven analyses applied without biological consideration (19,20).
We argue that MR investigations can provide useful information even though the assumptions required for the literal interpretation of causal point estimates expressed in terms of a unit change in the exposure are implausible (21). Specifically, a conventional MR analysis tests if a lifelong predisposition to greater exposure levels is associated with an increased (or decreased) risk of the outcome. As such, MR investigations can be motivated as using genetic variants to assess a causal hypothesis rather than to estimate a practically-relevant causal parameter.
MR can be viewed as analogous to an encouragement design in which people are randomized to an intervention which impacts their liability to elevated or lowered levels of an exposure (Figure 1). In any randomized trial, the causal effect is well-defined with respect to allocation to the intervention. However, unless the exposure regimes in the intervention and control arms of the trial are precisely defined and strictly enforced such that they are homogeneous across individuals, the causal effect cannot be so easily interpreted as a well-defined causal effect in terms of the exposure.
Figure 1.
The analogy between an MR analysis for the effect of smoking and a hypothetical randomized encouragement design using social support to incentivize smoking cessation. In an encouragement design, people are not directly randomized to an exposure regimen (e.g. smoking a fixed number of cigarettes per day), but instead to some external factor (e.g. social support) which influences liability to the exposure. This is similar to an MR investigation where people are “randomized” to a genetic variant that influences levels of an exposure, rather than setting the exposure to a particular value. In both designs, the causal contrast is well-defined with respect to the intervention (i.e. social support or allele assignment). However, additional assumptions are required to interpret estimates in terms of a change in the exposure (i.e. smoking). An intention-to-treat type effect in terms of the intervention can be estimated, but a per-protocol effect in terms of the exposure cannot typically be estimated.
Likewise, if the instrumental variable assumptions hold, the MR estimate based on a genetic variant is well defined with respect to allocation to that genetic variant, and represents the effect of changing an individual’s genotype at conception, or equivalently changing their exposure trajectory from conception in the same way as the genetic variant changes the exposure (same mechanism of intervention, same timing, and so on). But such a change likely differs from the intervention on the exposure to be implemented in practice. Using the language of Pearl’s ladder of causal inference (22), the MR estimate is defined in terms of counterfactuals (third rung), but the intervention to achieve those counterfactuals is not practical (second rung). Hence, rather than predict the quantitative impact of a specific intervention, the role of an MR estimate is to increase or decrease our qualitative confidence in an exposure as a causal risk factor for the outcome.
While we are not the first to advocate the use of MR for hypothesis testing rather than estimation (12), initial proposals suggested simply reporting the genetic association with the outcome, which is feasible with a single genetic variant, but overly simplistic when there are multiple genetic variants. Equally, there is strong advice in the epidemiological literature discouraging the reporting of results as p-values without accompanying estimates and confidence intervals (23). MR estimates represent the causal effect of a shift in lifetime liability to the exposure, although the specific magnitude and timing of the shift will likely differ between individuals (24). As such, our proposal is that MR investigations report an estimate with an appropriate measure of statistical uncertainty, but do not take an overly literal interpretation of the precise numerical estimate. We suggest that MR investigations follow existing guidelines when reporting MR estimates (23,25).
In the next section, we outline some reasons why MR estimates should not be interpreted too literally. We then discuss motivations for causal hypothesis testing. In the final section, we explore some practical implications of our arguments.
The interpretation of Mendelian randomization estimates
Formal statistical assumptions for causal estimation
Two quantities that are typically targeted in instrumental variable estimation are the average causal effect, representing the population average difference in the outcome resulting from an intervention in the exposure; and the local average causal effect, representing the average difference in the outcome resulting from an intervention in the exposure among those whose value of the exposure is influenced by the instrumental variable (26). Estimation of the average causal effect requires a homogeneity assumption; a sufficient condition is that the effect of the instrument on the exposure is constant in the population (27). One challenge to the homogeneity assumption is the existence of genotype-exposure interactions (28). Estimation of the local average causal effect requires the monotonicity assumption; the effect of the instrument on the exposure is in the same direction for all individuals in the population. However, since we cannot know whose exposure was changed by the instrument, local treatment effects are applicable to an unknown subgroup and not necessarily translatable to any applied setting (29). Causal bounds can be used as an alternative to monotonicity and homogeneity (30), but these are often too imprecise to be useful in practice. Interpreting estimates also requires assuming linearity of the exposure effect on the outcome (11,31). If the MR investigation is conducted in a two-sample setting, then transportability of estimates between the two datasets is required (32). The formal statistical assumptions for causal estimation are stringent, and often implausible.
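For orientation, these quantities can be written out in the simplest textbook case of a binary instrument $Z$ and a binary exposure $X$, with potential outcomes $Y^{x}$ and potential exposures $X^{z}$. The notation and the binary simplification are assumptions made here for illustration only; exposures in MR are usually continuous, and the text above does not commit to this notation.

```latex
\begin{align*}
\text{Average causal effect:} \quad
  & \mathrm{ACE} = E\left[ Y^{x=1} - Y^{x=0} \right] \\
\text{Local average causal effect:} \quad
  & \mathrm{LACE} = E\left[ Y^{x=1} - Y^{x=0} \mid X^{z=1} > X^{z=0} \right] \\
\text{Monotonicity:} \quad
  & X^{z=1} \geq X^{z=0} \ \text{for all individuals} \\
\text{Ratio (Wald) estimand:} \quad
  & \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[X \mid Z=1] - E[X \mid Z=0]}
\end{align*}
```

In this simplified setting, and given the instrumental variable assumptions, the ratio estimand can be read as the average causal effect under a suitable homogeneity assumption, or as the local average causal effect under monotonicity.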
In contrast, hypothesis testing does not require these parametric assumptions (33–35). The sharp causal null hypothesis that changes in the exposure do not influence the outcome for any individual in the study population can be assessed by simply testing for an association between the genetic instrument and the outcome (or equivalently, an association between genetically-predicted levels of the exposure and the outcome) (12).
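As a minimal sketch of such a test, assume summary-level data in which the variant-outcome association is reported as a regression coefficient with a standard error, and that the estimate is approximately normally distributed. The test then reduces to a two-sided z-test. The function name `association_test` and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def association_test(beta_gy, se_gy):
    """Two-sided z-test of a variant-outcome association.

    beta_gy: estimated association of the variant with the outcome
    se_gy:   standard error of that estimate
    Assumes the estimate is approximately normally distributed.
    """
    z = beta_gy / se_gy
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical summary statistics, for illustration only.
z, p = association_test(beta_gy=0.030, se_gy=0.010)
print(f"z = {z:.2f}, p = {p:.4f}")
```

No variant-exposure association enters the calculation, which is why this test does not depend on the parametric assumptions needed to scale the result into an effect per unit of exposure.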
Contextual assumptions for causal estimation
Further to these statistical assumptions, any application of MR requires careful consideration of how the genetic variants affect the exposure, and how this compares to any proposed intervention on the exposure. To quantify the impact of an intervention, MR requires that genetic variants impact the exposure in the same way as the proposed intervention (i.e. both have quantitatively and qualitatively similar effects). This is known as the gene–environment equivalence assumption (14). However, interventions typically differ from the genetic change in the exposure in a number of ways. For example, genetic effects on exposures tend to be lifelong, whereas clinical interventions are typically performed on adult patients. Thus, even when the gene–environment equivalence assumption holds, the literal interpretation of an MR estimate is the effect of an intervention which can never be exactly replicated in practice.
In contrast, assessing the mechanistic relevance of an exposure simply requires that genetic interventions are qualitatively similar to the relevant environmental intervention, which is often plausible. Hence, under a weaker version of the gene–environment equivalence assumption stating that the genetic variants affect the exposure via the same biological mechanism as the intervention, we can test whether interventions on this mechanism will affect the outcome.
Defining the causal risk factor
A further limitation of causal estimation is our ability to identify the true causal risk factor (12). When the specific function of a genetic variant is unclear, a causal estimate stated in terms of a particular exposure may be misleading due to misidentifying the true causal agent (36). An example is the effect of interleukin-6 receptor signalling on coronary heart disease risk (37). Measurement of the relevant causal risk factor (intracellular interleukin-6 receptor signalling) in live patients is impractical. Alternatively, the MR estimates could use a non-causal proxy of the underlying causal risk factor as the nominal exposure for the statistical analysis, such as C-reactive protein (a downstream marker of inflammation). Assuming the variants are valid instruments for interleukin-6 receptor signalling, any resulting estimates cannot be interpreted as the effect of the measured biomarker, but instead represent the effect of the unmeasured causal risk factor – in this case, interleukin-6 receptor signalling. Therefore, when presenting or interpreting an MR estimate, it is important to think carefully about the precise risk factor under investigation, and how the genetic variants affect this risk factor versus how they affect the trait that has been measured (38,39).
In contrast, we can assess a causal hypothesis expressed in terms of a broad exposure without being explicit about the precise causal risk factor when the relevant mechanism can only be ascertained crudely. For example, lifetime smoking index is a composite measure of smoking initiation, duration, heaviness, and cessation (40). Genetic variants associated with lifetime smoking index can provide evidence on the causal effect of smoking without differentiating between the precise effects of smoking initiation, duration, heaviness, or cessation. This is under the assumption that any effect of the variants acts on the outcome only via one or more of these exposures. Similarly, genetic variants associated with body mass index can provide evidence on the causal effect of adiposity without an explicit claim that the risk factor is ‘weight divided by the square of height’ (the definition of body mass index), assuming that the genetic variants influence the outcome via an adiposity-related pathway.
Accounting for time
MR estimates represent the causal effect of a change in the exposure, but one equivalent to changing an individual’s genetic code at conception. They therefore represent lifetime effects, whereas any intervention will be performed at a later time than the genetic ‘intervention’ and represents an effect with a limited duration at a specific time (41,42). Transporting estimates from MR analyses to clinical interventions therefore requires assumptions about how the intervention varies over time. Most exposures are not temporally constant, which can additionally bias naive MR estimates (43).
In contrast, without any further assumptions, hypothesis testing can assess the sharp causal null hypothesis – that changes in the exposure do not affect the outcome for the participants assessed in the study (33). While investigators should be careful about extrapolating the lifelong effects assessed in MR to the impact of proposed time-limited interventions, it is usually reasonable to believe that MR estimates are a reliable guide to the existence and direction of causal effects in practice.
In defence of causal hypothesis testing
Causal hypothesis testing can be sufficient
Some scientific research is undertaken to obtain a precise numerical answer, particularly in the field of public health. However, much research is done to understand aetiological relationships between variables. For example, pre-clinical pharmaceutical research and Phase 2 trials aim to find evidence for target efficacy, i.e. to learn if a modifiable trait or mechanism has a potential causal effect on a disease outcome. It is only in a Phase 3 trial that the goal is to estimate the magnitude of any precisely-defined effect. Within the framework of drug development, we see Mendelian randomization as contributing evidence to the early stages on target efficacy, rather than the final phase of estimation. While further research is needed to understand how findings translate to real-world practice, there is value in assessing the impact of a shift in the distribution of an exposure on an outcome even if the shift cannot be precisely defined or replicated in practice. Increasing (or decreasing) our confidence in the causal status of a proposed mechanism is a worthwhile aim, even if the specific intervention on the mechanism is not achievable or well-defined.
Our proposal has some connections with the work of Judea Pearl, whose seminal work on structural causal models emphasizes whether there is a causal effect of an exposure on an outcome or not (44). A structural causal model, typically represented by a directed acyclic graph, encodes causal hypotheses in a non-parametric way.
Hypothesis testing and estimation in clinical trials
The randomized trial is often taken as a paradigm for defining and estimating the causal effect of a well-defined intervention (24). However, some influential trialists see causal hypothesis testing as an important goal of randomized trials, even for a Phase 3 trial. For example, Richard Peto and colleagues have argued that trials should be used “to distinguish reliably between the only two medically plausible alternatives: either there is no worthwhile difference in survival, or treatment confers a moderate, but worthwhile, benefit” (45). This is because trialists “are not trying to provide exact quantitative estimates of percentage risk reductions in some precisely defined population of patients. We are simply trying to determine whether or not some type of treatment – tested in a wide range of trials – produces any effect on mortality” (46). Indeed, heterogeneity between trials means that no two trials will target the exact same causal quantity, but they can provide evidence to support (or refute) the same broad causal hypothesis. While the target trial framework is a useful tool for improving the quality of observational studies, the requirement for well-defined estimation should not limit the scope of well-designed epidemiological investigations.
Hypothesis testing is a canonical scientific activity
The statistical essence of an MR analysis is to select genetic variants which are associated with the exposure, and test if people with elevated genetically-predicted levels of the exposure have greater risk of the outcome (8,12). This is analogous to performing an intention-to-treat analysis in a randomized trial to test if an intervention works; a statistical association between genetically-predicted levels of the exposure and the outcome provides evidence supporting a causal effect of the exposure. Epidemiology, and statistics more broadly, have moved away from null hypothesis significance testing based on the dichotomization of p-values. This is in part due to a burgeoning literature on the misinterpretation and misapplication of p-values (23). Some appear to view the abuses of p-values as justifying a rejection of hypothesis testing in general.
However, hypothesis testing pre-dates formal statistical methods, and is the primary language through which theories of the history and philosophy of science have described and analysed science (47). Within a frequentist paradigm, hypothesis testing does not require a dichotomization of results into “significant” and “non-significant” (48). Bayesian approaches can also be used – indeed the original application of Bayes’ theorem was refuting David Hume’s hypothesis about the non-existence of miracles (49). Hence, while we acknowledge that p-values can be abused, we would see a hypothesis test result (alongside an MR estimate) as a core output of an MR investigation.
Radical uncertainty in economics
While the techniques of instrumental variable analysis were developed in the field of econometrics, some thinkers in the wider discipline of economics believe that precise estimation of well-defined quantities should not be the aim of economic analyses. For example, John Kay and Mervyn King argue that the pervasive presence of unquantifiable uncertainty in economics means that economic models are at best an approximation of reality, and so need to be taken with “a pinch of salt” (that is, their precise numerical estimates should not be interpreted literally and uncritically) (50). They argue economists should instead perform analyses which aim to understand underlying processes and use these to inform policy. The philosophical intuition that pervasive unquantifiable uncertainty means we should not overinterpret numeric economic estimates has a prestigious lineage among early 20th century economists including Benjamin Graham (the inventor of financial security analysis) (51), Friedrich Hayek (a luminary of the Austrian School) (52), Frank Knight (a founder of the Chicago School) (53), and John Maynard Keynes (the inventor of Keynesianism) (54).
The philosophy can perhaps be summarized by Keynes’ quote that he would “rather be vaguely right than precisely wrong”. This has clear implications for the trade-off between bias and variance; it is generally better to have an unbiased estimate, even if that estimate is imprecise. Additionally, if the choice is between answering the precise question of interest unreliably and answering a related question reliably, it would often be preferable to get a reliable answer, even if the numerical answer is not fully relevant to the precise question of interest. In the case of MR, while the intervention mimicked by the genetic variant(s) may not be similar to the proposed intervention to be applied in practice, the primary concern is whether we get reliable evidence, rather than whether we estimate the most relevant quantity.
Implications for Mendelian randomization investigations in practice
Mendelian randomization estimate as a life-long causal contrast
At its most austere, an MR investigation does not attempt to estimate a causal quantity, but simply tests the sharp causal null hypothesis that the causal effect equals zero for all individuals. A serious limitation of this approach is the difficulty of interpreting a null finding, as it does not allow reporting of confidence intervals or estimating the power to detect a minimal effect of interest.
A more nuanced position (which we would typically support) is to estimate the association of genetically-predicted levels of the exposure with the outcome. Under comparatively weak assumptions, this quantity (which we call the MR estimate, and can be viewed as analogous to an intention-to-treat estimate from a clinical trial) can provide some insight into the causal effect of an exposure without having all the attributes of a well-defined causal effect.
For instance, if the genetic variant’s association with the exposure has the same direction throughout the life course (temporal monotonicity), and the direction of the causal effect of the exposure on the outcome is consistent across the life course, then the sign of the MR estimate (i.e. positive or negative) will be an accurate guide to the direction of the causal effect no matter when the genetic association is estimated. As such, MR estimates can have some meaning in predicting the effect of an intervention in the exposure even if they are not targeting an effect that is well-defined in terms of the exposure (Table 1).
Table 1. Some potential goals in performing and interpreting a Mendelian randomization (MR) analysis.
| Goal of analysis | Interpretation of estimate |
|---|---|
| Causal estimation | For an MR estimate to have an interpretation as a well-defined causal effect, questions about who is being studied, what is the nature of the genetic “intervention”, and what is the timing and duration of the intervention must be precisely defined. |
| Pure hypothesis testing | Under weaker assumptions, the MR estimate has no interpretation as a causal parameter, and solely acts as a test statistic for a causal hypothesis test. |
| Estimate with some causal attributes and directional hypothesis testing | An MR estimate may be well-defined as an intervention in a genetic variant, but has a fuzzy interpretation as an intervention in the exposure: it has some of the attributes of a well-defined causal effect, but not all. A particular case of interest is when the sign of the estimate is a reliable guide of the direction of a causal effect of the exposure on the outcome in a given population. Sufficient conditions are temporal monotonicity (of the instrument-exposure association) and the exposure-outcome effect having a consistent direction across the life course. |
For example, genetic variant rs16969968 in the CHRNA5-A3-B4 gene cluster can be used to instrument smoking heaviness (i.e. number of cigarette packs smoked per week) because of its role in nicotine metabolism (55). People with the variant metabolize nicotine faster, and therefore need to smoke more to achieve the same effect, which results in greater smoking heaviness. It is not plausible for there to be a subgroup of the population where the gene’s biological function is inverted. Indeed, because the variant has a similar biological function across the life course (Figure 2), inheriting an additional copy of the A allele of rs16969968 should never associate with reduced exposure to smoking. For many outcomes, it is biologically implausible for the effect of smoking heaviness to be harmful in some time periods and protective in other time periods. As such, a positive association between rs16969968 and lung cancer risk is supportive of the directional hypothesis that an increased lifetime liability to smoking increases risk of lung cancer among people whose smoking heaviness is changed by inheriting an extra copy of the A allele of rs16969968.
Figure 2.
Synthetic data illustrating two hypothetical average trajectories in time for smoking heaviness among people carrying (blue line) and not carrying (black line) an additional copy of the A allele for the rs16969968 genetic variant. rs16969968 impacts on how fast people metabolize nicotine, and therefore how heavily they smoke (i.e. number of cigarette packs smoked per week) (55). If the genetic association with smoking heaviness is positive (or zero) across the life course, and the effect of smoking on the outcome is also in a consistent direction, then the MR estimate should provide a reliable indication of the direction of the causal effect no matter when the genetic association with smoking is measured.
Although a positive effect implies that the exposure increases risk of the outcome at some point in the life course, it may be that the effect of the exposure on the outcome is limited to a specific critical period. Similarly, we cannot make inferences about effects of the exposure that occur after the timepoint when the outcome is measured. Hence there are conditions under which an MR estimate can provide meaningful information on the direction of effect of an exposure in a particular population, although in practice a specific intervention could still be ineffective due to its timing.
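The directional argument in this subsection can be illustrated with a toy calculation. The sketch below assumes three life periods, a linear and additive contribution of each period to the outcome, and invented values for the period-specific genetic associations and exposure effects. None of these numbers or variable names come from the studies cited; they are hypothetical.

```python
# Hypothetical per-allele effect of the variant on the exposure in three
# life periods. All values are positive: temporal monotonicity holds.
gene_exposure_by_period = [0.10, 0.30, 0.20]

# Hypothetical effect of the exposure on the outcome accrued in each period.
# All values are non-negative: the direction of effect is consistent.
exposure_outcome_by_period = [0.50, 0.20, 0.00]

# Under the linear, additive model assumed for this sketch, the
# variant-outcome association sums the period-specific contributions.
beta_gy = sum(
    g * e for g, e in zip(gene_exposure_by_period, exposure_outcome_by_period)
)

# Ratio estimate obtained if the genetic association with the exposure
# were measured in each period in turn.
ratio_by_period = [beta_gy / g for g in gene_exposure_by_period]

for period, ratio in enumerate(ratio_by_period, start=1):
    print(f"exposure measured in period {period}: ratio estimate = {ratio:.2f}")

same_sign = all(r > 0 for r in ratio_by_period)
print("all estimates share the same sign:", same_sign)
```

The magnitude of the ratio estimate changes with the period in which the genetic association with the exposure is measured, but its sign does not. The third period also contributes nothing to the outcome in this toy setup, so a positive estimate gives no assurance that intervening in that period would be effective.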
Mendelian randomization investigations should still provide estimates
An MR estimate can be expressed in objective terms as the difference in the outcome per unit increase in genetically-predicted levels of the exposure (56). Although the causal interpretation of MR estimates in terms of a unit change in the exposure is not always clear, their calculation and presentation are often useful:
First, estimates provide important evidence on the causal relevance of the exposure. For example, a small p-value can be compatible with detecting an effect which is not of clinical relevance, while a large p-value can be compatible with low power to detect a relevant effect (57). Evaluating power can be difficult for an MR investigation. For example, if MR estimates do not represent effects which would be seen in an actual intervention, then defining the minimal relevant effect size of interest may be difficult. A practical alternative to a power calculation is the use of a positive control analysis. Detecting an association with a positive control outcome that the exposure is believed to affect (and on which the effect is similar to or smaller than the hypothesized effect on the outcome of interest) in a similar or smaller sized sample indicates that there is sufficient power to detect the effect of interest.
Second, MR estimates can combine evidence from multiple variants to increase power to detect a true causal effect. It may be that no variant individually provides strong evidence for a causal effect of the exposure based solely on its association with the outcome, but the combination of evidence from all of the variants does. Under parametric assumptions, the causal estimate from the two-stage least squares method (or equivalently with summary-level data, the inverse-variance weighted method) represents the optimal combination of evidence from multiple instrumental variables in terms of statistical power (58). Hence, an MR estimate is a useful quantity even if it is simply viewed as a test statistic for a causal hypothesis test.
Third, MR estimates from different variants can be compared to assess violations of the instrumental variable assumptions. Heterogeneity in MR estimates using different genetic variants may indicate that the instrumental variable assumptions are violated for one or more of the variants (14,59).
Fourth, several robust methods for instrumental variable analysis have been developed, which not only provide consistent estimates under some violations of the instrumental variable assumptions, but also give close to nominal Type 1 error rates with some invalid instruments (60).
Finally, MR estimates for the same exposure may have a similar interpretation, and so can be compared, particularly if the exposure affects the outcomes in a similar way. For example, MR estimates for low-density lipoprotein cholesterol can be compared to assess the relative magnitude of impact of lipid-lowering therapy on different cardiovascular diseases (61).
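Several of the uses listed above can be sketched with summary-level data. The example below computes per-variant ratio estimates, combines them with a fixed-effect inverse-variance weighted (IVW) estimate, and summarizes heterogeneity with Cochran's Q statistic, which is one common choice that the text does not itself specify. It assumes uncorrelated variants and a first-order standard error that ignores uncertainty in the variant-exposure associations. All function names and numbers are hypothetical.

```python
import math


def wald_ratios(beta_gx, beta_gy, se_gy):
    """Per-variant ratio estimates with first-order standard errors.

    The standard errors ignore uncertainty in the variant-exposure
    associations, which is a simplifying assumption of this sketch.
    """
    ratios = [by / bx for bx, by in zip(beta_gx, beta_gy)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_gx, se_gy)]
    return ratios, ses


def ivw(ratios, ses):
    """Fixed-effect inverse-variance weighted combination of ratio estimates."""
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


def cochran_q(ratios, ses, pooled):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    return sum(((r - pooled) / s) ** 2 for r, s in zip(ratios, ses))


# Hypothetical summary statistics for three uncorrelated variants.
beta_gx = [0.10, 0.20, 0.15]     # variant-exposure associations
beta_gy = [0.016, 0.028, 0.024]  # variant-outcome associations
se_gy = [0.010, 0.018, 0.015]    # standard errors of beta_gy

ratios, ses = wald_ratios(beta_gx, beta_gy, se_gy)
estimate, se = ivw(ratios, ses)
ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se
q = cochran_q(ratios, ses, estimate)

single_z = [by / sy for by, sy in zip(beta_gy, se_gy)]
print("per-variant z-statistics:", [round(v, 2) for v in single_z])
print(f"IVW estimate = {estimate:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"combined z = {estimate / se:.2f}, Cochran's Q = {q:.3f} on {len(ratios) - 1} df")
```

With these invented inputs, the combined z-statistic is larger than that of any single variant, matching the point about pooling evidence, and a Q statistic that is small relative to its degrees of freedom gives no indication of heterogeneity between the variants.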
Triangulation of evidence
The principle of triangulation is that we should approach complex questions from many different perspectives, using various approaches that make different assumptions (62). No single approach is foolproof, either in terms of internal validity (robustness to bias) or external validity (relevance to the question of interest). However, inference can be strengthened by contrasting evidence from approaches making different assumptions (and ideally with biases in different directions). Hence, evidence from an MR investigation should not be viewed in isolation. Its value is not whether it perfectly addresses a particular question about a specific intervention, but how it contributes to the evidence base about the causal status of an exposure. The strength of evidence provided will be subjective, but there is already inherent subjectivity in the validity of the instrumental variable assumptions. The primary conclusion from an MR analysis therefore should be to increase or decrease our qualitative confidence in an exposure as a causal risk factor for the outcome, not to predict the quantitative impact of a specific intervention.
Conclusion
Some past researchers have criticized MR as being pointless (in the sense of not being worthwhile). We argue it should not be “completely pointless” (in the sense that it should not provide a point estimate), but rather it should be “somewhat pointless” (in the sense that it should provide a point estimate, but the estimate should not be interpreted as an achievable causal effect).
MR estimates provide important information on the causal nature of an exposure even if they do not quantify the expected impact of an achievable intervention. Numerical values of MR estimates should not be over-interpreted. Although MR investigations should provide an estimate, its role is to be a test statistic for the sharp causal null hypothesis, to combine evidence across multiple genetic variants, to assess the validity of variants as instruments, to provide confidence intervals, and to be compared across outcomes. A consequence is that MR estimates are generally not appropriate for use in downstream quantitative analyses, such as cost-effectiveness analyses.
The role of an MR investigation is not to provide a perfect answer to a question about the magnitude of effect for a well-defined intervention in an exposure. Instead, it provides a distinct source of evidence to support (or refute) a causal hypothesis that is robust to several sources of bias common in epidemiological studies.
Acknowledgements
We want to acknowledge James Yarmolinsky, Marcus Munafò, Nasir Bashir, and the MRC-Integrative Epidemiology Unit’s MR Clinic for generously providing feedback on earlier versions of the manuscript.
Funding
BW is funded by an Economic and Social Research Council (ESRC) South West Doctoral Training Partnership (SWDTP) 1+3 PhD Studentship Award (ES/P000630/1) and the Wellcome Trust (225790/Z/22/Z). SB is supported by the Wellcome Trust (225790/Z/22/Z). The research was supported by the United Kingdom Research and Innovation Medical Research Council (MC_UU_00002/7 and MC_UU_00040/01). For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Footnotes
Conflicts of interest
The authors declare no conflicts of interest.
Author contributions
BW and SB conceived and wrote the article.
Data availability statement
NA
References
- 1. Hernán MA, Wang W, Leaf DE. Target Trial Emulation: A Framework for Causal Inference From Observational Data. JAMA. 2022 Dec 27;328(24):2446–7. doi: 10.1001/jama.2022.21383.
- 2. Dang LE, Gruber S, Lee H, Dahabreh IJ, Stuart EA, Williamson BD, et al. A causal roadmap for generating high-quality real-world evidence. J Clin Transl Sci. 7(1):e212. doi: 10.1017/cts.2023.635.
- 3. Hernán MA, Taubman SL. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. Int J Obes (Lond). 2008 Aug;32(Suppl 3):S8–14. doi: 10.1038/ijo.2008.82.
- 4. Streptomycin Treatment of Pulmonary Tuberculosis. Br Med J. 1948 Oct 30;2(4582):769–82.
- 5. Hill AB. The Environment and Disease: Association or Causation? Proceedings of the Royal Society of Medicine. 1965 May 1;58(5):295–300. doi: 10.1177/003591576505800503.
- 6. Smith GD, Ebrahim S. “Mendelian randomization”: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003 Feb;32(1):1–22. doi: 10.1093/ije/dyg070.
- 7. Youngman L, Keavney B, Palmer A, Parish S, Clark S, Danesh J, et al. Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial infarction and in 6002 controls: Test of causality by “Mendelian randomisation”. Circulation. 2000;102(18). [cited 2024 Feb 3]. Available from: https://ora.ox.ac.uk/objects/uuid:8b3fe67d-1ac6-44ac-9028-37412cba3086
- 8. Katan MB. Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet. 1986 Mar 1;1(8479):507–8. doi: 10.1016/s0140-6736(86)92972-7.
- 9. Tobin MD, Minelli C, Burton PR, Thompson JR. Commentary: development of Mendelian randomization: from hypothesis test to “Mendelian deconfounding”. Int J Epidemiol. 2004 Feb;33(1):26–9. doi: 10.1093/ije/dyh016.
- 10. Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008 Apr 15;27(8):1133–63. doi: 10.1002/sim.3034.
- 11. Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res. 2007 Aug;16(4):309–30. doi: 10.1177/0962280206077743.
- 12. VanderWeele TJ, Tchetgen Tchetgen EJ, Cornelis M, Kraft P. Methodological challenges in mendelian randomization. Epidemiology. 2014 May;25(3):427–35. doi: 10.1097/EDE.0000000000000081.
- 13. Nitsch D, Molokhia M, Smeeth L, DeStavola BL, Whittaker JC, Leon DA. Limits to causal inference based on Mendelian randomization: a comparison with randomized controlled trials. Am J Epidemiol. 2006 Mar 1;163(5):397–403. doi: 10.1093/aje/kwj062.
- 14. Sanderson E, Glymour MM, Holmes MV, Kang H, Morrison J, Munafò MR, et al. Mendelian randomization. Nat Rev Methods Primers. 2022 Feb 10;2(1):1–21. doi: 10.1038/s43586-021-00092-5.
- 15. Woolf B, Di Cara N, Moreno-Stokoe C, Skrivankova V, Drax K, Higgins JPT, et al. Investigating the transparency of reporting in two-sample summary data Mendelian randomization studies using the MR-Base platform. International Journal of Epidemiology. 2022 Apr 6;dyac074. doi: 10.1093/ije/dyac074.
- 16. Keavney B, Danesh J, Parish S, Palmer A, Clark S, Youngman L, et al. Fibrinogen and coronary heart disease: test of causality by “Mendelian randomization”. Int J Epidemiol. 2006 Aug;35(4):935–43. doi: 10.1093/ije/dyl114.
- 17. Gill D, Burgess S. Use of a Genetic Variant Related to Circulating FXa (Activated Factor X) Levels to Proxy the Effect of FXa Inhibition on Cardiovascular Outcomes. Circulation: Genomic and Precision Medicine. 2020 Oct;13(5):551–3. doi: 10.1161/CIRCGEN.120.003061.
- 18. Smith GD. Mendelian Randomization for Strengthening Causal Inference in Observational Studies: Application to Gene × Environment Interactions. Perspect Psychol Sci. 2010 Sep 1;5(5):527–45. doi: 10.1177/1745691610383505.
- 19. Smith GD, Ebrahim S. Mendelian randomisation at 20 years: how can it avoid hubris, while achieving more? The Lancet Diabetes & Endocrinology. 2024 Jan 1;12(1):14–7. doi: 10.1016/S2213-8587(23)00348-0.
- 20. Addressing the credibility crisis in Mendelian randomization. BMC Medicine. [cited 2024 Oct 31]. doi: 10.1186/s12916-024-03607-5.
- 21. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006 Jul;17(4):360–72. doi: 10.1097/01.ede.0000222409.00878.37.
- 22. Pearl J, Mackenzie D. The Book of Why: The New Science of Cause and Effect. Penguin Science. ISBN 9780141982410. [cited 2025 Mar 14]. Available from: https://www.amazon.co.uk/Book-Why-Science-Cause-Effect/dp/0141982411
- 23. Wasserstein RL, Lazar NA. The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician. 2016 Apr 2;70(2):129–33.
- 24. Swanson SA, Tiemeier H, Ikram MA, Hernán MA. Nature as a Trialist?: Deconstructing the Analogy Between Mendelian Randomization and Randomized Trials. Epidemiology. 2017 Sep;28(5):653–9. doi: 10.1097/EDE.0000000000000699.
- 25. Skrivankova VW, Richmond RC, Woolf BAR, Davies NM, Swanson SA, VanderWeele TJ, et al. Strengthening the reporting of observational studies in epidemiology using mendelian randomisation (STROBE-MR): explanation and elaboration. BMJ. 2021 Oct 26;375:n2233. doi: 10.1136/bmj.n2233.
- 26. Swanson SA, Hernán MA. Commentary: How to Report Instrumental Variable Analyses (Suggestions Welcome). Epidemiology. 2013 May;24(3):370. doi: 10.1097/EDE.0b013e31828d0590.
- 27. Hartwig FP, Wang L, Davey Smith G, Davies NM. Average causal effect estimation via instrumental variables: the no simultaneous heterogeneity assumption. arXiv. 2021. [cited 2022 Jun 28]. doi: 10.1097/EDE.0000000000001596. Available from: http://arxiv.org/abs/2010.10017
- 28. Virolainen SJ, VonHandorf A, Viel KCMF, Weirauch MT, Kottyan LC. Gene–environment interactions and their impact on human health. Genes Immun. 2023 Feb;24(1):1–11. doi: 10.1038/s41435-022-00192-6.
- 29. Swanson SA, Hernán MA. The challenging interpretation of instrumental variable estimates under monotonicity. International Journal of Epidemiology. 2018 Aug 1;47(4):1289–97. doi: 10.1093/ije/dyx038.
- 30. Diemer EW, Havdahl A, Andreassen OA, Munafò MR, Njolstad PR, Tiemeier H, et al. Bounding the average causal effect in Mendelian randomisation studies with multiple proposed instruments: An application to prenatal alcohol exposure and attention deficit hyperactivity disorder. Paediatric and Perinatal Epidemiology. 2023;37(4):326–37. doi: 10.1111/ppe.12951.
- 31. Didelez V, Meng S, Sheehan NA. Assumptions of IV Methods for Observational Epidemiology. Statistical Science. 2010 Feb;25(1):22–40.
- 32. Bowden J, Del Greco MF, Minelli C, Zhao Q, Lawlor DA, Sheehan NA, et al. Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption. Int J Epidemiol. 2019 Jun;48(3):728–42. doi: 10.1093/ije/dyy258.
- 33. Swanson SA, Labrecque J, Hernán MA. Causal null hypotheses of sustained treatment strategies: What can be tested with an instrumental variable? Eur J Epidemiol. 2018 Aug 1;33(8):723–8. doi: 10.1007/s10654-018-0396-6.
- 34. Woolf B, Mason A, Zagkos L, Sallis H, Munafò MR, Gill D. MRSamePopTest: introducing a simple falsification test for the two-sample mendelian randomisation ‘same population’ assumption. BMC Research Notes. 2024 Jan 17;17(1):27. doi: 10.1186/s13104-024-06684-0.
- 35. Burgess S, Davies NM, Thompson SG, EPIC-InterAct Consortium. Instrumental variable analysis with a nonlinear exposure-outcome relationship. Epidemiology. 2014 Nov;25(6):877–85. doi: 10.1097/EDE.0000000000000161.
- 36. Woolf B, Cronjé HT, Zagkos L, Larsson SC, Gill D, Burgess S. Comparison of caffeine consumption behavior with plasma caffeine levels as exposure measures in drug-target Mendelian randomization. American Journal of Epidemiology. 2024 Jun 20;kwae143. doi: 10.1093/aje/kwae143.
- 37. The interleukin-6 receptor as a target for prevention of coronary heart disease: a mendelian randomisation analysis. The Lancet. 2012 Mar 31;379(9822):1214–24. doi: 10.1016/S0140-6736(12)60110-X.
- 38. Burgess S, Labrecque JA. Mendelian randomization with a binary exposure variable: interpretation and presentation of causal estimates. Eur J Epidemiol. 2018;33(10):947–52. doi: 10.1007/s10654-018-0424-6.
- 39. Howe LJ, Tudball M, Davey Smith G, Davies NM. Interpreting Mendelian-randomization estimates of the effects of categorical exposures such as disease status and educational attainment. International Journal of Epidemiology. 2022 Jun 1;51(3):948–57. doi: 10.1093/ije/dyab208.
- 40. Wootton RE, Richmond RC, Stuijfzand BG, Lawn RB, Sallis HM, Taylor GMJ, et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol Med. 2020 Oct;50(14):2435–43. doi: 10.1017/S0033291719002678.
- 41. Burgess S, Butterworth A, Malarstig A, Thompson SG. Use of Mendelian randomisation to assess potential benefit of clinical intervention. BMJ. 2012 Nov 6;345:e7325. doi: 10.1136/bmj.e7325.
- 42. Holmes MV, Ala-Korpela M, Davey Smith G. Mendelian randomization analyses in cardiometabolic disease: challenges in evaluating causality. Nat Rev Cardiol. 2017 Oct;14(10):577–90. doi: 10.1038/nrcardio.2017.78.
- 43. Labrecque JA, Swanson SA. Interpretation and Potential Biases of Mendelian Randomization Estimates With Time-Varying Exposures. Am J Epidemiol. 2019 Jan 1;188(1):231–8. doi: 10.1093/aje/kwy204.
- 44. Pearl J. Causality. 2nd ed. Cambridge University Press; Cambridge New York, NY Port Melbourne New Delhi Singapore: 2009. p. 486.
- 45. Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med. 1984 Dec;3(4):409–22. doi: 10.1002/sim.4780030421.
- 46. Peto R. Why do we need systematic overviews of randomized trials? Statist Med. 1987 Apr;6(3):233–40. doi: 10.1002/sim.4780060306.
- 47. Chalmers. What is This Thing Called Science? 4th ed. Open University Press; Maidenhead: 2013. p. 304.
- 48. Cole SR, Edwards JK, Greenland S. Surprise! Am J Epidemiol. 2021 Feb 1;190(2):191–3. doi: 10.1093/aje/kwaa136.
- 49. Earman J. Bayes, Hume, Price, and Miracles. In: Eells R, editor. Bayes’s Theorem. Oxford University Press; 2008. pp. 91–110.
- 50. King M, Kay J. Radical Uncertainty: Decision-making for an unknowable future. The Bridge Street Press; London: 2020. p. 544.
- 51. Graham B. Intelligent Investor: The Definitive Book on Value Investing - A Book of Practical Counsel. Rev ed. Harper Business; 2003. p. 640.
- 52. Hayek F. The Pretence of Knowledge. The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 1974. 1974. [cited 2024 Feb 3]. Available from: https://www.nobelprize.org/prizes/economic-sciences/1974/hayek/lecture/
- 53. Knight FH. Risk, Uncertainty, and Profit. McClure J, editor. Signalman Publishing; Kissimmee, Fla: 2009. p. 260.
- 54. Keynes JM. The General Theory of Employment. The Quarterly Journal of Economics. 1937;51(2):209–23.
- 55. Lassi G, Taylor AE, Timpson NJ, Kenny PJ, Mather RJ, Eisen T, et al. The CHRNA5–A3–B4 Gene Cluster and Smoking: From Discovery to Therapeutics. Trends Neurosci. 2016 Dec;39(12):851–61. doi: 10.1016/j.tins.2016.10.005.
- 56. Burgess S, O’Donnell CJ, Gill D. Expressing Results From a Mendelian Randomization Analysis: Separating Results From Inferences. JAMA Cardiol. 2021 Jan 1;6(1):7–8. doi: 10.1001/jamacardio.2020.4317.
- 57. Cox DR, Spjøtvoll E, Johansen S, van Zwet WR, Bithell JF, Barndorff-Nielsen O, et al. The Role of Significance Tests [with Discussion and Reply]. Scandinavian Journal of Statistics. 1977;4(2):49–70.
- 58. Wooldridge J. Introductory Econometrics: A Modern Approach. 7th ed. South-Western College Publishing; Boston, MA: 2019. p. 816.
- 59. Burgess S, Thompson SG. Mendelian Randomization: Methods for Causal Inference Using Genetic Variants. 2nd ed. Chapman and Hall/CRC; 2021. p. 240.
- 60. Slob EAW, Burgess S. A comparison of robust Mendelian randomization methods using summary data. Genetic Epidemiology. 2020;44(4):313–29. doi: 10.1002/gepi.22295.
- 61. Allara E, Morani G, Carter P, Gkatzionis A, Zuber V, Foley CN, et al. Genetic Determinants of Lipids and Cardiovascular Disease Outcomes: A Wide-Angled Mendelian Randomization Investigation. Circ Genom Precis Med. 2019 Dec;12(12):e002711. doi: 10.1161/CIRCGEN.119.002711.
- 62. Munafò MR, Davey Smith G. Robust research needs many lines of evidence. Nature. 2018 Jan;553(7689):399–401. doi: 10.1038/d41586-018-01023-3.


