Author manuscript; available in PMC: 2025 Feb 18.
Published in final edited form as: Arthritis Care Res (Hoboken). 2024 Aug 7;76(11):1451–1460. doi: 10.1002/acr.25400

A Guide to Understanding Mendelian Randomization Studies

Kevin Nguyen 1, Braxton D Mitchell 2
PMCID: PMC11833605  NIHMSID: NIHMS2051249  PMID: 39030941

Abstract

Epidemiology provides a powerful framework for characterizing exposure–disease relationships, but its utility for making causal inferences is limited because epidemiologic data are observational in nature and subject to biases stemming from undetected confounding variables and reverse causation. Mendelian randomization (MR) is an increasingly popular method used to circumvent these limitations. MR uses genetic variants, or instruments, as a natural experiment to proxy an exposure, thus allowing estimation of causal effects upon an outcome that are minimally affected by the usual biases present in epidemiologic studies. Notably, MR relies on three core assumptions related to the selection of the genetic instruments, and adherence to these assumptions must be carefully evaluated to assess the validity of the causal estimates. The goal of this review is to provide readers with a basic understanding of MR studies and how to read and evaluate them. Specifically, we outline the basics of how MR analysis is conducted, the assumptions underlying instrument selection, and how to assess the quality of MR studies.

Introduction

Mendelian randomization (MR) studies have become commonly used in epidemiology to evaluate the causal effect of an exposure (eg, body mass index [BMI]) on an outcome (eg, osteoarthritis [OA]). By using genetic variants as a proxy for a direct measurement of the exposure, properly designed MR studies can overcome the limitations of observational studies, which are well suited for assessing associations, but not causality.1 The goal of this review is to provide readers of this journal with an overview of MR, how it is used and its limitations, and especially how to read MR studies (see Table 1). When possible, we draw on examples from the rheumatology field. This review is not intended as a comprehensive treatment of MR methodology; for that purpose, we refer the interested reader to more extensive reviews on this topic.2,3

Table 1.

Glossary of key terms used in this review*

Mendelian randomization: Uses genetic variants as a proxy for an exposure to test the causal effect of that exposure on an outcome. Based on Mendel’s principle of independent assortment, MR mimics randomized trials and minimizes some of the biases inherent in traditional epidemiologic studies.
Genetic instrument/IV: The set of genetic variants used to proxy (or “instrument”) the exposure.
Polygenic risk score: A measure of an individual’s genetic liability to a specific trait, typically calculated as the weighted sum of risk alleles.
Weak instrument bias: Bias due to the IV explaining only a small proportion of the variation in the exposure.
1SMR vs 2SMR: Refers to the number of datasets used in the MR study. 1SMR estimates the IV effects on exposure and outcome in the same dataset, whereas 2SMR estimates the IV effect on exposure in one dataset and its effect on outcome in a second.
Horizontal pleiotropy: The effect of a genetic variant on the outcome does not act exclusively through the exposure (ie, the IV also acts through an independent pathway).
IVW method: Meta-analysis method that averages the variant-specific causal estimates (Wald ratios) across all variants in the IV, weighting each by the inverse of its variance so that more precise estimates receive larger weights than less precise estimates.
MR-Egger: An alternative causal estimator calculated as a sensitivity analysis to evaluate the independence and exclusion assumptions. MR-Egger regresses the variants’ effect sizes on the outcome against their effect sizes on the exposure. Unlike the IVW method, the intercept is not constrained to zero.

*1SMR, one-sample Mendelian randomization; 2SMR, two-sample Mendelian randomization; IV, instrumental variable; IVW, inverse-variance weighted; MR, Mendelian randomization.

Goal of MR

Epidemiology is a valuable tool for characterizing exposure–disease relationships, but its utility for making causal inferences is limited because epidemiologic data are observational in nature and subject to biases stemming from undetected confounding and reverse causation. MR is a technique designed to overcome these limitations by using genetic variants as proxies for the exposure. Because alleles are inherited at conception and cannot be modified by disease, the MR estimates are resistant to bias from reverse causation. Similarly, because alleles are inherited randomly at conception, they are largely independent of environmental and lifestyle influences and the estimates are minimally affected by unmeasured confounders. From this perspective, MR may be likened to a randomized clinical trial in which study participants are randomized to a treatment (or exposure) and then observed for disease outcome.

Typically, the exposure in MR studies is a modifiable risk factor so that the analysis addresses the important question of whether disease risk can be altered by changing exposure to the risk factor. The power of MR analysis is its use of available observational data, which are usually summary genetic association results, to infer causality. The following is a compelling example illustrating the power of MR.

Example of an MR study.

Epidemiologic studies have long observed that low levels of high-density lipoprotein (HDL) cholesterol precede the development of coronary heart disease (CHD) and that this association is only partly accounted for by the presence of other risk factors known to be associated with CHD (eg, high levels of low-density lipoprotein [LDL] cholesterol and BMI, diabetes, glucose intolerance, etc). This observation, in addition to HDL cholesterol’s known role in removing other forms of cholesterol from the bloodstream, has led to low HDL being regarded as a risk factor for CHD and to efforts being targeted at increasing HDL cholesterol as a therapeutic strategy to prevent CHD.

Because much of the evidence implicating the causal role of HDL cholesterol in CHD was based on observational epidemiologic data, Voight et al4 used MR to test the causal role of HDL in CHD. From previously conducted large population studies, they first identified a loss-of-function variant in a gene encoding endothelial lipase that was specifically tied to higher (not lower) HDL cholesterol levels but notably did not change any other lipid or nonlipid cardiovascular risk factor. Participants with this variant had ~0.29 SD units higher levels of HDL cholesterol. Based on this degree of HDL cholesterol increase, it was predicted from epidemiologic data that this variant would result in a 13% lower incidence of myocardial infarction (MI). However, individuals with and without this variant showed no difference at all in the incidence of MI. Moreover, when a more comprehensive genetic instrument was developed that included 14 common single-nucleotide polymorphisms (SNPs) associated with variation in HDL cholesterol levels, the authors again found no association of this HDL genetic score with MI. This powerful analysis challenged existing views about the role of HDL cholesterol levels in mediating CHD risk and reinforced emerging views that strategies to increase HDL levels would not be a fruitful therapeutic approach for CHD prevention.
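The 13% prediction is simple arithmetic on the odds scale. The sketch below assumes a log-linear (multiplicative) dose–response, so that a shift of d SD multiplies the odds by OR_per_SD raised to the power d, and it treats the 13% lower incidence as an odds ratio of about 0.87 (a rare-outcome approximation). Working backward from the two figures in the text gives the observational odds ratio per SD of HDL cholesterol that they imply; this back-calculated value illustrates the reasoning and is not a number quoted from Voight et al.

```python
import math

def expected_or(or_per_sd: float, shift_sd: float) -> float:
    """Odds ratio expected for a shift of `shift_sd` SD, assuming a log-linear dose-response."""
    return math.exp(shift_sd * math.log(or_per_sd))

# Figures quoted in the text
hdl_shift_sd = 0.29        # variant carriers: +0.29 SD HDL cholesterol
predicted_or = 1 - 0.13    # 13% lower MI incidence, treated as OR ~ 0.87 (rare-outcome approximation)

# Back-calculate the observational OR per SD of HDL implied by these two numbers
implied_or_per_sd = predicted_or ** (1 / hdl_shift_sd)
print(f"implied observational OR per SD HDL: {implied_or_per_sd:.2f}")  # 0.62

# Forward check: that OR per SD applied to a 0.29 SD shift reproduces the ~13% reduction
print(f"predicted reduction: {1 - expected_or(implied_or_per_sd, hdl_shift_sd):.0%}")  # 13%
```

Because carriers in fact showed no difference in MI incidence, the genetic evidence is inconsistent with this observationally derived prediction.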

Basics of MR

MR estimates the causal effect of an exposure, typically a modifiable risk factor, on a disease or trait outcome. This estimate is generally expressed as an odds ratio if the outcome is binary, ie, presence or absence of disease (eg, OA, arthroscopy, etc), or as a beta coefficient if the outcome is a continuous measure (eg, joint space narrowing, pain score, etc). The use of genetic variants as a proxy for the exposure allows for an estimate that is minimally biased by factors such as reverse causality and unmeasured confounders, which can impact traditional epidemiologic analyses. The genetic variants serve as the instrumental variable (IV) or genetic instrument, which, as illustrated in Figure 1, is a variable associated with both exposure and outcome, but its association with outcome is only through the exposure. IVs comprise a set of genetic variants that are known, typically through a previous genome-wide association study (GWAS), to be robustly associated with the exposure. As described below, the validity of an MR analysis relies on careful selection of the genetic instruments, including an assessment as to whether they conform to three core assumptions that underlie MR analysis.

Figure 1.

The three MR assumptions, in which Z is the IV associated with the exposure, X is the exposure, Y is the outcome, and U is a confounder. 1) Relevance assumption: The IV is strongly associated with the exposure of interest. 2) Independence assumption: There are no variables confounding the association between the IVs and the outcome. 3) Exclusion restriction assumption: The IV is related to the outcome only via the exposure. IV, instrumental variable; MR, Mendelian randomization.

Polygenic risk scores and MR.

MR analysis is related to the concept of polygenic risk scores (PRSs), which are a measure of genetic susceptibility. One can envision a simple measure of genetic susceptibility to a trait (say, OA) as whether an individual carries a known OA risk allele at a particular locus. Because a single locus does not capture genetic susceptibility very comprehensively, a PRS extends this concept by combining risk alleles across many trait-associated loci, which are often identified through a GWAS. The cumulative burden of these risk alleles constitutes a PRS for OA, typically calculated as a weighted sum of all risk alleles, with each allele weighted by its estimated effect on the trait (here, OA).
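A weighted PRS is the dot product of each person’s risk-allele counts (0, 1, or 2) with per-allele weights. The minimal NumPy sketch below uses made-up genotypes and weights for three hypothetical variants; in practice the weights would be per-allele effect estimates (eg, log odds ratios) from a GWAS of the trait.

```python
import numpy as np

# Hypothetical example: 4 individuals x 3 trait-associated variants.
# Entries are risk-allele counts (0, 1, or 2).
dosages = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 0, 0],
])

# Hypothetical per-allele weights, eg, log odds ratios from a GWAS of the trait
weights = np.array([0.10, 0.05, 0.20])

# Weighted PRS: sum over variants of (risk-allele count x weight)
prs = dosages @ weights
# Unweighted alternative: a simple count of risk alleles
allele_count = dosages.sum(axis=1)

print(np.round(prs, 2).tolist())   # [0.45, 0.15, 0.4, 0.0]
print(allele_count.tolist())       # [3, 2, 3, 0]
```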

A PRS is a reasonable first approximation for an MR genetic instrument but is generally only a blunt instrument because many variants in the instrument, although associated with the exposure, may not have a direct causal effect on the exposure. Variants used in an MR analysis have the additional restriction that they have a causal effect on the exposure and are not just associated with it. Guidelines for selecting valid instruments are discussed in the following section.

Assumptions underlying MR.

The most critical component of MR studies lies in the selection of genetic instruments. Valid genetic instruments must meet three core assumptions (Figure 1):

  1. The IV must be strongly associated with the exposure (relevance),

  2. The IV should not be associated with the outcome through confounding pathways (independence), and

  3. The IV should not affect the outcome except through the exposure (exclusion).

The following are guidelines for evaluating whether these assumptions hold. Notably, and particularly for the independence and exclusion assumptions, it is possible only to demonstrate that an assumption has been violated, not that it holds.

Conducting MR analysis

Exposures considered for MR analysis must have a known genetic basis. There should also be previous evidence linking the exposure to the outcome, such as an established epidemiologic association. The exposure must also be defined or operationalized when selecting the genetic instrument(s); eg, for “adiposity” or “obesity,” one can choose from genetically proxied BMI, waist circumference, or percent body fat. Depending on available data, the MR analysis can be conducted in several different ways. The following sections describe the major features of an MR study.

Study design: one-sample versus two-sample MR.

MR studies can be distinguished according to whether the genetic instrument proxying the exposure and the outcome data are derived from the same dataset (one-sample MR [1SMR]) or from separate datasets (two-sample MR [2SMR]). Consider a study testing the causal effect of parathyroid hormone on OA. An example of a 1SMR design would be if the UK Biobank were used as the data source both for the GWAS generating the genetic instrument (for parathyroid hormone) and for the outcome data (OA status). On the other hand, if the genetic instrument were constructed from a parathyroid hormone GWAS in a separate population and then tested for its association with OA in the UK Biobank, as was done by Huang et al,5 this would constitute a 2SMR.

The 1SMR and 2SMR designs each have pros and cons. A strength of 1SMR designs is that, because the same population is used for both construction of the genetic instrument and testing for association with outcome, the potential for confounding due to differences in population ancestry and population composition (eg, age, sex, education, socioeconomic status, etc) is minimized. A limitation of this design is the potential for a weak genetic instrument if the exposure GWAS was not very large and the IV does not include robustly associated variants. Conversely, a major advantage of the 2SMR design is the opportunity for generating a much more powerful genetic instrument that is based on a very large GWAS, as from a previous meta-analysis, and for which the variant effect size estimates are based on data collected across many different studies. A reason for caution for 2SMR designs is that careful attention must be given to the comparability of the two populations (the exposure and outcome populations) to ensure that the interpretation and validity of the causal estimate are not affected by unmeasured confounders. For example, an exposure population that is of predominantly European ancestry and an outcome population that has significant non-European ancestry representation may produce invalid causal estimates because instruments taken from European-ancestry populations have been shown to have poor transferability and predictive power when applied to non–European-ancestry populations.6

Selection of genetic instruments.

Variants for MR analysis can be selected based on purely statistical criteria or according to biologic considerations. For variants chosen on the basis of statistical criteria, the conventional approach is to use very large GWASs and require genome-wide thresholds for statistical significance (ie, P < 5 × 10−8) to reduce the likelihood of including weakly or nonrobustly associated variants (thus addressing the relevance assumption). The set of variants included in the IV should also be uncorrelated; typically, the peak variant at a locus is included in the IV, and all variants in linkage disequilibrium (ie, correlated) with the peak variant are excluded.
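The selection rule just described (genome-wide significance, then keep the lead variant at each locus and drop variants correlated with it) is commonly implemented as “LD clumping.” The sketch below is a simplified greedy version that assumes a pairwise r² matrix for the candidate variants is already available; the r² cutoff of 0.001 is a commonly used setting assumed here, and real pipelines (eg, PLINK’s clumping procedure) also use a physical distance window.

```python
import numpy as np

GWS_THRESHOLD = 5e-8  # genome-wide significance

def select_instruments(pvals, r2, r2_max=0.001, p_max=GWS_THRESHOLD):
    """Greedy LD clumping sketch.

    pvals : p-values for each candidate variant's association with the exposure
    r2    : square matrix of pairwise LD (r^2) among the same variants
    Returns the indices of retained variants, most significant first.
    """
    pvals = np.asarray(pvals, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    # Step 1: keep only genome-wide significant variants, most significant first
    candidates = [i for i in np.argsort(pvals) if pvals[i] < p_max]
    kept = []
    for i in candidates:
        # Step 2: keep a variant only if it is not correlated with an already kept one
        if all(r2[i, j] <= r2_max for j in kept):
            kept.append(int(i))
    return kept

# Hypothetical example: variants 0 and 1 are in strong LD; variant 3 is not significant
pvals = [1e-12, 3e-9, 2e-10, 1e-4]
r2 = [[1.0, 0.8, 0.0, 0.0],
      [0.8, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
print(select_instruments(pvals, r2))  # [0, 2]
```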

One caveat for including variants chosen solely by statistical criteria (eg, genome-wide significance in a previous GWAS) is that some of these variants may affect not only the exposure but also other traits; ie, they may have pleiotropic effects. For MR, it is important to distinguish between horizontal pleiotropy and vertical pleiotropy. Horizontal pleiotropy is a potential bias for MR studies that occurs when the exposure-associated genetic variant also influences another trait, and this second trait influences the outcome through a different pathway than the exposure. This would be a violation of the third core assumption of MR (exclusion). As an example of horizontal pleiotropy, consider an MR analysis testing whether the inflammatory factor tumor necrosis factor alpha (TNF-α) “causes” OA. Many TNF-α–associated variants may also be associated with adiposity, which would not be surprising because TNF-α is secreted by adipocytes. These variants would therefore not serve as good proxies for TNF-α exposure, because any association of these variants with the outcome (eg, OA) could reflect a causal effect of BMI on OA that is independent of TNF-α exposure. For this reason, valid MR estimates require some pruning of the selected genetic variants so that “noncausal” variants are excluded. Ways to assess the potential for horizontal pleiotropy and violation of the exclusion assumption are discussed in the section entitled “Assessment of MR assumptions.”

In contrast, vertical pleiotropy occurs when the exposure-associated genetic variant alters another trait, and it is this second trait that is the proximal “cause” of the outcome. Vertical pleiotropy does not violate the MR assumptions. For example, in a study to evaluate the causal effect of BMI on OA, if a BMI-associated variant caused changes in adipokine levels and these levels had a causal effect on OA, this would not alter our conclusion that BMI is a “cause” of OA, even though the BMI effect may be mediated through adipokines (among other factors).

One way of getting around the potential problem of horizontal pleiotropy is to select variants for the IV according to a known biologic link to the exposure. For example, MR studies linking vitamin D levels to some rheumatic diseases have been equivocal, but after restricting IVs in their studies to include only variants in the vitamin D synthetic pathway, Bergink et al found no effect of vitamin D levels on OA,7 and Clarke et al found no evidence of its association with juvenile idiopathic arthritis.8 Other examples of biologically selected variants include variants that have previously been associated with expression of a biologically relevant gene (ie, an expression quantitative trait locus) or variants associated with circulating levels of a biologically relevant protein (ie, a protein quantitative trait locus).9

Obtaining the causal effect of exposure on outcome.

The genetic instrument can be used to calculate the causal estimate in several ways, depending on whether individual-level data are available for study participants or only summary genetic association results are available for each IV variant and the outcome. If individual-level data are available, the genotypes can be used to construct a PRS that reflects the burden of risk alleles across all variants in the IV, and the PRS can then be used to estimate the causal effect of exposure on the outcome. This causal effect estimate is mathematically equivalent to the ratio of the effect size of the PRS on the outcome divided by its effect size on the exposure. Thus, if the exposure has a causal effect, the effect size of each instrument on the exposure should be proportional to its effect size on the outcome; ie, variants with larger effect sizes on the exposure should have larger effect sizes on the outcome. The procedure that uses individual-level data to compute the PRS for each individual corresponds to a two-stage least-squares (2SLS) regression,10,11 in which the first stage regresses the exposure on the instrument to obtain genetically predicted exposure values and the second stage regresses the outcome on those predicted values. A benefit of using individual-level data in MR analysis is the opportunity to estimate causal effects in subsets of the data, such as in men versus women and in early- versus late-onset disease.
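The equivalence between the two-stage regression and the ratio of PRS effects can be checked on simulated individual-level data. In the NumPy sketch below, all parameters are invented for illustration (a true causal effect of 0.3 and an unmeasured confounder U affecting both exposure and outcome). Ordinary regression of the outcome on the exposure is biased by U, whereas 2SLS with the PRS as a single instrument recovers roughly the true effect and coincides with the ratio of the PRS–outcome slope to the PRS–exposure slope.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000
true_effect = 0.3                              # assumed causal effect of X on Y

prs = rng.normal(size=n)                       # genetic instrument Z (eg, a PRS)
u = rng.normal(size=n)                         # unmeasured confounder U
x = 0.5 * prs + u + rng.normal(size=n)         # exposure depends on Z and U
y = true_effect * x + u + rng.normal(size=n)   # outcome depends on X and U, not directly on Z

def slope(pred, resp):
    """OLS slope of `resp` on `pred` (model with intercept)."""
    return np.cov(pred, resp)[0, 1] / np.var(pred, ddof=1)

# Naive observational estimate, confounded by U
naive = slope(x, y)

# Two-stage least squares
# Stage 1: regress the exposure on the instrument; keep genetically predicted exposure
b1 = slope(prs, x)
x_hat = x.mean() + b1 * (prs - prs.mean())
# Stage 2: regress the outcome on the genetically predicted exposure
tsls = slope(x_hat, y)

# Ratio form: (effect of PRS on outcome) / (effect of PRS on exposure)
ratio = slope(prs, y) / slope(prs, x)

print(f"naive OLS: {naive:.2f}, 2SLS: {tsls:.2f}, ratio: {ratio:.2f}")
```

Only point estimates are computed here; the standard error from a naive second-stage regression is not valid, and dedicated instrumental-variable software corrects it.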

Frequently, however, individual-level data are not available, in which case MR analysis can be conducted using only the summary genetic association results of the individual IV variants with the exposure and with the disease outcome. For each variant (rather than the PRS as a whole), the ratio of the variant’s effect size on the outcome divided by its effect size on the exposure is calculated; this ratio is known as the Wald ratio. The variant-specific Wald ratios are then combined as a weighted average using the inverse-variance weighted (IVW) method,11 which assigns greater weight to estimates with greater precision (ie, smaller variances), because each estimate is weighted by the inverse of its variance. More powerful variations of the IVW method have recently been developed, including the generalized summary data-based MR method,12 which, unlike the conventional IVW method, allows for multiple correlated variants at each locus by accounting for linkage disequilibrium to provide more powerful estimates of each Wald ratio.
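With summary statistics only, each variant contributes a Wald ratio (its effect on the outcome divided by its effect on the exposure), and the IVW method pools them. The sketch below uses the common first-order approximation in which each ratio’s standard error is se(by)/|bx|, ignoring uncertainty in the variant–exposure effect; the summary statistics are invented. Under this approximation, the inverse-variance weighted average of the ratios is identical to a weighted regression of the variant–outcome effects on the variant–exposure effects through the origin, and the code computes both forms.

```python
import numpy as np

# Hypothetical summary statistics for 4 independent variants
bx = np.array([0.12, 0.08, 0.15, 0.05])        # variant-exposure effect sizes
by = np.array([0.030, 0.018, 0.040, 0.010])    # variant-outcome effect sizes
sey = np.array([0.010, 0.008, 0.012, 0.009])   # standard errors of by

# Variant-specific Wald ratios: effect on outcome / effect on exposure
wald = by / bx
# First-order standard errors of the ratios (uncertainty in bx ignored)
se_wald = sey / np.abs(bx)

# IVW: weight each ratio by the inverse of its variance
w = 1 / se_wald**2
beta_ivw = np.sum(w * wald) / np.sum(w)
se_ivw = np.sqrt(1 / np.sum(w))                # fixed-effect standard error

# Equivalent form: weighted regression of by on bx with no intercept
beta_ivw_reg = np.sum(bx * by / sey**2) / np.sum(bx**2 / sey**2)

print(f"IVW estimate: {beta_ivw:.3f} (SE {se_ivw:.3f})")
```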

Before obtaining the weighted average of the Wald ratios, it is important to evaluate the degree of heterogeneity among these estimates. Some heterogeneity is expected from sampling error, but excessive heterogeneity could indicate an invalid instrument that may bias the causal estimate. There can be biologic reasons for heterogeneity among estimates (eg, the effect of BMI on outcome may depend on the different genetic drivers of BMI), but nonbiologic or noncausal factors (eg, confounding due to horizontal pleiotropy) can also contribute. Heterogeneity can be evaluated using statistical procedures such as the Q statistic, which tests whether there is more heterogeneity than would be expected by chance alone, and the I² statistic,13 which quantifies the proportion of total variation across the variant-specific estimates that is due to heterogeneity rather than chance. If there is no evidence of excess heterogeneity, a fixed-effects model that assumes a common effect size across estimates is used for the IVW analysis; otherwise, a random-effects model is used to allow for varying effect sizes among estimates.
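Cochran’s Q for the Wald ratios is the weighted sum of squared deviations from the IVW estimate, referred to a chi-squared distribution with J - 1 degrees of freedom for J variants, and I² = (Q - df)/Q, truncated at zero. The sketch below reuses the first-order inverse-variance weights; the inputs are invented. Its last step shows one common way of allowing for heterogeneity in MR, a multiplicative random-effects model that inflates the IVW standard error by the square root of Q/df when Q exceeds df; this is an assumed choice, not the only random-effects model in use.

```python
import numpy as np
from scipy.stats import chi2

def heterogeneity(wald, se_wald):
    """Cochran's Q, its p-value, I^2, and fixed/random-effects IVW standard errors."""
    wald = np.asarray(wald, dtype=float)
    w = 1 / np.asarray(se_wald, dtype=float) ** 2
    beta_ivw = np.sum(w * wald) / np.sum(w)
    q = np.sum(w * (wald - beta_ivw) ** 2)          # Cochran's Q
    df = len(wald) - 1
    p = chi2.sf(q, df)                              # P(chi-squared with df >= Q)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # share of variation beyond chance
    se_fixed = np.sqrt(1 / np.sum(w))
    # Multiplicative random effects: inflate the SE only when there is excess heterogeneity
    se_random = se_fixed * np.sqrt(max(1.0, q / df))
    return dict(beta=beta_ivw, q=q, p=p, i2=i2, se_fixed=se_fixed, se_random=se_random)

# Hypothetical Wald ratios; the last variant looks discordant
res = heterogeneity([0.25, 0.22, 0.27, 0.90], [0.08, 0.10, 0.09, 0.15])
print({k: round(float(v), 3) for k, v in res.items()})
```

With these invented inputs, the discordant fourth ratio produces a large Q, a high I², and an inflated random-effects standard error.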

Results of an IVW MR analysis are often displayed as a scatter plot showing the effect size of each variant on the exposure plotted against its effect size on the outcome (see Figure 2). This allows the reader to assess the degree of heterogeneity among the individual variants comprising the genetic instrument and potential violations of MR assumptions.

Figure 2.

Scatter plot of selected IVs contrasting each variant’s effect on the outcome with its effect on the exposure; the slope of a line from the origin to a point is that variant’s Wald ratio. Causal estimates from the IVW, MR-Egger, simple median, and weighted median estimators are overlaid to provide a visual representation of the consistency of causal effects generated by methods using different assumptions. Note that the IVW and median estimators are constrained to have an intercept of zero. IV, instrumental variable; IVW, inverse-variance weighted; MR, Mendelian randomization. Color figure can be viewed in the online issue, which is available at http://onlinelibrary.wiley.com/doi/10.1002/acr.25400/abstract.

Assessment of MR assumptions.

The validity of an MR study depends largely on the validity of the genetic instrument used to proxy the exposure. There are multiple procedures for evaluating instrument validity that address adherence to the three core MR assumptions: relevance, independence, and exclusion. Of these three, only the first (relevance) can be tested directly.

The relevance assumption relates to the strength of the instrument or its magnitude of association with the exposure. Weak IVs lack the statistical power necessary to test the hypothesis and are prone to bias arising from horizontal pleiotropy. If weak instruments are used, causal estimates in a 1SMR setting are biased away from the null, whereas in 2SMR settings, the MR estimates are biased toward the null. Strategies to avoid weak instrument bias are to select instruments that are strongly associated with the exposure, eg, at genome-wide statistical thresholds of significance, or to use variants known to be biologically associated with the exposure. When selecting variants from previous GWASs, large, well-powered studies are preferred because they are less prone to false positive results.

A metric commonly used to assess the strength of the genetic variants used in the IV is the F statistic.14 The F statistic is a measurement that scales with the sample size of the exposure GWAS and the estimated effect size of the associated variant. A general rule of thumb is that the F statistic should be >10 to ensure the IV does not suffer from weak instrument bias, although this does not necessarily guarantee enough statistical power to test causal relationships.
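Two approximations for the F statistic are commonly quoted. For a single variant in a summary-statistic setting, F is roughly (beta/se)², the squared z-score of its association with the exposure. For an instrument of k variants that together explain a proportion R² of the exposure variance in a sample of size N, F = (R²/k) / ((1 - R²)/(N - k - 1)). The sketch below implements both formulas on invented inputs.

```python
import numpy as np

def f_per_variant(beta_x, se_x):
    """Approximate F statistic per variant: squared z-score of the variant-exposure association."""
    return (np.asarray(beta_x, dtype=float) / np.asarray(se_x, dtype=float)) ** 2

def f_overall(r2, n, k):
    """F statistic for k variants jointly explaining a proportion r2 of exposure variance in n individuals."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical inputs: z = 10 and z = 2.5, so F is about 100 (strong) and 6.25 (below the rule-of-thumb 10)
print(f_per_variant([0.05, 0.02], [0.005, 0.008]))
# Hypothetical instrument: 20 variants explaining 2% of exposure variance in 10,000 people
print(round(f_overall(r2=0.02, n=10_000, k=20), 1))  # 10.2
```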

The independence assumption is that the IV is not associated with the outcome through confounding pathways, and the exclusion assumption asserts that the IV does not affect the outcome except through the exposure. These assumptions are more difficult to address because there are no tests that can explicitly demonstrate that the two assumptions hold.

One common potential source of confounding in genetic association studies (relevant to the independence assumption) is a spurious association between genotype and exposure that arises entirely from differences in population substructure between cases and controls. If allele frequencies differ between cases and controls simply because of ancestry differences and the exposure also tracks with ancestry, then any ancestry-associated allele may appear associated with the exposure simply because both occur more frequently in one group (cases) than the other (controls). To minimize the potential for this type of confounding, cases and controls of the same ancestry group should be used, or the analysis should be stratified by ancestry group. If individual-level data (eg, genotypes) are available and the 2SLS method is employed for the MR, then a measure of ancestry composition can be included as a covariate in the association analysis regression. Principal components calculated from genetic variants throughout the genome are often used for this purpose. Other sources of confounding may be more difficult to identify and control for.
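Principal components of ancestry can be computed by standardizing each variant in a genotype matrix and taking a singular value decomposition. The NumPy sketch below is a bare-bones illustration on simulated genotypes from two hypothetical populations with different allele frequencies; dedicated tools (eg, PLINK or EIGENSOFT) additionally prune correlated variants and handle missing genotypes. The leading components would then be added as covariates in both stages of the 2SLS regression.

```python
import numpy as np

def ancestry_pcs(genotypes, n_pcs=2):
    """Top principal component scores of an (individuals x variants) matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2                       # allele frequency per variant
    keep = (freq > 0) & (freq < 1)                  # drop monomorphic variants
    g, freq = g[:, keep], freq[keep]
    # Center and scale each variant by its expected binomial SD
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]                 # PC scores per individual

# Simulated example: two hypothetical populations of 100 people each, 500 variants
rng = np.random.default_rng(7)
p_a = rng.uniform(0.1, 0.9, 500)
p_b = np.clip(p_a + rng.normal(0, 0.15, 500), 0.05, 0.95)
geno = np.vstack([rng.binomial(2, p_a, (100, 500)),
                  rng.binomial(2, p_b, (100, 500))])
pcs = ancestry_pcs(geno)
print(pcs[:100, 0].mean(), pcs[100:, 0].mean())     # PC1 separates the two populations
```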

The potential for horizontal pleiotropy (and for a violation of the exclusion assumption) can be particularly difficult to foresee. This is illustrated in the earlier example of an MR analysis testing the causal effect of TNF-α on OA, in which many TNF-α–associated variants being considered for the IV may be associated with BMI, and their effects on OA may act through a pathway independent of the exposure (TNF-α). Because our knowledge of pleiotropic effects is incomplete, it is conventional practice for MR studies to include a set of sensitivity analyses designed to detect deviations from the independence and exclusion assumptions, as such deviations can bias the estimated causal effect of the exposure on the outcome.

“Standard” sensitivity analysis.

Sensitivity analyses should be performed, particularly for any analyses in which the estimated causal effect differs significantly from zero. Comparisons between the IVW causal estimator and those obtained from the sensitivity analyses should be used to guide the interpretation of the MR analysis.

The first set of sensitivity analyses evaluates whether the weighted mean of the Wald ratios, used in the primary IVW analysis, is an appropriate summary measure compared with other summary measures, such as the median and mode, which are less sensitive to outlying estimates. The median method uses the median Wald ratio, which will be a valid estimator as long as at least 50% of the IVs are valid (for the weighted median, as long as at least 50% of the weight comes from valid IVs).15 The simple median method treats all genetic variants equally and ranks them by the magnitude of their ratios, whereas the weighted median method assigns different weights to individual causal estimates based on their precision (inverse of the variance). Similarly, the weighted mode method uses the mode (or most commonly observed) Wald ratio as the causal estimate, because this measure will be valid as long as the most common group of variants provides valid estimates. Although the power of the median and mode MR analyses is generally lower than that of the weighted mean analysis, resulting in broader confidence intervals, comparability of their causal effect estimates with those obtained from the primary IVW analysis provides some assurance against large violations of MR assumptions. Especially worrisome would be estimates from the median or mode analyses that are directionally inconsistent with those obtained from the primary IVW analysis.
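The weighted median can be sketched as follows: order the Wald ratios, convert their normalized weights into percentile positions (cumulative weight minus half the variant’s own weight), and interpolate linearly at the 50th percentile. This follows the interpolation rule commonly used for the weighted median estimator, and with equal weights it reduces to the ordinary median, which gives the simple median estimator. The weights are assumed to be first-order inverse variances of the ratios, the inputs are invented, and confidence intervals (usually obtained by bootstrapping) are omitted.

```python
import numpy as np

def weighted_median(estimates, weights):
    """Weighted median of variant-specific causal estimates (eg, Wald ratios)."""
    order = np.argsort(estimates)
    b = np.asarray(estimates, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    w = w / w.sum()
    # Percentile position of each ordered estimate
    pos = np.cumsum(w) - 0.5 * w
    # Linear interpolation at the 50th percentile
    return float(np.interp(0.5, pos, b))

def simple_median(estimates):
    """Simple median: every variant receives the same weight."""
    return weighted_median(estimates, np.ones(len(estimates)))

# Hypothetical Wald ratios: three concordant variants and one imprecise outlier
wald = np.array([0.20, 0.22, 0.25, 1.50])
se_wald = np.array([0.05, 0.06, 0.05, 0.10])
ivw = np.sum(wald / se_wald**2) / np.sum(1 / se_wald**2)
print(f"IVW {ivw:.2f}, simple median {simple_median(wald):.2f}, "
      f"weighted median {weighted_median(wald, 1 / se_wald**2):.2f}")
```

In this invented example, the outlier pulls the IVW estimate upward, while both median estimates stay close to the three concordant ratios.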

A second commonly used method to detect pleiotropy is MR-Egger regression.16 This method performs a weighted regression of the variant–outcome association on the variant–exposure association, but unlike the IVW method, it allows for an intercept term that corresponds to the average pleiotropic effect. If the intercept were exactly equal to zero, the IVW and MR-Egger causal estimates would be identical. An intercept that differs significantly from zero can be interpreted as evidence for directional pleiotropy. Notably, the MR-Egger intercept test has low power to detect pleiotropy, and caution must be exercised in relying solely on this test. MR-Egger regression also requires that the pleiotropic effects be independent of the variant–exposure associations; that is, the size of a variant’s pleiotropic effect must not be correlated with the size of its effect on the exposure.
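MR-Egger can be sketched as a weighted linear regression with an intercept. One convention not spelled out above is that variants are first oriented so that every variant–exposure effect is positive (flipping the sign of both effects where needed), because otherwise the intercept would depend on which allele is coded. The sketch below uses weights of 1/se(by)² and invented, noise-free data built from a known intercept and slope so the fit can be checked; standard errors and the formal intercept test are omitted.

```python
import numpy as np

def mr_egger(bx, by, sey):
    """Weighted regression of by on bx with an intercept; returns (intercept, slope)."""
    bx, by, sey = (np.asarray(a, dtype=float) for a in (bx, by, sey))
    # Orient variants so that every variant-exposure effect is positive
    sign = np.sign(bx)
    bx, by = bx * sign, by * sign
    # Weighted least squares: rescale each row by sqrt(weight), weight = 1 / sey^2
    sw = 1 / sey
    design = np.column_stack([np.ones_like(bx), bx]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(design, by * sw, rcond=None)
    return coef[0], coef[1]  # intercept (average pleiotropic effect), slope (causal estimate)

# Invented data: true slope 0.4 plus a constant directional pleiotropic effect of 0.01
bx = np.array([0.05, -0.08, 0.10, 0.12, 0.07])
sey = np.array([0.010, 0.012, 0.009, 0.011, 0.010])
by = np.sign(bx) * 0.01 + 0.4 * bx

intercept, slope = mr_egger(bx, by, sey)
# IVW (no intercept) on the same data, for comparison
ivw_slope = np.sum(bx * by / sey**2) / np.sum(bx**2 / sey**2)
print(f"MR-Egger intercept {intercept:.3f}, slope {slope:.3f}; IVW slope {ivw_slope:.3f}")
```

Here the IVW slope exceeds the true value of 0.4 because the constant pleiotropic effect is forced into the slope, whereas MR-Egger separates it into the intercept.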

Additional approaches are also available to detect outliers in the IV. The MR-PRESSO (Mendelian Randomization Pleiotropy RESidual Sum and Outlier) method is used to detect and correct for pleiotropy by employing a “leave-one-out” approach in which the IVW model is refitted repeatedly, each time excluding one variant.17 At each step, the distortion caused by that variant’s removal is calculated and used to identify and remove horizontal pleiotropic outliers. After removal, the IVW estimate is recalculated with the remaining IVs.
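To make the leave-one-out idea concrete, the sketch below refits the IVW estimate without each variant in turn and measures how far that variant’s observed outcome effect lies from the value predicted by the remaining variants, in units of its standard error. This is a deliberately simplified illustration in the spirit of MR-PRESSO, not the MR-PRESSO method itself, which derives empirical p-values by simulation and includes a formal distortion test; the cutoff of 3 and the data are assumptions.

```python
import numpy as np

def ivw(bx, by, sey):
    """IVW estimate as a weighted regression of by on bx through the origin."""
    return np.sum(bx * by / sey**2) / np.sum(bx**2 / sey**2)

def loo_outliers(bx, by, sey, z_max=3.0):
    """Flag variants whose observed by deviates from the leave-one-out IVW prediction."""
    bx, by, sey = (np.asarray(a, dtype=float) for a in (bx, by, sey))
    z = np.empty(len(bx))
    for j in range(len(bx)):
        rest = np.arange(len(bx)) != j
        beta_minus_j = ivw(bx[rest], by[rest], sey[rest])
        z[j] = (by[j] - beta_minus_j * bx[j]) / sey[j]  # standardized residual of variant j
    return z, np.abs(z) > z_max

# Invented data: variants consistent with a slope of about 0.3, except the last one
bx = np.array([0.10, 0.08, 0.12, 0.06, 0.09])
by = np.array([0.031, 0.023, 0.037, 0.017, 0.080])
sey = np.full(5, 0.008)

z, flagged = loo_outliers(bx, by, sey)
keep = ~flagged
print("flagged:", np.flatnonzero(flagged).tolist())  # flagged: [4]
print(f"IVW all: {ivw(bx, by, sey):.3f}, IVW after removal: {ivw(bx[keep], by[keep], sey[keep]):.3f}")
```

Removing the flagged variant moves the IVW estimate from about 0.41 back to about 0.30.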

Supplementary sensitivity analyses.

Supplemental sensitivity analyses should be considered, when possible, to complement the “standard” sets of analyses. The types of analyses will depend on the study question and available data.

One strategy, for example, might be to consider a more specific measure of the exposure, as was done by Tan et al, whose sensitivity analysis used an instrument specific to seropositive rheumatoid arthritis (RA).18 A second strategy would be to remove variants known to have pleiotropic effects on other traits that are associated with the outcome. For example, in their MR analysis of type 2 diabetes and OA risk, Funck-Brentano et al removed variants known to be associated with BMI from their instrument for diabetes risk because BMI affects OA risk through pathways other than diabetes risk.19 Similarly, in their study of gout and risk of cardiometabolic disease, Keenan et al first observed an association with their 28-SNP gout instrument, but when they considered the most strongly associated variant in the uric acid transporter, SLC2A9, they found a strong association with odds of gout (as expected) but no association with any of the cardiometabolic outcomes, implying that decreasing serum urate levels may not translate into risk reduction for cardiometabolic diseases.20

Because genetic variants can have pleiotropic effects on traits that may not be apparent, a more general strategy is to perform a phenotype-wide association study (PheWAS) to test for association of the genetic instrument with many different phenotypes that could be potential confounders of the relationship between the IV and the outcome. There are multiple online resources available to conduct such PheWASs (eg, PhenoScanner).21 Association of the IV with potential confounders could prompt the removal of individual variants from the IV or, at the very least, a more guarded interpretation of the MR results.

Another strategy, if individual-level data are available, would be to perform subgroup or stratified analyses to address possible confounding by environmental factors or to assess biologic plausibility. On this latter point, Johnsen et al constructed an IV for smoking quantity and tested its causal effect on total joint replacement. They observed higher smoking quantity to be associated with lower total joint replacement risk in smokers but found no effect in nonsmokers or former smokers, adding to the biologic plausibility of the result.22

Extensions of MR

Detection of a causal effect by an MR analysis often raises additional questions about the exposure–outcome relationship. For example, Shirai et al observed a causal association of habitual coffee consumption with reduced risk of gout but found no association with serum uric acid levels, implying that coffee consumption may reduce gout risk independently of serum uric acid.23 Another question that often arises is whether the relationship between exposure and outcome may be bidirectional. This hypothesis can be tested through bidirectional MR, in which two separate MR analyses are conducted, one treating trait X as the exposure and trait Y as the outcome, and then vice versa.24 This requires a separate IV for each trait to be tested against its respective outcome. In their study of major depressive disorder (MDD) and OA, Zhang et al observed a causal effect of MDD on risk of OA and also a causal effect of OA on risk of MDD.25 In their study of RA and MI, Guo et al observed a causal effect of genetic susceptibility to RA on MI, but not the other way around.26
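Computationally, a bidirectional MR is simply two independent MR analyses. The Python sketch below runs a fixed-effect IVW estimator in each direction, each with its own instrument; all effect sizes are hypothetical, and in a real study each instrument would be selected from a GWAS of the trait serving as the exposure in that direction.

```python
import math


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate, standard error, and 95% CI from summary statistics."""
    w = [1 / s**2 for s in se_out]
    num = sum(wi * bx * by for wi, bx, by in zip(w, beta_exp, beta_out))
    den = sum(wi * bx**2 for wi, bx in zip(w, beta_exp))
    est, se = num / den, math.sqrt(1 / den)
    return est, se, (est - 1.96 * se, est + 1.96 * se)


# Direction 1: trait X as exposure, using variants selected from a GWAS of X.
x_to_y = ivw(beta_exp=[0.20, 0.15, 0.25],
             beta_out=[0.030, 0.020, 0.040],
             se_out=[0.010, 0.010, 0.012])

# Direction 2: trait Y as exposure, using a separate set of variants from a GWAS of Y.
y_to_x = ivw(beta_exp=[0.30, 0.22],
             beta_out=[0.010, -0.004],
             se_out=[0.015, 0.014])

for label, (est, se, ci) in [("X to Y", x_to_y), ("Y to X", y_to_x)]:
    print(f"{label}: {est:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
```

In this toy example the X-to-Y interval excludes zero and the Y-to-X interval does not, the one-directional pattern described for RA and MI; intervals excluding zero in both directions would correspond to the pattern described for MDD and OA.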

Multivariable Mendelian randomization (MVMR) is an extension of MR that jointly estimates the direct effects of several exposures on an outcome and can therefore be used to evaluate whether intervening variables mediate the association of the exposure with the outcome.27,28 An example is the study by Gill et al, who first used MR to identify a causal effect of lower educational status on OA and then performed MVMR, from which they estimated that higher educational status was protective against OA, with 35% of this effect mediated by genetically predicted BMI and smoking.29
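The arithmetic behind an MVMR mediation analysis can be sketched in a few lines. Multivariable IVW regresses the variant–outcome associations jointly on the variant–exposure and variant–mediator associations (weighted least squares without an intercept), giving a direct effect of the exposure; one common way to summarize mediation is then the difference method, (total - direct)/total. The Python example below uses hypothetical, noise-free numbers constructed so that the direct effect is -0.2 and half of the total effect runs through the mediator; it is not a reconstruction of the Gill et al analysis.

```python
import numpy as np

# Hypothetical summary statistics, one row per variant.
# Column 0: association with the exposure (eg, education).
# Column 1: association with the mediator (eg, BMI).
bx = np.array([
    [0.05, -0.020],  # exposure instruments; their mediator effect is downstream
    [0.04, -0.016],
    [0.06, -0.024],
    [0.03, -0.012],
    [0.00,  0.050],  # mediator instruments
    [0.00,  0.040],
    [0.00,  0.060],
])
ASSUMED_DIRECT, ASSUMED_MEDIATOR = -0.2, 0.5
by = bx @ np.array([ASSUMED_DIRECT, ASSUMED_MEDIATOR])  # noise-free outcome associations
se_by = np.array([0.010, 0.012, 0.011, 0.015, 0.010, 0.013, 0.012])


def mvmr_ivw(bx, by, se_by):
    """Multivariable IVW: weighted least squares of by on all columns of bx, no intercept."""
    sw = 1.0 / se_by  # square roots of the inverse-variance weights
    coef, *_ = np.linalg.lstsq(bx * sw[:, None], by * sw, rcond=None)
    return coef


def ivw(bx, by, se_by):
    """Univariable fixed-effect IVW estimate."""
    w = 1.0 / se_by**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)


exposure_ivs = bx[:, 0] != 0  # in this toy data, the exposure instruments
total = ivw(bx[exposure_ivs, 0], by[exposure_ivs], se_by[exposure_ivs])
direct, mediator_effect = mvmr_ivw(bx, by, se_by)
prop_mediated = (total - direct) / total

print(f"total effect:        {total:.3f}")
print(f"direct effect:       {direct:.3f}")
print(f"proportion mediated: {prop_mediated:.0%}")
```

With real data the coefficients carry sampling error, so the proportion mediated also needs a confidence interval (eg, from bootstrapping or the delta method), which this sketch does not compute.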

Other MR use cases

Although the previous discussion has focused principally on MR for evaluating the effects of modifiable risk factors on an outcome, there are other use cases. For example, MR can be used to evaluate causal relationships between diseases. Yuan et al used MR to assess the effects of genetic liability to RA on cardiovascular diseases.30 They found that each 1-unit increase in the log odds of RA was associated with an odds ratio of 1.02 for coronary artery disease (CAD) and of 1.05 for intracerebral hemorrhage. Using multivariable mediation analysis, they then showed that the association with CAD was attenuated after adjustment for C-reactive protein levels, suggesting that the heightened cardiovascular risk in patients with RA might be reduced by dampening systemic inflammation.
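Because a 1-unit increase in log odds corresponds to multiplying the odds of the exposure by e (about 2.7-fold), estimates on this scale can be hard to picture. Burgess and Labrecque (reference 33) discuss presenting such estimates per doubling of the odds instead, which amounts to multiplying the log odds ratio by ln 2 (about 0.693). A minimal Python sketch applying this rescaling to the odds ratios quoted above:

```python
import math


def or_per_doubling(or_per_unit_log_odds):
    """Rescale an OR per 1-unit increase in the log odds of a binary exposure
    to an OR per doubling of the exposure odds (log OR multiplied by ln 2)."""
    return math.exp(math.log(or_per_unit_log_odds) * math.log(2))


for outcome, or_unit in [("CAD", 1.02), ("intracerebral hemorrhage", 1.05)]:
    print(f"{outcome}: OR {or_unit:.2f} per unit log odds, "
          f"{or_per_doubling(or_unit):.3f} per doubling of odds")
```

On either scale, the estimate describes genetic liability to RA rather than the effect of having RA itself.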

A second use case for MR analysis is the identification of potential drug targets. Drugs work by perturbing protein targets or gene expression to induce a desired clinical outcome. Drug-target MR begins by constructing a genetic instrument from variants in or near the gene encoding the drug target, rather than from genetic variants across the genome. Because these variants lie within or neighbor the gene of interest, the instrument more specifically captures variation in gene expression or protein function, enabling MR to assess more precisely the causal relationship between the intended drug target and the outcome.31 Relevant to rheumatic disease, Zhao et al evaluated the potential of a disintegrin and metalloproteinase with thrombospondin motifs 5 (ADAMTS5) as a therapeutic target for OA.32 OA is characterized by degradation of articular cartilage, whose major proteoglycan, aggrecan, is cleaved by ADAMTS5. Interest in ADAMTS5 as a therapeutic target for OA was motivated by previous studies showing that knockout of ADAMTS5 in mice resulted in less severe cartilage destruction. In their study, Zhao et al used two missense variants in the ADAMTS5 gene to instrument ADAMTS5 function and found genetically proxied lower ADAMTS5 activity to be significantly associated with reduced risk of all OA types.
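The first step of a drug-target (cis) MR, restricting candidate variants to the region around the target gene, can be sketched as follows. The gene, its coordinates, the ±100 kb flank, and the P value threshold are hypothetical placeholders rather than values used by Zhao et al (who instead chose two missense variants, a biologically motivated selection). A real analysis would also prune correlated variants, eg, by linkage disequilibrium clumping, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    chrom: str
    pos: int
    p_exposure: float  # P value for association with the target's expression or activity


# Hypothetical gene "GENE_X" on chromosome 1; placeholder coordinates.
GENE_CHROM, GENE_START, GENE_END = "1", 26_900_000, 26_950_000
FLANK = 100_000     # assumed window on each side of the gene
P_THRESHOLD = 5e-8  # assumed significance threshold

candidates = [
    Variant("rs_1", "1", 26_880_000, 3e-12),
    Variant("rs_2", "1", 26_910_000, 2e-9),
    Variant("rs_3", "1", 27_300_000, 1e-20),  # strong, but outside the window
    Variant("rs_4", "5", 26_920_000, 1e-15),  # different chromosome
    Variant("rs_5", "1", 26_940_000, 0.03),   # in the window, but too weak
]


def cis_instrument(variants):
    """Keep variants in or near the target gene that pass the P value threshold."""
    lo, hi = GENE_START - FLANK, GENE_END + FLANK
    return [v.rsid for v in variants
            if v.chrom == GENE_CHROM and lo <= v.pos <= hi and v.p_exposure < P_THRESHOLD]


print(cis_instrument(candidates))
```

Only rs_1 and rs_2 survive: rs_3 is strongly associated but lies outside the window, so it may reflect a different gene.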

Interpreting MR results

MR studies require careful scrutiny because causal inferences depend on the instrument's adherence to the core MR assumptions. One should first consider how much of the exposure is genetically determined and can therefore be proxied by genetic variants. For example, there may be some genetically determined aspects of dietary preferences or other behaviors, but nongenetic influences such as social, cultural, and economic factors may predominate. Glycemic traits and other biomarkers can often be proxied better, especially where there is a direct gene–protein relationship, as with C-reactive protein. Complications in estimating causal effects can also arise when MR is applied to a binary exposure that is a dichotomization of a continuous risk factor (eg, hypertension is a dichotomization of blood pressure); see Burgess and Labrecque for a discussion of this issue.33

The results of the standard sensitivity analyses should be consistent with the primary causal estimates. It must be emphasized that, even when sensitivity and supplementary analyses have been performed to assess the independence and exclusion assumptions, it is not possible to prove that these assumptions hold. That is, failure to detect a violation in sensitivity analyses does not mean that pleiotropy or confounding is absent. Including additional types of sensitivity analysis, when possible, can help reveal potential biases. In Table 2 and the following section, we present guidelines for reading and interpreting MR studies.

Table 2.

Guide to reading and evaluating an MR study*

| Things to consider | Description | Tools |
|---|---|---|
| Hypothesis, motivation, scope of study | Do the researchers discuss the biologic relationship between exposure and outcome? Are there previous epidemiologic data to support the exposure–disease association? | N/A |
| Exposure and outcome data source | What type of data is used: individual or summary level? Is the study design 1SMR or 2SMR? | GWAS/MR data repositories, eg, "GWAS Catalog" or "MR-Base" |
| Genetic variant selection | What is the rationale for selection: biologic or statistical? If statistical, what is the sample size of the exposure GWAS? | P value threshold, clumping parameters |
| Relevance assumption | Is the IV strongly associated with the exposure? | F statistics |
| Independence assumption | Could measured (or unmeasured) variables confound the association between the IV and the outcome? | Stratification of the sample to account for population differences, socioeconomic influences, etc |
| Exclusion assumption | Does the IV affect the outcome only through the exposure? | Sensitivity analyses, eg, MR-Egger, weighted median, MR-PRESSO, mediation analysis |
| Results and data representation | Presentation of the causal estimate | IVW (estimate, confidence interval, P value), heterogeneity (Q statistic or I²), scatter plots |
| Discussion and interpretation | Comparison with observational studies? Do the authors provide a balanced assessment of their evaluation of the MR assumptions? Is the estimate clinically meaningful? Are findings compared with relevant mechanistic studies or randomized trial data, where these exist? | N/A |

* 1SMR, one-sample Mendelian randomization; 2SMR, two-sample Mendelian randomization; IV, instrumental variable; IVW, inverse-variance weighted; GWAS, genome-wide association study; MR, Mendelian randomization; MR-PRESSO, Mendelian Randomization Pleiotropy RESidual Sum and Outlier; N/A, not applicable.

Guide to reading and evaluating an MR study

Hypothesis, motivation, and scope of study.

What is the rationale for the study? Is there a specific hypothesis? Is there previous evidence supporting an association between the exposure and the outcome? Is gene–environment equivalence34 plausible; that is, does it make sense that genetically predicted levels of the exposure affect the outcome in the same way as levels changed by environmental means? For example, would a genetic predisposition to lower LDL cholesterol levels result in the same CHD risk reduction as lowering LDL levels through diet or statin therapy?

Exposure/outcome.

What are the modifiable exposure and the outcome? Is the outcome continuous or binary? Is there a genetic basis for the (modifiable) exposure?

Study design.

Does the study use a 1SMR or 2SMR design? In which populations were the exposure and outcome measured? Does the study use individual-level data (2SLS regression approach) or only summary-level data (IVW approach)?
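For readers less familiar with the individual-level approach, the following Python sketch simulates data with an unmeasured confounder and compares ordinary regression with a two-stage least squares (2SLS) estimate. All parameter values are assumptions chosen for illustration. Standard errors from a manually fitted second stage are not valid, so real analyses should use dedicated IV software.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

g = rng.binomial(2, 0.3, n)               # genotype (allele count): the instrument
u = rng.normal(0, 1, n)                   # unmeasured confounder
x = 0.5 * g + u + rng.normal(0, 1, n)     # exposure
y = 0.3 * x + u + rng.normal(0, 1, n)     # outcome; true causal effect of x is 0.3


def ols(design, response):
    """Least-squares coefficients for response ~ design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


ones = np.ones(n)
# Stage 1: regress the exposure on the instrument and keep the fitted values.
stage1 = np.column_stack([ones, g])
x_hat = stage1 @ ols(stage1, x)
# Stage 2: regress the outcome on the genetically predicted exposure.
beta_2sls = ols(np.column_stack([ones, x_hat]), y)[1]
# Naive observational estimate, biased by the confounder u.
beta_ols = ols(np.column_stack([ones, x]), y)[1]

print(f"naive regression: {beta_ols:.3f}")
print(f"2SLS:             {beta_2sls:.3f}")
```

In this simulation the naive regression is pulled to roughly 0.78 by the confounder, whereas 2SLS lands close to the simulated causal effect of 0.3.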

IVs.

What is the source of the IVs? Are the IVs robustly associated with the exposure (eg, derived from a GWAS with a large sample size)? Were the variants selected biologically or statistically? If statistically, what P value threshold was used? Were only independent (uncorrelated) variants used? Does the IV have sufficient strength to proxy the exposure (relevance assumption), and how was IV strength evaluated (eg, by F statistics or genome-wide significant P values)?
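Instrument strength is commonly summarized with F statistics. A frequently used approximation for a single variant is the squared ratio of its exposure effect to its standard error; for an instrument of k variants jointly explaining a proportion R² of exposure variance in n individuals, F = R²(n - k - 1) / [(1 - R²)k]. The Python sketch below computes both with hypothetical inputs and flags variants below the conventional rule of thumb of F < 10.

```python
def variant_f(beta, se):
    """Approximate F statistic for one variant: (beta / se) squared."""
    return (beta / se) ** 2


def instrument_f(r2, n, k):
    """F statistic for k variants jointly explaining r2 of exposure variance in n people."""
    return r2 * (n - k - 1) / ((1 - r2) * k)


# Hypothetical variant-exposure associations: (rsid, beta, standard error)
assoc = [("rs_a", 0.05, 0.005), ("rs_b", 0.03, 0.006), ("rs_c", 0.02, 0.008)]

f_stats = {rsid: variant_f(b, se) for rsid, b, se in assoc}
weak = [rsid for rsid, f in f_stats.items() if f < 10]  # conventional rule of thumb
mean_f = sum(f_stats.values()) / len(f_stats)

print(f_stats)
print("weak variants:", weak)
print(f"mean F: {mean_f:.1f}")
print(f"instrument F (r2=0.02, n=10000, k=3): {instrument_f(0.02, 10_000, 3):.1f}")
```

A respectable mean F (here about 44) can hide an individual weak variant, so per-variant values are worth inspecting when they are reported.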

Causal estimate.

Is the estimate statistically significant? Are a confidence interval and P value presented for the causal estimate? Is the estimate presented as an odds ratio? If the IVW method was used, is a scatter plot of the Wald ratios provided? Are the Wald ratios roughly consistent across all variants in the IV?
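The quantities these questions refer to are simple to compute from summary statistics. In the Python sketch below (hypothetical data), each variant's Wald ratio is its outcome association divided by its exposure association, with a first-order standard error; the IVW estimate is the inverse-variance weighted mean of those ratios, and Cochran's Q and I² summarize how much the ratios disagree beyond what chance would predict.

```python
import math

# Hypothetical summary statistics for five variants
bx = [0.10, 0.08, 0.12, 0.09, 0.11]        # variant-exposure associations
by = [0.021, 0.015, 0.026, 0.019, 0.022]   # variant-outcome associations
se_by = [0.004, 0.005, 0.005, 0.004, 0.006]

# Wald ratio per variant and its first-order standard error (se_by / |bx|)
ratios = [y / x for x, y in zip(bx, by)]
ratio_se = [s / abs(x) for x, s in zip(bx, se_by)]
weights = [1 / s**2 for s in ratio_se]

# IVW estimate = inverse-variance weighted mean of the Wald ratios
ivw_est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
ivw_se = math.sqrt(1 / sum(weights))

# Cochran's Q and I^2 for heterogeneity among the Wald ratios
q = sum(w * (r - ivw_est) ** 2 for w, r in zip(weights, ratios))
df = len(ratios) - 1
i2 = max(0.0, (q - df) / q)

for x, r, s in zip(bx, ratios, ratio_se):
    print(f"bx={x:.2f}  Wald ratio={r:.3f}  (SE {s:.3f})")
print(f"IVW: {ivw_est:.3f} (SE {ivw_se:.3f}); Q={q:.2f} on {df} df; I2={i2:.0%}")
```

A scatter plot of by against bx, with the IVW estimate drawn as a line through the origin, shows the same information graphically; in this toy data the ratios cluster tightly (Q well below its degrees of freedom, I² of 0%).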

Sensitivity analyses.

Are there other relevant exposures (eg, BMI) that the IV could be associated with? If so, how does the study address this? What techniques are used to assess the independence and exclusion assumptions? Do the sensitivity analyses suggest the presence of pleiotropy? Are the median and mode estimators directionally consistent with the primary IV estimate? Do the point estimates differ substantially from each other? Is a scatter plot provided showing agreement of the median estimator and MR-Egger regression estimator with the primary IV estimate? Does the MR-Egger intercept deviate significantly from zero? If pleiotropy is detected, are outliers removed systematically, eg, by MR-PRESSO?
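The MR-Egger and weighted median estimators referred to above can be sketched as follows (Python, hypothetical data). MR-Egger is a weighted regression of variant–outcome on variant–exposure associations that, unlike IVW, includes an intercept, after orienting every variant so that its exposure association is positive; a nonzero intercept suggests directional pleiotropy. The weighted median takes the median of the Wald ratios weighted by their inverse variances, interpolating between ordered ratios as described by Bowden et al (reference 15). In this toy data the true causal effect is 0.2 and three of eight variants carry an assumed pleiotropic shift of 0.015. Standard errors, usually bootstrapped for the weighted median, are omitted.

```python
import numpy as np

# Hypothetical data: true causal effect 0.2; variants 0, 4, and 7 are invalid,
# with an assumed pleiotropic shift of 0.015 on the outcome.
bx = np.array([0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12])
pleiotropy = np.array([0.015, 0, 0, 0, 0.015, 0, 0, 0.015])
by = 0.2 * bx + pleiotropy
se_by = np.full(8, 0.005)


def ivw(bx, by, se_by):
    """Fixed-effect IVW estimate (regression through the origin)."""
    w = 1 / se_by**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)


def mr_egger(bx, by, se_by):
    """Weighted regression of by on bx with an intercept, after orienting bx > 0."""
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    sw = 1 / se_by  # square roots of the inverse-variance weights
    design = np.column_stack([np.ones_like(bx), bx]) * sw[:, None]
    (intercept, slope), *_ = np.linalg.lstsq(design, by * sw, rcond=None)
    return intercept, slope


def weighted_median(bx, by, se_by):
    """Weighted median of Wald ratios (first-order weights), with interpolation."""
    ratios = by / bx
    w = (bx / se_by) ** 2  # equals 1 / (se_by / |bx|)^2
    order = np.argsort(ratios, kind="stable")
    ratios, w = ratios[order], w[order] / w.sum()
    p = np.cumsum(w) - w / 2  # weighted percentile of each ordered ratio
    return np.interp(0.5, p, ratios)


intercept, slope = mr_egger(bx, by, se_by)
print(f"IVW:             {ivw(bx, by, se_by):.3f}")
print(f"MR-Egger slope:  {slope:.3f} (intercept {intercept:.4f})")
print(f"weighted median: {weighted_median(bx, by, se_by):.3f}")
```

Here the IVW estimate (about 0.26) is pulled upward by the invalid variants, the weighted median stays near 0.2 because the valid variants carry most of the weight, and the positive MR-Egger intercept flags directional pleiotropy. Each estimator rests on its own assumptions (for MR-Egger, the InSIDE assumption described in reference 16), so agreement among them is supportive rather than conclusive.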

Discussion.

If a causal effect is detected, is it put in biologic context with a discussion of possible mechanisms? Do the authors caution that unaccounted-for pleiotropy and bias may still affect the results? Have the authors provided a balanced assessment of their evaluation of the MR assumptions? What are the clinical implications of the results?

Reporting MR results.

STROBE-MR (Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization) is a thorough checklist of what should be included when conducting or reporting an MR study.35 STROBE-MR includes a 20-item checklist that authors should confirm was addressed in their study. These items pertain to the motivation and scope of the MR analysis, a description of the chosen dataset, justification for the genetic variants used, how the genetic variants satisfy the three core assumptions, sensitivity analyses to test the robustness of the primary findings, a graphical presentation of the data, and a discussion of the results. However, although STROBE-MR is helpful for comprehensive analysis and reporting, it is still up to the reader to judge whether the decisions the authors made in performing the analyses are consistent with best practices.

Conclusions

MR has become a very useful tool for inferring the causal effects of exposures on health outcomes. Its popularity stems from the growing availability of GWAS summary results for both exposures and health outcomes. However, the validity of an IV rests on three core assumptions, and causal estimates of the effect of exposure on outcome will be valid only to the extent that these assumptions are met. In this review, we have provided an overview of how MR works and how the three core assumptions can be evaluated. We have also provided guidelines for readers to evaluate the quality of MR studies.

Supplementary Material

supp 1
supp 2

Acknowledgments

Supported by the NIH (grant P30-AG-028747). Mr Nguyen’s work was supported by the Epidemiology of Aging Training Program (grant T32-AG000262).

Footnotes

Additional supplementary information cited in this article can be found online in the Supporting Information section (https://acrjournals.onlinelibrary.wiley.com/doi/10.1002/acr.25400).

Author disclosures are available at https://onlinelibrary.wiley.com/doi/10.1002/acr.25400.

REFERENCES

1. Lawlor DA, Harbord RM, Sterne JAC, et al. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med 2008;27(8):1133–1163.
2. Burgess S, Davey Smith G, Davies NM, et al. Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2023;4:186.
3. Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ 2018;362:k601.
4. Voight BF, Peloso GM, Orho-Melander M, et al. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet 2012;380(9841):572–580.
5. Huang G, Zhong Y, Li W, et al. Causal relationship between parathyroid hormone and the risk of osteoarthritis: a Mendelian randomization study. Front Genet 2021;12:686939.
6. Martin AR, Kanai M, Kamatani Y, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 2019;51(4):584–591.
7. Bergink AP, Trajanoska K, Uitterlinden AG, et al. Mendelian randomization study on vitamin D levels and osteoarthritis risk: a concise report. Rheumatology (Oxford) 2021;60(7):3409–3412.
8. Clarke SLN, Mitchell RE, Sharp GC, et al. Vitamin D levels and risk of juvenile idiopathic arthritis: a Mendelian randomization study. Arthritis Care Res (Hoboken) 2023;75(3):674–681.
9. Mishra A, Malik R, Hachiya T, et al. Stroke genetics informs drug discovery and risk prediction across ancestries. Nature 2022;611(7934):115–123.
10. Baum CF, Schaffer ME, Stillman S. Instrumental variables and GMM: estimation and testing. Stata J 2003;3(1):1–31.
11. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013;37(7):658–665.
12. Zhu Z, Zheng Z, Zhang F, et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat Commun 2018;9(1):224.
13. Greco FDM, Minelli C, Sheehan NA, et al. Detecting pleiotropy in Mendelian randomisation studies with summary data and a continuous outcome. Stat Med 2015;34(21):2926–2940.
14. Burgess S, Thompson SG; CRP CHD Genetics Collaboration. Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol 2011;40(3):755–764.
15. Bowden J, Davey Smith G, Haycock PC, et al. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol 2016;40(4):304–314.
16. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol 2015;44(2):512–525.
17. Verbanck M, Chen CY, Neale B, et al. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet 2018;50(5):693–698.
18. Tan Y, Yao H, Lin C, et al. Investigating the bidirectional association of rheumatoid arthritis and thyroid function: a methodologic assessment of Mendelian randomization. Arthritis Care Res 2024;76(8):1162–1172.
19. Funck-Brentano T, Nethander M, Movérare-Skrtic S, et al. Causal factors for knee, hip, and hand osteoarthritis: a mendelian randomization study in the UK biobank. Arthritis Rheumatol 2019;71(10):1634–1641.
20. Keenan T, Zhao W, Rasheed A, et al. Causal assessment of serum urate levels in cardiometabolic diseases through a Mendelian randomization study. J Am Coll Cardiol 2016;67(4):407–416.
21. Kamat MA, Blackshaw JA, Young R, et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 2019;35(22):4851–4853.
22. Johnsen MB, Vie GÅ, Winsvold BS, et al. The causal role of smoking on the risk of hip or knee replacement due to primary osteoarthritis: a Mendelian randomisation analysis of the HUNT study. Osteoarthritis Cartilage 2017;25(6):817–823.
23. Shirai Y, Nakayama A, Kawamura Y, et al. Coffee consumption reduces gout risk independently of serum uric acid levels: mendelian randomization analyses across ancestry populations. ACR Open Rheumatol 2022;4(6):534–539.
24. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet 2014;23(R1):R89–R98.
25. Zhang F, Rao S, Baranova A. Shared genetic liability between major depressive disorder and osteoarthritis. Bone Joint Res 2022;11(1):12–22.
26. Guo HY, Wang W, Peng H, et al. Bidirectional two-sample Mendelian randomization study of causality between rheumatoid arthritis and myocardial infarction. Front Immunol 2022;13:1017444.
27. Sanderson E. Multivariable Mendelian randomization and mediation. Cold Spring Harb Perspect Med 2021;11(2):a038984.
28. Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol 2015;181(4):251–260.
29. Gill D, Karhunen V, Malik R, et al. Cardiometabolic traits mediating the effect of education on osteoarthritis risk: a Mendelian randomization study. Osteoarthritis Cartilage 2021;29(3):365–371.
30. Yuan S, Carter P, Mason AM, et al. Genetic liability to rheumatoid arthritis in relation to coronary artery disease and stroke risk. Arthritis Rheumatol 2022;74(10):1638–1647.
31. Gill D, Georgakis MK, Walker VM, et al. Mendelian randomization for studying the effects of perturbing drug targets. Wellcome Open Res 2021;6:16.
32. Zhao SS, Karhunen V, Morris AP, et al. ADAMTS5 as a therapeutic target for osteoarthritis: mendelian randomisation study. Ann Rheum Dis 2022;81(6):903–904.
33. Burgess S, Labrecque JA. Mendelian randomization with a binary exposure variable: interpretation and presentation of causal estimates. Eur J Epidemiol 2018;33(10):947–952.
34. Davey Smith G. Epigenesis for epidemiologists: does evo-devo have implications for population health research and practice? Int J Epidemiol 2012;41(1):236–247.
35. Skrivankova VW, Richmond RC, Woolf BAR, et al. Strengthening the reporting of observational studies in epidemiology using mendelian randomisation (STROBE-MR): explanation and elaboration. BMJ 2021;375(2233):n2233.
