2021 Oct 26;375:n2233. doi: 10.1136/bmj.n2233

Table 2.

Glossary of commonly used terms in mendelian randomisation (MR)

Term	Explanation
MR	A method that uses genetic variation to strengthen causal inference regarding modifiable exposures (eg, body mass index, alcohol consumption, plasma lipoprotein, time spent in education, C reactive protein level, or serum 25-hydroxyvitamin D) influencing risk of disease or other outcomes. Most MR studies are implemented within an instrumental variable framework, using genetic variants as instrumental variables.
One sample MR	A type of MR study in which one sample of individuals is used to estimate the genetic variant-exposure and genetic variant-outcome associations. This approach requires that the genetic variants, exposures, and outcomes are all measured in the same sample and that individual level data are available on all participants.
Two sample MR	A type of MR study in which the genetic variant-exposure and genetic variant-outcome associations are estimated in different samples and combined using meta-analysis tools. This approach requires summary level statistics of the association of each genetic variant in the two samples. It does not require individual level data.
Bidirectional MR	A type of MR study in which one set of instrumental variables is used to test the effect of the exposure on the outcome and a separate set of instrumental variables is used to test the effect of the outcome on the exposure. This approach allows for a better understanding of the direction of the causal effect.
Instrumental variables	Variables associated with the exposure of interest that are not related to confounders, and that affect the outcome only through the exposure.
Instrumental variable assumptions (core assumptions in MR studies)	Include assumptions for relevance (the genetic variants are associated with the exposure of interest); independence (the genetic variants share no unmeasured cause with the outcome); and exclusion restriction (the genetic variants do not affect the outcome except through their potential effect on the exposure of interest).
Assessment of instrumental variable assumptions	Various tests can assess the plausibility of instrumental variable assumptions (eg, a test of whether potential confounders or pleiotropic mechanisms are associated with the genetic variant; see below for more examples). Only the first assumption (relevance) can be tested conventionally; the validity of the other assumptions cannot be guaranteed. However, tests can provide evidence that they are unlikely to hold (that is, these assumptions cannot be verified, but sometimes can be falsified).
Gene environment equivalence	The notion that differences in an exposure induced by genetic variation will produce the same downstream effects on health outcomes as differences in the exposure produced by environmental influences.
Genetic variant	A variation in the DNA sequence that is found within a population. Typically, a single nucleotide polymorphism (SNP).
Single nucleotide polymorphism (SNP)	A genetic variant in which a single base pair in the DNA varies across the population, at an appreciable frequency. SNPs typically have two alleles, each being one of the four bases (adenine, cytosine, guanine, or thymine). If the SNP is associated with the trait, then one allele will be associated with a higher value of the trait, the other with a lower value. In MR studies, SNPs are the most common genetic variants used as instrumental variables for a modifiable exposure.
Strand alignment	Strand alignment ensures that the alleles in the exposure GWAS (genome wide association study) and the outcome GWAS are measured on the same DNA strand. An issue could arise when the SNPs are palindromic (that is, guanine/cytosine and adenine/thymine SNPs), which would look the same on both DNA strands. Without ensuring that the exposure and outcome GWAS report the same strand, such SNPs can introduce ambiguity as to whether both the exposure and outcome GWAS are reporting the association with the same effect allele.
Allele score	A single variable produced by combining information from several SNPs that are associated with a trait or phenotype (eg, blood pressure), which can be used to predict the exposure in a MR study. An allele score is sometimes also referred to as genetic risk score, polygenic score, or genetic prediction score.
Linkage disequilibrium	The non-random association of alleles at two or more loci, which normally occurs within a small region of the genome in the general population and is a potential source of bias in MR studies.
r²	A measure of the linkage disequilibrium between two genetic loci to quantify their correlation (value of 1 denotes perfect correlation). This measure should not be confused with the R² value (representing the proportion of variation in the exposure variable explained by the genetic variant), which can be used to calculate instrument strength.
Test of instrument strength	Measure of the association between the genetic variant and the exposure. The strength is typically tested using the partial F statistic or the R².
Test for difference	Assessment of the difference between the multivariable adjusted phenotypic association and MR estimates (eg, Hausman test). These tests indicate whether there is any evidence that the estimates differ, over and above estimation error.
Horizontal pleiotropy	When genetic variants affect the outcome via pathways independent of the exposure. This event is a violation of the exclusion restriction assumption and a source of bias in MR studies.
Weak instrument bias	If genetic variants used as instrumental variables are only weakly associated with the exposure of interest, they are said to be “weak instruments,” and then the MR estimates can be biased. Although a partial F statistic is commonly used as an indicator of potential weak instrument bias (when F is <10 in an analysis of one sample), weak instrument bias can still occur at values greater than 10. This rule of thumb is analogous to the false dichotomisation of P values as either significant or not significant at an arbitrary cut-off value such as P=0.05.²⁰
Collider bias	Bias that can occur when conditioning on a common effect of the genetic variant and another key variable, such as the outcome. This conditioning can either occur statistically (eg, including a covariate that is caused by both the variant and outcome) or through the study sampling (eg, analysing a sample of patients in hospital, where admission is influenced by the variant and other factors).
Winner’s curse	The tendency for discovery estimates of the SNP-exposure associations to be over-estimated, which occurs when the statistically strongest associations (usually chosen using a P value threshold) are selected from the discovery sample.
Data	Can refer to either individual level data (eg, measurements of participants’ phenotypes, such as body mass index, and their genetic data) or SNP level phenotype association estimates (summary level data).
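
Several glossary entries (two sample MR, test of instrument strength, strand alignment) describe calculations on summary level data, and a short sketch may make them more concrete. The Python below is a minimal illustration only, not the method of any particular software package. All function names and input values are hypothetical. It assumes uncorrelated variants, approximates the F statistic for a single variant as (beta/SE)² from the variant-exposure association, and uses a first-order standard error for the ratio estimate that ignores uncertainty in the variant-exposure estimate.

```python
import math


def is_palindromic(allele_1, allele_2):
    """Return True for A/T or G/C SNPs, which look the same on both DNA strands."""
    return {allele_1.upper(), allele_2.upper()} in ({"A", "T"}, {"C", "G"})


def f_statistic(beta_exposure, se_exposure):
    """Approximate single-variant F statistic from summary data: (beta / SE)^2."""
    return (beta_exposure / se_exposure) ** 2


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Ratio estimate for one variant: variant-outcome / variant-exposure.

    The standard error is a first-order approximation that treats the
    variant-exposure association as if it were known without error.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(variants):
    """Fixed-effect inverse variance weighted average of the ratio estimates.

    `variants` is an iterable of (beta_exposure, beta_outcome, se_outcome)
    tuples, assumed to be harmonised to the same effect allele and to be
    mutually uncorrelated (not in linkage disequilibrium).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for beta_exposure, beta_outcome, se_outcome in variants:
        estimate, se = wald_ratio(beta_exposure, beta_outcome, se_outcome)
        weight = 1.0 / se ** 2
        weighted_sum += weight * estimate
        total_weight += weight
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


# Hypothetical summary statistics, made up for illustration:
# (beta_exposure, beta_outcome, se_outcome) per variant.
example_variants = [
    (0.10, 0.050, 0.01),
    (0.20, 0.100, 0.02),
    (0.05, 0.025, 0.01),
]

if __name__ == "__main__":
    estimate, se = ivw_estimate(example_variants)
    print("IVW estimate:", estimate, "SE:", se)
    print("F statistic (beta=0.10, SE=0.02):", f_statistic(0.10, 0.02))
    print("A/T palindromic:", is_palindromic("A", "T"))
    print("A/G palindromic:", is_palindromic("A", "G"))
```

In this sketch, palindromic SNPs are only flagged. Whether to drop them or resolve them using allele frequencies is a separate analytic decision that the code does not make.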